05 Jan 2018
What Healthcare Professionals Must Know to Make Their ETL Process a Success
Like most other verticals these days, the various areas of Healthcare rely heavily, and increasingly, on technology. Extract, Transform, Load (ETL) has become indispensable for most Healthcare institutions at a time when data is beginning to shape the future of most businesses, mid-sized or large.
However, the problem that plagues the vast majority of Healthcare institutions, and can thwart their ETL efforts, is that their core business lies elsewhere: they rarely have the amount and quality of resources required to handle the full ETL cycle, a complex and often technically intricate process.
What does it take for a Healthcare institution to harness ETL technology the right way, derive insights from its data, stay fully compliant with the numerous regulations and not botch the whole thing up? What is the best way of approaching the issue? What should Healthcare decision-makers be aware of prior to the kick-off?
Here are some tips from our ETL-conversant Healthcare team.
Scale Up Your ETL Implementation Plans from Day One
In business, data is a mercurial entity: it is influenced by many factors and prone to the often unexpected changes they cause. A single change in a regulation can double the amount of data you need to process. While it is usually quite difficult to foresee and factor in all future contingencies, make sure you have some reserve resources you can bring into play if the need arises.
Ensure All Parts of the ETL Process Are in Place. Enlist Professional Help, If Required
For ETL to work, every aspect of the ETL process must be present, well taken care of and well tuned. Unless you are 100% certain that the team at your disposal is capable of ensuring the following, you should consider entrusting the whole thing to an external provider:
- Naming conventions
- Error handling
- Failure and recovery process
- Reusability of the code
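To make the error-handling and failure-and-recovery items more concrete, here is a minimal Python sketch of a pipeline runner that logs a failed step and can be resumed from that step on the next run. It is only an illustration under stated assumptions: the step names, the run_pipeline function and the in-memory set of completed steps are hypothetical, and a real implementation would normally persist its checkpoint in a file or a database.

```python
import logging
from typing import Callable, Dict, List, Set, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

# A step is a (name, action) pair; the verb_noun names follow one
# possible naming convention (hypothetical, for illustration only).
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], completed: Set[str]) -> bool:
    """Run steps in order, skipping those already completed.

    Returns True if every step finished. On the first failure the error
    is logged, the run stops, and `completed` keeps the finished steps
    so that a later call resumes from the failed step.
    """
    for name, action in steps:
        if name in completed:
            log.info("skipping %s (already done)", name)
            continue
        try:
            action()
        except Exception:
            log.exception("step %s failed; stopping so the run can be resumed", name)
            return False
        completed.add(name)
    return True


# Hypothetical steps: the load fails once, then succeeds on the rerun.
attempts: Dict[str, int] = {"load": 0}


def extract_patients() -> None:
    pass  # placeholder for a real extract


def load_patients() -> None:
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("target database unavailable")


steps: List[Step] = [
    ("extract_patients", extract_patients),
    ("load_patients", load_patients),
]
done: Set[str] = set()
first_run = run_pipeline(steps, done)   # the load step fails and is logged
second_run = run_pipeline(steps, done)  # the extract is skipped, the load is retried
```

Because the same runner works for any list of steps, it also touches on the reusability item: the recovery logic is written once rather than repeated in every job.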
Ensure that Your ETL Team Has Ample, Enterprise-Wide Access to Your Data
Whether you’ve decided to use your own IT experts or opted for an external provider, an ample view of all your data assets across the enterprise is instrumental to their success. You must ensure that the members of your ETL team have untrammeled access to all aspects of your data and data sources and are informed of all the potentially beneficial (from the medical viewpoint) data integrations that can be performed.
Failing to ensure this may result not only in missing out on an important part of your data-driven analytics, but also in the need to make many corrections at a much later stage. In this sense, as long-time fans of Agile, we can see significant benefits in applying some of the Agile techniques in order to achieve the above.
Your ability to ensure enterprise-wide access to your data is central to creating a viable and efficient Data Architecture. Such an architecture is initially created by the ETL project team as a logical data flow and integration model based on their interactions with your business stakeholders. Thus, make sure these people talk to and hear one another out. Create an atmosphere of cooperation and don’t be frugal with time.
Don’t Skimp on the Data-Profiling Stage
The quality of data varies quite a bit across the different data sources in just about any business organization. Moreover, while these sources themselves can be of utmost importance to your ETL, the quality of the data they contain may prevent its further use in the process.
In this case, it is not only expedient, but also advisable to spend more time and effort during this stage. You may want to:
- Identify the level of data quality that befits your goals and define a corresponding quality standard.
- Use additional ETL methods, or rather, see to it that your ETL provider uses additional methods and/or tools (for example, algorithm customization and Curve Fitting Toolbox) to purge your data of noise.
- Look for additional data sources that will help improve the problematic data sources.
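As a rough illustration of the first point in the list above, the Python sketch below measures how complete each column of a source is and flags the columns that fall short of a chosen quality standard. The sample records, the column names and the 0.75 threshold are assumptions made for the example, and completeness is only one of many possible quality measures.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def profile_completeness(records: List[Record], columns: List[str]) -> Dict[str, float]:
    """Return the share of non-empty values per column (0.0 to 1.0)."""
    total = len(records)
    result: Dict[str, float] = {}
    for col in columns:
        filled = sum(1 for r in records if r.get(col) not in (None, ""))
        result[col] = filled / total if total else 0.0
    return result


def columns_below_standard(completeness: Dict[str, float], minimum: float) -> List[str]:
    """List the columns whose completeness is below the agreed minimum."""
    return sorted(c for c, share in completeness.items() if share < minimum)


# Hypothetical sample data, not taken from any real source.
sample: List[Record] = [
    {"patient_id": "p1", "birth_date": "1980-02-11", "lab_code": "GLU"},
    {"patient_id": "p2", "birth_date": "", "lab_code": "GLU"},
    {"patient_id": "p3", "birth_date": None, "lab_code": None},
    {"patient_id": "p4", "birth_date": "1975-07-30", "lab_code": "HBA1C"},
]

completeness = profile_completeness(sample, ["patient_id", "birth_date", "lab_code"])
failing = columns_below_standard(completeness, minimum=0.75)
```

Running a check like this for every source before the transformations are designed shows early which sources need cleaning or supplementing.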
Make Your Data Transformations Reusable Wherever Possible
In ETL, data transformations are steps in a process and they play the same role as bricks play in a masonry wall. Making the shape and size of these “bricks” uniform makes them easily reusable and, correspondingly, suitable for several mappings. This both facilitates “the bricklaying” and makes the ETL process a lot more cost-effective as compared with “the cobblestone masonry”.
However, your ETL team is unlikely to be able to do much in this respect unless you cooperate and lend them a hand. What can you actually do to help them ensure reusability and thus spare yourself some of the costs?
- First and foremost, you must provide your ETL team with a detailed and easily readable specification that provides an ample view of your organization’s data logic. The specification must include detailed field descriptions, indicating the source of the data, i.e. the source vocabularies and custom codes used. For the latter, a reference file, which can be mapped to a standard vocabulary, should also be provided.
- It is extremely important to ensure that all the data you furnish your ETL team with is unambiguous, i.e. each table and column must represent exactly one logical and semantic entity. You should also indicate the source of the data and how it was harvested.
- The format of the data you provide and that of the data used in the reference tables (including any updates and refreshes) must be the same.
- Although ETL converts variegated data to a standard format, this conversion can still be geared toward a specific business goal.
For example, you may prefer to prioritize time distribution rather than quality. This can be achieved if you make your preferences known to your ETL team in advance, clearly stating them as project requirements.
- Finally, Agile can certainly add value to your ETL project, but your Healthcare team must show agility too and be easy to reach any time your ETL team needs guidance regarding your data, data sources or resulting preferences.
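The reference file mentioned in the first point of the list above lends itself to a reusable transformation. The Python sketch below loads such a file and maps custom codes to standard ones, returning unmapped rows separately so they can be reviewed. Everything specific here is an assumption for illustration: the column names, the load_reference and map_codes helpers, and the STD-0001 style codes, which are placeholders rather than real LOINC or SNOMED codes.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical reference file content; the standard codes are placeholders.
REFERENCE_CSV = """custom_code,standard_code
GLU,STD-0001
HBA1C,STD-0002
"""

Row = Dict[str, str]


def load_reference(text: str) -> Dict[str, str]:
    """Parse the reference file into a custom-code to standard-code lookup."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["custom_code"].strip().upper(): row["standard_code"].strip()
        for row in reader
    }


def map_codes(rows: Iterable[Row], field: str,
              reference: Dict[str, str]) -> Tuple[List[Row], List[Row]]:
    """Add a standard_code to each row whose custom code is known.

    Rows with unknown codes are returned separately instead of being
    dropped, so that the reference file can be extended later.
    """
    mapped: List[Row] = []
    unmapped: List[Row] = []
    for row in rows:
        key = (row.get(field) or "").strip().upper()
        if key in reference:
            mapped.append({**row, "standard_code": reference[key]})
        else:
            unmapped.append(row)
    return mapped, unmapped


reference = load_reference(REFERENCE_CSV)
mapped, unmapped = map_codes(
    [{"lab_code": "glu "}, {"lab_code": "XYZ"}], "lab_code", reference
)
```

Since the field name and the lookup are parameters, the same function can serve several mappings, which is the kind of uniform "brick" described earlier.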
Choose an ETL Team that Has All the Qualifications the Healthcare ETL Process Takes and Even More
Even if you have some good IT professionals on board at your Healthcare institution, it may be hard for them to help you choose a qualified and skilled ETL provider.
Are there any traits (for example, a good command of a specific technology stack) that can ensure, or at least promote, the success of your ETL project and that you should therefore look for in provider candidates?
In our experience, there are several ETL must-haves and good-to-haves that, given the need and circumstances, could add more value to your project, safeguard you against some of the risks and signal an overall high level of your future team's qualifications.
Let's start with the generic ETL must-have skills, which we would expect to be much the same in any business domain, not only Healthcare:
- Experience with various ETL tool sets (SSIS, Amazon, Hadoop, Cloudera and so on) and with some of the currently popular big-data processing and query engines (Spark, Impala).
- A good grasp of modeling tools, such as Toad Data Modeler, Erwin and Embarcadero.
- A good grasp of relational databases (for example, SQL Server).
- Previous experience of delivering projects to various clients that could serve as proof of the team's ability to hear a client out and defer to their needs and wishes.
- Creativity and the ability to explain any project-related issues to the client in a simple and easily understandable manner.
In principle, the above skills and traits can ensure that the deliverables you receive are up to par and your cooperation with your ETL provider is otherwise fruitful and rewarding. However, as a Healthcare ETL-savvy team we would also add several more requirements.
In our reckoning, the ability of an ETL team to meet those can not only facilitate the Extract, Transform, Load process, but it can also improve the quality of the deliverables very significantly as far as Healthcare-related ETL is concerned.
What are these requirements?
Moreover, an ETL team in possession of medical knowledge can, immediately and without asking for external assistance, use this knowledge in identifying overlaps for data standardization purposes, thus achieving a much higher processing speed and precision.
- Secondly, ETL developers engaged in Healthcare projects must be experienced in using at least one of the standardized vocabularies, for example LOINC, CPT4, SNOMED or RxNorm. Familiarity with the hierarchical capabilities of such vocabularies allows them to achieve a much higher quality of data transformation.