Democratize Your EHR Data with a Data Lake

Your EHR system contains a wealth of data about your patient community. This data, collected across multiple points of care during the healthcare delivery process, is structured for episodic care, for very good reasons.    

EHR data access is generally not very open, with limited access to the data or API layers. Additionally, many clinical documents are scanned and stored as PDF documents that are not machine readable.

However, imagine if you could release and access that data in a safe and secure way. You could aggregate the data to gain important, actionable insights about your patient community and sub-cohorts of patients by condition or demographic. The value compounds further when you combine structured data (EHRs), pharmacy data (API or batch), clinical notes and reports (PDFs), and personal health device data (streams) into a single repository. Many healthcare organizations are finding that a data lake is a key component in unlocking their data’s hidden value.

What is a data lake?

A data lake stores all your structured and unstructured data in a single, centralized repository. Establishing a data lake is a great strategy to combine different data sets from unique, disconnected systems so that you can explore your data without the constraints of complex or proprietary data models. Data lakes can scale to handle data of any size or type. Though not required, applying some level of enrichment, standardization, and transformation to the data, either at the time of data load or at the time of access, can go a long way.
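As a rough illustration of that last choice, the Python sketch below applies one standardization step either when records are loaded or when they are read. The field names (patient_id, dob, sex), the sample values, and the function names are hypothetical and are not taken from any particular EHR or AWS service.

```python
from datetime import datetime

# Hypothetical raw records as they might arrive from two source systems.
RAW_RECORDS = [
    {"patient_id": " 001 ", "dob": "03/15/1980", "sex": "f"},
    {"patient_id": "002", "dob": "1975-11-02", "sex": "Male"},
]

# Assumed input date formats and an assumed mapping to a common code set.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")
SEX_CODES = {"f": "female", "female": "female", "m": "male", "male": "male"}


def standardize(record):
    """Return a copy of the record with trimmed IDs, ISO dates, and coded sex."""
    dob = None
    for fmt in DATE_FORMATS:
        try:
            dob = datetime.strptime(record["dob"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    return {
        "patient_id": record["patient_id"].strip(),
        "dob": dob,  # stays None when no known format matches
        "sex": SEX_CODES.get(record["sex"].strip().lower(), "unknown"),
    }


# Option 1: transform at load time, so the lake stores standardized records.
lake_standardized = [standardize(r) for r in RAW_RECORDS]

# Option 2: store records raw and transform only when they are read.
lake_raw = list(RAW_RECORDS)


def read_standardized(lake):
    """Yield standardized views without modifying the stored raw records."""
    for record in lake:
        yield standardize(record)
```

Both options give consumers the same standardized view. Transforming at load time makes reads simpler, while transforming at access time keeps the raw data untouched so the rules can change later.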

So how can you use a data lake to start unlocking those potentially transformative insights, and how do you decide how “raw” your data should truly be?

How to start with a health data lake?

In our experience, it is best not to attempt to design and implement the ultimate data lake solution in one go. Rather, start small with a flexible solution that can immediately deliver meaningful, useful data to your stakeholders. Having the agility to quickly respond to new data sources and new analytics requests can prove your solution’s value and create momentum for future needs.

Right now, our IMT teams are engaged in multiple initiatives with healthcare delivery and technology providers. While the specifics vary, we’re consistently delivering secure, auditable environments with consolidated and cleansed health data across multiple sources.

Clinicians, researchers, and data scientists are using the resulting insights to:

  • Assess the impact of alternative lower-cost treatments for patients with specific clinical conditions by augmenting data extracted from the EHR with treatment details previously locked in clinical documents.
  • Analyze cohort health trends, outcomes, and costs to identify the most appropriate intervention and choose better care management options.
  • Gain a more complete picture about a patient’s medical procedures, medications, and dosages by including details previously locked in clinical notes.
  • Use complete data with their preferred BI tools, while data governance rules ensure they can only access information appropriate to their role.
  • Leverage data analytics functions with actionable intelligence by patient cohort and other groupings.
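
The role-based access mentioned above can be pictured with a minimal Python sketch. The roles, column names, and sample row are made up for illustration. A real deployment would enforce such rules in its governance layer rather than in application code.

```python
# Hypothetical role-to-column policy (an assumption for illustration only).
ROLE_COLUMNS = {
    "clinician": {"patient_id", "name", "diagnosis", "medication"},
    "researcher": {"diagnosis", "medication", "age_band"},
}


def view_for_role(rows, role):
    """Return the rows restricted to the columns the role may see."""
    allowed = ROLE_COLUMNS.get(role, set())  # unknown roles see no columns
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


# Made-up sample row; not real patient data.
rows = [
    {
        "patient_id": "001",
        "name": "Jane Doe",
        "diagnosis": "E11.9",
        "medication": "metformin",
        "age_band": "40-49",
    },
]

# Researchers get the clinical fields but no direct identifiers.
researcher_view = view_for_role(rows, "researcher")
```

Defaulting unknown roles to an empty column set means a missing policy entry results in no access rather than full access.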

6 key elements for a successful data lake

Based on our recent projects, we’ve developed a recipe for success. Healthcare organizations that seek to use a data lake to unlock all the insights in their data need six key things:    

  1. A flexible data repository – An AWS Data Lake gives your team the best combination of structure and flexibility. It allows you to collect and store structured data from your applications alongside unstructured data from streams, documents, and data feeds. By combining a variety of data types relating to the same patient in a single place, you increase the breadth and quality of your insights and analysis.
  2. The right tools for the right analytics – Reporting tools are not one size fits all. The good news is that you can use different analytic toolkits against the same data lake without adjusting the data model. For example, clinicians often want aggregate dashboard views with summarized statistics, while researchers and data scientists are inclined to dive into the data. AWS has solutions for both needs, using services like Amazon QuickSight, Amazon SageMaker Studio, and machine learning models designed specifically for healthcare data challenges – no need to choose just one!
  3. A reliable and transparent data cleansing approach – EHR data can be cleansed, transformed, and classified quickly, easily, and iteratively using AWS Glue and AWS Glue DataBrew. These steps ensure that data adheres to common formats and code sets, which is particularly important when combining data sources. A data catalog then allows multiple stakeholders to understand data provenance and context in a common way.
  4. Trainable text extraction or natural language processing – A lot of knowledge is locked in unstructured or semi-structured medical text, such as clinical notes, discharge summaries, or referral letters. Amazon Comprehend Medical allows you to extract valuable information from these texts, enabling further insight and analytics when combined with the EHR’s structured data.
  5. Cloud deployment for rapid results, impact, and adoption – Cloud applications and data storage can be ramped up in a matter of days, not months. Without the need to request, purchase, and install hardware and networking, you can focus on data integration, cleansing, refining extraction models, and analytics out of the gate. Start small and expand; working with cloud ensures you only pay for what you use. Quickly and easily build experimental solutions with proofs of concept in the cloud. The economic benefits of cloud-based platforms for custom analytics are unparalleled.
  6. A secure “data island” with Amazon WorkSpaces – Implementing a data island helps keep your data secure. The environment hosting the data lake imposes strict limits on access to sensitive data, including measures that minimize the possibility of data exfiltration. This is accomplished by limiting data viewing and manipulation to tools that reside in the same environment as the sensitive data: your Amazon WorkSpaces desktops. WorkSpaces can be configured to block egress to public networks and non-essential AWS services and can be further restricted using access controls, certificates, and MFA.
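
To make elements 3 and 4 more concrete, here is a minimal Python sketch of combining structured EHR medication rows with details extracted from clinical notes. All data is made up, the function name is hypothetical, and the note entities are hand-written stand-ins. They do not reproduce the actual output format of Amazon Comprehend Medical or any other service.

```python
from collections import defaultdict

# Structured medication rows as they might be exported from an EHR (made up).
ehr_medications = [
    {"patient_id": "001", "medication": "metformin", "dose": "500 mg"},
    {"patient_id": "002", "medication": "lisinopril", "dose": "10 mg"},
]

# Hand-written stand-ins for entities a text-extraction step might produce.
note_entities = [
    {"patient_id": "001", "medication": "Metformin", "dose": "500 mg"},
    {"patient_id": "001", "medication": "atorvastatin", "dose": "20 mg"},
]


def merge_medications(structured, extracted):
    """Group medications per patient and record which sources mention each."""
    by_patient = defaultdict(dict)
    for source, rows in (("ehr", structured), ("notes", extracted)):
        for row in rows:
            # Normalize the name so "Metformin" and "metformin" match.
            name = row["medication"].strip().lower()
            entry = by_patient[row["patient_id"]].setdefault(
                name, {"dose": row["dose"], "sources": []}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return dict(by_patient)


combined = merge_medications(ehr_medications, note_entities)
```

In this toy data, the atorvastatin entry for patient 001 appears only in the notes, which is the kind of detail that structured EHR data alone would miss. Keeping the list of sources per entry preserves provenance for later review.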

AWS Data Lakes are being adopted in healthcare for other innovative use cases, such as building population health dashboards and developing predictive disease models. IMT works closely with AWS architects and specialists to help our clients solve pressing challenges and identify opportunities within their data.

IMT is an AWS Solution Partner in Public Sector and Healthcare and can help you build your own data lake environment or can fully host and manage one for you. Our Cloud and Managed Services organization includes a Cloud Center of Excellence (CCOE). The CCOE comprises AWS-certified architects, engineers, administrators, and security specialists with deep experience in health data integration, management, and analytics. IMT’s unique combination of skills and expertise makes us the best partner to help you navigate your data lake journey.

Ready to dip a toe into a data lake? Get in touch to learn how a data lake can help you democratize the data in your EHR.
