True Healthcare Data Integration: Leaving your Data Lake Behind

By Mike Noshay
Published On: March 12th, 2020

According to a recent report from Gartner, “U.S. Healthcare Payer CIOs Should Avoid Data Lake Mistakes with Clinical Data Integration,” by Mandi Bishop (October 2019), “Payer CIOs want to know whether data lakes can deliver quick wins for business leaders hungry to derive actionable insights from unstructured and nonstandard clinical data. To avoid drowning in data, CIOs must first specify goals and critically compare internal capabilities to vendor solutions.”

We agree with this assessment and believe the same holds true for healthcare organizations. Below, we outline why the data lake method can no longer hold up in today’s healthcare environment and offer, as an alternative, a four-step, interconnected framework that can help you make the most of your data.

1. Start with the End in Mind
The first challenge many organizations face when looking at a data lake is the realization that they cannot be confident in the quality and completeness of all possible data for all possible use cases or scenarios. It’s important to decide at the outset which use cases to focus on, based on what matters most to your stakeholders (whether that means the C-suite, care providers, or patients). Determine what data you are likely to need and for what purpose:

As a payer, are you focused on use cases that support population health or HEDIS®¹ measures?
As an ACO, are you focused on information that will help you in risk mitigation and Medicare Shared Savings Programs?
As a hospital, are you focused on value-based care programs such as improving STAR ratings and on reducing preventable hospital readmissions?
As a healthcare system, are you focused on MIPS/MACRA and use cases to improve quality of care?

The best way to set up your data thoughtfully and strategically is to think about your end goals when you first receive the data and work backwards from there.

Data that has been “loaded” or “integrated” into a data lake provides the illusion of an asset that you can use quickly with a high degree of confidence. Many organizations start with a data lake and assume that “someone else” – a data scientist, perhaps – will be the one sifting through the information later to find what they need for any given use case. This type of postprocessing, or late-binding data science, becomes a never-ending cycle of data quality remediation that is both costly and potentially insurmountable given organizational resource constraints.
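As a rough illustration of working backwards from a use case, the sketch below checks each incoming record against the fields a chosen use case needs at the moment the data arrives, rather than leaving that discovery to later analysis. The use-case names, field names, and the helper functions `missing_fields` and `triage` are all hypothetical, chosen only for this example; a real program would derive its requirements from its own measures and data sources.

```python
# Hypothetical mapping from a use case to the fields it needs.
# The use-case names and field names are illustrative assumptions,
# not taken from any standard or specification.
REQUIRED_FIELDS = {
    "hedis_measures": ["member_id", "date_of_service", "diagnosis_code"],
    "readmission_reduction": ["patient_id", "admit_date", "discharge_date"],
}


def missing_fields(record: dict, use_case: str) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    required = REQUIRED_FIELDS[use_case]
    return [name for name in required if record.get(name) in (None, "")]


def triage(records: list[dict], use_case: str):
    """Split incoming records into usable ones and ones needing follow-up."""
    usable, flagged = [], []
    for record in records:
        gaps = missing_fields(record, use_case)
        if gaps:
            # Keep the record together with the reason it was flagged.
            flagged.append((record, gaps))
        else:
            usable.append(record)
    return usable, flagged


if __name__ == "__main__":
    incoming = [
        {"patient_id": "p1", "admit_date": "2020-01-02", "discharge_date": "2020-01-05"},
        {"patient_id": "p2", "admit_date": "2020-01-03", "discharge_date": ""},
    ]
    usable, flagged = triage(incoming, "readmission_reduction")
    print(f"{len(usable)} usable, {len(flagged)} flagged")
    for record, gaps in flagged:
        print(record["patient_id"], "is missing", gaps)
```

Checking completeness at receipt means gaps surface while the source can still be asked to fix them, instead of being found later by whoever happens to need the data.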

The full article is available on Healthcare IT Today.

¹HEDIS® is a registered trademark of the National Committee for Quality Assurance (NCQA).

