Data governance is fundamentally about the management of tension. A system without any "data tension" is chaotic and can produce no shortage of contractual or regulatory nightmares. Too much tension, however, is a problem of another sort: data assets need to be utilized to produce value, and overly restrictive governance can grind not just progress but baseline support to a halt. Striking a balance between those extremes is what data governance frameworks and processes attempt to do.
It is helpful to look at data governance from a higher-level perspective to avoid getting lost in the implementation weeds. There are three steps that need to be understood:
1. Get The Requirements

After anchoring on a set of functional use cases, get requirements from critical teams such as legal, privacy, compliance, and security (not an exhaustive list). Topics could include contractual or regulatory restrictions on certain sources of data, specific usage of entities within those sources, or different user profiles and access paths. The list of possibilities is long, and technical/delivery teams need to be a part of this conversation to hear and question requirements first-hand. Identifying and centering on vetted functional cases is important because governance doesn't exist in a vacuum, and the collective intent should be identifying what is needed to deliver those cases within appropriate guardrails.
2. Encode The Rules

This will involve the same set of people as the prior step, but now the technical team needs to lead a design process on the guardrails, informed by the previously identified requirements. Nouns such as "contracts" and "customer permissions" may sound simple at first, but their authoritative sources can often turn out to be a trail of OneDrive directories, Word documents, and emails, which are not exactly the kind of things that lead to easy automation. Spend the time researching and documenting the chain of custody on everything informing governance requirements before picking any "governance technologies."
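To make "encoding" concrete, here is a minimal sketch in Python of what a machine-readable governance rule might look like once the chain of custody has been traced. All of the names, fields, and values are hypothetical illustrations, not a prescription for any particular technology:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class UsagePermission:
    """One encoded governance rule: who may use which data, and why.

    The provenance fields matter as much as the rule itself; they
    record the chain of custody back to the authoritative source
    (a contract clause, a regulation, a customer instruction).
    """
    data_source: str       # e.g., "claims_feed_v2"
    entity: str            # e.g., "member_demographics"
    allowed_use: str       # e.g., "analytics"
    customer_id: str
    source_document: str   # pointer to the contract/clause of record
    last_verified: date

# A registry like this can be queried and audited by software,
# unlike a trail of Word documents and emails.
PERMISSIONS = [
    UsagePermission(
        data_source="claims_feed_v2",
        entity="member_demographics",
        allowed_use="analytics",
        customer_id="acme-health",
        source_document="contracts/acme/2023-msa.pdf#section-7.2",
        last_verified=date(2024, 1, 15),
    ),
]

def is_permitted(customer_id: str, data_source: str, use: str) -> bool:
    """Check whether a proposed use is covered by an encoded rule."""
    return any(
        p.customer_id == customer_id
        and p.data_source == data_source
        and p.allowed_use == use
        for p in PERMISSIONS
    )
```

The point of the provenance fields is that every encoded rule should be traceable back to its authoritative source, so that the rule can be defended as well as enforced.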
The word "enable" was chosen deliberately to imply the broadest sense of the possible to provide stakeholder access to data with required guardrails. This could require iteration with the previous "Encode The Rules" step.
In The Boat

An expression I've used on data governance and related topics over the years is identifying who is "in the boat." The developers, product managers, project managers, and leadership charged with delivering specified functionality are clearly in the delivery boat. If the effort succeeds, all should be recognized. If the effort fails, then all will go under, metaphorically speaking.
On the other hand, having a team or two viewed as generally lobbing cannonballs from the shore with late-breaking requirements, while avoiding venturing toward the water out of fear of getting splashed, is not conducive to collective success, even if some of the comments may be on point. Those teams need to be in the boat as well, and to know that they are in the boat. This will likely require executive alignment, but it will be worth it for all parties involved.
The Shape Of Data Enablement

The word "shape" in this context refers to how data is to be enabled for the stakeholders. It is currently trendy to say such things as "our data will be delivered via a governed microservices architecture," but that is a very specific implementation. There is a danger in selecting a mode of delivery too early, in the way that selecting a sports car as a mode of transportation would be premature before understanding not only what needs to be transported, but how much of it and how often; a more appropriate option might be a van, a truck, or a train, so to speak. Understand the targeted use cases and pay attention to who will be utilizing the data, and how they plan on using it, before selecting enablement architectures. If the stakeholders cannot successfully utilize the enabled data, the effort is for naught.
Fine-Grained Controls

All data governance approaches require some degree of configurability, but how much is enough? Answering that question carelessly is how "fine-grained controls" become another potential governance pitfall. An example from healthcare is illustrative because everyone is a customer of the healthcare sector at some point in their lives and should have personal context, there is a lot of data to govern, and there are regulatory requirements (such as HIPAA). Imagine an electronic medical record system that has fine-grained controls not just for all providers and the datatypes they can access (e.g., demographics, diagnoses, medications), but also for specific values within each datatype. From a governance standpoint this sounds flexible and fantastic, at least on paper.
But with all due respect to patient privacy, which I value immensely, there are a variety of potential unintended consequences to fine-grained controls. Is it in the patient's best interest for a provider to make decisions based on a subset of their record, such as viewing diagnoses but not medications? Medications but not procedures? Lab tests but not medications? Or, even more confusingly, perhaps viewing only half of the medications or lab tests without knowing what is missing? Blind spots such as these are the stuff that preventable medical emergencies are made of.
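To make the blind-spot hazard concrete, here is a deliberately simple sketch with hypothetical data showing how a value-level filter silently shrinks a record; nothing in the filtered view tells the provider that anything was withheld:

```python
# Hypothetical medication list and a value-level access policy.
medications = ["warfarin", "metformin", "lisinopril", "aspirin"]
viewable = {"metformin", "aspirin"}  # what this user's policy allows

# The filtered view looks complete to the viewer; nothing marks the gaps.
visible_medications = [m for m in medications if m in viewable]
print(visible_medications)  # ['metformin', 'aspirin']; warfarin is invisible
```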
Healthcare is complicated, and there are some exceptional cases that may require filtering, but the primary point of this caution is to thoroughly explore the real-life impact of any proposed fine-grained controls. Implement such controls because they are genuinely necessary and needed by the stakeholders, not because they "sound more secure" or seem like an interesting technical challenge to throw in. They not only increase solution complexity but also add operational burden for user configuration, and they could wind up burying the very information that was intended to be enabled in the first place.
Also, don't forget about exception-access use cases that may need to bypass any proposed fine-grained access mechanisms (such as HIPAA's "Break The Glass" case).
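As a sketch of how these pieces might fit together, consider the hypothetical access check below. The roles, datatypes, and function names are illustrative assumptions rather than a real EMR API; the point is that the break-the-glass path bypasses the fine-grained filter but never bypasses the audit log:

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("access-audit")

# Hypothetical fine-grained policy: which datatypes each role may view.
ROLE_DATATYPES = {
    "physician": {"demographics", "diagnoses", "medications", "labs"},
    "billing_clerk": {"demographics"},
}

def can_view(role: str, datatype: str, break_the_glass: bool = False) -> bool:
    """Return True if the role may view the datatype.

    A break-the-glass request (e.g., an emergency) bypasses the
    fine-grained filter, but is always written to the audit log
    for after-the-fact review.
    """
    allowed = datatype in ROLE_DATATYPES.get(role, set())
    if break_the_glass and not allowed:
        audit_log.warning(
            "BREAK-THE-GLASS: role=%s datatype=%s at=%s",
            role,
            datatype,
            datetime.now(timezone.utc).isoformat(),
        )
        return True
    audit_log.info(
        "access: role=%s datatype=%s allowed=%s", role, datatype, allowed
    )
    return allowed
```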
Accountability

A basic checklist for accountability would include validation that there are logs of data access and utilization for the solution, that the logs contain enough information to be useful, that the logs are retained for required timeframes in a location that is accessible (as opposed to being archived on tape and then stored in a salt mine), and, last but not least, that the logs are actually being regularly reviewed and analyzed. Organizations tend to spend a lot of time on approval processes for granting access, and not nearly enough on accountability for understanding who is actually doing what with that access. Improving organizational capabilities in this area will, all things considered, typically pay better governance dividends than adding a lot of up-front and potentially speculative fine-grained controls. At the very least, comprehensive logging can help inform, with data, which new fine-grained controls might be needed in the future.
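As an illustration of "enough information to be useful," a structured access-log entry, plus the start of an actual review process, might look something like the sketch below. The field names are assumptions for illustration:

```python
import json
from collections import Counter

# A hypothetical structured access-log entry; the field names are
# illustrative. One JSON object per line is easy to retain and query.
entry = {
    "timestamp": "2024-03-01T14:22:05Z",
    "user": "jsmith",
    "role": "physician",
    "action": "read",
    "data_source": "emr",
    "datatype": "medications",
    "record_id": "patient-0042",
    "permitted": True,
}
print(json.dumps(entry))

# "Actually reviewing the logs" can start as simply as counting who
# is touching what, and flagging anything surprising.
def summarize(log_lines):
    counts = Counter()
    for line in log_lines:
        e = json.loads(line)
        counts[(e["user"], e["datatype"])] += 1
    return counts

print(summarize([json.dumps(entry)]))  # Counter({('jsmith', 'medications'): 1})
```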
Cycling back to the "Encode The Rules" step, configurable governance metadata will regularly need to be re-configured. Customers might become non-customers or change their contractual permissions, for example. These things happen, and there are plenty of other possibilities. The trick is to operationalize this update process with the fewest number of handoffs and the greatest amount of automation possible. If a process for configuration updates consists of Person1 getting a phone call from a customer with a status change, who then emails Person2, who then sends a Slack message to Person3, who then creates a Jira ticket that is hopefully picked up by Person4, who happens to be a project manager and is, in turn, dependent on Person5 to do the actual work, except that Person5 is already allocated to other efforts for another month or two, it should be clear that there are some opportunities for operational improvement and automation. Irrespective of which data-enablement technologies have been utilized downstream, this update process could be a governance Achilles' heel.
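A minimal sketch of collapsing that Person1-through-Person5 chain into a single automated path might look like the following. The event source, statuses, and data structures are hypothetical; the point is the fewest handoffs, the most automation, and an audit trail of every change:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class StatusChange:
    customer_id: str
    new_status: str   # e.g., "active", "terminated"
    source: str       # who or what reported the change

HISTORY = []  # append-only audit trail of every applied change

def apply_status_change(change: StatusChange, permissions: dict) -> None:
    """Apply a customer status change to governance metadata directly,
    instead of routing it through email, Slack, and a Jira backlog."""
    HISTORY.append((datetime.now(timezone.utc).isoformat(), change))
    if change.new_status == "terminated":
        # Revoke downstream usage permissions for this customer.
        permissions[change.customer_id] = set()

permissions = {"acme-health": {"analytics", "operations"}}
apply_status_change(
    StatusChange("acme-health", "terminated", source="crm-webhook"),
    permissions,
)
print(permissions)  # {'acme-health': set()}
```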
In Conclusion
Good luck and good governance. Focus on the basics; they will take you far.
BLOG@CACM Related Posts
HIPAA (US Healthcare Regulations)
Google SRE Concepts
Doug Meil is a software architect in healthcare data management and analytics. He also founded the Cleveland Big Data Meetup in 2010. More of his BLOG@CACM posts can be found at https://www.linkedin.com/pulse/publications-doug-meil