DataOps

Steven Andrews
10 min read · Nov 8, 2020

I tried. I tried really hard.

And yet, I failed.

Wait, don’t leave yet; this mea culpa is not a pathetic cry for compassion. My “failure” does, however, speak volumes about health care in this country. Anyone who comes into any contact with our health care system should be deeply worried by that admission — but at the same time, it should also make them optimistic, since this failure highlights a moment of great opportunity.

Health care changes at a painfully slow rate. The consequences of misdirected change can be devastating to individuals and the general public, so we want to be cautious. But we want to avoid paralysis. Health care will never be on the bleeding edge of technology, but the industry still has much to learn from leaders in the new economy. When I was offered the chance to create a DevOps team at an academic medical center just over a year ago, I viewed the opportunity as a chance to introduce IT best practices from so-called “new modern” corporations to advance progress in health care. To focus on a patient-value orientation, to liberate data from silos in pursuit of maximizing personal health and precision medicine, to modernize antiquated IT systems to enable real-time insights… Given an opportunity to advance patient-centric, data-driven, rapid approaches to providing care, researching new knowledge, and mentoring the next generation of clinicians, I quickly jumped on the DevOps bandwagon and evangelized lean and agile IT best practices, worshiped at the altar of automation, and earnestly tried to advance health care IT and de-ossify its supporting organization structures.

Some of you may have noticed the odd quotes around “new modern” or perhaps bristled at the seeming redundancy of the adjectival juxtaposition. This term is something I have noticed of late popping up with increasing frequency in musings on finance and economics. All too briefly, this term captures the idea that the only corporations that can survive in the new economy and tech-laden world in which we now find ourselves share one driving insight: data are everything and data are the only thing. Infrastructure, products, finance, procurement, human resources — all of these corporate accoutrements are secondary to data. True, those functions do matter. But they will not, and cannot, drive organizations to success in the current economy. Success emerges from fully understanding what your data are and the information contained therein.

To bring some concreteness to this oversimplification, think about hi-tech. Companies like Google, Twitter, or Amazon do not lack for IT infrastructure, but they succeed on their DataOps and not their DevOps. While these economic leaders have led the rapid changes in and insights coming from DevOps, their IT infrastructure and processes exist to support data acquisition, processing, and analysis. These companies understand their markets and their customers because they understand their data; the IT advances they have made are first and foremost in support of this understanding. They fully exploit data to drive success. The moats protecting these companies do not come from their unique products or their product pipelines. They come from how well they are able to torture their data in the pursuit of information. To put it bluntly, anyone with enough capital could set up another online market or communications platform to compete against the giants. I would suspect the companies might even relish such competition, if only to defuse the political posturing about possible monopolistic tendencies.

But these new competitors are virtually certain to fail, since they cannot recreate the data and insights these companies have amassed. The vast breadth and depth of these data allow these companies to exploit the tools of data science to harvest insights and patterns from huge swaths of human behavior, some only seemingly related to the business at hand. Without these vast collections of data, without the insights that can only be gleaned from these big data, trying to compete against “new modern” corporations is an exercise in tilting at windmills. Many people love how personalized and insightful their interactions with huge corporations have become, and would be loath to walk away from that personalization.

This is where my sense of failure begins to rear its head. I am passionate about improving health care; all politics aside, there is not a single health care system on the planet that does not need a vast overhaul toward even better care delivery. My year in DevOps so far has been wonderful, and full of shiny toys with which to play. The team has made great strides and is starting to deliver powerful contributions to improving the workings of our academic medical center. But I realize now that the destination toward which I have been pushing the team, the vision of a lean, agile, reflective DevOps, just misses the mark of where we should be heading.

I have spent this year evading one fundamental question: what is DevOps in this setting? At one level, DevOps is about removing barriers, allowing people to excel, and putting function over structure. But at a lower level, academic medical centers do not, and should not, develop large software applications. The goal of health care, of working in an academic medical center, is to maximize the value and benefit of health care delivery to each and every patient and to increase the efficiency of those working in this field. The DevOps lexicon, which I still love, of continuous integration, continuous development, continuous reflection, lean workflows, agile processes, cross-functional autonomous teams, all of this is secondary and merely in support of where IT can make its biggest contribution to health care. As I noted above, health care evolves very slowly, but the new economy does not tolerate fossils at all well. We need to catch up with the world.

Health IT, or HIT, must lead health care into the new economy. We must focus on what matters; in other words, HIT must be data-centric in everything we do. We must be continuously striving to support and advance DataOps. Health data, whether they come from the clinic, lab, or classroom, are the sole reason HIT exists and are the number one capability we bring to improving health care. DataOps covers the entire mesh of the collection, management, securing, partitioning, recombining, classification, and analysis of data in order to provide understanding and insights to the practice of health care. DataOps extracts information from the big data to be found in our industry.

DataOps pipeline diagram

The success of this DataOps mesh requires the eradication of data silos, or what I have taken to calling “armories of mediocrity”. Too many organizations in health care (and elsewhere) view the mere ownership of data as an asset. Pulling information out of narrow, isolated slices of data disconnected from all the other information out there cannot provide us with insights into something as complex as human health across all levels from the molecular through the individual to the population. The industry ends up chopping up data into small collections of observations over limited functions: primary care clinics, surgical centers, public health agencies, billing…
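As a toy illustration of why isolated slices fall short, here is a minimal Python sketch of pooling records from three silos into one view per patient. Everything in it is hypothetical: the silo names, the fields, the patient identifiers, and the `pool_records` helper are invented for illustration, and a real system would also have to handle identity matching, consent, and security, none of which is attempted here.

```python
from collections import defaultdict

# Hypothetical silo extracts: each one holds only its own slice of a patient.
primary_care = [
    {"patient_id": "p1", "systolic_bp": 150},
    {"patient_id": "p2", "systolic_bp": 118},
]
surgical_center = [
    {"patient_id": "p1", "procedure": "knee replacement"},
]
billing = [
    {"patient_id": "p1", "balance_due": 1200.0},
    {"patient_id": "p2", "balance_due": 0.0},
]


def pool_records(**silos):
    """Merge per-silo rows into one view per patient.

    Each field is prefixed with the name of the silo it came from,
    so the pooled view still records where every value originated.
    """
    pooled = defaultdict(dict)
    for silo_name, rows in silos.items():
        for row in rows:
            for field, value in row.items():
                if field != "patient_id":
                    pooled[row["patient_id"]][f"{silo_name}.{field}"] = value
    return dict(pooled)


pooled = pool_records(
    primary_care=primary_care,
    surgical_center=surgical_center,
    billing=billing,
)
for patient_id, view in sorted(pooled.items()):
    print(patient_id, view)
```

No single silo above can relate a blood pressure reading to a surgical procedure; only the pooled view can, which is the point of the paragraph above in miniature.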

Several arguments are given to stop the pooling of data, such as security or profitability. This resistance to data sharing, especially to allowing the people who are described by the data to manage their own information, has led to Federal regulations aimed at preventing silos and data blockage, such as the 21st Century Cures Act. Such resistance leads us to the world we now experience, where small, specialized data sets are spread across countless organizations, requiring redundant staff to secure all of these data in all their locations, and lowering the overall security of the data by introducing multiple points of weakness. Do we really want to live in a world where the people with the most comprehensive insight into our health data end up being the hackers who have cobbled together a collection of multiple sets of data stolen from a variety of health organizations? In other words, those who currently have the best ability to draw insights about health care are the very people who cannot let us know that they could inform us.

The head of NCATS, part of the NIH, was infamous for opening talks with the observation that clinicians are nothing more than glorified auto mechanics. While doctors do not like hearing this, the point he was trying to make is that while there may have been a time when a country doctor could visit someone, bringing just a small bag of tools and a lore of knowledge gained through years of practice, we are long past that time. Just like it is now nigh on impossible to maintain your own car with just a small toolkit and a general understanding of mechanics, we should be terrified of any person who claims to be able to rely solely on what they have learned to do on the job to solve health issues. The strong movement toward evidence-based medicine has been fueled by the leaps and bounds we have made in the HIT ability to collect an increasing range of health data observations and then to process those data. We no longer live in an age where doctors gathered a few data points by counting a pulse rate, auscultating lung function, and collecting personal and family health history. Under such conditions, of course we needed people who could apply their personal brainpower to make medical decisions based on the leaps of intuition that they could glean from so few data points. But we no longer live in a world of data paucity; instead, our problem has become how to make sense of all the data coming from so many places or even to identify what data may be relevant. Who would have anticipated forty years ago, for example, that we could end up using the echoes of Wi-Fi signals to track potential health dangers among the elderly? Evidence-based medicine is all about taking the idiosyncratic elements out of medical practice, relying instead on analytic and pattern-recognition insights. And those insights can only be as good as the data from which they are drawn. We need lots and lots of clean data to drive these insights, as well as the tools to work with such big data.

One of my favorite definitions from back when the term Big Data began to be bandied about can be simply paraphrased as: “if you can get your head around the data, you do not have Big Data”. The term involves much more than a mere pointer to large sets of data. Big Data are difficult to structure, often of varied quality, and can even combine different units of observation. You may have billions or trillions of rows, but if they are well structured then you do not have Big Data. To be a little more rigorous, Big Data contain much higher levels of complexity that need to be managed before we can understand the information space. And we need to know about the information content of the data in order to extract knowledge.

To put this too bluntly, we no longer need the compassionate country doctor stopping by in the middle of the night knowing how to make you better because they have a rich medical history on a few people (i.e., the doctor who knows what you have because they can remember when your grandparent had a similar thing). Instead, we need the tools and people who can help us torture immense quantities of data until they confess and can then synthesize these confessions into a usable body of knowledge. Even better, this body of knowledge can then become embodied in automation, decision-tools, and rapid data exploration tools to assist patients and clinical staff in real time. Similarly, we no longer need corporations miserly lording over their small collection of specialized data stored in their armored vault, occasionally pulling out a coin to admire its sparkle and shine. We need to pool data without a priori and arbitrary structuration. We no longer need IT staff denying access to data by those who need it, all in the name of security. We need to figure out optimal ways of wrapping data in heavy-duty blankets of security while all the while making access by clinical staff and researchers brain-dead simple.

So what is DataOps? Or more accurately, why should DataOps be the ultimate focus of all of HIT? Or to take it one more step, since there is nothing that happens in corporations that is not centered around IT, why should corporations focus primarily and almost solely on DataOps? DataOps starts from the insight of the “new modern corporation” that data drive everything else. DataOps embraces Big Data and constantly strives to create better and additional ways to understand and best use the information embedded in these data. DataOps seeks to find ways of organizing data without creating artifices of structure that preclude future ways of approaching data. DataOps realizes that security and availability are no longer diametrically opposed; indeed, we understand there are ways to store large amounts of data much more securely that have the nice side effect of making legitimate access quite simple. DataOps understands that statistics and data science can be effective only when the process of managing data or the underlying IT gets out of the way of analysis. Put simply, we live in an age of massive data and we need to work with a strong focus to use these data to obtain the best insights that we can in order to improve health, health care, and health care delivery.

Finally, I want to emphasize the big insight from DevOps that has driven me over the last year. Organizational structure must follow function. In a world where medical care is all about the data, the structure of the organization needs to be shaped around that insight to maximally optimize what data can do for patients, clinical staff, and researchers. We cannot be structured to focus on billing, for example, and expect to be able to deliver rational health care. And in an age where health data are expanding and reaching across to other domains, organizational structure cannot remain static. We must be as fluid and responsive as the data with which we work. I do fully realize that “maximally optimize” is a verbal abomination, but its intentional use brings us back to the “new modern” corporation with which I opened, as well as back to the inevitability that the times are changing. The only health care organizations that will still be functioning and succeeding in a few years will look nothing like those which litter the landscape today.

The survivors, by necessity, will be new modern DataOps organizations delivering maximally optimized health and health care by continually evolving their ability to collect and exploit data (my apologies, but someone had to put that sentence together). I will probably “fail” again in my attempt to create an economy of such organizations and to improve health care using HIT and data, but I relish the ongoing attempt!
