Communications of the ACM

ACM News

AI In the ICU

By Sandrine Ceurstemont
Commissioned by CACM Staff
August 14, 2020

A doctor works on a patient in the intensive care unit. Artificial intelligence-based tools can pull in new information about a patient to assess their condition continuously.

Credit: Nvidia

Severely ill patients in hospital intensive care units (ICUs) are monitored constantly using equipment and through lab tests. An electrocardiogram (ECG) is used to track a patient's heart rate, for example, while blood samples are taken frequently and analyzed. However, the data collected must be integrated and interpreted to diagnose and treat patients, which can be a challenge for clinicians who largely review it manually.

"There's a lot to process and humans aren't naturally very good at processing a huge amount of heterogeneous data," says Alistair Johnson, a postdoctoral researcher focused on the application of machine learning to healthcare at the Massachusetts Institute of Technology (MIT).

Furthermore, a patient's condition is typically assessed just once a day. The Sequential Organ Failure Assessment, or SOFA score, is a commonly used tool that evaluates illness severity by summing up 13 variables that represent the functioning of different organ systems. It typically uses the worst measurements from the past 24 hours, which may not be representative of a patient's current state.
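To make that limitation concrete, here is a minimal Python sketch of a score that sums per-variable subscores using the worst reading in a window. The two variables, the thresholds, and the function names are illustrative assumptions rather than the clinical SOFA definitions; the point is only that a worst-value score can lag behind a patient's current state.

```python
# Toy illustration only: the thresholds and variable names are made up,
# not the clinical SOFA definitions.

def subscore(value, thresholds):
    """Return 0-4: the number of thresholds the value has fallen below."""
    return sum(1 for t in thresholds if value < t)

# Hypothetical cut-offs; lower values are worse for both variables.
THRESHOLDS = {
    "platelets": [150, 100, 50, 20],
    "blood_pressure": [70, 65, 60, 55],
}

def worst_24h_score(readings):
    """Sum subscores using the worst (lowest) reading of each variable."""
    return sum(subscore(min(vals), THRESHOLDS[name])
               for name, vals in readings.items())

def latest_score(readings):
    """Sum subscores using only the most recent reading of each variable."""
    return sum(subscore(vals[-1], THRESHOLDS[name])
               for name, vals in readings.items())

# Readings ordered oldest to newest; this patient dipped, then recovered.
readings = {
    "platelets": [160, 90, 120, 155],
    "blood_pressure": [72, 58, 66, 75],
}

print(worst_24h_score(readings))  # score driven by the earlier dip
print(latest_score(readings))     # score from the current readings
```

In this made-up case the worst-value score stays elevated even though the latest readings are back in the lowest-severity band, which is the gap a continuously updated tool aims to close.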

"In the ICU, a patient's condition can change very quickly," says Parisa Rashidi, assistant professor and director of the Intelligent Health Lab (i-Heal) at the University of Florida. "It is much better if you have a tool that gets updated frequently as you are measuring physiological signals in a patient."

Machine learning could help. Rashidi and her colleagues have been working to create a tool that can pull in new information about a patient continuously to more accurately assess their condition. In pursuit of that goal, they developed a novel scoring system, called DeepSOFA, that uses deep learning to assess illness severity in the ICU.

The team built the system around a recurrent neural network (RNN) because of the nature of the data. The patient information to be analyzed takes the form of time series: sequences of measurements taken at different times during a patient's stay in the ICU. An RNN was a natural choice, since it is designed to process sequential data of this kind. "We used a customized recurring model that we developed so we could add some interpretability to the model," says Rashidi.
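As a rough illustration of why a recurrent model suits this kind of data, the Python sketch below runs a basic recurrent cell over a short sequence of hourly readings, carrying a hidden state from one time step to the next. It is a generic textbook cell with hand-picked, untrained weights, not the customized model Rashidi describes; the signal values and all names are assumptions.

```python
import math

def rnn_forward(sequence, w_in, w_rec, bias):
    """Run a single-layer recurrent cell over a sequence of feature vectors.

    sequence: list of time steps, each a list of floats (one per signal).
    w_in:     hidden_size x n_features input weights.
    w_rec:    hidden_size x hidden_size recurrent weights.
    bias:     hidden_size biases.
    Returns the hidden state after every time step.
    """
    hidden = [0.0] * len(bias)
    states = []
    for x in sequence:
        new_hidden = []
        for j in range(len(bias)):
            total = bias[j]
            total += sum(w * xi for w, xi in zip(w_in[j], x))      # current reading
            total += sum(w * h for w, h in zip(w_rec[j], hidden))  # memory of the past
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
        states.append(hidden)
    return states

# Three hourly readings of two normalized signals (hypothetical values).
hourly = [[0.1, 0.9], [0.2, 0.7], [0.6, 0.3]]
w_in = [[0.5, -0.3], [0.1, 0.8]]   # untrained, hand-picked weights
w_rec = [[0.2, 0.0], [0.0, 0.2]]
bias = [0.0, 0.0]

states = rnn_forward(hourly, w_in, w_rec, bias)
print(len(states), len(states[-1]))
```

Because the hidden state is fed back in at every step, the final state depends on the whole history of readings rather than on the last measurement alone.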

The model was trained on two datasets to make sure it would generalize, since each hospital has slightly different care routines and patient cohorts, which can skew results.

One of the databases, UFHealth, contained data from 36,216 ICU admissions at the University of Florida Health Hospital, while the other, the publicly available MIMIC-III, tracked 48,948 admissions from Beth Israel Deaconess Medical Center in Boston. Some information from each dataset was used to train the model, which then was validated using parts of the data it had never seen before.

Rashidi and her colleagues found that their model performed significantly better than the traditional SOFA scoring system. In one example, they focused on a single patient from the UFHealth cohort who died after 112 hours in the ICU; while her vital signs had been stable, other aspects of her condition (such as her breathing) worsened over time. Five hours before she died, the SOFA model predicted a 51.5% chance she might die, while DeepSOFA estimated a 99.6% chance of death.

Deep learning models can be hard to interpret, though. Since the algorithms learn on their own, what they are learning is often a black box. In a clinical setting, that can be a drawback. "You have to explain to physicians why your model is making a certain prediction and you have to explain it at some point to family and caregivers when you're making a decision," which is why it is so important that a model is explainable to some degree, says Rashidi.

The team is working on interpretability models that could better explain the decisions made by the system. To date, they have been able to get some clues through a mechanism built into their deep learning model, which assigned a score to each hour preceding its prediction, based on how important it was to the decision; they were then able to look at how physiological signals might have changed in significant time windows. "We highlighted parts of each signal that seem to be important," says Rashidi. "That helps, because you can show it to the clinician and it can be used to improve trust in the model."
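One common way to assign a score to each time step is to normalize raw relevance values so they sum to one, then inspect the highest-weighted steps. The Python sketch below assumes a softmax normalization and made-up relevance values; the article does not specify how the team's mechanism computes its hourly scores, so this shows the general idea only.

```python
import math

def hourly_importance(relevance):
    """Softmax: turn raw per-hour relevance scores into weights that sum to 1."""
    peak = max(relevance)  # subtract the max for numerical stability
    exps = [math.exp(r - peak) for r in relevance]
    total = sum(exps)
    return [e / total for e in exps]

def top_hours(weights, k):
    """Indices of the k most heavily weighted hours, most important first."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]

# Hypothetical raw scores for the six hours preceding a prediction.
raw = [0.1, 0.3, 0.2, 1.5, 2.4, 0.9]
weights = hourly_importance(raw)
print(top_hours(weights, 2))
```

The hours returned by `top_hours` are the windows a clinician would be pointed to when reviewing how the physiological signals changed.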

More specific aspects of ICU care could benefit from machine learning too. Johnson and his colleagues have been focusing on predicting the need for mechanical ventilation, which involves inserting a tube into a patient's trachea and connecting it to a machine that assists with breathing. Although it is one of the biggest interventions that take place in an ICU, there is no scientific rationale to guide when it should be initiated. Such decisions usually are based on a doctor's experience and beliefs, which is why they can vary significantly from hospital to hospital, says Johnson.

During the Covid-19 pandemic, for instance, there has been debate over when to ventilate patients. While low blood oxygen levels serve as an indicator for many respiratory ailments, doctors have found that may not be the case for Covid-19 patients, who often don't exhibit extreme breathing problems when blood oxygen levels fall.

Machine learning models, however, could help clinicians make a decision. Models developed by Johnson and his colleagues aim to identify ICU patients who require ventilation as early as possible by evaluating physiological signals at different times during their stay. "Maybe it wasn't clear that the patient definitely needed a ventilator, and this is going to provide additional information," says Johnson. It also could help with resource planning and having a ventilator ready, he adds.

The team built different models using MIMIC-III, the same public database used by Rashidi and her colleagues. Some patients had been mechanically ventilated during their stay, while others had not required that level of intervention.

The researchers found the model that used gradient boosting algorithms performed best. The approach uses decision trees to analyze the physiological data, coming up with an outcome based on weighted scores for each clinical result. They also created a model using a deep learning neural network, but it didn't perform as well. "It's ambiguous as to what it's actually learning," says Johnson. "I don't think it's right to conclude that neural networks are useless in this situation, but we didn't quite have the right configuration."
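For readers unfamiliar with gradient boosting, the Python sketch below shows the core loop in miniature: each round fits a small decision tree (here a one-split "stump") to the errors left by the previous rounds, and the final prediction is a weighted sum of all the trees. The features, data, learning rate, and function names are all hypothetical; this is not the researchers' model, which would have been built with far more data and a full library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(rows, residuals):
    """Find the single feature/threshold split that best fits the residuals."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [e for r, e in zip(rows, residuals) if r[f] <= threshold]
            right = [e for r, e in zip(rows, residuals) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = (sum((e - lv) ** 2 for e in left)
                   + sum((e - rv) ** 2 for e in right))
            if best is None or err < best[0]:
                best = (err, f, threshold, lv, rv)
    return best[1:]

def stump_predict(stump, row):
    f, threshold, lv, rv = stump
    return lv if row[f] <= threshold else rv

def fit_boosting(rows, labels, rounds=20, rate=0.5):
    """Boost stumps on the gradient of the logistic loss (label minus probability)."""
    stumps = []
    scores = [0.0] * len(rows)  # running log-odds for each training row
    for _ in range(rounds):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        scores = [s + rate * stump_predict(stump, r)
                  for s, r in zip(scores, rows)]
    return {"stumps": stumps, "rate": rate}

def predict_proba(model, row):
    """Probability of the positive class: sigmoid of the weighted stump sum."""
    total = sum(model["rate"] * stump_predict(s, row) for s in model["stumps"])
    return sigmoid(total)

# Hypothetical rows of [respiratory_rate, age]; label 1 means "was ventilated".
rows = [[14, 40], [16, 55], [18, 30], [28, 70],
        [30, 65], [26, 80], [15, 75], [29, 35]]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

model = fit_boosting(rows, labels)
print(predict_proba(model, [27, 50]), predict_proba(model, [15, 50]))
```

In this toy data the label depends only on the first feature, so every stump ends up splitting on it; real clinical data would produce a mix of splits across many signals.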

The model also revealed which physiological signals were most important for determining whether a patient needed mechanical ventilation. Unsurprisingly, respiratory rate and a patient's age were very significant, but urine output turned out to be more relevant than blood oxygen levels, which was unexpected. "The caveat to that is that they may be artificially keeping the oxygen saturation high, and it's getting harder and harder to do that," says Johnson. "It's hard to reason about individual feature importance because there are so many interactions going on."
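The article does not say how the importance of each signal was measured. One widely used, model-agnostic approach is permutation importance: shuffle one feature's values across patients and see how much accuracy drops. The Python sketch below assumes that technique, a stand-in model, and made-up data, so it illustrates the idea rather than the team's analysis.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(1 for r, y in zip(rows, labels) if model(r) == y)
    return hits / len(rows)

def permutation_importance(model, rows, labels, repeats=20, seed=0):
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature f replaced by a shuffled value.
            shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances

# Hypothetical stand-in for a trained model: it looks only at feature 0.
def toy_model(row):
    return 1 if row[0] > 20 else 0

# Hypothetical rows of [respiratory_rate, age]; label 1 means "was ventilated".
rows = [[14, 40], [16, 55], [18, 30], [28, 70],
        [30, 65], [26, 80], [15, 75], [29, 35]]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

importance = permutation_importance(toy_model, rows, labels)
print(importance)
```

Shuffling one column at a time ignores how features interact, which echoes Johnson's caution that individual feature importance is hard to reason about.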

Timing was key as well. The gradient-boosting model could best predict whether a patient would require ventilation up to eight hours before they were intubated. The need for ventilation was harder to gauge earlier than that.

Although the model is promising, Johnson thinks it could be improved by evaluating it with actual patients whose outcomes are still unknown. So far, they have used retrospective data, which showed whether patients had been ventilated or not, allowing them to analyze different time frames prior to intubation. Whether the model would be valid with actual patients is unclear. "I think it's really important to know how this would perform in a setting where it would actually be used," says Johnson.

Rashidi also believes prospective validation is important, and her team is planning to test their model in a clinical setting. They also want to evaluate what clinicians think of it, and how helpful it would be to them.

"It's not just a problem that we, as machine learning practitioners, would like to explore," Rashidi says. "They also need to find it useful."

Sandrine Ceurstemont is a freelance science writer based in London, U.K.
