In findings published in Nature Methods, Israeli and American researchers report a revolutionary, big data-driven machine learning algorithm designed to predict human gene expression from the preliminary results of mouse studies. The tool, which is expected to speed up the development of new medical therapies and may reduce future experiments on mice, was developed by the research team of Assistant Professor Shai Shen-Orr of the Rappaport Faculty of Medicine at the Technion-Israel Institute of Technology, together with Professor Rob Tibshirani and Ph.D. candidate Wenfei Du of Stanford University. The study was led by Technion Ph.D. candidate Rachelly Normand.
The use of laboratory mice in basic and preclinical research is essential to advancing medicine and developing new drugs and therapies around the world. Mouse model studies are critical for experiments that cannot be performed in humans due to ethical considerations, including the study of diseases and physiological processes in the brain, spleen, and heart, and for testing the efficacy of new treatments for certain diseases. But despite the common use of lab mice, extrapolating mouse study results to the effects of such treatments in humans is not always straightforward. This is due to the many differences between the two species in physiology, genetics, life expectancy, and environment. In other words, many effects are "lost in translation" in the mouse-to-human transfer, and many drugs that are effective in lab mice fail when tested in humans. The tool developed at the Technion, which predicts the relevance of preliminary mouse test results to human physiology, could speed up the development of new drugs and dramatically reduce the cost of development.
One of the developments that enabled this breakthrough is a relatively new norm: uploading raw data from scientific studies to the Internet. This practice, which began with the Human Genome Project, has evolved and grown, and measurements of more than 2 million samples are now registered online. Most were collected from tissues of human patients and from animal models of disease. In each sample, the levels of mRNA, a central component in protein production, were measured for tens of thousands of genes in the genome.
"This is a huge amount of data—a tremendous amount of information on the Internet, which is generally not used beyond the study in which it was generated," says Shen-Orr. "The assumption in my laboratory is that these data hold hidden treasures which can be extracted using creative thinking and algorithm development. In the current study, we decided to leverage this information to address the problem of translating animal model findings to insights relevant to humans. In other words, in this study, we bridge the "cross-species gap" arising from the differences between humans and animal models."
Shen-Orr, Normand, and their colleagues developed an algorithm that better "translates" experiments conducted in mice and enables extrapolation of their implications for human physiology. The system is called FIT (an abbreviation of Found in Translation, a play on "Lost in Translation"). Using this big data, namely the large body of information accrued in prior studies and collected on the Internet, the system learns the relationship between gene expression in mice and in an equivalent human condition. Given a new animal study, such as an evaluation of a new pharmaceutical treatment, the system identifies, for each gene, whether the information collected from prior studies is relevant and beneficial for the new study. If the information is relevant, the system adjusts the results measured in the new study, enabling investigators to interpret the new mouse findings in a way that is relevant to humans.
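To make the idea concrete, the per-gene logic described above can be sketched in a few lines of Python. This is an illustration only, not the published FIT implementation: the use of a simple least-squares line per gene, the correlation cutoff `min_abs_r`, the function names, and the gene labels and numbers are all assumptions made for this example.

```python
from statistics import mean


def fit_gene_model(mouse_effects, human_effects):
    """Fit human ~ slope * mouse + intercept for one gene across prior studies.

    Returns (slope, intercept, pearson_r), or None if either series is constant.
    Ordinary least squares is a stand-in here, not the method from the paper.
    """
    mx, my = mean(mouse_effects), mean(human_effects)
    sxx = sum((x - mx) ** 2 for x in mouse_effects)
    syy = sum((y - my) ** 2 for y in human_effects)
    sxy = sum((x - mx) * (y - my) for x, y in zip(mouse_effects, human_effects))
    if sxx == 0 or syy == 0:
        return None
    slope = sxy / sxx
    intercept = my - slope * mx
    pearson_r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, pearson_r


def translate(new_mouse, prior, min_abs_r=0.8):
    """Adjust each gene's mouse result only when prior data look relevant.

    new_mouse: {gene: effect measured in the new mouse study}
    prior:     {gene: (mouse effects, human effects) from earlier paired studies}
    min_abs_r: hypothetical relevance cutoff chosen for this sketch
    """
    translated = {}
    for gene, effect in new_mouse.items():
        model = fit_gene_model(*prior[gene]) if gene in prior else None
        if model is not None and abs(model[2]) >= min_abs_r:
            slope, intercept, _ = model
            translated[gene] = slope * effect + intercept  # prior data relevant
        else:
            translated[gene] = effect  # leave the mouse result unchanged
    return translated


# Made-up effect sizes for hypothetical genes, purely for illustration.
prior = {
    "GENE_A": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),    # consistent link
    "GENE_B": ([1.0, 2.0, 3.0, 4.0], [0.5, -0.3, 0.4, -0.2]),  # weak link
}
new_mouse = {"GENE_A": 1.5, "GENE_B": 1.5, "GENE_C": -0.7}
result = translate(new_mouse, prior)
print(result)
```

In this toy run, only the hypothetical GENE_A is adjusted, because it is the only gene whose prior mouse and human measurements track each other closely; GENE_B and GENE_C keep their measured mouse values.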
The researchers evaluated FIT's performance on 170 different mouse studies. They demonstrated that in 88% of the cases FIT was predicted to be relevant to the new mouse experiment, and that the system indeed correctly predicted the gene expression profile in the analogous disease state in humans. This improves mouse-to-human inference by 50%. In addition, the researchers tested the predictive power of FIT in a mouse model of Crohn's disease. FIT predicted that the ILF3 gene would be expressed in humans, even though it is not expressed in mice. In a validation experiment, the researchers showed that the protein product of the ILF3 gene is indeed expressed in samples from Crohn's patients, a result that was not previously known and would not have been discovered without the machine-learning algorithm.
"This process not only improves research accuracy, but also prevents false leads and shortens drug and therapy development processes," Shen-Orr says.