A research team from Munich has created an algorithm that predicts the effects of genetic mutations on RNA formation six times more precisely than previous models. As a result, the genetic causes of rare hereditary diseases and cancer can now be pinpointed with greater accuracy.
The study was published in the journal Nature Genetics.
Variations in genetic sequence are relatively common, affecting one in every thousand nucleotides in a person's genome. In rare cases, these changes can result in faulty RNAs and thus non-functional proteins. Individual organs may become dysfunctional as a result of this. If a rare disease is suspected, computer-assisted diagnosis programmes can aid in the investigation of possible genetic causes. The genome, in particular, can be analysed using algorithms to determine whether there is a link between rare genetic variations and dysfunctions in specific parts of the body.
Interdisciplinary research project
Under the leadership of Julien Gagneur, Professor of Computational Molecular Medicine at the Technical University of Munich (TUM) and leader of the Computational Molecular Medicine research group at Helmholtz Munich, an interdisciplinary team from the Informatics and Medicine departments has developed a new model that is better than its predecessors at predicting which DNA variations will lead to incorrectly formed RNA.
"A reliable diagnosis can be made for about half of our patients using established DNA analysis methods," said Dr Holger Prokisch, co-author of the study and group leader of the Institute of Human Genetics at TUM and Helmholtz Munich. "For the rest, we need models that improve our predictions. Our newly developed algorithm can make an important contribution to this."
The focus of the model is on splicing
In their study, the researchers considered genetic variations that influence, in a tissue-specific fashion, the conversion of DNA into RNA and ultimately the formation of proteins. The focus was on splicing - a process in the cells where the RNA is cut in such a way that the building instructions for the protein can be read later. If there is a variation in the DNA, this process can be disrupted and result in either too much or too little being cut from the RNA. Errors in the splicing process are thought to be one of the most common causes of incorrect protein formation and hereditary diseases.
Significantly greater precision than previous studies
The team drew on existing data sets to make statements about possible associations between genetic variations and splicing dysfunctions in specific tissues. These data sets contain DNA and RNA samples from 49 tissues from 946 individuals.
In contrast to previous studies, the team first examined each sample to see whether, and to what extent, incorrect splicing caused by variation in the DNA manifests itself as splicing dysfunction in particular tissues. For example, a protein may be relevant for special areas of the heart, while it may have no function in the brain.
"For this purpose, we created a tissue-specific splicing map in which we quantified which places on the RNA are important to splicing in a given tissue. Thanks to our approach, we were able to limit our model to biologically relevant contexts. The skin and blood samples we used enabled us to draw conclusions about hard-to-reach tissues, such as the brain or the heart," said Nils Wagner, lead author of the study and doctoral student at the Chair of Computational Molecular Medicine at TUM.
In the analysis, each gene with at least one rare genetic variant that is relevant for protein formation was considered. In addition to the protein-coding sections on the RNA, there are sections that are important for other processes in our cells. These were not considered in the study. This resulted in a total of nearly 9 million rare genetic variants being studied.
"Thanks to our newly developed model, we were able to increase the precision of predicting incorrect splicing sixfold in comparison to previous models. At a recall of 20 per cent, previous algorithms achieved a precision of 10 per cent. Our model achieves a precision of 60 per cent at the same recall," said Prof. Julien Gagneur.
Precision and recall are essential metrics for evaluating the effectiveness of models. Precision indicates what proportion of the genetic variations flagged by the model actually lead to incorrect splicing. Recall indicates what proportion of the genetic variations and mutations that lead to incorrect splicing are recovered by the model.
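To make these two definitions and the quoted figures concrete, the following Python sketch computes precision and recall from counts of correct and incorrect predictions. The counts are hypothetical, chosen only so that the results reproduce the percentages quoted above. They are not taken from the study, and the function and variable names are illustrative.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of variants flagged by a model that really disrupt splicing."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of truly splice-disrupting variants that the model recovers."""
    return true_pos / (true_pos + false_neg)


# Hypothetical example: 300 variants truly disrupt splicing,
# and each model recovers 60 of them (so 240 are missed).
true_pos = 60
false_neg = 240

# Hypothetical false-positive counts for a previous model and the new one.
false_pos_previous = 540  # 600 variants flagged, 60 of them correct
false_pos_new = 40        # 100 variants flagged, 60 of them correct

recall_at_threshold = recall(true_pos, false_neg)
precision_previous = precision(true_pos, false_pos_previous)
precision_new = precision(true_pos, false_pos_new)
improvement = precision_new / precision_previous

print(f"recall:             {recall_at_threshold:.0%}")
print(f"previous precision: {precision_previous:.0%}")
print(f"new precision:      {precision_new:.0%}")
print(f"improvement factor: {improvement:.1f}")
```

In this illustration both models recover the same 20 per cent of disruptive variants, but the newer one flags far fewer variants that turn out to be harmless, which is what a sixfold gain in precision at fixed recall means.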
"We achieved such a large advance in precision by looking at the splicing process in a tissue-specific way and by using direct splicing measurements from easily accessible tissues such as blood or skin cells in order to predict splicing errors in inaccessible tissues like the heart or the brain," said Prof. Julien Gagneur.