Computationally intensive next-generation sequencing problems

Partner: University Hospital Ostrava

Field: Healthcare

The Department of Medical Genetics has an exceptional position within the University Hospital Ostrava (UHO) due to its supra-regional importance resulting from the catchment area of the Moravian-Silesian Region. 

Medical genetics is a multidisciplinary field whose results are reflected in all areas of medical care and whose main focus lies in prevention.

Experts at the UHO genetics department provide genetic counselling and perform specialised examinations in cooperation with the department’s cytogenetic and molecular genetic laboratories, as well as with laboratories throughout the Czech Republic and abroad.

They diagnose congenital developmental defects, developmental disorders in childhood, monogenic diseases, and oncological, neurological, and neurodegenerative diseases, among others. Many of their clients are pregnant women and couples planning a pregnancy. The department therefore also deals with the diagnosis of fertility disorders and with pregnancy planning for couples who carry genetic diseases or have a family history of them, and for couples who are related by blood.

The main objectives of this collaboration were:

  • to test the performance of complex computational tasks in modern DNA sequencing methods, i.e., next-generation sequencing (NGS),
  • and to provide the necessary information for acquiring specialised equipment that processes the results of genetic screening of patients.

The supercomputers of IT4Innovations National Supercomputing Center were used to obtain baseline data on the computational complexity, scalability, and data volume of the NGS pipeline.

The Czech Barbora and Karolina supercomputers were used to test the complexity of NGS data processing. The state-of-the-art processing pipeline is written in the Nextflow workflow language, which allows it to make full use of the supercomputers’ available computational resources. A Nextflow pipeline consists of processes, each of which executes a given task on the input data. The pipeline keeps track of which processes or tasks can run in parallel so that the limit of allocated compute nodes and the associated number of CPUs (central processing units) is not exceeded.
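The idea of running independent tasks in parallel without exceeding a CPU limit can be illustrated with a short Python sketch. It is not how Nextflow is implemented and it is not the pipeline used in this project: the task names, CPU counts, and the `schedule_waves` helper are all hypothetical, and the sketch groups tasks into simple "waves", whereas a real workflow engine starts each task as soon as resources become free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str          # hypothetical task name, for illustration only
    cpus: int          # CPUs the task requests
    deps: tuple = ()   # names of tasks that must finish first


def schedule_waves(tasks, cpu_limit):
    """Group tasks into successive waves.

    A task joins a wave only if all of its dependencies finished in an
    earlier wave and the wave still has enough free CPUs for it.
    """
    for t in tasks:
        if t.cpus > cpu_limit:
            raise ValueError(f"{t.name} needs more CPUs than the limit allows")

    pending = {t.name: t for t in tasks}  # keeps the submission order
    done = set()
    waves = []
    while pending:
        free = cpu_limit
        wave = []
        for name, t in pending.items():
            if all(d in done for d in t.deps) and t.cpus <= free:
                wave.append(name)
                free -= t.cpus
        if not wave:
            raise ValueError("remaining tasks have unsatisfiable dependencies")
        for name in wave:
            del pending[name]
        done.update(wave)
        waves.append(wave)
    return waves


# Hypothetical three-sample workflow on an allocation of 8 CPUs.
tasks = [
    Task("align_A", 4),
    Task("align_B", 4),
    Task("align_C", 4),
    Task("call_A", 2, ("align_A",)),
    Task("call_B", 2, ("align_B",)),
    Task("call_C", 2, ("align_C",)),
    Task("report", 1, ("call_A", "call_B", "call_C")),
]
waves = schedule_waves(tasks, cpu_limit=8)
for i, wave in enumerate(waves, start=1):
    print(f"wave {i}: {wave}")
```

With these made-up inputs, the two alignment tasks that fit into 8 CPUs run first, the third alignment then shares the allocation with the first two variant-calling tasks, and the final report waits until every sample has been processed.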

Testing was performed on two types of data covering a total of 20 patients: exome data (large, 3 patients) and panel data (small, MR-MIKRO4 panel, 17 patients). The resulting statistics, called benchmarks, track CPU, memory, and data usage, and were obtained for each type of data processed and each part of the computational pipeline.
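As a rough illustration of what such per-data-type, per-step benchmarks could look like, the Python sketch below aggregates task-level records into one summary row per data type and pipeline step. The record fields (`cpu_hours`, `peak_mem_gb`, `written_gb`), the step name, and all numbers are placeholders invented for the example; they are not measurements from this project and do not reflect the actual report format.

```python
from collections import defaultdict


def summarise(records):
    """Aggregate per-task records into one row per (data_type, step)."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["data_type"], r["step"])].append(r)

    summary = {}
    for key, rows in groups.items():
        summary[key] = {
            "tasks": len(rows),
            # CPU time and written data add up across tasks ...
            "cpu_hours": sum(r["cpu_hours"] for r in rows),
            "written_gb": sum(r["written_gb"] for r in rows),
            # ... while memory is sized by the most demanding task.
            "peak_mem_gb": max(r["peak_mem_gb"] for r in rows),
        }
    return summary


# Placeholder records, NOT real measurements.
records = [
    {"data_type": "exome", "step": "align", "cpu_hours": 2.0, "peak_mem_gb": 16, "written_gb": 4.0},
    {"data_type": "exome", "step": "align", "cpu_hours": 3.0, "peak_mem_gb": 24, "written_gb": 5.0},
    {"data_type": "panel", "step": "align", "cpu_hours": 0.5, "peak_mem_gb": 4, "written_gb": 1.0},
]
summary = summarise(records)
for (data_type, step), row in summary.items():
    print(data_type, step, row)
```

Summing CPU time and written data while taking the maximum of memory mirrors how such figures are typically used when sizing hardware: the totals indicate throughput and storage needs, and the peak indicates how much memory a single node must provide.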

The result of the collaboration consists of a series of benchmark performance tests for NGS data processing, which were performed on supercomputers to provide baseline data in terms of computational and data requirements.

The collaboration has successfully delivered the required background for acquiring dedicated equipment for processing patients’ genetic screening results.

“This equipment, which we plan to place in the Department of Medical Genetics, should significantly improve the efficiency and speed up the processing of patient genetic screening data”, explains Jiří Novotný, Department of Medical Genetics, UHO.