Collaborative Mini-Grids for Prediction of Viral RNA Structure and Evolution

This project aims to design a collaborative, peer-to-peer software architecture for distributed bioinformatics algorithms that makes research into RNA-based diseases such as HIV, SARS, and bird flu more efficient than current approaches allow. The project is interdisciplinary and involves researchers from computer science, bioinformatics, molecular biology, and nanotechnology. The partners are the IT University of Copenhagen, the Department of Molecular Biology at the Interdisciplinary Nanoscience Centre (iNANO) at Aarhus University (AU), and CLC Bio.

The overall objective is to make theoretical and practical research into RNA-based diseases more efficient than is possible with currently available methods. This is done by making bioinformatics software for theoretical analysis of RNA available for practical use in a biology laboratory. This kind of research involves detailed analyses of large amounts of data and extensive searches in large databases. Efficiency is obtained by developing software systems that use existing low-cost computers (e.g. PCs) for analysis, and by making such distributed parallel computing much more user-friendly and robust than existing approaches. As a result, such analyses can be done by non-technical users, including biologists working in the laboratory. The specific goal is to create a general-purpose distributed software infrastructure for bioinformatics research in a biology laboratory, including support for distributed computation and database searches, which makes RNA analysis more efficient while enabling more researchers to perform such analyses.

Biological sequence analysis suffers from a fundamental problem: the amount of biological data available is growing faster than the computational power predicted by Moore's Law (see the figure below). This means that new, innovative methods must be developed that exploit the resources available for extensive calculations, for example grid computing.


Computer power versus biological sequence data. The blue curve represents Moore's Law, i.e. the number of transistors on an integrated circuit doubling every 2 years. The red curve represents biological sequence data stored at GenBank, given as the number of base pairs sequenced.
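
The widening gap between the two curves can be illustrated with simple doubling-time arithmetic. The following is a minimal sketch, not part of the project's software: the 2-year doubling time for transistor counts comes from the caption above, while the doubling time for sequence data (`data_doubling`, set to 1.5 years here) is a purely hypothetical value chosen for illustration and is not a measured GenBank figure. The function names are likewise made up for this example.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Factor by which a quantity grows after `years`
    if it doubles every `doubling_time_years`."""
    return 2.0 ** (years / doubling_time_years)


def gap_factor(years: float,
               compute_doubling: float = 2.0,   # Moore's Law, as in the caption
               data_doubling: float = 1.5) -> float:  # ASSUMPTION: illustrative only
    """How much faster the data volume has grown than computing power
    after `years`, given the two doubling times."""
    return growth_factor(years, data_doubling) / growth_factor(years, compute_doubling)


if __name__ == "__main__":
    for years in (2, 6, 12):
        print(f"after {years:2d} years: "
              f"compute x{growth_factor(years, 2.0):.1f}, "
              f"data x{growth_factor(years, 1.5):.1f} (hypothetical), "
              f"gap x{gap_factor(years):.2f}")
```

Under these assumed numbers the gap itself grows exponentially, because the ratio of two exponentials with different doubling times is again an exponential. Adding more hardware of the same kind therefore only shifts the curve, which is the motivation for pooling existing machines into a grid.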

This research has been funded by the Danish Agency for Science, Technology, and Innovation under the project "PC Mini-Grids for Prediction of Viral RNA Structure and Evolution", #09-061856.