Prediction of novel virus–host interactions by integrating clinical symptoms and protein sequences
Robert Hoehndorf and his team have built a system for creating a repository of viral genomes sequenced in Saudi Arabia. Their work concerns the development of infrastructure to store and process COVID-19-associated data. In particular, the team seeks to gather sequencing data and metadata from samples across the region. Although several international data-sharing efforts exist, it is important to collect regional information, develop standardized workflows, and generate sufficient metadata so that datasets can be exchanged and combined.
The viral genome database accepts submissions of unprocessed sequencing data. It then runs a set of standardized workflows that automate much of the bioinformatics processing and assemble the submitted sequences into a pangenome, which can be queried and analyzed further. Importantly, the data are stored together with their metadata, so that information in the sequence database can be exchanged and combined with other datasets worldwide.
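The submit, process, and query cycle described above can be illustrated with a minimal Python sketch. It is not the team's implementation. The class names, the required metadata fields, the file names, and the placeholder "workflow" step are all hypothetical. The sketch only shows why storing metadata alongside each submission matters: records that lack agreed-upon fields are rejected, and processed records can then be filtered by metadata.

```python
from dataclasses import dataclass

# Hypothetical minimum metadata schema; the real database's schema is not described here.
REQUIRED_METADATA = ("sample_id", "collection_date", "region", "sequencing_platform")


@dataclass
class Submission:
    raw_reads_path: str  # illustrative path to unprocessed sequencing data
    metadata: dict
    processed: bool = False


class GenomeRepository:
    """Hypothetical in-memory stand-in for a sequence database."""

    def __init__(self):
        self._records = []

    def submit(self, submission):
        # Reject submissions that lack the agreed-upon metadata fields.
        missing = [k for k in REQUIRED_METADATA if k not in submission.metadata]
        if missing:
            raise ValueError(f"missing metadata fields: {missing}")
        self._records.append(submission)

    def run_workflows(self):
        # Placeholder for standardized processing (QC, assembly, etc.);
        # here it only marks each record as processed.
        for rec in self._records:
            rec.processed = True

    def query(self, **criteria):
        # Return processed records whose metadata match every criterion.
        return [
            r for r in self._records
            if r.processed and all(r.metadata.get(k) == v for k, v in criteria.items())
        ]


# Usage with made-up example records.
repo = GenomeRepository()
repo.submit(Submission("s1.fastq", {"sample_id": "S1", "collection_date": "2020-05-01",
                                    "region": "Jeddah", "sequencing_platform": "Illumina"}))
repo.submit(Submission("s2.fastq", {"sample_id": "S2", "collection_date": "2020-06-12",
                                    "region": "Riyadh", "sequencing_platform": "Nanopore"}))
repo.run_workflows()
hits = repo.query(region="Jeddah")
```

Keeping validation at submission time, rather than during later analysis, means every stored record can be combined with other datasets that follow the same schema.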