Efficient querying of colored de Bruijn graphs

Speaker: Tizian Schulz, Bielefeld University.

Date: 21 Feb 2018, 13:00.

Place: Room 407, Bloco H, Campus Gragoatá, UFF.

Abstract: Since the advent of the first sequencing technologies, DNA and protein sequences have been stored in massive databases that are accessible to researchers all over the world. Not least because of their size, such databases require fast, efficient and sensitive methods for querying them. One of the best-known and most commonly used methods is the Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1990), which searches for the highest-scoring local alignments between a query sequence and all sequences in the database.

The algorithm can be divided into three steps. First, short exact matches between the query and the database sequences are located. These matches are then extended to the left and right to find maximum-scoring alignments. In the third step, promising candidates are realigned and nearby hits are combined. Additionally, a significance score is calculated, which makes it possible to estimate the relevance of the results.
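The first two steps described above can be illustrated with a minimal Python sketch. It is not BLAST's actual implementation: the function names, the scoring values (match=1, mismatch=-1) and the x_drop cutoff are assumptions chosen for illustration, the extension is ungapped, and the third step (realignment, combination of hits, significance score) is left out.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map every k-mer of `sequence` to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k):
    """Step 1: return (query_pos, subject_pos) pairs of exact k-mer matches."""
    index = build_kmer_index(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


def extend_seed(query, subject, q, s, k, match=1, mismatch=-1, x_drop=2):
    """Step 2: ungapped extension of a seed to the left and to the right.

    Extension in one direction stops once the running score has fallen more
    than `x_drop` below the best score seen so far in that direction.
    Returns (score, query_start, query_end, subject_start), end exclusive.
    """
    def extend(qi, si, step):
        best = score = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = extend(q + k, s + k, +1)
    left_score, left_len = extend(q - 1, s - 1, -1)
    score = k * match + left_score + right_score
    return score, q - left_len, q + k + right_len, s - left_len
```

Each seed returned by find_seeds can be passed to extend_seed; keeping only the best-scoring extensions would give the candidates that the third step refines.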

Nowadays, cheap sequencing technologies are used to sequence the complete genomes of hundreds or thousands of individuals of the same species. Storing such large and highly redundant sequence collections consumes a vast amount of memory and calls for efficient data structures to store and handle the data.

Colored de Bruijn graphs allow a set of similar sequences to be stored together and are therefore considered able to meet these demands. However, developing efficient methods to search within sequences represented as such a graph is challenging. In this talk, a new method is introduced that aims to query a colored de Bruijn graph in a BLAST-like manner.
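To make the data structure concrete, here is a minimal Python sketch of a colored de Bruijn graph. It is an illustration only and not the method presented in the talk: it assumes that nodes are k-mers, that edges are implied by (k-1)-character overlaps, and that each node carries the set of input sequences ("colors") containing it. All function names are hypothetical.

```python
from collections import defaultdict


def build_colored_dbg(sequences, k):
    """Map each k-mer to the set of colors (sequence names) that contain it."""
    colors = defaultdict(set)
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return colors


def successors(graph, kmer, alphabet="ACGT"):
    """K-mers in the graph whose first k-1 characters are the last k-1 of `kmer`."""
    return [kmer[1:] + c for c in alphabet if kmer[1:] + c in graph]


def query_colors(graph, query, k):
    """Count, per color, how many k-mers of the query occur in that color."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for color in graph.get(query[i:i + k], ()):
            hits[color] += 1
    return dict(hits)
```

A k-mer shared by several sequences is stored only once, with several colors attached, which is where the savings on highly redundant collections come from. Exact k-mer lookups such as query_colors would correspond to a seeding step; extending such hits along the graph is the part that makes BLAST-like search on this structure challenging.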