Haris Vikalo – Selected projects

Algorithms for haplotype assembly from next-generation sequencing data

Humans are diploid organisms with two sets of chromosomes: 22 pairs of autosomes and one pair of sex chromosomes. The two chromosomes in a pair of autosomes are homologous, i.e., they have similar DNA sequences and carry essentially the same type of information, but are not identical. The most common type of variation between the chromosomes in a pair is a difference in the base at a specific location, i.e., the corresponding alleles on the homologous chromosomes differ. The complete information about DNA variations in an individual genome is provided by haplotypes, the lists of alleles at contiguous sites in a region of a single chromosome. Haplotype information is essential for medical and pharmaceutical studies, including understanding variations in gene expression and recombination patterns.
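To make the notion of a haplotype concrete, the following minimal Python sketch compares two toy homologous sequences, locates the sites at which their alleles differ, and reads off the haplotype carried by each chromosome at those sites. The sequences and the helper names (heterozygous_sites, haplotype) are hypothetical and chosen only for illustration; real data would come from a sequencing pipeline rather than from two fully known sequences.

```python
# Toy homologous sequences (hypothetical data, for illustration only).
maternal = "ACGTTAGCCA"
paternal = "ACGATAGCTA"


def heterozygous_sites(seq_a, seq_b):
    """Return the 0-based positions at which the two sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def haplotype(seq, sites):
    """Return the alleles that one chromosome carries at the given sites."""
    return "".join(seq[i] for i in sites)


sites = heterozygous_sites(maternal, paternal)
hap_a = haplotype(maternal, sites)  # alleles on the first chromosome
hap_b = haplotype(paternal, sites)  # alleles on the second chromosome
print(sites, hap_a, hap_b)
```

In practice neither full sequence is observed directly; only short, error-prone fragments are available, which is what makes haplotype assembly a nontrivial inference problem.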

This research aims to develop and analyze novel algorithms for haplotype assembly from next-generation sequencing data. It consists of three main thrusts. (1) Haplotype assembly from next-generation sequencing data is computationally challenging; the first thrust proposes branch-and-bound algorithms that exploit certain structural features of the problem to find the exact solution efficiently. (2) As the size of the haplotype assembly problem grows, the exact solution becomes increasingly difficult to obtain; the second thrust focuses on the development of fast heuristic methods with guaranteed performance bounds that enable explicit complexity-accuracy trade-offs. (3) Existing haplotype assembly schemes process DNA fragments comprising nucleotides whose order has already been determined by the sequencing platform; the third thrust focuses on the development of algorithms that find a joint solution to the base-calling and haplotype assembly problems, enabling significant improvements in accuracy.
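As an illustration of the first thrust, the sketch below shows what an exact branch-and-bound search for haplotype assembly might look like on a toy instance. It rests on several assumptions that the description above does not state: the objective is taken to be the widely used minimum error correction (MEC) criterion, sites are biallelic and coded 0/1, and each read is a string over '0', '1', and '-' (site not covered). The function names mec_cost and assemble_exact are hypothetical, and the pruning rule is a generic one rather than the structural features exploited by the project's own algorithms.

```python
def mec_cost(reads, hap):
    """MEC cost of `hap` (a tuple of 0/1 alleles) against the reads.

    Each read is charged the smaller of its mismatch counts against `hap`
    and against the complementary haplotype. If `hap` is only a prefix,
    zip() restricts the count to the decided sites, which gives a lower
    bound on the cost of every completion of that prefix.
    """
    total = 0
    for read in reads:
        same = sum(1 for r, h in zip(read, hap) if r != "-" and int(r) != h)
        flip = sum(1 for r, h in zip(read, hap) if r != "-" and int(r) == h)
        total += min(same, flip)
    return total


def assemble_exact(reads):
    """Depth-first branch-and-bound over haplotypes, one site at a time."""
    n = len(reads[0])
    best = {"cost": float("inf"), "hap": None}

    def extend(prefix):
        bound = mec_cost(reads, prefix)
        if bound >= best["cost"]:
            return  # prune: no completion can beat the incumbent
        if len(prefix) == n:
            best["cost"], best["hap"] = bound, prefix
            return
        for allele in (0, 1):
            extend(prefix + (allele,))

    # A haplotype and its complement have the same cost, so fix site 0 to 0.
    extend((0,))
    hap = best["hap"]
    return hap, tuple(1 - a for a in hap), best["cost"]


# Toy reads over five sites (hypothetical data; the last read has one error).
reads = ["001--", "110--", "-0110", "--001", "-1011"]
hap1, hap2, cost = assemble_exact(reads)
print(hap1, hap2, cost)
```

The search remains exponential in the number of sites in the worst case, which is consistent with the motivation for the heuristic methods of the second thrust.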