Sequence assembly—improving the ability to assemble the complete human genome

An international team led by researchers in the UC San Diego Department of Computer Science and Engineering has shown that a new genome assembly algorithm, called La Jolla Assembler (LJA), greatly improves large-scale genome reconstruction, in which DNA fragments are arranged to form a complete genome. This process is an essential step in genome sequencing.

 

LJA also significantly reduced error rates and improved the ability to assemble the complete human genome. This will make it easier to conduct large-scale population studies, in which thousands or millions of people are sequenced and their genomes compared to better understand the genetic factors that lead to disease. The study, titled "Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads", was recently published in the journal Nature Biotechnology.

Pavel Pevzner, Distinguished Professor of Computer Science, said: "We used LJA to completely reconstruct almost half of the chromosomes in the human genome in a fully automated fashion. This reduced assembly errors by a factor of five compared to other assembly algorithms using HiFi sequencing technology. The accuracy of this approach will bring important benefits, especially for large population studies of complex and understudied regions of the human genome, such as centromeres or antibody production sites."

 

Genome assembly algorithms are computational tools that reconstruct genomes from collections of shorter sequences. For years, researchers relied almost exclusively on short-read technology, which produces reads of up to 300 nucleotides. These reads provide vital genomic information but leave gaps in the genome sequence, many of them in biomedically important regions. As a result, the Human Genome Project, completed two decades ago, left thousands of regions unassembled: unexplored DNA that could have clinical and scientific significance.

"This incomplete assembly of the human genome revolutionized biology and medicine 20 years ago," said lead author Anton Bankevich, a postdoctoral researcher in the Department of Computer Science and Engineering. "However, the missing segments of the genome may harbor more secrets."

 

More recently, scientists have begun employing HiFi sequencing technology, which produces long, high-fidelity reads of more than 10,000 nucleotides and has helped them sequence complete human and other genomes. The completion of the first complete human genome last year by the Telomere-to-Telomere (T2T) consortium was an important milestone. However, that feat required a great deal of work, and the approach is nearly impossible to scale to hundreds, let alone millions, of genomes.

 

To automate the process and increase speed and accuracy, Pevzner's team employed a computational structure called a de Bruijn graph, which helps them assemble sequence reads into complete genomes. Originally introduced by the Dutch mathematician Nicolaas de Bruijn as an obscure mathematical construct, the de Bruijn graph has become a workhorse of sequencing. It models a genome as a complex network of roads linking cities (short segments of the genome), and assembly amounts to finding a way to traverse the network while using every road. In a sense, this is history repeating itself. More than 20 years ago, Pevzner and colleagues used de Bruijn graphs to assemble short reads.
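
The roads-and-cities picture can be made concrete with a small sketch. The Python code below is an illustration of the classic de Bruijn graph approach only, not of LJA or its multiplex de Bruijn graphs. It assumes error-free reads, a single strand, and a toy genome in which no k-mer repeats. The function names (`build_de_bruijn`, `eulerian_path`, `assemble`) and the example reads are made up for this illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers ("cities") and each
    distinct k-mer found in the reads is one edge ("road") from its prefix
    to its suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return a walk that uses every edge once (Hierholzer's algorithm).
    Assumes the graph is non-empty and that such a walk exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise the walk is a cycle and any node will do.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused road
        else:
            path.append(stack.pop())  # dead end: add node to the final path
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path through the graph."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy example: three overlapping, error-free reads from a 10-letter "genome".
reads = ["ATGGCGT", "GGCGTGC", "GCGTGCA"]
print(assemble(reads, k=4))
```

Real genomes contain long repeats, sequencing errors, and two strands, so a practical assembler has to do far more than this sketch; the example only shows why traversing every road in the graph spells out the underlying sequence.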
