Massively parallel sequencing technology, termed next generation sequencing (NGS), has transformed biological research. It offers an unparalleled level of data collection, has invigorated the field of genomics, and has expanded the potential for understanding our genetic basis. As access to the technology widens and costs fall, NGS-based research and applications will continue to grow.
Evolution of Sequencing Technology
Sequencing approaches have come a long way from the early, gel-based, dideoxynucleotide technologies. The traditional approach is termed Sanger sequencing, after Frederick Sanger, who developed the technique with colleagues in the late 1970s.
In Sanger sequencing, nucleotide-specific fragments are produced using a mix of normal and chemically modified nucleotides that terminate extension. The result is a range of fragment lengths that can be separated (originally by gel and later through column separation) and subsequently “read” from shortest to longest to determine the overall target sequence. In contrast, next generation sequencing (NGS) approaches essentially conduct sequencing and detection simultaneously, typically in a medium that allows thousands to billions of reactions to be sequenced within a single instrument run. Referred to as massively parallel sequencing, NGS produces enormous data sets and is being used in all areas of biological research and applied science.
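The shortest-to-longest read-out of Sanger sequencing can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: it treats the newly synthesized strand as a plain string, assumes exactly one terminated fragment per position, and ignores the chemistry and the separation step. The function names are hypothetical.

```python
import random

def terminated_fragments(synthesized_strand):
    """Model one chain-terminated fragment per position.

    Each fragment is recorded as (length, terminal_base), which is all
    the read-out needs: how long the fragment is and which labelled
    nucleotide ended it.
    """
    return [(i + 1, base) for i, base in enumerate(synthesized_strand)]

def read_out(fragments):
    """Order fragments from shortest to longest and read the terminal bases."""
    return "".join(base for _, base in sorted(fragments))

strand = "GATTACA"
fragments = terminated_fragments(strand)
random.shuffle(fragments)   # fragments start out as an unordered mixture
print(read_out(fragments))  # separation by size restores the order
```

Sorting by length stands in for the gel or column separation: once the fragments are ordered by size, the terminal base of each one spells out the sequence.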
NGS – Technical Considerations
Each NGS platform uses a different approach, chemistry and medium; however, all platforms require preparation of target DNA into a sequence-ready library.
High molecular weight DNA must be sheared to an appropriate size and ligated to platform-specific adapters that initiate the sequencing process.
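As a rough illustration of this library preparation step, the sketch below shears a toy DNA string into fragments of roughly a target size and attaches an adapter to each end. The adapter sequences, the 10% size spread and the function names are hypothetical placeholders, not those of any real platform or kit.

```python
import random

# Hypothetical placeholder adapters; real adapters are platform-specific.
ADAPTER_5 = "AAAACCCC"
ADAPTER_3 = "GGGGTTTT"

def shear(dna, mean_size, rng):
    """Cut dna into consecutive fragments of roughly mean_size bases."""
    fragments = []
    pos = 0
    while pos < len(dna):
        # Assumed 10% spread around the target size; never shorter than 1 base.
        size = max(1, round(rng.gauss(mean_size, 0.1 * mean_size)))
        fragments.append(dna[pos:pos + size])
        pos += size
    return fragments

def build_library(dna, mean_size, seed=0):
    """Shear the input and ligate both adapters onto every fragment."""
    rng = random.Random(seed)
    return [ADAPTER_5 + frag + ADAPTER_3 for frag in shear(dna, mean_size, rng)]

# Toy 1,000-base "genome" generated at random for the example.
genome = "".join(random.Random(1).choice("ACGT") for _ in range(1000))
library = build_library(genome, mean_size=200)
print(len(library), "adapter-ligated fragments")
```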
At the other end of the NGS workflow are the data handling steps. Depending on the platform, a single NGS run can generate up to several terabytes of raw data, and a series of pre-processing, alignment and assembly steps follows. The read length attained by NGS is in the range of 50-500 nucleotides, much shorter than that seen with Sanger sequencing. An appropriate number of overlapping short reads, termed coverage, is therefore an essential quality feature to ensure accurate sequence assembly. The short-read sequences that can be successfully assembled and matched to a reference sequence are termed “mappable reads”, and the quality of library preparation can directly influence this number. The number of reads required depends on the overall experimental requirements. Analysis of rare transcripts, or de novo genome assembly, requires greater read depth and increased coverage.
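The relationship between read number, read length and coverage can be made concrete with a little arithmetic. The sketch below assumes the common definition of mean coverage as total sequenced bases divided by target size, and it ignores unmappable reads and uneven coverage. The function names and the example figures (a 3.1-gigabase genome, 150-nucleotide reads, a 30-fold target) are illustrative assumptions, not values from any specific protocol.

```python
import math

def mean_coverage(num_reads, read_length, genome_size):
    """Mean coverage = total sequenced bases / size of the target."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Smallest whole number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

genome_size = 3_100_000_000   # assumed target size in bases
read_length = 150             # assumed read length in nucleotides

n = reads_needed(30, read_length, genome_size)
print(n)                                            # 620000000
print(mean_coverage(n, read_length, genome_size))   # 30.0
```

Because only mappable reads contribute to the assembly, the number of reads sequenced in practice would need to exceed an estimate of this kind.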
NGS – Applications
NGS has been applied in virtually all areas of biological research. It excels in whole-genome sequencing and in resequencing to identify differences from reference sequences, improving understanding of the genotype/phenotype relationship.
Organisms with no reference genome have also benefited from the speed of NGS, with a growing list of de novo assembled genome sequences now being added to databases, from model organisms through to endangered species and even extinct species such as the mammoth and early humans.
Metagenomics, a growing field for NGS applications, analyzes all DNA within an environmental or medical sample, offering insight into the complexity of the resident microbiome. Comparisons can be made between healthy and affected individuals, for example using samples from the human gut, in an effort to identify links to specific phenotypes such as obesity or irritable bowel syndrome. Genome-wide association studies are also being used in large-scale efforts to identify disease-causing genes and functionally link DNA with phenotype. Transcriptome sequencing using NGS is another way to investigate gene expression, offering a quantifiable snapshot of the RNA transcripts within a sample.