Terminology
- General terms
- Next-generation sequencing
Also called second-generation sequencing or deep sequencing. This primarily refers to the Roche 454, Illumina Genome Analyzer, and ABI SOLiD.
- Next-next-generation sequencing
Sometimes called third-generation sequencing. This generation mostly comprises sequencers that work without amplification, i.e. directly on a single DNA molecule (also called single-molecule sequencing). This includes the current and upcoming machines by Helicos, Pacific Biosciences, Oxford Nanopore...
- Reads
The nucleotide sequences that come out of a next-generation sequencer; read length varies.
- Tag
- ID-tag or barcode: a sequence added to a sample to allow its identification after sequencing, usually when it is sequenced in a pool of mixed samples carrying different ID-tags.
- Reads aligned to a reference genome.
- Captured RNA fragments, such as from DeepCAGE and DeepSAGE.
- Barcoding or Indexing
Nucleotide sequences that can be used to separate individual samples out from a pool of samples.
- Paired-end
Sequencing both ends of one DNA fragment. NOTE: for Illumina and SOLiD, this is typically done on fragments of roughly 300-500 bp.
- Mate-pairs
Sequencing both ends of larger DNA fragments (3-10 kbp range) from which the middle part was removed during sample preparation (usually involving a circularization step). NOTE: for Illumina, this is typically used for "longer" inserts.
- The 2 DNA strands
- Forward/Reverse
The forward and reverse sequence reads of a DNA fragment. NOTE: with hybridization capture, usually only one strand (the one complementary to the probe used) is isolated, and it is subsequently sequenced in both directions. In a clinical setting, this is different from sequencing a region in both directions starting from a mix of both strands.
- +/-
Usually used in genome browsers to denote the "top" (+) and "bottom" (-) strands.
- Sense/antisense
The orientation of a sequence read relative to the direction of transcription of a gene: sense reads match the transcript (the coding strand), antisense reads are complementary to it (the template strand).
- Colorspace
An encoding used by ABI SOLiD in which each color represents a dinucleotide (a pair of adjacent nucleotides) rather than a single nucleotide.
- Dark nucleotide
In single-molecule sequencing, a nucleotide that is incorporated but not detected, due to incorporation of an unlabeled nucleotide, failure to fluoresce, and/or failure of detection by imaging.
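The Barcoding or Indexing entry above can be illustrated with a minimal demultiplexing sketch. It assumes each read arrives as a hypothetical `(read_id, index_seq, sequence)` tuple and that barcodes are compared by Hamming distance; the bin name "undetermined", the sample names, and the example reads are made up for illustration, and real pipelines differ in how they obtain the index and resolve ambiguous matches.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of differing positions; a length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex(reads, barcodes, max_mismatches=0):
    """Sort reads into per-sample bins by their index (barcode) sequence.

    reads:    iterable of (read_id, index_seq, sequence) tuples (hypothetical shape)
    barcodes: dict mapping barcode sequence -> sample name (hypothetical)
    A read whose index matches no barcode, or more than one barcode, within
    max_mismatches goes into the "undetermined" bin.
    """
    bins = defaultdict(list)
    for read_id, index_seq, seq in reads:
        hits = [sample for bc, sample in barcodes.items()
                if hamming(index_seq, bc) <= max_mismatches]
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append((read_id, seq))
    return dict(bins)


# "GTCTGG" is the index seen in the FASTQ example record; everything else is made up.
barcodes = {"GTCTGG": "sample_A", "ACGTAC": "sample_B"}
reads = [
    ("r1", "GTCTGG", "NAGACCTGCG"),
    ("r2", "ACGTAC", "TTGACCAGTA"),
    ("r3", "GTCTGA", "CCATGGTACA"),  # one mismatch against GTCTGG
]
```

With `max_mismatches=0`, read r3 lands in "undetermined"; allowing one mismatch assigns it to sample_A. Tolerating mismatches rescues reads with index sequencing errors, at the cost of needing barcodes that are far enough apart to stay unambiguous.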
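The strand terms above (Forward/Reverse, +/-) rest on one operation: the sequence of the opposite strand, read 5' to 3', is the reverse complement. The sketch below shows it, plus a hypothetical helper `strand_of` that reports on which strand of a reference a read occurs; it assumes exact matching only, unlike a real aligner.

```python
# Complement table for A/C/G/T (upper and lower case) and the ambiguity code N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_of(read, reference):
    """Report on which strand of `reference` a read occurs exactly (hypothetical helper).

    Returns "+" if the read occurs as-is, "-" if its reverse complement
    occurs, or None. A palindromic read that matches both is reported as "+".
    """
    if read in reference:
        return "+"
    if reverse_complement(read) in reference:
        return "-"
    return None
```

For example, `reverse_complement("AACG")` gives `"CGTT"`, so a read "ACGC" is reported on the "-" strand of the reference "ATGCGTAC", which contains its reverse complement "GCGT".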
- Bioinformatics terms
- Alignment
Finding where on a reference genome each read fits.
- Unique(ness): whether a read aligns to one or to multiple locations.
- Mismatches: when a read aligns to the genome, but one or more nucleotides do not match the reference.
- Probabilistic model: when a read aligns equally well to more than one location, distribute the alignment output equally (or possibly at random) among these locations.
- Assembly
Building contigs (contiguous sequences) out of overlapping reads.
- Dustbin
The collection of all reads that do not align to the reference sequence under the thresholds set.
- Sequencability
Given the reference sequence used and the length of the reads obtained, the theoretical chance of covering the reference sequence. NOTE: regions with reduced "sequencability" usually contain sequences that are present multiple times in the genome.
- FASTQ format
@HWIEAS422:7:1:9543:965#GTCTGG/1
NAGACCTGCGGCTCCTCATCCACGGGCTGGTCGTATGCCGGCTTCTGCTT
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
See http://en.wikipedia.org/wiki/FASTQ_format for more information.
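The Alignment terms above (uniqueness, mismatches, the probabilistic model) and the Dustbin can be tied together in a toy sketch. It assumes a brute-force, ungapped scan of the + strand only; real aligners such as Bowtie or BWA use an index of the reference and also search the reverse complement. The function names are hypothetical, and "distribute" implements the equal-share variant of the probabilistic model.

```python
def find_hits(read, reference, max_mismatches=2):
    """Brute-force, ungapped scan of the + strand: every (position, mismatches)
    placement of `read` with at most `max_mismatches` mismatches."""
    k = len(read)
    hits = []
    for pos in range(len(reference) - k + 1):
        mismatches = sum(r != g for r, g in zip(read, reference[pos:pos + k]))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


def classify(read, reference, max_mismatches=2):
    """Return ("unique" | "multiple" | "dustbin", best-scoring hits)."""
    hits = find_hits(read, reference, max_mismatches)
    if not hits:
        return "dustbin", []  # no placement under the threshold
    best = min(mm for _, mm in hits)
    best_hits = [h for h in hits if h[1] == best]
    return ("unique" if len(best_hits) == 1 else "multiple"), best_hits


def distribute(best_hits):
    """Share one read equally over its equally good locations (weights sum to 1)."""
    return {pos: 1 / len(best_hits) for pos, _ in best_hits}
```

For the reference "ACGTACGTTTGCA", `classify("ACGT", ref, 0)` returns `("multiple", [(0, 0), (4, 0)])`, and `distribute` then gives each location a weight of 0.5. A read like "GGGG" with at most one mismatch has no placement and ends up in the dustbin.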
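The Assembly entry above can be sketched with greedy overlap merging, one simple (and naive) strategy: repeatedly merge the two sequences with the longest suffix-prefix overlap until none remain. This assumes error-free reads from a single strand and a hypothetical minimum overlap of 3; practical assemblers typically build overlap graphs or de Bruijn graphs and must handle sequencing errors and repeats, which this toy does not.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of `a` matching a prefix of `b`
    (at least `min_length` long), or 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_length], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def _drop_contained(seqs):
    """Remove duplicates and sequences that are substrings of another sequence."""
    unique = list(dict.fromkeys(seqs))  # de-duplicate, keep order
    return [s for s in unique if not any(s != o and s in o for o in unique)]


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = _drop_contained(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: the remaining sequences are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = _drop_contained(rest + [merged])
```

For example, `greedy_assemble(["ACGTTG", "TTGCAT", "CATGA"])` returns `["ACGTTGCATGA"]`. Greedy merging is easy to follow, but it can join the wrong reads inside repeats, which is one reason graph-based approaches are preferred in practice.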
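The Sequencability entry above, and its note about repeats, can be illustrated by one simple way to quantify it: a read start position counts as sequencable if the read-length substring starting there occurs exactly once in the reference. This per-position definition is an assumption (it resembles what is often called mappability); it ignores mismatches and the reverse strand, and the function names are hypothetical.

```python
from collections import Counter


def sequencable_positions(reference, read_length):
    """For each read start position, True if the read_length-mer starting
    there occurs exactly once in `reference` (forward strand, exact matches)."""
    kmers = [reference[i:i + read_length]
             for i in range(len(reference) - read_length + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


def sequencability(reference, read_length):
    """Fraction of read start positions at which a read would align uniquely."""
    flags = sequencable_positions(reference, read_length)
    return sum(flags) / len(flags) if flags else 0.0
```

With the repeat-containing reference "ACGTACGTTT", reads of length 4 leave both copies of "ACGT" ambiguous (5 of 7 positions sequencable), while reads of length 5 span the repeat and make every position unique (1.0). This mirrors the note that repeated sequence lowers sequencability and that longer reads help.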
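The FASTQ record above consists of four lines: a header starting with "@", the sequence, a separator starting with "+", and a quality string of the same length as the sequence. Each quality character encodes a Phred score as its ASCII code minus an offset: 33 in the Sanger convention (also used from Illumina 1.8 on), 64 in older Illumina pipelines (1.3 to 1.7). Which offset applies to a given file is an assumption you must check. The header here follows the older Illumina naming, instrument:lane:tile:x:y, followed by "#" and the index (barcode) and "/1" for the first read of a pair. The minimal parser below assumes unwrapped, four-line records; the helper names are hypothetical.

```python
import io
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str       # header line without the leading "@"
    sequence: str
    quality: str


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file that uses exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality, offset=33):
    """Decode quality characters to Phred scores (offset 33 or 64)."""
    return [ord(ch) - offset for ch in quality]


def split_old_illumina_name(name):
    """Split 'instrument:lane:tile:x:y#index/mate' into (coordinates, index, mate)."""
    coordinates, _, rest = name.partition("#")
    index, _, mate = rest.partition("/")
    return coordinates, index, mate


# The example record from above (its quality line is 50 'B' characters).
example = (
    "@HWIEAS422:7:1:9543:965#GTCTGG/1\n"
    "NAGACCTGCGGCTCCTCATCCACGGGCTGGTCGTATGCCGGCTTCTGCTT\n"
    "+\n"
    + "B" * 50 + "\n"
)
record = next(read_fastq(io.StringIO(example)))
```

Here `split_old_illumina_name(record.name)` returns `("HWIEAS422:7:1:9543:965", "GTCTGG", "1")`. The character "B" decodes to Phred 33 with offset 33 but to 2 with offset 64, so choosing the wrong offset silently distorts every quality value.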