Bowtie2
From the manual: “Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences”.
bowtie2 can be used to:
- index reference FASTA nucleotide genomes/sequences
- align FASTQ sequencing reads to those genomes/sequences
Differences between bowtie and bowtie2
- bowtie2 has no upper limit on read length
- bowtie2 can make gapped alignments
- bowtie2 is more flexible for paired-end alignment
- bowtie2 is faster and more memory efficient
- bowtie is advantageous over bowtie2 for relatively short sequencing reads (50 bp or less)
Indexing a reference genome/sequence using bowtie2-build
Before reads can be aligned to a reference genome with bowtie2, the genome must be indexed using bowtie2-build. This command creates six files with the extensions .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. Together, these six files are the index. Once an index has been created, the original reference genome/sequence is no longer needed to align reads. Here’s an example bowtie2-build command:
$ bowtie2-build reference_sequence.fasta index_name
In this command, reference_sequence.fasta is the nucleotide FASTA sequence we want to index, and index_name is the name of the index. There will be six files beginning with index_name in the output directory: index_name.1.bt2, index_name.2.bt2, index_name.3.bt2, index_name.4.bt2, index_name.rev.1.bt2, and index_name.rev.2.bt2. There’s no need to specify any of these files individually; index_name alone is enough to refer to the entire index.
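As a quick sanity check before aligning, a small shell helper can confirm that all six index files are present for a given index name. This is only a minimal sketch: the function name check_bt2_index is hypothetical, and it assumes a small index with the .bt2 extensions listed above (for very large genomes, bowtie2-build writes a large index with .bt2l extensions instead, which this helper would report as missing).

```shell
# Hypothetical helper: check that all six bowtie2 index files exist for a prefix.
# Assumes a small index (.bt2); a large index (.bt2l) is not handled here.
check_bt2_index() {
    prefix="$1"
    missing=0
    for ext in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -f "${prefix}.${ext}" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    # Exit status 0 if every file was found, 1 otherwise.
    return "$missing"
}
```

For example, check_bt2_index index_name && echo "index complete" prints the message only when every file is found.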
Aligning reads to an indexed genome/sequence using bowtie2
Now that the genome has been indexed, FASTQ sequencing reads can be aligned to it. This is done using the bowtie2 command. Here’s an example bowtie2 command:
$ bowtie2 --no-unal -p n -x index_name -1 reads_1.fastq -2 reads_2.fastq -S output.sam
In this command…
- --no-unal is an optional argument, meaning reads that do not align to the reference genome will not be written to the sam output
- -p is the number (n) of processors/threads used
- -x is the genome index
- -1 is the file(s) containing mate 1 reads
- -2 is the file(s) containing mate 2 reads
- -S is the output alignment in sam format
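When the same command is run for several samples, it can help to generate it from a few parameters. The sketch below only prints the command (a dry run) and does not invoke bowtie2. The function name bt2_align_cmd and the sample_1.fastq / sample_2.fastq naming scheme are assumptions made for illustration, not something bowtie2 requires.

```shell
# Hypothetical helper: print a paired-end bowtie2 command for one sample.
# Assumes mate files are named <sample>_1.fastq and <sample>_2.fastq,
# and that none of the names contain spaces.
bt2_align_cmd() {
    index="$1"          # index name, as passed to bowtie2-build
    sample="$2"         # sample prefix shared by both mate files
    threads="${3:-1}"   # value for -p; defaults to 1 if not given
    printf 'bowtie2 --no-unal -p %s -x %s -1 %s_1.fastq -2 %s_2.fastq -S %s.sam\n' \
        "$threads" "$index" "$sample" "$sample" "$sample"
}
```

Calling bt2_align_cmd index_name reads 4 prints the command with -p 4, reads_1.fastq and reads_2.fastq as the mate files, and reads.sam as the output, so it can be inspected before being run.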
Demonstration
In this video, bowtie2-build is used to index example_nucleotide_sequence.fasta, and the command bowtie2 is used to align reads to this bowtie2 index.
Further reading
- The bowtie2 manual: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml