Bowtie
bowtie can be used to:
- index reference FASTA nucleotide genomes/sequences
- align FASTQ sequencing reads to those genomes/sequences
If you want to align short reads (50bp or less), bowtie is more suitable than bowtie2.
Indexing a reference genome/sequence using bowtie-build
Before aligning reads to a reference genome with bowtie, the genome must be indexed using bowtie-build. This command will create six files with the extensions .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt, and .rev.2.ebwt. These six files together are the index. Once an index has been created, the original reference genome/sequence is no longer needed to align reads. Here’s an example bowtie-build command:
$ bowtie-build reference_sequence.fasta index_name
In this command, reference_sequence.fasta is the nucleotide FASTA sequence we want to index, and index_name is the name of the index. There will be six files beginning with index_name in the output directory: index_name.1.ebwt, index_name.2.ebwt, index_name.3.ebwt, index_name.4.ebwt, index_name.rev.1.ebwt, and index_name.rev.2.ebwt. There’s no need to specify any of these files individually in subsequent bowtie commands; index_name alone is enough to refer to the entire index.
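Since the index is only usable when all six files are present, it can be worth checking for them before aligning. The following is a minimal sketch, assuming the .ebwt extensions listed above; check_bowtie_index is a hypothetical helper name, not part of bowtie.

```shell
# Hypothetical helper (not part of bowtie): verify that all six
# index files exist for a given index name (the prefix passed to bowtie-build).
check_bowtie_index() {
    prefix=$1
    missing=0
    for ext in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "${prefix}.${ext}.ebwt" ]; then
            # Report each missing file on stderr
            echo "missing: ${prefix}.${ext}.ebwt" >&2
            missing=1
        fi
    done
    # Exit status 0 if all six files were found, 1 otherwise
    return "$missing"
}
```

For example, check_bowtie_index index_name && echo "index looks complete" would print the message only if all six files are found in the current directory.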
Aligning reads to an indexed genome/sequence using bowtie
Now that the genome has been indexed, FASTQ sequencing reads can be aligned to it. This is done using the bowtie command. Here is an example bowtie command:
$ bowtie --no-unal --threads n --sam index_name -1 reads_1.fastq -2 reads_2.fastq output.sam
In this command…
- --no-unal is an optional argument, meaning reads that do not align to the reference genome will not be written to the SAM output
- --threads is the number (n) of processors/threads used
- --sam specifies that the output should be written in the SAM format
- index_name is the name of the genome index
- -1 is the file(s) containing mate 1 reads (reads_1.fastq)
- -2 is the file(s) containing mate 2 reads (reads_2.fastq)
- output.sam is the output alignment in SAM format
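To make the role of each argument concrete, the sketch below assembles the same paired-end command from named values and prints it without running it. bowtie_paired_cmd is a hypothetical helper name, and the thread count of 4 is only an illustrative choice for n.

```shell
# Hypothetical helper: print (do not run) a paired-end bowtie command
# mirroring the example command above.
# Arguments: index name, mate 1 file, mate 2 file, output SAM file, threads (default 1).
bowtie_paired_cmd() {
    index=$1
    mate1=$2
    mate2=$3
    out=$4
    threads=${5:-1}
    echo "bowtie --no-unal --threads $threads --sam $index -1 $mate1 -2 $mate2 $out"
}

# Print the command with 4 threads so it can be inspected before use
bowtie_paired_cmd index_name reads_1.fastq reads_2.fastq output.sam 4
```

Printing the command first makes it easy to confirm the index name and file order before starting a long alignment. This sketch does not quote its arguments, so it assumes file names without spaces.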
Demonstration
In this video, bowtie-build is used to index S_cere_GCF_000146045.2_R64_genomic.fna, which is a copy of the Saccharomyces cerevisiae S288C genome from RefSeq. The bowtie command is then used to align Saccharomyces cerevisiae RNAseq reads to this bowtie index.
Further reading
- The bowtie manual: http://bowtie-bio.sourceforge.net/manual.shtml