SAMtools
SAMtools is a set of utilities for manipulating alignment files. It imports from and exports to the SAM, BAM and CRAM formats; sorts, merges and indexes alignments; and allows reads in any region to be retrieved swiftly.
Converting a sam alignment file to a sorted, indexed bam file using samtools
Sequence Alignment Map (SAM/.sam) is a text-based file format for sequence alignments. Its binary equivalent is Binary Alignment Map (BAM/.bam), which stores the same data as a compressed binary file. A binary file is preferable to a text file for a sequence alignment, as it is smaller and faster for programs to process. A SAM alignment file (example_alignment.sam) can be converted to a BAM alignment using samtools view.
$ samtools view -@ n -Sb -o example_alignment.bam example_alignment.sam
In this command…
- -@ sets the number (n) of threads/CPUs to be used. This flag is optional and can be used with other samtools commands.
- -Sb specifies that the input is in SAM format (S) and the output will be in BAM format (b). Recent versions of samtools detect the input format automatically and ignore S.
- -o sets the name of the output file (example_alignment.bam).
- example_alignment.sam is the name of the input file.
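One way to sanity-check the converted file is sketched below. It assumes a samtools version that provides the quickcheck subcommand; the helper name check_bam is hypothetical and not part of SAMtools.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of SAMtools): sanity-check a BAM file.
check_bam() {
    local bam="$1"
    # quickcheck exits non-zero if the file looks truncated or has no valid header.
    if ! samtools quickcheck "$bam"; then
        echo "FAILED: $bam" >&2
        return 1
    fi
    # -c counts the alignment records instead of printing them.
    echo "$bam: $(samtools view -c "$bam") records"
}

# Example call, assuming the BAM file from the command above exists:
# check_bam example_alignment.bam
```

The record count should match the number of non-header lines in example_alignment.sam.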
Now that the example alignment is in BAM format, we can sort it using samtools sort. Sorting this alignment will allow us to create an index.
$ samtools sort -O bam -o sorted_example_alignment.bam example_alignment.bam
In this command…
- -O specifies the output format (bam, sam, or cram).
- -o sets the name of the output file (sorted_example_alignment.bam).
- example_alignment.bam is the name of the input file.
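To confirm that a file is sorted, the sort order recorded in its header can be inspected. The sketch below assumes the header contains an @HD line with an SO: tag, which samtools sort is expected to set to coordinate by default; the helper name sort_order is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report the sort order recorded in a BAM header.
# samtools view -H prints only the header; the @HD line may carry an SO: tag.
sort_order() {
    local bam="$1"
    samtools view -H "$bam" | awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }'
}

# Example call, assuming the sorted file from the command above exists:
# sort_order sorted_example_alignment.bam
```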
This sorted BAM alignment file can now be indexed using samtools index. Indexing allows fast random access to this alignment, so the information in the alignment file can be processed faster.
$ samtools index sorted_example_alignment.bam
In this command…
- sorted_example_alignment.bam is the name of the input file. By default, the index is written alongside it as sorted_example_alignment.bam.bai.
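The three steps (convert, sort, index) can be combined into one small script. This is a minimal sketch rather than a definitive workflow: the function name sam_to_sorted_indexed_bam is hypothetical, it assumes the input file is in the current directory and ends in .sam, and it uses -b alone because recent samtools versions detect SAM input automatically.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: SAM -> BAM -> sorted BAM -> index.
# $1 = input SAM file in the current directory (name ending in .sam)
# $2 = value passed to -@ (optional, defaults to 1)
sam_to_sorted_indexed_bam() {
    local sam="$1"
    local threads="${2:-1}"
    local bam="${sam%.sam}.bam"      # example_alignment.sam -> example_alignment.bam
    local sorted="sorted_${bam}"     # -> sorted_example_alignment.bam

    # Each step runs only if the previous one succeeded.
    samtools view -@ "$threads" -b -o "$bam" "$sam" &&
    samtools sort -@ "$threads" -O bam -o "$sorted" "$bam" &&
    samtools index "$sorted" &&
    echo "$sorted"                   # print the name of the final file
}

# Example call:
# sam_to_sorted_indexed_bam example_alignment.sam 4
```

Once the index exists, reads overlapping a region can be retrieved by giving samtools view the sorted file followed by a region such as chr1:1000-2000, where chr1 is a placeholder for a reference name that appears in your own alignment.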
Demonstration 1
In this video, samtools is used to convert example_alignment.sam into a BAM file, sort that BAM file, and index it.
Simulating short reads using wgsim
wgsim is a SAMtools program that can simulate short sequencing reads from a reference genome. This is useful for creating FASTQ files to practice with.
$ wgsim example_nucleotide_sequence.fasta example_reads_1.fastq example_reads_2.fastq
In this command…
- example_nucleotide_sequence.fasta is the reference genome input.
- example_reads_1.fastq and example_reads_2.fastq are the names of the simulated read output files.
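The command above relies on wgsim's default settings. A sketch with explicit settings follows; the function name simulate_reads, the read length of 100, the seed of 42, and the output file names are illustrative assumptions, not recommended values.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around wgsim with explicit settings.
# $1 = reference FASTA, $2 = output prefix, $3 = number of read pairs (default 10000)
simulate_reads() {
    local ref="$1"
    local prefix="$2"
    local pairs="${3:-10000}"
    # -N: number of read pairs; -1 / -2: length of the first / second read;
    # -S: seed for the random generator, so a run can be repeated.
    # wgsim prints the variants it introduces to standard output,
    # so they are redirected to a file here.
    wgsim -N "$pairs" -1 100 -2 100 -S 42 \
        "$ref" "${prefix}_1.fastq" "${prefix}_2.fastq" > "${prefix}_variants.txt"
}

# Example call:
# simulate_reads example_nucleotide_sequence.fasta example_reads 5000
```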
Demonstration 2
In this video, wgsim is used to simulate reads from example_nucleotide_sequence.fasta.
Indexing a FASTA file using samtools faidx
SAMtools can be used to index a FASTA file as follows…
$ samtools faidx file.fasta
After running this command, the index is written to file.fasta.fai, and file.fasta can now be used by bcftools.
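The index can also be used directly. The sketch below assumes the index file sits next to the FASTA file as file.fasta.fai; the helper names list_sequences and fetch_region are hypothetical, and seq1 is a placeholder for a sequence name from your own file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list sequence names and lengths from a FASTA index.
# In a .fai file, column 1 is the sequence name and column 2 is its length.
list_sequences() {
    local fasta="$1"
    cut -f1,2 "${fasta}.fai"
}

# Hypothetical helper: print part of one sequence in FASTA format.
# The region has the form NAME:START-END, for example seq1:1-50.
fetch_region() {
    local fasta="$1"
    local region="$2"
    samtools faidx "$fasta" "$region"
}

# Example calls:
# list_sequences file.fasta
# fetch_region file.fasta seq1:1-50
```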
See also
- Alignment formats
- The samtools manual: https://www.htslib.org/doc/samtools.html