SAMtools
SAMtools is a set of utilities that can manipulate alignment formats. It imports from and exports to the SAM, BAM & CRAM formats; does sorting, merging & indexing; and allows reads in any region to be retrieved swiftly.
Converting a sam alignment file to a sorted, indexed bam file using samtools
Sequence Alignment Map (SAM/.sam) is a text-based file format for sequence alignments. Its binary equivalent is Binary Alignment Map (BAM/.bam), which stores the same data as a compressed binary file. A binary file for a sequence alignment is preferable over a text file, as binary files are faster to work with. A SAM alignment file (example_alignment.sam) can be converted to a BAM alignment using samtools view.
$ samtools view -@ n -Sb -o example_alignment.bam example_alignment.sam
In this command…
- -@ sets the number (n) of threads/CPUs to be used. This flag is optional and can be used with other samtools commands.
- -Sb specifies that the input is in SAM format (S) and the output will be in BAM format (b). Recent samtools versions detect the input format automatically and ignore -S.
- -o sets the name of the output file (example_alignment.bam).
- example_alignment.sam is the name of the input file.
Now that the example alignment is in BAM format, we can sort it using samtools sort. Sorting this alignment will allow us to create an index.
$ samtools sort -O bam -o sorted_example_alignment.bam example_alignment.bam
In this command…
- -O specifies the output format (bam, sam, or cram).
- -o sets the name of the output file (sorted_example_alignment.bam).
- example_alignment.bam is the name of the input file.
This sorted BAM alignment file can now be indexed using samtools index. Indexing allows fast random access to this alignment, so that reads from a given region can be retrieved without scanning the whole file. By default, the index is written alongside the BAM file as sorted_example_alignment.bam.bai.
$ samtools index sorted_example_alignment.bam
In this command…
- sorted_example_alignment.bam is the name of the input file.
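The three steps above can be sketched as one small shell function, followed by a second function that uses the index to pull out the reads in a region. This is a minimal sketch, not part of samtools: the function names sam_to_indexed_bam and reads_in_region and the DRY_RUN switch are hypothetical, and the function assumes the input file is in the current directory. The -S flag is left out because recent samtools versions detect the input format on their own.

```shell
# Sketch: SAM -> BAM -> sorted BAM -> index, as one reusable function.
# DRY_RUN is a hypothetical switch (not a samtools feature): when it is
# set, each command is printed instead of being executed.
sam_to_indexed_bam() {
    sam=$1                  # input, e.g. example_alignment.sam
    bam=${sam%.sam}.bam     # example_alignment.bam
    sorted=sorted_$bam      # sorted_example_alignment.bam

    # Stop at the first step that fails.
    ${DRY_RUN:+echo} samtools view -b -o "$bam" "$sam" &&
    ${DRY_RUN:+echo} samtools sort -O bam -o "$sorted" "$bam" &&
    ${DRY_RUN:+echo} samtools index "$sorted"
}

# Print the reads overlapping a region of a sorted, indexed BAM file.
# $1 is the BAM file, $2 is a region such as chr1:1-1000.
reads_in_region() {
    ${DRY_RUN:+echo} samtools view "$1" "$2"
}
```

After loading these functions into a shell, they could be used as follows. The region chr1:1-1000 is a placeholder; the sequence name has to match one in the alignment header.

$ sam_to_indexed_bam example_alignment.sam
$ reads_in_region sorted_example_alignment.bam chr1:1-1000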
Demonstration 1
In this video, samtools is used to convert example_alignment.sam into a BAM file, sort that BAM file, and index it.
Simulating short reads using wgsim
wgsim is a SAMtools program that can simulate short sequencing reads from a reference genome. This is useful for creating FASTQ files to practice with.
$ wgsim example_nucleotide_sequence.fasta example_reads_1.fastq example_reads_2.fastq
In this command…
- example_nucleotide_sequence.fasta is the reference genome input.
- example_reads_1.fastq and example_reads_2.fastq are the names of the simulated read output files.
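For a small practice data set it can help to set the number of read pairs and the read length explicitly rather than rely on wgsim's built-in defaults. The sketch below wraps the command above in a function and passes the wgsim options -N (number of read pairs), -1 and -2 (length of the first and second read). The function name simulate_reads, its default values, and the DRY_RUN switch are hypothetical choices made for illustration.

```shell
# Sketch: simulate paired-end reads with an explicit pair count and read length.
# DRY_RUN is a hypothetical switch: when set, the command is printed, not run.
simulate_reads() {
    ref=$1                  # reference FASTA file
    prefix=$2               # output prefix, e.g. example_reads
    pairs=${3:-10000}       # -N: number of read pairs (illustrative default)
    len=${4:-100}           # -1 / -2: read lengths (illustrative default)

    ${DRY_RUN:+echo} wgsim -N "$pairs" -1 "$len" -2 "$len" \
        "$ref" "${prefix}_1.fastq" "${prefix}_2.fastq"
}
```

With the function loaded, the following would write example_reads_1.fastq and example_reads_2.fastq, each holding 5000 reads of 150 bases:

$ simulate_reads example_nucleotide_sequence.fasta example_reads 5000 150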
Demonstration 2
In this video, wgsim is used to simulate reads from example_nucleotide_sequence.fasta.
Indexing a FASTA file using samtools faidx
SAMtools can be used to index a FASTA file as follows…
$ samtools faidx file.fasta
After running this command, file.fasta can now be used by bcftools.
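The index is written next to the FASTA file with a .fai extension (file.fasta.fai here), and samtools faidx can then also extract a subsequence when given a region after the file name. The sketch below builds the index only if it is missing and then prints the requested region in FASTA format. The function name fetch_region, the DRY_RUN switch, and the region chr1:1-100 in the usage line are hypothetical; the sequence name has to match a header in the FASTA file.

```shell
# Sketch: extract a region from a FASTA file, indexing it first if needed.
# DRY_RUN is a hypothetical switch: when set, commands are printed, not run.
fetch_region() {
    fasta=$1                # e.g. file.fasta
    region=$2               # e.g. chr1:1-100 (name:start-end)

    # Build the .fai index only when it does not exist yet.
    if [ ! -f "$fasta.fai" ]; then
        ${DRY_RUN:+echo} samtools faidx "$fasta" || return 1
    fi

    # Print the requested subsequence in FASTA format.
    ${DRY_RUN:+echo} samtools faidx "$fasta" "$region"
}
```

$ fetch_region file.fasta chr1:1-100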
See also
- Alignment formats
- The samtools manual: https://www.htslib.org/doc/samtools.html