Htseq-count
Given a file with aligned sequencing reads and a list of genomic features, htseq-count
can be used to count how many reads map to each feature.
Aligining reads to a genome annotation using htseq-count
htseq-count
can be used to align reads to a genome annotation as follows:
$ htseq-count --format bam sorted_alignment_file.bam genome_annotation > output_file.txt
In this command…
--format
or-f
is the format of the input data. Possible values aresam
(for text SAM files) andbam
(for binary BAM files). Default issam
. Abam
file is used in this example.--order
specifies whether the alignments have been sorted by name (name
) or coordinates/position (pos
).sorted_alignment_file.bam
is abam
format alignment file, sorted by name.genome_annotation
is the genome annotation file the reads in thealignment_file
are aligned to (.gtf
or.gff
).> output_file.txt
redirects the output (STDOUT
) tooutput_file.txt
.
Demonstration
In this video, htseq-counts
is used to count how many reads in an alignment file (sorted_example_alignment.bam
) match the genes in a genome annotation (example_genome_annotation.gtf
).
The htseq-count
output file
The program outputs a table with counts for each feature, followed by the special counters, which count reads that were not counted for any feature for various reasons. The names of the special counters all start with a double underscore, to facilitate filtering (Note: The double underscore was absent up to version 0.5.4). The special counters are:
__no_feature
: reads (or read pairs) which could not be assigned to any feature (set S as described above was empty).__ambiguous
: reads (or read pairs) which could have been assigned to more than one feature and hence were not counted for any of these, unless the –nonunique all option was used (set S had more than one element).__too_low_aQual
: reads (or read pairs) which were skipped due to the optional minimal alignment quality flag.__not_aligned
: reads (or read pairs) in the SAM/BAM file without an alignment.__alignment_not_unique
: reads (or read pairs) with more than one reported alignment.
Further reading
- The
htseq-count
manual: https://htseq.readthedocs.io/en/release_0.11.1/count.html