Given a file with aligned sequencing reads and a list of genomic features,
htseq-count can be used to count how many reads map to each feature.
htseq-count does not align reads itself; it counts reads that have already been aligned. The basic command is as follows:
$ htseq-count --format bam sorted_alignment_file.bam genome_annotation > output_file.txt
In this command…
--format (short form -f) is the format of the input data. Possible values are sam (for text SAM files) and bam (for binary BAM files). The default is sam; a bam file is used in this example.
--order specifies whether the alignments have been sorted by name (name) or by coordinates/position (pos).
sorted_alignment_file.bam is a bam format alignment file, sorted by name.
genome_annotation is the genome annotation file that the reads in the alignment file are aligned to.
> output_file.txt redirects the output to the file output_file.txt.
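The options described above can also be written out explicitly. The following is a minimal sketch that assembles the command from named settings and prints it. The file names are the placeholders from the example, and passing --order name explicitly is an assumption based on the alignment file being sorted by name. The last line is commented out so that the sketch runs even without htseq-count installed.

```shell
#!/bin/sh
# Placeholder settings; adjust to your own files.
format=bam                            # input format: sam or bam
order=name                            # sort order of the alignments: name or pos
alignment=sorted_alignment_file.bam   # alignment file
annotation=genome_annotation          # genome annotation file
output=output_file.txt                # where the count table is written

# Assemble the command line from the settings above.
cmd="htseq-count --format $format --order $order $alignment $annotation"

# Print the full command, including the redirection, for inspection.
echo "$cmd > $output"

# Uncomment to run it (requires htseq-count on PATH):
# $cmd > "$output"
```

If the alignment file were sorted by coordinate instead, the order setting would be pos.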
In this video, htseq-count is used to count how many reads in an alignment file (sorted_example_alignment.bam) match the genes in a genome annotation.
The program outputs a table with counts for each feature, followed by the special counters, which count reads that were not counted for any feature for various reasons. The names of the special counters all start with a double underscore, to facilitate filtering (Note: The double underscore was absent up to version 0.5.4). The special counters are:
__no_feature: reads (or read pairs) which could not be assigned to any feature (set S as described above was empty).
__ambiguous: reads (or read pairs) which could have been assigned to more than one feature and hence were not counted for any of these, unless the --nonunique all option was used (set S had more than one element).
__too_low_aQual: reads (or read pairs) which were skipped due to the optional minimal alignment quality flag.
__not_aligned: reads (or read pairs) in the SAM/BAM file without an alignment.
__alignment_not_unique: reads (or read pairs) with more than one reported alignment.
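Because the special counters all start with a double underscore, they can be separated from the feature counts with a simple pattern match. The following is a minimal sketch, assuming a tab-separated count table of the kind described above. The file names, gene names and numbers are invented purely for illustration.

```shell
#!/bin/sh
# Toy count table in the layout described above (feature<TAB>count).
# Gene IDs and numbers are made up for this example.
printf 'geneA\t10\ngeneB\t0\ngeneC\t5\n__no_feature\t3\n__ambiguous\t1\n__too_low_aQual\t0\n__not_aligned\t2\n__alignment_not_unique\t4\n' > example_counts.txt

# Feature rows: every line whose name does not start with a double underscore.
grep -v '^__' example_counts.txt > feature_counts.txt

# Special counters: the lines that do start with a double underscore.
grep '^__' example_counts.txt > special_counters.txt

# Sum the second column of each file.
assigned=$(awk -F'\t' '{ s += $2 } END { print s + 0 }' feature_counts.txt)
unassigned=$(awk -F'\t' '{ s += $2 } END { print s + 0 }' special_counters.txt)

echo "assigned=$assigned unassigned=$unassigned"
```

Keeping the two groups in separate files means the feature table can be passed on to downstream analysis without the special counters being mistaken for genes.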