Bcftools
Bcftools is a set of utilities for variant calling and for manipulating VCF and BCF files.
Generating genotype likelihoods for alignment files using bcftools mpileup
bcftools mpileup can be used to generate VCF or BCF files containing genotype likelihoods for one or multiple alignment (BAM or CRAM) files as follows:
$ bcftools mpileup --max-depth 10000 --threads n -f reference.fasta -o genotype_likelihoods.bcf reference_sequence_alignment.bam
In this command…
- --max-depth or -d sets the maximum number of reads considered per input file at each position in the alignment. In this case, it is set to 10000.
- --threads sets the number (n) of processors/threads to use.
- --fasta-ref or -f selects the faidx-indexed FASTA nucleotide reference file (reference.fasta) used for the alignment.
- --output or -o names the output file (genotype_likelihoods.bcf).
- The final argument is the input BAM alignment file (reference_sequence_alignment.bam). Multiple input files can be given here.
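The two points above that are easiest to trip over are the faidx index and the option to pass several alignment files. A minimal sketch of both follows. It assumes samtools is installed alongside bcftools. The helper name pileup_many, the sample file names and the thread count are placeholders and are not part of bcftools.

```shell
#!/bin/sh
# Sketch: create the faidx index if it is missing, then pile up several alignments at once.
# All file names below are placeholders.
REF=reference.fasta
THREADS=4

pileup_many() {
    # $1 = reference FASTA, $2 = output BCF, remaining arguments = BAM/CRAM files.
    ref=$1; out=$2; shift 2
    # Build the faidx index (<reference>.fai) up front if it is not there yet.
    [ -e "$ref.fai" ] || samtools faidx "$ref" || return 1
    # -O b requests compressed BCF explicitly, matching the .bcf output name.
    bcftools mpileup --max-depth 10000 --threads "$THREADS" -f "$ref" -O b -o "$out" "$@"
}

# Only run when the reference is actually present.
if [ -e "$REF" ]; then
    pileup_many "$REF" genotype_likelihoods.bcf sample_a.bam sample_b.bam
fi
```

Passing the alignment files as trailing arguments keeps the helper usable for one file or many without changes.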
Variant calling using bcftools call
bcftools call can be used to call SNP/indel variants from a BCF file as follows:
$ bcftools call -O b --threads n -vc --ploidy 1 -p 0.05 -o variants_unfiltered.bcf genotype_likelihoods.bcf
In this command…
- --output-type or -O selects the output format. In this case, b for BCF.
- --threads sets the number (n) of processors/threads to use.
- -vc specifies that we want the output to contain variants only (-v), using the original SAMtools consensus caller (-c).
- --ploidy specifies the ploidy of the assembly. In this case, 1 (haploid).
- --pval-threshold or -p sets the p-value threshold for variant sites (0.05).
- --output or -o names the output file (variants_unfiltered.bcf).
- The final argument is the input BCF file (genotype_likelihoods.bcf).
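When the intermediate genotype-likelihood file is not needed, the mpileup and call steps are commonly chained with a pipe. The sketch below shows one way to do that with the same options as the commands above. The helper name pileup_and_call, the thread count and the file names are placeholders.

```shell
#!/bin/sh
# Sketch: stream mpileup output straight into call, skipping the intermediate file.
# File names and the thread count are placeholders.
REF=reference.fasta
BAM=reference_sequence_alignment.bam
THREADS=4

pileup_and_call() {
    # $1 = reference FASTA, $2 = input BAM/CRAM, $3 = output BCF.
    # -O u emits uncompressed BCF, so no time is spent compressing data that is only piped.
    bcftools mpileup --max-depth 10000 --threads "$THREADS" -f "$1" -O u "$2" |
        bcftools call -O b --threads "$THREADS" -vc --ploidy 1 -p 0.05 -o "$3"
}

# Only run when the inputs are actually present.
if [ -e "$REF" ] && [ -e "$BAM" ]; then
    pileup_and_call "$REF" "$BAM" variants_unfiltered.bcf
fi
```

The trade-off is that the genotype likelihoods are not kept on disk, so changing the calling options later means repeating the pileup.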
Filtering variants using bcftools filter
bcftools filter can be used to filter variants from a BCF file as follows:
$ bcftools filter --threads n -i '%QUAL>=20' -O v -o variants_filtered.vcf variants_unfiltered.bcf
In this command…
- --threads sets the number (n) of processors/threads to use.
- --include or -i defines the expression used to filter sites. In this case, %QUAL>=20 keeps only sites with a quality score greater than or equal to 20.
- --output-type or -O selects the output format. In this case, v for VCF.
- --output or -o names the output file (variants_filtered.vcf).
- The final argument is the input BCF file (variants_unfiltered.bcf).
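To make the effect of the include expression concrete, the sketch below mimics the selection on plain VCF text with awk. It is an illustration only and not how bcftools evaluates expressions. The helper names keep_min_qual and demo_vcf and the three records are made up for the example. Dropping records whose QUAL is missing (".") is an assumption of this sketch.

```shell
#!/bin/sh
# Illustration only: mimic what an include expression of QUAL>=20 selects,
# using awk on uncompressed VCF text. Not part of bcftools.
keep_min_qual() {
    # $1 = minimum QUAL; VCF text is read from stdin.
    awk -F '\t' -v min="$1" '
        /^#/ { print; next }                  # header lines pass through unchanged
        $6 != "." && $6 + 0 >= min { print }  # column 6 is QUAL; "." means missing
    '
}

# A tiny made-up VCF with three records (QUAL 50, 12.5 and missing).
demo_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t50\t.\t.\n'
    printf 'chr1\t200\t.\tC\tT\t12.5\t.\t.\n'
    printf 'chr1\t300\t.\tG\tA\t.\t.\t.\n'
}

demo_vcf | keep_min_qual 20
```

With a threshold of 20, the two header lines and the record at position 100 are printed. The records with QUAL 12.5 and a missing QUAL are dropped.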