Bcftools
BCFtools is a set of utilities for variant calling and for manipulating VCF and BCF files.
Generating genotype likelihoods for alignment files using bcftools mpileup
bcftools mpileup can be used to generate VCF or BCF files containing genotype likelihoods for one or more alignment (BAM or CRAM) files as follows:
$ bcftools mpileup --max-depth 10000 --threads n -f reference.fasta -o genotype_likelihoods.bcf reference_sequence_alignment.bam
In this command…
- --max-depth or -d sets the maximum number of reads per input file that are considered at each position in the alignment. In this case, it is set to 10000.
- --threads sets the number (n) of processors/threads to use; replace n with an integer.
- --fasta-ref or -f is used to select the faidx-indexed FASTA nucleotide reference file (reference.fasta) used for the alignment.
- --output or -o is used to name the output file (genotype_likelihoods.bcf).
- The final argument given is the input BAM alignment file (reference_sequence_alignment.bam). Multiple input files can be given here.
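Since several input files can be given, the command above can be wrapped in a small helper. The following is a minimal sketch rather than a definitive recipe: the function name pileup_many and the sample file names are hypothetical, and it assumes both bcftools and samtools are on the PATH. It also passes -O b explicitly so that the output is compressed BCF regardless of how the output file is named.

```shell
# Sketch: genotype likelihoods for several alignment files in one run.
# pileup_many is a hypothetical helper name; file names are placeholders.
pileup_many() {
    local ref="$1" out="$2"
    shift 2    # everything left over is treated as an input BAM/CRAM file

    # Create the faidx index (reference.fasta.fai) up front if it is missing.
    [ -e "${ref}.fai" ] || samtools faidx "$ref" || return 1

    # -O b requests compressed BCF explicitly instead of relying on the
    # extension of the output file name.
    bcftools mpileup --max-depth 10000 -f "$ref" -O b -o "$out" "$@"
}

# Example call (hypothetical file names):
# pileup_many reference.fasta genotype_likelihoods.bcf sample1.bam sample2.bam
```

For a long list of inputs, bcftools mpileup also accepts --bam-list (-b) with a text file naming one alignment file per line.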
Variant calling using bcftools call
bcftools call can be used to call SNP/indel variants from a BCF file as follows:
$ bcftools call -O b --threads n -vc --ploidy 1 -p 0.05 -o variants_unfiltered.bcf genotype_likelihoods.bcf
In this command…
- --output-type or -O is used to select the output format. In this case, b for BCF.
- --threads sets the number (n) of processors/threads to use; replace n with an integer.
- -vc specifies that we want the output to contain variants only (-v), using the original SAMtools consensus caller (-c).
- --ploidy specifies the ploidy of the assembly (1, i.e. haploid, in this case).
- --pval-threshold or -p is used to set the p-value threshold for variant sites (0.05).
- --output or -o is used to name the output file (variants_unfiltered.bcf).
- The final argument is the input BCF file (genotype_likelihoods.bcf).
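The pileup and calling steps can also be chained with a pipe so that the intermediate genotype likelihoods never touch the disk. This is a sketch under assumptions: call_from_bam is a hypothetical helper name, the option values simply mirror the commands above, and bcftools is assumed to be on the PATH.

```shell
# Sketch: run mpileup and call as a single pipeline.
# call_from_bam is a hypothetical helper name; file names are placeholders.
call_from_bam() {
    local ref="$1" bam="$2" out="$3"

    # -O u streams uncompressed BCF, which avoids compressing and
    # decompressing data between the two steps.
    # The trailing "-" tells bcftools call to read from standard input.
    bcftools mpileup --max-depth 10000 -f "$ref" -O u "$bam" |
        bcftools call -vc --ploidy 1 -p 0.05 -O b -o "$out" -
}

# Example call (hypothetical file names):
# call_from_bam reference.fasta reference_sequence_alignment.bam variants_unfiltered.bcf
```

Keeping the intermediate file, as in the two separate commands, is still useful when the same likelihoods will be called more than once with different settings.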
Filtering variants using bcftools filter
bcftools filter can be used to filter variants from a BCF file as follows:
$ bcftools filter --threads n -i '%QUAL>=20' -O v -o variants_filtered.vcf variants_unfiltered.bcf
In this command…
- --threads sets the number (n) of processors/threads to use; replace n with an integer.
- --include or -i is used to define the expression used to filter sites. In this case, %QUAL>=20 retains only sites with a quality score greater than or equal to 20.
- --output-type or -O is used to select the output format. In this case, v for VCF.
- --output or -o is used to name the output file (variants_filtered.vcf).
- The final argument is the input BCF file (variants_unfiltered.bcf).
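As an alternative to discarding low-quality sites, bcftools filter can keep them and label them in the FILTER column (soft filtering). The sketch below is illustrative only: the helper names soft_filter and count_pass and the LowQual label are arbitrary choices, and bcftools is assumed to be on the PATH.

```shell
# Sketch: soft-filter variants, then count the records that passed.
# soft_filter and count_pass are hypothetical helper names.
soft_filter() {
    local in="$1" out="$2"

    # -e gives an exclusion expression; combined with -s, sites matching it
    # are kept but tagged LowQual in FILTER, and the rest are marked PASS.
    bcftools filter -s LowQual -e 'QUAL<20' -O v -o "$out" "$in"
}

count_pass() {
    # -H omits the header; -f PASS keeps only records whose FILTER is PASS.
    bcftools view -H -f PASS "$1" | wc -l | tr -d ' '
}

# Example calls (hypothetical file names):
# soft_filter variants_unfiltered.bcf variants_softfiltered.vcf
# count_pass variants_softfiltered.vcf
```

The exclusion expression QUAL<20 is the complement of the inclusion expression QUAL>=20 used above, so the records marked PASS here should correspond to the sites kept by the hard filter.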