Fastq-dump
fastq-dump is a tool for downloading sequencing reads from NCBI's Sequence Read Archive (SRA). These sequence reads will be downloaded as FASTQ files. How these FASTQ files are formatted depends on the fastq-dump options used.
Downloading reads from the SRA using fastq-dump
In this example, we want to download FASTQ reads for a mate-pair library.
$ fastq-dump --gzip --skip-technical --readids --read-filter pass --dumpbase --split-3 --clip --outdir path/to/reads/ SRR_ID
In this command:
- --gzip: Compress output using gzip. Gzip-compressed reads can be read directly by bowtie2.
- --skip-technical: Dump only biological reads, skipping the technical reads.
- --readids or -I: Append the read ID after the spot ID as "accession.spot.readid". With this flag, one read of a pair has .1 appended to its ID and the other has .2. Without this option, paired-end reads will have identical IDs.
- --read-filter pass: Only return reads that pass filtering (without Ns).
- --dumpbase or -B: Format sequences in base space (the default for platforms other than SOLiD). Included to avoid colourspace (in which pairs of bases are represented by numbers).
- --split-3: Separate the reads into left and right ends. If there is a left end without a matching right end, or a right end without a matching left end, they will be put in a single file.
- --clip or -W: Some of the sequences in the SRA contain tags that need to be removed. This option removes those tag sequences.
- --outdir or -O: (Optional) Output directory; the default is the current working directory.
- SRR_ID: The ID of the run from the SRA to be downloaded. This ID begins with "SRR" and is followed by around seven digits (e.g. SRR1234567).
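After a --split-3 download, one quick sanity check is that the two mate files hold the same number of reads. The following is a minimal sketch, not part of the SRA Toolkit: count_reads and check_pairs are hypothetical helper names, and the file names in the usage comment assume that the mates are written as SRR_ID_1.fastq.gz and SRR_ID_2.fastq.gz, which should be confirmed against the actual contents of the output directory.

```shell
#!/usr/bin/env bash

# count_reads (hypothetical helper): print the number of FASTQ records in a
# gzip-compressed file, assuming the standard four lines per record.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# check_pairs (hypothetical helper): compare the read counts of two mate files.
# Prints a summary and returns non-zero if the counts differ.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 vs $n2 reads" >&2
        return 1
    fi
}

# Example usage (assumed file names; adjust to what fastq-dump actually wrote):
#   check_pairs path/to/reads/SRR_ID_1.fastq.gz path/to/reads/SRR_ID_2.fastq.gz
```

Counting lines rather than parsing records keeps the check fast, but it relies on each record occupying exactly four lines.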
Other options that can be used instead of --split-3:
- --split-files splits the FASTQ reads into two files: one file for mate 1s (..._1), and another for mate 2s (..._2). This option will not put mateless reads into a third file.
- --split-spot splits the FASTQ reads into two (mate 1s and mate 2s) within one file. --split-spot gives you an 8-line FASTQ format where forward precedes reverse (see https://www.biostars.org/p/178586/#258378).
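If a downstream tool needs separate mate files, the 8-line layout described for --split-spot can be split back into two files. This is a minimal sketch using awk, assuming an uncompressed input in which every spot occupies exactly eight lines (the forward record followed by the reverse record) with no mateless reads; deinterleave and the file names are hypothetical and not part of the SRA Toolkit.

```shell
#!/usr/bin/env bash

# deinterleave (hypothetical helper): split an 8-line-per-spot FASTQ file
# into a forward file and a reverse file.
#   $1 = interleaved input, $2 = forward output, $3 = reverse output
deinterleave() {
    local input="$1" out1="$2" out2="$3"
    awk -v o1="$out1" -v o2="$out2" '
        # Lines 1-4 of each block of 8 are the forward read, lines 5-8 the reverse.
        { if ((NR - 1) % 8 < 4) print > o1; else print > o2 }
    ' "$input"
}

# Example usage (assumed file names):
#   deinterleave SRR_ID.fastq SRR_ID_forward.fastq SRR_ID_reverse.fastq
```

The sketch does no validation, so an input that contains single-ended spots would silently shift reads between the two outputs.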
Demonstration
In this demo, fastq-dump is used to download compressed FASTQ reads.
Further reading
- Rob Edwards' notes on fastq-dump: https://edwards.sdsu.edu/research/fastq-dump/