Bioinformatics Notebook

by Ronan Harrington

This project provides introductions to various bioinformatics tools with short guides, video demonstrations, and scripts that tie these tools together. The documents in this project can be read locally in a plain-text editor, or viewed online at https://rnnh.github.io/bioinfo-notebook/. If you are not familiar with using programs from the command line, begin with the page “Introduction to the command line”. If you have any questions, or spot any mistakes, please submit an issue on GitHub.

Pipeline examples

These bioinformatics pipelines can be carried out using scripts and tools described in this project. Input files for some of these scripts can be specified in the command line; other scripts will need to be altered to fit the given input data.

SNP analysis

  • FASTQ reads from whole genome sequencing (WGS) can be assembled using SPAdes.
  • Sequencing reads can be aligned to this assembled genome using bowtie2.
  • The script snp_calling.sh aligns sequencing reads to an assembled genome and detects single nucleotide polymorphisms (SNPs). This will produce a Variant Call Format (VCF) file.
  • The proteins in the assembled reference genome (the genome to which the reads are aligned) can be annotated using genome_annotation_SwissProt_CDS.sh.
  • The genome annotation GFF file can be cross-referenced with the VCF file using annotating_snps.R. This will produce an annotated SNP format file.
  • Annotated SNP format files can be cross-referenced using annotated_snps_filter.R. For two annotated SNP files, this script will produce a file with annotated SNPs unique to the first file, and a file with annotated SNPs unique to the second file.
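The first two steps of this pipeline can be sketched as a short dry-run script. This is an illustration rather than part of the project: the read file names, the output directory, the index name, and the `run` helper are hypothetical placeholders, and using SPAdes' `contigs.fasta` as the assembly is an assumption (SPAdes also writes `scaffolds.fasta`). The arguments for snp_calling.sh and the later scripts are not shown on this page; see their pages under docs/ before running them.

```shell
# Dry-run sketch of the assembly and alignment steps for one paired-end sample.
# All file and directory names below are hypothetical placeholders.
reads_1="sample_1.fastq"
reads_2="sample_2.fastq"
assembly_dir="spades_out"

# run: print a command; execute it only when DRY_RUN=0 is set
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# 1. Assemble the paired-end reads with SPAdes
run spades.py -1 "$reads_1" -2 "$reads_2" -o "$assembly_dir"

# 2. Index the assembly, then align the reads to it with bowtie2
#    (assumption: contigs.fasta is the assembly of interest)
run bowtie2-build "$assembly_dir/contigs.fasta" assembly_index
run bowtie2 -x assembly_index -1 "$reads_1" -2 "$reads_2" -S aligned_reads.sam
```

By default the script only prints the commands, which makes it easy to check file names first. Setting `DRY_RUN=0` executes them, provided SPAdes and bowtie2 are installed and on the PATH.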

RNA-seq analysis

Detecting orthologs between genomes

Contents

1. General guides

2. Program guides

3. Scripts

Installation instructions

After following these instructions, there will be a copy of the bioinfo-notebook GitHub repo on your system in the ~/bioinfo-notebook/ directory. This means there will be a copy of all the documents and scripts in this project on your computer. If you are using Linux and run the Linux setup script, the bioinfo-notebook virtual environment, which includes the majority of the command line programs covered in this project, will also be installed using conda.

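1. This project is written to be used on a UNIX-like operating system (Linux, or a Mac with macOS Mojave or later). If you are using a Windows operating system, begin with the pages on setting up Ubuntu (a Linux operating system), either through the Windows Subsystem for Linux (docs/wsl.md) or in a virtual machine (docs/ubuntu_virtualbox.md).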

Once you have an Ubuntu system set up, run the following command to update the lists of available software:

$ sudo apt-get update # Updates lists of software that can be installed

2. Run the following command in your home directory (~) to download this project:

$ git clone https://github.com/rnnh/bioinfo-notebook.git

3. If you are using Linux, run the Linux setup script with this command after downloading the project:

$ bash ~/bioinfo-notebook/scripts/linux_setup.sh
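After these steps, a quick check along the following lines can confirm that the repository and the conda environment are in place. This is a sketch, not part of the project: the helper names `check_repo` and `env_listed` are hypothetical, and it assumes the environment is named bioinfo-notebook, as the description above and the envs/ directory suggest.

```shell
# Post-installation check (sketch). The repository path follows the
# instructions above; REPO_DIR is a hypothetical override.
repo_dir="${REPO_DIR:-$HOME/bioinfo-notebook}"

# check_repo: succeed if the Linux setup script is present in the clone
check_repo() {
    [ -f "$1/scripts/linux_setup.sh" ]
}

# env_listed: read `conda env list` output on stdin and succeed if an
# environment with the given name appears in the first column
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if check_repo "$repo_dir"; then
    echo "Repository found at $repo_dir"
else
    echo "Repository not found at $repo_dir; run the git clone step first"
fi

if command -v conda >/dev/null 2>&1; then
    # Assumption: the environment is named bioinfo-notebook
    if conda env list | env_listed bioinfo-notebook; then
        echo "conda environment 'bioinfo-notebook' is installed"
    else
        echo "conda environment 'bioinfo-notebook' is missing"
    fi
else
    echo "conda is not on the PATH in this shell"
fi
```

If the environment is listed, it can be activated with `conda activate bioinfo-notebook`.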

Video demonstration of installation

[asciicast recording of the installation; viewable in the online version of this page]

Repository structure

bioinfo-notebook/
├── assets/
│   └── bioinfo-notebook_logo.svg
├── data/
│   ├── blastx_SwissProt_example_nucleotide_sequence.fasta.tsv
│   ├── blastx_SwissProt_S_cere.tsv
│   ├── design_table.csv
│   ├── example_genome_annotation.gtf
│   ├── example_nucleotide_sequence.fasta
│   └── featCounts_S_cere_20200331.csv
├── docs/
│   ├── annotated_snps_filter.md
│   ├── annotating_snps.md
│   ├── augustus.md
│   ├── blast.md
│   ├── bowtie2.md
│   ├── bowtie.md
│   ├── cl_intro.md
│   ├── cl_solutions.md
│   ├── combining_featCount_tables.md
│   ├── conda.md
│   ├── DE_analysis_edgeR_script.md
│   ├── DE_analysis_edgeR_script.pdf
│   ├── fasterq-dump.md
│   ├── fastq-dump.md
│   ├── fastq-dump_to_featureCounts.md
│   ├── featureCounts.md
│   ├── file_formats.md
│   ├── genome_annotation_SwissProt_CDS.md
│   ├── htseq-count.md
│   ├── linux_setup.md
│   ├── orthofinder.md
│   ├── part1.md    # Navigation page for website
│   ├── part2.md    # Navigation page for website
│   ├── part3.md    # Navigation page for website
│   ├── report_an_issue.md
│   ├── samtools.md
│   ├── sgRNAcas9.md
│   ├── snp_calling.md
│   ├── SPAdes.md
│   ├── ubuntu_virtualbox.md
│   ├── UniProt_downloader.md
│   └── wsl.md
├── envs/            # conda environment files
│   ├── augustus.yml            # environment for Augustus
│   ├── bioinfo-notebook.txt
│   ├── bioinfo-notebook.yml
│   ├── orthofinder.yml         # environment for OrthoFinder
│   └── sgRNAcas9.yml           # environment for sgRNAcas9
├── scripts/
│   ├── annotated_snps_filter.R
│   ├── annotating_snps.R
│   ├── combining_featCount_tables.py
│   ├── DE_analysis_edgeR_script.R
│   ├── fastq-dump_to_featureCounts.sh
│   ├── genome_annotation_SwissProt_CDS.sh
│   ├── linux_setup.sh
│   ├── snp_calling.sh
│   └── UniProt_downloader.sh
├── _config.yml     # Configures github.io project website
├── .gitignore
├── LICENSE
├── README.md
└── .travis.yml     # Configures Travis CI testing for GitHub repo
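In the tree above, each file in scripts/ has a page of the same base name in docs/. A minimal sketch for looking up the page that documents a given script, assuming the repository was cloned to ~/bioinfo-notebook (the `doc_for_script` helper and the `REPO_DIR` override are hypothetical names):

```shell
# Map each script in scripts/ to its documentation page in docs/ (sketch).
repo="${REPO_DIR:-$HOME/bioinfo-notebook}"

# doc_for_script: print the docs/ path sharing the script's base name
doc_for_script() {
    local name
    name="$(basename "$1")"
    # ${name%.*} strips the final extension (.sh, .R, .py)
    printf 'docs/%s.md\n' "${name%.*}"
}

for script in "$repo"/scripts/*; do
    # Skip the unexpanded glob if the directory is missing or empty
    [ -e "$script" ] || continue
    doc="$(doc_for_script "$script")"
    if [ -f "$repo/$doc" ]; then
        echo "$(basename "$script") -> $doc"
    else
        echo "$(basename "$script") -> (no matching page found)"
    fi
done
```

The loop prints nothing if the repository is not at the expected location, so it is safe to run before cloning.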