author: Maurizio Polano
date:
autosize: true
========================================================
Locating where each generated sequence came from in the genome
Popular aligners:
Demo to follow after this talk
========================================================
Post-processing of aligned files
========================================================
Represent useful QC information:
========================================================
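# bash: create a working directory and fetch the course material
pwd                            # show the current directory
cd                             # go to the home directory
cd ~/Desktop/
mkdir -p USERNAME_11feb2019/   # replace USERNAME with your own user name
cd USERNAME_11feb2019/
git clone https://github.com/bioinfo-dirty-jobs/quality.git
ls -ltr                        # list the contents, newest last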
========================================================
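source activate NGSBASE               # activate the NGSBASE conda environment
cd quality
mkdir -p test
cp trimming/qa/*.fastq test/          # copy the example FASTQ files
ls test/
mkdir -p quality_pre
fastqc -o quality_pre/ test/*.fastq   # write the FastQC reports to quality_pre/
firefox quality_pre/*.html            # open the HTML reports in the browser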
========================================================
We want to remove the first adapter, GATCGGAAGAGCACACGTCTGAACTCCAGTCACACA, with cutadapt:
conda install cutadapt -y
cutadapt -m 20 -e 0.1 -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACACA trimming/qa/sample2.fastq -o sample2--cutadapt.fastq
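The following options were used: -m 20 discards reads shorter than 20 bases after trimming; -e 0.1 allows an error rate of up to 10% when matching the adapter; -a gives the adapter sequence to remove from the 3’ end of the reads; -o names the output file.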
========================================================
Total reads processed:               100,000
Reads with adapters:                  32,792 (32.8%)
Reads that were too short:            31,113 (31.1%)
Reads written (passing filters):      68,887 (68.9%)

Total basepairs processed:     3,600,000 bp
Total written (filtered):      2,473,750 bp (68.7%)

=== Adapter 1 ===

Sequence: GATCGGAAGAGCACACGTCTGAACTCCAGTCACACA; Type: regular 3’; Length: 36; Trimmed: 32792 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-36 bp: 3

Bases preceding removed adapters:
  A: 1.2%
  C: 1.2%
  G: 1.0%
  T: 2.0%
  none/other: 94.6%
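The "No. of allowed errors" line in the report follows from the -e 0.1 option: to my understanding, cutadapt allows floor(match length × error rate) mismatches for a match of a given length. The sketch below reproduces those thresholds with integer arithmetic. The function name allowed_errors is hypothetical and not part of cutadapt, and the rate is passed as a whole-number percentage as a simplification.

```bash
# allowed_errors LENGTH RATE_PERCENT
# Hypothetical helper (not part of cutadapt): prints
# floor(LENGTH * RATE_PERCENT / 100) using shell integer division.
allowed_errors() {
    echo $(( $1 * $2 / 100 ))
}

# Boundaries of the length ranges shown in the report, with -e 0.1 (10%)
for len in 9 10 19 20 29 30 36; do
    echo "match length $len: $(allowed_errors "$len" 10) error(s) allowed"
done
```

With a 10% rate this gives 0 errors up to 9 bp, 1 for 10-19 bp, 2 for 20-29 bp and 3 for 30-36 bp, which agrees with the report.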
========================================================
fastqc sample2--cutadapt.fastq
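Besides the FastQC report, a quick way to check the trimming is to count the reads in the input and output files and compare them with the cutadapt summary. This is a minimal sketch that assumes uncompressed FASTQ files with exactly four lines per read; count_reads is a hypothetical helper name.

```bash
# count_reads FILE
# Hypothetical helper: prints the number of reads in an uncompressed
# FASTQ file, assuming exactly four lines per record.
# A non-integer result means the file is truncated or not 4-line FASTQ.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}
```

For example, count_reads trimming/qa/sample2.fastq and count_reads sample2--cutadapt.fastq should agree with "Total reads processed" and "Reads written (passing filters)" in the report, provided the run matches the one shown.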
========================================================
curl -O -J -L https://osf.io/shqpv/download
curl -O -J -L https://osf.io/9m3ch/download
curl -O -J -L https://osf.io/v24pt/download
========================================================
Now we’ll do some trimming!
Scythe uses a Naive Bayesian approach to classify contaminant substrings in sequence reads. It considers quality information, which can make it robust in picking out 3’-end adapters, which often include poor quality bases.
The first thing we need is a file with the adapter sequences to trim off (adapters.fasta in the commands below).
conda install -c faircloth-lab scythe -y
scythe -a adapters.fasta -o SRR957824_adapt_R1.fastq SRR957824_500K_R1.fastq.gz
scythe -a adapters.fasta -o SRR957824_adapt_R2.fastq SRR957824_500K_R2.fastq.gz
========================================================
Most modern sequencing technologies produce reads whose quality deteriorates towards the 3’-end, and sometimes towards the 5’-end as well. Incorrectly called bases in both regions negatively impact assemblies, mapping, and downstream bioinformatics analyses. We will trim each read individually down to its good-quality part, so that the bad part does not interfere with downstream applications.

To do so, we will use sickle. Sickle uses sliding windows together with quality and length thresholds to determine where quality becomes low enough to trim the 3’-end of a read, and where it becomes high enough to trim the 5’-end. It also discards reads that fall below a length threshold.
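The sliding-window idea can be sketched in a few lines of shell and awk. This is a simplified illustration, not sickle's actual algorithm: it handles only the 3’-end, assumes Phred+33 quality strings, and cuts the read at the start of the first window whose mean quality falls below the threshold. The function name trim_len and its arguments are hypothetical.

```bash
# trim_len QUALITY_STRING THRESHOLD WINDOW
# Hypothetical helper: prints how many bases to keep from the 5' end.
# Simplified 3'-end trimming, assuming Phred+33 encoded qualities.
trim_len() {
    printf '%s\n' "$1" | awk -v q="$2" -v w="$3" '
    BEGIN {
        # Lookup table from printable ASCII character to its code
        for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    {
        n = length($0)
        keep = n                      # default: keep the whole read
        for (start = 1; start + w - 1 <= n; start++) {
            sum = 0
            for (j = start; j < start + w; j++)
                sum += ord[substr($0, j, 1)] - 33
            # First window with mean quality below the threshold: cut here
            if (sum / w < q) { keep = start - 1; break }
        }
        print keep
    }'
}

# Eight bases of quality 40 (I) followed by four of quality 2 (#)
trim_len 'IIIIIIII####' 25 4    # prints 6
```

In this example the window starting at base 7 is the first whose mean (21) drops below 25, so only the first 6 bases are kept. A real trimmer such as sickle refines the cut position and also applies the length threshold.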
========================================================
conda install sickle-trim -y
sickle pe -f SRR957824_adapt_R1.fastq -r SRR957824_adapt_R2.fastq \
-t sanger -o SRR957824_trimmed_R1.fastq -p SRR957824_trimmed_R2.fastq \
-s /dev/null -q 25
fastqc SRR957824_trimmed_R1.fastq SRR957824_trimmed_R2.fastq
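The -t sanger option tells sickle that quality characters are Phred+33 encoded, and -q 25 is the quality threshold on that scale. As a small illustration of the encoding, the sketch below converts one quality character to its Phred score; phred33 is a hypothetical helper name.

```bash
# phred33 CHAR
# Hypothetical helper: prints the Phred score of one Phred+33 quality character.
# printf with a leading single quote prints the numeric code of the character.
phred33() {
    echo $(( $(printf '%d' "'$1") - 33 ))
}

phred33 'I'    # prints 40
phred33 ':'    # prints 25, the threshold used with -q 25
```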
========================================================