Tutorial star-fusion

STAR-FUSION

Tutorial with Data


======

First of all we need to connect with the virtual machine. To achieve this we need to use terminal (if you use linux) or PUtty or mobaxterm

Connect with the virtual machine:

      **ssh -X nomeutente49@m49.lsb.biocomp.unibo.it**
X11 forwarding request failed on channel 0
Linux m49.lsb.biocomp.unibo.it 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Sep  4 10:10:54 2017 from 172.16.197.254
um49@m49:~$ 


Then we move on the directory of the tutorial

**cd fusion/tutorial-fusion-transcript-2017/**
um49@m49:~/fusion/tutorial-fusion-transcript-2017$

then we aply ls -ltr on the directory:

**(root) um49@m49:~/fusion/tutorial-fusion-transcript-2017$ ls -ltr**
RAW  README.md  Results  scripts


### Introduction run
---


We need to detect fusion trascript of BT474 cells. Here we use only a small part of fastq. 

The virtual machine dimension not allow to run completely this tutorial. For this reason we use the kick start mode. This mean we ALIGN with STAR and create Chimeric.out.junction.
From this file we star tu use STAR-FUSION to filter fusion. 

This code are only for teach purpose. **Do not execute.** Here I present the way to align data and create Chimeric.out.junction and the run STAR-FUSION.




do NOT run

STAR –genomeDir ref_genome.fa.star.idx/ \ –readFilesIn RAW/BT474-demo_1.fastq.gz RAW/BT474-demo_2.fastq.gz –twopassMode Basic –outReadsUnmapped None –chimSegmentMin 12 –chimJunctionOverhangMin 12 –alignSJDBoverhangMin 10 –alignMatesGapMax 100000 –alignIntronMax 100000
–chimSegmentReadGapMax parameter 3 –alignSJstitchMismatchNmax 5 -1 5 5 –runThreadN 3 –limitBAMsortRAM 31532137230 –outSAMtype BAM SortedByCoordinate –readFilesCommand zcat


Now we can run STAR-FUSION 

do NOT run

STAR-Fusion –genome_lib_dir ctat_genome_lib_build_dir -J Chimeric.out.junction –left_fq RAW/BT474-demo_1.fastq.gz –right_fq RAW/BT474-demo_2.fastq.gz –output_dir Fusion_star_kickstart_BT474


NOW it is your turn:

### Hands On (1): Starting from junction file


Unfortunately is it not possible to run all the steps in our virtual machine but we can simulate kickstart analisys.

If you do not activate the enviroment: please activate:

source /opt/conda/bin/activate

Then  creat the lin with the database of the star-fusion:

export STAR_database=/nfs/fusion/GRCh37_gencode_v19_CTAT_lib_July192017/ctat_genome_lib_build_dir

If you want you can investigate the directory

ls -ltr $STAR_database/


Now move on the **Results/STAR-FUSION** directory

$ cd ~/fusion/tutorial-fusion-transcript-2017/Results/STAR-FUSION/

STAR-Fusion --genome_lib_dir $STAR_database  -J Chimeric.out.junction  --output_dir test_kickstart

This procedure create many files.

**[centos@pol-produ test_kickstart]$ ls**
star-fusion.filter.intermediates_dir                   star-fusion.fusion_candidates.preliminary.wSpliceInfo.ok
star-fusion.fusion_candidates.final                    star-fusion.predict.intermediates_dir
star-fusion.fusion_candidates.final.abridged           star-fusion.STAR-Fusion.filter.ok
star-fusion.fusion_candidates.preliminary              star-fusion.STAR-Fusion.predict.ok
star-fusion.fusion_candidates.preliminary.wSpliceInfo

but in your server you can see this error:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
      LANGUAGE = (unset),
      LC_ALL = (unset),
      LC_MEASUREMENT = "it_IT.UTF-8",
      LC_PAPER = "it_IT.UTF-8",
      LC_MONETARY = "it_IT.UTF-8",
      LC_NAME = "it_IT.UTF-8",
      LC_ADDRESS = "it_IT.UTF-8",
      LC_NUMERIC = "it_IT.UTF-8",
      LC_TELEPHONE = "it_IT.UTF-8",
      LC_IDENTIFICATION = "it_IT.UTF-8",
      LC_TIME = "it_IT.UTF-8",
      LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
CMD: mkdir -p /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/TEST_STAR
* Running CMD: /opt/conda/lib/STAR-Fusion/util/STAR-Fusion.predict  -J /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/Chimeric.out.junction  --genome_lib_dir /nfs/fusion/GRCh37_gencode_v19_CTAT_lib_July192017/ctat_genome_lib_build_dir  --min_junction_reads 1  --min_sum_frags 2  --min_novel_junction_support 3  -O /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/TEST_STAR/star-fusion 
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
      LANGUAGE = (unset),
      LC_ALL = (unset),
      LC_MEASUREMENT = "it_IT.UTF-8",
      LC_PAPER = "it_IT.UTF-8",
      LC_MONETARY = "it_IT.UTF-8",
      LC_NAME = "it_IT.UTF-8",
      LC_ADDRESS = "it_IT.UTF-8",
      LC_NUMERIC = "it_IT.UTF-8",
      LC_TELEPHONE = "it_IT.UTF-8",
      LC_IDENTIFICATION = "it_IT.UTF-8",
      LC_TIME = "it_IT.UTF-8",
      LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
CMD: mkdir -p /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/TEST_STAR/star-fusion.predict.intermediates_dir
-parsing GTF file: /nfs/fusion/GRCh37_gencode_v19_CTAT_lib_July192017/ctat_genome_lib_build_dir/ref_annot.gtf
-building interval tree for fast searching of gene overlaps
-parsing fusion evidence: /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/Chimeric.out.junction
-mapping reads to genes
-outputting fusion candidates to file: /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/TEST_STAR/star-fusion.fusion_candidates.preliminary
* Running CMD: /opt/conda/lib/STAR-Fusion/util/append_breakpoint_junction_info.pl /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/TEST_STAR/star-fusion.fusion_candidates.preliminary /nfs/fusion/GRCh37_gencode_v19_CTAT_lib_July192017/ctat_genome_lib_build_dir > /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/TEST_STAR/star-fusion.fusion_candidates.preliminary.wSpliceInfo
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
      LANGUAGE = (unset),
      LC_ALL = (unset),
      LC_TIME = "it_IT.UTF-8",
      LC_MONETARY = "it_IT.UTF-8",
      LC_ADDRESS = "it_IT.UTF-8",
      LC_TELEPHONE = "it_IT.UTF-8",
      LC_NAME = "it_IT.UTF-8",
      LC_MEASUREMENT = "it_IT.UTF-8",
      LC_IDENTIFICATION = "it_IT.UTF-8",
      LC_NUMERIC = "it_IT.UTF-8",
      LC_PAPER = "it_IT.UTF-8",
      LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
[fai_load] build FASTA index.
[fai_build] fail to write FASTA index /nfs/fusion/GRCh37_gencode_v19_CTAT_lib_July192017/ctat_genome_lib_build_dir/ref_genome.fa.fai
[fai_load] fail to open FASTA index.
Could not load fai index of /nfs/fusion/GRCh37_gencode_v19_CTAT_lib_July192017/ctat_genome_lib_build_dir/ref_genome.fa
Error, command: samtools faidx /nfs/fusion/GRCh37_gencode_v19_CTAT_lib_July192017/ctat_genome_lib_build_dir/ref_genome.fa chr17:38243092-38243108 died with ret 65280 at /opt/conda/lib/STAR-Fusion/util/append_breakpoint_junction_info.pl line 148, <$fh> line 2.
Error, cmd: /opt/conda/lib/STAR-Fusion/util/append_breakpoint_junction_info.pl /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/TEST_STAR/star-fusion.fusion_candidates.preliminary /nfs/fusion/GRCh37_gencode_v19_CTAT_lib_July192017/ctat_genome_lib_build_dir > /home/um49/fusion/tutorial-fusion-transcript-2017/Result/star_fusion/TEST_STAR/star-fusion.fusion_candidates.preliminary.wSpliceInfo died with ret 65280 at /opt/conda/lib/STAR-Fusion/PerlLib/Pipeliner.pm line 79.
      Pipeliner::run(Pipeliner=HASH(0x14ed418)) called at /opt/conda/lib/STAR-Fusion/STAR-Fusion line 289

On the last line you see there is a problem. This error nrmaly is a fucntion of RAM and the fact STAR-FUSION use two step mode for alignment. However explore the result of pre-run sample. Please open results.final Move on test_kickstart/:

cd /home/um49/fusion/tutorial-fusion-transcript-2017/Results/STAR-FUSION/test_kickstart


Explote the directory with ls:

**(root) um49@m49:~/fusion/tutorial-fusion-transcript-2017/Results/STAR-FUSION/test_kickstart$ ls**
star-fusion.STAR-Fusion.predict.ok
star-fusion.fusion_candidates.final
star-fusion.fusion_candidates.preliminary
star-fusion.fusion_candidates.preliminary.wSpliceInfo
star-fusion.predict.intermediates_dir

On star-fusion.fusion_candidates.final are reported all the fusion detect by STAR-FUSION: Look the first 10 reads:

head star-fusion.fusion_candidates.final
#FusionName JunctionReadCount SpanningFragCount SpliceType  LeftGene    LeftBreakpoint    Rig
htGene      RightBreakpoint   JunctionReads     SpanningFrags     LargeAnchorSupport      LeftBreakDinuc    LeftBreakEntropy  RightBreakDinuc   RightBreakEntropy
THRA--AC090627.1  28    93    ONLY_REF_SPLICE   THRA^ENSG00000126351.8  chr17:38243106:+  AC090627.1^
ENSG00000235300.3 chr17:46371709:+  HWI-EAS418:3:95:1749:754,HWI-EAS418:3:54:203:1023,HWI-EAS418:5:35:1
55:1130,HWI-EAS418:3:53:260:35,HWI-EAS418:5:16:64:1068,HWI-EAS418:3:82:1618:163,HWI-EAS418:5:99:101:324,HWI-EAS418:
3:11:325:1032,HWI-EAS418:3:37:126:297,HWI-EAS418:6:93:26:596,HWI-EAS418:3:53:388:255,HWI-EAS418:5:32:704:1206,HWI-E
AS418:6:85:691:681,HWI-EAS418:5:20:1228:1132,HWI-EAS418:3:47:1650:1419,HWI-EAS418:5:14:499:1741,HWI-EAS418:6:3:707:
593,HWI-EAS418:5:45:1561:1639,HWI-EAS418:5:64:1516:1756,HWI-EAS418:5:4:875:1514,HWI-EAS418:5:89:1452:1503,HWI-EAS41
8:6:39:877:1543,HWI-EAS418:3:31:1762:753,HWI-EAS418:5:68:968:1480,HWI-EAS418:3:94:453:1917,HWI-EAS418:5:5:1001:1845
,HWI-EAS418:5:9:173:788,HWI-EAS418:3:57:171:49  HWI-EAS418:3:65:225:1696,HWI-EAS418:5:28:1103:1102,HWI-EAS418:6:24:
1553:1579,HWI-EAS418:3:37:1174:256,HWI-EAS418:3:28:1180:1048,HWI-EAS418:3:41:709:320,HWI-EAS418:3:58:466:261,HWI-EA
S418:3:7:1182:835,HWI-EAS418:3:83:1650:1325,HWI-EAS418:3:46:1299:691,HWI-EAS418:3:84:794:1201,HWI-EAS418:3:94:375:4
67,HWI-EAS418:3:33:1322:82,HWI-EAS418:3:35:702:891,HWI-EAS418:3:63:773:641,HWI-EAS418:3:61:1319:752,HWI-EAS418:3:88
:1179:1598,HWI-EAS418:5:56:1560:1267,HWI-EAS418:6:10:354:1000,HWI-EAS418:6:87:1171:1515,HWI-EAS418:5:41:160:762,HWI
-EAS418:3:39:1768:1665,HWI-EAS418:3:54:1750:468,HWI-EAS418:3:99:1126:249,HWI-EAS418:3:99:499:40,HWI-EAS418:6:12:269
:87,HWI-EAS418:5:10:233:504,HWI-EAS418:3:56:1742:1973,HWI-EAS418:3:56:1058:1165,HWI-EAS418:6:21:1096:711,HWI-EAS418
:3:84:708:580,HWI-EAS418:6:97:184:1273,HWI-EAS418:3:14:1262:1127,HWI-EAS418:5:40:460:482,HWI-EAS418:3:28:923:1862,H
WI-EAS418:3:87:1105:1160,HWI-EAS418:3:57:1619:1427,HWI-EAS418:3:21:1078:1775,HWI-EAS418:5:87:1225:275,HWI-EAS418:3:
52:1017:1325,HWI-EAS418:6:88:553:1917,HWI-EAS418:3:25:1270:1038,HWI-EAS418:6:58:353:619,HWI-EAS418:3:69:612:734,HWI
--More--(8%)


How many fusion STAR-FUSION detect? Use awk for print the first column of the file:

**awk  '{print  $1}' *.final**
#FusionName
THRA--AC090627.1
THRA--AC090627.1
ACACA--STAC2
RPS6KB1--SNF8
VAPB--IKZF3
TOB1--SYNRG
VAPB--IKZF3
VAPB--IKZF3
ZMYND8--CEP250
AHCTF1--NAAA
STX16--RAE1
AHCTF1--NAAA
STX16-NPEPL1--RAE1
RAB22A--MYO9B
MED1--ACSF2
MED13--BCAS3
MED1--STXBP4
STARD3--DOK5
MED13--BCAS3
DIDO1--TTI1
DIDO1--TTI1
TRIM37--MYO19
BRD4--RFX1
TRPC4AP--MRPL45
TRPC4AP--MRPL45
GLB1--CMTM7
PIP4K2B--RAD51C
ASTN2--RP11-281A20.1


Hands On: Complete process on Cloud

Detect fusion on simulated rnaseq data. This step require at least 40G of RAM.This procedure was conducted on our server. Here I wrote the code only for teaching purpose. If we have time I’ll show you how works.

#Not run
STAR-Fusion --genome_lib_dir reference/GRCh37_gencode_v19_CTAT_lib_July192017/
--left_fq RAW/reads_50x_R1.fq.gz
--right_fq RAW/reads_50x_R2.fq.gz
--output_dir analysis/star-fusion/Sim_50x/

This step we perform on our server. This is the results of the real run:

Please check the result:

Aligned.sortedByCoord.out.bam         star-fusion.fusion_candidates.final
Chimeric.out.bam                      star-fusion.fusion_candidates.final.abridged
Chimeric.out.junction                 star-fusion.fusion_candidates.final.abridged.FFPM
Chimeric.out.sam                      star-fusion.fusion_candidates.final.abridged.FFPM.ok
star-fusion.fusion_candidates.preliminary.wSpliceInfo.ok
igv.log                               star-fusion.predict.intermediates_dir
Log.final.out                         star-fusion.STAR-Fusion.filter.ok
Log.out                               star-fusion.STAR-Fusion.predict.ok
Log.progress.out                      _STARgenome
SJ.out.tab                            star.ok
star-fusion.filter.intermediates_dir  _STARpass1

Here you have only a small part of the ouput. This are present in your directory of the ouptut of STAR-FUSION results. Move on this directory:

cd ~/fusion/tutorial-fusion-transcript-2017/Results/STAR-FUSION/Sim_50x/

Then look the directory using ls:

**ls -ltr**
total 1748
-rw-r--r-- 1 um49 student    6854 Sep  1 08:48 star-fusion.fusion_candidates.final.abridged.FFPM
-rw-r--r-- 1 um49 student  714032 Sep  1 08:48 star-fusion.fusion_candidates.final
-rw-r--r-- 1 um49 student       0 Sep  1 08:48 star.ok
-rw-r--r-- 1 um49 student 1062022 Sep  1 08:48 star-fusion.fusion_candidates.preliminary

Please open star-fusion.fusion_candidates.final and with the help of manual try to understand the meaning of the columns. manual

Please open the fusion file and extract the fusion genes names.

(fusion) [centos@pol-produ Sim_50x]$ awk  '{print  $1}' *.final
#FusionName
THOC5--TNFRSF25
SUPT4H1--RAC3
ZNF691--C5orf45
DRD2--OR4D11
FUS--DDIT3
YARS2--UBASH3B
DCTN3--KIDINS220
COCH--IDE
SUPV3L1--PTPRN2
PAX3--FOXO1
FAT1--IL18BP
ETV6--NTRK3
EFCAB7--TRAF3IP3
DBF4B--ESYT3
EIF4G2--PRSS35
ZNF311--ABCC1
RORA--MTERFD3
EFCAB11--HNRNPF
KIAA1549--BRAF
EMID1--EEF1D
SLC6A19--DEAF1
PSAT1--RBBP4
BCLAF1--ABCA10
ST3GAL5--NCKAP1L
SLMAP--RFPL3
ASB4--HMOX1
ALDH7A1--TBC1D26
ALDH7A1--ZNF286A
ZNF628--ZC3H8
MT-CO3--MT-ATP6
ALDH7A1--ZNF286A
MT-CO3--MT-ATP6
MT-ND4--MT-ND4L
MT-ND1--MT-RNR2
MT-ND1--MT-TL1
MTND4P12--MT-ND1
RP11-444D3.1--SOX5
RP11-96H19.1--RP11-446N19.1
CRIM1--FEZ2
NCOR2--UBC


Hands On (3)

Please extract the reads of this fusion PAX3–FOXO1 coordinate.

How to see the junction files of the fusion??? Here I insert all the code we nee to run for create the bam file of the chimeric reads. DO NOT RUN this.

$ samtools view -bS Chimeric.out.sam > Chimeric.out.bam 
$ samtools sort Chimeric.out.bam Chimeric_sort
$ samtools index Chimeric_sort.bam
$ igvtools count Chimeric_sort.bam Chimeric_sort.bam.tdf ref_genome.fa

To view the alignments supporting the fusion in IGV, click File->Open URL and copy/paste the following URL into the text box for PAX3

http://www.scienziofilia.altervista.org/dati/Chimeric_sort.bam

if you don’t have IGV please install in your comuter link Now you can search some spanning reads or encompassing reads.