STAR-Fusion configuration

STAR-Fusion
===========

Introduction

STAR-Fusion is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. It then processes the output generated by STAR to map junction reads and spanning reads to a reference annotation set. The program is very fast, but it needs at least 40G RAM. Since the paper describing the tool was published, the authors have developed other important modules to annotate, inspect and validate fusions; we don’t have time to cover these features in this tutorial.

Installation

There are many ways to install STAR-Fusion: you can use a conda recipe or install it locally. The latest version of STAR-Fusion has some interesting new features and improves the ability to filter fusions. To install locally, clone the repository together with its submodules:

 git clone --recursive https://github.com/STAR-Fusion/STAR-Fusion.git

Tools Required:

STAR https://github.com/alexdobin/STAR
Possibly some non-standard Perl modules (see below).

   A typical Perl module installation starts the CPAN shell and then installs each module at the cpan> prompt:
   perl -MCPAN -e shell
   cpan> install DB_File
   cpan> install URI::Escape
   cpan> install Set::IntervalTree
   cpan> install Carp::Assert
   cpan> install JSON::XS

Computing / Hardware requirements

If you’re planning to run STAR to align reads to the human genome, then you’ll need ~30G RAM. If you’ve already run STAR and are just planning on running STAR-Fusion given the existing STAR outputs, then modest resources are required and it should run on any commodity hardware.
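A quick way to compare a Linux machine's memory with the ~30G mentioned above is to read MemTotal from /proc/meminfo. This is a sketch under stated assumptions: it is Linux-only, has_enough_ram is a hypothetical helper name, and MemTotal is usually a little lower than the installed RAM because the kernel reserves some memory.

```bash
# Hypothetical helper: compare total RAM with a requirement given in GB.
# Linux only: reads /proc/meminfo unless another meminfo-style file is given.
has_enough_ram() {
    required_gb=$1
    meminfo=${2:-/proc/meminfo}
    # The MemTotal line looks like "MemTotal:       65842348 kB".
    total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    if [ -z "$total_kb" ]; then
        echo "could not read MemTotal from $meminfo" >&2
        return 2
    fi
    # kB -> GB (binary units, rounded down).
    total_gb=$((total_kb / 1024 / 1024))
    echo "total RAM: ${total_gb}G (required: ${required_gb}G)"
    [ "$total_gb" -ge "$required_gb" ]
}
```

For example, has_enough_ram 30 || echo "not enough RAM to run STAR on the human genome here" prints the detected total and warns when the machine falls short.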

Data Resources Required:

A reference genome and corresponding protein-coding gene annotation set, including blast-matching gene pairs, must be provided to STAR-Fusion. We provide several alternative resources for human fusion transcript detection depending on whether you want to use GRCh37 or GRCh38 reference human genomes and corresponding Gencode annotation sets. Options are available here: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/; choose one, and below we refer to it as ‘CTAT_resource_lib.tar.gz’. The gene annotations in each case are restricted to the protein-coding and lincRNA transcripts.

If you’re looking to apply STAR-Fusion using a different target, you’ll need to generate the required resources as described by our FusionFilter resource builder. FusionFilter comes included in the STAR-Fusion software.

Preparing the genome resource lib

If you downloaded the large (30G) ‘plug-n-play’ resource lib, just extract the .tar.gz archive and use it directly.
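After extracting, STAR-Fusion needs the path of the genome lib directory inside the bundle (the /path/to/your/CTAT_resource_lib placeholder in the commands of this tutorial). The sketch below assumes that this directory is the one containing the prebuilt STAR index folder ref_genome.fa.star.idx; verify this against the README of the bundle you downloaded. find_genome_lib_dir is a hypothetical helper.

```bash
# Hypothetical helper: locate the directory to pass as --genome_lib_dir.
# ASSUMPTION: it is the directory holding the prebuilt STAR index folder
# 'ref_genome.fa.star.idx' (check the README shipped with your bundle).
find_genome_lib_dir() {
    search_root=$1
    star_idx=$(find "$search_root" -type d -name 'ref_genome.fa.star.idx' | head -n 1)
    if [ -z "$star_idx" ]; then
        echo "no ref_genome.fa.star.idx directory under $search_root" >&2
        return 1
    fi
    dirname "$star_idx"
}
```

Typical use is to run tar -xzf CTAT_resource_lib.tar.gz and then GENOME_LIB_DIR=$(find_genome_lib_dir .), passing "$GENOME_LIB_DIR" to --genome_lib_dir.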

Ways to run STAR-Fusion

STAR-Fusion can run STAR on paired-end FASTQ files and call fusions in a single step. Use the --output_dir argument to set where the output is written (if you run STAR yourself, the corresponding STAR option is --outFileNamePrefix).

Given paired-end FASTQ files, run STAR-Fusion like so:

STAR-Fusion --genome_lib_dir /path/to/your/CTAT_resource_lib \
             --left_fq reads_1.fq \
             --right_fq reads_2.fq \
             --output_dir star_fusion_outdir

If you have single-end FASTQ files, just use the --left_fq parameter:

STAR-Fusion --genome_lib_dir /path/to/your/CTAT_resource_lib \
             --left_fq reads_1.fq \
             --output_dir star_fusion_outdir
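Since the paired-end and single-end invocations differ only in --right_fq, a small wrapper can cover both cases. This is an illustrative sketch: run_star_fusion and the DRY_RUN variable are hypothetical, and only the STAR-Fusion options shown above are used. By default it prints the command instead of running it.

```bash
# Hypothetical helper: build the STAR-Fusion command for paired-end input
# (two FASTQs) or single-end input (leave the third argument empty).
# Prints the command by default; set DRY_RUN=0 to execute it.
run_star_fusion() {
    genome_lib_dir=$1
    left_fq=$2
    right_fq=${3:-}
    out_dir=${4:-star_fusion_outdir}

    # Reuse the positional parameters as a safely quoted argument list.
    set -- STAR-Fusion --genome_lib_dir "$genome_lib_dir" --left_fq "$left_fq"
    if [ -n "$right_fq" ]; then
        set -- "$@" --right_fq "$right_fq"
    fi
    set -- "$@" --output_dir "$out_dir"

    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, run_star_fusion /path/to/your/CTAT_resource_lib reads_1.fq reads_2.fq prints the paired-end command, and dropping reads_2.fq gives the single-end one. Building the argument list with set -- keeps paths that contain spaces intact when the command is actually executed.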

Other programs are also able to detect fusions from single-end reads. One of the main issues is read length: reads need to be relatively long (e.g. at least 100 bp), otherwise you will be underpowered to detect fusion transcripts.
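To check whether your reads meet the ~100 bp guideline, you can summarise read lengths over a sample of records. A minimal sketch (fastq_read_length_stats is a hypothetical name), assuming standard four-line FASTQ records and gzip compression detected from a .gz suffix:

```bash
# Hypothetical helper: summarise read lengths over the first N records
# (default 10000) of a FASTQ file, gzipped or not.
# Each FASTQ record spans 4 lines; line 2 of each record is the sequence.
fastq_read_length_stats() {
    fq=$1
    n_reads=${2:-10000}
    if [ "${fq%.gz}" != "$fq" ]; then gzip -dc "$fq"; else cat "$fq"; fi |
        head -n $((n_reads * 4)) |
        awk 'NR % 4 == 2 {
                 len = length($0); n++; sum += len
                 if (n == 1 || len < min) min = len
                 if (len > max) max = len
             }
             END { if (n) printf "reads=%d mean=%.1f min=%d max=%d\n", n, sum / n, min, max }'
}
```

Running fastq_read_length_stats reads_1.fq prints the number of reads sampled and their mean, minimum and maximum length. Because only the first records are read, treat the result as an estimate for the whole file.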

Kickstart mode: run STAR yourself, then run STAR-Fusion using the existing outputs

It’s not always the case that you want to have STAR-Fusion run STAR directly, as you may have already run STAR earlier on, or prefer to run STAR separately to use the outputs in other processes such as for expression estimates or variant detection. Parameters that we recommend for running STAR as part of STAR-Fusion are as follows:

# Recommended STAR settings for use with STAR-Fusion (kickstart mode).
# If your FASTQ files are gzipped, also add: --readFilesCommand zcat
# --limitBAMsortRAM is given in bytes (31532137230 bytes is about 31.5 GB).
STAR --genomeDir "${star_index_dir}" \
      --readFilesIn "${left_fq_filename}" "${right_fq_filename}" \
      --twopassMode Basic \
      --outReadsUnmapped None \
      --chimSegmentMin 12 \
      --chimJunctionOverhangMin 12 \
      --alignSJDBoverhangMin 10 \
      --alignMatesGapMax 100000 \
      --alignIntronMax 100000 \
      --chimSegmentReadGapMax 3 \
      --alignSJstitchMismatchNmax 5 -1 5 5 \
      --runThreadN "${THREAD_COUNT}" \
      --limitBAMsortRAM 31532137230 \
      --outSAMtype BAM SortedByCoordinate

This will (in part) generate a file called ‘Chimeric.out.junction’, which is used by STAR-Fusion like so:

STAR-Fusion --genome_lib_dir /path/to/your/CTAT_resource_lib \
             -J Chimeric.out.junction \
             --left_fq ${left_fq_filename} \
             --right_fq ${right_fq_filename} \
             --output_dir star_fusion_outdir
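Before starting STAR-Fusion in kickstart mode, you may want to confirm that STAR actually wrote chimeric records. The sketch below counts them. It assumes a tab-delimited Chimeric.out.junction in which, depending on the STAR version and options, a header line starting with chr_donorA and lines starting with '#' may appear; both are skipped. count_chimeric_junctions is a hypothetical helper.

```bash
# Hypothetical helper: count chimeric junction records in a STAR
# Chimeric.out.junction file.
# ASSUMPTION: a possible header line (first column 'chr_donorA') and
# '#' comment lines are not records; empty lines are ignored too.
count_chimeric_junctions() {
    awk -F '\t' '$0 !~ /^#/ && $1 != "chr_donorA" && NF > 0 { n++ }
                 END { print n + 0 }' "$1"
}
```

For example, [ "$(count_chimeric_junctions Chimeric.out.junction)" -gt 0 ] || echo "no chimeric junctions found" >&2 warns before STAR-Fusion is launched on an empty file.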

FusionInspector and FusionAnnotator are very important for filtering the fusion predictions.
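FusionInspector and FusionAnnotator do the real validation and annotation work. For a quick first look at the raw predictions, a simple threshold on a support column can also help; this is a separate and much cruder technique than those tools. The sketch assumes a tab-delimited predictions file whose first line is a header, for example star_fusion_outdir/star-fusion.fusion_predictions.abridged.tsv with columns such as JunctionReadCount, SpanningFragCount and FFPM; check the file names and columns your STAR-Fusion version produces. filter_fusions is a hypothetical helper.

```bash
# Hypothetical helper: print the header plus every prediction whose numeric
# column COLUMN is >= MIN. The column is looked up by header name, so the
# order of columns does not matter.
# ASSUMPTION: tab-delimited file whose first line is the header.
filter_fusions() {
    file=$1
    column=$2
    min=$3
    awk -F '\t' -v column="$column" -v min="$min" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == column) col = i
            if (!col) { print "column not found: " column > "/dev/stderr"; exit 1 }
            print
            next
        }
        $col + 0 >= min + 0
    ' "$file"
}
```

For example, filter_fusions star_fusion_outdir/star-fusion.fusion_predictions.abridged.tsv FFPM 0.1 keeps the header and the rows with an FFPM of at least 0.1. The cut-off is only an illustration, not a recommended threshold.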

Training Course on Fusion transcript detection