Introduction

Linux is the best-known and most-used open source operating system. As an operating system, Linux is software that sits underneath all of the other software on a computer, receiving requests from those programs and relaying these requests to the computer’s hardware.

GNU/Linux is a free and open-source operating system developed by thousands of contributors and led by Linus Torvalds since its beginning in 1991. Linux shells (commonly Bash) allow users to execute more than 200 commands and to write pipelines in the shell scripting language to automate tasks. Linux is widely used in research and on supercomputers: more than 96% of supercomputers use Linux (http://www.top500.org/statistics/list). It is an essential tool for bioinformatics and for big data analysis and research.

How does Linux differ from other operating systems?

In many ways, Linux is similar to other operating systems you may have used before, such as Windows, OS X, or iOS. Like other operating systems, Linux has a graphical interface, and types of software you are accustomed to using on other operating systems, such as word processing applications, have Linux equivalents. In many cases, the software’s creator may have made a Linux version of the same program you use on other systems. If you can use a computer or other electronic device, you can use Linux.

But Linux also is different from other operating systems in many important ways. First, and perhaps most importantly, Linux is open source software. The code used to create Linux is free and available to the public to view, edit, and—for users with the appropriate skills—to contribute to.

Linux is also different in that, although the core pieces of the Linux operating system are generally common, there are many distributions of Linux, which include different software options. This means that Linux is incredibly customizable, because not just applications, such as word processors and web browsers, can be swapped out. Linux users also can choose core components, such as which system displays graphics, and other user-interface components.

Prepare and Store the data

We often forget how science and engineering function. Ideas come from previous exploration more often than from lightning strokes. —John W. Tukey

Just as a well-organized laboratory makes a scientist’s life easier, a well-organized and well-documented project makes a bioinformatician’s life easier. Regardless of the particular project you’re working on, your project directory should be laid out in a consistent and understandable fashion. Clear project organization makes it easier for both you and collaborators to figure out exactly where and what everything is. Additionally, it’s much easier to automate tasks when files are organized and clearly named. For example, processing 300 gene sequences stored in separate FASTA files with a script is trivial if these files are organized in a single directory and are consistently named.
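
As a minimal sketch of that last point, the loop below counts the records in every FASTA file of one directory. The function name `count_fasta_records`, the `.fasta` extension, and the file names in the demo are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Count the sequences in every FASTA file of a single directory.
# Assumption: the files end in .fasta and all live in the same directory.
count_fasta_records() {
    local dir="$1"
    local f
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        # Each FASTA record starts with a ">" header line
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
    done
}

# Demo on a throwaway directory with two tiny, hypothetical files
demo_dir="$(mktemp -d)"
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$demo_dir/gene_001.fasta"
printf '>seq1\nTTAA\n' > "$demo_dir/gene_002.fasta"
count_fasta_records "$demo_dir"
```

Because the files share one directory and one naming pattern, a single glob reaches all of them, whether there are two files or 300.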

[explain my command](https://explainshell.com/)

All files and directories used in your project should live in a single project directory with a clear name. During the course of a project, you’ll have amassed data files, notes, scripts, and so on. If these were scattered all over your hard drive (or worse, across many computers’ hard drives), it would be a nightmare to keep track of everything. Even worse, such a disordered project would later make your research nearly impossible to reproduce. In addition to having a well-organized directory structure, your bioinformatics project also should be well documented. Poor documentation can lead to irreproducibility and serious errors.

Document your methods and workflows
Document the origin of all data in your project directory
Document when you downloaded data
Record data version information
Describe how you downloaded the data
Document the versions of the software that you ran
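
One possible way to follow this checklist is to append a small provenance record to a README each time a data file is added. The sketch below is only an illustration: the function name `record_provenance`, the record layout, and the example URL are assumptions, and it relies on `sha256sum` from GNU coreutils being available.

```shell
#!/usr/bin/env bash
# Append a provenance record for one data file to a README.
# Arguments (hypothetical): data file, source URL, README path
record_provenance() {
    local file="$1" url="$2" readme="$3"
    {
        printf 'File: %s\n' "$(basename "$file")"
        printf 'Source: %s\n' "$url"
        printf 'Downloaded: %s\n' "$(date +%Y-%m-%d)"
        # The checksum pins down which version of the data was used
        printf 'SHA-256: %s\n' "$(sha256sum "$file" | cut -d ' ' -f 1)"
        printf '\n'
    } >> "$readme"
}

# Demo on a throwaway file; the URL is a placeholder, nothing is downloaded
demo_dir="$(mktemp -d)"
printf 'example data\n' > "$demo_dir/sample.txt"
record_provenance "$demo_dir/sample.txt" "https://example.org/sample.txt" "$demo_dir/README"
cat "$demo_dir/README"
```

Software versions can be appended to the same README in a similar way, for example with `bash --version | head -n 1 >> README`.
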
```
$ echo dog-{gone,bowl,bark}
dog-gone dog-bowl dog-bark
```
Create Project

	$ mkdir -p ERIKA/{RAW,Scripts,quality}
	$ tree ERIKA/
			ERIKA/
			├── quality
			├── RAW
			└── Scripts

	$ ls Analisi_12march2018/

The Unix shell is the foundational computing environment for bioinformatics. The shell serves as our interface to large bioinformatics programs, as an interactive console to inspect data and intermediate results, and as the infrastructure for our pipelines and workflows. This chapter will help you develop a proficiency with the necessary Unix shell concepts used extensively throughout the rest of the book. This will allow you to focus on the content of commands in future chapters, rather than be preoccupied with understanding shell syntax.

We learned the basics of the Unix shell: using streams, redirecting output, pipes, and working with processes. These core concepts not only allow us to use the shell to run command-line bioinformatics tools, but also to leverage Unix as a modular work environment for working with bioinformatics data. In this chapter, we’ll see how we can combine the Unix shell with command-line data tools to explore and manipulate data quickly.
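
As a brief reminder of how streams behave, the sketch below separates standard output from standard error. The `report` function and the file names are hypothetical stand-ins for a real tool and its outputs.

```shell
#!/usr/bin/env bash
# A hypothetical step that writes a result to stdout and a message to stderr
report() {
    echo "result line"
    echo "warning: something to log" >&2
}

work="$(mktemp -d)"

# stdout goes to results.txt, stderr goes to run.log
report > "$work/results.txt" 2> "$work/run.log"

# A pipe passes only stdout to the next program; stderr is discarded here
report 2> /dev/null | wc -l
```

Keeping results and log messages in separate streams means a downstream program in a pipe only ever sees the data.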

pwd	tell you where you are
ls	list the content of the current directory
ls DIR	list the content of the directory DIR
cd DIR	go to the specified directory DIR
cd ~ (or cd)	go to your home directory
cd ..	go to the parent directory
tree	list the content of a directory in a tree-like format
mkdir	create the specified directory

1.2. View the content of a file

less, more	view text with paging
head	print first lines of a file
tail	print last lines of a file
cat	print the content of a file to the screen
zcat	print the content of a gzip compressed file to the screen

1.3. File manipulations

rm	remove file
cp file1 file2	copy file1 to file2
mv file1 file2	rename file1 to file2

1.4. Some other useful commands

find DIR/ -type f	recursively find all files in the folder DIR
find . -name '*PATTERN*'	recursively find anything whose name contains PATTERN in the current folder (single quotes must be used in order to avoid wildcard expansion by the shell)
grep	show lines of text containing a given pattern
grep -v	show lines of text not containing a given pattern
sort	sort lines of text files
wc	count words, lines and characters
`>` (output redirection)	redirect the output to a file
`|` (pipe)	send the output from one program to another
cut	extract selected portion of each line from one or more files
echo	input a line of text and display it on standard output
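
Several of the commands above can be chained into a single pipeline. The sketch below is one way to do it, assuming a small tab-separated file with a gene name, a chromosome, and a count per line. The file layout, the file names, and the function name `chromosomes_sorted` are all hypothetical.

```shell
#!/usr/bin/env bash
# Print the sorted chromosome column (column 2) of a tab-separated file,
# ignoring comment lines that start with "#".
chromosomes_sorted() {
    grep -v '^#' "$1" | cut -f 2 | sort
}

# Demo on a throwaway file
work="$(mktemp -d)"
printf '# gene\tchrom\tcount\n' > "$work/genes.tsv"
printf 'geneB\tchr2\t0\ngeneA\tchr1\t10\ngeneC\tchr1\t7\n' >> "$work/genes.tsv"

chromosomes_sorted "$work/genes.tsv"

# Count the chr1 entries and redirect the result to a file
chromosomes_sorted "$work/genes.tsv" | grep 'chr1' | wc -l > "$work/chr1_count.txt"
cat "$work/chr1_count.txt"
```

Each program does one small job, and the pipe connects them without any intermediate files.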

1.5. AWK programming

AWK is a text-processing programming language available on UNIX systems. It is a fast and stable tool for processing text files.

awk '/www/ { print $0 }'	print each line of the file that contains the pattern www
awk '$3 == "www"'	print each line whose third column is exactly www
awk 'length($0) > 80'	print every line in the file that is longer than 80 characters
awk 'NR % 2 == 0'	print even-numbered lines of the file

1.5.1. Some built-in variables

NR	number of records read so far (the current line number)
NF	number of fields in the current record
FS	Field separator character
OFS	Output field separator character

See www.grymoire.com/Unix/Awk.html and www.tutorialspoint.com/awk/awk_basic_examples.htm for more information
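
The built-in variables are easiest to see in a small example. The sketch below assumes a tab-separated file with a gene name, a chromosome, and a count per line. The file contents and the function name `high_count_genes` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the line number, gene name and count for rows whose third column
# is greater than 5. FS and OFS set the input and output separators to a tab.
high_count_genes() {
    awk 'BEGIN { FS = "\t"; OFS = "\t" } $3 > 5 { print NR, $1, $3 }' "$1"
}

# Demo on a throwaway file
work="$(mktemp -d)"
printf 'geneA\tchr1\t10\ngeneB\tchr2\t0\ngeneC\tchr1\t7\n' > "$work/genes.tsv"

high_count_genes "$work/genes.tsv"

# NF gives the number of fields on each line (3 for every line of this file)
awk -F '\t' '{ print NF }' "$work/genes.tsv"
```

Setting `FS` and `OFS` in a `BEGIN` block keeps the separators explicit, which matters when fields may contain spaces.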

Writing and editing files

2.1. GNU nano

TRY

Follow the first tutorial TRY ME

COMMAND LINES

USE THE PROGRAMS' HELP AND DOCUMENTATION

	$ source activate rnaseq
	$ fastqc -h

RNA-seq Analysis Course
