Introduction

Linux is the best-known and most-used open source operating system. As an operating system, Linux is software that sits underneath all of the other software on a computer, receiving requests from those programs and relaying these requests to the computer’s hardware.

  • GNU/Linux is a free and open-source operating system developed by thousands of contributors and led by Linus Torvalds since its beginning in 1991.
  • Linux shells (commonly Bash) allow users to run more than 200 commands and to write pipelines in the shell scripting language to automate tasks.
  • Linux is widely used in research and on supercomputers: more than 96% of supercomputers run Linux (http://www.top500.org/statistics/list).
  • It is an essential tool for bioinformatics, big data analysis and research.

How does Linux differ from other operating systems?

In many ways, Linux is similar to other operating systems you may have used before, such as Windows, OS X, or iOS. Like other operating systems, Linux has a graphical interface, and types of software you are accustomed to using on other operating systems, such as word processing applications, have Linux equivalents. In many cases, the software’s creator may have made a Linux version of the same program you use on other systems. If you can use a computer or other electronic device, you can use Linux.

But Linux is also different from other operating systems in several important ways. First, and perhaps most importantly, Linux is open-source software: the code used to create Linux is free and available to the public to view, edit and, for users with the appropriate skills, contribute to.

Linux is also different in that, although the core pieces of the Linux operating system are generally common, there are many distributions of Linux, which include different software options. This makes Linux highly customizable: not only can applications such as word processors and web browsers be swapped out, but users can also choose core components, such as the system that displays graphics and other user-interface components.

Prepare and store the data

We often forget how science and engineering function. Ideas come from previous exploration more often than from lightning strokes. —John W. Tukey

Just as a well-organized laboratory makes a scientist’s life easier, a well-organized and well-documented project makes a bioinformatician’s life easier. Regardless of the particular project you’re working on, your project directory should be laid out in a consistent and understandable fashion. Clear project organization makes it easier for both you and collaborators to figure out exactly where and what everything is. Additionally, it’s much easier to automate tasks when files are organized and clearly named. For example, processing 300 gene sequences stored in separate FASTA files with a script is trivial if these files are organized in a single directory and are consistently named.
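As a minimal sketch of that last point, assume a hypothetical directory `seqs/` whose files all follow the pattern `gene_<number>.fasta`. A single `for` loop can then process every file in the same way; here it only counts the sequences (header lines starting with `>`) in each file. The three toy files stand in for real data.

```shell
# Work in a throwaway directory so the example does not touch real files
cd "$(mktemp -d)" || exit 1

# Create three toy FASTA files whose names follow one hypothetical pattern
mkdir -p seqs
for i in 001 002 003; do
    printf '>gene_%s\nATGCGTACGTTAG\n' "$i" > "seqs/gene_${i}.fasta"
done

# Because all files live in one directory and share a naming pattern,
# one glob selects every file, whether there are 3 or 300
for fasta in seqs/gene_*.fasta; do
    n=$(grep -c '^>' "$fasta")   # header lines start with '>': one per sequence
    printf '%s\t%s\n' "$(basename "$fasta" .fasta)" "$n"
done > sequence_counts.tsv

cat sequence_counts.tsv
```

The same loop works unchanged for 300 files, which is exactly why consistent naming pays off.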

Tip: paste any command into [explainshell](https://explainshell.com/) to see what each of its parts does.

All files and directories used in your project should live in a single project directory with a clear name. During the course of a project, you’ll amass data files, notes, scripts, and so on; if these were scattered all over your hard drive (or worse, across many computers’ hard drives), it would be a nightmare to keep track of everything. Even worse, such a disordered project would later make your research nearly impossible to reproduce. In addition to having a well-organized directory structure, your bioinformatics project should also be well documented. Poor documentation can lead to irreproducibility and serious errors.

  • Document your methods and workflows
  • Document the origin of all data in your project directory
  • Document when you downloaded data
  • Record data version information
  • Describe how you downloaded the data
  • Document the versions of the software that you ran
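One lightweight way to follow these rules is to append a dated record to a plain-text README in the project directory each time you fetch data. The sketch below assumes a hypothetical file name `README.txt`, a placeholder source URL and a hypothetical data version label; `bash --version` stands in for whatever programs your project actually uses.

```shell
# Work in a throwaway directory for this demonstration
cd "$(mktemp -d)" || exit 1

readme=README.txt
source_url="https://example.org/data/reads.fastq.gz"   # placeholder URL, not a real dataset
data_version="v1"                                      # hypothetical release label

# Append one record per download; { ... } >> sends every echo line to the README
{
    echo "== Data download =="
    echo "Date: $(date +%Y-%m-%d)"                  # when the data were downloaded
    echo "Source: ${source_url}"                    # where the data came from
    echo "Data version: ${data_version}"            # which release of the data
    echo "Method: wget ${source_url}"               # how they were fetched (recorded, not run here)
    echo "Software: $(bash --version | head -n 1)"  # version of each tool you ran
} >> "$readme"

cat "$readme"
```

Using `>>` instead of `>` keeps earlier records, so the README grows into a log of the project's history.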
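
Brace expansion lets the shell generate several similar words from a single pattern, which is handy for creating a whole project layout in one command:

	$ echo dog-{gone,bowl,bark}
		dog-gone dog-bowl dog-bark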

Create Project

	$ mkdir -p Analisi_12march2018/{RAW,Scripts,quality}   # -p also creates Analisi_12march2018/ itself
	$ tree Analisi_12march2018/
			Analisi_12march2018/
			├── quality
			├── RAW
			└── Scripts

	$ ls Analisi_12march2018/
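Brace expansions can also be nested, and command substitution can stamp the creation date into the directory name so you never type it by hand. The layout below (`data/raw`, `data/processed`, `scripts`, `results`) and the `project_` prefix are only an example; adapt them to your own conventions.

```shell
# Work in a throwaway directory for this demonstration
cd "$(mktemp -d)" || exit 1

# $(date +%Y%m%d) expands to today's date, e.g. project_20180312
project="project_$(date +%Y%m%d)"

# -p creates parent directories as needed and does not fail if they exist;
# the nested braces expand to data/raw, data/processed, scripts and results
mkdir -p "$project"/{data/{raw,processed},scripts,results}

# List every directory that was created, sorted for readability
find "$project" -type d | sort
```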

The Unix shell is the foundational computing environment for bioinformatics. The shell serves as our interface to large bioinformatics programs, as an interactive console to inspect data and intermediate results, and as the infrastructure for our pipelines and workflows. This chapter will help you develop a proficiency with the necessary Unix shell concepts used extensively throughout the rest of the book. This will allow you to focus on the content of commands in future chapters, rather than be preoccupied with understanding shell syntax.

We learned the basics of the Unix shell: using streams, redirecting output, pipes, and working with processes. These core concepts allow us not only to use the shell to run command-line bioinformatics tools, but also to leverage Unix as a modular work environment for working with bioinformatics data. In this chapter, we’ll see how we can combine the Unix shell with command-line data tools to explore and manipulate data quickly.
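A small sketch of these ideas, using only standard tools on a hypothetical file of gene names: `>` redirects standard output to a file, `2>` redirects standard error, `|` feeds one program's output into the next program's input, and `&` starts a process in the background while `wait` pauses until it finishes.

```shell
# Work in a throwaway directory for this demonstration
cd "$(mktemp -d)" || exit 1

# Write a toy list of gene names to a file (standard output redirected with >)
printf 'BRCA1\nTP53\nBRCA1\nEGFR\nTP53\nBRCA1\n' > genes.txt

# Pipeline: sort the names, count identical adjacent lines, rank by count (highest first)
sort genes.txt | uniq -c | sort -k1,1nr > gene_counts.txt
cat gene_counts.txt

# Standard error is a separate stream: the error from ls goes to errors.log,
# not to the screen and not to standard output
ls missing_file.txt 2> errors.log || echo "ls failed; see errors.log"

# Run a process in the background with &, then wait for it to finish
sleep 1 &
wait
```

`uniq -c` only collapses adjacent duplicates, which is why the list is sorted before counting.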

pwd                       tell you where you are (print the working directory)
ls                        list the content of the current directory
ls <directory>            list the content of a directory
cd <directory>            go to the specified directory
cd ~ (or cd)              go to your home directory
cd ..                     go to the parent directory
tree <directory>          list the content of a directory in a tree-like format
mkdir <directory>         create the specified directory

1.2. View the content of a file

less, more                view text one screen at a time (paging)
head                      print the first lines of a file
tail                      print the last lines of a file
cat                       print the content of a file to the screen
zcat                      print the content of a gzip-compressed file to the screen

1.3. File manipulations

rm file                   remove a file
cp file1 file2            copy file1 to file2
mv file1 file2            rename (move) file1 to file2

1.4. Some other useful commands

find <folder> -type f     recursively find all files in a specific folder
find . -name '*text*'     recursively find anything in the current folder whose name contains "text" (the single quotes stop the shell from expanding the wildcard before find sees it)
grep pattern file         show lines of text containing a given pattern
grep -v pattern file      show lines of text not containing a given pattern
sort                      sort lines of text files
wc                        count lines, words and characters
>                         (output redirection) redirect the output to a file
|                         (pipe) send the output of one program to the input of another
cut                       extract selected portions (e.g. columns) of each line from one or more files
echo                      display a line of text on standard output
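The sketch below strings several of these commands together on a small, made-up tab-separated file. The file name `genes.tsv` and its `score` column are hypothetical; only the chromosome assignments are real.

```shell
# Work in a throwaway directory for this demonstration
cd "$(mktemp -d)" || exit 1
mkdir -p practice && cd practice || exit 1
pwd                                     # print the current directory

# Build a toy tab-separated file: gene, chromosome, made-up score
printf 'gene\tchrom\tscore\n' > genes.tsv
printf 'BRCA1\tchr17\t0.9\nTP53\tchr17\t0.4\nEGFR\tchr7\t0.7\n' >> genes.tsv

head -n 2 genes.tsv                     # first two lines (header + one record)
tail -n 1 genes.tsv                     # last line
wc -l genes.tsv                         # number of lines, header included

# Keep the chr17 lines, extract column 1 (gene name), sort, save to a file
grep 'chr17' genes.tsv | cut -f 1 | sort > chr17_genes.txt
cat chr17_genes.txt

grep -v 'chr17' genes.tsv               # lines NOT mentioning chr17 (header and EGFR)

cp genes.tsv genes_backup.tsv           # copy
mv genes_backup.tsv genes_copy.tsv      # rename
find . -name '*genes*'                  # every entry whose name contains "genes"
rm genes_copy.tsv                       # remove
```

`cut -f` splits on tabs by default, which is why the toy file is tab-separated.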

1.5. AWK programming

AWK is a text-processing programming language available on Unix systems: a fast and stable tool for processing text files.

awk '/www/ { print $0 }' file      print every line of the file that contains the pattern www
awk '$3 == "www"' file             print lines whose third column is exactly www
awk 'length($0) > 80' file         print every line in the file that is longer than 80 characters
awk 'NR % 2 == 0' file             print the even-numbered lines of the file

1.5.1. Some built-in variables

NR     number of the current record (the line number, by default)
NF     number of fields in the current record
FS     input field separator (whitespace by default; set it with -F)
OFS    output field separator (a single space by default)
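A short illustration of these patterns and built-in variables on a hypothetical comma-separated file: `-F ','` sets `FS`, `OFS` controls how `print` joins fields, `NR` is the current line number and `NF` the number of fields on the current line. The sample names and read counts are made up.

```shell
# Work in a throwaway directory for this demonstration
cd "$(mktemp -d)" || exit 1

# Toy comma-separated file: sample,condition,reads
printf 'sample,condition,reads\nS1,control,1200\nS2,treated,950\nS3,treated,1430\n' > samples.csv

# Skip the header (NR > 1), keep rows whose 2nd field is exactly "treated",
# and print columns 1 and 3 separated by a tab (OFS)
awk -F ',' 'BEGIN { OFS = "\t" } NR > 1 && $2 == "treated" { print $1, $3 }' samples.csv

# Report the line number and the number of fields of every line
awk -F ',' '{ print NR ": " NF " fields" }' samples.csv

# Sum the third column over all data lines; END runs after the last line
awk -F ',' 'NR > 1 { total += $3 } END { print "total reads:", total }' samples.csv
```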

See www.grymoire.com/Unix/Awk.html and www.tutorialspoint.com/awk/awk_basic_examples.htm for more information.

2. Writing and editing files

2.1. GNU nano

TRY

Follow the first tutorial TRY ME

COMMAND LINES

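USE THE PROGRAMS' HELP AND DOCUMENTATION

  source activate NGSBASE   # activate the conda environment NGSBASE (newer conda versions: conda activate NGSBASE)
  fastqc -h                 # print the FastQC help message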