Linux is the best-known and most-used open source operating system. As an operating system, Linux is software that sits underneath all of the other software on a computer, receiving requests from those programs and relaying these requests to the computer’s hardware.
In many ways, Linux is similar to other operating systems you may have used before, such as Windows, OS X, or iOS. Like other operating systems, Linux has a graphical interface, and types of software you are accustomed to using on other operating systems, such as word processing applications, have Linux equivalents. In many cases, the software’s creator may have made a Linux version of the same program you use on other systems. If you can use a computer or other electronic device, you can use Linux.
But Linux also is different from other operating systems in many important ways. First, and perhaps most importantly, Linux is open source software. The code used to create Linux is free and available to the public to view, edit, and—for users with the appropriate skills—to contribute to.
Linux is also different in that, although the core pieces of the Linux operating system are generally common, there are many distributions of Linux, which include different software options. This makes Linux highly customizable, because not only applications, such as word processors and web browsers, can be swapped out. Users can also choose core components, such as which system displays graphics, and other user-interface components.
We often forget how science and engineering function. Ideas come from previous exploration more often than from lightning strokes. —John W. Tukey
Just as a well-organized laboratory makes a scientist’s life easier, a well-organized and well-documented project makes a bioinformatician’s life easier. Regardless of the particular project you’re working on, your project directory should be laid out in a consistent and understandable fashion. Clear project organization makes it easier for both you and collaborators to figure out exactly where and what everything is. Additionally, it’s much easier to automate tasks when files are organized and clearly named. For example, processing 300 gene sequences stored in separate FASTA files with a script is trivial if these files are organized in a single directory and are consistently named.
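To make the FASTA example concrete, here is a minimal sketch of looping over consistently named files. The directory layout, the `gene_N.fasta` naming scheme, and the toy sequences are all assumptions made for illustration. It runs in a scratch directory so it touches no real data.

```shell
# Hypothetical layout: a scratch directory stands in for a real project.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/seqs"

# Create three tiny example FASTA files (made-up sequences).
for i in 1 2 3; do
    printf '>gene%s\nACGT\n' "$i" > "$workdir/data/seqs/gene_$i.fasta"
done

# Because the names are consistent, a single glob selects every file.
for f in "$workdir"/data/seqs/gene_*.fasta; do
    # grep -c '^>' counts header lines, i.e. sequences in the file.
    n=$(grep -c '^>' "$f")
    echo "$(basename "$f"): $n"
done > "$workdir/counts.txt"

cat "$workdir/counts.txt"
```

The same loop would work unchanged for 300 files, which is the practical payoff of a consistent naming scheme.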
[Explain my command](https://explainshell.com/)
All files and directories used in your project should live in a single project directory with a clear name. During the course of a project, you’ll amass data files, notes, scripts, and so on. If these were scattered all over your hard drive (or worse, across many computers’ hard drives), it would be a nightmare to keep track of everything. Even worse, such a disordered project would later make your research nearly impossible to reproduce. In addition to having a well-organized directory structure, your bioinformatics project should also be well documented. Poor documentation can lead to irreproducibility and serious errors.
$ echo dog-{gone,bowl,bark}
dog-gone dog-bowl dog-bark
Create Project
$ mkdir -p Analisi_12march2018/{RAW,Scripts,quality}
$ tree Analisi_12march2018/
Analisi_12march2018/
├── quality
├── RAW
└── Scripts
$ ls Analisi_12march2018/
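The layout above can also be created in a way that does not rely on brace expansion, which is a feature of bash and similar shells rather than of plain `sh`. The sketch below does this with a loop and then adds a short README, since the project should be documented as well as organized. The README wording and the descriptions of each directory are assumptions for illustration only.

```shell
# Work under a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
project="$scratch/Analisi_12march2018"

# Portable equivalent of the brace expansion: one mkdir -p per subdirectory.
# -p creates missing parents and does not fail if the directory exists.
for d in RAW Scripts quality; do
    mkdir -p "$project/$d"
done

# Record what each directory is for (descriptions are illustrative).
cat > "$project/README" <<EOF
Project: Analisi_12march2018
Created: $(date +%Y-%m-%d)
RAW/      raw input data
Scripts/  analysis scripts
quality/  quality-control output
EOF

ls "$project"
```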
The Unix shell is the foundational computing environment for bioinformatics. The shell serves as our interface to large bioinformatics programs, as an interactive console to inspect data and intermediate results, and as the infrastructure for our pipelines and workflows. This chapter will help you develop proficiency with the necessary Unix shell concepts used extensively throughout the rest of the book. This will allow you to focus on the content of commands in future chapters, rather than be preoccupied with understanding shell syntax.
We learned the basics of the Unix shell: using streams, redirecting output, pipes, and working with processes. These core concepts not only allow us to use the shell to run command-line bioinformatics tools, but also let us leverage Unix as a modular work environment for working with bioinformatics data. In this chapter, we’ll see how we can combine the Unix shell with command-line data tools to explore and manipulate data quickly.
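As a small illustration of combining pipes and redirection with command-line data tools, the sketch below builds a made-up tab-separated file and summarizes it. The file name, its columns, and its contents are assumptions. The commands themselves (`grep`, `cut`, `sort`, `uniq`, `ls`) are standard Unix tools.

```shell
tmpdir=$(mktemp -d)

# Made-up tab-separated annotation: chromosome, start, feature type.
printf '# example annotation\nchr1\t100\tgene\nchr2\t150\texon\nchr1\t300\texon\nchr3\t50\tgene\n' \
    > "$tmpdir/features.tsv"

# Pipe: drop the comment line, keep column 1, sort, count each value.
# The final > redirects standard output to a file.
grep -v '^#' "$tmpdir/features.tsv" | cut -f 1 | sort | uniq -c > "$tmpdir/per_chrom.txt"

# Command substitution captures a program's standard output in a variable.
n_exons=$(grep -c 'exon' "$tmpdir/features.tsv")

# Standard output and standard error are separate streams:
# > captures the first, 2> captures the second.
ls "$tmpdir/missing_file" > "$tmpdir/out.log" 2> "$tmpdir/err.log" || true

cat "$tmpdir/per_chrom.txt"
echo "exon lines: $n_exons"
```

Each tool does one small job, and the pipe connects them without writing intermediate files.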
| Command | Description |
|---|---|
| pwd | tell you where you are |
| ls | list the content of the current directory |
| ls | list the content of a directory |
| cd | go to the specified directory |
| cd ~ (or cd) | go to your home directory |
| cd .. | go to the parent directory |
| tree | list the content of a directory in a tree-like format |
| mkdir | create the specified directory |

1.2. View the content of a file

| Command | Description |
|---|---|
| less, more | view text with paging |
| head | print first lines of a file |
| tail | print last lines of a file |
| cat | print the content of a file to the screen |
| zcat | print the content of a gzip compressed file to the screen |

1.3. File manipulations

| Command | Description |
|---|---|
| rm | remove file |
| cp | copy file1 to file2 |
| mv | rename file1 to file2 |

1.4. Some other useful commands

| Command | Description |
|---|---|
| find | recursively find all files in a specific folder |
| find . -name ' | recursively find anything whose name contains |
| grep | show lines of text containing a given pattern |
| grep -v | show lines of text not containing a given pattern |
| sort | sort lines of text files |
| wc | count words, lines and characters |
| > (output redirection) | redirect the output to a file |
| \| (pipe) | send the output from one program to another |
| cut | extract selected portion of each line from one or more files |
| echo | input a line of text and display it on standard output |
| awk '/www/ { print $0 }' | search for the pattern www in each line of the file |
| awk '$3=="www"' | search for the exact match of www in the third column of the file |
| awk 'length($0) > 80' | print every line in the file that is longer than 80 characters |
| awk 'NR % 2 == 0' | print even-numbered lines of the file |

1.5.1. Some built-in variables

| Variable | Description |
|---|---|
| NR | Number of records |
| NF | Number of fields |
| FS | Field separator character |
| OFS | Output field separator character |
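
The built-in variables are easiest to understand in use. The following is a minimal sketch on a made-up three-column, tab-separated file. The file name and its contents are assumptions for illustration.

```shell
awkdir=$(mktemp -d)

# Made-up table: three tab-separated columns per line.
printf 'chr1\t100\twww\nchr2\t200\tabc\nchr3\t300\twww\n' > "$awkdir/table.tsv"

# FS sets the input separator and OFS the output separator.
# For each line, NR is the record (line) number and NF the field count.
awk 'BEGIN { FS = "\t"; OFS = "," } { print NR, NF, $1 }' \
    "$awkdir/table.tsv" > "$awkdir/summary.csv"

# -F is the command-line way to set FS; keep rows whose third column is www.
awk -F '\t' '$3 == "www"' "$awkdir/table.tsv" > "$awkdir/www_rows.tsv"

cat "$awkdir/summary.csv"
```

With this input, the first command writes one comma-separated line per record (line number, number of fields, first column), and the second keeps only the two rows whose third column is exactly `www`.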
See www.grymoire.com/Unix/Awk.html and www.tutorialspoint.com/awk/awk_basic_examples.htm for more information.
Follow the first tutorial TRY ME
USE HELP OF THE PROGRAMS AND DOCUMENTATION
source activate NGSBASE
fastqc -h
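
A program can only print its help if it is actually available in the current environment. The sketch below checks for that first. `show_help` is a hypothetical helper written for this example, not part of any tool. The help flag is passed explicitly because it varies between programs (`fastqc -h` above, `--help` for many others).

```shell
# show_help PROGRAM FLAG: print the first lines of a program's help text,
# or a hint if the program is not on PATH. Hypothetical helper.
show_help() {
    prog=$1
    flag=$2
    # command -v succeeds only if the shell can find the program.
    if command -v "$prog" > /dev/null 2>&1; then
        # Some tools print help on standard error, so merge both streams.
        "$prog" "$flag" 2>&1 | head -n 5
    else
        echo "$prog: not found on PATH (is the right environment active?)"
    fi
}

show_help ls --help
show_help fastqc -h
```

If `fastqc` is reported as not found, activating the environment first (as in `source activate NGSBASE` above) is the likely fix.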