###Processing Text Streams in Linux
The difference between `>` (redirection operator) and `|` (pipeline operator) is that the first connects a command with a file, while the second connects the output of one command to the input of another.

Operator | Command line | Description |
---|---|---|
`cat` | `cat mytext.txt` | Print the content of the file `mytext.txt` to standard output. |
`>` | `cat > mytext.txt` | Redirect the standard output of `cat` to the file `mytext.txt`, overwriting it. With no input file, `cat` reads from the keyboard, so everything you type in the console (until Ctrl+D) ends up in the text file. |
`>>` | `cat mytext.txt >> mytext2.txt` | Append the content of the file `mytext.txt` to the end of the file `mytext2.txt`. |
`<` | `cat < mytext.txt` | The counterpart of `>`. Redirect standard input so that `cat` reads from `mytext.txt` instead of the keyboard. |
`|` | `man sudo | grep login` | Connect the standard output of one command to the standard input of another. |
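
The operators in the table can be tried safely with a minimal sketch like the one below. The file names `mytext.txt` and `mytext2.txt` come from the table; the sample lines and the `workdir`, `line_count`, and `matches` variables are made up for illustration.

```shell
#!/bin/sh
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)

# '>' creates (or overwrites) the file with the command's standard output.
printf 'first line\nsecond line\n' > "$workdir/mytext.txt"

# '>>' appends to the file instead of overwriting it.
printf 'old content\n' > "$workdir/mytext2.txt"
cat "$workdir/mytext.txt" >> "$workdir/mytext2.txt"

# '<' feeds the file to the command's standard input.
line_count=$(wc -l < "$workdir/mytext2.txt")

# '|' feeds one command's standard output to the next command's standard input.
matches=$(cat "$workdir/mytext2.txt" | grep -c 'line')

echo "lines: $line_count, lines containing 'line': $matches"

# Remove the temporary directory.
rm -rf "$workdir"
```

Because `>>` appends, `mytext2.txt` keeps its original line and gains the two lines from `mytext.txt`; replacing `>>` with `>` would leave only those two lines.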
###`sed` Command
`sed` is a stream editor used to perform basic text transformations on an input stream (a file or input from a pipeline).

Command line | Description |
---|---|
`sed 's/term/replacement/flag' file` | Basic syntax |
`sed 's/y/Y/g' ahappychild.txt > ahappychild2.txt` | Replace all occurrences of _y_ with _Y_ in the file `ahappychild.txt` and write the result to the file `ahappychild2.txt`. |
`sed 's/and/\&/g;s/^I/You/g' ahappychild.txt` | Replace every _and_ with a literal _&_, and an _I_ at the beginning of a line with _You_. The special characters **/**, **\\**, and **&** must be escaped with a backslash (**\\**) to be taken literally. Use **;** to chain a second command. **^** (caret sign) matches the beginning of the line. |
`sed -n '/^Jun 8/ p' /var/log/messages | sed -n 1,5p` | The **-n** option suppresses the default output of `sed`, so only the lines explicitly printed (indicated by **p**) appear: lines with **Jun 8** at the beginning in the first command, and lines **1** through **5** (inclusive) of that result in the second. |
`sed '/^#\|^$/d' apache2.conf` | Delete (**d**) blank lines and lines starting with **#**. The `\|` sequence indicates a boolean OR between the two regular expressions; in basic regular expressions this is a GNU `sed` extension. |
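
The substitutions and deletions above can be checked on a few lines of inline text, as in this rough sketch. The `sample` text is hypothetical (the real `ahappychild.txt` is not reproduced here), and the comment and blank-line filter is written as two `d` commands, a swapped-in equivalent that does not rely on the `\|` alternation.

```shell
#!/bin/sh
# Hypothetical sample input; not the content of the real ahappychild.txt.
sample='I sing and you dance
# a comment line

I read and you write'

# Chain two substitutions with ';': "and" -> literal "&", leading "I" -> "You".
replaced=$(printf '%s\n' "$sample" | sed 's/and/\&/g;s/^I/You/')

# Delete comment lines and blank lines with two separate 'd' commands.
cleaned=$(printf '%s\n' "$sample" | sed '/^#/d;/^$/d')

# With -n, only lines explicitly printed with 'p' are shown: here lines 1-2.
first_two=$(printf '%s\n' "$sample" | sed -n '1,2p')

printf '%s\n---\n%s\n---\n%s\n' "$replaced" "$cleaned" "$first_two"
```

Note that none of these commands modify their input; as in the table, the result has to be redirected to a file to be kept.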
###`sort` and `uniq` Commands
The `uniq` command allows us to report or remove duplicate lines in a file. Note that `uniq` does not detect repeated lines unless they are adjacent. Thus, `uniq` is commonly used along with a preceding `sort` (which sorts the lines of text files). By default, `sort` compares entire lines. To sort on a specific key field (fields are separated by whitespace unless `-t` specifies another separator), we need to use the `-k` option.

Command line | Description |
---|---|
`du -sch /var/* | sort -h` | The `du -sch /path/to/directory/*` command returns the disk space usage of each subdirectory and file within the specified directory (`-s`), plus a grand total (`-c`), in human-readable format (`-h`). `sort -h` then orders the lines by those human-readable sizes. |
`cat /var/log/mail.log | uniq -c -w 6` | Count the number of events in a log by date: `-w 6` tells `uniq` to compare only the first **6** characters of each line (where the date is specified), and `-c` prefixes each output line with the number of occurrences. |
`cat sortuniq.txt | cut -d: -f1 | sort | uniq` | Cut the first field (fields are delimited by a colon), sort by name, and remove duplicate lines. |
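
The adjacency requirement of `uniq` and the `-k` key option are easier to see on a small data set. The following is a sketch under stated assumptions: the colon-delimited `records` (name:city) are invented for illustration and are not the content of the real `sortuniq.txt`.

```shell
#!/bin/sh
# Hypothetical colon-delimited records (name:city); not the real sortuniq.txt.
records='carol:Lima
alice:Oslo
carol:Rome
bob:Lima
alice:Oslo'

# uniq alone only collapses ADJACENT duplicates, so nothing is removed here.
unsorted_count=$(printf '%s\n' "$records" | cut -d: -f1 | uniq | wc -l)

# Sorting first makes equal names adjacent, so uniq can collapse them.
names=$(printf '%s\n' "$records" | cut -d: -f1 | sort | uniq)

# -c prefixes each distinct name with its number of occurrences.
counts=$(printf '%s\n' "$records" | cut -d: -f1 | sort | uniq -c)

# -t sets the field separator and -k2 sorts on the second field (the city).
by_city=$(printf '%s\n' "$records" | sort -t: -k2)

printf '%s\n' "$names" "$counts" "$by_city"
```

Without the preceding `sort`, all five names survive `uniq` because no two equal names sit on consecutive lines.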
###`grep` Command
`grep` searches text files (or command output) for occurrences of a specified regular expression and prints every line containing a match to standard output.

Command line | Description |
---|---|
`grep -i alucard /etc/passwd` | Display the information from **/etc/passwd** for user alucard, ignoring case. |
`ls -l /etc | grep 'rc[0-9]'` | Show all the contents of **/etc** whose listing line contains **rc** followed by any single digit. The pattern is quoted so the shell cannot expand `[0-9]` as a filename glob. |
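
Both rows can be reproduced without touching system files. In this minimal sketch the passwd-style `accounts` lines and the list of file names are hypothetical stand-ins for **/etc/passwd** and the contents of **/etc**.

```shell
#!/bin/sh
# Hypothetical passwd-style lines; the user names and paths are made up.
accounts='root:x:0:0:root:/root:/bin/bash
Alucard:x:1000:1000::/home/alucard:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# -i ignores case, so the pattern "alucard" also matches "Alucard".
user_line=$(printf '%s\n' "$accounts" | grep -i 'alucard')

# Quote the pattern so the shell cannot expand [0-9] as a filename glob.
rc_names=$(printf '%s\n' rc.local rc3.d rcS.d hosts | grep 'rc[0-9]')

printf '%s\n%s\n' "$user_line" "$rc_names"
```

Only `rc3.d` passes the second filter, because `rc.local` and `rcS.d` have no digit right after `rc`.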

###`tr` Command
`tr` translates, squeezes, or deletes characters read from standard input.

Command line | Description |
---|---|
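`cat sortuniq.txt | tr '[:lower:]' '[:upper:]'` | Convert all lowercase letters in the **sortuniq.txt** file to uppercase. The file itself is not modified; the result goes to standard output. |
`ls -l | tr -s ' '` | Squeeze each run of repeated spaces in the output of `ls -l` into a single space. |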
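
A quick way to see both uses is to feed `tr` a short literal string. This is only an illustrative sketch; the input strings and the `upper` and `squeezed` variables are made up.

```shell
#!/bin/sh
# Quote the character classes so the shell does not treat them as globs.
upper=$(printf 'alice:oslo\n' | tr '[:lower:]' '[:upper:]')

# -s squeezes every run of the listed character (a space) into a single one.
squeezed=$(printf 'a   b      c\n' | tr -s ' ')

printf '%s\n%s\n' "$upper" "$squeezed"
```

Squeezing matters mostly as a preparation step: tools that split on a single delimiter character see one field boundary per run of spaces only after the runs have been collapsed.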

###`cut` Command
`cut` extracts selected fields or character ranges from each line of its input.

Command line | Description |
---|---|
`cat /etc/passwd | cut -d: -f1,7` | Extract the user accounts and the default shells assigned to them from **/etc/passwd** (the `-d` option specifies the field delimiter, and the `-f` switch indicates which fields will be extracted). |
`last | grep alucard | tr -s ' ' | cut -d' ' -f1,3 | sort -k2 | uniq` | Summing up, we create a text stream consisting of the first and third non-blank fields of the output of the `last` command. We use `grep` as a first filter to keep the sessions of user **alucard**, then squeeze the delimiters to a single space (`tr -s ' '`). Next, we extract the first and third fields with `cut`, and finally `sort` by the second field (IP addresses in this case) and keep unique lines with `uniq`. |
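
The final pipeline can be rehearsed on canned input before running it against the real `last` output. The sketch below assumes lines shaped roughly like that output; the user names, terminals, and addresses (taken from the 192.0.2.0/24 documentation range) are all made up.

```shell
#!/bin/sh
# Hypothetical lines shaped like the output of `last`; all values are made up.
sessions='alucard  pts/0        192.0.2.10   Mon Jun  8 10:00   still logged in
alucard  pts/1        192.0.2.5    Mon Jun  8 09:00 - 09:30  (00:30)
root     tty1                      Mon Jun  8 08:00 - 08:10  (00:10)
alucard  pts/2        192.0.2.10   Sun Jun  7 21:00 - 22:00  (01:00)'

# 1. grep keeps only the sessions of user alucard.
# 2. tr -s squeezes runs of spaces so cut sees single-space delimiters.
# 3. cut keeps the first field, the user, and the third, the address.
# 4. sort -k2 orders by the address, which makes duplicates adjacent.
# 5. uniq drops the repeated lines.
summary=$(printf '%s\n' "$sessions" |
    grep alucard |
    tr -s ' ' |
    cut -d' ' -f1,3 |
    sort -k2 |
    uniq)

printf '%s\n' "$summary"
```

Skipping the `tr -s ' '` step would make `cut -d' '` count every single space as a delimiter, so the third field would be empty instead of the address.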