sed & awk Tutorial

Sep 27, 2020

Credits - These are my notes from Richard Bohm’s Udemy Class.

SED

SED is a stream editor.

Structure :

sed OPTIONS … [SCRIPT] [INPUTFILE….]
cat [INPUTFILE] | sed OPTIONS … [SCRIPT]
SCRIPT:
- [addr]X[options]
  - [addr] - can be a single line, number, a regular expression, or range of lines. If [addr] is specified, the command X will be executed only on the matched lines.
  - X - single-letter sed command.
  - Additional [options] for sed command.
- Eg: sed '30,35d' infile.txt > outfile.txt
  - Delete range of lines Line 30 - line 35 from infile and save output to outfile.txt
  - d - is the delete command
Sed by default does not alter the inputfile, only prints.
Sed accepts one single command at a time. To use multiple commands use -e option
3 ways to specify multiple instructions:
- Using ; . Eg : sed 's/abc/replace/; s/pqr/replace2/' inputfile
- Using the precedence ‘-e’ . Eg: sed 's/abc/replace/' -e 's/pqr/replace2/' inputfile
- Multiline entry:
```
       $ sed '
       > s/Red/Ray/
       > s/Dev/Development/
       > ' emp.txt
```

Example file to be used throughout : emp.txt

    cat emp.txt

    output:
    Name Age Unit
    Ant  23  IT
    Bec  25  HT
    Red  34  CEO
    Dua  32  FIN
    Wui  29  PR
    Wui  29  PR
    Van  27  Dev
    Kim  26  Dev

Sed Commands :

a - Append text after a line.

eg: Append after IT some text : sed '/IT/a QWE ## XY' emp.txt

      output:
      Name Age Unit
      Ant  23  IT
      QWE  ## XY
      Bec  25  HT
      Red  34  CEO
      Dua  32  FIN
      Wui  29  PR
      Van  27  Dev
      Kim  26  Dev

i 'text' - insert text before a line

eg: Insert before IT some text : sed '/IT/i QWE ## XY' emp.txt

      output:
      Name Age Unit
      QWE  ## XY
      Ant  23  IT
      Bec  25  HT
      Red  34  CEO
      Dua  32  FIN
      Wui  29  PR
      Van  27  Dev
      Kim  26  Dev

d - delete the pattern

eg: Delete all lines containing ‘Dev’ : sed '/Dev/d' emp.txt

  output:
  Name Age Unit
  Ant  23  IT
  Bec  25  HT
  Red  34  CEO
  Dua  32  FIN
  Wui  29  PR

p - print the pattern.
- eg: sed -n '/Dev/p' emp.txt ‘-n’ is used to explicitly disable print all, as sed prints all by default.
```
  output:
  Van  27  Dev
  Kim  26  Dev
```
- eg: “Print 2nd line” sed -n '2p' emp.txt
```
  output:
  Ant  23  IT
```
- eg: “Print from line 2 to 5” sed -n '2,5p' emp.txt
```
  output:
  Ant  23  IT
  Bec  25  HT
  Red  34  CEO
  Dua  32  FIN
```

c - Change command used to change lines.

eg: Change all lines with ‘Dev’ to XXXXXX… - sed '/Dev/c xxxxxxxxxxxxxxxxxxxxxxx' emp.txt

   output:
  Name Age Unit
  Ant  23  IT
  Bec  25  HT
  Red  34  CEO
  Dua  32  FIN
  Wui  29  PR
  xxxxxxxxxxxxxxxxxxxxxxx
  xxxxxxxxxxxxxxxxxxxxxxx

q[exit-code] - exit sed without processing any more commands or input.
- eg: sed '/Bec/q2' emp.txt will quit once the search pattern matches and will receive exit status code 2. echo $? gives an output of 2
```
  output:
  Name Age Unit
  Ant  23  IT
  Bec  25  HT
```
s/regexp/replacement/[flags] - (substitute) Match the regex against the content of the pattern space. If found replace matched string with ‘replacement’. Use g to substitute globally.
- Eg: echo "1110000111" | sed 's/000/XXX/' –> output: 111XXX0111
- Eg: echo "The quick brown fox , trots away ... " | sed 's/[[:space:]]/#/g'. This will replaces all spaces globally. Output : The#quick#brown#fox##,#trots#away#####...##
- Eg: echo "The quick brown fox , trots away ... " | sed 's/[[:space:]]/#/ . Without the global flag (g) will simply replace the first instance. –> output: The#quick brown fox , trots away ...
- Eg: Replace specific lines in a file. sed '8s/Dev/Manager/' emp.txt –> Replace ‘Dev’ with ‘Manager’ on the 8th line.
```
  output:
  Name Age Unit
  Ant  23  IT
  Bec  25  HT
  Red  34  CEO
  Dua  32  FIN
  Wui  29  PR
  Van  27  Dev
  Kim  26  Manager
```
- Eg: Replace all lines we use range 1,$. Use command sed '1,$s/e/EE/' emp.txt to replace all instances of ‘e’ with ‘EE’.
```
  output:
  NamEE Age Unit
  Ant  23  IT
  BEEc  25  HT
  REEd  34  CEO
  Dua  32  FIN
  Wui  29  PR
  Van  27  DEEv
  Kim  26  DEEv
```

Command-Line- Options:

-n - disable automatic printing; sed produced output when explicitlytold via the p command.
-e script - add script
- eg: Search some text and quit with exit code 2 : sed -ne '/CEO/p' -ne '/CEO/q2' emp.txt and echo $? will return 2.
```
  output:
  Red  34  CEO
```
- eg: Print “000- before/after -000” before and after lines containing ‘Dua’ : sed -e '/Dua/a 000- after -000' -e '/Dua/i 000- before -000' emp.txt
```
  output:
  Name Age Unit
  Ant  23  IT
  Bec  25  HT
  Red  34  CEO
  000- before -000
  Dua  32  FIN
  000- after -000
  Wui  29  PR
  Van  27  Dev
  Kim  26  Dev
```
-r - use extended regular expressions rather than basic regular expressions.
-i - Use this flag to modify the input file.
- We take a copy of emp.txt as emp2.txt and perform the following : sed -ni '/Wui/p' emp2.txt
- cat emp2.txt :
```
  output:
  Wui  29  PR
```

e To run scripts. This is different than -e

sed '/Name/e echo -n "Date:"; date' emp.txt - Run and print date before ‘Name’.

  output:
  Date:Sat Aug 29 08:00:34 IST 2020
  Name Age Unit
  Ant  23  IT
  Bec  25  HT
  Red  34  CEO
  Dua  32  FIN
  Wui  29  PR
  Van  27  Dev
  Kim  26  Dev

sed '1 e echo -n "Date:"; date' emp.txt - Print date on the first line.

  output:
  Date:Sat Aug 29 08:05:41 IST 2020
  Name Age Unit
  Ant  23  IT
  Bec  25  HT
  Red  34  CEO
  Dua  32  FIN
  Wui  29  PR
  Van  27  Dev
  Kim  26  Dev

sed '1,$ e echo -n "Date:"; date' emp.txt - Print date before every line.

  output:
  Date:Sat Aug 29 08:16:09 IST 2020
  Name Age Unit
  Date:Sat Aug 29 08:16:09 IST 2020
  Ant  23  IT
  Date:Sat Aug 29 08:16:09 IST 2020
  Bec  25  HT
  Date:Sat Aug 29 08:16:09 IST 2020
  Red  34  CEO
  Date:Sat Aug 29 08:16:09 IST 2020
  Dua  32  FIN
  Date:Sat Aug 29 08:16:09 IST 2020
  Wui  29  PR
  Date:Sat Aug 29 08:16:09 IST 2020
  Van  27  Dev
  Date:Sat Aug 29 08:16:09 IST 2020
  Kim  26  Dev
  Date:Sat Aug 29 08:16:09 IST 2020

Sed Commands :

Its not advised to have long sed scripts on commndline, a better option is to create a script file.

Syntax: sed -f scriptfile inputfile. Where scriptfile contains the sed instructions.

AWK

Searches files for patterns and performs actions specified in the AWK body.

Structure :

awk'program_to_perform_action' file1 file2 ...
Divided into 3 sections BEGIN, Main & END
BEGIN - Code specified here is executed before executing the operations on the file.
Main - Executed for ech line of the file.

END - After awk process of all lines.

 awk 'BEGIN{
     code_in_BEGIN_section}
 {Code_in_Main_Body}
 END{
     code_END_Section }' file1 file2 ...

Example:

    echo "one two three" | awk'BEGIN{begin_code}{main_code}END{end_code}'

Example 2:

    echo "one two three" | awk'{main_code}'

[NOTE] In AWK body, Bash features DO NOT WORK. AWK has its own syntax.

Hello World: awk 'BEGIN{print "Hello world !"}'
To print “Hello World !” on each line after Enter awk '{print "Hello world !"}'
Example: echo "This is one line" | awk 'BEGIN{print "start"}{print "OK"}END{print "stop"}' The output is as shown below. OK is printed once as there is just one line.
```
  Output:
  start
  OK
  stop
```
Example (lets input a file hello.txt which has 3 lines): cat hello.txt | awk 'BEGIN{print "start"}{print "OK"}END{print "stop"}' OR awk 'BEGIN{print "start"}{print "OK"}END{print "stop"}' hello.txt
```
  output:
  start
  OK
  OK
  OK
  stop
```

Fields :

Fields are by default seperated by space.
$0 prints entire line
$1 prints the first field and so on ..
Examples:
- echo "1 2 3 4 5" | awk '{print $0}' will output 1 2 3 4 5
- echo "1 2 3 4 5" | awk '{print $1}' will output 1
- echo "1 2 3 4 5" | awk '{print $3}' will output 3
Examples with a File (emp.txt)

To print all columns in the file awk '{print $0}' emp.txt

Output:

 Name Age Unit
  Ant  23  IT
  Bec  25  HR
  Red  34  CEO
  Dua  32  FIN

awk '{print $1}' emp.txt
```
  Name
  Ant
  Bec
  Red
  Dua
```
awk '{print $2}' emp.txt
```
  Age
  23
  25
  34
  32
```
awk '{print $3}' emp.txt
```
  Unit
  IT
  HR
  CEO
  FIN
```

awk '{print $1,$3}' emp.txt

  Name Unit
  Ant IT
  Bec HR
  Red CEO
  Dua FIN

Search Patterns :

Awk performs operations line by line. Search pattern is defined between ‘//’. AWK uses the default Regular Expressions

eg 1 : awk ' /CEO/ {print $1,$3}' emp.txt

output:
    Red CEO

NF - Number of Fields :

echo "1 two 3 four" | awk '{print NF}' Output : 4
echo "1 two 3 four" | awk '{print $(NF-2)}' Output : two.
$NF = 4 and using this we could do mathemetical operations. Eg. To print the second last field we could ‘{print $(NF-1)}’

NR - Number of Records :

Records in Awk are by default seperated by a newline.

Eg 1: echo "1 two 3 four" | awk '{print $(NR)}' . Output : 1

Eg 2: awk '{print NR}' emp.txt

As awk processess line by line, for each line it prints the nos. of records found.

To print exact records from a file we could use END. Eg: awk 'END{print NR}' emp.txt. Output : 6

FS - Field Seperator :

Default is space. We can define custom values for the field seperator.

Eg: echo "102 202 303" | awk 'BEGIN{FS="0"} {print $1"-"$2"-"$3"-"$4}' . Output of the above command : 1-2 2-2 3-3 . We used 0 as FS.

RS - Record Seperator :

By default seperated by newline. Can devine custom values for RS (record seperator).

Eg: echo "102 202 303" | awk 'BEGIN{RS="0"} END{print NR}' . Output is 4.
Now the default RS would return 1. as there is only one line. Eg. echo "102 202 303" | awk 'END{print NR}' Output : 1

AWK Variables :

Assignment :
- a=1
- RS="\t"
- FS=":"
Increment/ Decrement :
- a++ / a--
- a=a+1 / a=a-1
Math Operations :
- a=b+c add
- a=b*c multiply
- a=b/c divide
- a=b-c subtract
- a=b%c Modulus
- a=b^c Raise var to the power
- a=b**c Raise var to the power

AWK - if Statement

    if(condition){
            command(s)
    }
    else{
        command(s)    
    }

Comparisons:
- == , <, <=, >, >=, !=
- eg: awk '{if ($1 == "Red"){print $1, $2, $3}}' emp.txt. Output would be : Red 34 CEO.
- OR if we want to print all fields we could also use $0 instead. Eg: awk '{if ($1 == "Red"){print $0}}' emp.txt Output : Red 34 CEO.

AWK For loops

Structure:

for (initialization; condition; increment){
        command(s)
}

Eg: awk 'BEGIN{for(i=1; i<=3; i++){print "test -", i}}' Output:
```
  test - 1
  test - 2
  test - 3
```

AWK Script files :

Structure : awk -f awkscript inputfiles

Jovin Lobo

Infosec . DevSecOps

I specialize in web-security and have currently started exploring the amazing world of DevSecOps and Automation.