Geeks With Blogs
Rahul Anand's Blog If my mind can conceive it, and my heart can believe it, I know I can achieve it.

Linux and Unix provide many small text processing tools that can be chained together with pipes to process data and extract information from files.

Some of these are listed below, followed by a few commonly used examples:

  1. head -- display first lines of a file
  2. tail -- display the last part of a file
  3. cat -- concatenate and print files
  4. less -- paged output, with support for moving forward and backward
  5. cut -- cut out selected portions of each line of a file
  6. sort -- sort lines of text files
  7. uniq -- report or filter out repeated lines in a file
  8. paste -- merge corresponding or subsequent lines of files
  9. join -- relational database operator
  10. diff -- compare files line by line
  11. awk -- pattern-directed scanning and processing language
  12. grep -- print lines matching a pattern
  13. find -- walk a file hierarchy and list files matching given criteria
  14. wc -- word, line, character, and byte count
  15. tr -- translate characters
  16. sed -- stream editor

Examples:

1. Display the first 20 lines of a large text file

head -n 20 bigfile.txt

2. Display the first 20 data lines of a large file, skipping line 1 (useful when the first row holds column headers)

tail -n +2 bigfile.txt | head -n 20
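The offset given to tail is easy to get wrong, so here is a minimal sketch on a throwaway file. The file name sample.txt and its contents are assumptions made up for illustration. tail -n +K starts output at line K, so an offset of 2 is what drops a one-line header.

```shell
# Build a hypothetical sample file: one header line followed by four data rows.
workdir=$(mktemp -d)
printf 'id\tname\n1\talpha\n2\tbeta\n3\tgamma\n4\tdelta\n' > "$workdir/sample.txt"

# tail -n +2 starts printing at line 2, so the header is dropped;
# head -n 2 then keeps the first two data rows.
first_rows=$(tail -n +2 "$workdir/sample.txt" | head -n 2)
printf '%s\n' "$first_rows"

# For comparison, tail -n +1 starts at line 1 and therefore keeps the header.
with_header=$(tail -n +1 "$workdir/sample.txt" | head -n 1)
printf '%s\n' "$with_header"

rm -r "$workdir"
```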

3. Concatenate two files.

cat file1 file2 > combinedfile

Use zcat instead of cat to read a gzip-compressed file without unpacking it on disk.

4. Display a file with options to move backward/forward.

less bigfile.txt

Press Ctrl+F to move forward one screen and Ctrl+B to move backward. Type '/searchstring' to search forward from the current position and '?searchstring' to search backward. Press 'n' to repeat the last search, or 'N' to repeat it in the opposite direction.

5. Cut the first column (read a compressed file, skip the header row, take the next 10 rows, and project the first field)

zcat monthlyfile.zip | tail -n +2 | head | cut -f1

6. Sort the values of the first column

zcat monthlyfile.zip | tail -n +2 | head | cut -f1 | sort

7. Count how often each value occurs (use uniq -cd to list only the duplicated values)

zcat monthlyfile.zip | tail -n +2 | head | cut -f1 | sort | uniq -c
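The cut, sort, and uniq stages can be tried on a small uncompressed file. The sketch below uses a made-up tab-separated file (sales.tsv and its contents are assumptions, not data from this post). The sort step matters because uniq only collapses adjacent identical lines.

```shell
# Hypothetical tab-separated data: a header row, then a city and an amount per line.
workdir=$(mktemp -d)
printf 'city\tamount\nPune\t10\nDelhi\t20\nPune\t30\nAgra\t40\nDelhi\t50\nPune\t60\n' > "$workdir/sales.tsv"

# Skip the header, keep field 1, sort so equal values become adjacent,
# then let uniq -c prefix each distinct value with its count.
counts=$(tail -n +2 "$workdir/sales.tsv" | cut -f1 | sort | uniq -c)
printf '%s\n' "$counts"

# uniq -d prints only the values that occur more than once.
dupes=$(tail -n +2 "$workdir/sales.tsv" | cut -f1 | sort | uniq -d)
printf '%s\n' "$dupes"

rm -r "$workdir"
```

The exact width of the count column printed by uniq -c varies between implementations, so scripts that parse it are usually safer splitting on whitespace.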

8. Create a colon-separated list of directories named bin, suitable for use in the PATH environment variable:

find / -name bin -type d | paste -s -d : -
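Because a find over the whole file system can take a while, the paste step can be checked on its own. This small sketch feeds paste three directory names from printf (the names are chosen only for illustration) instead of from find.

```shell
# -s joins all input lines into one line; -d : uses a colon as the separator;
# the trailing - tells paste to read standard input.
dirs=$(printf '%s\n' /usr/local/bin /usr/bin /bin | paste -s -d : -)
printf '%s\n' "$dirs"   # prints /usr/local/bin:/usr/bin:/bin
```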

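9. Find the records in the first file whose join field (the first field by default) has no match in the second file. join expects both inputs to be sorted on that field, and the <( ) process substitution requires bash.

join -v 1 <(zcat monthly_feb.zip | tail -n +2 | sort -k 1b,1) <(zcat monthly_jan.zip | tail -n +2 | sort -k 1b,1)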
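The following sketch shows join -v 1 on two tiny made-up files (feb.txt and jan.txt are assumed names, with space-separated fields). It writes the sorted inputs to temporary files rather than using process substitution, so it also runs in a plain POSIX sh.

```shell
workdir=$(mktemp -d)
# Hypothetical monthly extracts: a header row, then "id value" per line.
printf 'id value\n3 c\n1 a\n4 d\n' > "$workdir/feb.txt"
printf 'id value\n1 a\n3 c\n' > "$workdir/jan.txt"

# Drop the header and sort each file on the join field (field 1),
# because join requires sorted input.
tail -n +2 "$workdir/feb.txt" | sort -k 1b,1 > "$workdir/feb.sorted"
tail -n +2 "$workdir/jan.txt" | sort -k 1b,1 > "$workdir/jan.sorted"

# -v 1 prints the lines of the first file whose key has no match in the second.
only_in_feb=$(join -v 1 "$workdir/feb.sorted" "$workdir/jan.sorted")
printf '%s\n' "$only_in_feb"   # prints: 4 d

rm -r "$workdir"
```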

10. Show the differences between two files side by side

diff -y -W 80 firstfile.txt secondfile.txt

11. Print the first two fields of a tab-separated file along with the row number, and print the total number of rows at the end.

awk -F '\t' '{print "Row Count: ", NR, $1, ",", $2} END {print "Total Record Count: ", NR}' file1.txt
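As a rough illustration of how NR behaves, here is a sketch on a two-line tab-separated file. The file name people.tsv, its contents, and the simplified output labels are assumptions for this example.

```shell
workdir=$(mktemp -d)
printf 'alice\t30\tpune\nbob\t25\tdelhi\n' > "$workdir/people.tsv"

# In the main block NR is the number of the line being processed;
# in the END block it still holds the number of the last line read.
report=$(awk -F '\t' '
    { print NR ": " $1 ", " $2 }
    END { print "Total Record Count: " NR }
' "$workdir/people.tsv")
printf '%s\n' "$report"

rm -r "$workdir"
```

This prints one numbered line per record (1: alice, 30 and 2: bob, 25) followed by Total Record Count: 2.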

12. Output the lines that contain the word 'error' or 'warning' (in any letter case) from the log file.

grep -iwE "error|warning" 20130218.log
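To make the effect of the options visible, the sketch below runs grep on a four-line log invented for this example (app.log is an assumed name). -i ignores case, -w matches whole words only, and -E enables the | alternation.

```shell
workdir=$(mktemp -d)
printf 'INFO started\nError: disk full\nwarnings disabled\nWARNING low memory\n' > "$workdir/app.log"

# "warnings disabled" is skipped: with -w, "warning" followed by "s" is not a whole word.
matches=$(grep -iwE 'error|warning' "$workdir/app.log")
printf '%s\n' "$matches"

rm -r "$workdir"
```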

13. List files with the extension .gz that are larger than 1 GB (the pattern is quoted so that the shell does not expand it before find sees it)

find / -type f -name '*.gz' -size +1G -exec ls -l {} \;

14. Count the number of processes currently running on a server (the count includes the header line printed by ps)

ps -ef | wc -l

15. Translate the contents of a file to upper case (tr reads only standard input, so the file is redirected in)

tr "[:lower:]" "[:upper:]" < file1.txt
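tr takes its input from standard input rather than from a file argument, which the following sketch shows with a one-line file created for the purpose (note.txt is an assumed name).

```shell
workdir=$(mktemp -d)
printf 'Hello, World\n' > "$workdir/note.txt"

# The file is supplied through input redirection; characters that are not
# lower-case letters pass through unchanged.
upper=$(tr '[:lower:]' '[:upper:]' < "$workdir/note.txt")
printf '%s\n' "$upper"   # prints HELLO, WORLD

rm -r "$workdir"
```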

16. Get the user names from the /etc/passwd file

sed 's/\([^:]*\).*/\1/' /etc/passwd
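The sed expression captures everything up to the first colon and discards the rest of the line. The sketch below runs it against a two-line file in passwd format (passwd.sample is a made-up stand-in for /etc/passwd) and compares it with cut, which is a shorter way to do the same job when the delimiter is a single character.

```shell
workdir=$(mktemp -d)
printf 'root:x:0:0:root:/root:/bin/sh\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n' > "$workdir/passwd.sample"

# \([^:]*\) captures the leading run of non-colon characters; .* consumes the
# remainder of the line; \1 puts back only the captured part.
with_sed=$(sed 's/\([^:]*\).*/\1/' "$workdir/passwd.sample")

# Equivalent here: split on ":" and keep field 1.
with_cut=$(cut -d: -f1 "$workdir/passwd.sample")

printf '%s\n' "$with_sed"
rm -r "$workdir"
```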

 

Advanced usage:

Disk usage per directory
du -sk * | sort -n | while read size fname; do for unit in k M G T P E Z Y; do if [ $size -lt 1024 ]; then echo -e "${size}${unit}\t${fname}"; break; fi; size=$((size/1024)); done; done
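The unit-scaling loop in this one-liner is easier to follow when pulled out into a function. The sketch below is one possible arrangement; the function name human_kb is an assumption introduced here. It takes a size in kilobytes, as printed by du -k, and divides by 1024 until the value is below 1024.

```shell
# Convert a size given in kilobytes to a short human-readable string.
# Integer division truncates, so 1536 kilobytes is reported as 1M.
# Note: plain sh has no local variables, so this overwrites $size and $unit.
human_kb() {
    size=$1
    for unit in k M G T P E Z Y; do
        if [ "$size" -lt 1024 ]; then
            printf '%s%s\n' "$size" "$unit"
            return 0
        fi
        size=$((size / 1024))
    done
}

human_kb 512       # prints 512k
human_kb 2048      # prints 2M
human_kb 1048576   # prints 1G
```

It could then be applied to each line of du -sk * | sort -n inside a while read loop, in the same way as the one-liner.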

Directory listing with number of files
find . -type f -exec dirname {} \; | awk -F '/' '{arr[$2]++} END { for (i in arr) { print i, arr[i] } }' | sort

Compare files and print the lines of the first file that are not present in the second (file2 is named first so that awk loads it into the lookup array before scanning file1)
awk 'NR==FNR{a[$0]++;next} !($0 in a){print $0}' file2 file1
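The NR==FNR idiom depends on the order of the file arguments, which this sketch demonstrates with two small files made up for the example (first.txt and second.txt are assumed names). The file named first on the command line is the one loaded into the array.

```shell
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/first.txt"
printf 'banana\n' > "$workdir/second.txt"

# NR==FNR is true only while awk reads the first file named on the command
# line, so second.txt is loaded into the lookup array "seen".
# Lines of first.txt are then printed only when they are not in the array.
missing=$(awk 'NR==FNR { seen[$0]; next } !($0 in seen)' "$workdir/second.txt" "$workdir/first.txt")
printf '%s\n' "$missing"

rm -r "$workdir"
```

One caveat with this idiom: if the file loaded first is empty, NR==FNR stays true for the other file as well, and nothing is printed.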

 


Posted on Monday, February 18, 2013 1:07 PM | Linux




Copyright © Rahul Anand | Powered by: GeeksWithBlogs.net