Linux tips and tricks: 2012

In this tutorial I'll try to explain how to count the words in a text and make a list of the words it uses.

First we need a text file. I'll use War and Peace by Leo Tolstoy from Project Gutenberg:

wget http://gutenberg.org/files/2600/2600.txt

After getting the file, we must convert uppercase letters to lowercase:

tr A-Z a-z

then convert every run of characters that are not lowercase letters into a single newline, so that each word ends up on its own line:

tr -cs a-z '\n'

then sort the result, so that identical words end up on adjacent lines:

sort

and finally collapse the repeated lines into unique words and count them:

uniq -c
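
To see what each stage contributes, here is a small sketch that runs the commands one after another on a single line of text. The sample sentence and the variable name sample are invented for illustration and are not taken from the book.

```shell
# Sample input invented for illustration (not from the book).
sample='The cat saw the dog. The DOG ran!'

# Stage 1: lowercase everything.
printf '%s\n' "$sample" | tr A-Z a-z

# Stages 1-2: every run of non-letters becomes one newline,
# which leaves one word per line.
printf '%s\n' "$sample" | tr A-Z a-z | tr -cs a-z '\n'

# Stages 1-4: sort so equal words are adjacent, then count them.
printf '%s\n' "$sample" | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c
```

The sort step matters because uniq only merges lines that are next to each other. Without it, "the" would be reported several times with smaller counts.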

To make this usable, we must join those commands into a pipeline. To do that, we'll create a file named wordcount:

nano wordcount

and type this:

cat "$@" | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c

Save the file and exit the nano editor.
Give the wordcount file execute permission:

chmod +x wordcount

and count words with

./wordcount <textfile>

in this case:

./wordcount 2600.txt

and we'll get the words used in this great book, each with the number of times it appears.
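
The list comes out in alphabetical order, so the most frequent words are scattered through it. One possible follow-up, which is not part of the wordcount script, is to sort the result numerically on the count and keep only the first few lines. The sketch below wraps this in a hypothetical helper function named top_words and feeds it an invented sample on standard input, so it runs without the book.

```shell
# top_words is a hypothetical helper, not part of the original script:
# the same pipeline as wordcount, followed by a reverse numeric sort
# and head to keep the N most frequent words.
top_words() {
    n=$1
    shift
    cat "$@" | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c | sort -rn | head -n "$n"
}

# Invented sample text; with no file arguments, cat reads standard input.
printf 'The cat saw the dog. The DOG ran!\n' | top_words 2
```

With the downloaded book, the equivalent call would be top_words 10 2600.txt.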

If we want only the vocabulary used in this book, without the counts, we make another file:

nano makedict

and type this:

cat "$@" | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq

Note that the only difference between the wordcount and makedict scripts is the -c switch in the uniq command.
Of course, we must give this file execute permission too:

chmod +x makedict

and use:

./makedict 2600.txt
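
As a side note, for plain lowercase words like these, sort -u gives the same list as sort followed by uniq, and piping the list into wc -l gives the size of the vocabulary. A minimal sketch on an invented sample follows; the function name vocabulary is hypothetical and only stands in for the makedict script.

```shell
# vocabulary is a hypothetical stand-in for makedict,
# using sort -u instead of sort | uniq.
vocabulary() {
    cat "$@" | tr A-Z a-z | tr -cs a-z '\n' | sort -u
}

# Invented sample text: print the distinct words, then count them.
printf 'The cat saw the dog. The DOG ran!\n' | vocabulary
printf 'The cat saw the dog. The DOG ran!\n' | vocabulary | wc -l
```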

Sunday, October 14, 2012

Bash - Count words in text and make dictionary
