In this tutorial I'll explain how to count the words in a text and build a list of the words it uses.
First we need a text file. I'll use War and Peace by Leo Tolstoy from Project Gutenberg:
wget http://gutenberg.org/files/2600/2600.txt
After getting the file, we must convert uppercase letters to lowercase:
tr A-Z a-z
then convert everything that is not a lowercase letter to a newline (the -s switch squeezes runs of newlines into one):
tr -cs a-z '\n'
then sort the result:
sort
and finally get the unique words and count them:
uniq -c
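To see what the four stages do together, here is a minimal sketch that runs them on one short sentence instead of the whole book. The sample sentence and the variable names are made up for illustration.

```shell
# Sample input (an illustrative assumption, not taken from the book)
sample='The cat saw the other cat.'

# Lowercase, put one word per line, sort, then count repeated lines
result=$(printf '%s\n' "$sample" | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c)

# Each output line is a count followed by a word
printf '%s\n' "$result"
```

Here "cat" and "the" are each counted twice, while "other" and "saw" are counted once. The exact padding in front of the counts depends on the uniq implementation.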
To make this work, we must put those commands into a pipeline. To do that, we'll create a file called wordcount:
nano wordcount
and type this:
cat "$@" | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c
Then save the file and exit the nano editor.
Give the file wordcount execute rights:
chmod +x wordcount
and count words with:
./wordcount <textfile>
in this case:
./wordcount 2600.txt
and we'll get the words used in this great book and the number of times each one appears.
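The output of wordcount is ordered alphabetically, not by count. If the most frequent words are of interest, one possible extension is to sort the counts numerically in reverse and keep only the first few lines. The sketch below assumes a helper named top_words, which is hypothetical and not part of the original scripts.

```shell
# Hypothetical helper: print the n most frequent words in the given files
# usage: top_words <n> <textfile>...
top_words() {
    n=$1
    shift
    # Same pipeline as wordcount, then sort by count (largest first)
    cat "$@" | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, top_words 10 2600.txt would print the ten most common words in the book together with their counts.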
If we want only the vocabulary used in this book, without the counts, we make another file:
nano makedict
and type this:
cat "$@" | tr A-Z a-z | tr -cs a-z '\n' | sort | uniq
Note that the only difference between the wordcount and makedict scripts is the -c switch in the uniq command.
Of course, we must give this file execute rights too:
chmod +x makedict
and use it:
./makedict 2600.txt
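A natural follow-up is to ask how many distinct words the text contains. The sketch below is one way to do that under a few assumptions: the helper name vocab_size is hypothetical, sort -u is used as a shorter equivalent of sort followed by uniq, and grep -c . counts only non-empty lines, because the pipeline can emit one empty line when the text starts with a character that is not a letter.

```shell
# Hypothetical helper: count the distinct words in the given files
# usage: vocab_size <textfile>...
vocab_size() {
    # sort -u sorts and drops duplicates in one step;
    # grep -c . counts lines that contain at least one character
    cat "$@" | tr A-Z a-z | tr -cs a-z '\n' | sort -u | grep -c .
}
```

It would be called the same way as the other scripts, for example vocab_size 2600.txt.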