Thursday, January 8, 2015

[Bash] Count Letters in a Text File

Bash has very useful commands for text file analysis. For this example we will use a small part of Leo Tolstoy's War and Peace, available on Project Gutenberg:




This part is saved in the file onepart.txt. To count letters regardless of case, all letters must first be in the same case. The tr command translates all lowercase letters to uppercase:
$ cat onepart.txt|tr a-z A-Z
Next, the output is filtered. The tr command is useful here too: its first argument is the pair of switches -cd and the second is the letter to keep. The -c switch means complement and -d means delete, so together they delete every character that is not the given letter.
Putting this into the pipeline, we get:
$ cat onepart.txt|tr a-z A-Z|tr -cd A
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
The same works for B, C, and so on:

$ cat onepart.txt|tr a-z A-Z|tr -cd B
BBBB

$ cat onepart.txt|tr a-z A-Z|tr -cd C
CCCCCCCCCCCCCCCCCCCCCCC

$ cat onepart.txt|tr a-z A-Z|tr -cd D
DDDDDDDDDDDDDDDDDDDDDDDDDD
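
A quick way to check what -cd does is to run the same pipeline on a short, known string. The sketch below wraps the two tr calls in a helper called keep_letter (a name made up here for illustration, not part of the original commands) and assumes the letter is passed in uppercase:

```bash
# keep_letter: read text on stdin and keep only the given letter,
# ignoring case. The helper name is hypothetical.
keep_letter() {
    tr a-z A-Z | tr -cd "$1"    # $1 must be an uppercase letter
}

# tr -cd also deletes the newline, so a bare echo adds one back.
printf 'Banana bread\n' | keep_letter A; echo    # prints AAAA
```

Because everything except the chosen letter is deleted, including spaces and newlines, the number of characters left is exactly the number of occurrences of that letter.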

And, of course, we need to count those letters. The wc command with the -m switch does exactly that by counting characters:

$ cat onepart.txt|tr a-z A-Z|tr -cd A|wc -m
40

$ cat onepart.txt|tr a-z A-Z|tr -cd B|wc -m
4

$ cat onepart.txt|tr a-z A-Z|tr -cd C|wc -m
23

$ cat onepart.txt|tr a-z A-Z|tr -cd D|wc -m
26
And, finally, a script that counts the occurrences of every letter:

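#!/bin/bash
# Usage: pass the text file as the first argument.

# Read the file once and convert it to uppercase.
text=$(tr a-z A-Z < "$1")
echo "Letter occurrences:"

for l in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
do
    # Keep only the current letter, then count what is left.
    count=$(printf '%s' "$text" | tr -cd "$l" | wc -m)
    echo "$l $count"
done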
The script is a little slow because the whole text is scanned again for every letter, 26 times in total.
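
One way to avoid rescanning the text is to put each letter on its own line and let sort and uniq -c do the counting, so the input is read only once. This is a minimal sketch, assuming the standard fold, sort, and uniq utilities are available. The function name count_letters is chosen here for illustration:

```bash
# count_letters: read text on stdin and print "count letter"
# for every letter that occurs. The function name is hypothetical.
count_letters() {
    tr a-z A-Z |    # convert to uppercase
    tr -cd A-Z |    # delete everything that is not a letter
    fold -w 1 |     # break the stream into one letter per line
    sort |          # group identical letters together
    uniq -c         # collapse each group into a count
}
```

It can be used as count_letters < onepart.txt. Unlike the loop above, letters that never occur are left out of the output instead of being listed with a count of 0.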
