Showing posts with label count. Show all posts

Thursday, January 8, 2015

[Bash] Count Letters in a Text File

Bash has very useful commands for text file analysis. This example uses a small part of Leo Tolstoy's War and Peace, available on Project Gutenberg:




This excerpt is saved in the file onepart.txt. To count letters regardless of case, every letter must first be converted to the same case. The tr command translates all lowercase letters to uppercase:
$ cat onepart.txt|tr a-z A-Z
Next, the output is filtered, again with tr. Its first argument is the pair of switches -cd and the second is the letter to keep. The -c switch means complement and -d means delete, so together they delete every character that is not in the given set.
Putting the commands in a pipeline, we get:
$ cat onepart.txt|tr a-z A-Z|tr -cd A
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
The same works for B, C, D and so on:

$ cat onepart.txt|tr a-z A-Z|tr -cd B
BBBB

$ cat onepart.txt|tr a-z A-Z|tr -cd C
CCCCCCCCCCCCCCCCCCCCCCC

$ cat onepart.txt|tr a-z A-Z|tr -cd D
DDDDDDDDDDDDDDDDDDDDDDDDDD
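
To see what each switch contributes, here is a minimal sketch on a made-up sample string (the string and the variable names are assumptions for illustration, not part of onepart.txt). It compares -d alone with -cd:

```shell
#!/bin/bash
# Hypothetical sample text, not taken from onepart.txt.
sample='Anna has a banana'

# -d alone deletes the listed characters.
without_a=$(printf '%s' "$sample" | tr -d 'a')

# -c complements the set first, so -cd deletes everything EXCEPT the listed characters.
only_a=$(printf '%s' "$sample" | tr -cd 'a')

echo "without a: $without_a"
echo "only a:    $only_a"
```

Note that the uppercase A in "Anna" is untouched in both cases, which is why the text is converted to a single case before filtering.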

Finally, we need to count those letters. The wc command with the -m switch does exactly that:

$ cat onepart.txt|tr a-z A-Z|tr -cd A|wc -m
40

$ cat onepart.txt|tr a-z A-Z|tr -cd B|wc -m
4

$ cat onepart.txt|tr a-z A-Z|tr -cd C|wc -m
23

$ cat onepart.txt|tr a-z A-Z|tr -cd D|wc -m
26
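
The three-step pipeline can also be wrapped in a small function so it does not have to be retyped for every letter. This is only a sketch: count_letter is a hypothetical name, and it assumes the letter is passed in uppercase and the text arrives on standard input.

```shell
#!/bin/bash
# count_letter is a hypothetical helper name, not from the original post.
# Usage: count_letter LETTER < file   (LETTER must be uppercase)
count_letter() {
    # Fold the input to uppercase, keep only the requested letter,
    # then count what is left. The $(( )) wrapper strips the padding
    # that some wc implementations put in front of the number.
    echo $(( $(tr a-z A-Z | tr -cd "$1" | wc -m) ))
}

printf 'Abracadabra\n' | count_letter A
```

With a real file the call would look like count_letter A < onepart.txt.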
And, of course, here is a script that counts the occurrences of every letter:

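#!/bin/bash

# Read the file named by the first argument and convert it to uppercase.
text=$(tr a-z A-Z < "$1")
echo "Letter occurrences:"

for l in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
do
    # Keep only the current letter and count what remains.
    count=$(printf '%s' "$text" | tr -cd "$l" | wc -m)
    echo "$l $count"
done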
The script is a little slow because the whole text is scanned once for every letter, 26 times in total.
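
If the repeated scanning becomes a problem, one possible alternative is to read the text once and let sort and uniq do the grouping. This is a different technique from the loop above, shown only as a sketch. letter_histogram is a hypothetical name, and LC_ALL=C is an assumption that keeps the letter ranges and the sort order limited to plain ASCII.

```shell
#!/bin/bash
# letter_histogram is a hypothetical helper name. It reads text on stdin.
letter_histogram() {
    LC_ALL=C tr a-z A-Z |       # same case folding as the loop script
        LC_ALL=C tr -cd A-Z |   # drop everything that is not a letter
        fold -w 1 |             # put one letter on each line
        LC_ALL=C sort |         # bring identical letters together
        uniq -c |               # count each run of identical lines
        awk '{ print $2, $1 }'  # print "LETTER COUNT"
}

printf 'Abba cab\n' | letter_histogram
```

For the sample input this prints "A 3", "B 3" and "C 1" on separate lines. Unlike the loop, letters that never occur are simply missing from the output instead of being listed with a count of 0.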

Saturday, February 22, 2014

Linux How to count words and lines in a text file

If you want to count the words in a text file, you can use:
wc -w <file>
Example for the file 2600.txt (War and Peace by Leo Tolstoy) from Project Gutenberg:
wc -w 2600.txt
output:
566321 2600.txt
If you want to count the lines in a text file, you can use:
wc -l <file>
Example with the same file:
wc -l 2600.txt
output:
65008 2600.txt
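
When the count is needed inside a script, the file name in the output gets in the way. A common workaround is to feed the file to wc on standard input, so only the number is printed. The sketch below creates its own small sample file (an assumption for illustration) instead of using 2600.txt:

```shell
#!/bin/bash
# Hypothetical sample file created just for this sketch.
tmp=$(mktemp)
printf 'one two three\nfour five\n' > "$tmp"

# Reading from stdin makes wc print only the number, without the file name.
# $(( )) strips the leading padding that some wc implementations add.
words=$(( $(wc -w < "$tmp") ))
lines=$(( $(wc -l < "$tmp") ))

echo "words: $words, lines: $lines"
rm -f "$tmp"
```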

Full list of options for the wc command:

  • -c or --bytes: byte counts
  • -m or --chars: character counts
  • -l or --lines: newline counts
  • -L or --max-line-length: length of the longest line
  • -w or --words: word counts

Also:
  • --help
  • --version
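
Two of these options are easy to misread, so here is a small sketch with made-up input. It shows that -l really counts newline characters, and that -c counts bytes while -m counts characters. The variable names are hypothetical, and the -m result depends on the locale the script runs in, so no fixed value is claimed for it.

```shell
#!/bin/bash
# -l counts newline characters, so a last line without a trailing
# newline is not counted.
lines_unterminated=$(( $(printf 'first\nsecond' | wc -l) ))
lines_terminated=$(( $(printf 'first\nsecond\n' | wc -l) ))

# -c counts bytes. The octal escapes \303\251 are the two UTF-8 bytes of "é".
bytes=$(( $(printf '\303\251' | wc -c) ))

# -m counts characters. The result depends on the current locale:
# it should equal the byte count in the C locale and be smaller in a UTF-8 locale.
chars=$(( $(printf '\303\251' | wc -m) ))

echo "lines: $lines_unterminated vs $lines_terminated"
echo "bytes: $bytes, chars: $chars"
```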