I put a commented version that explains what this does at the bottom
- Install Homebrew from http://brew.sh/
- In Terminal:
brew install poppler hunspell
mkdir ~/Dictionaries
- Download dictionaries from https://cgit.freedesktop.org/libreoffice/dictionaries/tree/en and put them in ~/Dictionaries. Use the "plain" link. You need both the .dic and the .aff files.
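Since hunspell needs both files, a quick check before running the pipeline can save some head-scratching. This is only a rough sketch: have_dict is a made-up helper name, and it assumes the files are called en_US.dic and en_US.aff as in the command below.

```shell
# Hypothetical helper (not part of hunspell): succeeds only if both
# the .dic and the .aff file for a dictionary name exist in a directory.
have_dict() {
    dir=$1
    name=$2
    [ -f "$dir/$name.dic" ] && [ -f "$dir/$name.aff" ]
}

# Assumes the en_US files were saved into ~/Dictionaries.
if have_dict "$HOME/Dictionaries" en_US; then
    echo "en_US dictionary looks complete"
else
    echo "en_US.dic or en_US.aff is missing from ~/Dictionaries"
fi
```

Swap en_US for en_GB (or whichever pair you downloaded) to check a different dictionary.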
pdftotext pattern.pdf - | tr -s '[:blank:][:punct:]' '\n' | awk 'length($1) >= 2 && length($1) <= 5 { print $1 }' | hunspell -d ~/Dictionaries/en_US -a | awk '{print $1,$2}' | grep -v '^[\*@]' | tr -d '&#' | sort | uniq -c
3 10cm
2 4sts
1 5cm
1 60cm
2 7mm
1 Aran
3 Liesl
3 bo
1 bo2
1 bo4
1 dpn
1 dpns
2 grey # funny, this is probably cause I used the en_US dictionary instead of en_GB
19 k1
2 k17
1 k2
10 k2tog
21 k3
2 k31
5 k4
2 kwise
5 liesl
5 pdf
2 psso
3 pwise
6 rnd
6 sl
22 sl1
40 sts
3 ws
6 www
3 yds

Explanatory notes:
- brew install poppler hunspell
  This is the easiest way to install the PDF-to-text converter and the spell checker.
- mkdir ~/Dictionaries, download dictionaries
  Unfortunately you have to install the dictionaries manually.
To see how this works, you can start from the first command up to the pipe (|) and run it, and then add more pipes and commands to the end, one at a time, and run to see what you get. It's fun :)
- pdftotext pattern.pdf -
  Extract the text from the pattern. The trailing - sends the text to standard output instead of a file.
- tr -s '[:blank:][:punct:]' '\n'
  Replace all spaces and punctuation with newlines so that we get one word per line.
- awk 'length($1) >= 2 && length($1) <= 5 { print $1 }'
  Print out all words that are 2-5 characters long.
- hunspell -d ~/Dictionaries/en_US -a
  Run them through the spell checker. In this example we are using the en_US dictionary.
- awk '{print $1,$2}'
  The spell checker prints out a lot of stuff; we only need to see the result of the check and the word that was submitted.
- grep -v '^[\*@]'
  Lines that start with * or @ are not bad spellings, so filter them out.
- tr -d '&#'
  Bad spellings get a & or # symbol. Remove those symbols, we don't need to see them.
- sort
  Sort the results so we can do a unique count.
- uniq -c
  Do a unique count.
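
If you don't have a pattern PDF handy, here is a rough sketch of the same stages run on a made-up line of text. The sample sentence and the spell checker output are hand-written assumptions, not real hunspell results (the suggestions in it are invented, and the sample is simplified); only the tr, awk, grep, sort and uniq commands are taken from the pipeline.

```shell
# Made-up line of pattern text, standing in for the output of pdftotext.
sample='Rnd 1: k2tog, sl1, k1, psso; knit to end. Rnd 2: k2tog, purl.'

# Turn every run of blanks and punctuation into a single newline,
# which leaves one word per line.
words=$(printf '%s\n' "$sample" | tr -s '[:blank:][:punct:]' '\n')

# Keep only the words that are 2 to 5 characters long
# (this drops the lone digits "1" and "2").
short=$(printf '%s\n' "$words" | awk 'length($1) >= 2 && length($1) <= 5 { print $1 }')
printf '%s\n' "$short"

# Hand-written, simplified stand-in for `hunspell -a` output (an assumption):
#   @ line = version banner, * = word is fine,
#   & word ... = misspelled with suggestions, # word ... = misspelled, no suggestions.
spell_output='@(#) International Ispell Version 3.2.06 (but really Hunspell 1.7.0)
& Rnd 2 0: Rod, Rand
*
# sl1 0
& Rnd 2 0: Rod, Rand
*
# psso 0'

# Keep the first two fields, drop the banner and the "fine" lines,
# strip the & and # markers, then count each flagged word.
flagged=$(printf '%s\n' "$spell_output" \
    | awk '{print $1,$2}' \
    | grep -v '^[\*@]' \
    | tr -d '&#' \
    | sort \
    | uniq -c)
printf '%s\n' "$flagged"
```

The last printf shows a count next to each flagged word, the same shape as the listing above. The order of the lines depends on your locale's sort order.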