Use this script to extract font-size statistics from a PDF file.
My goal was to check whether 12pt was the main font size used in some documents.
Dependency: pdfminer
Usage:
pdfminer_font_size_statistics.py filepath.pdf
Sample output:
Percentage for 12.0 font size: 96.72% (12.0pt: 88456 chars for total of 91453 chars)
Percentage for 36.0 font size: 0.03% (36.0pt: 24 chars for total of 91453 chars)
Percentage for 27.0 font size: 0.03% (27.0pt: 30 chars for total of 91453 chars)
Percentage for 16.0 font size: 0.25% (16.0pt: 227 chars for total of 91453 chars)
Percentage for 6.5 font size: 0.55% (6.5pt: 499 chars for total of 91453 chars)
Percentage for 10.0 font size: 0.08% (10.0pt: 72 chars for total of 91453 chars)
Percentage for 13.0 font size: 0.86% (13.0pt: 784 chars for total of 91453 chars)
Percentage for 9.0 font size: 1.38% (9.0pt: 1261 chars for total of 91453 chars)
Percentage for 11.0 font size: 0.10% (11.0pt: 87 chars for total of 91453 chars)
Percentage for 14.0 font size: 0.01% (14.0pt: 13 chars for total of 91453 chars)
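
For readers who want to see how such a report can be produced, here is a minimal sketch. It is not the script itself. It assumes the pdfminer.six package, whose `extract_pages` function and `LTChar.size` attribute are used to read character sizes. The function names `iter_char_sizes` and `font_size_report` are hypothetical, and rounding sizes to one decimal is an assumption made to group near-identical values.

```python
from collections import Counter


def iter_char_sizes(pdf_path):
    """Yield the font size of every character in the PDF (assumes pdfminer.six)."""
    # Imported lazily so font_size_report() works without pdfminer installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar

    def walk(item):
        if isinstance(item, LTChar):
            yield item.size
        elif hasattr(item, "__iter__"):
            # Layout containers (pages, text boxes, text lines) iterate
            # over their children.
            for child in item:
                yield from walk(child)

    for page in extract_pages(pdf_path):
        yield from walk(page)


def font_size_report(sizes):
    """Return one report line per distinct size, in order of first appearance."""
    # Assumption: round to one decimal so that 11.99 and 12.0 are counted together.
    counts = Counter(round(float(size), 1) for size in sizes)
    total = sum(counts.values())
    return [
        f"Percentage for {size} font size: {count / total:.2%} "
        f"({size}pt: {count} chars for total of {total} chars)"
        for size, count in counts.items()
    ]
```

With pdfminer.six installed, `print("\n".join(font_size_report(iter_char_sizes("filepath.pdf"))))` would print lines in the same format as the sample output above.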
Inspired by this answer