Skip to content

Instantly share code, notes, and snippets.

@bufke
bufke / document_to_text.py
Last active September 30, 2024 20:46
Convert odt, doc, docx, pdf to text with python and some linux programs. Doesn't require Libreoffice.
from subprocess import Popen, PIPE
from docx import opendocx, getdocumenttext
#http://stackoverflow.com/questions/5725278/python-help-using-pdfminer-as-a-library
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO