@DouglasUrner
Forked from aembleton/docx2md.md
Last active January 21, 2018 21:27
Convert a Word Document With Images into MD

Convert Word documents with images to Markdown

The Problem

A lot of documents are created and saved in Microsoft Word (*.docx). But a .docx file is a zipped bundle of XML rather than plain text, so it isn't really useful for presenting documents on the web, and I haven't found a way to get it to play nicely with Git. So I wanted to find a way to convert a .docx file into Markdown.

Installing Pandoc

On a Mac, you can install Pandoc with Homebrew by running the command brew install pandoc.
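
To confirm that the install worked, a small check along these lines can help. It is only a sketch: have_cmd is a hypothetical helper name, not something Pandoc or Homebrew provides.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd pandoc; then
  # The first line of the version output names the installed release.
  pandoc --version | head -n 1
else
  echo "pandoc not found; on a Mac, try: brew install pandoc" >&2
fi
```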

The Solution

As it turns out, there are several open-source tools that allow for conversion between file types. Pandoc is one of them, and it's powerful. In fact, pandoc's website says "If you need to convert files from one markup format into another, pandoc is your swiss-army knife." Pandoc can convert from markdown into .docx, and it also works in the other direction.
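
For the other direction, Markdown into .docx, a minimal sketch might look like the following. The function name md_to_docx and the file name notes.md are made up for illustration; the pandoc flags are the same --from, --to and --output options used for the docx conversion.

```shell
# Hypothetical helper: convert a Markdown file to .docx next to it,
# e.g. notes.md -> notes.docx
md_to_docx() {
  pandoc --from markdown --to docx --output "${1%.md}.docx" "$1"
}
```

Calling md_to_docx notes.md would then write notes.docx in the same directory.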

Example

Say you have the Council Rules in a Word document named test.docx. (For a real-life example, visit https://github.com/vzvenyach/Council_Rules/.) Now run the following at the command line:

pandoc --from docx --to markdown --extract-media=. --output test.md test.docx

The result is a quite usable markdown file, with any images or other media saved into a folder named media in the current directory.

Admittedly, there's a bit of junk at the top with the Table of Contents. Delete that and it will render nicely with strapdown.js. Here is the final version of the Rules.
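
If you have several Word documents rather than one, the same command can be wrapped in a loop. This is a sketch, not a tested workflow: md_name and convert_all are hypothetical helper names, and it assumes that giving each document its own --extract-media directory keeps identically named images (such as image1.png) from overwriting each other.

```shell
# Hypothetical helper: derive the output name, e.g. ./test.docx -> ./test.md
md_name() {
  printf '%s\n' "${1%.docx}.md"
}

# Hypothetical helper: convert every .docx file in the current directory.
convert_all() {
  for doc in ./*.docx; do
    [ -e "$doc" ] || continue   # skip the literal pattern when nothing matches
    # Assumption: a per-document media directory (./test for ./test.docx)
    # avoids image name collisions between documents.
    pandoc --from docx --to markdown \
      --extract-media="${doc%.docx}" \
      --output "$(md_name "$doc")" "$doc"
  done
}
```

Run convert_all from the folder that holds the .docx files; each one should get a .md file alongside it.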
