Converting Word Documents with Embedded Images to Markdown
A complete guide to handling Word documents that contain images, diagrams, and screenshots when converting to Markdown format. Learn when to preserve images vs. extract text content.
January 2025
Images are where most Word-to-Markdown converters quietly fail. The text comes through, the headings line up, and then you discover that every screenshot has become a broken `![]()` reference or, worse, has silently disappeared. Word2MD.net was built around the image problem specifically because that's where every documentation migration we've seen breaks. This guide explains what actually happens to images during conversion, the pipeline Word2MD.net uses to keep them intact, and when to embed versus link versus OCR.
What happens to images by default
A .docx file is a ZIP archive. Images live inside `/word/media/` as raw PNG, JPEG, or EMF binaries. A naive converter does one of three things, all bad: it inlines them as massive base64 strings (bloating the Markdown to megabytes), drops them behind an `[image]` placeholder, or fails entirely on the EMF/WMF formats that Word still emits for charts and for screenshots from older Office versions. None of these results survives a Git PR review.
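To see the ZIP layout for yourself, the following minimal sketch lists the embedded images in a .docx using only Python's standard library. The function name `list_embedded_images` is made up for this illustration; note that ZIP entry names carry no leading slash, so the prefix to match is `word/media/`.

```python
import zipfile


def list_embedded_images(docx_path):
    """Return (entry name, size in bytes) for each file under word/media/."""
    with zipfile.ZipFile(docx_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            # ZIP entry names have no leading slash.
            if info.filename.startswith("word/media/") and not info.is_dir()
        ]
```

Calling `list_embedded_images("spec.docx")` on a document with one chart and one screenshot would return two tuples, typically with names such as `word/media/image1.emf` and `word/media/image2.png`.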
Word2MD.net's image pipeline
- Extraction: each embedded image is unpacked from the .docx ZIP and given a stable filename derived from the document name plus a hash.
- Format normalization: EMF and WMF — which most browsers can't render — are converted to PNG on the fly.
- Sidecar export: in batch mode, images ship in an `images/` folder inside the result ZIP, alongside the Markdown that references them with clean relative paths.
- Alt text preservation: if the original Word doc had alt text on the image, it's carried into the `![alt]()` syntax — critical for accessibility and SEO.
- OCR option: for screenshots of UIs, terminals, or scanned pages, an optional AI OCR pass extracts searchable text and places it next to the image link.
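The extraction, sidecar, and alt-text steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Word2MD.net's actual implementation: the names `extract_images` and `image_link` are hypothetical, the hash is assumed to be the first 8 hex characters of a SHA-256 over the image bytes, and EMF/WMF conversion is left out because it needs an external renderer.

```python
import hashlib
import zipfile
from pathlib import Path, PurePosixPath


def extract_images(docx_path, out_dir, doc_name):
    """Copy word/media/* into out_dir/images/; return {entry name: relative path}."""
    images_dir = Path(out_dir) / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    mapping = {}
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith("word/media/"):
                continue
            data = archive.read(info)
            # Assumption: a content hash keeps the name stable across runs.
            digest = hashlib.sha256(data).hexdigest()[:8]
            suffix = PurePosixPath(info.filename).suffix.lower()
            target = images_dir / f"{doc_name}-{digest}{suffix}"
            target.write_bytes(data)
            mapping[info.filename] = f"images/{target.name}"
    return mapping


def image_link(alt_text, relative_path):
    """Build a Markdown image link, escaping brackets in the alt text."""
    safe_alt = alt_text.replace("[", "\\[").replace("]", "\\]")
    return f"![{safe_alt}]({relative_path})"
```

Because the filename depends only on the document name and the image bytes, re-running the conversion on an unchanged document produces identical paths, which keeps Git diffs quiet.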
AI OCR for screenshots and scans
Screenshots break RAG pipelines because a text index only sees text, not the pixels inside an image. A 50-page product spec with 80 UI screenshots loses everything those 80 screenshots say the moment you ingest it. Word2MD.net's optional OCR pass extracts visible text from each image and emits it next to the `![alt]()` link as a fenced code block. The image still ships for human readers; the text version makes the same content retrievable by LlamaIndex, LangChain, Haystack, or any custom embedding pipeline. For knowledge-base teams this single feature often justifies the conversion.
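As a rough sketch of that output shape, the helper below places already-extracted OCR text under the image link as a fenced block. The function name `ocr_block` and the `text` info string are assumptions made for this example, and the OCR engine itself is out of scope: `ocr_text` is whatever string your OCR step returned.

```python
import re


def ocr_block(alt_text, image_path, ocr_text):
    """Return the image link followed by the OCR text in a fenced block."""
    link = f"![{alt_text}]({image_path})"
    text = ocr_text.strip()
    if not text:
        # Nothing recognised: emit the link only.
        return link
    # The fence must be longer than any backtick run inside the OCR text.
    longest_run = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * max(3, longest_run + 1)
    return f"{link}\n\n{fence}text\n{text}\n{fence}"
```

Sizing the fence from the content matters for terminal screenshots, where the recognised text can itself contain backticks.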
When to embed, link, or OCR — decision rule
Use inline base64 only for documents under 50 KB total that you'll publish on a single static page (rare). Use sidecar export with relative links for anything that lives in a Git repo — it gives reviewers normal image diffs and keeps the Markdown human-readable. Add OCR when the image content itself carries information your readers or LLMs need to search: error messages in screenshots, code in terminal grabs, labels in diagrams. Skip OCR for decorative photos — it adds noise. The honest test: if a reader could ignore the image and lose nothing, you don't need OCR.
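The decision rule can be written down as a small function. This is a sketch only: `plan_images`, `ImagePlan`, and the parameter names are invented for illustration, and reading "50 KB" as 50 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass

INLINE_LIMIT_BYTES = 50 * 1024  # assumption: "50 KB" read as 50 * 1024 bytes


@dataclass
class ImagePlan:
    mode: str      # "inline-base64" or "sidecar"
    run_ocr: bool  # True when the image text should be extracted


def plan_images(total_doc_bytes, single_static_page, lives_in_git,
                image_carries_information):
    """Apply the embed / link / OCR rule to one document."""
    inline = (
        not lives_in_git
        and single_static_page
        and total_doc_bytes < INLINE_LIMIT_BYTES
    )
    mode = "inline-base64" if inline else "sidecar"
    # OCR only when readers or LLMs need to search what the image says.
    return ImagePlan(mode=mode, run_ocr=image_carries_information)
```

A 2 MB spec headed for a Git repo with terminal screenshots comes out as sidecar export with OCR on; a 20 KB one-page note with a decorative photo comes out as inline base64 with OCR off.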