Is Word2MD free to use?

Yes, Word2MD is completely free to use. You can convert unlimited Word documents to Markdown without any cost or registration.

Do you upload my files to a server?

No, all conversions happen directly in your browser. Your files never leave your device, ensuring complete privacy and security.

What Word formats are supported?

We support .docx files (Microsoft Word 2007 and later). Legacy .doc files are not currently supported.

Can I convert multiple files at once?

Yes, you can upload and convert multiple .docx files simultaneously. All converted files can be downloaded as individual Markdown files or packaged in a ZIP archive.

What Markdown features are supported?

We support standard Markdown syntax including headings, lists, links, images, tables, bold, italic, code blocks, and more. The conversion preserves document structure and formatting as closely as possible.

Converting Word Documents with Embedded Images to Markdown

A complete guide to handling Word documents that contain images, diagrams, and screenshots when converting to Markdown format. Learn when to preserve images vs. extract text content.

January 2025

Converting a Word document with images to Markdown is the moment most converters quietly fail. The text comes through, the headings line up, and then you discover that every screenshot has become a broken `![image](data:...)` reference, or worse, has silently disappeared. Word2MD.net was built around the image problem specifically because that's where every documentation migration we've seen breaks. This guide explains what actually happens to images during conversion, the pipeline Word2MD.net uses to keep them intact, and when to embed versus link versus OCR.

What happens to images by default

A .docx file is a ZIP archive. Images live inside its `word/media/` folder as raw PNG, JPEG, EMF, or WMF binaries. A naive converter does one of three things, all bad: it inlines them as massive base64 strings (bloating the Markdown to megabytes), drops them and leaves an `[image]` placeholder, or fails entirely on the EMF and WMF formats that Word still emits for charts and for screenshots from older Office versions. None of these results survives a pull-request review in Git.
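
To see this layout for yourself, the sketch below opens a .docx as a ZIP archive and lists what sits under `word/media/`. It is only an illustration using Python's standard library, not Word2MD.net's code: the function names, the record fields, and the stand-in archive built by `make_sample_docx` are all hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Formats that generally need converting before a browser can display them.
NEEDS_CONVERSION = {".emf", ".wmf"}


def list_embedded_images(docx_bytes: bytes) -> list[dict]:
    """Return one record per file stored under word/media/ in a .docx archive."""
    records = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for info in archive.infolist():
            path = PurePosixPath(info.filename)
            # Skip directory entries and anything outside word/media/.
            if info.is_dir() or path.parts[:2] != ("word", "media"):
                continue
            records.append({
                "name": path.name,
                "size": info.file_size,
                "needs_conversion": path.suffix.lower() in NEEDS_CONVERSION,
            })
    return records


def make_sample_docx() -> bytes:
    """Build a tiny stand-in archive (not a valid Word file) for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/media/image1.png", b"\x89PNG fake bytes")
        archive.writestr("word/media/image2.emf", b"fake emf bytes")
    return buffer.getvalue()


if __name__ == "__main__":
    for record in list_embedded_images(make_sample_docx()):
        print(record)
```

To try it on a real file, pass the file's bytes (for example `Path("spec.docx").read_bytes()`) to `list_embedded_images`.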

Word2MD.net's image pipeline

Extraction: each embedded image is unpacked from the .docx ZIP and given a stable filename derived from the document name plus a hash.
Format normalization: EMF and WMF — which most browsers can't render — are converted to PNG on the fly.
Sidecar export: in batch mode, images ship in an `images/` folder inside the result ZIP, alongside the Markdown that references them with clean relative paths.
Alt text preservation: if the original Word doc had alt text on the image, it's carried into the `![alt]()` syntax — critical for accessibility and SEO.
OCR option: for screenshots of UIs, terminals, or scanned pages, AI OCR runs alongside the image link to extract searchable text.
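
The extraction, sidecar export, and alt text steps above can be sketched roughly as follows. This is a hedged illustration, not Word2MD.net's implementation: the naming scheme (lowercased document stem plus the first eight hex characters of a SHA-256 content hash), the function names, and the bracket escaping are assumptions. EMF/WMF conversion is left out because it needs an external renderer.

```python
import hashlib
import io
import zipfile
from pathlib import PurePosixPath


def stable_image_name(doc_name: str, data: bytes, suffix: str) -> str:
    """Derive a filename from the document stem plus a short content hash."""
    stem = PurePosixPath(doc_name).stem.lower().replace(" ", "-")
    digest = hashlib.sha256(data).hexdigest()[:8]  # assumed hash and length
    return f"{stem}-{digest}{suffix.lower()}"


def export_sidecar_images(doc_name: str, docx_bytes: bytes) -> tuple[bytes, dict[str, str]]:
    """Copy word/media/* into an images/ folder inside a new ZIP.

    Returns the ZIP bytes and a map from each original media path to the
    relative path that the Markdown file would reference.
    """
    mapping: dict[str, str] = {}
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                path = PurePosixPath(info.filename)
                if info.is_dir() or path.parts[:2] != ("word", "media"):
                    continue
                data = src.read(info)
                target = "images/" + stable_image_name(doc_name, data, path.suffix)
                # Identical bytes hash to the same name, so write them once.
                if target not in mapping.values():
                    dst.writestr(target, data)
                mapping[info.filename] = target
    return out.getvalue(), mapping


def image_markdown(alt_text: str, relative_path: str) -> str:
    """Render a Markdown image reference, escaping brackets in the alt text."""
    safe_alt = alt_text.replace("[", "\\[").replace("]", "\\]")
    return f"![{safe_alt}]({relative_path})"
```

Because the name depends only on the document name and the image bytes, re-running the conversion on an unchanged document yields the same paths, which keeps Git diffs quiet.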

AI OCR for screenshots and scans

Screenshots break RAG pipelines because a text index cannot see inside images. A 50-page product spec with 80 UI screenshots loses whatever those 80 screenshots say the moment you ingest it. Word2MD.net's optional OCR pass extracts the visible text from each image and emits it next to the `![alt]()` link as a fenced code block. The image still ships for human readers; the text version makes the same content retrievable by LlamaIndex, LangChain, Haystack, or any custom embedding pipeline. For knowledge-base teams this single feature often justifies the conversion.
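
As a rough picture of that output shape, the sketch below places OCR text in a fenced block directly under the image link. The OCR step itself is out of scope here, so `ocr_text` is assumed to come from whatever engine you use. The function name, the `text` info string on the fence, and the skipping of empty results are illustrative assumptions, not documented Word2MD.net behavior.

```python
FENCE = "`" * 3  # built at runtime so this example nests cleanly inside Markdown


def image_with_ocr(alt_text: str, relative_path: str, ocr_text: str) -> str:
    """Return an image link followed by its OCR text as a fenced block."""
    link = f"![{alt_text}]({relative_path})"
    text = ocr_text.strip()
    if not text:
        return link  # nothing recognised: emit the link alone
    fence = FENCE
    # Lengthen the fence if the OCR text itself contains a run of backticks.
    while fence in text:
        fence += "`"
    return f"{link}\n\n{fence}text\n{text}\n{fence}"


if __name__ == "__main__":
    print(image_with_ocr(
        "Terminal showing a failed database connection",
        "images/spec-1a2b3c4d.png",
        "Error: ECONNREFUSED 127.0.0.1:5432",
    ))
```

Keeping the text in the same chunk as the link means a retriever that matches the error message also surfaces the screenshot it came from.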

When to embed, link, or OCR — decision rule

Use inline base64 only for documents under 50 KB total that you'll publish on a single static page (rare). Use sidecar export with relative links for anything that lives in a Git repo — it gives reviewers normal image diffs and keeps the Markdown human-readable. Add OCR when the image content itself carries information your readers or LLMs need to search: error messages in screenshots, code in terminal grabs, labels in diagrams. Skip OCR for decorative photos — it adds noise. The honest test: if a reader could ignore the image and lose nothing, you don't need OCR.
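
The rule above is small enough to write down as a function. The sketch below is one possible encoding, with assumptions labeled: "50 KB" is read as 50 × 1024 bytes, sidecar export is the default whenever the inline conditions are not met, and the names `ImagePlan` and `plan_images` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: "under 50 KB total" is interpreted as fewer than 50 * 1024 bytes.
INLINE_LIMIT_BYTES = 50 * 1024


@dataclass(frozen=True)
class ImagePlan:
    delivery: str  # "inline-base64" or "sidecar"
    run_ocr: bool


def plan_images(doc_size_bytes: int, single_static_page: bool,
                image_carries_information: bool) -> ImagePlan:
    """Pick a delivery mode and decide whether OCR is worth running."""
    inline_ok = single_static_page and doc_size_bytes < INLINE_LIMIT_BYTES
    # Assumption: anything that does not qualify for inlining gets sidecar files.
    delivery = "inline-base64" if inline_ok else "sidecar"
    # OCR only when a reader or retriever would lose something without the text.
    return ImagePlan(delivery=delivery, run_ocr=image_carries_information)


if __name__ == "__main__":
    # A 2 MB runbook destined for a Git repo, full of terminal screenshots.
    print(plan_images(2 * 1024 * 1024, single_static_page=False,
                      image_carries_information=True))
```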
