Introducing word2md-cli: Convert Docx to Markdown in the Terminal
We built a command-line version of Word2MD for batch jobs, CI pipelines, and scripting workflows. Install in seconds with npx; optional AI-powered image OCR is included.
April 2026
Word2MD.net has always focused on making docx-to-markdown conversion fast and private in the browser. But over the last few months, we kept hearing the same ask: "Can I automate this?" Developers wanted to convert hundreds of files in CI, writers wanted to drop a script into their publishing pipeline, and AI teams wanted to preprocess documentation into markdown for their RAG systems. So we shipped word2md-cli — a tiny Node.js command-line tool that brings the same conversion engine to your terminal.
Install and run in one command
No setup, no config. npx fetches and runs it on demand:
npx word2md-cli input.docx
That's it — you get input.md in the same directory. Prefer a global install?
npm install -g word2md-cli
word2md input.docx
What you can do with it
Convert a single file
word2md input.docx # → input.md next to source
word2md input.docx -o custom.md # custom output path
word2md input.docx --stdout # pipe to another command
The --stdout flag is great for chaining:
word2md report.docx --stdout | pandoc -f markdown -t html -o report.html
Batch convert a whole folder
word2md ./docs/*.docx -d ./markdown/
Ideal for migrating SharePoint exports, Confluence archives, or Google Docs downloads into a modern static site.
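A glob like ./docs/*.docx only matches one folder level, so nested exports need a little more help. The following is a minimal sketch of a wrapper that walks a tree and mirrors its layout. It is not part of word2md-cli: out_path and convert_tree are hypothetical helper names, it assumes word2md is on your PATH, and it relies only on the -o flag shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: convert every .docx under a source tree, mirroring the folder layout.
# Assumes word2md is installed globally; uses only the documented -o flag.
set -eu

# Map a source .docx path to its .md path under the destination root.
out_path() {
  local src_root="${1%/}" file="$2" dest_root="${3%/}"
  local rel="${file#"$src_root"/}"      # path relative to the source root
  printf '%s/%s.md\n' "$dest_root" "${rel%.docx}"
}

# Find all .docx files and convert them one by one.
# Note: paths containing newlines are not handled by this simple loop.
convert_tree() {
  local src_root="$1" dest_root="$2" file target
  find "$src_root" -type f -name '*.docx' | while IFS= read -r file; do
    target="$(out_path "$src_root" "$file" "$dest_root")"
    mkdir -p "$(dirname "$target")"
    word2md "$file" -o "$target"
  done
}

# Only run when both directories are given on the command line.
if [ "$#" -eq 2 ]; then
  convert_tree "$1" "$2"
fi
```

Saved as, say, convert-tree.sh, it would be called as `./convert-tree.sh ./docs ./markdown`. Writing each file with -o keeps the sketch independent of how -d treats nested paths.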
Extract text from embedded images (OCR)
Pass --ocr to enable image OCR via PaddleX. Screenshots, diagrams, and scanned pages get their text extracted and inlined into the markdown:
export PADDLEX_OCR_URL="https://..."
export PADDLEX_OCR_TOKEN="..."
word2md input.docx --ocr --ocr-concurrency 4
Or pass credentials as flags:
word2md input.docx --ocr \
  --paddlex-url "https://..." \
  --paddlex-token "xxx"
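If a script may run on machines where the PaddleX variables are not set, it can help to check for them before passing --ocr. Here is a minimal sketch, assuming word2md is on your PATH. The ocr_configured and convert helpers are hypothetical names, not part of the CLI, and the sketch makes no claim about how word2md itself reacts to missing credentials.

```shell
#!/usr/bin/env bash
# Sketch: enable OCR only when both PaddleX environment variables are present.
set -eu

# Succeeds only when PADDLEX_OCR_URL and PADDLEX_OCR_TOKEN are both non-empty.
ocr_configured() {
  [ -n "${PADDLEX_OCR_URL:-}" ] && [ -n "${PADDLEX_OCR_TOKEN:-}" ]
}

# Convert one file, falling back to a plain conversion without credentials.
convert() {
  if ocr_configured; then
    word2md "$1" --ocr
  else
    echo "PaddleX credentials not set; converting $1 without OCR" >&2
    word2md "$1"
  fi
}

# Convert every file passed on the command line.
if [ "$#" -ge 1 ]; then
  for f in "$@"; do
    convert "$f"
  done
fi
```

The fallback keeps a batch job running on a laptop without credentials while still using OCR in environments where the variables are exported.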
Plain text output
Strip markdown syntax for clean prose — useful when feeding docs into LLM pipelines:
word2md input.docx --format text -o plain.txt
CI/CD integration
Drop it into a GitHub Action to auto-convert every docx committed to your repo:
- name: Convert Word docs to Markdown
  run: npx word2md-cli docs/*.docx -d site/content/
Combine with Astro, Hugo, or Next.js and you have a self-updating documentation site that accepts Word files as input. Non-technical contributors keep writing in Word. Engineers keep shipping markdown. Everyone wins.
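The step above reconverts every file in docs/ on each run. In a larger repo you might prefer to convert only the documents that changed between two commits. The script below is a rough sketch of that idea, not something shipped with word2md-cli: only_docx and convert_changed are hypothetical helpers, it assumes word2md and git are on the PATH, and it assumes the checkout contains enough history for both commits to exist.

```shell
#!/usr/bin/env bash
# Sketch: convert only the .docx files added or modified between two commits.
set -eu

# Keep only paths ending in .docx from a newline-separated list on stdin.
# "|| true" stops an empty result from counting as a failure.
only_docx() {
  grep -E '\.docx$' || true
}

# List changed files with git, then convert each one into the destination.
# Output names are flattened, so two files with the same base name collide.
convert_changed() {
  local base="$1" head="$2" dest="${3%/}" file
  mkdir -p "$dest"
  git diff --name-only --diff-filter=AM "$base" "$head" | only_docx |
    while IFS= read -r file; do
      word2md "$file" -o "$dest/$(basename "$file" .docx).md"
    done
}

# Expect: <base-commit> <head-commit> <destination-dir>
if [ "$#" -eq 3 ]; then
  convert_changed "$1" "$2" "$3"
fi
```

Called as `./convert-changed.sh HEAD~1 HEAD site/content`, it converts whatever the last commit touched. The --diff-filter=AM option limits the list to added and modified files, so deleted documents are skipped.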
CLI vs. the web app
| Feature | Web app | CLI |
|---|---|---|
| Base conversion | Browser (client-side) | Local Node.js |
| Batch processing | Drag multiple files | Glob patterns, scripts |
| Image OCR | Built-in API | Bring your own PaddleX credentials |
| Automation | ❌ | ✅ Pipes, cron, CI |
| Live preview | ✅ | ❌ (pipe to a viewer) |
Same conversion engine (mammoth + custom post-processing), same output. The CLI is just the scriptable surface.
Open source
word2md-cli is MIT-licensed on GitHub. Issues, feature requests, and PRs welcome. The code is intentionally small — around 150 lines of TypeScript — so it's easy to audit, fork, or extend with your own rules.
What's next
- --watch mode that auto-converts files on save
- --api-key flag that uses your Word2MD.net account for OCR (no PaddleX setup needed)
- More input formats: PDF, RTF, ODT
Opinions on priority? Drop a note on GitHub.
Meanwhile — go try it:
npx word2md-cli some.docx
Thirty seconds from now, you'll have markdown.