Introducing word2md-cli: Convert Docx to Markdown from the Terminal
We built a command-line version of Word2MD for batch jobs, CI pipelines, and scripting workflows. It installs in seconds with npx, with optional AI-powered image OCR.
April 2026
Word2MD.net has always focused on making docx-to-markdown conversion fast and private in the browser. But over the last few months, we kept hearing the same ask: "Can I automate this?" Developers wanted to convert hundreds of files in CI, writers wanted to drop a script into their publishing pipeline, and AI teams wanted to preprocess documentation into markdown for their RAG systems. So we shipped word2md-cli — a tiny Node.js command-line tool that brings the same conversion engine to your terminal.
Install and run in one command
No setup, no config. npx fetches and runs it on demand:
```bash
npx word2md-cli input.docx
```
That's it — you get input.md in the same directory. Prefer a global install?
```bash
npm install -g word2md-cli
word2md input.docx
```
What you can do with it
Convert a single file
```bash
word2md input.docx                # → input.md next to source
word2md input.docx -o custom.md   # custom output path
word2md input.docx --stdout       # pipe to another command
```
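The three commands above differ only in where the result goes. Below is a minimal sketch of that rule, not word2md-cli's source: resolveOutput and CliOptions are hypothetical names, and we assume the default simply swaps a trailing .docx for .md in the same directory, and that --stdout wins if it is combined with -o.

```typescript
// Hypothetical sketch of how a converter CLI could pick its output destination.
// Assumptions: default output swaps ".docx" for ".md" next to the source;
// --stdout takes precedence over -o.

interface CliOptions {
  output?: string; // -o <path>
  stdout?: boolean; // --stdout
}

// Returns the file path to write, or null when the result should go to stdout.
function resolveOutput(input: string, opts: CliOptions): string | null {
  if (opts.stdout) return null; // nothing is written to disk
  if (opts.output) return opts.output; // -o uses the exact path given
  // Default: same directory, same base name, .md extension.
  return input.replace(/\.docx$/i, "") + ".md";
}
```

Under these assumptions, resolveOutput("docs/report.docx", {}) returns "docs/report.md", and a null result tells the caller to print to stdout instead of writing a file.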
The --stdout flag is great for chaining:
```bash
word2md report.docx --stdout | pandoc -f markdown -t html -o report.html
```
Batch convert a whole folder
```bash
word2md ./docs/*.docx -d ./markdown/
```
Ideal for migrating SharePoint exports, Confluence archives, or Google Docs downloads into a modern static site.
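On POSIX shells such as bash and zsh, the unquoted ./docs/*.docx is expanded before word2md runs, so the tool just receives a list of paths. Because -d puts every output in one folder, two inputs with the same file name (a/notes.docx and b/notes.docx) would map to the same markdown file. The post doesn't say how the CLI handles that case; the hypothetical planBatch below is only a sketch that maps each input to dir/name.md and reports collisions rather than overwriting silently.

```typescript
// Hypothetical sketch of batch planning for: word2md ./docs/*.docx -d ./markdown/
// The shell has already expanded the glob, so we get a plain list of paths.

interface BatchPlan {
  jobs: Array<{ input: string; output: string }>;
  collisions: string[]; // output paths claimed by more than one input
}

// Strip any directory prefix (handles "/" and "\" separators).
function baseName(p: string): string {
  const parts = p.split(/[\\/]/);
  return parts[parts.length - 1];
}

function planBatch(inputs: string[], outDir: string): BatchPlan {
  const dir = outDir.replace(/[\\/]+$/, ""); // drop trailing slash(es)
  const counts = new Map<string, number>();
  const jobs = inputs.map((input) => {
    const output = `${dir}/${baseName(input).replace(/\.docx$/i, "")}.md`;
    counts.set(output, (counts.get(output) ?? 0) + 1);
    return { input, output };
  });
  // Any output path produced more than once is a collision.
  const collisions = Array.from(counts.entries())
    .filter(([, n]) => n > 1)
    .map(([out]) => out);
  return { jobs, collisions };
}
```

A caller could refuse to run, or fall back to mirroring the source folder structure, whenever collisions is non-empty; which behavior is right is a design choice the sketch leaves open.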
Extract text from embedded images (OCR)
Pass --ocr to enable image OCR via PaddleX. Screenshots, diagrams, and scanned pages get their text extracted and inlined into the markdown:
```bash
export PADDLEX_OCR_URL="https://..."
export PADDLEX_OCR_TOKEN="..."
word2md input.docx --ocr --ocr-concurrency 4
```
Or pass credentials as flags:
```bash
word2md input.docx --ocr \
  --paddlex-url "https://..." \
  --paddlex-token "xxx"
```
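The --ocr-concurrency 4 flag in the first OCR example presumably caps how many images are sent for OCR at the same time. A common way to implement such a cap is a small worker pool: N workers keep pulling the next item off a shared index until the list is exhausted. This is a minimal sketch of that pattern, not word2md-cli's code; mapWithConcurrency is a hypothetical name, and the fn argument stands in for whatever call actually posts an image to the PaddleX endpoint.

```typescript
// Hypothetical sketch: map over items with at most `limit` async calls in flight,
// the kind of cap a flag like --ocr-concurrency 4 implies.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each worker claims one index at a time; JS is single-threaded,
  // so next++ cannot race between workers.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i); // results keep input order
    }
  }

  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

A call like mapWithConcurrency(imagePaths, 4, ocrImage), with ocrImage a hypothetical request function, would keep four requests open at once. Higher limits can finish sooner but put more load on the OCR server.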
Plain text output
Strip markdown syntax for clean prose — useful when feeding docs into LLM pipelines:
```bash
word2md input.docx --format text -o plain.txt
```
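The post doesn't spell out which rules --format text applies. As an illustration only, here is a hypothetical, regex-based stripMarkdown that removes the common constructs (headings, quote and list markers, images, links, inline code, bold, italics) line by line. A production version would more likely walk a markdown AST; line regexes miss cases such as nested emphasis or asterisks inside code spans.

```typescript
// Hypothetical sketch of a markdown-to-plain-text pass in the spirit of
// --format text. Line-based regexes, not a full CommonMark parser.
function stripLine(line: string): string {
  return line
    .replace(/^\s{0,3}#{1,6}\s+/, "") // ATX headings: "## Title" -> "Title"
    .replace(/^\s{0,3}>\s?/, "") // blockquote marker
    .replace(/^(\s*)(?:[-*+]|\d+\.)\s+/, "$1") // bullet / numbered list markers
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images -> alt text (before links)
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // links -> link text
    .replace(/`([^`]*)`/g, "$1") // inline code -> its contents
    .replace(/(\*\*|__)(.+?)\1/g, "$2") // bold: **x** or __x__
    .replace(/\*(.+?)\*/g, "$1"); // italics: *x*
}

function stripMarkdown(md: string): string {
  return md.split("\n").map(stripLine).join("\n");
}
```

For example, stripMarkdown("# Title\n- **bold** [link](https://example.com)") yields "Title\nbold link".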
CI/CD integration
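Drop it into a GitHub Actions workflow to auto-convert the .docx files committed under docs/ (check out the repository first so the files exist on the runner):
```yaml
- uses: actions/checkout@v4
- name: Convert Word docs to Markdown
  run: npx word2md-cli docs/*.docx -d site/content/
```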
Combine with Astro, Hugo, or Next.js and you have a self-updating documentation site that accepts Word files as input. Non-technical contributors keep writing in Word. Engineers keep shipping markdown. Everyone wins.
CLI vs. the web app
| Feature | Web app | CLI |
|---|---|---|
| Base conversion | Browser (client-side) | Local Node.js |
| Batch processing | Drag multiple files | Glob patterns, scripts |
| Image OCR | Built-in API | BYO PaddleX credentials |
| Automation | ❌ | ✅ Pipes, cron, CI |
| Live preview | ✅ | ❌ (pipe to a viewer) |
Same conversion engine (mammoth + custom post-processing), same output. The CLI is just the scriptable surface.
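The post doesn't list what the custom post-processing does. As a hypothetical illustration of the kind of cleanup that commonly follows a docx-to-markdown conversion (normalizeMarkdown is an illustrative name, and these rules are our assumption, not word2md's), here is a pure string pass. Keeping such a step free of Node or browser APIs is one way a single engine can run unchanged in both the web app and the CLI.

```typescript
// Hypothetical post-processing pass over converter output.
// Illustrative rules only; word2md's real post-processing is not documented here.
function normalizeMarkdown(md: string): string {
  return md
    .replace(/\r\n?/g, "\n") // CRLF / CR -> LF
    .replace(/\u00a0/g, " ") // non-breaking spaces (common in Word) -> plain spaces
    .replace(/\n{3,}/g, "\n\n") // at most one blank line between blocks
    .replace(/^\n+/, "") // no leading blank lines
    .replace(/\n*$/, "\n"); // exactly one trailing newline
}
```

It deliberately leaves trailing spaces alone: in markdown, two trailing spaces before a newline mark a hard line break, so trimming them would change the rendered output.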
Open source
word2md-cli is MIT-licensed on GitHub. Issues, feature requests, and PRs welcome. The code is intentionally small — around 150 lines of TypeScript — so it's easy to audit, fork, or extend with your own rules.
What's next
- --watch mode that auto-converts files on save
- --api-key flag that uses your Word2MD.net account for OCR (no PaddleX setup needed)
- More input formats: PDF, RTF, ODT
Opinions on priority? Drop a note on GitHub.
Meanwhile — go try it:
```bash
npx word2md-cli some.docx
```
Thirty seconds from now you have markdown.