feat: Add support for .doc files using antiword #1771

Open
Jah-yee wants to merge 5 commits into microsoft:main from Jah-yee:feat/add-doc-support

Jah-yee commented Apr 14, 2026

Good day.

This PR adds support for converting legacy .doc files (the pre-Office Open XML binary format) to Markdown using the antiword command-line tool.

Changes

  • Add new class in
  • Register the converter in
  • Export from

The converter uses antiword, which is a system dependency (available as the antiword package via APT on Debian/Ubuntu).
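For readers unfamiliar with antiword, a converter built on it is essentially a subprocess call that reads the tool's plain-text output. The sketch below is a minimal illustration under stated assumptions, not the PR's DocConverter: the function name doc_to_markdown and its executable parameter are hypothetical, no antiword flags are passed (with no flags antiword prints the document text to stdout), a UTF-8 locale is assumed for decoding, and the only "Markdown" step is whitespace cleanup, since antiword emits plain text.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union


def doc_to_markdown(path: Union[str, Path], executable: str = "antiword") -> str:
    """Hypothetical wrapper: run antiword on a .doc file and tidy its text output."""
    # antiword is a system package, not a Python dependency, so check PATH first.
    exe = shutil.which(executable)
    if exe is None:
        raise RuntimeError(
            f"{executable!r} not found on PATH; on Debian/Ubuntu it ships "
            "as the 'antiword' package"
        )
    # With no flags, antiword prints the document's text to stdout.
    # Assumes a UTF-8 locale; undecodable bytes are replaced instead of failing.
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    completed = subprocess.run(
        [exe, str(path)],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    # Plain text in, so just strip trailing spaces and collapse runs of blank lines.
    cleaned: List[str] = []
    for line in completed.stdout.splitlines():
        line = line.rstrip()
        if line == "" and cleaned and cleaned[-1] == "":
            continue
        cleaned.append(line)
    return "\n".join(cleaned).strip() + "\n"
```

Calling doc_to_markdown("report.doc") returns the extracted text as a string, or raises RuntimeError when antiword is not installed. Failing early with a clear message matters here because pip cannot install antiword, so a missing binary is the most likely failure in a fresh environment.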

This resolves issue #23.

Thank you for your work on this project. I hope this small fix is helpful. Please let me know if there's anything to adjust.

Warmly, RoomWithOutRoof

OpenClaw AI and others added 5 commits March 11, 2026 23:40
- Add helper function _format_cell_value() to preserve currency symbols
- Support for USD ($), EUR (€), GBP (£), JPY (¥), and other currencies
- Support for percentage formatting
- Preserve decimal places from number format
- Use openpyxl directly instead of pandas for better format control

Fixes microsoft#53
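The idea in the first commit, rendering a spreadsheet cell the way its Excel number format displays it, can be sketched by reading the format string directly. This is a hypothetical helper, not the commit's _format_cell_value(): it looks only at the first section of the format, counts decimal places, detects %, thousands separators and the $, €, £, ¥ symbols, and always places the currency symbol as a prefix. With openpyxl, the format string would come from a cell's number_format attribute.

```python
import re

# Non-dollar symbols are checked first so a locale tag like "[$€-407]" maps to "€".
CURRENCY_SYMBOLS = ("€", "£", "¥", "$")


def format_cell_value_sketch(value, number_format: str) -> str:
    """Approximate how Excel displays a numeric cell (hypothetical helper)."""
    # Empty cells become empty strings; text, dates and booleans pass through.
    if value is None:
        return ""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return str(value)
    if not number_format or number_format == "General":
        return str(value)

    # Only the first ("positive numbers") section of "pos;neg;zero" formats is used.
    section = number_format.split(";")[0]
    symbol = next((s for s in CURRENCY_SYMBOLS if s in section), "")

    # Drop quoted literals and bracketed tags before reading the digit layout.
    layout = re.sub(r'"[^"]*"|\[[^\]]*\]', "", section)
    decimals_match = re.search(r"\.([0#]+)", layout)
    decimals = len(decimals_match.group(1)) if decimals_match else 0
    grouping = "," if "," in layout else ""

    number = float(value)
    suffix = ""
    if "%" in layout:
        # Excel stores 12.5% as 0.125 and multiplies by 100 for display.
        number *= 100
        suffix = "%"

    body = format(abs(number), f"{grouping}.{decimals}f")
    sign = "-" if number < 0 else ""
    # Assumption: the currency symbol is always rendered before the digits.
    return f"{sign}{symbol}{body}{suffix}"
```

For example, format_cell_value_sketch(1234.5, "$#,##0.00") gives "$1,234.50" and format_cell_value_sketch(0.125, "0.00%") gives "12.50%". Reading the format string, rather than letting pandas coerce cells to bare floats, is what keeps the symbol and the decimal places.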

- Changed [markitdown-mcp] to [markitdown_mcp] to match Python package naming convention
- Changed 'Youtube URLs' to 'YouTube URLs' to match proper branding
- Also updated comment in youtube-transcription to use consistent casing

Add DocConverter to convert legacy .doc files (pre-Office Open XML format)
to Markdown using the antiword command-line tool.

This resolves issue microsoft#23 by adding .doc extension support in addition
to existing .docx support.
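Supporting .doc alongside .docx means a converter has to tell the two apart, since they share a name but not a container. A minimal sketch, assuming an accepts-style check that sees the file extension, the MIME type, and the first bytes of the stream (the function name looks_like_legacy_doc is hypothetical, not the PR's code): legacy .doc files are OLE2 Compound File Binary containers that start with the signature D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP archives that start with "PK" followed by bytes 03 04.

```python
from typing import Optional

# Legacy .doc files are OLE2 Compound File Binary containers (8-byte signature below);
# .docx files are ZIP archives and start with b"PK\x03\x04" instead.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
DOC_MIMETYPE = "application/msword"


def looks_like_legacy_doc(
    header: bytes,
    extension: Optional[str] = None,
    mimetype: Optional[str] = None,
) -> bool:
    """Hypothetical accepts-style check for legacy Word .doc input."""
    ext_ok = (extension or "").lower() == ".doc"
    mime_ok = (mimetype or "").lower().startswith(DOC_MIMETYPE)
    if not (ext_ok or mime_ok):
        # .docx, .xls and other formats are left to their own converters.
        return False
    # The signature check rejects renamed files, e.g. a .docx saved as .doc.
    return header[:8] == OLE2_SIGNATURE
```

The extension or MIME gate comes first because .xls and .ppt use the same OLE2 container, so the signature alone cannot identify a Word document.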

Good day. Thank you for your work on this project. I hope this small
fix is helpful. Please let me know if there's anything to adjust.

Warmly, RoomWithOutRoof