What languages and platforms does dxpdf support?

dxpdf is written in Rust and available as a CLI tool (via cargo install), a Rust library (via crates.io), and a Python package (via PyPI). It runs on macOS, Linux, and Windows.

How accurate is the conversion?

dxpdf uses a Flutter-inspired measure-layout-paint pipeline designed for pixel-level fidelity. It supports 34 OOXML features including tables, images, headers/footers, and styles. Visual regression tests compare output against Word-generated references.

How fast is dxpdf?

On an Apple M3 Max, dxpdf converts a 3-page document with tables and images in approximately 113 ms with 19 MB peak memory usage. That is fast enough to run inline in web request handlers.

Can I use dxpdf in a Python application?

Yes. Install with pip install dxpdf. The Python package wraps the Rust core via PyO3, providing native performance. Use dxpdf.convert() for bytes-in/bytes-out or dxpdf.convert_file() for file-to-file conversion.

What DOCX features are not yet supported?

Features not yet implemented include strikethrough, text highlighting, small caps, text shadows, keep-with-next pagination, VML images, multi-column layouts, first-page headers, table of contents fields, footnotes, comments, text boxes, shapes, right-to-left text, and hyphenation.
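
A pre-flight check can flag documents that rely on some of these features before they are converted. The sketch below is an illustration only, not part of dxpdf: `find_unsupported` and the `UNSUPPORTED` mapping are hypothetical names, the mapping covers just a subset of the list above, and it only scans `word/document.xml`. It also reports an element whenever it is present, so a toggle that is explicitly switched off (for example `w:val="false"`) would still be flagged.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the document body.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Illustrative subset: element name -> feature from the unsupported list.
# This mapping is an assumption for the sketch, not dxpdf's own logic.
UNSUPPORTED = {
    W + "strike": "strikethrough",
    W + "highlight": "text highlighting",
    W + "smallCaps": "small caps",
    W + "keepNext": "keep-with-next pagination",
    W + "footnoteReference": "footnotes",
}


def find_unsupported(docx_bytes: bytes) -> set:
    """Return the unsupported features whose elements appear in the main body."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return {UNSUPPORTED[el.tag] for el in root.iter() if el.tag in UNSUPPORTED}
```

A caller could log the returned set or route flagged documents to a manual review step instead of converting them automatically.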

dxpdf

The Problem

Converting Word documents to PDF is one of the most common tasks in business software. Invoices, contracts, reports, compliance forms — they start as .docx files and need to become PDFs for sharing, archiving, or printing.

The existing solutions all come with significant tradeoffs:

Microsoft Office / LibreOffice — requires installing a full office suite on every server. LibreOffice's headless mode is slow, memory-hungry, and produces inconsistent results across versions. Scaling means running multiple instances that consume gigabytes of RAM.
Cloud APIs (Google Docs, Adobe, CloudConvert) — adds latency, costs per conversion, and sends potentially sensitive documents to third-party servers. Not viable for regulated industries or air-gapped environments.
HTML-to-PDF tools (wkhtmltopdf, Puppeteer) — requires converting DOCX to HTML first, losing formatting fidelity. Tables, headers/footers, and page breaks rarely survive the round trip.

None of these work well when you need fast, accurate, offline conversion at scale — especially in automated pipelines, CI/CD systems, or embedded applications where installing LibreOffice is not an option.

How dxpdf Solves It

dxpdf is a standalone DOCX-to-PDF converter written in Rust and powered by Google's Skia graphics library. It reads .docx files directly, parses the OOXML structure, and renders pixel-accurate PDF output — all in a single binary with no external dependencies beyond Skia.

A Flutter-inspired measure-layout-paint pipeline ensures that text wrapping, table sizing, and page breaks match what Microsoft Word produces:

DOCX (ZIP) → Parse → Document Model → Measure → Layout → Paint → PDF
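
To make the first stages of that pipeline concrete, here is a minimal Python sketch of "DOCX (ZIP) → Parse → Document Model" using only the standard library. It is not dxpdf's parser: the `Block` class and `parse_blocks` function are hypothetical names, and the model keeps only a block kind and its plain text. It assumes the main document part is stored at `word/document.xml`, which is the conventional location.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# WordprocessingML main namespace used by the document body.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class Block:
    kind: str  # "paragraph" or "table"
    text: str  # concatenated text of all runs inside the block


def parse_blocks(docx_bytes: bytes) -> list:
    """Unzip the package and turn top-level body children into blocks."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(W + "body")
    if body is None:
        return []
    blocks = []
    for child in body:
        if child.tag == W + "p":
            kind = "paragraph"
        elif child.tag == W + "tbl":
            kind = "table"
        else:
            continue  # e.g. w:sectPr, which carries section properties
        # Visible text lives in w:t elements nested inside runs.
        text = "".join(t.text or "" for t in child.iter(W + "t"))
        blocks.append(Block(kind, text))
    return blocks
```

The real document model has to carry far more than text (styles, borders, images, section properties), since the later measure and layout stages depend on it.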

The result is a converter that runs in ~113 ms on a 3-page document with tables and images, using just 19 MB of memory. That is fast enough to run inline in a request handler or to batch-process thousands of documents.

What It Supports

dxpdf implements 34 OOXML features with full coverage, including:

Text formatting — bold, italic, underline, font size, font family, color, character spacing, superscript, subscript, run shading
Paragraphs — alignment (left, center, right), spacing, indentation, tab stops, borders, shading
Tables — column widths, cell margins with 3-level cascade, merged cells (horizontal and vertical), row heights, borders, cell shading, nested tables
Images — inline (PNG, JPEG, BMP, WebP) and floating/anchored with alignment and percentage-based positioning
Styles — paragraph and character styles with basedOn inheritance, document defaults, theme fonts
Headers and footers — text, images, page numbers (PAGE/NUMPAGES field codes)
Lists — bullets, decimal, lower/upper letter, lower/upper roman with counter tracking
Hyperlinks — rendered as clickable PDF link annotations
Sections — multiple page sizes and margins, section breaks, portrait and landscape orientations
Layout — automatic pagination, word wrapping, line spacing modes, floating image text flow
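
The word wrapping and pagination listed under Layout can be illustrated with a toy measure-then-layout pass. This is a deliberately simplified sketch, not dxpdf's algorithm: it assumes every glyph has the same width, uses greedy line breaking, and gives every line the same height. The function names (`measure`, `layout_lines`, `paginate`) and the default `char_width` are made up for the example.

```python
def measure(word: str, char_width: float = 6.0) -> float:
    """Toy measure step: every glyph is assumed to be char_width points wide."""
    return len(word) * char_width


def layout_lines(text: str, max_width: float) -> list:
    """Greedy word wrap: start a new line when the next word would overflow."""
    space = measure(" ")
    lines, current, width = [], [], 0.0
    for word in text.split():
        word_width = measure(word)
        needed = word_width if not current else width + space + word_width
        if current and needed > max_width:
            lines.append(" ".join(current))
            current, width = [word], word_width
        else:
            current.append(word)
            width = needed
    if current:
        lines.append(" ".join(current))
    return lines


def paginate(lines: list, line_height: float, page_height: float) -> list:
    """Split laid-out lines into pages that hold a fixed number of lines."""
    per_page = max(1, int(page_height // line_height))
    return [lines[i:i + per_page] for i in range(0, len(lines), per_page)]
```

Separating measurement from layout means the same wrapping logic can be reused once real font metrics replace the fixed-width assumption.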

Three Ways to Use It

Command-line tool

Install and run with a single command:

cargo install dxpdf
dxpdf input.docx -o output.pdf
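
For batch jobs, the same command can be driven from a script. The following Python sketch only uses the `dxpdf <input> -o <output>` form shown above; `build_command` and `convert_directory` are hypothetical helper names, and the check on the return code assumes the CLI exits with a non-zero status on failure, which is not stated here.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, out_dir: Path) -> list:
    """Build the argument list for one conversion, mirroring the usage above."""
    target = out_dir / src.with_suffix(".pdf").name
    return ["dxpdf", str(src), "-o", str(target)]


def convert_directory(in_dir: Path, out_dir: Path) -> list:
    """Convert every .docx in in_dir and return the inputs that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(in_dir.glob("*.docx")):
        result = subprocess.run(build_command(src, out_dir))
        if result.returncode != 0:  # assumption: non-zero exit means failure
            failed.append(src)
    return failed
```

Calling `convert_directory(Path("inbox"), Path("pdf"))` requires the `dxpdf` binary to be on `PATH`.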

Rust library

One function call — bytes in, bytes out:

let docx_bytes = std::fs::read("document.docx")?;
let pdf_bytes = dxpdf::convert(&docx_bytes)?;
std::fs::write("output.pdf", &pdf_bytes)?;

For more control, inspect the parsed document model before rendering:

use dxpdf::{parse, model};

let document = parse::parse(&std::fs::read("document.docx")?)?;

for block in &document.blocks {
    match block {
        model::Block::Paragraph(p) => { /* inspect paragraph */ }
        model::Block::Table(t) => { /* inspect table */ }
    }
}

let pdf_bytes = dxpdf::convert_document(&document)?;

Python package

Install from PyPI and use in any Python application:

pip install dxpdf

import dxpdf

# Bytes in, bytes out (the with block closes the file handle promptly)
with open("input.docx", "rb") as f:
    pdf_bytes = dxpdf.convert(f.read())

# File to file
dxpdf.convert_file("input.docx", "output.pdf")
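
Because the bytes-in/bytes-out call is a plain function, it can sit directly inside a request handler. Below is a minimal WSGI sketch under a few assumptions: `make_app` is a hypothetical helper, the request body is the raw .docx file, and conversion failures surface as Python exceptions (the exception type dxpdf raises is not documented here, so the sketch catches `Exception` broadly). The converter is passed in as a parameter so the handler can be tested with a stub.

```python
from typing import Callable


def make_app(convert: Callable[[bytes], bytes]):
    """Return a WSGI app that converts a POSTed .docx body to PDF."""

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        docx_bytes = environ["wsgi.input"].read(length)
        try:
            pdf_bytes = convert(docx_bytes)
        except Exception:  # assumption: failures are raised as exceptions
            start_response("422 Unprocessable Entity", [("Content-Type", "text/plain")])
            return [b"conversion failed"]
        headers = [
            ("Content-Type", "application/pdf"),
            ("Content-Length", str(len(pdf_bytes))),
        ]
        start_response("200 OK", headers)
        return [pdf_bytes]

    return app
```

In an application this would be wired up as `app = make_app(dxpdf.convert)` and served by any WSGI server, for example `wsgiref.simple_server.make_server` during development.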

Performance

Benchmarked on Apple M3 Max with hyperfine (20 runs, 3 warmup), converting a 3-page document with 11 tables, 2 images, and 2 sections:

Metric	Value
Mean conversion time	113 ms
Peak memory (RSS)	19 MB
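
To get a comparable figure for your own documents from inside Python, a small timing harness is enough. The sketch below mirrors the 20 runs and 3 warmup runs used above; `mean_runtime_ms` is a hypothetical helper, not part of dxpdf. It times a callable in-process, whereas hyperfine times the whole CLI process including startup, so the two numbers are not directly interchangeable.

```python
import statistics
import time
from typing import Callable


def mean_runtime_ms(task: Callable[[], object], runs: int = 20, warmup: int = 3) -> float:
    """Run task a few times untimed, then return the mean wall time in ms."""
    for _ in range(warmup):
        task()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)
```

With dxpdf installed, `mean_runtime_ms(lambda: dxpdf.convert(docx_bytes))` would time the bytes-in/bytes-out call on an already loaded document.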

The Rust core has 104 unit tests and 9 integration tests, including visual regression tests that compare rendered PDFs against Word-generated reference documents.

Top Contributors

thenixan

markovdigital
