dxpdf

A fast, standalone DOCX-to-PDF converter written in Rust and powered by Skia — no Microsoft Office, LibreOffice, or cloud API required. Available as a CLI tool, Rust library, and Python package.


The Problem

Converting Word documents to PDF is one of the most common tasks in business software. Invoices, contracts, reports, compliance forms — they start as .docx files and need to become PDFs for sharing, archiving, or printing.

The existing solutions all come with significant tradeoffs:

  • Microsoft Office / LibreOffice — requires installing a full office suite on every server. LibreOffice's headless mode is slow, memory-hungry, and produces inconsistent results across versions. Scaling means running multiple instances that consume gigabytes of RAM.
  • Cloud APIs (Google Docs, Adobe, CloudConvert) — adds latency, costs per conversion, and sends potentially sensitive documents to third-party servers. Not viable for regulated industries or air-gapped environments.
  • HTML-to-PDF tools (wkhtmltopdf, Puppeteer) — requires converting DOCX to HTML first, losing formatting fidelity. Tables, headers/footers, and page breaks rarely survive the round trip.

None of these work well when you need fast, accurate, offline conversion at scale — especially in automated pipelines, CI/CD systems, or embedded applications where installing LibreOffice is not an option.

How dxpdf Solves It

dxpdf is a standalone DOCX-to-PDF converter written in Rust and powered by Google's Skia graphics library. It reads .docx files directly, parses the OOXML structure, and renders pixel-accurate PDF output — all in a single binary with no external dependencies beyond Skia.

A Flutter-inspired measure-layout-paint pipeline ensures that text wrapping, table sizing, and page breaks match what Microsoft Word produces:

DOCX (ZIP) → Parse → Document Model → Measure → Layout → Paint → PDF

The result is a converter that runs in about 113 ms on a 3-page document with tables and images, with a peak memory footprint of 19 MB. That is fast enough to run inline in a request handler or to batch-process thousands of documents.
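To make the first two stages of that pipeline (DOCX as a ZIP archive, then parsing) more concrete, here is a minimal sketch in Python using only the standard library. It is not dxpdf's parser: it assumes the main document part lives at word/document.xml, which is the conventional location, and it only collects paragraph text. The helper names extract_paragraphs and make_minimal_docx are hypothetical, and the demo package it builds is not a complete DOCX that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by word/document.xml in OOXML packages.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the plain text of each top-level paragraph in a DOCX file."""
    # Stage 1: a .docx file is a ZIP archive; open it from memory.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        # Assumption: the main part is at the conventional path.
        xml_bytes = package.read("word/document.xml")

    # Stage 2: parse the XML and walk the body's paragraphs.
    root = ET.fromstring(xml_bytes)
    body = root.find(f"{{{W_NS}}}body")
    paragraphs = []
    if body is None:
        return paragraphs
    for p in body.findall(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over runs (w:r), each holding w:t nodes.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


def make_minimal_docx(lines: list[str]) -> bytes:
    """Build a tiny in-memory package with only word/document.xml (demo only)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(line)}</w:t></w:r></w:p>" for line in lines
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue()
```

A real converter still has to resolve styles, measure text, lay out pages, and paint them; this sketch stops at the point where a document model would start to be built.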

What It Supports

dxpdf implements 34 OOXML features with full coverage, including:

  • Text formatting — bold, italic, underline, font size, font family, color, character spacing, superscript, subscript, run shading
  • Paragraphs — alignment (left, center, right), spacing, indentation, tab stops, borders, shading
  • Tables — column widths, cell margins with 3-level cascade, merged cells (horizontal and vertical), row heights, borders, cell shading, nested tables
  • Images — inline (PNG, JPEG, BMP, WebP) and floating/anchored with alignment and percentage-based positioning
  • Styles — paragraph and character styles with basedOn inheritance, document defaults, theme fonts
  • Headers and footers — text, images, page numbers (PAGE/NUMPAGES field codes)
  • Lists — bullets, decimal, lower/upper letter, lower/upper roman with counter tracking
  • Hyperlinks — rendered as clickable PDF link annotations
  • Sections — multiple page sizes and margins, section breaks, portrait and landscape orientations
  • Layout — automatic pagination, word wrapping, line spacing modes, floating image text flow
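As an illustration of what basedOn inheritance with document defaults involves, the following Python sketch resolves the effective properties of a style by walking its ancestor chain. The dictionary layout, the property names, and the function resolve_style are all hypothetical and chosen for this example; they do not reflect dxpdf's internal model.

```python
def resolve_style(style_id, styles, defaults):
    """Merge document defaults, then each ancestor, then the style itself."""
    chain = []
    seen = set()
    current = style_id
    while current is not None:
        if current in seen:
            # Guard against malformed documents with circular basedOn links.
            raise ValueError(f"cyclic basedOn chain at {current!r}")
        seen.add(current)
        style = styles[current]
        chain.append(style)
        current = style.get("based_on")

    resolved = dict(defaults)
    for style in reversed(chain):  # base style first, most specific last
        resolved.update(style.get("props", {}))
    return resolved


# Hypothetical style table: Title -> Heading1 -> Normal.
styles = {
    "Normal": {"based_on": None, "props": {"font": "Calibri", "size_pt": 11}},
    "Heading1": {"based_on": "Normal", "props": {"size_pt": 16, "bold": True}},
    "Title": {"based_on": "Heading1", "props": {"size_pt": 28}},
}
defaults = {"font": "Times New Roman", "size_pt": 10, "bold": False}
```

With these inputs, resolve_style("Title", styles, defaults) takes the font from Normal, bold from Heading1, and the size from Title itself, so properties set closer to the requested style win.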

Three Ways to Use It

Command-line tool

Install and run with a single command:

cargo install dxpdf
dxpdf input.docx -o output.pdf
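For batch jobs, the same command can be driven from a script. The sketch below wraps the invocation shown above in Python's subprocess module and converts every .docx file in a directory. It assumes the dxpdf binary is on PATH and that it exits with a non-zero status on failure, which is not stated here; the function names build_command and convert_tree are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, dst: Path) -> list[str]:
    """Mirror the documented invocation: dxpdf input.docx -o output.pdf."""
    return ["dxpdf", str(src), "-o", str(dst)]


def convert_tree(src_dir: Path, dst_dir: Path, run=subprocess.run) -> list[Path]:
    """Convert every .docx directly under src_dir; return the target PDF paths."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(src_dir.glob("*.docx")):
        dst = dst_dir / (src.stem + ".pdf")
        # check=True raises CalledProcessError if the process exits non-zero
        # (assumed here to be how the CLI signals a failed conversion).
        run(build_command(src, dst), check=True)
        outputs.append(dst)
    return outputs
```

The run parameter exists only so the loop can be exercised without the binary installed; in normal use, convert_tree(Path("in"), Path("out")) is enough.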

Rust library

One function call — bytes in, bytes out:

let docx_bytes = std::fs::read("document.docx")?;
let pdf_bytes = dxpdf::convert(&docx_bytes)?;
std::fs::write("output.pdf", &pdf_bytes)?;

For more control, inspect the parsed document model before rendering:

use dxpdf::{parse, model};

let document = parse::parse(&std::fs::read("document.docx")?)?;

for block in &document.blocks {
    match block {
        model::Block::Paragraph(p) => { /* inspect paragraph */ }
        model::Block::Table(t) => { /* inspect table */ }
    }
}

let pdf_bytes = dxpdf::convert_document(&document)?;

Python package

Install from PyPI and use in any Python application:

pip install dxpdf

import dxpdf

# Bytes in, bytes out (the with-block closes the input file promptly)
with open("input.docx", "rb") as f:
    pdf_bytes = dxpdf.convert(f.read())

# File to file
dxpdf.convert_file("input.docx", "output.pdf")

Performance

Benchmarked on Apple M3 Max with hyperfine (20 runs, 3 warmup), converting a 3-page document with 11 tables, 2 images, and 2 sections:

Metric                  Value
Mean conversion time    113 ms
Peak memory (RSS)       19 MB
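To get comparable numbers for your own documents from inside Python, a small harness with the same shape (warmup runs followed by timed runs) might look like the sketch below. It times an arbitrary callable in-process, so its results are not directly comparable to the hyperfine figures above, which measure the whole command including process startup. The names benchmark and summarize are hypothetical.

```python
import statistics
import time


def benchmark(func, runs=20, warmup=3):
    """Call func() warmup times untimed, then return per-run timings in ms."""
    for _ in range(warmup):
        func()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(timings_ms):
    """Reduce a list of timings to mean, minimum, and maximum."""
    return {
        "mean_ms": statistics.mean(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }
```

A typical call would read a document into memory once and then pass something like lambda: dxpdf.convert(data) as func, so that disk reads are excluded from the measurement.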

The Rust core has 104 unit tests and 9 integration tests, including visual regression tests that compare rendered PDFs against Word-generated reference documents.

Real-World Use Cases

Automated document pipelines

CI/CD systems or batch processors that generate contracts, invoices, or reports from .docx templates. dxpdf runs as a single binary — no LibreOffice installation, no Docker image with a full desktop environment, no per-document API costs.

Regulated environments

Healthcare, legal, and financial applications where documents cannot leave the network. dxpdf runs fully offline with no external service calls, making it suitable for air-gapped and on-premises deployments.

Embedded and edge computing

IoT devices, kiosks, or lightweight containers where installing a 500 MB office suite is not practical. dxpdf's 19 MB memory footprint and sub-second conversion times make it viable for resource-constrained environments.

Python web applications

Django, Flask, or FastAPI backends that need to convert uploaded DOCX files on the fly. The Python bindings wrap the Rust core via PyO3, delivering native performance without a subprocess or external service.
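A framework-agnostic sketch of such an upload handler is shown below. It takes the converter as a parameter, so dxpdf.convert can be passed in production and a stub in tests. The size limit, the status codes, and the name handle_upload are choices made for this example, and because the exceptions raised by the bindings are not documented here, the sketch catches errors broadly.

```python
ZIP_MAGIC = b"PK\x03\x04"  # local file header signature that starts a ZIP archive
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # hypothetical limit, tune per deployment


def handle_upload(docx_bytes, convert):
    """Return (status, headers, body) for an uploaded DOCX.

    convert is any callable mapping DOCX bytes to PDF bytes,
    for example dxpdf.convert.
    """
    if len(docx_bytes) > MAX_UPLOAD_BYTES:
        return 413, {"Content-Type": "text/plain"}, b"file too large"
    if not docx_bytes.startswith(ZIP_MAGIC):
        # Cheap sanity check before handing the bytes to the converter.
        return 400, {"Content-Type": "text/plain"}, b"not a DOCX (ZIP) file"
    try:
        pdf_bytes = convert(docx_bytes)
    except Exception:
        # Assumption: any exception from the converter means the input
        # could not be processed; report a generic failure.
        return 422, {"Content-Type": "text/plain"}, b"conversion failed"
    headers = {
        "Content-Type": "application/pdf",
        "Content-Length": str(len(pdf_bytes)),
    }
    return 200, headers, pdf_bytes
```

In Django, Flask, or FastAPI, the view would read the uploaded file's bytes, call handle_upload(data, dxpdf.convert), and map the returned tuple onto the framework's response object.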

Does dxpdf require an office suite?

No. dxpdf is a standalone converter that reads DOCX files directly and renders PDFs using Google's Skia graphics engine. It has no dependency on any office suite.

Top Contributors

thenixan (99 commits)

Repository Info

Technology Stack: Rust, Python
Version: v0.1.4
License: MIT
Contributors: 1
Last Update: Mar 21, 2026
