CM-PDF

Convert PDF forms into clean JSON

Upload a PDF, extract its form fields, review the AI-suggested labels, and produce a validated Schema V3 structure ready for downstream automation.

01 Widget extraction
02 Human review
03 Schema V3

Form filler

Open completed forms and download filled PDFs.

Platform Demo

How CM-PDF works

CM-PDF is a human-in-the-loop pipeline for turning messy PDF forms into reliable structured data. AI does the heavy extraction work, while reviewers keep control over labels, widget ownership, table structure, and the final database update.

1. Upload a PDF form

The platform stores the source PDF, detects duplicates, and prepares it for extraction.

2. Extract PDF widgets

Each AcroForm field is detected along with its page, type, bounds, and nearby OCR text, which gives reviewers context.

3. Link widgets to labels

A vision model proposes human-readable labels, then reviewers correct missed or ambiguous fields.

4. Build Schema V3

The form becomes clean JSON: sections, text, fields, groups, irregular tables, and repeat groups.

5. Review and update DB

Before the final output is saved, validation blocks any schema in which a widget is owned more than once, is unknown to the extraction, or has no owner at all.
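The ownership check in step 5 can be pictured with a small sketch. This is not CM-PDF's actual validator. It assumes the schema is a plain dict shaped like the preview further down the page, where fields reference widgets through `widgets` and `choices`. The function names `iter_widget_ids` and `check_ownership` are hypothetical, and nested groups or tables are not handled.

```python
from collections import Counter


def iter_widget_ids(schema):
    """Yield every widget_id referenced by top-level items on each page.

    Assumption: items reference widgets via "widgets" or "choices" lists,
    as in the preview JSON. Nested groups/tables are not walked here.
    """
    for page in schema.get("pages", []):
        for item in page.get("items", []):
            for widget in item.get("widgets", []):
                yield widget["widget_id"]
            for choice in item.get("choices", []):
                yield choice["widget_id"]


def check_ownership(schema, extracted_ids):
    """Compare widgets claimed by the schema with widgets found in the PDF."""
    counts = Counter(iter_widget_ids(schema))
    claimed = set(counts)
    extracted = set(extracted_ids)
    return {
        # claimed by more than one field or choice
        "duplicate": sorted(w for w, n in counts.items() if n > 1),
        # claimed by the schema but never extracted from the PDF
        "unknown": sorted(claimed - extracted),
        # extracted from the PDF but not owned by anything in the schema
        "missing": sorted(extracted - claimed),
    }


def is_saveable(report):
    """A save would be allowed only when all three problem lists are empty."""
    return not any(report.values())
```

In a setup like this, a reviewer tool could show the three lists next to the page and keep the save action disabled until `is_saveable` returns `True`.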

Live workflow preview

Stage 3 Form Builder

18/18 widgets | Schema V3
PDF widgets | Page 1
page1-widget1 - First name
page1-widget2 - Date of birth
Yes
No
reviewer links fields to source widgets
Clean schema output | stored in MongoDB
{
  "schema_version": 3,
  "title": "Patient Intake",
  "pages": [{
    "page": 1,
    "items": [
      { "kind": "section",
        "label": "Applicant" },
      { "kind": "field",
        "label": "First name",
        "input_type": "text",
        "widgets": [
          { "widget_id": "page1-widget1" }
        ] },
      { "kind": "field",
        "label": "Consent?",
        "input_type": "radio",
        "choices": [
          { "label": "Yes",
            "widget_id": "page1-widget3" },
          { "label": "No",
            "widget_id": "page1-widget4" }
        ] }
    ]
  }]
}
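As a rough illustration of how downstream automation might consume the output above, the following sketch builds a label-to-widget index from the same sample. The helper name `label_index` is hypothetical, and only the two field shapes shown in the preview (`widgets` and `choices`) are assumed. Other item kinds are skipped.

```python
import json

# The preview JSON from above, reproduced so the sketch is self-contained.
SAMPLE = """
{
  "schema_version": 3,
  "title": "Patient Intake",
  "pages": [{
    "page": 1,
    "items": [
      { "kind": "section", "label": "Applicant" },
      { "kind": "field", "label": "First name", "input_type": "text",
        "widgets": [ { "widget_id": "page1-widget1" } ] },
      { "kind": "field", "label": "Consent?", "input_type": "radio",
        "choices": [
          { "label": "Yes", "widget_id": "page1-widget3" },
          { "label": "No", "widget_id": "page1-widget4" }
        ] }
    ]
  }]
}
"""


def label_index(schema):
    """Map each field label to the widget IDs that back it."""
    index = {}
    for page in schema["pages"]:
        for item in page["items"]:
            if item.get("kind") != "field":
                continue  # sections and other non-field items own no widgets here
            ids = [w["widget_id"] for w in item.get("widgets", [])]
            ids += [c["widget_id"] for c in item.get("choices", [])]
            index[item["label"]] = ids
    return index


index = label_index(json.loads(SAMPLE))
print(index)
# {'First name': ['page1-widget1'], 'Consent?': ['page1-widget3', 'page1-widget4']}
```

An index like this is one way a form filler could look up which PDF widgets to write to for a given label.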

Precise Highlighting

Review every extracted PDF widget against the original page.

Smart Workspace

Move through extraction, linking, and schema review in one flow.

Universal Export

Save validated Schema V3 output for automation and training data.