
Upload a PDF, extract its form fields, review AI labels, and produce a validated Schema V3 structure ready for downstream automation.
Form filler
Open completed forms and download filled PDFs.
Platform Demo
CM-PDF is a human-in-the-loop pipeline for turning messy PDF forms into reliable structured data. AI does the heavy extraction work, while reviewers keep control over labels, widget ownership, table structure, and the final database update.
The platform stores the source PDF, detects duplicates, and prepares it for extraction.
AcroForm fields are detected with page, type, bounds, and nearby OCR text for reviewer context.
A vision model proposes human-readable labels, then reviewers correct missed or ambiguous fields.
The form becomes clean JSON: sections, text, fields, groups, irregular tables, and repeat groups.
Validation blocks duplicate, unknown, or missing widget ownership before saving the final output.
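The validation step above can be sketched in a few lines. This is a minimal illustration, not the platform's actual validator. It assumes "duplicate" means a widget claimed more than once, "unknown" means the schema references a widget that extraction never produced, and "missing" means an extracted widget that no item owns. The function names `iter_widget_ids` and `check_widget_ownership` are hypothetical, and only flat fields with `widgets` or `choices` (the shapes shown in this page's Stage 3 preview) are handled; groups, tables, and repeat groups would need their own traversal.

```python
from collections import Counter


def iter_widget_ids(schema):
    """Yield every widget_id claimed by a field's widgets or choices."""
    for page in schema.get("pages", []):
        for item in page.get("items", []):
            for widget in item.get("widgets", []):
                yield widget["widget_id"]
            for choice in item.get("choices", []):
                yield choice["widget_id"]


def check_widget_ownership(schema, extracted_ids):
    """Return the widget ids that would block a save, grouped by problem."""
    claimed = Counter(iter_widget_ids(schema))
    extracted = set(extracted_ids)
    return {
        # Claimed by more than one field or choice.
        "duplicate": sorted(w for w, n in claimed.items() if n > 1),
        # Referenced by the schema but never extracted from the PDF.
        "unknown": sorted(w for w in claimed if w not in extracted),
        # Extracted from the PDF but owned by nothing in the schema.
        "missing": sorted(extracted - set(claimed)),
    }


# Deliberately broken example: both radio choices point at the same widget,
# and one field references a widget that was never extracted.
schema = {
    "schema_version": 3,
    "pages": [{
        "page": 1,
        "items": [
            {"kind": "field", "label": "First name", "input_type": "text",
             "widgets": [{"widget_id": "page1-widget1"}]},
            {"kind": "field", "label": "Consent?", "input_type": "radio",
             "choices": [{"label": "Yes", "widget_id": "page1-widget3"},
                         {"label": "No", "widget_id": "page1-widget3"}]},
            {"kind": "field", "label": "Notes", "input_type": "text",
             "widgets": [{"widget_id": "page1-widget9"}]},
        ],
    }],
}
extracted_ids = ["page1-widget1", "page1-widget2",
                 "page1-widget3", "page1-widget4"]

report = check_widget_ownership(schema, extracted_ids)
print(report)
```

A save would be allowed only when all three lists in `report` are empty.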
Live workflow preview
Stage 3 Form Builder
{
  "schema_version": 3,
  "title": "Patient Intake",
  "pages": [{
    "page": 1,
    "items": [
      { "kind": "section",
        "label": "Applicant" },
      { "kind": "field",
        "label": "First name",
        "input_type": "text",
        "widgets": [
          { "widget_id": "page1-widget1" }
        ] },
      { "kind": "field",
        "label": "Consent?",
        "input_type": "radio",
        "choices": [
          { "label": "Yes",
            "widget_id": "page1-widget3" },
          { "label": "No",
            "widget_id": "page1-widget4" }
        ] }
    ]
  }]
}
Review every extracted PDF widget against the original page.
Move through extraction, linking, and schema review in one flow.
Save validated Schema V3 output for automation and training data.
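As a rough idea of what downstream automation might do with saved Schema V3 output, the sketch below loads the Patient Intake preview from this page and builds a lookup from field label to widget ids. The helper name `label_to_widgets` is hypothetical, and the code assumes only the flat `field` shapes visible in the preview; it is not a description of the platform's own form filler.

```python
import json

SCHEMA_JSON = """
{
  "schema_version": 3,
  "title": "Patient Intake",
  "pages": [{
    "page": 1,
    "items": [
      {"kind": "section", "label": "Applicant"},
      {"kind": "field", "label": "First name", "input_type": "text",
       "widgets": [{"widget_id": "page1-widget1"}]},
      {"kind": "field", "label": "Consent?", "input_type": "radio",
       "choices": [{"label": "Yes", "widget_id": "page1-widget3"},
                   {"label": "No", "widget_id": "page1-widget4"}]}
    ]
  }]
}
"""


def label_to_widgets(schema):
    """Map each field label to its widget ids (or choice label -> widget id)."""
    mapping = {}
    for page in schema["pages"]:
        for item in page["items"]:
            if item.get("kind") != "field":
                continue  # sections and other non-field items own no widgets here
            if "choices" in item:
                # Choice fields: one widget per selectable option.
                mapping[item["label"]] = {
                    c["label"]: c["widget_id"] for c in item["choices"]
                }
            else:
                mapping[item["label"]] = [
                    w["widget_id"] for w in item.get("widgets", [])
                ]
    return mapping


schema = json.loads(SCHEMA_JSON)
lookup = label_to_widgets(schema)
print(lookup["First name"])        # ['page1-widget1']
print(lookup["Consent?"]["Yes"])   # page1-widget3
```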