PDF Annotation - Help

PDF Form Annotation Pipeline

This tool converts a raw PDF form into a clean Schema V3 definition that can drive a digital form experience. The schema contains only visible form structure: headings, printed text, fields, groups, tables, repeat groups, and widget ownership.

Upload PDF (Home page) -> 1. Widgets (Auto-extracted) -> 2. Linking (Label widgets) -> 3. Form Builder (Edit Schema V3) -> Update DB (MongoDB envelope)

Schema V3 intentionally avoids presentation rules. You do not choose roles, layouts, table semantics, spans, or field presentations. The frontend maps the clean schema directly to the editor and preview.

1. Widget Extraction

Stage 1 shows all interactive PDF fields: text boxes, checkboxes, radio buttons, select controls, and signature fields. These source fields are called widgets.

Widget table

Action    What it does
Locate    Highlights the widget on the PDF canvas.
History   Shows the edit history for this widget.
Delete    Removes the widget from the source extraction.

Drawing a new widget

Press D to enter draw mode, then drag on the PDF canvas to define the widget box. A popup lets you choose the widget type.

2. Widget Linking

Stage 2 labels source widgets and helps identify widgets that still need review. Stage 3 uses these widgets as the allowed source IDs when building Schema V3.

Panel            Meaning
Linked Widgets   Widgets with a reviewed label.
Missed Widgets   Widgets that still need a label or connection.

3. Form Builder

Stage 3 edits Schema V3 directly. The left panel shows pages and ordered page items. Select any item to edit it in the inspector. Changes auto-save as a draft; use Update DB when the schema is ready to become the pipeline output.

Important: Old Schema V2 drafts are not converted. If the editor finds V2 structure, re-run Stage 3 extraction to produce Schema V3.

Schema V3 Structure

A Schema V3 document is a clean form definition. Extraction status, provider, model, usage, and page diagnostics live outside the schema in the extraction envelope.

schema   schema_version, title, page_count, pages
page     page number plus ordered items
items    section, text, field, group, table, repeat_group

Page items are flat and ordered. A section is only a marker heading. Groups, tables, and repeat groups are the containers for nested items.
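The nesting described above can be walked with a short recursive helper. This is a minimal sketch rather than the tool's own code: the function names iter_items and iter_schema_items are hypothetical, and the key names (items, rows, cells) are taken from the JSON samples on this page.

```python
from typing import Any, Iterator

Item = dict[str, Any]


def iter_items(items: list[Item]) -> Iterator[Item]:
    """Yield every item depth-first, descending into the three container kinds."""
    for item in items:
        yield item
        kind = item.get("kind")
        if kind == "group":
            yield from iter_items(item.get("items", []))
        elif kind == "repeat_group":
            # Each entry of a repeat group is a labeled record with its own items.
            for record in item.get("items", []):
                yield from iter_items(record.get("items", []))
        elif kind == "table":
            # Each row owns its own cells, so rows are visited one by one.
            for row in item.get("rows", []):
                yield from iter_items(row.get("cells", []))


def iter_schema_items(schema: Item) -> Iterator[Item]:
    """Walk all pages of a schema in page order."""
    for page in schema.get("pages", []):
        yield from iter_items(page.get("items", []))
```

A section never has children in this walk, which matches its role as a marker heading.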

Item Types

Kind           Use it for
section        Printed section headings or major dividers.
text           Any printed prose that is not an answer field: instructions, legal text, warnings, notes, or paragraphs.
field          One logical answer field with one widget carrier.
group          A labeled collection of related nested items.
table          Rows and cells, including irregular tables where each row has its own cell count.
repeat_group   Repeated records such as providers, contacts, dependents, jobs, or services.

What the JSON Looks Like

Envelope

{
  "schema": {
    "schema_version": 3,
    "title": "Employment Form",
    "page_count": 3,
    "pages": [
      { "page": 1, "items": [] }
    ]
  },
  "extraction": {
    "file_id": "64a1f3...",
    "cached": false,
    "version": 0,
    "provider": "openrouter",
    "model": "openai/gpt-4o-mini",
    "usage": {},
    "api_error": false,
    "parse_error": false,
    "processing": false,
    "page_status": []
  }
}

Simple field

{
  "kind": "field",
  "label": "First Name",
  "input_type": "text",
  "widgets": [
    { "widget_id": "page1-widget6" }
  ]
}

Split field

{
  "kind": "field",
  "label": "Date of Birth",
  "input_type": "date",
  "parts": [
    { "label": "Month", "widget_id": "page1-widget1" },
    { "label": "Day", "widget_id": "page1-widget2" },
    { "label": "Year", "widget_id": "page1-widget3" }
  ]
}

Choice field

{
  "kind": "field",
  "label": "Please check all that apply.",
  "input_type": "checkbox",
  "choices": [
    { "label": "Absent Parent", "widget_id": "page1-widget13" },
    { "label": "Child Care", "widget_id": "page1-widget15" },
    {
      "label": "Other",
      "widget_id": "page1-widget35",
      "details": {
        "label": "Other",
        "input_type": "text",
        "widgets": [
          { "widget_id": "page1-widget32" }
        ]
      }
    }
  ]
}

Group and repeat group

{
  "kind": "group",
  "label": "Applicant Name",
  "items": []
}

{
  "kind": "repeat_group",
  "label": "Providers",
  "item_label": "Provider",
  "items": [
    { "label": "Provider 1", "items": [] }
  ]
}

Field Types and Widget Carriers

Every field has an input_type and exactly one widget carrier.

Input type   Use it for
text         Names, addresses, IDs, free-form answers.
date         Dates, including split month/day/year widgets.
number       Amounts, counts, years, scores.
checkbox     Single checkbox or select-all-that-apply groups.
radio        Mutually exclusive options.
select       Dropdown or list selection widgets.
signature    Signature or initials fields.

Carrier     Use it when
widgets[]   The logical field is backed by one normal widget, or a simple list of equivalent widgets.
parts[]     One answer is split across multiple widgets, such as date parts or SSN boxes.
choices[]   Each option has its own widget, such as radio buttons or checkbox lists. A choice may use details for an attached Other/Specify input.
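To make the carrier shapes concrete, the sketch below collects every widget_id a field owns, whichever carrier it uses. The helper name carrier_widget_ids is hypothetical, and the code assumes the key layout shown in the JSON samples on this page (widget_id inside widgets[], parts[], choices[], and an optional details object on a choice).

```python
from typing import Any


def carrier_widget_ids(holder: dict[str, Any]) -> list[str]:
    """Return the widget IDs owned by a field (or by a choice's details object)."""
    ids: list[str] = []
    for ref in holder.get("widgets", []):
        ids.append(ref["widget_id"])
    for part in holder.get("parts", []):
        ids.append(part["widget_id"])
    for choice in holder.get("choices", []):
        ids.append(choice["widget_id"])
        details = choice.get("details")
        if details:
            # An attached Other/Specify input carries its own widgets or parts.
            ids.extend(carrier_widget_ids(details))
    return ids


field = {
    "kind": "field",
    "label": "Please check all that apply.",
    "input_type": "checkbox",
    "choices": [
        {"label": "Absent Parent", "widget_id": "page1-widget13"},
        {
            "label": "Other",
            "widget_id": "page1-widget35",
            "details": {
                "label": "Other",
                "input_type": "text",
                "widgets": [{"widget_id": "page1-widget32"}],
            },
        },
    ],
}
print(carrier_widget_ids(field))
```

The IDs come back in document order, with a choice's detail widget listed right after the choice itself.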

Irregular Tables

Schema V3 tables do not need row spans, column spans, or a regular/irregular setting. Each row owns its own cells. A row may have two cells, the next may have three, and another may include an empty cell.

{
  "kind": "table",
  "label": "Assessment",
  "rows": [
    {
      "cells": [
        { "kind": "text", "text": "Question" },
        { "kind": "text", "text": "Answer" },
        { "kind": "text", "text": "Comments" }
      ]
    },
    {
      "cells": [
        { "kind": "text", "text": "Pain level" },
        {
          "kind": "field",
          "label": "Pain level",
          "input_type": "number",
          "widgets": [{ "widget_id": "page2-widget1" }]
        }
      ]
    },
    {
      "cells": [
        { "kind": "text", "text": "Mobility notes" },
        {
          "kind": "field",
          "label": "Mobility notes",
          "input_type": "text",
          "widgets": [{ "widget_id": "page2-widget2" }]
        },
        { "kind": "empty" }
      ]
    }
  ]
}
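Because each row owns its cells, a consumer can measure rows independently instead of relying on a declared column count. The following is a small sketch under that assumption; row_widths and column_count are hypothetical helper names, and padding short rows to the widest row is one possible rendering choice, not something Schema V3 prescribes.

```python
from typing import Any


def row_widths(table: dict[str, Any]) -> list[int]:
    """Count the cells in each row, including cells of kind "empty"."""
    return [len(row.get("cells", [])) for row in table.get("rows", [])]


def column_count(table: dict[str, Any]) -> int:
    """Widest row in the table; a renderer could pad shorter rows to this width."""
    return max(row_widths(table), default=0)


# Trimmed version of the Assessment table above: rows of 3, 2, and 3 cells.
assessment = {
    "kind": "table",
    "label": "Assessment",
    "rows": [
        {"cells": [{"kind": "text", "text": "Question"},
                   {"kind": "text", "text": "Answer"},
                   {"kind": "text", "text": "Comments"}]},
        {"cells": [{"kind": "text", "text": "Pain level"},
                   {"kind": "text", "text": "(field cell)"}]},
        {"cells": [{"kind": "text", "text": "Mobility notes"},
                   {"kind": "text", "text": "(field cell)"},
                   {"kind": "empty"}]},
    ],
}
print(row_widths(assessment), column_count(assessment))
```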

Decision Guide

Which item kind?

Printed heading with no child ownership -> section
Instructions, legal notice, warning, note, or paragraph -> text
One logical answer -> field
Related items under one label -> group
Rows and cells, even if row widths vary -> table
Repeated records such as providers or dependents -> repeat_group

Which carrier?

Normal single text, number, date, select, signature, or checkbox -> widgets[]
One answer split into multiple boxes -> parts[]
Radio choices or checkbox-list choices -> choices[]
Other, Specify, or Explain text attached to one choice -> choices[].details

PDF Widget Linking

A widget reference connects a Schema V3 field to a physical PDF widget. The reference is always a widget_id inside widgets[], parts[], choices[], or a choice's optional details.

  1. Select a field in the Stage 3 tree.
  2. Choose the carrier mode in the field inspector.
  3. Select the target slot, part, or choice.
  4. Click the matching PDF widget in link mode.

Coverage rule: Every extracted widget must appear exactly once before Update DB, including attached detail inputs under choices[].details. Missing widgets are shown as unresolved work.
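The coverage rule amounts to comparing two lists of IDs: the widgets extracted from the PDF and the widget references found in the schema. A minimal sketch of that comparison follows. The function name check_coverage and the result keys are hypothetical, and the code assumes the referenced IDs have already been gathered from every carrier, including choices[].details.

```python
from collections import Counter


def check_coverage(extracted_ids: list[str], referenced_ids: list[str]) -> dict[str, list[str]]:
    """Classify coverage problems; all three lists are empty when every
    extracted widget is referenced exactly once."""
    counts = Counter(referenced_ids)
    extracted = set(extracted_ids)
    referenced = set(counts)
    return {
        # Extracted from the PDF but not owned by any field.
        "missing": sorted(extracted - referenced),
        # Owned by more than one field, part, or choice.
        "duplicate": sorted(w for w, n in counts.items() if n > 1),
        # Referenced by the schema but absent from the PDF widget list.
        "unknown": sorted(referenced - extracted),
    }


report = check_coverage(
    extracted_ids=["page1-widget1", "page1-widget2", "page1-widget3"],
    referenced_ids=["page1-widget1", "page1-widget1", "page1-widget9"],
)
print(report)
```

In this example page1-widget2 and page1-widget3 are missing, page1-widget1 is a duplicate, and page1-widget9 is unknown.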

Validation and Review

Click Checks or press Ctrl+K to run validation. Update DB blocks when Schema V3 is malformed or PDF widget coverage is incomplete.

Check                  Meaning
schema_version === 3   V2 and malformed drafts are rejected.
Supported item kinds   Only section, text, field, group, table, and repeat_group are allowed.
One field carrier      A field must use exactly one of widgets, parts, or choices. Choice details must use exactly one of widgets or parts.
Duplicate widget IDs   The same widget cannot be owned by two fields.
Unknown widget IDs     References must exist in the source PDF widgets when those are available.
Missing widgets        Warnings while editing, blocking errors during Update DB.
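The structural checks in the table can be approximated in a few lines. This is an illustrative sketch, not the tool's validator: check_schema_version and check_item are hypothetical names, a carrier counts as "used" whenever its key is present (an assumption), and most message strings are made up for the example. It covers page-level items only, so cell-only kinds such as empty are out of scope.

```python
from typing import Any

SUPPORTED_KINDS = {"section", "text", "field", "group", "table", "repeat_group"}
FIELD_CARRIERS = ("widgets", "parts", "choices")
DETAIL_CARRIERS = ("widgets", "parts")


def check_schema_version(schema: dict[str, Any]) -> list[str]:
    """Reject anything that does not declare schema_version 3."""
    if schema.get("schema_version") == 3:
        return []
    return ["schema_version must be 3"]  # illustrative message


def check_item(item: dict[str, Any]) -> list[str]:
    """Check one page-level item for a supported kind and a single carrier."""
    kind = item.get("kind")
    if kind not in SUPPORTED_KINDS:
        return [f"Unsupported item kind: {kind!r}"]  # illustrative message
    if kind != "field":
        return []
    errors: list[str] = []
    # Assumption: a carrier is "used" if its key is present, even when empty.
    used = [name for name in FIELD_CARRIERS if name in item]
    if len(used) != 1:
        errors.append("Field must use exactly one widget carrier")
    for choice in item.get("choices", []):
        details = choice.get("details")
        if details is not None:
            used_details = [name for name in DETAIL_CARRIERS if name in details]
            if len(used_details) != 1:
                errors.append("Choice details must use exactly one of widgets or parts")
    return errors
```

Duplicate, unknown, and missing widget IDs need the full widget list and are better handled as a separate pass over the whole schema.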

Keyboard Shortcuts

Shortcut                Action                              Context
Ctrl+Z                  Undo                                All stages
Ctrl+Y / Ctrl+Shift+Z   Redo                                All stages
Ctrl+S                  Save draft                          All stages
Alt+Left                Previous PDF                        All stages
Alt+Right               Next PDF                            All stages
Left / Right            Previous / next page                Stages 1-3
Ctrl+K                  Open checks                         Stage 3
D                       Toggle draw mode                    Stage 1
Esc                     Close modal or cancel active mode   All stages

Common Errors

"Schema V2 structured JSON is not supported"

The draft or cache uses the old V2 shape. Re-run Stage 3 extraction so the backend produces Schema V3.

"Field must use exactly one widget carrier"

Open the field inspector and choose one mode: widgets, parts, or choices. Remove the other carriers.

"Duplicate widget ID"

The same PDF widget is referenced by more than one field. Keep the first correct owner and unlink the duplicate reference.

"Missing extracted widget"

A source PDF widget is not owned by any field. Link it to the right field, or keep it in the Unresolved fields group until you decide where it belongs.

"Unknown widget ID"

The schema references a widget ID that does not exist in the current PDF widget list. Remove the bad reference or relink the field to a valid widget.