PaddleOCR is an OCR and document intelligence toolkit in the Baidu PaddlePaddle ecosystem. It covers general text recognition, document layout analysis, table parsing, formula recognition, and other capabilities for common document-processing scenarios such as scanned documents, photographed documents, multi-page PDFs, and technical documents.
X-AnyLabeling integrates these capabilities through a dedicated PaddleOCR panel for document understanding and intelligent text recognition workflows. The panel supports layout parsing, text recognition, formula recognition, and table recognition for images and PDF files. After parsing is complete, you can review, edit, copy, and export the recognized results.
Two service modes are supported: you can call the official PaddleOCR API directly, or connect to a PaddleOCR model served through an X-AnyLabeling-compatible remote inference service. Parsed results are saved as local JSON files and displayed in the interface together with source-file regions, structured content, and editable result blocks.
The X-AnyLabeling client supports the official PaddleOCR API by default, so you do not need to deploy an additional inference service. When the PaddleOCR panel is opened for the first time and no API information has been configured, the PPOCR API Settings dialog appears automatically. Enter the corresponding API_KEY to enable official API parsing. To update the configuration later, click the gear button at the top of the right-side result panel.
The following official API model options are currently supported in the parsing-model drop-down list:
- `PaddleOCR-VL-1.5 (API)`
- `PaddleOCR-VL (API)`
To obtain the required API information:
- Visit the PaddleOCR website.
- Open the API call example, switch to the `Async Parse` tab, and copy the `API_KEY`.
- Return to `PPOCR API Settings` in X-AnyLabeling, paste the key, and confirm.
The configuration is saved locally:
${workspace}/xanylabeling_data/paddleocr/api_settings.json
By default, ${workspace} is the user's home directory, ~. If X-AnyLabeling is started with --work-dir, that directory is used instead.
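The lookup rule above can be sketched in Python as follows. This is a minimal illustration, not the client's actual code: the function name `resolve_api_settings_path` and its `work_dir` parameter are hypothetical, and only the directory layout and the home-directory default come from the text.

```python
from pathlib import Path
from typing import Optional


def resolve_api_settings_path(work_dir: Optional[str] = None) -> Path:
    """Return the expected location of api_settings.json.

    work_dir stands in for the value passed to --work-dir (hypothetical
    parameter name). None means the default workspace, the home directory.
    """
    workspace = Path(work_dir).expanduser() if work_dir else Path.home()
    return workspace / "xanylabeling_data" / "paddleocr" / "api_settings.json"


if __name__ == "__main__":
    print(resolve_api_settings_path())                  # default workspace
    print(resolve_api_settings_path("/data/labeling"))  # custom --work-dir
```

A helper like this can be handy when you need to back up or inspect the saved key outside the application.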
If you want to run PaddleOCR locally or in a private environment, you can deploy an inference service with X-AnyLabeling-Server. For the setup process, see this example, install the required dependencies, and start the service.
Make sure X-AnyLabeling-Server has been updated to the latest version, and check that the ppocr_layoutstructv3_vl_1_5 model configuration is available. If you want to implement and integrate your own PaddleOCR inference pipeline, declare the following capability flag in the model configuration. The client uses this flag to determine whether the model can be used in the PaddleOCR panel:
...
capabilities:
  ppocr_pipeline: true
...

After the service starts, reopen the PaddleOCR annotation panel. The Parsing Model drop-down list at the top right displays the currently available models. When you select a model other than the official (API) entries, parsing requests are sent to the deployed inference service automatically.
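The routing behavior described above can be pictured with a small Python sketch. It assumes the client tells official entries apart by the `(API)` suffix shown in the drop-down list; the real client may use an internal flag instead, and the names `uses_official_api` and `pick_backend` are hypothetical.

```python
OFFICIAL_API_SUFFIX = "(API)"  # suffix shown on official entries in the drop-down list


def uses_official_api(model_name: str) -> bool:
    """Return True for entries such as 'PaddleOCR-VL-1.5 (API)'.

    Suffix matching is an assumption made for this sketch.
    """
    return model_name.strip().endswith(OFFICIAL_API_SUFFIX)


def pick_backend(model_name: str) -> str:
    """Name the backend that would receive the parsing request."""
    return "official_api" if uses_official_api(model_name) else "remote_service"


if __name__ == "__main__":
    for name in ("PaddleOCR-VL-1.5 (API)", "ppocr_layoutstructv3_vl_1_5"):
        print(name, "->", pick_backend(name))
```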
You can open PaddleOCR in either of the following ways: click the PaddleOCR icon in the left toolbar, or use the Ctrl+4 shortcut.
After the panel opens, click + New Parsing at the top of the left panel to import a file. The imported file is copied to the local PaddleOCR workspace and added to the parsing queue automatically.
The X-AnyLabeling PaddleOCR panel currently supports the following file types:
| Type | Extensions |
|---|---|
| PDF document | .pdf |
| Image | .bmp, .cif, .gif, .jpeg, .jpg, .png, .tif, .tiff, .webp |
PDF files are first rendered locally into per-page PNG previews. Official API parsing submits the original PDF once through Async Jobs, while remote-service parsing continues to use the local preview pages. For multi-page PDFs, the page count, preview images, and recognition results are all retained in the local workspace.
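If you pre-filter files before importing them, the supported-extension table above can be expressed as a small Python check. This is a sketch under assumptions: the function name `classify_import` is hypothetical, and matching extensions case-insensitively is a guess about the client's behavior rather than something the text states.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported file types table.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {
    ".bmp", ".cif", ".gif", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".webp",
}


def classify_import(filename: str) -> Optional[str]:
    """Return 'pdf', 'image', or None for an unsupported file.

    Lower-casing the suffix is an assumption of this sketch.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


if __name__ == "__main__":
    for name in ("report.pdf", "scan.tiff", "notes.docx"):
        print(name, "->", classify_import(name))
```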
Tip
- Hold `Ctrl` and scroll the mouse wheel in the source-file preview area to zoom the preview page quickly.
- Click any block in the source-file preview area or the right-side result area to highlight the matching content on both sides.
- Double-click a block in the right-side recognition result area, or click the block's `Correct` button, to enter edit mode.
- Hover over a block in the source-file preview area and click the floating `Copy` button to copy that block's content.
- For multi-page PDFs, use the page controls at the bottom to jump between pages, or scroll through the page-separated parsing results in the right result area.
- After you manually correct recognition results, the edited blocks are recorded in the JSON file. To fetch model results again, use the reparse button on the right.
Note
- The official PaddleOCR API requires a valid `API_KEY`. If the API returns `401`, check whether the key is valid.
- Remote-service models appear in the model drop-down list only when `/v1/models` returns models with the `ppocr_pipeline` capability.
- Imported files are copied to the PaddleOCR workspace. Deleting the original external file does not affect the imported copy.
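The capability filter mentioned in the note can be sketched as follows. The shape of the `/v1/models` response is not documented here, so this Python example assumes each model entry is a dictionary with a hypothetical `id` key and a `capabilities` mapping that mirrors the model configuration. Treat it as an illustration of the rule, not as the server's actual schema.

```python
from typing import Any, Dict, List


def ppocr_capable_models(models: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of entries that declare capabilities.ppocr_pipeline: true.

    The 'id' key and the per-entry layout are assumptions of this sketch.
    """
    names: List[str] = []
    for model in models:
        capabilities = model.get("capabilities") or {}
        model_id = model.get("id")
        if model_id and capabilities.get("ppocr_pipeline") is True:
            names.append(str(model_id))
    return names


if __name__ == "__main__":
    example_entries = [  # made-up entries for illustration only
        {"id": "ppocr_layoutstructv3_vl_1_5", "capabilities": {"ppocr_pipeline": True}},
        {"id": "some_detector", "capabilities": {}},
    ]
    print(ppocr_capable_models(example_entries))
```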
The PaddleOCR panel consists of three main areas:
| Area | Description |
|---|---|
| Left file navigation panel | Import files, view recent files, view favorites, search, filter, and delete files |
| Middle source-file preview area | Display images or PDF pages with PaddleOCR layout blocks, polygon boxes, and category colors overlaid |
| Right parsing result area | Switch between Document parsing and JSON views, and copy, download, reparse, or edit recognized blocks |
Note
The colored dot in the lower-left corner of each file item in the left navigation panel indicates the parsing status:
- Blue means pending or parsing.
- Green means parsing completed.
- Red means parsing failed.
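For scripts that summarize a workspace, the color legend above can be mirrored with a simple lookup. This sketch assumes the dot color follows the `_ppocr_meta.status` values stored in each JSON result (`pending`, `parsed`, `error`), and that any other value, such as a file that is still being parsed, is shown in blue. Both points are assumptions, and `status_dot_color` is a hypothetical name.

```python
STATUS_COLORS = {
    "pending": "blue",   # pending or still parsing
    "parsed": "green",   # parsing completed
    "error": "red",      # parsing failed
}


def status_dot_color(status: str) -> str:
    """Map a stored parsing status to the dot color shown in the file list.

    Falling back to blue for unknown values is an assumption of this sketch.
    """
    return STATUS_COLORS.get(status, "blue")


if __name__ == "__main__":
    for status in ("pending", "parsed", "error"):
        print(status, "->", status_dot_color(status))
```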
| Location | Button/Component | Function |
|---|---|---|
| Top left | `+ New Parsing` | Import an image or PDF and start parsing automatically |
| Left navigation | `Recent` | Show recently imported and parsed files |
| Left navigation | `Favorites` | Show favorited files only |
| Left navigation | Search button | Expand the filename search box |
| Left navigation | Filter button | Filter by sorting rule, file type, and parsing status |
| Left file item | Star button | Add or remove the current file from favorites |
| Left file item | Delete button | Delete the source file, JSON file, PDF preview pages, block screenshots, and other related data |
| Middle page bar | Left/right arrows | Switch to the previous or next PDF page |
| Middle page bar | Page number input | Jump to a specific PDF page |
| Middle page bar | Zoom out / zoom in buttons | Zoom the source-file preview area |
| Middle page bar | Reset zoom button | Restore the preview scale to fit width |
| Source-file preview area | Floating `Copy` | Copy the content of the currently hovered block |
| Top right | `Parsing Model` | Select an official (API) model or a remote PaddleOCR model |
| Right view | `Document parsing` | View layout blocks, text, formulas, tables, and images as cards |
| Right view | `JSON` | View the complete JSON result for the current file |
| Right tools | Gear button | Configure the official PaddleOCR `API_KEY` |
| Right tools | Reparse button | Reparse the current file |
| Right tools | Copy button | Copy Markdown content in the document view, or copy JSON in the JSON view |
| Right tools | Download button | Download a ZIP file in the document view, or download JSON in the JSON view |
| Result block card | `Copy` | Copy the content of a single block |
| Result block card | `Correct` | Enter edit mode for the current block |
| Parsing banner | `Cancel Parsing` | Cancel the current batch parsing task |
| Parsing-failed banner | `Copy Log` | Copy the error log |
| Parsing-failed banner | `Reparse` | Reparse the failed file |
After parsing is complete, the right-side Document parsing view displays blocks in layout order. Different block types use different colors:
| Type | Example Labels | Color |
|---|---|---|
| Text | `text`, `doc_title`, `paragraph_title`, `footer`, `seal`, etc. | Blue |
| Table | `table` | Green |
| Image | `image`, `chart`, `header_image`, `footer_image` | Purple |
| Header | `header` | Light purple |
| Formula | `display_formula`, `formula`, `formula_number`, `algorithm` | Yellow |
| Edited block | Any block | Orange border or edit-state marker |
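If you post-process results and want to reuse the same color grouping, the table above can be written as a lookup. This is a sketch, not the client's rendering code: `block_color` is a hypothetical name, and treating every unlisted label as a blue text block is an assumption based on the "etc." in the Text row.

```python
TEXT_COLOR = "blue"  # default for text-like labels (assumed fallback)

LABEL_COLORS = {
    "table": "green",
    "image": "purple",
    "chart": "purple",
    "header_image": "purple",
    "footer_image": "purple",
    "header": "light purple",
    "display_formula": "yellow",
    "formula": "yellow",
    "formula_number": "yellow",
    "algorithm": "yellow",
}


def block_color(block_label: str) -> str:
    """Return the color group for a block label from the table above."""
    return LABEL_COLORS.get(block_label, TEXT_COLOR)


if __name__ == "__main__":
    for label in ("doc_title", "table", "chart", "header", "formula"):
        print(label, "->", block_color(label))
```

Edited blocks keep their type color and are marked separately with an orange border or edit-state marker, so that state is not modeled here.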
The following editors are currently supported:
| Editor | Trigger Scenario | Description |
|---|---|---|
| Rich text editor | Plain text, titles, footers, seals, and other non-table or non-formula content | Supports basic rich text editing and saves the result as Markdown/text content |
| LaTeX formula editor | `display_formula`, `formula`, `formula_number`, `algorithm` | Supports editing LaTeX source and renders a live preview below |
| Table editor | `table` or content recognized as a table structure | Supports cell editing, selection copying, adding and deleting rows or columns, and basic text styling |
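The editor selection in the table above can be approximated by a label-based dispatch. This Python sketch is an assumption about how the choice could be made: it looks only at `block_label`, so it does not cover the "content recognized as a table structure" case, and the returned editor names are hypothetical identifiers.

```python
FORMULA_LABELS = {"display_formula", "formula", "formula_number", "algorithm"}


def choose_editor(block_label: str) -> str:
    """Pick an editor kind from the block label alone.

    Content-based table detection is not modeled in this sketch.
    """
    if block_label in FORMULA_LABELS:
        return "latex"
    if block_label == "table":
        return "table"
    return "rich_text"


if __name__ == "__main__":
    for label in ("paragraph_title", "display_formula", "table"):
        print(label, "->", choose_editor(label))
```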
Warning
If a file contains many formulas, opening it or scrolling to the corresponding result block for the first time may take a short while, mainly because the formula previews have to be rendered. Rendered results are cached, so the same content does not need to be rendered again on subsequent loads.
The PaddleOCR panel saves imported files and parsed results in the local workspace:
${workspace}/xanylabeling_data/paddleocr/
├── api_settings.json
├── ui_state.json
├── files/
│   ├── example.pdf
│   ├── image.png
│   ├── __PDF_example/
│   │   ├── page_001.png
│   │   └── page_002.png
│   └── __BLOCK_IMAGES_image.png/
│       └── page_001_block_0001.png
└── jsons/
    ├── example.pdf.json
    └── image.png.json
| Path | Description |
|---|---|
| `api_settings.json` | Cached `API_KEY`, selected API model, and compatibility API URL for the official PaddleOCR API |
| `ui_state.json` | UI state, such as the list of favorited files |
| `files/` | Local copies of imported files |
| `files/__PDF_<filename>/` | Per-page PNG previews rendered from a PDF |
| `files/__BLOCK_IMAGES_<filename>/` | Local crops for image-type blocks |
| `jsons/<filename>.json` | PaddleOCR parsing results and edited results for the current file |
Note
Deleting a file item in the left panel also deletes the source file, local JSON file, PDF preview pages, and block crop images.
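To locate these artifacts from a script, the layout above can be turned into a few path helpers. This is a sketch with hypothetical function names. The naming details are inferred from the example tree only: the PDF preview folder uses the file name without its extension (`__PDF_example`), the block-image folder uses the full file name (`__BLOCK_IMAGES_image.png`), and page and block numbers are zero-padded to three and four digits. The client may behave differently in cases the tree does not show.

```python
from pathlib import Path


def ppocr_root(workspace: str) -> Path:
    """Root of the PaddleOCR data inside the workspace."""
    return Path(workspace) / "xanylabeling_data" / "paddleocr"


def result_json_path(workspace: str, filename: str) -> Path:
    """jsons/<filename>.json, where filename keeps its extension."""
    return ppocr_root(workspace) / "jsons" / f"{filename}.json"


def pdf_preview_path(workspace: str, filename: str, page: int) -> Path:
    """files/__PDF_<stem>/page_NNN.png (stem and padding inferred from the tree)."""
    stem = Path(filename).stem
    return ppocr_root(workspace) / "files" / f"__PDF_{stem}" / f"page_{page:03d}.png"


def block_crop_path(workspace: str, filename: str, page: int, block: int) -> Path:
    """files/__BLOCK_IMAGES_<filename>/page_NNN_block_NNNN.png (inferred naming)."""
    folder = f"__BLOCK_IMAGES_{filename}"
    name = f"page_{page:03d}_block_{block:04d}.png"
    return ppocr_root(workspace) / "files" / folder / name


if __name__ == "__main__":
    home = str(Path.home())
    print(result_json_path(home, "example.pdf"))
    print(pdf_preview_path(home, "example.pdf", 2))
    print(block_crop_path(home, "image.png", 1, 1))
```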
Each imported file has a corresponding JSON file. The core structure is as follows:
{
  "layoutParsingResults": [
    {
      "prunedResult": {
        "page_count": 1,
        "width": 1240,
        "height": 1754,
        "model_settings": {
          "pipeline_model": "PaddleOCR-VL-1.5",
          "api_mode": "async_jobs"
        },
        "parsing_res_list": [
          {
            "block_label": "text",
            "block_content": "Recognized text content",
            "block_bbox": [100, 120, 500, 180],
            "block_id": 1,
            "block_order": 1,
            "group_id": 1,
            "global_block_id": 1,
            "global_group_id": 1,
            "block_polygon_points": [
              [100, 120],
              [500, 120],
              [500, 180],
              [100, 180]
            ]
          }
        ]
      },
      "markdown": {
        "text": "Full-page Markdown content",
        "images": {
          "page_1:block_1": "files/__BLOCK_IMAGES_image.png/page_001_block_0001.png"
        }
      },
      "outputImages": {},
      "inputImage": "files/image.png"
    }
  ],
  "preprocessedImages": [],
  "dataInfo": {
    "type": "image",
    "numPages": 1,
    "pages": [
      {
        "width": 1240,
        "height": 1754
      }
    ]
  },
  "_ppocr_meta": {
    "status": "parsed",
    "source_path": "files/image.png",
    "updated_at": "2026-04-18 12:00:00",
    "error_message": "",
    "edited_blocks": [],
    "block_image_paths": {},
    "pipeline_model": "PaddleOCR-VL-1.5",
    "api_mode": "async_jobs",
    "api_job_id": "39373553546153984"
  }
}

Key fields:
| Field | Description |
|---|---|
| `layoutParsingResults` | Page-level parsing results. Images usually have one page, while PDFs can have multiple pages |
| `prunedResult.parsing_res_list` | Block list for the current page |
| `block_label` | Block type, such as `text`, `table`, `display_formula`, or `image` |
| `block_content` | Recognized content that can be viewed, copied, and edited |
| `block_bbox` | Rectangular bounding box of the block, in `[x1, y1, x2, y2]` format |
| `block_polygon_points` | Polygon points of the block, used for highlighting in the preview area |
| `markdown.text` | Markdown text returned by PaddleOCR or generated by combining blocks |
| `markdown.images` | Mapping of image resources referenced in the Markdown |
| `dataInfo.type` | File type. Valid values are `image` and `pdf` |
| `dataInfo.numPages` | Number of pages |
| `_ppocr_meta.status` | Parsing status: `pending`, `parsed`, or `error` |
| `_ppocr_meta.edited_blocks` | List of block keys that have been manually edited |
| `_ppocr_meta.block_image_paths` | Local resource paths for image-type blocks |
| `_ppocr_meta.pipeline_model` | Parsing model that generated the result |
| `_ppocr_meta.api_mode` | Official API mode, such as `async_jobs` |
| `_ppocr_meta.api_job_id` | Official Async Jobs task ID, when available |
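As an illustration of how the fields described above fit together, the following Python sketch walks a result file and summarizes it. The helper names (`load_result`, `iter_blocks`, `summarize`) are hypothetical, the sample data is a trimmed-down version of the structure shown earlier, and numbering pages from 1 in list order is an assumption.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterator, Tuple


def load_result(path: str) -> Dict[str, Any]:
    """Read one jsons/<filename>.json file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def iter_blocks(result: Dict[str, Any]) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (page_number, block) pairs; pages are numbered from 1 (assumed)."""
    pages = result.get("layoutParsingResults", [])
    for page_no, page in enumerate(pages, start=1):
        pruned = page.get("prunedResult", {})
        for block in pruned.get("parsing_res_list", []):
            yield page_no, block


def summarize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Collect status, page count, per-label block counts, and edit count."""
    meta = result.get("_ppocr_meta", {})
    labels = Counter(block.get("block_label", "") for _, block in iter_blocks(result))
    return {
        "status": meta.get("status"),
        "pages": result.get("dataInfo", {}).get("numPages"),
        "labels": dict(labels),
        "edited": len(meta.get("edited_blocks", [])),
    }


# Trimmed sample that follows the documented structure (illustrative data only).
SAMPLE = {
    "layoutParsingResults": [
        {
            "prunedResult": {
                "parsing_res_list": [
                    {"block_label": "text", "block_bbox": [100, 120, 500, 180]},
                    {"block_label": "table", "block_bbox": [100, 200, 500, 400]},
                ]
            }
        }
    ],
    "dataInfo": {"type": "image", "numPages": 1},
    "_ppocr_meta": {"status": "parsed", "edited_blocks": []},
}

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Using `.get` with defaults keeps the sketch tolerant of files that are still `pending` and therefore may not contain every field yet.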
The download button on the right exports different content depending on the current view:
| Current View | Downloaded Content |
|---|---|
| `Document parsing` | A ZIP file containing `doc_0.md`, image resources under `imgs/`, and layout detection visualizations named `layout_det_res_*.jpg` |
| `JSON` | The complete JSON for the current file. The default filename is `<original_filename>_by_<model_name>.json` |