Extract structured data from a document based on a user-defined field schema. Define field names, types (string, number, boolean, date, array), and optional descriptions. The AI reads the document and returns a JSON object with the extracted values. For PDF files, you can optionally specify which pages to process. Cost: 3 credits per page.
API Key (starts with ak_...)
File ID of the document to extract data from (PDF, DOCX, ODT, or Markdown)
"550e8400-e29b-41d4-a716-446655440000"
Fields to extract from the document. Keys are field names, values define the expected type and optional description. Maximum 30 fields.
{
"invoiceNumber": {
"type": "string",
"description": "Invoice number"
},
"totalAmount": {
"type": "number",
"description": "Total amount including VAT"
},
"date": {
"type": "date",
"description": "Invoice date"
},
"lineItems": {
"type": "array",
"description": "List of line items with description and amount"
}
}
Page specifications (e.g. "1", "2-5", "3-"). If omitted, all pages are processed (up to 50 pages). Only applicable to PDF files.
["1", "3-5"]
Optional webhook configuration
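The limits described above (30 fields, five field types, page specifications, 3 credits per page) can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the helper names `validate_fields`, `expand_pages`, and `estimate_credits` are hypothetical, and it assumes that an open range such as "3-" runs to the last page, that overlapping specifications are counted once, and that ranges past the end of the document are clamped. The service may treat these cases differently.

```python
import re

ALLOWED_TYPES = {"string", "number", "boolean", "date", "array"}
MAX_FIELDS = 30        # documented maximum number of fields
MAX_PAGES = 50         # documented cap when no pages are specified
CREDITS_PER_PAGE = 3   # documented cost

# Matches "1", "2-5" and "3-" (open-ended range).
_PAGE_SPEC = re.compile(r"^(\d+)(?:(-)(\d*))?$")


def validate_fields(fields: dict) -> None:
    """Raise ValueError if the schema breaks a documented limit."""
    if len(fields) > MAX_FIELDS:
        raise ValueError(f"too many fields: {len(fields)} > {MAX_FIELDS}")
    for name, spec in fields.items():
        if spec.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"field {name!r} has unsupported type {spec.get('type')!r}")


def expand_pages(specs, page_count: int) -> list:
    """Resolve page specifications to a sorted list of 1-based page numbers."""
    if not specs:
        # Omitted: all pages, up to the documented cap.
        return list(range(1, min(page_count, MAX_PAGES) + 1))
    pages = set()
    for spec in specs:
        match = _PAGE_SPEC.match(spec.strip())
        if match is None:
            raise ValueError(f"invalid page specification: {spec!r}")
        start = int(match.group(1))
        if match.group(2) is None:
            end = start                    # single page, e.g. "1"
        elif match.group(3):
            end = int(match.group(3))      # closed range, e.g. "2-5"
        else:
            end = page_count               # open range, e.g. "3-" (assumption)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {spec!r}")
        # Assumption: ranges past the last page are clamped, not rejected.
        pages.update(range(start, min(end, page_count) + 1))
    return sorted(pages)


def estimate_credits(specs, page_count: int) -> int:
    """Estimate the cost of an extraction at 3 credits per processed page."""
    return CREDITS_PER_PAGE * len(expand_pages(specs, page_count))
```

With the example value ["1", "3-5"] and a 10-page PDF, `expand_pages` resolves four pages, so the estimate is 12 credits.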
Extraction job created
Job ID
"550e8400-e29b-41d4-a716-446655440000"
Action that was performed
"pdf.merge"
Current job status
One of: PENDING, PROCESSING, COMPLETED, FAILED
"PENDING"
Input file IDs
["file-id-1", "file-id-2"]
Output file ID (available when COMPLETED)
null
Error message (available when FAILED)
null
Job creation timestamp
Job start timestamp
Job completion timestamp
Structured job result data (e.g. OCR markdown/JSON, generated filename, PDF metadata, form fields). Available when the job produces a direct result instead of an output file.
Deprecated: use result instead. Additional metadata, duplicated from result for backward compatibility.
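A client reading the job object might handle the status values and the deprecated field roughly as follows. This is a sketch under stated assumptions: the JSON key names `status`, `error`, `result`, and `metadata` are guesses based on the descriptions above and are not confirmed by this page, and `read_job_result` is a hypothetical helper.

```python
def read_job_result(job: dict):
    """Return the structured result of a finished job, or None if it is still running.

    Assumed key names (not confirmed by the documentation):
    "status", "error", "result", "metadata".
    """
    status = job.get("status")
    if status == "FAILED":
        # The error message is documented as available when the job has FAILED.
        raise RuntimeError(job.get("error") or "job failed without an error message")
    if status != "COMPLETED":
        # PENDING or PROCESSING: nothing to read yet.
        return None
    result = job.get("result")
    if result is None:
        # Fall back to the deprecated duplicate for older responses.
        result = job.get("metadata")
    return result
```

Preferring `result` and only falling back to the deprecated field keeps the client working if the duplicate is removed later.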