Extract structured data

curl --request POST \
  --url https://api.autype.com/api/v1/dev/tools/lens/extract \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: <api-key>' \
  --data '
{
  "fileId": "550e8400-e29b-41d4-a716-446655440000",
  "fields": {
    "invoiceNumber": {
      "type": "string",
      "description": "Invoice number"
    },
    "totalAmount": {
      "type": "number",
      "description": "Total amount including VAT"
    },
    "date": {
      "type": "date",
      "description": "Invoice date"
    },
    "lineItems": {
      "type": "array",
      "description": "List of line items with description and amount"
    }
  },
  "pages": [
    "1",
    "3-5"
  ],
  "webhook": {
    "webhookUrl": "https://example.com/webhook",
    "webhookAuth": {
      "headerName": "X-API-Key",
      "headerValue": "my-secret-key",
      "basicAuthUsername": "user",
      "basicAuthPassword": "pass"
    }
  }
}
'

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "action": "pdf.merge",
  "status": "PENDING",
  "inputFileIds": [
    "file-id-1"
  ],
  "outputFileId": null,
  "error": null,
  "createdAt": "2023-11-07T05:31:56Z",
  "startedAt": {},
  "completedAt": {},
  "result": {},
  "metadata": {}
}

Authorizations

X-API-Key

string

header

required

API Key (starts with ak_...)

Body

application/json

fileId

string

required

File ID of the document to extract data from (PDF, DOCX, ODT, or Markdown)

Example:

"550e8400-e29b-41d4-a716-446655440000"

fields

object

required

Fields to extract from the document. Keys are field names, values define the expected type and optional description. Maximum 30 fields.

Example:

{
  "invoiceNumber": {
    "type": "string",
    "description": "Invoice number"
  },
  "totalAmount": {
    "type": "number",
    "description": "Total amount including VAT"
  },
  "date": {
    "type": "date",
    "description": "Invoice date"
  },
  "lineItems": {
    "type": "array",
    "description": "List of line items with description and amount"
  }
}
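
The 30-field limit can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_fields` and `MAX_FIELDS` are hypothetical names, and the helper only checks the field count and drops empty descriptions. It does not validate type names, since the set of accepted types is not listed here.

```python
from typing import Optional

MAX_FIELDS = 30  # limit stated in the `fields` description


def build_fields(spec: dict[str, tuple[str, Optional[str]]]) -> dict[str, dict[str, str]]:
    """Turn {name: (type, description)} into the `fields` request object.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    if len(spec) > MAX_FIELDS:
        raise ValueError(f"too many fields: {len(spec)} > {MAX_FIELDS}")
    fields: dict[str, dict[str, str]] = {}
    for name, (type_name, description) in spec.items():
        entry = {"type": type_name}
        if description:  # the description is optional, so omit it when empty
            entry["description"] = description
        fields[name] = entry
    return fields


fields = build_fields({
    "invoiceNumber": ("string", "Invoice number"),
    "totalAmount": ("number", "Total amount including VAT"),
})
```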

pages

string[]

Page specifications (e.g. "1", "2-5", "3-"). If omitted, all pages are processed (up to 50 pages). Only applicable to PDF files.

Example:

["1", "3-5"]
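
To preview which pages a set of specifications selects, the three forms shown above can be expanded locally. This is a sketch under assumptions that are not confirmed here: pages are 1-based, an open range such as "3-" runs to the last page, and ranges are clamped to the document length. `expand_page_specs` is a hypothetical helper, not an API call.

```python
import re

# Matches "N", "N-M", or "N-" (assumed grammar, inferred from the examples).
_SPEC = re.compile(r"^(\d+)(?:(-)(\d*))?$")


def expand_page_specs(specs: list[str], total_pages: int) -> list[int]:
    """Expand specs such as "1", "2-5", "3-" into sorted 1-based page numbers."""
    pages: set[int] = set()
    for spec in specs:
        match = _SPEC.match(spec.strip())
        if match is None:
            raise ValueError(f"invalid page spec: {spec!r}")
        start = int(match.group(1))
        if start < 1:
            raise ValueError(f"pages are assumed to start at 1: {spec!r}")
        if match.group(2) is None:
            end = start  # single page, e.g. "1"
        elif match.group(3):
            end = int(match.group(3))  # closed range, e.g. "2-5"
            if end < start:
                raise ValueError(f"range end before start: {spec!r}")
        else:
            end = total_pages  # open range, e.g. "3-"
        # Clamp to the document length; pages past the end are ignored.
        pages.update(range(start, min(end, total_pages) + 1))
    return sorted(pages)


selected = expand_page_specs(["1", "3-5"], total_pages=10)
```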

webhook

object

Optional webhook configuration
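
The request body described above can also be assembled in Python. The sketch below uses only the standard library and builds the request without sending it. `build_extract_request` is a hypothetical helper, and only the `webhookUrl` attribute of `webhook` is set, since the other webhook attributes appear only in the curl sample.

```python
import json
import urllib.request
from typing import Optional

EXTRACT_URL = "https://api.autype.com/api/v1/dev/tools/lens/extract"


def build_extract_request(
    api_key: str,
    file_id: str,
    fields: dict,
    pages: Optional[list[str]] = None,
    webhook_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for an extraction job."""
    body: dict = {"fileId": file_id, "fields": fields}
    if pages:  # omit to process all pages
        body["pages"] = pages
    if webhook_url:  # webhook is optional
        body["webhook"] = {"webhookUrl": webhook_url}
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_extract_request(
    api_key="ak_example",  # placeholder key
    file_id="550e8400-e29b-41d4-a716-446655440000",
    fields={"invoiceNumber": {"type": "string", "description": "Invoice number"}},
    pages=["1", "3-5"],
)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body can then be parsed with `json.load`.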


Response

201 - application/json

Extraction job created

id

string

required

Job ID

Example:

"550e8400-e29b-41d4-a716-446655440000"

action

string

required

Action that was performed

Example:

"pdf.merge"

status

enum<string>

required

Current job status

Available options:

PENDING, PROCESSING, COMPLETED, FAILED

Example:

"PENDING"
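
Because the job is created in the PENDING state, a client that does not use the webhook has to wait for it to finish. The sketch below assumes COMPLETED and FAILED are the only terminal states. It takes the lookup as a caller-supplied function, because this page does not document an endpoint for reading a job; `wait_for_job` and `fetch_job` are hypothetical names.

```python
import time
from typing import Callable

# Assumption: a job never leaves these two states once it reaches them.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 120.0,
) -> dict:
    """Call fetch_job() until the job reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout} seconds")
        time.sleep(interval)


# Usage with a stand-in fetcher that replays a fixed status sequence.
_statuses = iter(["PENDING", "PROCESSING", "COMPLETED"])
final_job = wait_for_job(lambda: {"status": next(_statuses)}, interval=0)
```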

inputFileIds

string[]

required

Input file IDs used for this job

Example:

["file-id-1"]

outputFileId

object

required

Output file ID (available when COMPLETED)

Example:

null

error

object

required

Error message (available when FAILED)

Example:

null

createdAt

string<date-time>

required

Job creation timestamp

startedAt

object

required

Job start timestamp

completedAt

object

required

Job completion timestamp

result

object

Structured job result data (e.g. OCR markdown/JSON, generated filename, PDF metadata, form fields). Available when the job produces a direct result instead of an output file.

metadata

object

deprecated

Deprecated: use result instead. Additional metadata, duplicated from result for backward compatibility.
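
A client can prefer `result` and fall back to `metadata` only when `result` is missing or empty. This is a minimal sketch; `job_result` is a hypothetical helper, and raising on a FAILED job is a design choice of the sketch rather than documented behavior.

```python
from typing import Any


def job_result(job: dict[str, Any]) -> dict[str, Any]:
    """Return the structured result of a job, preferring `result` over `metadata`."""
    if job.get("status") == "FAILED":
        # `error` carries the message when the job has failed.
        raise RuntimeError(f"job failed: {job.get('error')}")
    result = job.get("result")
    if result:
        return result
    # Deprecated field, read only for backward compatibility.
    return job.get("metadata") or {}


data = job_result({"status": "COMPLETED", "result": {"invoiceNumber": "INV-1"}, "metadata": {}})
```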