UNESCO Metadata Pipeline

DCAT-AP 3.0 | SDG Classification | 5 Languages | UNESCO Thesaurus | GLiNER2 | Graph RAG | License: Apache 2.0

Processing Pipeline

Parsing: PDF / text / URL
Extraction: GLiNER2 NER
Grounding: Graph RAG CoE
Validation: Thesaurus URIs
Aggregation: Hi-Transformer
Formatting: LLM → JSON-LD
Validation: Threshold guard
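Two of these stages are checks rather than transformations: one validates thesaurus URIs and the other applies a threshold guard. The Python sketch that follows illustrates only those two checks, under stated assumptions. The Candidate and Doc types, the confidence scores, the 0.5 threshold, and the http://vocabularies.unesco.org/thesaurus/ URI prefix are hypothetical choices made for illustration. They are not the service's actual data model or settings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Assumed namespace for UNESCO Thesaurus concept URIs (verify against the service).
THESAURUS_PREFIX = "http://vocabularies.unesco.org/thesaurus/"


@dataclass
class Candidate:
    """Hypothetical extracted term: label, model confidence, grounded URI (if any)."""
    label: str
    score: float
    uri: Optional[str] = None


@dataclass
class Doc:
    """Hypothetical document record passed from stage to stage."""
    text: str
    candidates: List[Candidate] = field(default_factory=list)


# A stage takes a Doc and returns a (possibly modified) Doc.
Stage = Callable[[Doc], Doc]


def validate_uris(doc: Doc) -> Doc:
    """Keep only candidates grounded to a URI in the assumed thesaurus namespace."""
    doc.candidates = [
        c for c in doc.candidates
        if c.uri is not None and c.uri.startswith(THESAURUS_PREFIX)
    ]
    return doc


def threshold_guard(min_score: float) -> Stage:
    """Build a stage that drops candidates whose score is below min_score."""
    def guard(doc: Doc) -> Doc:
        doc.candidates = [c for c in doc.candidates if c.score >= min_score]
        return doc
    return guard


def run_pipeline(doc: Doc, stages: List[Stage]) -> Doc:
    """Apply the stages in order, feeding each one the previous stage's output."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    # Placeholder concept URIs, not real thesaurus identifiers.
    doc = Doc(
        text="...",
        candidates=[
            Candidate("education", 0.92, THESAURUS_PREFIX + "example-a"),
            Candidate("literacy", 0.41, THESAURUS_PREFIX + "example-b"),
            Candidate("misc", 0.88),
        ],
    )
    result = run_pipeline(doc, [validate_uris, threshold_guard(0.5)])
    print([c.label for c in result.candidates])  # ['education']
```

Running it prints ['education']. The "misc" candidate is dropped because it has no thesaurus URI, and "literacy" is dropped because it falls below the threshold. Modelling each stage as a Doc -> Doc function keeps the stages independent, so the other stages (extraction, grounding, aggregation, formatting) could be slotted into the same list.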

Key Endpoints

POST /api/v1/process
Submit a document (PDF, UNESDOC ID, raw text, or URL) for metadata extraction.

GET /api/v1/status/{id}
Check processing status, including the current stage and progress percentage.

GET /api/v1/result/{id}
Retrieve the complete DCAT-AP 3.0 JSON-LD metadata output.

POST /api/v1/batch
Submit up to 100 documents in a single batch request.

GET /api/v1/documents
List processed documents, filterable by country, year, or region.

GET /health
Public health check that returns the service status and version. No API key required.
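The status and result endpoints suggest a submit-then-poll workflow on the client side. In this workflow you read the progress of a job from /api/v1/status/{id} and fetch the JSON-LD from /api/v1/result/{id} once the job is done. The Python sketch that follows assumes that the status response has a "status" field with the terminal values "completed" or "failed", alongside "stage" and "progress". These field names and values are guesses and are not documented here. The HTTP call is passed in as get_json, so the polling logic can be tested offline. The chunk helper splits a large collection into batches that respect the 100-document limit of /api/v1/batch. The request body for POST /api/v1/process and /api/v1/batch is not documented here, so the sketch does not build it.

```python
import time
from typing import Callable, Dict, Iterator, List

MAX_BATCH = 100  # documented limit for POST /api/v1/batch


def chunk(items: List, size: int = MAX_BATCH) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items, one per batch request."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def wait_for_result(
    get_json: Callable[[str], Dict],
    job_id: str,
    poll_seconds: float = 2.0,
    timeout: float = 600.0,
) -> Dict:
    """Poll the status endpoint until the job finishes, then fetch the result.

    Assumed (undocumented) status shape: {"status": ..., "stage": ..., "progress": ...},
    with "completed" and "failed" as the terminal status values.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_json(f"/api/v1/status/{job_id}")
        state = status.get("status")
        if state == "completed":
            # The result endpoint returns the DCAT-AP 3.0 JSON-LD metadata.
            return get_json(f"/api/v1/result/{job_id}")
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed at stage {status.get('stage')}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
```

For example, a list of 250 documents yields three batches of 100, 100, and 50. Passing get_json as a parameter keeps the sketch independent of any particular HTTP library. A real client would supply a function that sends the X-API-Key header.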

Authentication

All endpoints except /health require an API key via the X-API-Key request header.
Example: curl -H "X-API-Key: your-key" https://<space-url>/api/v1/status/test
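The curl example maps directly onto Python's standard library. In the sketch that follows, build_request and get_json are illustrative helper names and are not part of the service. The sketch adds the X-API-Key header to every path except /health, which follows the rule stated above.

```python
import json
import urllib.request
from typing import Optional

PUBLIC_PATHS = {"/health"}  # the only endpoint documented as not requiring a key


def build_request(base_url: str, path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, attaching X-API-Key for every path except /health."""
    headers = {}
    if path not in PUBLIC_PATHS:
        if not api_key:
            raise ValueError(f"{path} requires an API key")
        headers["X-API-Key"] = api_key
    return urllib.request.Request(base_url.rstrip("/") + path, headers=headers)


def get_json(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, get_json(build_request("https://<space-url>", "/api/v1/status/test", "your-key")) makes the same call as the curl command. Internally, urllib stores the header under the name X-api-key. HTTP header names are case-insensitive, so the server still receives the header.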