🏛️ UNESCO Metadata Pipeline API

Multilingual DCAT-AP 3.0 metadata extraction from UNESCO documents, powered by GLiNER2, Graph RAG, and quantized LLMs.

Links: API Docs (Swagger) · Health Check
Highlights: DCAT-AP 3.0 · SDG classification · 5 languages · UNESCO Thesaurus · GLiNER2 · Graph RAG · Apache 2.0 license

Processing Pipeline

1. Parsing: PDF / text / URL
2. Extraction: GLiNER2 NER
3. Grounding: Graph RAG CoE
4. Validation: Thesaurus URIs
5. Aggregation: Hi-Transformer
6. Formatting: LLM → JSON-LD
7. Validation: threshold guard
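The seven stages run in a fixed order, and the status endpoint reports the current stage together with a progress percentage. The sketch below models that ordering and one plausible way to turn "last completed stage" into a percentage. It is an illustration under assumptions, not the service's code: the enum member names (including THESAURUS_VALIDATION and THRESHOLD_VALIDATION, used here only to tell the two Validation stages apart) and the equal-weight formula are hypothetical.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Pipeline stages in execution order (hypothetical names for the list above)."""
    PARSING = 1
    EXTRACTION = 2
    GROUNDING = 3
    THESAURUS_VALIDATION = 4  # stage 4: Validation (Thesaurus URIs)
    AGGREGATION = 5
    FORMATTING = 6
    THRESHOLD_VALIDATION = 7  # stage 7: Validation (threshold guard)


def progress_percent(completed: Optional[Stage]) -> int:
    """Hypothetical progress metric: share of stages finished, all weighted equally.

    `completed` is the last stage that finished, or None if nothing has run yet.
    """
    if completed is None:
        return 0
    return round(100 * completed.value / len(Stage))
```

Equal weighting is the simplest assumption; a real service would more likely weight stages by their typical runtime, since LLM formatting and extraction usually dominate.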

Key Endpoints

POST /api/v1/process
Submit a document (PDF, UNESDOC ID, raw text, or URL) for metadata extraction.

GET /api/v1/status/{id}
Check processing status, including the current stage and progress percentage.

GET /api/v1/result/{id}
Retrieve the complete DCAT-AP 3.0 metadata output as JSON-LD.

POST /api/v1/batch
Submit up to 100 documents in a single batch request.

GET /api/v1/documents
List processed documents, optionally filtered by country, year, or region.

GET /health
Public health check that returns the service status and version. No authentication required.
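A typical client submits a document, polls the status endpoint until processing finishes, then fetches the result. The Python sketch below shows that flow plus two helpers: splitting a large job into batches of at most 100 documents, and building the path for the filtered document listing. The JSON field names (`status`, `stage`, `progress`), the status values `"completed"` and `"failed"`, and the query parameter names `country`, `year`, and `region` are assumptions; check the Swagger docs for the actual schema. The HTTP call is passed in as a function so the logic runs without a live server.

```python
import time
from typing import Callable, Iterator, Optional, Sequence
from urllib.parse import urlencode

MAX_BATCH = 100  # documented limit for POST /api/v1/batch


def chunk(docs: Sequence[dict], size: int = MAX_BATCH) -> Iterator[list]:
    """Yield consecutive slices of at most `size` documents, one per batch request."""
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


def documents_path(country: Optional[str] = None, year: Optional[int] = None,
                   region: Optional[str] = None) -> str:
    """Build a GET /api/v1/documents path; the parameter names are hypothetical."""
    filters = {"country": country, "year": year, "region": region}
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return "/api/v1/documents" + (f"?{query}" if query else "")


def wait_for_result(job_id: str, get_json: Callable[[str], dict],
                    interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll /api/v1/status/{id} until done, then return /api/v1/result/{id}.

    `get_json(path)` is any function that performs an authenticated GET and
    decodes the JSON body. The "status" field and its values are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_json(f"/api/v1/status/{job_id}")
        state = status.get("status")
        if state == "completed":
            return get_json(f"/api/v1/result/{job_id}")
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed at stage {status.get('stage')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout}s")
        time.sleep(interval)


# Offline demo: a fake transport reports progress, then completion, then a
# placeholder JSON-LD document (not real service output).
responses = iter([
    {"status": "processing", "stage": "Extraction", "progress": 30},
    {"status": "completed", "stage": "Validation", "progress": 100},
    {"@type": "dcat:Dataset"},
])
result = wait_for_result("abc123", lambda path: next(responses), interval=0)
```

Injecting `get_json` keeps the polling logic independent of the HTTP library and makes it easy to swap in a real authenticated client.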

Authentication

All endpoints except /health require an API key, sent in the X-API-Key request header. Example:

  curl -H "X-API-Key: your-key" https://<space-url>/api/v1/status/test
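The same authenticated call can be made from Python with only the standard library. This is a minimal sketch: the environment variable names `UNESCO_API_URL` and `UNESCO_API_KEY` are hypothetical, the default base URL keeps the `<space-url>` placeholder from the curl example, and the JSON `Content-Type` for request bodies is an assumption.

```python
import json
import os
import urllib.request
from typing import Optional

# Hypothetical env var; the default mirrors the placeholder in the curl example.
BASE_URL = os.environ.get("UNESCO_API_URL", "https://<space-url>").rstrip("/")


def authed_request(path: str, api_key: str, method: str = "GET",
                   data: Optional[bytes] = None) -> urllib.request.Request:
    """Build a request carrying the X-API-Key header (every endpoint except /health).

    urllib stores header names as "X-api-key"; HTTP header names are case-insensitive.
    """
    headers = {"X-API-Key": api_key}
    if data is not None:
        headers["Content-Type"] = "application/json"  # assumes JSON request bodies
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def fetch_json(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body (performs a network call)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Equivalent of: curl -H "X-API-Key: your-key" https://<space-url>/api/v1/status/test
req = authed_request("/api/v1/status/test",
                     os.environ.get("UNESCO_API_KEY", "your-key"))
```

Once `UNESCO_API_URL` points at a real deployment, `fetch_json(req)` sends the request and returns the decoded status.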