Ingest
Gather attestations from sources into the corpus.
Overview
The Ingest stage is the entry point of the editorial pipeline. It receives raw attestation submissions and performs initial validation, normalization, and lemma discovery.
This stage ensures that incoming text is properly formatted, linguistically valid, and ready for deeper analysis in subsequent stages.
Responsibilities
- Validate the attestation (confirm it reflects authentic language use)
- Normalize text (Unicode NFC, whitespace)
- Discover lemmas (tokenize, lemmatize, rank by significance)
- Score significance across 34 dimensions (once for the entire attestation)
- Create lemma affiliations (metadata linking each lemma to the attestation)
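The normalization and lemma-discovery steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the stage's actual implementation: `normalize_text`, `discover_lemmas`, and `LEMMA_TABLE` are hypothetical names. "Whitespace" normalization is assumed to mean collapsing runs of whitespace to a single space and trimming the ends. The tiny lookup table stands in for a real lemmatizer, and raw frequency stands in for the significance ranking, which this page does not specify.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMA_TABLE = {"ran": "run", "runs": "run", "running": "run", "mice": "mouse"}


def normalize_text(raw: str) -> str:
    """Apply Unicode NFC, then collapse whitespace runs to one space and trim the ends."""
    nfc = unicodedata.normalize("NFC", raw)
    # str.split() with no argument splits on any run of Unicode whitespace.
    return " ".join(nfc.split())


def discover_lemmas(text: str) -> list[tuple[str, int]]:
    """Tokenize, map each token to a lemma, and rank lemmas by frequency.

    Frequency is an assumed stand-in for the stage's significance ranking.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(LEMMA_TABLE.get(token, token) for token in tokens)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return counts.most_common()


if __name__ == "__main__":
    text = normalize_text("  She runs.\n He ran;\tthey are running! ")
    print(text)  # She runs. He ran; they are running!
    print(discover_lemmas(text))
    # [('run', 3), ('she', 1), ('he', 1), ('they', 1), ('are', 1)]
```

NFC composes sequences such as `e` followed by a combining acute accent into the single code point `é`. Two visually identical submissions therefore produce the same normalized text before tokenization.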
Output
IngestResponse
- normalizedText
- attestationSignificance (34 dimensions)
- lemmas[] with affiliation metadata
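One possible shape for `IngestResponse` is sketched below with Python `TypedDict`s. The three top-level field names come from the list above. Everything else is an assumption: the significance score is modeled as a plain list of 34 floats, and the affiliation fields (`attestationId`, `occurrences`, `rank`) and the `build_response` helper are hypothetical.

```python
from typing import TypedDict

SIGNIFICANCE_DIMENSIONS = 34  # dimension count taken from the stage description


class LemmaAffiliation(TypedDict):
    # Hypothetical affiliation fields; the real metadata is not documented here.
    lemma: str
    attestationId: str
    occurrences: int
    rank: int


class IngestResponse(TypedDict):
    normalizedText: str
    attestationSignificance: list[float]  # one 34-dimension vector per attestation
    lemmas: list[LemmaAffiliation]


def build_response(
    attestation_id: str,
    normalized_text: str,
    significance: list[float],
    ranked_lemmas: list[tuple[str, int]],
) -> IngestResponse:
    """Assemble an IngestResponse (hypothetical helper, for illustration only)."""
    if len(significance) != SIGNIFICANCE_DIMENSIONS:
        raise ValueError(
            f"expected {SIGNIFICANCE_DIMENSIONS} significance dimensions, "
            f"got {len(significance)}"
        )
    # Ranks start at 1 and follow the order of ranked_lemmas.
    lemmas: list[LemmaAffiliation] = [
        {"lemma": lemma, "attestationId": attestation_id, "occurrences": count, "rank": i}
        for i, (lemma, count) in enumerate(ranked_lemmas, start=1)
    ]
    return {
        "normalizedText": normalized_text,
        "attestationSignificance": list(significance),
        "lemmas": lemmas,
    }


if __name__ == "__main__":
    response = build_response("att-001", "She runs.", [0.0] * 34, [("run", 1), ("she", 1)])
    print(response["lemmas"][0])
```

The significance vector sits once at the top level rather than on each lemma. This matches scoring "once for the entire attestation": every affiliation shares the same attestation-level score instead of carrying its own copy.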
Coming Soon
Detailed documentation, metrics, and live stage monitoring will be available here.