Ingest

Gather attestations from sources into the corpus.

Overview

The Ingest stage is the entry point of the editorial pipeline. It receives raw attestation submissions and performs initial validation, normalization, and lemma discovery.

This stage ensures that incoming text is properly formatted, linguistically valid, and ready for deeper analysis in subsequent stages.

Responsibilities

  • Validate the attestation (does it reflect authentic language use?)
  • Normalize text (Unicode NFC, whitespace)
  • Discover lemmas (tokenize, lemmatize, rank by significance)
  • Score significance across 34 dimensions (once for the entire attestation)
  • Create lemma affiliations (metadata linking lemma to attestation)
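
As a rough illustration of the normalization and lemma discovery steps above, here is a minimal Python sketch. It is not the pipeline's actual implementation: the function names `normalize_text` and `discover_lemmas` are hypothetical, whitespace handling is assumed to mean collapsing runs into single spaces, and frequency ranking over lowercase word tokens stands in for real lemmatization and significance ranking, which the source does not specify.

```python
import re
import unicodedata
from collections import Counter


def normalize_text(raw: str) -> str:
    """Apply Unicode NFC, then collapse whitespace runs (assumed policy)."""
    composed = unicodedata.normalize("NFC", raw)
    return re.sub(r"\s+", " ", composed).strip()


def discover_lemmas(text: str, limit: int = 5) -> list[str]:
    """Hypothetical stand-in for lemma discovery.

    Tokenizes into lowercase word tokens and ranks them by frequency,
    breaking ties alphabetically. A real stage would lemmatize tokens
    and rank by significance instead.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:limit]]
```

For example, `normalize_text` turns a decomposed "e" plus combining acute accent into the single precomposed character, so that later stages compare text in one canonical form.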

Output

IngestResponse
  • normalizedText
  • attestationSignificance (34 dimensions)
  • lemmas[] with affiliation metadata
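
The shape of this response could be modeled roughly as follows. This is a sketch under stated assumptions: the source names only the three top-level fields, so the `LemmaAffiliation` class, its `metadata` dictionary, the use of floats for scores, and the length check are all hypothetical. Field names are written in Python's snake_case and correspond to `normalizedText`, `attestationSignificance`, and `lemmas[]`.

```python
from dataclasses import dataclass, field

SIGNIFICANCE_DIMENSIONS = 34  # dimension count stated in the Output section


@dataclass
class LemmaAffiliation:
    lemma: str
    # Hypothetical: the source does not list the affiliation metadata fields.
    metadata: dict[str, object] = field(default_factory=dict)


@dataclass
class IngestResponse:
    normalized_text: str
    # Assumed to be one float per dimension, scored once per attestation.
    attestation_significance: list[float]
    lemmas: list[LemmaAffiliation] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject vectors that do not have exactly 34 entries.
        if len(self.attestation_significance) != SIGNIFICANCE_DIMENSIONS:
            raise ValueError(
                f"expected {SIGNIFICANCE_DIMENSIONS} significance scores, "
                f"got {len(self.attestation_significance)}"
            )
```

Checking the vector length at construction time is one way to catch a malformed response before it reaches later stages.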

Coming Soon

Detailed documentation, metrics, and live stage monitoring will be available here.