Sources & cadence

The 12 MVP sources, what we fetch, how often, and under which licence.

| ID | Source | Strategy | Cadence | Licence |
| --- | --- | --- | --- | --- |
| eurlex | EUR-Lex (CELEX 32024R1689 family) | SPARQL | 30 min | Decision 2011/833/EU |
| aioffice | EU AI Office (Drupal news) | RSS | 30 min | Decision 2011/833/EU |
| aiboard | EU AI Board (opinions, recommendations) | HTML scrape | 2 h | Decision 2011/833/EU |
| codeofpractice | GPAI Code of Practice | HTML scrape | 6 h | Decision 2011/833/EU |
| haveyoursay | EU Have-Your-Say (AI initiatives) | HTML scrape | 12 h | Decision 2011/833/EU |
| cen | CEN/CENELEC JTC 21 | HTML scrape (metadata only) | 12 h | Standards metadata only; full texts not redistributed |
| bnetza | Bundesnetzagentur (DE market authority) | HTML scrape | 4 h | §5 UrhG (DE official works) |
| bfdi | BfDI (DE data protection) | HTML scrape | 6 h | §5 UrhG |
| bsi | BSI (DE cybersecurity) | HTML scrape | 6 h | §5 UrhG |
| cnil | CNIL (FR data protection) | RSS | 30 min | Etalab Licence Ouverte 2.0 |
| nl-algoritmeregister | NL Algoritmeregister | HTML scrape | 4 h | CC0 |
| oecdai | OECD.AI Policy Observatory | HTML scrape | 12 h | CC-BY 4.0 |
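As a rough illustration of how the cadences in the table could map onto cron schedules, here is a minimal sketch. The `SourceConfig` shape, the `SOURCES` array, and the `cadenceToCron` helper are all hypothetical names made up for this example, not the project's actual configuration; only the IDs, strategies, and intervals come from the table. It assumes standard 5-field cron expressions.

```typescript
// Hypothetical registry shape; field names are illustrative, not the project's schema.
type Strategy = "sparql" | "rss" | "html";

interface SourceConfig {
  id: string;
  strategy: Strategy;
  cadenceMinutes: number; // cadence from the table, normalised to minutes
}

// A few rows from the table above.
const SOURCES: SourceConfig[] = [
  { id: "eurlex", strategy: "sparql", cadenceMinutes: 30 },
  { id: "aiboard", strategy: "html", cadenceMinutes: 120 },
  { id: "cnil", strategy: "rss", cadenceMinutes: 30 },
  { id: "oecdai", strategy: "html", cadenceMinutes: 720 },
];

// Convert a cadence into a 5-field cron expression.
// Only handles intervals that divide evenly into an hour or a day.
function cadenceToCron(minutes: number): string {
  if (!Number.isInteger(minutes) || minutes <= 0) {
    throw new Error(`invalid cadence: ${minutes}`);
  }
  if (minutes < 60 && 60 % minutes === 0) {
    return `*/${minutes} * * * *`;
  }
  const hours = minutes / 60;
  if (Number.isInteger(hours) && hours <= 24 && 24 % hours === 0) {
    return `0 */${hours} * * *`;
  }
  throw new Error(`cadence of ${minutes} min does not divide evenly into an hour or a day`);
}

// Map each source ID to its schedule.
const schedules = SOURCES.map((s) => ({ id: s.id, cron: cadenceToCron(s.cadenceMinutes) }));
```

Rejecting intervals that do not divide evenly is a deliberate choice in this sketch: a cron step such as `*/45` restarts at the top of every hour, so it would not give a uniform 45-minute cadence.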

What "cadence" actually means

Each source worker runs on a Cloudflare cron trigger at the listed interval. On each run we fetch the source, diff it against the previous snapshot, and fan out only new or changed items to enrichment. Cadence is therefore the upper bound on detection latency: an item published just after a run waits up to one full interval for the next one, so end-to-end "publish → webhook" takes at most roughly the cron interval plus 1-2 minutes for enrichment and delivery.
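The fetch, diff, and fan-out step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual worker code: the `Item` shape, the `Snapshot` type, and the `fingerprint` and `diffSnapshot` helpers are hypothetical, and the FNV-1a-style hash is just one cheap way to detect changed content.

```typescript
// Hypothetical item shape; real sources will carry more fields.
interface Item {
  id: string;
  title: string;
  body: string;
}

// Snapshot: item id -> content fingerprint from the previous run.
type Snapshot = Map<string, string>;

// FNV-1a-style 32-bit hash over UTF-16 code units (non-cryptographic,
// chosen here only for illustration).
function fingerprint(item: Item): string {
  const text = `${item.title}\n${item.body}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Compare freshly fetched items with the previous snapshot.
// Returns the new or changed items plus the snapshot to store for the next run.
function diffSnapshot(previous: Snapshot, fetched: Item[]): { changed: Item[]; next: Snapshot } {
  const changed: Item[] = [];
  const next: Snapshot = new Map();
  for (const item of fetched) {
    const fp = fingerprint(item);
    next.set(item.id, fp);
    if (previous.get(item.id) !== fp) {
      changed.push(item); // unseen id or different content
    }
  }
  return { changed, next };
}

// Usage: two consecutive runs against the same source.
const a: Item = { id: "a", title: "Opinion 1", body: "draft" };
const b: Item = { id: "b", title: "Opinion 2", body: "draft" };
const c: Item = { id: "c", title: "Opinion 3", body: "draft" };

const run1 = diffSnapshot(new Map(), [a, b]);
const run2 = diffSnapshot(run1.next, [a, { ...b, body: "amended" }, c]);
// Only run2.changed would be handed to enrichment on the second run.
```

On the first run every item is new, so everything fans out. On the second run the unchanged item is skipped, and only the amended and newly added items go to enrichment.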

What we don't do with the source

Sources for Phase 2

On the roadmap, not yet wired: