Sources & cadence

The 12 MVP sources, what we fetch, how often, and under which licence.

ID	Source	Strategy	Cadence	Licence
`eurlex`	EUR-Lex (CELEX 32024R1689 family)	SPARQL	30 min	Decision 2011/833/EU
`aioffice`	EU AI Office (Drupal news)	RSS	30 min	Decision 2011/833/EU
`aiboard`	EU AI Board (opinions, recommendations)	HTML scrape	2 h	Decision 2011/833/EU
`codeofpractice`	GPAI Code of Practice	HTML scrape	6 h	Decision 2011/833/EU
`haveyoursay`	EU Have-Your-Say (AI initiatives)	HTML scrape	12 h	Decision 2011/833/EU
`cen`	CEN/CENELEC JTC 21	HTML scrape (metadata only)	12 h	Standards metadata only — full texts not redistributed
`bnetza`	Bundesnetzagentur (DE market authority)	HTML scrape	4 h	§5 UrhG (DE official works)
`bfdi`	BfDI (DE data protection)	HTML scrape	6 h	§5 UrhG
`bsi`	BSI (DE cybersecurity)	HTML scrape	6 h	§5 UrhG
`cnil`	CNIL (FR data protection)	RSS	30 min	Etalab Licence Ouverte 2.0
`nl-algoritmeregister`	NL Algoritmeregister	HTML scrape	4 h	CC0
`oecdai`	OECD.AI Policy Observatory	HTML scrape	12 h	CC-BY 4.0
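
As a rough illustration of how the cadences above could translate into cron trigger schedules, here is a minimal sketch. The helper name `cadence_to_cron` and the choice to fire hourly schedules on minute 0 are assumptions made for the example; the actual trigger configuration is not shown on this page.

```python
import re

# Cadences copied from the table above (source ID -> interval).
CADENCES = {
    "eurlex": "30 min", "aioffice": "30 min", "aiboard": "2 h",
    "codeofpractice": "6 h", "haveyoursay": "12 h", "cen": "12 h",
    "bnetza": "4 h", "bfdi": "6 h", "bsi": "6 h",
    "cnil": "30 min", "nl-algoritmeregister": "4 h", "oecdai": "12 h",
}


def cadence_to_cron(cadence: str) -> str:
    """Translate '30 min' or '2 h' into a five-field cron expression (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)\s*(min|h)", cadence.strip())
    if match is None:
        raise ValueError(f"unrecognised cadence: {cadence!r}")
    value, unit = int(match.group(1)), match.group(2)
    if unit == "min":
        # Only intervals that divide the hour evenly give a regular schedule.
        if not 1 <= value <= 59 or 60 % value != 0:
            raise ValueError("minute cadence must divide 60 evenly")
        return f"*/{value} * * * *"
    # Same idea for hours within a day; fire on minute 0 (an assumption).
    if not 1 <= value <= 23 or 24 % value != 0:
        raise ValueError("hour cadence must divide 24 evenly")
    return f"0 */{value} * * *"


# Source ID -> cron expression for every row in the table.
SCHEDULES = {source_id: cadence_to_cron(c) for source_id, c in CADENCES.items()}
```

Under these assumptions, `SCHEDULES["eurlex"]` is `*/30 * * * *` and `SCHEDULES["aiboard"]` is `0 */2 * * *`.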

What "cadence" actually means

Each source worker runs on a Cloudflare cron trigger at the listed interval. Each run fetches the source, diffs it against the previous snapshot, and fans out only new or changed items to enrichment. Cadence is therefore the upper bound on detection latency: an item published just after a run is picked up by the next one, so worst-case end-to-end "publish → webhook" latency is roughly the cron interval plus the 1-2 minutes that enrichment and delivery typically take.
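
The "diff against the previous snapshot" step can be sketched as follows. This is an illustration only, not the workers' actual code: it assumes each fetched item is a dict with a stable `id` field, and that the snapshot is stored as a mapping from item ID to a content hash. The names `fingerprint` and `diff_snapshot` are hypothetical.

```python
import hashlib
import json


def fingerprint(item: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(item, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshot(previous: dict, fetched: list) -> tuple:
    """Return (items that are new or changed, snapshot to store for the next run).

    `previous` maps item ID -> fingerprint recorded by the last run.
    """
    snapshot = {}
    changed = []
    for item in fetched:
        digest = fingerprint(item)
        snapshot[item["id"]] = digest
        # A missing ID means a new item; a different hash means a changed one.
        if previous.get(item["id"]) != digest:
            changed.append(item)
    return changed, snapshot


# First run: no previous snapshot, so everything is fanned out.
run1 = [{"id": "a", "title": "Draft"}, {"id": "b", "title": "Notice"}]
changed1, snap1 = diff_snapshot({}, run1)

# Second run: "a" was edited, "b" is untouched, "c" is new.
run2 = [
    {"id": "a", "title": "Final"},
    {"id": "b", "title": "Notice"},
    {"id": "c", "title": "Opinion"},
]
changed2, snap2 = diff_snapshot(snap1, run2)
```

In this sketch only `a` and `c` reach enrichment on the second run. Hashing the whole item is the simplest choice; a real worker might hash only the fields that matter so that cosmetic changes do not trigger a webhook.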

What we don't do with the sources

We do not redistribute full texts of CEN/ISO/ETSI standards (they're paywalled; we carry metadata only).
We do not republish individual consultation responses from third parties (rights belong to the authors; we carry metadata only).
We do not aggregate FLI's artificialintelligenceact.eu as a primary source. It's a curated explorer, so we link to it where helpful.

Sources for Phase 2

On the roadmap, not yet wired:

ISO/IEC SC 42 (standards metadata)
ETSI SAI
AESIA (Spain)
AgID + Garante (Italy)
IMY (Sweden)
CJEU/InfoCuria for the first AI Act case law (probably 2026/27)