Documentation v1
Lawstronaut MCP
First public draft, prepared for submission to Anthropic. Exposes filerskeepers' global legal-data corpus to AI agents through the Model Context Protocol.
1. What the Lawstronaut MCP is
Audience: everyone.
The Lawstronaut MCP server exposes filerskeepers' global legal-data corpus to AI agents and applications through the Model Context Protocol. It is a read-only research interface over the Lawstronaut crawled-data warehouse — a normalized store of legislation, case law, regulatory guidance, official notices, prospectuses, and government publications harvested from official portals in 69 countries and 63 sub-state jurisdictions (US states, German Länder, Belgian regions, the Emirate of Abu Dhabi, and others).
At a glance
132 jurisdictions live
15 MCP tools
Corpus scale (2026-05-27): 132 jurisdictions, several hundred portals, millions of documents. The Belgian sub-corpus alone exposes 45+ official portals; a single query (list_documents iso="BE") reports total_count: 1,867,741. Ireland alone reports 47,501 documents on the Irish Statute Book portal.
What the MCP is good for: discovering what legal sources exist for a jurisdiction; navigating Lawstronaut's normalized legal taxonomy (Private/Public/Misc → subdomain → category → subcategory → law_type); pulling document metadata, plain text, markdown, or a PDF download URL; running freshness scans ("what landed in the last 7 days?").
What the MCP is not: it is not a legal-advice engine, a court-document filing tool, or a write/edit interface. It does not perform full-text semantic search across the entire corpus — filters are metadata-driven (jurisdiction, portal, authority, dates, IDs, title-substring).
Tool count: 15 tools, all under the namespace prefix mcp__ee059666-bffd-44c4-8cd5-086bd3ffcf5a__*.
2. Connection model & authentication
Audience: developers, external consumers.
The MCP is hosted by filerskeepers. Connection is established through Anthropic's MCP transport (HTTP with Server-Sent Events (SSE) or stdio, depending on deployment). In a Cowork / Claude Code session the user authenticates through the standard MCP connector flow; the server validates the bearer token issued by the Lawstronaut developer portal at dev-portal.filerskeepersapi.co (Microsoft SSO).
Once connected, the 15 tools appear in the client's available-tools list under the namespace prefix mcp__ee059666-bffd-44c4-8cd5-086bd3ffcf5a__. The trailing 36-character identifier is the server's stable installation ID; it does not change across sessions but is opaque to consumers and should not be parsed.
There is no per-tool authentication — once the MCP session is authenticated, all tools are callable. Rate limits and entitlements (which jurisdictions / portals the token may query) are enforced server-side based on the user's Lawstronaut subscription.
Recommended quick check: list_jurisdictions takes no arguments, returns 132 items, and is the fastest way to confirm the connection is alive.
3. The Lawstronaut data model in one page
Audience: everyone, especially legal PMs and LLM consumers.
Lawstronaut organises legal content along two orthogonal axes: a geographic/institutional axis (where the law lives and who issued it) and a legal-taxonomy axis (what kind of law it is).
Geographic / institutional axis
Jurisdiction (iso: "BE", "IE", "US_CA", "EU", ...)
  └─ Portal (e.g. "Belgium Federal Constitutional Court (Dutch)")
      └─ Issuing Authority (e.g. "Constitutional Court")
          └─ Authority Type (e.g. "Arresten", "Act", "Prospectus")
              └─ Document (versioned, with text, markdown, optional PDF)
Each portal exposes one or more authority_type values and one or more issuing_authority values. list_authority_types and list_issuing_authorities both pivot on jurisdiction and optionally portal.
Legal taxonomy axis
A five-level hierarchy: four levels with deterministic dotted IDs, plus a numeric law-type leaf:
Domain (A, B, C): 3 entries (Private law, Public Law, Miscellaneous)
  └─ Subdomain (A.1, A.2, ...)
      └─ Category (A.1.1, A.2.5, ...): 81 entries
          └─ Subcategory (A.2.5.1, A.2.5.16, ...): 21+ per category
              └─ Law type (numeric id, e.g. 1507): fine-grained typing
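Because the first four levels share one dotted ID scheme, a parent ID can be derived from a child ID by dropping its last segment. A minimal Python sketch of that idea follows; the helper names are illustrative and are not part of the MCP.

```python
# Illustrative helpers; neither function is part of the MCP itself.
LEVELS = ("domain", "subdomain", "category", "subcategory")


def taxonomy_ancestors(dotted_id: str) -> list[str]:
    """Return every ID on the path from the domain down to dotted_id."""
    parts = dotted_id.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def taxonomy_level(dotted_id: str) -> str:
    """Name the level of a dotted ID by counting its segments."""
    depth = len(dotted_id.split("."))
    if not 1 <= depth <= len(LEVELS):
        raise ValueError(f"not a dotted taxonomy id: {dotted_id!r}")
    return LEVELS[depth - 1]


print(taxonomy_ancestors("A.2.5.1"))  # ['A', 'A.2', 'A.2.5', 'A.2.5.1']
print(taxonomy_level("A.2.5"))        # category
```

Law types sit below this scheme and carry plain numeric IDs, so they need a lookup through list_law_types rather than string splitting.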
Concrete example:
A Private law
└── A.2 National law
    └── A.2.5 Corporate law
        └── A.2.5.1 Company formation and incorporation
            ├── 1507 Company Formation Policy and Framework Laws
            ├── 1508 Incorporation Procedures Laws
            └── 1509 Company Name Registration Laws
The Document object
Every document carries (at minimum) document_id, title, jurisdiction, portal, portal_name, issuing_authority, type_of_authority, language, legal_link, version, status, repealed, crawling_date, last_updated, and an optional file_data block when an underlying PDF/binary is attached.
Date fields are always present on the schema but are often empty when they do not apply to a document or were not extracted. Treat an empty string as "unknown / not extracted", not as "absent in reality".
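One way to apply this rule is to convert empty date strings to None as soon as a document is received. The sketch below assumes the response uses the same date field names as the list_documents filters; only publication_date, crawling_date and last_updated are confirmed by the examples in this document.

```python
# Assumption: response field names mirror the list_documents date filters.
DATE_FIELDS = (
    "crawling_date", "last_updated", "last_amendment", "publication_date",
    "expiration_date", "effective_date", "date_of_enactment", "date_of_decision",
)


def blank_dates_to_none(document: dict) -> dict:
    """Return a copy of the document with empty-string dates set to None."""
    cleaned = dict(document)
    for field in DATE_FIELDS:
        if cleaned.get(field) == "":
            cleaned[field] = None  # unknown / not extracted
    return cleaned
```

Keeping None distinct from a real date lets downstream code skip the field instead of treating it as a statement about the law.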
4. Quickstart
Audience: developers and external consumers.
Three-call workflow to go from "I know nothing" to "I have the full text of a recent Irish Act":
Step 1 — Confirm the corpus has what you want:
list_jurisdictions() → 132 entries; pick "IE"
Step 2 — See what new content arrived recently:
horizon_scan(iso="IE", since="7d", max_per_jurisdiction=3)
→ returns 219 fresh docs; first hit has
document_id=27732019, title="Protection of Employees (Employers'
Insolvency) (Amendment) Act 2026"
Step 3 — Pull the full text:
get_document_text(iso="IE", document_id="27732019", limit=1)
→ returns full_text (the entire Act)
get_markdown(iso="IE", document_id="27732019", limit=1)
→ same content as markdown with inline links
For documents that are PDFs (case law, prospectuses, government decrees), substitute step 3 with:
get_source_url(document_id=20951014)
→ presigned S3 URL valid ~1 hour
That's the entire happy path. The remaining tools mainly exist to help you find the document_id you want, not to fetch it.
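The three steps above could be scripted roughly as follows. This is a sketch, not a client implementation: call_tool is a hypothetical wrapper around whatever MCP client you use, taking a tool name and an arguments dict and returning the parsed JSON response.

```python
from typing import Any, Callable, Optional

# Hypothetical: call_tool(name, arguments) -> parsed JSON from the MCP server.
ToolCaller = Callable[[str, dict], Any]


def latest_document_text(call_tool: ToolCaller, iso: str, since: str = "7d") -> Optional[str]:
    """Return the full text of the first fresh document, or None if none arrived."""
    # Step 1: confirm the ISO code exists in the corpus.
    known = {entry["iso"] for entry in call_tool("list_jurisdictions", {})["data"]}
    if iso not in known:
        raise ValueError(f"unknown iso: {iso!r}")

    # Step 2: look at what arrived recently.
    scan = call_tool(
        "horizon_scan",
        {"iso": iso, "since": since, "max_per_jurisdiction": 3},
    )
    if not scan["data"]:
        return None

    # Step 3: pull the text. horizon_scan returns an integer id, while
    # get_document_text takes a string.
    document_id = str(scan["data"][0]["document_id"])
    text = call_tool(
        "get_document_text",
        {"iso": iso, "document_id": document_id, "limit": 1},
    )
    return text["data"][0]["full_text"] if text["data"] else None
```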
5. Tool reference
Audience: developers, external consumers, LLM authors.
Every tool here was called live against the production MCP on 2026-05-27. The example responses below are real, trimmed responses — not invented schemas.
Consistent shape: list-style tools return {pagination: {total_count, limit, offset}, data: [...]}. The exception is list_jurisdictions, which returns {data: [...]} with no pagination. Pagination defaults vary by tool; pass an explicit limit when in doubt (servers may cap at 100 internally).
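A bounded pagination loop over that shape might look like the sketch below. call_tool is again a hypothetical client wrapper, and the page size of 100 simply follows the possible server cap mentioned above; it applies only to tools that accept limit and offset.

```python
from typing import Any, Callable, Iterator


def iter_rows(
    call_tool: Callable[[str, dict], Any],  # hypothetical MCP client wrapper
    tool: str,
    arguments: dict,
    page_size: int = 100,
    max_rows: int = 1000,
) -> Iterator[dict]:
    """Yield rows page by page, stopping at max_rows or when the data runs out."""
    offset = 0
    yielded = 0
    while yielded < max_rows:
        page = call_tool(tool, {**arguments, "limit": page_size, "offset": offset})
        rows = page.get("data", [])
        if not rows:
            return
        for row in rows:
            yield row
            yielded += 1
            if yielded >= max_rows:
                return
        offset += len(rows)
        total = page.get("pagination", {}).get("total_count")
        if total is not None and offset >= total:
            return
```

The max_rows guard matters because total_count can run into the millions; an explicit ceiling keeps an agent from sweeping a whole jurisdiction by accident.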
5.1 list_jurisdictions
Purpose. Enumerate every ISO code the MCP recognises. Always call this first when an LLM doesn't already know the valid ISO for the user's target country/state.
Parameters. none.
Live example (trimmed).
{
"data": [
{ "name": "Belgium", "iso": "BE", "type": "country" },
{ "name": "European Union", "iso": "EU", "type": "country" },
{ "name": "Ireland", "iso": "IE", "type": "country" },
{ "name": "United States", "iso": "US", "type": "country" },
{ "name": "California", "iso": "US_CA", "type": "state" },
{ "name": "Bavaria", "iso": "DE_BY", "type": "state" },
{ "name": "Brussels", "iso": "BE_BR", "type": "state" }
// ... 132 entries total
]
}
Notes.
- Sub-state codes join the country and region with an underscore (US_CA, not US-CA).
- "International Standards" is listed under ISO "ZZ" — convenient for ISO/IEC standards content with no national jurisdiction.
- The European Union is treated as a country-level jurisdiction (iso: "EU").
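To avoid guessing codes, a client can resolve user input against the list_jurisdictions payload. The sketch below is one possible approach; resolve_iso is an illustrative name, and rewriting a hyphen to an underscore is a convenience of this sketch, not server behaviour.

```python
from typing import Optional


def resolve_iso(jurisdictions: list[dict], query: str) -> Optional[str]:
    """Match a user-supplied code or name against list_jurisdictions data."""
    wanted = query.strip()
    # Accept "us-ca" style input by normalising it to the underscore form.
    as_code = wanted.upper().replace("-", "_")
    for entry in jurisdictions:
        if entry["iso"] == as_code:
            return entry["iso"]
    for entry in jurisdictions:
        if entry["name"].casefold() == wanted.casefold():
            return entry["iso"]
    return None  # unknown: ask the user rather than guess


sample = [
    {"name": "Ireland", "iso": "IE", "type": "country"},
    {"name": "California", "iso": "US_CA", "type": "state"},
    {"name": "Bavaria", "iso": "DE_BY", "type": "state"},
]
print(resolve_iso(sample, "us-ca"))     # US_CA
print(resolve_iso(sample, "bavaria"))   # DE_BY
print(resolve_iso(sample, "Atlantis"))  # None
```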
5.2 list_portals
Purpose. Discover the official portals (data sources) Lawstronaut has crawled for a given jurisdiction. Only portals that have actually produced crawled data are returned.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| iso | yes | string | ISO code from list_jurisdictions |
| name | no | string | Substring match on portal name / URL |
| tag | no | string | Filter by a portal tag |
| lang | no | string | Filter by language name (e.g. "French") |
Live example (trimmed).
[
{
"name": "Belgium Etaamb Openjustice (French)",
"name_en": "Etaamb Openjustice",
"url": "etaamb.openjustice.be/fr",
"language": "French",
"jurisdiction": { "country": "Belgium", "state": "" },
"portal_tags": ["General Legislation", "legal professionals", ...],
"total_links": 346363
},
{
"name": "JUPORTAL (Openbare databank voor Belgische rechtspraak)",
"name_en": "JUPORTAL – Public Database of Belgian Case Law",
"url": "juportal.be",
"language": "Dutch",
"portal_tags": ["General Case Law", "court and judiciary", ...],
"total_links": 246433
}
// ... 45 Belgian portals total
]
Notes.
- total_links is the crawler-link count — a good proxy for how thoroughly the portal has been ingested.
- Many portals exist in multiple language variants ((Dutch), (French), (German)); choose lang deliberately when language matters.
- name is the canonical portal identifier used by list_documents(portal=...). Always use the exact name (case- and parenthesis-sensitive), not name_en.
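Since list_documents needs the exact name, it can help to resolve it programmatically from a list_portals response and to fail loudly when the match is ambiguous. A minimal sketch, with pick_portal_name as an illustrative helper:

```python
from typing import Optional


def pick_portal_name(portals: list[dict], fragment: str, language: Optional[str] = None) -> str:
    """Return the exact `name` of the single portal matching the fragment."""
    matches = [
        portal["name"]
        for portal in portals
        if fragment.casefold() in portal["name"].casefold()
        and (language is None or portal.get("language") == language)
    ]
    if len(matches) != 1:
        raise LookupError(f"expected one portal for {fragment!r}, found {len(matches)}")
    return matches[0]  # pass this string unchanged to list_documents(portal=...)
```

Raising on zero or several matches forces the caller to narrow by language instead of silently picking the wrong variant.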
5.3 list_domains
Purpose. Top-level legal-taxonomy buckets. Cached and tiny — there are only three.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| domain | no | string | Substring filter |
| limit | no | number | Page size |
| offset | no | number | Skip |
Live example (trimmed).
{
"pagination": { "total_count": 3, "limit": 10, "offset": 0 },
"data": [
{ "domain_id": "A", "domain": "Private law" },
{ "domain_id": "B", "domain": "Public Law" },
{ "domain_id": "C", "domain": "Miscellaneous" }
]
}
5.4 list_subdomains
Purpose. Children of a domain.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| domain_id | yes | string | Parent domain ("A", "B", "C") |
| subdomain | no | string | Substring filter |
| limit | no | number | Page size |
| offset | no | number | Skip |
Live example (trimmed).
{
"pagination": { "total_count": 2, "limit": 10, "offset": 0 },
"data": [
{ "domain_id": "A", "subdomain_id": "A.1", "subdomain": "Private International Law" },
{ "domain_id": "A", "subdomain_id": "A.2", "subdomain": "National law" }
]
}
5.5 list_categories
Purpose. Categories under a subdomain. subdomain_id is optional — calling without it lists all 81 categories across the corpus.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| subdomain_id | no | string | Parent subdomain |
| category_name | no | string | Substring filter |
| limit | no | number | Page size |
| offset | no | number | Skip |
Live example (trimmed).
{
"pagination": { "total_count": 81, "limit": 10, "offset": 0 },
"data": [
{ "domain": "Private law", "subdomain": "Private International Law",
"category_id": "A.1.1", "category_name": "General Private International Law" },
{ "domain": "Private law", "subdomain": "National law",
"category_id": "A.2.5", "category_name": "Corporate law" }
// ...
]
}
5.6 list_subcategories
Purpose. Subcategories under a category.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| category_id | yes | string | Parent category id (e.g. "A.2.5") |
| subcategory_name | no | string | Substring filter |
| limit | no | number | Page size |
| offset | no | number | Skip |
Live example (trimmed).
{
"pagination": { "total_count": 21, "limit": 10, "offset": 0 },
"data": [
{ "category_id": "A.2.5", "category_name": "Corporate law",
"subcategory_id": "A.2.5.1", "subcategory_name": "Company formation and incorporation" },
{ "category_id": "A.2.5", "category_name": "Corporate law",
"subcategory_id": "A.2.5.16", "subcategory_name": "Restructuring, dissolution and bankruptcy" }
]
}
5.7 list_law_types
Purpose. Leaf-level law-type taxonomy.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| subcategory_id | yes | string | Parent subcategory id (e.g. "A.2.5.1") |
| law_type | no | string | Substring filter |
| limit | no | number | Page size |
| offset | no | number | Skip |
Live example (trimmed).
{
"pagination": { "total_count": 25, "limit": 10, "offset": 0 },
"data": [
{ "law_type_id": 1507, "law_type": "Company Formation Policy and Framework Laws",
"subcategory_id": "A.2.5.1", "category_id": "A.2.5", "category_name": "Corporate law" },
{ "law_type_id": 1508, "law_type": "Incorporation Procedures Laws", ... }
]
}
5.10 list_documents
Purpose. The workhorse. Returns full document metadata filtered by any combination of 25+ optional criteria.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| iso | yes | string | Jurisdiction ISO |
| portal | no | string | Exact portal name |
| document_id | no | string | Specific document |
| repealed | no | bool | Filter by repealed status |
| title | no | string | Substring on title |
| section_title | no | string | Substring on section |
| url | no | string | Substring on legal_link |
| status | no | string | e.g. "new", "updated" |
| crawling_date | no | string | Filter by crawl date |
| last_updated | no | string | Filter by last update |
| last_amendment | no | string | Filter by amendment date |
| publication_date | no | string | Filter by publication |
| expiration_date | no | string | Filter by expiration |
| effective_date | no | string | Filter by effective date |
| date_of_enactment | no | string | Filter by enactment |
| date_of_decision | no | string | Filter by decision (case law) |
| file_data_only | no | bool | Restrict to docs with attached files |
| issuing_authority | no | string | Exact match |
| type_of_authority | no | string | Exact match |
| source_identifier | no | string | E.g. "Rolnummer: 6776" |
| source_secondary_identifier | no | string | E.g. ECLI |
| tag | no | string | Filter by tag |
| lang | no | string | Language filter |
| version | no | string | Specific version |
| limit | no | number | Page size |
| offset | no | number | Skip |
Live example (trimmed).
{
"pagination": { "total_count": 5999, "limit": 1, "offset": 0 },
"data": [{
"document_id": 20951014,
"title": "Arrest nr. 24/2019 van 14 februari 2019",
"jurisdiction": { "country": "Belgium", "state": "" },
"publication_date": "2019-02-14T00:00:000Z",
"issuing_authority": "Constitutional Court",
"type_of_authority": "Arresten",
"language": "Dutch",
"legal_link": "https://nl.const-court.be/public/n/2019/2019-024n.pdf",
"repealed": false,
"status": "new", "version": 1,
"source_identifier": "Rolnummer: 6776",
"source_secondary_identifier": "ECLI:BE:GHCC:2019:ARR.024",
"file_data": {
"content_type": "application/pdf",
"file_size": 116081,
"timestamp": "2026-02-27T02:04:34.369535"
}
}]
}
Notes.
- Some date timestamps come back as YYYY-MM-DDTHH:MM:SS0Z, as in 2019-02-14T00:00:000Z above (note the stray 0 before the Z). This is a serialization quirk; parse defensively.
- ECLI identifiers come through source_secondary_identifier.
- file_data: {} (empty) means no binary; populated means a PDF is available via get_source_url.
- The same document_id can be fed to get_document_text, get_markdown, get_source_url, and get_document_with_version.
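A tolerant timestamp parser for these responses could be sketched as below. It accepts the quirky form from the example above, the strict form with fractional seconds seen in crawling_date, and values without a trailing Z. Treating every value as UTC is an assumption of this sketch, not something the API states.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Seconds may be followed by a real fraction (".557") or by stray digits ("0").
_TIMESTAMP = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,6})\d*|\d*)Z?$"
)


def parse_mcp_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp from an MCP response; None means unknown or unparseable."""
    if not value:
        return None
    match = _TIMESTAMP.match(value)
    if match is None:
        return None
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction = match.group(7) or ""
    microsecond = int(fraction.ljust(6, "0")) if fraction else 0
    try:
        # Assumption: values are UTC even when the trailing Z is missing.
        return datetime(year, month, day, hour, minute, second, microsecond, tzinfo=timezone.utc)
    except ValueError:
        return None
```

Stray digits after the seconds are discarded rather than interpreted, since their meaning is not documented.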
5.11 get_document_text
Purpose. Return the plain-text rendition of one or more documents. Accepts the same superset of filters as list_documents.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| iso | yes | string | Jurisdiction ISO |
| document_id | no | string | Specific document |
| … | no | — | All other list_documents filters apply |
| limit | no | number | Page size — pass deliberately for sweeps |
Live example (trimmed).
{
"total_count": 1, "limit": 1, "offset": 0,
"data": [{
"document_id": "27732019",
"full_text": "Number 7 of 2026 PROTECTION OF EMPLOYEES (EMPLOYERS' INSOLVENCY)
(AMENDMENT) ACT 2026 An Act to amend the Protection of Employees ...
[full text continues]"
}]
}
Notes.
- document_id is a string in this response even though it is an integer in list_documents. Cast defensively.
- Plain text is unformatted (no markdown, no inline links). For richer rendering use get_markdown.
- Sweeping calls with broad filters can return large payloads — pass limit deliberately.
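One way to cast defensively is to normalise every document_id to an integer at the boundary and convert back to a string only when a tool asks for one. A small sketch, with normalize_document_id as an illustrative name:

```python
def normalize_document_id(raw: object) -> int:
    """Accept 27732019 or "27732019" and return a positive integer."""
    if isinstance(raw, bool):
        # bool is a subclass of int; reject it explicitly.
        raise ValueError(f"not a document id: {raw!r}")
    if isinstance(raw, int):
        value = raw
    elif isinstance(raw, str) and raw.strip().isdigit():
        value = int(raw.strip())
    else:
        raise ValueError(f"not a document id: {raw!r}")
    if value <= 0:
        raise ValueError(f"document id must be positive: {raw!r}")
    return value


print(normalize_document_id("27732019") == normalize_document_id(27732019))  # True
```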
5.12 get_markdown
Purpose. Same content as get_document_text but as markdown — bold/italic preserved, inline cross-references rendered as [text](url).
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| iso | yes | string | Jurisdiction ISO |
| document_id | no | string | Specific document |
| … | no | — | All other list_documents filters apply |
| limit | no | number | Page size |
Live example (trimmed).
*Number* 7 *of* 2026
**PROTECTION OF EMPLOYEES (EMPLOYERS' INSOLVENCY) (AMENDMENT) ACT 2026**
An Act to amend the
[Protection of Employees (Employers' Insolvency) Act 1984](https://www.irishstatutebook.ie/1984/en/act/pub/0021/index.html)
; to give further effect to Directive 2008/94/EC ...
Notes.
- Preferred over get_document_text whenever the downstream consumer can render markdown.
- Markdown is the canonical store; get_document_text is essentially get_markdown with formatting stripped.
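If a consumer wants the cross-references as data, the inline links can be pulled out of the markdown with a regular expression. This is a simplified sketch: it handles link text containing parentheses, as in the Irish example, but would truncate a URL that itself contains a closing parenthesis.

```python
import re

# [link text](http(s) URL); the text may contain anything except "]".
_INLINE_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_links(markdown: str) -> list[tuple[str, str]]:
    """Return (text, url) pairs for every inline link in the markdown."""
    return _INLINE_LINK.findall(markdown)


sample = (
    "An Act to amend the\n"
    "[Protection of Employees (Employers' Insolvency) Act 1984]"
    "(https://www.irishstatutebook.ie/1984/en/act/pub/0021/index.html)\n"
    "; to give further effect to Directive 2008/94/EC"
)
for text, url in extract_links(sample):
    print(text, "->", url)
```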
5.13 get_document_with_version
Purpose. Retrieve a specific version of a specific document.
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| document_id | yes | integer | Document id (positive integer) |
| version | yes | integer | Version number (positive integer) |
Live example (trimmed).
// Expected: single document object mirroring a list_documents row.
// Observed 2026-05-27: calls with version=1 against documents whose current version
// is 1 returned {} (empty). See Known Quirks (§9); this is an open question for the API team.
Notes.
- Working theory: the endpoint may only return historical versions, not the latest. Until confirmed, prefer list_documents(document_id=..., version=...).
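Until the behaviour is confirmed, a caller can try the dedicated endpoint and fall back to list_documents when it returns an empty object. The sketch below assumes a hypothetical call_tool(name, arguments) wrapper and follows the parameter types given in the tables above (integers for get_document_with_version, strings for list_documents).

```python
from typing import Any, Callable, Optional


def fetch_version(
    call_tool: Callable[[str, dict], Any],  # hypothetical MCP client wrapper
    iso: str,
    document_id: int,
    version: int,
) -> Optional[dict]:
    """Return one version of a document, or None if neither path finds it."""
    direct = call_tool(
        "get_document_with_version",
        {"document_id": int(document_id), "version": int(version)},
    )
    if direct:  # {} is falsy, so the observed empty response falls through
        return direct
    listing = call_tool(
        "list_documents",
        {"iso": iso, "document_id": str(document_id), "version": str(version), "limit": 1},
    )
    rows = listing.get("data", [])
    return rows[0] if rows else None
```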
5.14 get_source_url
Purpose. Get a time-limited presigned URL to the original source file (typically a PDF).
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| document_id | yes | integer | Document id (positive integer) |
Live example (trimmed).
// Document WITH file_data (document_id=20951014):
{
"document_id": "20951014",
"url": "https://lawstronaut-prod-files-us.s3.us-east-1.amazonaws.com/...pdf?X-Amz-Expires=3600&..."
}
// Document WITHOUT file_data (document_id=27732019, HTML-only Irish Act):
{}
Notes.
- Presigned URLs expire after 3600 seconds (1 hour) — re-fetch if the user might use the link later.
- Files are served from lawstronaut-prod-files-us.s3.us-east-1.amazonaws.com.
- Always check file_data from list_documents first; calling get_source_url on a doc whose file_data is empty {} returns {} (not an error).
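The check-then-fetch pattern from these notes could be wrapped as in the sketch below, which also records when the link will stop working. call_tool is a hypothetical client wrapper; the expiry is estimated from the fetch time plus the X-Amz-Expires query value, falling back to the documented 3600 seconds.

```python
import time
from typing import Any, Callable, Optional
from urllib.parse import parse_qs, urlsplit


def fetch_source_url(
    call_tool: Callable[[str, dict], Any],  # hypothetical MCP client wrapper
    document: dict,
    now: Callable[[], float] = time.time,
) -> Optional[dict]:
    """Return {"url", "expires_at"} for a document, or None when no file exists."""
    if not document.get("file_data"):
        return None  # empty {}: get_source_url would only return {} anyway
    response = call_tool("get_source_url", {"document_id": int(document["document_id"])})
    url = (response or {}).get("url")
    if not url:
        return None
    # Use the lifetime in the query string when present, else the documented 1 hour.
    query = parse_qs(urlsplit(url).query)
    lifetime = int(query.get("X-Amz-Expires", ["3600"])[0])
    return {"url": url, "expires_at": now() + lifetime}
```

Storing expires_at next to the URL makes it easy to decide later whether to reuse the link or call get_source_url again.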
5.15 horizon_scan
Purpose. Mode 1, freshness-first. "What has Lawstronaut crawled / published / updated recently in this jurisdiction?"
Parameters.
| Parameter | Required | Type | Description |
|---|---|---|---|
| iso | yes | string | One ISO code |
| since | no | string | Default "yesterday". Also: "today", "Nd"/"Nh"/"Nw" (e.g. "7d"), or ISO "YYYY-MM-DD" |
| date_field | no | enum | "crawling_date" (default), "publication_date", or "last_updated" |
| topic | no | string | Substring against title |
| issuing_authority | no | string | Filter |
| type_of_authority | no | string | Filter |
| status | no | string | e.g. "new", "updated" |
| include_repealed | no | bool | Default false |
| max_per_jurisdiction | no | integer | Default 50, capped at 100 |
Live example (trimmed).
{
"pagination": { "total_count": 219, "limit": 3, "offset": 0 },
"data": [{
"document_id": 27732019,
"title": "Protection of Employees (Employers' Insolvency) (Amendment) Act 2026",
"issuing_authority": "Office of the Attorney General (Ireland)",
"type_of_authority": "Act",
"legal_link": "https://www.irishstatutebook.ie/2026/en/act/pub/0007/index.html",
"status": "new", "version": 1,
"crawling_date": "2026-05-20T02:39:38.557Z",
"portal_name": "Irish Statute Book"
}]
}
Notes.
- date_field="crawling_date" answers "what arrived in our warehouse?"; "publication_date" answers "what officially published in the period?".
- Use this instead of list_documents for time-windowed queries — it's optimized for the freshness path.
- Document object structure matches list_documents, so downstream parsers are reusable.
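Client-side validation of the since and max_per_jurisdiction arguments can catch malformed input before a call is made. The sketch below encodes only the formats listed in the parameter table; whether the server accepts anything else is not covered here.

```python
import re
from datetime import date

_RELATIVE = re.compile(r"^[1-9]\d*[dhw]$")  # "7d", "12h", "2w"
_ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_valid_since(value: str) -> bool:
    """Check `since` against the formats documented for horizon_scan."""
    if value in ("today", "yesterday"):
        return True
    if _RELATIVE.match(value):
        return True
    if _ISO_DATE.match(value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as month 13
            return True
        except ValueError:
            return False
    return False


def clamp_max_per_jurisdiction(requested: int) -> int:
    """Keep the page size within 1..100, the documented cap."""
    return max(1, min(int(requested), 100))


print(is_valid_since("7d"), is_valid_since("7 days"))  # True False
print(clamp_max_per_jurisdiction(500))                 # 100
```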
6. End-to-end recipes
Audience: developers and external consumers.
Recipe A — "Brief me on the last week of Irish primary legislation"
- horizon_scan(iso="IE", since="7d", type_of_authority="Act", max_per_jurisdiction=20, date_field="crawling_date")
- For each result, get_markdown(iso="IE", document_id=<id>, limit=1)
- Summarise titles + first paragraph for each → deliver to user
Recipe B — "Find Belgian Constitutional Court arrests citing a given ECLI"
- list_portals(iso="BE", tag="General Case Law") → confirm portal name
- list_documents(iso="BE", portal="Belgium Federal Constitutional Court (Dutch)", source_secondary_identifier="ECLI:BE:GHCC:2019:ARR.024", limit=1)
- get_source_url(document_id=<id>) → presigned S3 link to the PDF
Recipe C — "Show me all Belgian regulator guidance on data protection from 2025-2026"
- list_portals(iso="BE", tag="Data Protection and Privacy Law")
- list_documents(iso="BE", portal="Belgium Federal DPA (Dutch)", type_of_authority="<discovered via list_authority_types>", crawling_date="2025-01-01..2026-12-31", limit=50)
- For each: get_markdown(...) and aggregate
Recipe D — "Map a portal to our internal Mongo store"
- list_documents(iso=<iso>, portal=<exact portal name>, limit=N)
- For each document, transform fields per Mongo mapping (§8)
- Where file_data is non-empty, call get_source_url and persist the resolved URL only as a transient value; re-fetch on demand (1h TTL)
7. LLM / agent guidance — tool-selection heuristics
Audience: LLM system-prompt authors and agentic-AI engineers.
Tool-selection decision tree
User mentions a country / state?
├─ Yes → list_jurisdictions (if iso unknown) → resolve to iso
└─ No → ask the user, do not guess
User wants recent / new / updated content?
└─ horizon_scan(iso=…, since=…, date_field=…)
User names a specific document, ECLI, Rolnummer, URL, or title?
└─ list_documents(iso=…, <matching filter>, limit=1)
User wants to know what kinds of legal content are available?
├─ "what portals" → list_portals
├─ "what kinds of instruments" → list_authority_types
├─ "which regulators / courts" → list_issuing_authorities
└─ "by legal subject area" → list_domains → list_subdomains → list_categories
→ list_subcategories → list_law_types
User wants the document's text?
├─ Plain text → get_document_text
├─ Markdown (preserves links) → get_markdown
└─ Original PDF → get_source_url (requires non-empty file_data)
Conventions an agent should respect
- Never invent ISO codes: always confirm against list_jurisdictions.
- Always pass iso on tools that require it; do not infer it from a portal URL.
- Use exact portal strings: list_documents(portal=...) is case- and parenthesis-sensitive. Resolve names via list_portals first.
- Respect pagination: total_count can be in the millions; never fetch unbounded.
- Treat empty date strings as "unknown, not absent".
- Treat a non-empty file_data as the signal that get_source_url will succeed.
- Presigned URLs are short-lived (1h). If you hand one to the user, hand them the document_id too.
8. Mapping responses to the Mongo data model
Audience: internal engineers (Lawstronaut data platform).
The Lawstronaut MCP response shape was designed to be a near-direct mirror of CrawledDataModelMongo.
Core normalized fields (1:1 or near-1:1)
| Mongo core field | MCP response field | Notes |
|---|---|---|
| url | legal_link | Original portal URL |
| portal | portal | Hostname / URL fragment |
| portal_name | portal_name | Human-readable portal name |
| jurisdiction | jurisdiction.country (+ optional .state) | Compose from country + state |
| status | status | "new", "updated", etc. |
| title | title | — |
| section_title | section_title | Optional |
| document_number | derive from source_identifier or legal_link | Portal-specific |
| source_identifier | source_identifier | E.g. "Rolnummer: 6776" |
| source_secondary_identifier | source_secondary_identifier | E.g. ECLI |
| date_of_publication | publication_date | Field naming differs |
| date_of_effective | effective_date | Field naming differs |
| date_of_expiration | expiration_date | Field naming differs |
| type_of_law | derive via taxonomy → law_type | Use list_law_types |
| content_markdown | content_markdown (from get_markdown) | — |
| file_data | file_data ({content_type, file_size, timestamp}) | Empty {} if no binary |
| last_updated_at | last_updated | Field naming differs |
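The direct renames in this table can be expressed as a small transform. The sketch below covers only the fields that map 1:1; document_number and type_of_law are left out because their derivation is portal-specific, and joining country and state with " / " is an assumption of this example rather than the documented composition rule.

```python
from typing import Any


def to_mongo_core(doc: dict[str, Any]) -> dict[str, Any]:
    """Map one MCP document object onto the core Mongo field names."""
    jurisdiction = doc.get("jurisdiction") or {}
    country = jurisdiction.get("country", "")
    state = jurisdiction.get("state", "")
    return {
        "url": doc.get("legal_link"),
        "portal": doc.get("portal"),
        "portal_name": doc.get("portal_name"),
        # Assumption: country and state are joined with " / " when a state exists.
        "jurisdiction": f"{country} / {state}" if state else country,
        "status": doc.get("status"),
        "title": doc.get("title"),
        "section_title": doc.get("section_title"),
        "source_identifier": doc.get("source_identifier"),
        "source_secondary_identifier": doc.get("source_secondary_identifier"),
        # Empty strings mean "unknown", so they are stored as None.
        "date_of_publication": doc.get("publication_date") or None,
        "date_of_effective": doc.get("effective_date") or None,
        "date_of_expiration": doc.get("expiration_date") or None,
        "file_data": doc.get("file_data") or {},
        "last_updated_at": doc.get("last_updated") or None,
    }
```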
Versioning
The MCP returns a version integer on every Document. Persist it. When the same document_id is later returned with an incremented version, treat that as an "amend" event rather than a delete-and-insert.
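The version rule can be reduced to a small comparison between the stored and incoming version numbers. In the sketch below, the labels "insert", "unchanged" and "stale" are illustrative names chosen for this example; only "amend" comes from the text above.

```python
from typing import Optional


def classify_change(stored_version: Optional[int], incoming_version: int) -> str:
    """Decide how to persist an incoming document given the stored version."""
    if stored_version is None:
        return "insert"      # first time this document_id is seen
    if incoming_version > stored_version:
        return "amend"       # update in place, keep the document's history
    if incoming_version == stored_version:
        return "unchanged"
    return "stale"           # incoming copy is older than what is stored


print(classify_change(None, 1))  # insert
print(classify_change(1, 2))     # amend
```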
9. Known quirks, gotchas, and open questions
Audience: everyone, but especially developers integrating against the MCP.
- get_document_with_version returned {} for version=1 in 2026-05-27 probes. Prefer list_documents(document_id=..., version=...) until confirmed.
- Date format quirk: some date fields are returned as YYYY-MM-DDTHH:MM:SS0Z, e.g. 2019-02-14T00:00:000Z (not strict ISO 8601). Parse defensively.
- Empty strings ≠ null: many date fields come back as "". Treat both as missing.
- Date-range filters: range syntax (2025-01-01..2026-12-31) should be validated before production use.
- document_id typing inconsistency: integer from list_documents / horizon_scan, string from get_document_text / get_markdown. Always cast.
- get_source_url on docs without file_data returns {} rather than an error.
- Portal name case sensitivity: list_documents(portal=...) requires the exact string from list_portals.
- max_per_jurisdiction on horizon_scan is hard-capped at 100.
- list_jurisdictions returns 132 entries: both iso="EU" and iso="ZZ" (International Standards) are valid country-level codes.
10. Glossary
Audience: legal product managers and LLM consumers.
| Term | Definition |
|---|---|
| Jurisdiction | A country (ISO 3166-1, e.g. BE) or a sub-state region (e.g. US_CA, DE_BY, BE_BR). EU and ZZ (International Standards) are country-level entries. |
| Portal | An official source website Lawstronaut crawls (e.g. Etaamb OpenJustice, Irish Statute Book). One legal portal may have multiple per-language variants. |
| Issuing authority | The named body that produced the document (a court, ministry, regulator, agency). |
| Authority type | The kind of instrument (Act, Decree, Prospectus, Arresten/Judgment, Reports, Guidelines, …). |
| Domain / Subdomain / Category / Subcategory / Law type | Five-level legal taxonomy. Dotted-numeric IDs (A.2.5.1) at the first four levels; numeric law_type_id at the leaf. |
| Document | A single crawled record with metadata, text, optional file. Identified by integer document_id and integer version. |
| status | Corpus-level lifecycle marker ("new", "updated", …). Not the same as repealed, which is a legal-effect marker. |
| legal_link | The original source URL on the portal site; not a Lawstronaut URL. |
| file_data | Block describing an attached binary (PDF, etc.). Empty {} means no binary; populated means get_source_url will produce a presigned download link. |
| ECLI | European Case Law Identifier; lives in source_secondary_identifier for case-law documents. |
| Rolnummer | Belgian docket / case roll number; lives in source_identifier for Constitutional Court arrests. |
| Horizon scan | Lawstronaut's freshness mode: "what landed in the warehouse recently?", as opposed to relevance-ranked search. |
11. Change log
| Version | Date | Notes |
|---|---|---|
| v1.0 | 2026-05-27 | First draft. All 15 tools probed live against production. Two open questions logged in §9. |
