Why Data Contracts Matter
Data contracts are explicit, versioned agreements between producers and consumers that define schema, semantics, and service levels of datasets and events. In 2025, contracts underpin self-serve analytics, reliable ML, and governance in regulated industries.
What a Good Contract Includes
- Interface: Schema (fields, types, nullability, constraints), keys, and partitioning.
- Semantics: Business definitions, units, currencies, and allowed values.
- Operational SLAs: Freshness, completeness, availability, and incident process.
- Security: Classification (PII, PCI), masking rules, and access policies.
- Versioning: Backward/forward compatibility and deprecation timelines.
Diagram: Producer → Contract → Consumers
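Put together, a contract for an orders dataset might capture these sections as shown in the sketch below. This is purely illustrative, expressed as a Python dict for concreteness; the keys, tags, and values are examples rather than any specific contract standard.

```python
# Illustrative only: a contract for an orders dataset expressed as a Python dict,
# mirroring the sections above (interface, semantics, SLAs, security, versioning).
orders_contract = {
    "name": "orders",
    "version": "2.0.0",
    "interface": {
        "keys": ["order_id"],
        "partitioning": "order_ts (daily)",
        "fields": {
            "order_id": {"type": "string", "nullable": False},
            "amount": {"type": "number", "nullable": False, "constraint": ">= 0"},
            "currency": {"type": "string", "allowed": ["USD", "EUR", "GBP", "INR"]},
        },
    },
    "semantics": {"amount": "Order total in the currency given by `currency`"},
    "slas": {"freshness_minutes": 15, "completeness_pct": 99.5},
    "security": {"customer_id": "PII", "masking": "hash outside production"},
    "versioning": {"compatibility": "backward", "deprecation_notice_days": 90},
}
```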
Schema Definition: JSON Schema
Define contract schemas using JSON Schema for analytics tables and Avro/Protobuf for events.
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "orders_v2",
  "type": "object",
  "required": ["order_id", "order_ts", "amount", "currency"],
  "properties": {
    "order_id": { "type": "string", "pattern": "^ORD_[0-9]{8}$" },
    "order_ts": { "type": "string", "format": "date-time" },
    "amount": { "type": "number", "minimum": 0 },
    "currency": { "type": "string", "enum": ["USD", "EUR", "GBP", "INR"] },
    "customer_id": { "type": "string" },
    "status": { "type": "string", "enum": ["placed", "paid", "shipped", "cancelled"] }
  }
}
```
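A schema like this can be enforced at ingest time. The following is a minimal sketch using the `jsonschema` Python package to validate a single record against `orders_v2`; the file path and the record itself are illustrative.

```python
import json
from jsonschema import Draft202012Validator

# Load the published contract schema (path is illustrative).
with open("contracts/orders_v2.schema.json") as f:
    schema = json.load(f)

validator = Draft202012Validator(schema)

record = {
    "order_id": "ORD_00012345",
    "order_ts": "2025-03-01T12:34:56Z",
    "amount": 49.90,
    "currency": "EUR",
    "status": "paid",
}

# Collect all violations instead of stopping at the first one.
errors = list(validator.iter_errors(record))
for err in errors:
    print(f"Contract violation at {list(err.path)}: {err.message}")
if not errors:
    print("Record satisfies orders_v2")
```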
API Contracts: OpenAPI & AsyncAPI
Use OpenAPI for RESTful data services and AsyncAPI for event streams.
```yaml
# OpenAPI excerpt
openapi: 3.0.3
info:
  title: Orders API
  version: 2.0.0
paths:
  /orders:
    get:
      parameters:
        - in: query
          name: since
          schema: { type: string, format: date-time }
      responses:
        '200': { description: OK }
```

```yaml
# AsyncAPI excerpt
asyncapi: 2.6.0
info: { title: Orders Stream, version: 2.0.0 }
channels:
  orders.v2:
    subscribe:
      message:
        name: OrderEvent
        payload:
          $ref: '#/components/schemas/orders_v2'
```
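To make the stream contract concrete, the sketch below consumes events from the `orders.v2` channel and validates each payload against the same JSON Schema. It assumes the channel is backed by a Kafka topic of the same name and that `confluent-kafka` and `jsonschema` are installed; the broker address and consumer group are placeholders.

```python
import json
from confluent_kafka import Consumer
from jsonschema import Draft202012Validator

with open("contracts/orders_v2.schema.json") as f:  # path is illustrative
    validator = Draft202012Validator(json.load(f))

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "group.id": "orders-contract-check",     # placeholder consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders.v2"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        violations = list(validator.iter_errors(event))
        if violations:
            # In practice this would route to a dead-letter topic or raise an alert.
            print(f"OrderEvent failed contract: {[v.message for v in violations]}")
finally:
    consumer.close()
```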
Enforcing SLAs with SLOs & SLIs
| Aspect | SLI | SLO | Notes |
|---|---|---|---|
| Freshness | Lag since last successful load | <= 15 minutes (95%) | Alert if > 30 minutes |
| Completeness | Records vs expected baseline | >= 99.5% | Per partition/day |
| Validity | Schema + constraint pass rate | >= 99.9% | dbt/GE checks |
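Freshness is usually the first SLI teams wire up. Below is a minimal sketch, assuming the warehouse or orchestrator exposes a last-successful-load timestamp for the table; the thresholds come from the table above and the timestamp is hard-coded for illustration.

```python
from datetime import datetime, timezone

FRESHNESS_SLO_MINUTES = 15    # target from the SLO table above
ALERT_THRESHOLD_MINUTES = 30  # page if exceeded

def freshness_lag_minutes(last_load_ts: datetime) -> float:
    """SLI: minutes elapsed since the last successful load."""
    return (datetime.now(timezone.utc) - last_load_ts).total_seconds() / 60

# In practice last_load_ts would come from the warehouse or orchestrator,
# e.g. the max load timestamp on fct_orders; hard-coded here for illustration.
last_load_ts = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)

lag = freshness_lag_minutes(last_load_ts)
if lag > ALERT_THRESHOLD_MINUTES:
    print(f"ALERT: freshness lag {lag:.1f} min exceeds {ALERT_THRESHOLD_MINUTES} min")
elif lag > FRESHNESS_SLO_MINUTES:
    print(f"WARN: freshness lag {lag:.1f} min is outside the {FRESHNESS_SLO_MINUTES} min SLO")
```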
Validation in CI/CD
Shift validation left by gating pull requests with dbt tests and Great Expectations checks.
```yaml
# dbt schema.yml excerpt
version: 2
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests: [unique, not_null]
      - name: currency
        tests:
          - accepted_values:
              values: ['USD', 'EUR', 'GBP', 'INR']
      - name: amount
        tests:
          - dbt_utils.expression_is_true:
              expression: ">= 0"
```
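In CI these checks run via `dbt test` or a Great Expectations checkpoint against the warehouse. The sketch below is a plain-pandas stand-in that mirrors the same contract rules on a sample extract, just to show what the gate enforces; the sample path is a placeholder.

```python
import sys
import pandas as pd

# Illustrative stand-in for a CI gate: mirrors the dbt tests above
# (unique, not_null, accepted_values, amount >= 0) on a sample extract.
df = pd.read_parquet("ci_samples/fct_orders.parquet")  # placeholder path

ALLOWED_CURRENCIES = {"USD", "EUR", "GBP", "INR"}

failures = []
if df["order_id"].isnull().any():
    failures.append("order_id contains nulls")
if df["order_id"].duplicated().any():
    failures.append("order_id is not unique")
if not df["currency"].isin(ALLOWED_CURRENCIES).all():
    failures.append("currency outside accepted values")
if (df["amount"] < 0).any():
    failures.append("amount has negative values")

if failures:
    print("Contract checks failed:", "; ".join(failures))
    sys.exit(1)  # non-zero exit fails the pull request check
print("Contract checks passed")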
Governance & Access Policies
- Classification: Tag PII/PCI and enforce column masking.
- Row-Level Security: Apply policies in engines (Trino, Snowflake) with contract-driven roles.
- Audit: Emit lineage and access logs to a central store for compliance.
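Masking and row-level security are normally enforced in the engine itself (for example Snowflake masking policies or Trino column masks). The sketch below only illustrates the idea of contract-driven masking applied to an extract before sharing; the classification tags and hashing scheme are assumptions, not a specific product feature.

```python
import hashlib
import pandas as pd

# Column classifications as they might be published in the contract
# (tags are illustrative; real enforcement typically lives in the query engine).
CLASSIFICATION = {"customer_id": "PII", "order_id": None, "amount": None}

def mask_pii(df: pd.DataFrame) -> pd.DataFrame:
    """Hash columns tagged PII so extracts can be shared with non-privileged roles."""
    out = df.copy()
    for column, tag in CLASSIFICATION.items():
        if tag == "PII" and column in out.columns:
            out[column] = out[column].astype(str).map(
                lambda v: hashlib.sha256(v.encode()).hexdigest()[:16]
            )
    return out
```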
Incident Management
```yaml
runbook:
  alerts:
    - metric: freshness_lag_minutes
      threshold: 30
      action: page_oncall
  triage:
    - step: check last successful load marker
    - step: compare partition counts vs baseline
  rollback:
    - step: restore last good snapshot
    - step: re-run incremental load with safe window
```
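Alert evaluation usually lives in the monitoring stack, but keeping the runbook machine-readable means it can drive automation directly. A minimal sketch, assuming PyYAML is available; the runbook path, metric readings, and the `page_oncall` hook are placeholders.

```python
import yaml  # PyYAML, assumed available

def page_oncall(metric: str, value: float, threshold: float) -> None:
    # Hypothetical hook; a real setup would call the paging provider's API.
    print(f"PAGING ON-CALL: {metric}={value} breached threshold {threshold}")

with open("runbooks/orders.yaml") as f:  # path is illustrative
    runbook = yaml.safe_load(f)["runbook"]

# Current readings would come from the monitoring system; stubbed here.
current_metrics = {"freshness_lag_minutes": 42.0}

for alert in runbook["alerts"]:
    value = current_metrics.get(alert["metric"])
    if value is not None and value > alert["threshold"]:
        if alert["action"] == "page_oncall":
            page_oncall(alert["metric"], value, alert["threshold"])
```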
Adoption Roadmap (90 Days)
- Weeks 1–3: Pick 2–3 critical datasets, define schemas + SLAs, publish in catalog.
- Weeks 4–6: Wire CI gates (dbt/GE), add freshness/completeness monitors.
- Weeks 7–9: Onboard producer teams to change management (compatibility, deprecation).
- Weeks 10–12: Expand to event streams (AsyncAPI), integrate incident runbooks.
Checklist Before Go-Live
- [ ] Contract version and compatibility notes published
- [ ] dbt/GE tests passing and enforced in CI
- [ ] SLIs/SLOs monitored with alerts
- [ ] Access policies verified in staging and prod
- [ ] Incident runbook tested
Conclusion
Data contracts turn fragile pipelines into predictable products. By combining schemas, SLAs, and policy-as-code with robust validation and change management, organizations ship trustworthy analytics and reliable ML at scale.