
Open Data Contract Standard

The Data Contract CLI natively uses the Open Data Contract Standard (ODCS) as its data contract format. ODCS is an open, vendor-neutral standard maintained by the Bitol project under the Linux Foundation's AI & Data umbrella.

A data contract written in ODCS is a single YAML file that describes a data set's structure, semantics, quality, ownership, and the servers where the data lives.

A minimal contract

```yaml
apiVersion: v3.0.2
kind: DataContract
id: urn:datacontract:checkout:orders-latest
name: orders
version: 1.0.0
status: active
description:
  purpose: One record per order. Includes cancelled and deleted orders.
servers:
  - server: production
    type: postgres
    host: localhost
    port: 5432
    database: orders
    schema: public
schema:
  - name: orders
    physicalName: orders
    properties:
      - name: order_id
        logicalType: string
        physicalType: uuid
        primaryKey: true
        required: true
      - name: order_total
        logicalType: integer
        physicalType: integer
        required: true
    quality:
      - type: sql
        description: The 95th percentile of order_total (stored in cents) is between 10 and 499 EUR
        query: SELECT percentile_cont(0.95) WITHIN GROUP (ORDER BY order_total) FROM orders
        mustBeBetween: [1000, 49900]
```

Note: The CLI also accepts the older Data Contract Specification format (which uses models/fields instead of ODCS schema/properties), but new contracts should follow ODCS; all examples in this documentation use ODCS. Use datacontract init to start from a current template.
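To help recognize the older format, here is roughly how the start of the orders contract looks in the Data Contract Specification. This is an orientation sketch only: the specification version (1.1.0) and the chosen fields are assumptions, so check the Data Contract Specification itself before relying on the exact layout.

```yaml
# Older Data Contract Specification (DCS) layout, shown for comparison only.
dataContractSpecification: 1.1.0
id: urn:datacontract:checkout:orders-latest
info:
  title: orders
  version: 1.0.0
models:              # ODCS equivalent: schema
  orders:
    type: table
    fields:          # ODCS equivalent: properties
      order_id:
        type: string
        required: true
```

Besides the renamed sections, note that DCS uses maps keyed by name (models.orders, fields.order_id), whereas ODCS uses lists of objects that carry a name key.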

Key sections

| Section | Purpose |
|---|---|
| apiVersion / kind | Identifies the document as an ODCS data contract and the ODCS version it follows. |
| id, name, version, status | Identity and lifecycle of the contract. |
| description | Human-readable purpose, usage, and limitations. |
| servers | Where the data physically lives: the connection details used by datacontract test. One contract can have several servers. |
| schema | The logical structure: schema objects (tables/objects) and their properties (columns/fields), with types, constraints, and semantics. |
| quality | Data quality rules, attached to a schema object or to individual properties. See Quality Rules. |
| slaProperties | Service-level expectations such as freshness, retention, and frequency. |
| team / roles | Ownership and access information. |
| customProperties | Extension point for backend-specific settings (for example clickhouseType, trinoType, avroLogicalType). |
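The fragment below sketches how a second server and the optional slaProperties, team, and roles sections might look in the orders contract. All values (the staging host, user name, role name, and SLA numbers) are hypothetical placeholders, and the field names follow ODCS v3.0 as we understand it; verify them against the ODCS specification.

```yaml
# Illustrative top-level sections; all values are hypothetical placeholders.
servers:
  - server: production
    type: postgres
    host: localhost
    port: 5432
    database: orders
    schema: public
  - server: staging              # hypothetical second server for the same data
    type: postgres
    host: staging-db.example.com
    port: 5432
    database: orders
    schema: public
slaProperties:
  - property: latency            # maximum delay until new data is available
    value: 1
    unit: d
  - property: retention          # how long the data is kept
    value: 3
    unit: y
  - property: frequency          # how often the data is updated
    value: 1
    unit: d
team:
  - username: jdoe               # hypothetical owner
    role: Data Product Owner
roles:
  - role: orders_reader          # hypothetical access role
    access: read
```

With more than one server declared, pass the server key to choose which one to test, for example datacontract test --server staging datacontract.yaml.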

Logical vs. physical types

ODCS separates the logical type (logicalType, e.g. string, integer, number, boolean, date, timestamp) from the physical type (physicalType, e.g. varchar, uuid, INT64).

  • The CLI uses the logical type as the portable, server-independent description.
  • When you select a server (via --server) or specify a server type directly, the CLI maps logical types to that backend's physical types for exports and tests.
  • You can always override the physical type per property with physicalType, or pin a backend-specific type via customProperties / config (for example clickhouseType, trinoType).
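As a sketch of the three options above, the property list below leaves one type to the CLI's mapping, sets an explicit physicalType on another, and pins a ClickHouse-only type on a third. The property names customer_country and order_status are hypothetical, and the exact shape of the clickhouseType entry (a property-level customProperties item) is an assumption based on the list above; check the CLI's export documentation for your backend.

```yaml
# Fragment of schema[].properties; names and types are hypothetical.
properties:
  - name: order_id
    logicalType: string
    physicalType: uuid             # explicit physical type instead of the mapped default
  - name: customer_country
    logicalType: string            # no physicalType: mapped for the selected server
  - name: order_status
    logicalType: string
    customProperties:
      - property: clickhouseType   # assumed key; applies to ClickHouse only
        value: LowCardinality(String)
```

Keeping logicalType on every property preserves the portable description even where a backend-specific type is pinned.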

Working with ODCS in the CLI

Learn more