Best Practices
There are two proven ways to introduce data contracts with the Data Contract CLI. Pick the one that matches your situation.
Data-first approach
Create a data contract based on the actual data. This is the fastest way to get started and to get feedback from data consumers.
- Use an existing physical schema (e.g. SQL DDL) as a starting point to define your logical data model in the contract. Right after the import, double-check that the actual data meets the imported model:
  datacontract import sql --source ddl.sql
  datacontract test
- Add quality checks and additional type constraints one by one, making sure the data still adheres to the contract:
  datacontract test
- Validate that the datacontract.yaml is correctly formatted and adheres to the Open Data Contract Standard:
  datacontract lint
- Set up a CI pipeline that runs daily for continuous quality checks. Use the ci command for CI-optimized output (GitHub Actions annotations and step summary, Azure DevOps annotations). You can also report results to tools like Entropy Data:
  datacontract ci --publish https://api.entropy-data.com/api/test-results
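The data-first loop above can be wired into a small fail-fast script. This is a minimal sketch, not part of the Data Contract CLI: the helper functions run, bootstrap_from_ddl, check_contract and daily_ci, the DRY_RUN variable, and the file name data-first.sh are hypothetical. The script only calls the commands shown in the steps above, and it assumes that each datacontract command exits with a non-zero status when it fails.

```shell
#!/bin/sh
# Minimal sketch of the data-first loop (hypothetical helper names).
# Assumption: each datacontract command exits non-zero on failure,
# so chaining with && stops at the first failing step.

# Run a command, or only print it when DRY_RUN=1 (a rehearsal mode).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: import the DDL, then test the actual data against the imported model.
# $1 is the path to the DDL file, e.g. ddl.sql.
bootstrap_from_ddl() {
  run datacontract import sql --source "$1" &&
    run datacontract test
}

# After each new quality check or type constraint: lint the contract file
# first (it does not touch the data), then test the data.
check_contract() {
  run datacontract lint &&
    run datacontract test
}

# Daily entry point for a scheduled CI job; extra arguments such as
# --publish <url> are passed through to datacontract ci unchanged.
daily_ci() {
  run datacontract ci "$@"
}
```

For example, after saving the functions as data-first.sh, running . ./data-first.sh and then check_contract lints and tests the contract in one go, and a scheduled CI job could call daily_ci --publish https://api.entropy-data.com/api/test-results once a day. Chaining with && inside each function, rather than relying on set -e, keeps the fail-fast behavior explicit wherever the functions are called from.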
Contract-first approach
Create a data contract based on the requirements from use cases, before the data product exists.
- Start with a datacontract.yaml template:
  datacontract init
- Create the model and quality guarantees based on your business requirements. Fill in the terms, descriptions, etc., then validate the format:
  datacontract lint
- Use the export functions to start building the provider's data product as well as its integration into the consuming data products:
  # data provider
  datacontract export dbt-models
  # data consumer
  datacontract export dbt-sources
  datacontract export dbt-staging-sql
- Test that your data product implementation adheres to the contract:
  datacontract test
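The provider and consumer sides of the contract-first approach use different exports. The sketch below separates them and adds a release gate that runs the lint and test steps together. It is an illustration under stated assumptions, not a feature of the CLI: the helper functions run, scaffold_for and release_gate and the DRY_RUN variable are hypothetical, only the commands listed in the steps above are called, and each datacontract command is assumed to exit with a non-zero status on failure.

```shell
#!/bin/sh
# Minimal sketch of the contract-first hand-off (hypothetical helper names).
# Assumption: each datacontract command exits non-zero on failure.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Generate starting artifacts for one side of the contract:
#   provider -> dbt models for building the data product
#   consumer -> dbt sources and staging SQL for integrating it
scaffold_for() {
  case "$1" in
    provider)
      run datacontract export dbt-models
      ;;
    consumer)
      run datacontract export dbt-sources &&
        run datacontract export dbt-staging-sql
      ;;
    *)
      echo "usage: scaffold_for provider|consumer" >&2
      return 2
      ;;
  esac
}

# Before releasing the implementation: the contract must still lint,
# and the data product must pass the contract's tests.
release_gate() {
  run datacontract lint &&
    run datacontract test
}
```

A provider team would call scaffold_for provider once the contract is agreed, while each consuming team calls scaffold_for consumer; release_gate then fits naturally into the provider's CI, since a lint failure stops it before any data is tested.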