data-diff

Compare tables of any size across databases


data-diff is an open-source package that can be run from the CLI or wrapped into any data orchestrator, such as Airflow or Dagster. It compares datasets quickly (in seconds to minutes) at large scale (millions to billions of rows) across different databases.
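A common way to compare tables that live in different databases, and the approach data-diff's own README describes for that case, is to split the key space into segments, checksum each segment on both sides, and recursively subdivide only the segments whose checksums disagree. The sketch below illustrates that general checksum-and-bisect idea on two in-memory Python dicts. It is not data-diff's code: the function names `segment_checksum` and `bisect_diff`, the integer-key row model, and the `leaf_width` cutoff are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical row model: integer primary key -> row payload string.
Table = Dict[int, str]
Diff = List[Tuple[str, int, str]]


def segment_checksum(rows: Table, lo: int, hi: int) -> str:
    """Hash every row whose key lies in [lo, hi), visiting keys in sorted order."""
    h = hashlib.sha256()
    for key in sorted(k for k in rows if lo <= k < hi):
        h.update(f"{key}:{rows[key]}\n".encode())
    return h.hexdigest()


def bisect_diff(a: Table, b: Table, lo: int, hi: int, leaf_width: int = 4) -> Diff:
    """Return ('-', key, old) / ('+', key, new) entries for rows that differ in [lo, hi)."""
    # Matching checksums: treat the whole segment as identical and skip it.
    if segment_checksum(a, lo, hi) == segment_checksum(b, lo, hi):
        return []
    # Small segment: compare rows one by one instead of splitting further.
    if hi - lo <= leaf_width:
        diffs: Diff = []
        for key in range(lo, hi):
            old, new = a.get(key), b.get(key)
            if old == new:
                continue
            if old is not None:
                diffs.append(("-", key, old))
            if new is not None:
                diffs.append(("+", key, new))
        return diffs
    # Mismatch in a large segment: halve the key range and recurse on both halves.
    mid = (lo + hi) // 2
    return bisect_diff(a, b, lo, mid, leaf_width) + bisect_diff(a, b, mid, hi, leaf_width)


a = {k: f"row{k}" for k in range(100)}
b = dict(a)
b[42] = "changed"  # updated row
del b[7]           # row missing from b
b[120] = "new"     # row present only in b

diff = bisect_diff(a, b, 0, 128)
print(diff)
```

In a real cross-database setting, each checksum would be computed by a query inside its own database, so only hashes cross the network until a segment is small enough to compare row by row.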
This is the 2nd launch from data-diff.

data-diff

Efficiently diff data in or across relational databases
Open-source data-diff keeps getting better! 💫 Our latest release includes:
⏱ Faster diffing
🦆 DuckDB support
✨ Storing diff results
➕ And more!
Check out the full release notes here: https://github.com/datafold/data-diff/releases/tag/v0.3.0
Free
Launch tags: GitHub, Data & Analytics, Data


matthew david
We're excited to announce the biggest update to data-diff! With this release, data-diff is significantly faster at comparing tables within the same database, especially when the tables have many differences. We've also added the ability to materialize diff results into a database table, in addition to (or instead of) printing them to stdout. We've added support for DuckDB and for diffing schemas. Finally, we've improved support for alphanumerics and threading, and generally improved the API, the command-line interface, and the stability of the tool. Let us know what you think! Here is a tutorial on getting started with data-diff: https://github.com/leoebfolsom/d...
Rhymer Espinosa
@matthew_david1 Congrats on the launch. Best of luck.
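The announcement above says data-diff is now much faster when both tables live in the same database, and that it can store diff results in a table. When both tables share a database, one generic way to diff them is a single join query that the database runs itself, whose result can be written straight into a new table. The sketch below shows that idea with Python's built-in `sqlite3` module. The tables `src`, `dst`, and `diff_result` and the `DIFF_SQL` query are hypothetical, and this is not necessarily the SQL that data-diff generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables in the same database, keyed by id.
conn.executescript("""
CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT);
CREATE TABLE dst (id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO src VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO dst VALUES (1, 'a'), (2, 'B'), (4, 'd');
""")

# '-' rows: present in src but missing from, or different in, dst.
# '+' rows: present in dst but missing from, or different in, src.
# IS NOT is SQLite's NULL-safe inequality, so NULL vs 'x' counts as a difference.
DIFF_SQL = """
SELECT '-' AS sign, s.id AS id, s.val AS val
FROM src AS s LEFT JOIN dst AS d ON s.id = d.id
WHERE d.id IS NULL OR s.val IS NOT d.val
UNION ALL
SELECT '+' AS sign, d.id AS id, d.val AS val
FROM dst AS d LEFT JOIN src AS s ON s.id = d.id
WHERE s.id IS NULL OR s.val IS NOT d.val
ORDER BY 2, 1 DESC
"""

# Print the differences (sorted by id, '-' before '+')...
rows = conn.execute(DIFF_SQL).fetchall()
print(rows)

# ...and/or materialize the same result as a table inside the database.
conn.execute("CREATE TABLE diff_result AS " + DIFF_SQL)
stored = conn.execute(
    "SELECT sign, id, val FROM diff_result ORDER BY id, sign DESC"
).fetchall()
```

Because the comparison runs entirely inside one database, no rows have to be transferred to the client to find differences, which is one reason same-database diffs can be cheaper than cross-database ones.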