PeerDB is a Postgres-first data-movement platform that makes moving data in and out of Postgres fast and simple. We implement multiple Postgres-native and infrastructural optimizations to deliver 10x faster data movement for Postgres users.
Hello PH!
I'm Sai, the co-founder and CEO of PeerDB, a Postgres-first data-movement platform that makes moving data in and out of Postgres fast and simple.
PeerDB is free and open source (https://github.com/PeerDB-io/peerdb), and we provide a Docker stack so you can try it out. There's a 5-minute Quickstart here: https://docs.peerdb.io/quickstart.
Why are we building PeerDB?
Existing ETL/ELT tools primarily focus on supporting a wide range of connectors at the expense of delivering high-quality ones. This becomes evident when your workloads need scale or have demanding feature requirements. While I was working at Microsoft and Citus Data, it was common to see Postgres customers try the existing ETL tools and fail. Common issues included painfully slow syncs (moving 100s of GB of data took days); flakiness and unreliability (frequent crashes, loss of data precision on the target); and limited features (lack of configurability, unsupported data types, and so on).
Our Solution:
PeerDB is an ETL/ELT tool built for PostgreSQL. We implement multiple Postgres-native and infrastructural optimizations to provide a fast, reliable, and feature-rich experience for moving data in and out of PostgreSQL.
For performance - we can parallelize the initial load of a large table while still ensuring consistency, so syncing 100s of GB drops from days to minutes. Our architecture is designed for real-time syncs and implements multiple logical-replication optimizations (tuning Postgres configs, parallel reading of the slot, etc.). This enables 10x faster Change Data Capture with data freshness of a few tens of seconds, even at large throughputs (10k+ TPS).
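To give a concrete sense of the Postgres-side tuning involved, these are the kinds of logical-replication settings on the source database that CDC depends on (illustrative values, not PeerDB's exact recommendations):

    -- Source Postgres settings that logical replication / CDC relies on.
    -- Values are examples only; tune them for your workload.
    ALTER SYSTEM SET wal_level = 'logical';            -- required for logical decoding
    ALTER SYSTEM SET max_wal_senders = 10;             -- WAL sender processes available to replication
    ALTER SYSTEM SET max_replication_slots = 10;       -- replication slots available for CDC pipelines
    ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';  -- cap WAL retained by lagging slots
    -- wal_level, max_wal_senders and max_replication_slots need a restart to take effect.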
We have fault-tolerance mechanisms for reliability (https://blog.peerdb.io/using-tem...) and support multiple features, including log-based (CDC) and query-based streaming, efficient syncing of tables with large (TOAST) columns, and configurable batching and parallelism to prevent OOMs and crashes.
For usability - we provide a Postgres-compatible SQL layer for data movement. This makes the life of data engineers much easier: they can develop pipelines using a framework they are already familiar with, without dealing with custom UIs and REST APIs, and they can use Postgres' 100s of integrations to build and manage ETL. We extend Postgres' SQL grammar with a few new, intuitive SQL commands to enable real-time data streaming across stores; a sketch of what this looks like follows below. Because of this, we were able to add a dbt integration via Dagster (in private preview) in a few hours! We expect data engineers to build similar integrations with PeerDB easily, and we plan to make this grammar richer as we evolve.
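To give a flavor of that SQL layer, here is a minimal sketch of setting up a real-time mirror between two Postgres peers. Peer names, table names, and connection values are placeholders; the exact options for each command are covered in the docs linked in the next paragraph.

    -- Register the source and target databases as peers.
    CREATE PEER source_pg FROM POSTGRES WITH (
      host = 'source.example.com',
      port = 5432,
      user = 'postgres',
      password = 'postgres',
      database = 'app'
    );

    CREATE PEER target_pg FROM POSTGRES WITH (
      host = 'target.example.com',
      port = 5432,
      user = 'postgres',
      password = 'postgres',
      database = 'analytics'
    );

    -- Stream changes in real time for the mapped tables,
    -- starting with an initial snapshot of existing rows.
    CREATE MIRROR app_cdc FROM source_pg TO target_pg
    WITH TABLE MAPPING (public.users:public.users, public.orders:public.orders)
    WITH (do_initial_copy = true);

Because these commands go over the Postgres wire protocol, you can run them from psql or any Postgres client or driver you already use.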
Currently we support 6 target data stores (BigQuery, Snowflake, Postgres, S3, Kafka, etc.) for data movement from Postgres. This doc captures the current status of the connectors: https://docs.peerdb.io/sql/comma....
Check out our GitHub repo - https://github.com/PeerDB-io/peerdb - and give it a spin (5-minute Quickstart: https://docs.peerdb.io/quickstart).