@rdegges Citus focuses on much larger-scale data. In general, if you have 100 GB or less of data, Heroku Postgres should work great for you. Single-node Postgres still works great as long as your data fits on a single box and the bulk of it fits in memory.
When you start to outgrow a single node (which can happen anywhere from 100 GB up to 1 TB), getting responsive queries can be hard, especially for aggregations. Because Citus shards your data and distributes the workload, you get the parallelism of multiple instances. This means that across well over 10 TB of data you can still have real-time responsiveness: sub-second queries for aggregations, millisecond queries for small lookups, and millisecond inserts/updates.
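To make the sharding and parallelism idea concrete, here is a toy sketch in plain Python. It is not Citus code and makes no claims about Citus internals: the shard count, the `shard_for` hash, and the in-memory lists standing in for worker nodes are all assumptions for illustration. It only shows the general pattern of hashing rows to shards, aggregating each shard in parallel, and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def shard_for(key: str) -> int:
    """Map a distribution key to a shard using a stable hash (assumed scheme)."""
    return crc32(key.encode("utf-8")) % NUM_SHARDS


def build_shards(rows):
    """Place each (customer_id, amount) row on the shard its key hashes to."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for customer_id, amount in rows:
        shards[shard_for(customer_id)].append((customer_id, amount))
    return shards


def partial_aggregate(shard):
    """Work done per shard: return a partial (sum, count)."""
    return sum(amount for _, amount in shard), len(shard)


def distributed_average(shards):
    """Run the per-shard aggregates in parallel, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0


# Example data: 100 customers with amounts 1..100.
rows = [(f"customer-{i}", i) for i in range(1, 101)]
shards = build_shards(rows)
average = distributed_average(shards)
```

Each shard only scans its own slice of the data, which is why an aggregation can stay fast as the total dataset grows, as long as more shards (and machines) are added alongside it.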
Hi, Craig here. I head up Citus Cloud, and prior to Citus I spent a number of years at Heroku, primarily running product for Heroku Postgres. We've had Citus Cloud in a private beta for a number of months and are excited to open it for general availability. Happy to answer any questions the PH community has.
Hi Craig, you probably get this from time to time: how is Citus different from Redshift? Is Citus more responsive, i.e. a real-time database, while Redshift can handle larger data sets?
@blukasz We do get it some :)
In general there are a couple of differences. With Redshift you batch load data, and queries often run in a matter of seconds to minutes. These are often very long SQL queries created by an analyst. You typically only have a small number of concurrent queries running against Redshift. So in general, when you have really complex SQL and large amounts of data, Redshift can work well at speeding those queries up.
Citus tends to be used more directly within an application. Citus is able to ingest data in real time (standard inserts). Because of the distributed nature of Citus, not the full breadth of SQL maps directly, so some things that are common for analysts, such as common table expressions, won't work on your sharded tables. It's not a requirement, but most Citus users also tend to run a high number of queries per second against the database. Because of the way parallelism works for us, there isn't a high per-query overhead of a second or more, so you can have thousands of users (or, well, the app for thousands of users) running queries across a large dataset in under a second. Most of these queries tend to be already defined within your application code, though.