Lariat Data

Discover data bugs before your customers do

Lariat Data is a Continuous Data Quality monitoring platform that allows Data Engineers to discover data bugs before their customers do. Our latest product release ensures that S3 data is complete and accurate upon ingestion. Automatically inspect objects to track health metrics and flag data anomalies. All it takes is a 5-min install to improve the reliability of your data products and stop bad data at the source.

Vikas Shanbhogue
We're really excited to announce our S3 Data Ingest Monitoring product on PH! Thanks @mwseibel for Hunting us.

Lariat Data is a Continuous Data Quality monitoring platform that allows Data Engineers to discover data bugs before their customers do. We built the S3 monitoring product to allow our customers to resolve critical data issues at the raw data layer before any wasteful downstream processing on bad data.

When our customers found issues with their datasets in Snowflake, Athena or Postgres, they often traced the root cause to problems with raw data ingestion onto object storage like S3. They faced some key pain points that we knew we could extend our platform to solve:

- Difficulty sifting through a large number of file-ingest events (anywhere from 1,000 to 100,000) to figure out which ones had issues
- Context about failed processing can slip away when data is partially written
- Managing several sources of data like paid partners or external services is taxing for data engineering teams
- Poorly formatted files can be written because not all formats on object storage enforce schema

The growing use of formats like Parquet, Iceberg & Delta Lake means that object storage like S3 will house more business-critical analytical and even model training datasets. We believe that monitoring data on object storage is only going to increase in importance and are excited to build on this!

If you're interested, try us out for free via self-service: https://www.lariatdata.com/try-lariat. It takes less than 5 minutes to set up the monitoring agent against a desired bucket and object prefix, and no credit card is required. You can also check out our docs here: https://docs.lariatdata.com/integrations-data-storage/s3-object-storage.

Excited to hear your thoughts, feedback and questions in the comments!
Albert
Congratulations on the launch, Vikas and Aaditya. Monitoring S3 data at ingestion sounds like a crucial step for data reliability. I'm curious: how does your tool differentiate itself in handling data quality issues compared to traditional methods that also aim to prevent downstream consequences?
Vikas Shanbhogue
@mashy Thanks for the comment! Some of the key benefits of our approach are the speed of finding issues, the adaptability to different data schemas, and our understanding of the timeline of changes to objects. Since we are an event-driven system, we find issues as soon as a file is written to S3 (regardless of the throughput). It also means that we are able to construct a timeline of all object changes and catch issues like mistaken deletes. Another thing we do is build a sort of "pre-aggregation" layer that allows you to slice-and-dice your data with unlimited dimensionality. We've found that having such a layer helps both our customers and our automated alerting system surface the "unknown-unknowns" in object data. Lastly, our data collection agent is able to adapt to a wide variety of semi-structured and unstructured data, which allows greater flexibility when trying to catch subtle data quality issues.
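
To make the event-driven idea above a bit more concrete, here is a minimal sketch. It is not Lariat's implementation, only an illustration of building a per-object timeline from S3 event notifications and flagging suspicious changes. It assumes the events arrive in the standard S3 event notification JSON shape (`Records`, `eventName`, `eventTime`, `s3.object.key`, `s3.object.size`). The function names `build_timeline` and `flag_issues`, and the "empty object" rule, are hypothetical and chosen purely for the example.

```python
from collections import defaultdict
from urllib.parse import unquote_plus


def build_timeline(events):
    """Group S3 notification records by object key, ordered by event time."""
    timeline = defaultdict(list)
    for event in events:
        for record in event.get("Records", []):
            obj = record["s3"]["object"]
            # Object keys are URL-encoded in S3 event notifications.
            key = unquote_plus(obj["key"])
            timeline[key].append({
                "time": record["eventTime"],
                "event": record["eventName"],
                "size": obj.get("size"),  # not present on delete events
            })
    for entries in timeline.values():
        # ISO-8601 UTC timestamps in the same format sort chronologically as strings.
        entries.sort(key=lambda e: e["time"])
    return dict(timeline)


def flag_issues(timeline):
    """Return (key, time, reason) tuples for deletes and zero-byte writes.

    Both rules are illustrative assumptions, not a real product's checks.
    """
    issues = []
    for key, entries in timeline.items():
        for entry in entries:
            if entry["event"].startswith("ObjectRemoved:"):
                issues.append((key, entry["time"], "deleted"))
            elif entry["event"].startswith("ObjectCreated:") and entry["size"] == 0:
                issues.append((key, entry["time"], "empty object written"))
    return issues
```

In a real deployment the events would typically be delivered through SQS, SNS, or Lambda rather than passed in as a list, and whether a delete counts as "mistaken" would depend on context that this sketch does not model.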
Thet Lin Thu
It could really help manage the chaos of multiple data sources and ensure nothing slips through the cracks. Fantastic work, Vikas. This is a must-have for data teams!