Lilac

Better data, better AI.

61 followers

Lilac is an open-source tool that helps data and AI practitioners improve their products by improving their data.

Nikhil Thorat
We're so excited to release Lilac Garden, our hardware-accelerated cloud platform for data transformations. Lilac is open-source, so you can try it for free on your device! If you want your transformations to go much faster, sign up for our waiting list! We believe that data quality will be the critical component of the next wave of progress in AI systems, and we think that a hybrid of human-in-the-loop and fully automated approaches will be the best strategy for improving data quality. Make sure to check out the live demo of clusters on a logs dataset of 1M conversations in Chatbot Arena, linked from the blog post: https://docs.lilacml.com/blog/in...
Sambit Bhaumik
Congratulations on your launch! I'm glad I came across your tool; it's going to be very helpful for my current research. Easy upvote for me. I'm eager to take a close look. Since I'm working on highly multilingual data, it's great to see that you have language detection signals. Is it also possible to find semantically similar data (across, say, a column)? Or to use custom models that can aid in such a task?
Rajiv Ayyangar
Pandas was transformational in data processing pre-LLMs, and it seems like in an LLM world, the way we process and refactor training data should be seriously re-examined. @nsthorat what's a typical use case? E.g., how does Lilac help with, say, clustering and editing?
Nikhil Thorat
@rajiv_ayyangar A typical use case for an AI application is to first take your data, cluster it, and then use the clusters to curate the data in a more organized way. Clustering gives you a bird's-eye view of the shape of your data. Often we'll realize that some clusters are over-represented and some are under-represented. For example, in a very popular coding dataset, 7% of the data is failure to order a pizza! Once the data is in Lilac, we can curate at the cluster level: removing most of the pizza cluster while keeping a few representative points, which reduces the size of the dataset and thus fine-tuning costs. Here's a walkthrough for more details:
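
As a rough illustration of the cluster-level curation described in this reply (this is plain Python, not Lilac's API, and not the linked walkthrough), the sketch below measures each cluster's share of a dataset and then downsamples one over-represented cluster. The function names `cluster_shares` and `downsample_cluster`, the toy cluster labels, and the use of a random sample as a stand-in for "representative points" are all assumptions made for the example.

```python
import random
from collections import Counter


def cluster_shares(labels):
    """Return the fraction of rows that falls into each cluster."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: n / total for cluster, n in counts.items()}


def downsample_cluster(rows, labels, cluster, keep, seed=0):
    """Keep `keep` randomly chosen rows from `cluster` and every row from other clusters.

    Random sampling is only a stand-in for choosing representative points.
    """
    rng = random.Random(seed)
    in_cluster = [i for i, label in enumerate(labels) if label == cluster]
    kept = set(rng.sample(in_cluster, min(keep, len(in_cluster))))
    keep_idx = [i for i, label in enumerate(labels) if label != cluster or i in kept]
    return [rows[i] for i in keep_idx], [labels[i] for i in keep_idx]


# Toy data (hypothetical): 100 rows, 7 of them in a "pizza" cluster.
labels = ["pizza"] * 7 + ["sorting"] * 50 + ["sql"] * 43
rows = [f"example {i}" for i in range(len(labels))]

shares = cluster_shares(labels)
curated_rows, curated_labels = downsample_cluster(rows, labels, "pizza", keep=2)
print(f"pizza share: {shares['pizza']:.0%}, rows before: {len(rows)}, rows after: {len(curated_rows)}")
```

In practice the cluster labels would come from a clustering step over the data, and the points to keep would be chosen by inspection rather than at random.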