We're so excited to release Lilac Garden, our hardware-accelerated cloud platform for data transformations. Lilac is open-source, so you can try it for free on your device! If you want your transformations to go much faster, sign up for our waiting list!
We believe that data quality will be the critical component of the next wave of progress in AI systems, and we think that a hybrid approach combining human-in-the-loop and fully automated methods will be the best strategy for improving data quality.
Make sure to check out the live demo of clusters on a logs dataset of 1M conversations in Chatbot Arena, linked from the blog post: https://docs.lilacml.com/blog/in...
Congratulations on your launch! I'm glad I came across your tool; it's going to be very helpful for my current research. Easy upvote for me.
I'm eager to take a close look. I'm working on highly multilingual data, so it's great to see you have language detection signals. Is it also possible to find semantically similar data (across, say, a column)? Or maybe to use custom models that can aid in such a task?
Pandas was transformational in data processing pre-LLMs, and it seems like in an LLM world, the way we process and refactor training data should be seriously re-examined.
@nsthorat what's a typical use case? e.g. how does Lilac help with, say, clustering and editing?
@rajiv_ayyangar A typical use case for an AI application is to first take your data, cluster it, and then use the clusters to curate the data in a more organized way.
Clustering gives you a bird's-eye view of the shape of your data. Often we'll realize that some clusters are over-represented and some are under-represented. For example, in a very popular coding dataset, 7% of the data consists of failures to order a pizza!
Once the data is in Lilac, we can curate at the cluster level: remove most of the pizza cluster, keep a few representative points, and reduce the size of the dataset and thus fine-tuning costs.
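To make the cluster-level curation idea concrete, here's a minimal sketch in plain Python. It is not Lilac's API: the function names (`cluster_shares`, `trim_clusters`) and the toy data are hypothetical, it assumes each row already has a cluster label, and it uses a random sample as a stand-in for picking "representative" points.

```python
import random
from collections import Counter


def cluster_shares(labels):
    """Return each cluster's fraction of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: n / total for cluster, n in counts.items()}


def trim_clusters(rows, labels, clusters_to_trim, keep=5, seed=0):
    """Keep at most `keep` rows from each cluster in `clusters_to_trim`.

    Rows from all other clusters are kept unchanged. The rows kept from a
    trimmed cluster are a random sample (an assumption made for this sketch;
    a real tool might pick the points closest to the cluster centre instead).
    """
    rng = random.Random(seed)

    # Collect the row indices belonging to each cluster we want to trim.
    trimmed_indices = {cluster: [] for cluster in clusters_to_trim}
    for idx, label in enumerate(labels):
        if label in trimmed_indices:
            trimmed_indices[label].append(idx)

    # Decide which rows of the trimmed clusters survive.
    survivors = set()
    for indices in trimmed_indices.values():
        if len(indices) > keep:
            indices = rng.sample(indices, keep)
        survivors.update(indices)

    # Preserve the original row order in the curated dataset.
    return [
        row
        for idx, (row, label) in enumerate(zip(rows, labels))
        if label not in trimmed_indices or idx in survivors
    ]


# Toy data: 7 "pizza" rows out of 100, mirroring the 7% figure above.
rows = list(range(100))
labels = ["pizza"] * 7 + ["code"] * 93

shares = cluster_shares(labels)
curated = trim_clusters(rows, labels, clusters_to_trim={"pizza"}, keep=2)
```

Here a person looks at `shares`, decides the pizza cluster is over-represented, and passes it to `trim_clusters`. That decision is the human-in-the-loop step; the trimming itself is automated.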
Here's a walkthrough for more details: