Hadithi helps developers build data factories so that they can easily generate, process, and manage LLM video datasets.
We take care of data collection, ingestion, processing, annotation, validation, storage, and integration of video datasets with large language models (LLMs), so that developers can focus on fine-tuning or training their AI applications with our LLM video datasets.
Hadithi automates video processing: it organizes videos and renames them with timestamps, splits them into clips, detects scene changes, removes audio when needed, filters out short videos, rescales videos and extracts frames, processes videos in batches, validates the number of images in each folder, and assembles videos from images at the correct frame rate.
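Most of these steps map onto standard FFmpeg, ffprobe, and ExifTool invocations. The Bash functions below are a minimal sketch of several of them, not Hadithi's actual scripts: the function names, the `*.mp4` and `frame_%05d.png` file patterns, and the defaults (10-second clips, a 0.4 scene-change threshold, a 2-second minimum length, 1 fps at 640 px, 24 fps output) are hypothetical choices for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers sketching several Hadithi-style processing steps.
# Names, file patterns and defaults are illustrative assumptions.

# Rename every file in a folder to its creation time, e.g. 20200131_142501.mp4.
# %%-c appends a copy number when two files share a timestamp.
rename_with_timestamps() {
  exiftool '-FileName<CreateDate' -d '%Y%m%d_%H%M%S%%-c.%%e' "$1"
}

# Split a video into clips of about SECS seconds (default 10) without re-encoding.
# With stream copy, cuts land on keyframes, so clip lengths are approximate.
segment_video() {
  local in=$1 outdir=$2 secs=${3:-10}
  mkdir -p "$outdir"
  ffmpeg -nostdin -loglevel error -i "$in" -map 0 -c copy -f segment \
    -segment_time "$secs" -reset_timestamps 1 "$outdir/clip_%03d.mp4"
}

# Print the timestamps (in seconds) of frames whose scene-change score
# exceeds a threshold (default 0.4); showinfo logs them to stderr.
detect_scenes() {
  ffmpeg -nostdin -i "$1" -vf "select='gt(scene,${2:-0.4})',showinfo" -f null - 2>&1 |
    grep -o 'pts_time:[0-9.]*' | cut -d: -f2
}

# Copy the video stream unchanged and drop the audio track.
remove_audio() {
  ffmpeg -nostdin -loglevel error -i "$1" -c copy -an "$2"
}

# Print a video's duration in seconds using ffprobe.
video_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Move .mp4 files shorter than MIN seconds (default 2) into DIR/short/.
filter_short_videos() {
  local dir=$1 min=${2:-2} f dur
  mkdir -p "$dir/short"
  for f in "$dir"/*.mp4; do
    [[ -e $f ]] || continue
    dur=$(video_duration "$f")
    [[ -n $dur ]] || continue
    # Bash arithmetic is integer-only, so awk compares the float durations.
    if awk -v d="$dur" -v m="$min" 'BEGIN { exit !(d < m) }'; then
      mv -- "$f" "$dir/short/"
    fi
  done
}

# Extract frames at FPS (default 1) scaled to WIDTH pixels (default 640);
# a height of -2 keeps the aspect ratio and rounds it to an even number.
extract_frames() {
  local in=$1 outdir=$2 fps=${3:-1} width=${4:-640}
  mkdir -p "$outdir"
  ffmpeg -nostdin -loglevel error -i "$in" -vf "fps=$fps,scale=$width:-2" \
    "$outdir/frame_%05d.png"
}

# Succeed only if DIR holds exactly EXPECTED .png images.
validate_image_count() {
  local dir=$1 expected=$2
  local -a files=("$dir"/*.png)
  [[ -e ${files[0]} ]] || files=()   # an unmatched glob stays literal
  if (( ${#files[@]} != expected )); then
    echo "$dir: found ${#files[@]} images, expected $expected" >&2
    return 1
  fi
}

# Assemble numbered frames into an H.264 video at RATE fps (default 24).
frames_to_video() {
  local framedir=$1 out=$2 rate=${3:-24}
  ffmpeg -nostdin -loglevel error -framerate "$rate" \
    -i "$framedir/frame_%05d.png" -c:v libx264 -pix_fmt yuv420p "$out"
}
```

Every FFmpeg call passes `-nostdin` so the functions can run inside `while read` loops without FFmpeg consuming the loop's input, and the short-video filter moves files aside rather than deleting them, so a bad threshold is easy to undo.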
It is easy to use, open source, and runs entirely on a CPU with minimal setup.
Developers simply point Hadithi at their dataset folder and, with the click of a button, start extracting structured datasets. This task is usually time-consuming, expensive, and requires expert skills.
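Pointing a tool at a dataset folder essentially means walking that folder and handing each video to the processing steps. Below is a minimal sketch of such an entry point, assuming videos are recognized by extension (`.mp4`, `.mov`, `.mkv`, `.avi`, `.webm`); `process_dataset` is a hypothetical name, not Hadithi's actual interface.

```bash
#!/usr/bin/env bash
# Hypothetical entry point: run a command on every video under a folder.
# Usage: process_dataset /path/to/dataset some_command [args...]
# The command receives each video path as its last argument.
process_dataset() {
  local dir=$1
  shift
  if [[ ! -d $dir ]]; then
    echo "process_dataset: '$dir' is not a directory" >&2
    return 1
  fi
  local video
  # -print0 / read -d '' keep names with spaces or newlines intact;
  # sort -z gives a stable, reproducible processing order.
  while IFS= read -r -d '' video; do
    "$@" "$video"
  done < <(find "$dir" -type f \( -iname '*.mp4' -o -iname '*.mov' \
             -o -iname '*.mkv' -o -iname '*.avi' -o -iname '*.webm' \) \
             -print0 | sort -z)
}
```

For example, `process_dataset ~/videos echo` lists the videos that would be processed, and `process_dataset ~/videos my_step` would run a hypothetical `my_step` function on each one.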
The source code is written in Bash, which is lightweight and easy to understand. Developers can modify the source code to suit their needs. They can even use it to set up their own data foundry!
Unlike most video processing tools, Hadithi does not require a GPU. Anyone with a moderately powerful CPU and enough storage can create thousands of videos.
Only Bash, FFmpeg, and ExifTool are required to set up the system. Sorry, Windows and macOS users: I developed the system on Ubuntu 18.04, but you can test it on your operating system.
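Before running any processing, it can help to confirm that the required tools are on the `PATH`. Here is a minimal sketch; `check_dependencies` is a hypothetical helper, and `ffprobe` is included in its default list as an assumption, because it ships with FFmpeg and is the usual way to read video durations.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report required tools that are missing from PATH.
# With no arguments it checks a default list (an assumption, not Hadithi's).
check_dependencies() {
  local -a tools=("$@")
  (( ${#tools[@]} )) || tools=(bash ffmpeg ffprobe exiftool)
  local tool missing=0
  for tool in "${tools[@]}"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script could then begin with `check_dependencies || exit 1`. On Ubuntu, the `ffmpeg` package provides FFmpeg and ffprobe, and the `libimage-exiftool-perl` package provides ExifTool.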