People Dataset - Full people data on hundreds of millions of profiles
Power your product or AI Agent with billions of datapoints on hundreds of millions of people, refreshed monthly.
Build your AI SDR, recruiting platform, internal or commercial investment tool with Crustdata's Full People Dataset.
Delivered as Parquet files
Replies
Hi Product Hunt, Garry, CEO of Y Combinator, here. I’m excited to once again support and help launch Crustdata (YC F24)'s newest product: Full People Dataset.
This product delivers billions of datapoints on hundreds of millions of people, refreshed monthly.
Crustdata tracks B2B people and company data. Most of Crustdata’s 200 customers (including Y Combinator!) build their platforms - whether AI SDRs, AI recruiting platforms, internal or external investment platforms - on top of Crustdata’s APIs, but for teams that need extremely high volume, the Crustdata People Dataset is an alternative.
Benefits of the Full Dataset over APIs:
-higher volume for lower cost
-lower latency
-more control over the data
Built as the infra layer for AI Agent platforms:
Access 180 million+ professional profiles
Delivered via S3 as Parquet files
Each profile includes:
Current job title
Current Company
Headline
Location
Profile photo
Past job titles, companies, durations, and descriptions
Education (university, school, degree)
Activities and societies
Skills
Bio
Licenses
Certifications
How do they do this?
They’ve developed technology that accesses the web in real time to gather information on people and companies. They unify live data and deliver it via APIs or Parquet files (full datasets).
Who is this for?
Recruiting Platforms
Use Crustdata’s full dataset product to build your tool or platform on top of their people data, which would serve as your candidate data warehouse. You wouldn’t have to worry about collecting the data yourself or spending time and manpower to keep it updated.
AI SDRs / Sales Automation platforms
Get access to tens of millions of decision makers and prospects without wasting credits with APIs. Use the dataset for bulk lead generation or for preloading your sales agent with a prospect database. See changes in monthly refreshes that can act as intent signals.
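One way the "changes in monthly refreshes" idea could look in practice is a diff between two monthly snapshots. The sketch below is only an illustration, not Crustdata's method: the `Profile` fields and the `job_changes` helper are hypothetical names, and the sample records are made up. In a real pipeline the two lists would be loaded from the Parquet files of consecutive refreshes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Hypothetical field names; the real Parquet schema may differ.
    profile_id: str
    current_title: str
    current_company: str


def job_changes(previous, current):
    """Return (old, new) pairs for profiles whose title or company changed."""
    old_by_id = {p.profile_id: p for p in previous}
    changes = []
    for new in current:
        old = old_by_id.get(new.profile_id)
        if old is None:
            continue  # first seen this month; nothing to compare against
        old_job = (old.current_title, old.current_company)
        new_job = (new.current_title, new.current_company)
        if old_job != new_job:
            changes.append((old, new))
    return changes


# Made-up sample snapshots standing in for two monthly refreshes.
may = [
    Profile("p1", "Account Executive", "Acme"),
    Profile("p2", "Engineer", "Globex"),
]
june = [
    Profile("p1", "VP of Sales", "Initech"),
    Profile("p2", "Engineer", "Globex"),
    Profile("p3", "Founder", "Hooli"),
]

for old, new in job_changes(may, june):
    print(
        f"{new.profile_id}: {old.current_title} at {old.current_company} -> "
        f"{new.current_title} at {new.current_company}"
    )
# prints: p1: Account Executive at Acme -> VP of Sales at Initech
```

Keying on a stable profile identifier keeps the comparison cheap: one dictionary lookup per profile. Profiles that first appear in the newer snapshot are skipped here, though they could be treated as a separate signal.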
Investment platforms or teams
Get a large dataset of people to track founder movements, map leadership teams, or build internal founder databases without relying on fragmented data sources.
Anyone building a tool that uses people data
For teams that need clean, up-to-date people profiles but don’t want to work with APIs, or that need more control over large amounts of data, Crustdata’s people dataset is an option.
Crustdata
@garrytan Thanks for the hunt, Garry! Grateful to call Y Combinator a client and to hopefully power a lot more internal and external facing products in the weeks ahead!
@garrytan How do you balance freshness and duplication in monthly refreshes?
Crustdata
@masump our proprietary technology lets us refresh 100s of millions of entities seamlessly every month
LeadDelta
@garrytan congrats Chris and the team!
Crustdata
@vedranrasic thanks for the support
Kalyxa
@garrytan Huge. Crustdata is doing the hard work so others can build smarter tools faster. Love how this flips the usual API-first model — full control, low latency, and built for scale. Feels like real infra for the next wave of AI-native products.
Crustdata offers a powerhouse dataset with billions of fresh monthly updates, perfect for fueling AI SDRs, recruiting platforms, or investment tools. Reliable, massive, and ready to integrate — data-driven decisions just got easier!
CompanyGPT
thanks @supa_l for the support
Sequoia: Men's Sexual Wellness
Such a great product! Wishing you to become the Product of the Day! 😉
CompanyGPT
@denis_galka thanks for the support 🙏
Crustdata
@denis_galka Thanks!
Crustdata
@denis_galka Thanks for the support!
CoLaunchly
This sounds like a game-changer for teams that need comprehensive and up-to-date people data! The ability to access such vast and detailed datasets with lower latency and cost is definitely going to empower a lot of platforms. Congrats to the Crustdata team and thanks for supporting innovative tools, Garry! 🌟
CompanyGPT
thanks @alex_cloudstar for the support. would love to help @CoLaunchly with the gtm
Congrats on the launch! 🚀
Quick Questions:
1. Where do you get this data, and how will you update it?
2. Are there any privacy concerns?
3. What products can be built on top of this?
4. How do we know the data is correct?
@pratik_kesharwani
1. They scrape the internet.
2. No privacy, you are responsible for processing the data you bought from the "dark web"
4. No guarantees of correctness...bet they will not even refund if confronted. But look, even statistically, there will be 10% false positives.
Codejet
Congrats on the launch! 🚀 Super impressive scale. Quick question — do you offer real-time API access or is it only available as Parquet file downloads?
Crustdata
@patpijanowski thanks for the support. We offer both: full dataset as parquet files and realtime data via API
CompanyGPT
@patpijanowski thanks for the support
CompanyGPT
@patpijanowski thanks! we’ve got both: real-time API endpoints and full dataset parquet downloads. curious, which data points interest you the most?
Manna
This dataset could be transformative for training AI sales agents! If we find outdated phone numbers or job titles in the Parquet files, does Crustdata have a feedback loop to correct errors mid-cycle, or are we solely dependent on monthly refreshes for accuracy?
Crustdata
@desmond_ren1 At the moment for the parquet files, you'd see the updates in the next refresh. We do not support phone numbers at the moment. Is that something critical to your use case? And where are you getting that data from today?
Lancepilot
This is super powerful. Being able to access billions of refreshed data points on hundreds of millions of people is a game-changer, especially for anyone building AI agents, SDR tools, recruiting platforms, or investment products. Love that you're delivering it in Parquet too, makes integration so much easier. Congrats on the launch.
Crustdata
@priyankamandal thanks for the support! Yes, delivering the full dataset while refreshing every month has been a game changer for our customers. They no longer need to think about API rate limits.
CodeDesign.ai
Crustdata
@rince thank you for the support
CompanyGPT
thanks for the support @rince
STORI
Congrats on the launch! 🚀 This looks like a super valuable resource for anyone working with data-driven projects, especially in recruitment, analytics, or AI training. Curious to know more about how often the dataset is updated and if there are any privacy considerations built into how the data is sourced. Great work!
Crustdata
@elene_tandashvili Thanks for the support. All entities are refreshed monthly.
BeeDone
Crustdata
@layouceferie whether you are where?
AINave
Congrats on the launch, team!
Crustdata
@ramitkoul Thanks for the support!
Crustdata
@ramitkoul thanks for the support
this is so much easier than working with LI directly! love it
Crustdata
@jason_chernofsky thanks for the support
Crustdata
@jason_chernofsky Definitely is! Thanks for the support!
Grimo
Finally! Been waiting for something like this since Apollo.io's API started feeling clunky (´• ω •`) Crustdata's dataset looks way more intelligent lol – already imagining how we'll build smarter AI SDR tools without burning API credits. The team's execution here is chef's kiss, congrats on launch!
Crustdata
@stainlu thank you for the support
How much of this data is LinkedIn data? Products like Apollo are practically 99% LinkedIn data
This is super useful, especially for early-stage projects that need solid data without jumping through hoops. Love how straightforward it is. Great job putting this together—and congrats to the team on the launch!
It is a great product
Hide the Pain Harold
So where is the data from and is there an option to opt out?
Penoola