Crawlee

Crawlee helps you build reliable crawlers. Fast.

Crawlee is an intuitive, customizable open-source library for web scraping and browser automation. Quickly scrape data, store it, and avoid getting blocked with auto-generated human-like fingerprints, headless browsers, and smart proxy rotation.
This is the 2nd launch from Crawlee.
Crawlee for Python

Build reliable scrapers in Python
We are launching Crawlee for Python, an open-source library for web scraping and browser automation. Quickly scrape data, store it, and avoid getting blocked with headless browsers and smart proxy rotation.
Free

Saurav Jain
Hello Hunters and Makers, I am Saurav, Developer Community Manager at Apify, the company building Crawlee. I am happy to hunt Crawlee for Python today.
We launched Crawlee in August 2022 and received an amazing response from the community, as well as continuous demand for a Python version. Finally, after a lot of hard work from our team, we are launching Crawlee for Python today.
It has all of these features:
- Unified interface for HTTP & headless browser crawling.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints - enhances DX (IDE autocompletion) and reduces bugs (static type checking).
- Automatic retries on errors or when you’re getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing - direct URLs to the appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Robust error handling.
Why use Crawlee rather than Scrapy?
- Crawlee has out-of-the-box support for headless browser crawling (Playwright).
- Crawlee has a minimalistic & elegant interface - set up your scraper with fewer than 10 lines of code.
- Complete type hint coverage.
- Based on standard Asyncio.
Please pass on your feedback and thoughts in the comments below!
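To make the "request routing" and "automatic retries" bullets above a bit more concrete, here is a minimal standard-library sketch of the idea. It is not Crawlee's actual API: the `Router` class, its `handler` decorator, `dispatch`, the `max_attempts` parameter, and the example URLs are all hypothetical names chosen for illustration. It only shows how labeled requests could be sent to matching asyncio handlers and retried when a handler fails.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# A handler is a coroutine function that takes a URL and returns some result.
Handler = Callable[[str], Awaitable[str]]


class Router:
    """Hypothetical label-based router (illustration only, not Crawlee's Router)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        # Decorator that registers a coroutine function under a label.
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func

        return register

    async def dispatch(self, label: str, url: str, max_attempts: int = 3) -> str:
        # Look up the handler for this label and retry it on failure.
        handler = self._handlers[label]
        last_error: Optional[Exception] = None
        for _ in range(max_attempts):
            try:
                return await handler(url)
            except RuntimeError as error:  # assumed to mean "blocked or failed"
                last_error = error
        raise RuntimeError(f"gave up on {url} after {max_attempts} attempts") from last_error


router = Router()
attempts = {"count": 0}  # counts calls to the flaky handler below


@router.handler("list")
async def handle_list(url: str) -> str:
    return f"list:{url}"


@router.handler("detail")
async def handle_detail(url: str) -> str:
    # Simulate being blocked on the first two attempts.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated block")
    return f"detail:{url}"
```

Calling `asyncio.run(router.dispatch("detail", "https://example.com/item/1"))` would exercise the retry path: the handler fails twice and succeeds on the third attempt. A real crawler would also add backoff and rotate the session between attempts, which this sketch leaves out.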
Csaba Kissi
@sauain, I wish I could see a version of this for PHP. Anyway... Great product!
Zeiki Yu
@sauain This is exactly what I needed. Thanks for building Crawlee!
Saurav Jain
@csaba_kissi thanks for the support, well you never know ;)
Khyati Agarwal
Congratulations on the launch 🎉 Amazing work 👏 Scraping in a headless browser had so many gaps!
Saurav Jain
@khyati_tmw thanks for the support, Khyati!
Kyrylo Silin
This looks like a powerful tool for web scraping and browser automation. How does Crawlee's proxy rotation and session management compare to other tools on the market? Any plans to add more integrations? Congrats on the launch, Saurav!
Saurav Jain
hey @kyrylosilin! we use our [Session Pool](https://crawlee.dev/python/api/c...) system to rotate the sessions, and similar to Crawlee TS/JS we are going to use [Tiered Proxies](https://crawlee.dev/blog/proxy-m...) in Crawlee for Python as well.
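For readers wondering what rotating sessions through a pool can look like in general, here is a minimal standard-library sketch. It is a simplified assumption, not Crawlee's Session Pool implementation: the `Session` and `SessionPool` classes, the error-score threshold, the round-robin policy, and the proxy URLs are all hypothetical. Each session is tied to one proxy, and a session that is marked bad too often is retired and replaced the next time its slot comes up.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """Hypothetical session: one identity bound to one proxy URL."""

    session_id: int
    proxy_url: str
    error_score: int = 0
    max_error_score: int = 3  # assumed threshold for retiring a session

    @property
    def is_usable(self) -> bool:
        return self.error_score < self.max_error_score

    def mark_bad(self) -> None:
        # Call when a request made with this session looks blocked.
        self.error_score += 1

    def mark_good(self) -> None:
        # Call on success; slowly forgives earlier errors.
        self.error_score = max(0, self.error_score - 1)


class SessionPool:
    """Hypothetical round-robin pool that replaces retired sessions."""

    def __init__(self, proxy_urls: List[str]) -> None:
        self._proxy_cycle = itertools.cycle(proxy_urls)
        self._ids = itertools.count(1)
        self._sessions = [self._new_session() for _ in proxy_urls]
        self._cursor = 0

    def _new_session(self) -> Session:
        return Session(next(self._ids), next(self._proxy_cycle))

    def get_session(self) -> Session:
        # Pick the next slot in round-robin order.
        index = self._cursor % len(self._sessions)
        self._cursor += 1
        # Swap out a session that has hit its error limit.
        if not self._sessions[index].is_usable:
            self._sessions[index] = self._new_session()
        return self._sessions[index]
```

A crawler built on this idea would call `get_session()` before each request, then `mark_good()` or `mark_bad()` depending on the response. Tiered proxies would add another layer on top, which is not shown here.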