@mutlu82 So cool to see someone using it like this! That crawl was started at the peak of the HN effect, so it may be a while before it completes. Unfortunately, I didn't design the MVP to handle anywhere near this kind of load!
@mutlu82 Good question! It usually takes about 10 seconds to access and parse each link (most of that time is spent waiting to ensure the page renders fully). Checking all found links for HTTP errors is then sub-second, and any links that return a text/html MIME type are crawled with the browser (Selenium Server / PhantomJS, 10 seconds per link). So 1-2 minutes under normal circumstances.
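To make the two-stage flow above concrete, here is a rough sketch, not the actual Bughunt code. It assumes a cheap header check decides which links go on to the slow browser render, and that renders run one after another. The names `LinkResult`, `classify_links`, `estimate_seconds` and the injected `head` callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical fetcher type: takes a URL, returns (status_code, content_type).
HeadFn = Callable[[str], Tuple[int, str]]


@dataclass
class LinkResult:
    url: str
    status: int
    needs_browser: bool  # True when the link should get the slow browser crawl


def classify_links(links: Iterable[str], head: HeadFn) -> List[LinkResult]:
    """Cheap pass: record the HTTP status and flag text/html links for rendering."""
    results = []
    for url in links:
        status, content_type = head(url)
        # Drop parameters such as "; charset=utf-8" before comparing.
        mime = content_type.split(";")[0].strip().lower()
        is_html = mime == "text/html"
        results.append(LinkResult(url, status, status < 400 and is_html))
    return results


def estimate_seconds(results: List[LinkResult], render_seconds: float = 10.0) -> float:
    """Rough total: one render for the start page plus one per HTML link.

    Assumes sequential rendering and ignores the sub-second header checks.
    """
    return render_seconds * (1 + sum(r.needs_browser for r in results))
```

With 5 to 11 HTML links on a page, that estimate lands in the 1-2 minute range quoted above.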
Normally we'd only run around 10 crawls at a time, but with HN/PH there are thousands of requests coming through. The bottleneck is the Selenium server; I'll probably have to kill it and restart all the "In Progress" crawls tonight.
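For anyone wondering what a cap of "around 10 crawls at a time" could look like, here is a minimal sketch using an asyncio semaphore. This is an illustration under assumptions, not how Bughunt is built: `run_crawls` and the `crawl` coroutine are hypothetical names, and only the limit of 10 comes from the comment above.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

MAX_CONCURRENT_CRAWLS = 10  # figure from the comment; the rest is illustrative


async def run_crawls(
    urls: Iterable[str],
    crawl: Callable[[str], Awaitable[str]],  # hypothetical per-site crawl coroutine
    limit: int = MAX_CONCURRENT_CRAWLS,
) -> List[str]:
    """Run every crawl, but let at most `limit` of them be in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str) -> str:
        # Extra requests wait here instead of piling onto the browser backend.
        async with semaphore:
            return await crawl(url)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

A burst of thousands of requests would then queue behind the semaphore rather than hit the Selenium server all at once, at the cost of longer waits for the requests at the back.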
As a product that users pay for, this would never be an issue, because we're able to schedule crawls as part of a monitoring service and notify users of issues by email. That means my focus right now is on product, not scale :)
EDIT: here are some results for Product Hunt: http://bughunt.io/results/543d8c...
Just found a Bug on Bughunt. ;)
When you're analyzing a site and click the link to actually visit the site you're analyzing, the href attribute is missing the ":" in the "http://"... :)
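A check along these lines would catch that kind of malformed link before it is rendered. It is only a sketch using Python's standard `urllib.parse`; the helper name `is_absolute_http_url` is made up for illustration and says nothing about how the site actually builds its links.

```python
from urllib.parse import urlsplit


def is_absolute_http_url(href: str) -> bool:
    """Return True only for hrefs with an http/https scheme and a host."""
    parts = urlsplit(href)
    # Without the ":" the parser finds no scheme and no host, so
    # "http//example.com" is treated as a relative path.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Here `is_absolute_http_url("http//example.com")` is False, while the same URL with the colon restored passes.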