Powerpage Web Crawler is a **portable**, lightweight web crawler built with Powerpage. It is a simple HTML/JS application of about 350 lines of code.
It is powerful and easy to use, and well suited to crawling blog sites for offline reading. Simply define the following settings, for example:
* base-url := https://dev.to/casualwriter // home page of your favorite blog site
* index-pattern := none // RegExp matching the URLs of category pages
* page-pattern := /casualwriter/[a-z] // RegExp matching the URLs of content pages
* content-css := #main-title h1, #article-body // CSS selector for the blog content
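As a rough illustration of how these four settings might be used, the sketch below holds them in a plain JavaScript object and classifies a URL against the two patterns. The object shape and the names `site` and `classifyUrl` are assumptions made for this example; they are not taken from the Powerpage Web Crawler source.

```javascript
// Hypothetical settings object mirroring the four fields listed above.
const site = {
  baseUrl: 'https://dev.to/casualwriter',
  indexPattern: null,                     // "none": this site has no category pages
  pagePattern: /\/casualwriter\/[a-z]/,   // no "g" flag, so test() keeps no state
  contentCss: '#main-title h1, #article-body',
};

// Classify a URL as a content page, a category (index) page, or neither.
function classifyUrl(url, settings) {
  if (settings.pagePattern && settings.pagePattern.test(url)) return 'page';
  if (settings.indexPattern && settings.indexPattern.test(url)) return 'index';
  return 'skip';
}

console.log(classifyUrl('https://dev.to/casualwriter/some-post-123', site)); // page
console.log(classifyUrl('https://dev.to/casualwriter', site));               // skip
```

In a browser-like host, the `contentCss` value could then be passed to `document.querySelectorAll` to pick out the title and article body of each content page.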
The program can:
* crawl all category pages.
* find the URLs of all content pages.
* crawl the content of a single page or of all pages.
* save settings and links to a database (multiple sites are supported).
* save content pages to local files.
* allow offline reading from the local files.
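The link-discovery step in the list above can be sketched in plain JavaScript as follows. This is a minimal sketch, not the project's actual implementation: it assumes the page HTML has already been fetched as a string, uses a simple regular expression rather than a DOM parser to find `href` values, and the function names `extractLinks` and `findContentPages` are hypothetical.

```javascript
// Collect absolute, de-duplicated link URLs from an HTML string.
function extractLinks(html, baseUrl) {
  const links = new Set();
  const hrefRe = /href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefRe.exec(html)) !== null) {
    try {
      // Resolve relative links against the base URL.
      links.add(new URL(match[1], baseUrl).href);
    } catch (e) {
      // Ignore href values that cannot be parsed as a URL.
    }
  }
  return [...links];
}

// Keep only the links that match the content-page pattern.
// The pattern should not carry the "g" flag, so test() stays stateless.
function findContentPages(html, baseUrl, pagePattern) {
  return extractLinks(html, baseUrl).filter((url) => pagePattern.test(url));
}

// Usage with a small made-up HTML snippet.
const sampleHtml =
  '<a href="/casualwriter/post-one-1a2b">One</a>' +
  '<a href="/casualwriter/post-one-1a2b">One again</a>' +
  '<a href="/about">About</a>';
const found = findContentPages(
  sampleHtml,
  'https://dev.to/casualwriter',
  /\/casualwriter\/[a-z]/
);
console.log(found); // [ 'https://dev.to/casualwriter/post-one-1a2b' ]
```

Using a `Set` keeps each URL only once, so a page that is linked several times would be crawled and saved a single time.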
For more details, please visit https://github.com/casualwriter/...