Powerpage Web Crawler
#powerpage, #javascript, #web-crawler
CK Hung
Powerpage Web Crawler is a portable, lightweight web crawler built with Powerpage. It is a powerful and easy-to-use web crawler suitable for blog site crawling and offline reading.
Replies
CK Hung
Maker
Powerpage Web Crawler is a **portable**, lightweight web crawler built with Powerpage. It is a simple html/js application of about 350 lines of code. It is a powerful and easy-to-use web crawler suitable for blog site crawling and offline reading.

Just define the following settings, for example:

* base-url := https://dev.to/casualwriter // the home page of the favorite blog site
* index-pattern := none // RegExp of the URL pattern of category pages
* page-pattern := /casualwriter/[a-z] // RegExp of the URL pattern of content pages
* content-css := #main-title h1, #article-body // CSS selector for the blog content

The program can:

* crawl all category pages.
* find the URLs of all content pages.
* crawl the content of one page, or of all pages.
* save settings and links to a database (multiple sites are supported).
* save content pages to local files.
* allow offline reading from local files.

For more details, please visit https://github.com/casualwriter/...
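
To make the role of the settings above more concrete, here is a minimal sketch of how `base-url`, `index-pattern` and `page-pattern` could be used to sort the links found on a page into category links and content links. This is not the crawler's actual source: the `site` object, its property names, and the `classifyLinks` function are hypothetical, and matching the patterns against the URL path (rather than the full URL) is an assumption.

```javascript
// Hypothetical settings object mirroring the four fields in the post.
const site = {
  baseUrl: 'https://dev.to/casualwriter',
  indexPattern: null,                     // "none": this site has no category pages
  pagePattern: /\/casualwriter\/[a-z]/,   // URL pattern of content pages
  contentCss: '#main-title h1, #article-body', // not used in this sketch
};

// Resolve raw href values against the base URL, then sort them into
// category (index) links and content (page) links. Duplicates are dropped.
function classifyLinks(hrefs, settings) {
  const index = new Set();
  const pages = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, settings.baseUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed as a URL
    }
    // Ignore mailto:, javascript: and other non-web links.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    url.hash = ''; // "#comments" and similar fragments point to the same page
    // Assumption: patterns are tested against the path only.
    const path = url.pathname;
    if (settings.pagePattern && settings.pagePattern.test(path)) {
      pages.add(url.href);
    } else if (settings.indexPattern && settings.indexPattern.test(path)) {
      index.add(url.href);
    }
  }
  return { index: [...index], pages: [...pages] };
}

// Usage: hrefs as they might appear in the anchors of the home page.
const found = classifyLinks(
  ['/casualwriter/abc', '/casualwriter/abc#comments', '/other/post', 'mailto:x@y.z'],
  site
);
console.log(found.pages, found.index);
```

In a browser-based html/js application, the `hrefs` array would typically come from the anchors of the loaded page, and the `content-css` selector could then be passed to `document.querySelectorAll` to pull out the article text of each content page.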