Web scraping

Turns out we can get information off the internet



Tools and services for extracting information from web pages.

Some of these rely on browser automation, although that is really its own topic.
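To make the basic idea concrete before reaching for any framework, here is a minimal sketch that pulls links out of an HTML document using only the Python standard library. The names `LinkExtractor`, `extract_links` and `SAMPLE_HTML` are made up for illustration, and the HTML is a hard-coded sample rather than a real page.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample document, standing in for a fetched page.
SAMPLE_HTML = """
<html><body>
  <h1>Posts</h1>
  <a href="/posts/1">First post</a>
  <a href="/posts/2">Second <em>post</em></a>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

In practice the HTML would come from an HTTP request (for example via `urllib.request.urlopen`), and a dedicated scraping library takes care of the crawling, retrying and selector logic that this sketch leaves out.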

Scrapy

Scrapy is a Python framework for crawling websites and extracting structured data from their pages. A companion project, scrapy-rss, converts the items I scrape into RSS feeds.

There is also a hosted cloud service, Scrapinghub, that will deploy your spiders for you at massive scale if you want.

Scrapoxy automates the deployment of a pool of cloud-hosted proxies, so that a scraper's requests can be spread across many machines.

Incoming

