Web scraping

Turns out we can get information off the internet

2016-06-16 — 2023-01-24

Wherein web pages are parsed for structured data by configurable parsers, the parsed output is converted into RSS feeds, and deployments are orchestrated across cloud services to run the extraction at scale.

Tags: browser, computers are awful together, confidentiality, diy, doing internet, faster pussycat
Figure 1: Services to extract information from web pages.

Some of these use browser automation, although that is kind of its own thing.

1 Scrapy

Scrapy is a Python library for exactly that. The companion project scrapy-rss converts my parsed output into RSS feeds.
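As a sketch of what that looks like: a minimal spider in the style of the official Scrapy tutorial. The target site and CSS selectors (quotes.toscrape.com) are illustrative placeholders rather than anything I actually scrape.

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    """Scrape quotations and follow pagination links."""

    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Each div.quote holds one quotation; yield it as a structured item.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "next page" link, if any, and parse it the same way.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```

Running it with `scrapy runspider quotes_spider.py -o quotes.jsonl` dumps the items as JSON lines; scrapy-rss then provides the machinery to emit items like these as an RSS feed instead.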

There is also a dedicated cloud service, Scrapinghub, that will deploy it for you at scale if you want.

Scrapoxy automates spinning up a distributed pool of cloud proxies for this purpose and routing scraper traffic through them.
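On the Scrapy side, pointing a spider at such a proxy front-end is just a matter of setting the `proxy` request meta key, which Scrapy's built-in HttpProxyMiddleware honours. A sketch, assuming a Scrapoxy-style endpoint on localhost port 8888 (the address is an assumption; substitute whatever your deployment exposes):

```python
import scrapy


class ProxiedSpider(scrapy.Spider):
    """Route every request through a single proxy endpoint, e.g. Scrapoxy."""

    name = "proxied"
    start_urls = ["https://example.com/"]

    # Assumed address of the proxy front-end; adjust to your deployment.
    proxy_url = "http://127.0.0.1:8888"

    def start_requests(self):
        for url in self.start_urls:
            # The 'proxy' meta key is picked up by HttpProxyMiddleware.
            yield scrapy.Request(url, meta={"proxy": self.proxy_url})

    def parse(self, response):
        yield {"url": response.url, "status": response.status}
```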

2 Incoming