Web scraping

Turns out we can get information off the internet

June 17, 2016 — January 24, 2023

Figure 1

Tools and services for extracting information from web pages.

Some of these use browser automation, although that is really its own topic.

1 Scrapy

Scrapy is a Python library for scraping web pages. The companion project scrapy-rss converts my parsed items into RSS feeds.

There is also a dedicated cloud service (Scrapinghub, since renamed Zyte) that will deploy it for you at massive scale if you want.

Scrapoxy automates deploying a distributed pool of cloud-hosted proxies for this purpose.

2 Incoming