Web scraping

Turns out we can get information off the internet

2016-06-16 — 2023-01-24

browser

computers are awful together

confidentiality

diy

doing internet

faster pussycat

Tools and services for extracting information from web pages.

Some of these rely on browser automation, although that is really its own topic.
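To make "extracting information from web pages" concrete, here is a minimal sketch using only the Python standard library. It pulls the link targets and link text out of a chunk of HTML. The sample page and the names `LinkExtractor` and `extract_links` are made up for illustration; a real scraper would fetch the HTML over HTTP and would probably use a more forgiving parser.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, standing in for a downloaded document.
SAMPLE_HTML = """
<ul>
  <li><a href="/posts/1">First post</a></li>
  <li><a href="/posts/2">Second <em>post</em></a></li>
</ul>
"""

print(extract_links(SAMPLE_HTML))
```

Everything beyond this (following links, politeness delays, retries, exporting results) is the bookkeeping that scraping frameworks exist to handle.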

1 Scrapy

Scrapy is a Python framework for crawling websites and extracting structured data from their pages. The companion project scrapy-rss converts the scraped items into RSS feeds.

There is also a hosted cloud service, Scrapinghub (since renamed Zyte), that will deploy and run your spiders at scale if you want.

Scrapoxy automates deploying a pool of cloud-hosted proxies for this purpose, so that a scraper's requests are spread across many machines.