Tracking my website traffic


I would like to record traffic to my website using some minimalist tracker which does not hand my users’ data to Google (so the classic Google Analytics tracker is right out).

For my purposes I currently use Gauges, which at USD 6/month is probably too expensive for the minimal service it actually provides me. Moreover, their site looks so derelict that I suspect the business will fold soon. Either way, there is a script at the bottom of this page to download all the data I care about (just popularity).

If this were a priority for me, I might build a lighter, cheaper option myself using serverless functions; a sketch of that idea follows.
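
For concreteness, here is roughly what such a thing could look like: an AWS Lambda function behind an API Gateway proxy that logs each hit to CloudWatch and returns a 1×1 transparent GIF, to be referenced from an img tag on each page. This is a minimal sketch, not something I run; the handler name, log fields, and the idea of aggregating the CloudWatch logs later are all my own assumptions.

# A minimal sketch of a serverless tracking pixel (AWS Lambda behind an
# API Gateway proxy integration). Logs a hit, returns a 1x1 GIF.
import base64
import datetime
import json

# smallest valid transparent GIF, base64-encoded for the proxy response
PIXEL = base64.b64encode(
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
).decode()


def handler(event, context):
    # API Gateway may pass header names in any case; normalise them
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    # log just enough to rank pages by popularity; no IP is recorded
    print(json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "page": headers.get("referer", ""),
        "ua": headers.get("user-agent", ""),
    }))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "image/gif", "Cache-Control": "no-store"},
        "isBase64Encoded": True,
        "body": PIXEL,
    }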

At some point it probably becomes onerous to keep track of the various privacy requirements for web tracking in your jurisdiction. I am no expert in that.

Here are some DIY options:

Here are some open-source semi-DIY options:

Some SaaS options:

  • Segment
  • Gauges is what I currently use, although they are not that cheap and have a suspicious air of disrepair
  • Loggly has Tracking Pixel and JavaScript Logging, and their free tier is probably appropriate for my needs. I would need regular data export, though… oh, but wait, they do not offer data export on the free tier. Forget that.

For a comprehensive list of alternatives, see onurakpolat/awesome-analytics: A curated list of analytics frameworks, software and other tools.

Geocoding

If I want to roll my own, I will possibly want geocoding so I can locate my readers. In fact, I would rather store a geolocation estimate than an actual IP, for privacy reasons. Here are some options:

# Database URL
https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City-CSV&license_key=YOUR_LICENSE_KEY&suffix=zip

# SHA256 URL
https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City-CSV&license_key=YOUR_LICENSE_KEY&suffix=zip.sha256
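
If I went down this path, the lookup side might look something like the following sketch, which uses MaxMind’s official geoip2 Python package (pip install geoip2). Note that the reader wants the binary .mmdb edition (edition_id=GeoLite2-City, suffix=tar.gz) rather than the CSV edition linked above; the file path and function name here are my own inventions.

# Rough sketch: map an IP to a coarse, city-level location estimate,
# then store the estimate and discard the IP (the whole privacy point).
import geoip2.database
import geoip2.errors


def coarse_location(ip, db_path="GeoLite2-City.mmdb"):
    """Return a city-level estimate for an IP, or None if it is unknown."""
    # In a real tracker, keep one Reader open rather than reopening per hit.
    with geoip2.database.Reader(db_path) as reader:
        try:
            rec = reader.city(ip)
        except geoip2.errors.AddressNotFoundError:
            return None
        return {
            "country": rec.country.iso_code,
            "city": rec.city.name,
            # lat/long here are city-level estimates, which is as precise
            # as I would want to store anyway
            "lat": rec.location.latitude,
            "lon": rec.location.longitude,
        }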

Download my gaug.es data

#! /usr/bin/env python3
"""
Export traffic using the Gauges content API
https://get.gaug.es/documentation/reference-listing/content/

Usage: export_gauges.py SITE_ID API_TOKEN
"""
import json
import sys
from time import sleep

import requests

MAX_ATTEMPTS = 10


def next_chunk(url, token):
    """Fetch one page of content stats, retrying transient failures."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            r = requests.get(
                url,
                headers={"X-Gauges-Token": token},
                timeout=5.0,
            )
            r.raise_for_status()
            return r.json()
        except requests.RequestException:
            if attempt == MAX_ATTEMPTS - 1:
                raise
            sleep(5)


def get_nexturl(resp):
    """Page through one day's results, then step back to the previous day."""
    if resp["urls"]["next_page"] is not None:
        return resp["urls"]["next_page"]
    if resp["urls"]["older"] is not None:
        return resp["urls"]["older"]
    return None


def squish(resp):
    return [_squish(record, resp["date"]) for record in resp["content"]]


def _squish(record, date):
    # drop the fields I do not care about, keeping just the popularity counts
    del record["title"]
    del record["url"]
    record["date"] = date
    return record


def main(site_id, token):
    tracks = []
    nexturl = f"https://secure.gaug.es/gauges/{site_id}/content"
    while nexturl is not None:
        resp = next_chunk(nexturl, token)
        print(resp["date"], len(resp["content"]))
        tracks.extend(squish(resp))
        nexturl = get_nexturl(resp)
        # checkpoint after every page so an interrupted run loses nothing
        with open("track.json", "w") as h:
            json.dump(tracks, h, indent=1)
        sleep(1)


if __name__ == "__main__":
    main(*sys.argv[1:])