Tracking my website traffic


I would like to record traffic to my website using some minimalist tracker which does not hand my users’ data to Google (so the classic Google Analytics tracker is right out).

For my purposes I currently use Gauges, which at USD 6/month is probably too expensive for the minimal service it actually provides me. Moreover, their site looks so derelict that I suspect the business will fold soon. Either way, there is a script at the bottom of this page to download all the data I care about (just popularity).

If this were a priority for me, I might build a lighter, cheaper option myself using serverless functions; a sketch of that idea follows.
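
For concreteness, here is roughly what such a thing could look like: an AWS Lambda function behind an API Gateway proxy that logs each hit to CloudWatch and returns a 1×1 transparent GIF, to be referenced from an img tag on each page. This is a minimal sketch, not something I run; the handler name, log fields, and the idea of aggregating the CloudWatch logs later are all my own assumptions.

# A minimal sketch of a serverless tracking pixel (AWS Lambda behind an
# API Gateway proxy integration). Logs a hit, returns a 1x1 GIF.
import base64
import datetime
import json

# smallest valid transparent GIF, base64-encoded for the proxy response
PIXEL = base64.b64encode(
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
).decode()


def handler(event, context):
    # API Gateway may pass header names in any case; normalise them
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    # log just enough to rank pages by popularity; no IP is recorded
    print(json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "page": headers.get("referer", ""),
        "ua": headers.get("user-agent", ""),
    }))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "image/gif", "Cache-Control": "no-store"},
        "isBase64Encoded": True,
        "body": PIXEL,
    }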

At some point it probably becomes onerous to keep track of the various privacy requirements for web tracking in your jurisdiction. I am no expert in that.

Here are some DIY options:

Here are some open-source semi-DIY options:

Some SaaS options:

  • Segment
  • Gauges is what I currently use, although they are not that cheap and have a suspicious air of disrepair
  • Loggly has Tracking Pixel and JavaScript Logging, and their free tier is probably appropriate for my needs. I would need regular data export, though… oh, but wait, they do not offer data export on the free tier. Forget that.

For a comprehensive list of alternatives, see onurakpolat/awesome-analytics: A curated list of analytics frameworks, software and other tools.

Geocoding

If I want to roll my own, I will possibly want geocoding so I can locate my readers. In fact, I would rather store a geolocation estimate than an actual IP, for privacy reasons. Here are some options:

# Database URL
https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City-CSV&license_key=YOUR_LICENSE_KEY&suffix=zip

# SHA256 URL
https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City-CSV&license_key=YOUR_LICENSE_KEY&suffix=zip.sha256
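
If I went down this path, the lookup side might look something like the following sketch, which uses MaxMind’s official geoip2 Python package (pip install geoip2). Note that the reader wants the binary .mmdb edition (edition_id=GeoLite2-City, suffix=tar.gz) rather than the CSV edition linked above; the file path and function name here are my own inventions.

# Rough sketch: map an IP to a coarse, city-level location estimate,
# then store the estimate and discard the IP (the whole privacy point).
import geoip2.database
import geoip2.errors


def coarse_location(ip, db_path="GeoLite2-City.mmdb"):
    """Return a city-level estimate for an IP, or None if it is unknown."""
    # In a real tracker, keep one Reader open rather than reopening per hit.
    with geoip2.database.Reader(db_path) as reader:
        try:
            rec = reader.city(ip)
        except geoip2.errors.AddressNotFoundError:
            return None
        return {
            "country": rec.country.iso_code,
            "city": rec.city.name,
            # lat/long here are city-level estimates, which is as precise
            # as I would want to store anyway
            "lat": rec.location.latitude,
            "lon": rec.location.longitude,
        }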

Download my gaug.es data

#! /usr/bin/env python3
"""
Export traffic using the Gauges content API
https://get.gaug.es/documentation/reference-listing/content/

Usage: export_gauges.py SITE_ID API_TOKEN
"""
import json
import sys
from time import sleep

import requests

MAX_ATTEMPTS = 10


def next_chunk(url, token):
    """Fetch one page of content stats, retrying transient failures."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            r = requests.get(
                url,
                headers={"X-Gauges-Token": token},
                timeout=5.0,
            )
            r.raise_for_status()
            return r.json()
        except requests.RequestException:
            if attempt == MAX_ATTEMPTS - 1:
                raise
            sleep(5)


def get_nexturl(resp):
    """Page through one day's results, then step back to the previous day."""
    if resp["urls"]["next_page"] is not None:
        return resp["urls"]["next_page"]
    if resp["urls"]["older"] is not None:
        return resp["urls"]["older"]
    return None


def squish(resp):
    return [_squish(record, resp["date"]) for record in resp["content"]]


def _squish(record, date):
    # drop the fields I do not care about, keeping just the popularity counts
    del record["title"]
    del record["url"]
    record["date"] = date
    return record


def main(site_id, token):
    tracks = []
    nexturl = f"https://secure.gaug.es/gauges/{site_id}/content"
    while nexturl is not None:
        resp = next_chunk(nexturl, token)
        print(resp["date"], len(resp["content"]))
        tracks.extend(squish(resp))
        nexturl = get_nexturl(resp)
        # checkpoint after every page so an interrupted run loses nothing
        with open("track.json", "w") as h:
            json.dump(tracks, h, indent=1)
        sleep(1)


if __name__ == "__main__":
    main(*sys.argv[1:])