Quarto integrated website system

Academic blog publishing that is easy on me, albeit hard on my computer

December 1, 2023 — April 23, 2024

academe
faster pussycat
how do science
javascript
julia
language
making things
plain text
premature optimization
python
R
writing
Figure 1

Quarto includes its own website system, which supplements pandoc’s inbuilt toolchain with a javascript-based build system using standard HTML tools such as Bootstrap, Sass and EJS.

Notably this site is built using quarto’s native website system.

1 Vibes

tl;dr

Does enough of what I want that I will probably use it, despite some qualms. I can ignore most of the complexity, it delivers what I would like, mostly, and has an active, friendly developer community. It was not really designed for websites as hefty as this one and has performance issues.

After giving it a thorough testing, I have feelings and thoughts about quarto.

Quarto’s ecosystem is a strong recommendation. There are vibrant discussion boards, many active developers, many active users, and good integrations into various IDEs (VS Code, RStudio etc). It has a corporate sponsor, Posit (formerly RStudio), who have some extra add-ons they sell, some of which I am a fan of (Shiny, Posit Connect). I am on record saying that a vibrant community is a better test of usefulness, and a better predictor of future support for a piece of software, than my feeble, biased, personal aesthetic judgement. But read on if for some reason you nonetheless desire to know my aesthetic judgement.

I can report that the resulting system is more tightly integrated than hugo is with blogdown, but not substantially simpler. I count that as a marginal win.

Quarto is more opinionated than blogdown if I use the built-in website system (although in principle I could build my own different website system). If I happen to like a 2-3 column layout blog with standard features (search, overview by date) everything is easy. OTOH, deviating from this layout is difficult and poorly documented. For example, it is not trivial to vary the CSS framework from the default Bootstrap.

The resulting websites are not high-performance for large sites, as you might be noticing on this one. They are hefty, and slow to build. Since I am not a web developer but rather an academic, this price seems acceptable to me for my specific use case (the opinionated default is pretty close to what I want), but this might not be the optimal trade-off if your own needs differ.

There are some worrying signs of code chaos. Quarto websites will not win the Grug Brained Developer seal of approval. For one thing, the code is sprouting features at an alarming rate. On the forums we learn that the code has some band-aid bits, e.g. there are two colliding template systems in use whose relationship is under-documented. A core developer has left the project and would like a minimalist holiday. Nonetheless, every system has its pain points, and I am willing to give this system a go. The code chaos is not yet worse than in other systems I have tried.

The theming and site structuring are somewhat less flexible than in hugo, the backend used by blogdown, but the pieces are better integrated, so the overall experience is somewhat better on net; much of the flexibility of hugo was useless to me in any case, hidden behind feature mismatch. By the same token, quarto leverages many more features of pandoc than was possible with blogdown, which leads to many well-supported advanced typographical features.

If one wished to use the quarto engine to experiment with weird alternative features (such as the content ranking, recommendation or quirky indexing systems seen on this site), then one is, AFAICT, out of luck, except that we can use custom listings. That seems to do about 80% of what I want, albeit slowly. YOLO! Let us 80/20 it.

2 Community support

3 Quarto websites are slow to load

Figure 2: Quarto on this website produces a comically large front page download per default.

Quarto websites can be enormous compared to the equivalent blogdown site. Browser memory usage by all the javascript wizardry etc. is substantial. Even though they look like small and efficient static sites, the actual cost of all the dynamic behaviour adds up. Or we could say: quarto is probably mostly thinking about tiny websites at this stage of its development, and big projects like the one you are reading now push its limits.

The listings, in particular, can be huge. When I migrated this website to quarto, the front-page download went from less than 1MB to 135MB, which is, for reference, comically huge for three paragraphs and a list of the titles of the last 10 blog posts.

I assume that this is because the listing on that front page, in order to provide dynamic sorting etc., loads essentially all the posts on the blog, no matter how old, and their associated images at full resolution. AFAICT quarto does not generate image thumbnails, and also AFAICT the listing system was not built around lazy loading.
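To check where the front-page weight actually comes from, one can sum the on-disk size of every image a rendered page references. This is a minimal sketch, assuming the site has been rendered into _site and that listing images appear as ordinary img tags with local paths (remote and data: URLs are skipped). The names ImgCollector and page_image_bytes are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def page_image_bytes(page: Path, site_root: Path) -> int:
    """Total size in bytes of the distinct local images referenced by one HTML page."""
    collector = ImgCollector()
    collector.feed(page.read_text(encoding="utf8"))
    total = 0
    seen = set()
    for src in collector.srcs:
        url = urlparse(src)
        if url.scheme or url.netloc:  # skip http(s):, data: and //host URLs
            continue
        rel = unquote(url.path)
        # Root-relative paths resolve against the site root, others against the page's folder.
        if rel.startswith("/"):
            target = site_root / rel.lstrip("/")
        else:
            target = page.parent / rel
        target = target.resolve()
        if target in seen or not target.is_file():
            continue  # count each file once; ignore broken links
        seen.add(target)
        total += target.stat().st_size
    return total


if __name__ == "__main__":
    site = Path("_site")
    print(f"{page_image_bytes(site / 'index.html', site) / 1e6:.1f} MB of images")
```

Duplicate references are counted once, since the browser would normally fetch a repeated image only once per page load.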

Potential partial solution via image resizing:

Those extensions did not actually work for me, so I wrote a custom script that postprocesses the site.
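For illustration only, and not a reconstruction of that script: a hypothetical sketch of one complementary postprocessing step. It adds the standard HTML attribute loading="lazy" to every img tag in the rendered site that does not already set loading, so that browsers may defer fetching off-screen listing images. It assumes the rendered output lives in _site, and it uses a regex rather than a full HTML parser, which is a shortcut that presumes well-formed generated markup. It does not shrink the full-resolution images themselves.

```python
import re
from pathlib import Path

# Matches an opening <img ...> tag; group 1 holds its attributes.
IMG_TAG = re.compile(r"<img\b([^>]*)>", re.IGNORECASE)
# A real `loading=` attribute (not e.g. `data-loading=`).
HAS_LOADING = re.compile(r"(?:^|\s)loading\s*=", re.IGNORECASE)


def add_lazy_loading(html: str) -> str:
    """Insert loading="lazy" into <img> tags that do not set loading themselves."""

    def fix(match: re.Match) -> str:
        attrs = match.group(1)
        if HAS_LOADING.search(attrs):
            return match.group(0)  # respect an explicit choice
        return f'<img loading="lazy"{attrs}>'

    return IMG_TAG.sub(fix, html)


def postprocess_site(site_dir: Path) -> int:
    """Rewrite every HTML file under site_dir in place; return how many changed."""
    changed = 0
    for page in site_dir.rglob("*.html"):
        original = page.read_text(encoding="utf8")
        updated = add_lazy_loading(original)
        if updated != original:
            page.write_text(updated, encoding="utf8")
            changed += 1
    return changed


if __name__ == "__main__":
    print(postprocess_site(Path("_site")), "pages updated")
```

Running it twice is harmless: the second pass finds every tag already marked and changes nothing.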

4 Quarto websites are slow to build

tl;dr a typical CLI invocation of blogdown took less than one second, or maybe a minute if I was working on a complicated computation. A typical CLI invocation of quarto render for this site takes about 12 minutes.

I am surprised how much I miss the speed and efficiency of blogdown, with its smugly high-speed hugo backend. I honestly thought the speed was not an issue until switching from blogdown to quarto website made my site muuuuuuuuch slower. Building the 1000+ posts on this site typically took hugo a few seconds.

There seems to be a lot of re-rendering bullshit going on. There are various tricks to make rendering go faster, such as caching the code execution, but ultimately quarto render is still slow, with even the most aggressive cache settings, compared to hugo.

AFAICT the problem is partly that the quarto website engine is slower than hugo, and partly that quarto is re-rendering everything every time (in the sense of converting markdown to HTML, not of executing code inside the markdown, which we can avoid by using the cache and freeze facilities). Blogdown+hugo was smart enough to re-render only the things it needed to, and reused the HTML from before, so there were few things it needed to re-render. I think? Or maybe hugo is just much faster because it is a compiled binary that doesn’t arse around with javascript and pandoc and stuff. Or both. Knowing which would not change much for me, so I will not investigate further for now.
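The “only re-render what changed” strategy can be approximated by hand. A minimal sketch, assuming a manifest of content hashes stored in a file outside the rendered output (the name .render-manifest.json is made up): it reports which .qmd files changed since the last successful run, so that each can be passed to quarto render individually. This is also where the approach falls short for a site like this one: listing pages depend on other posts, so editing a post does not mark its index pages as stale, and those would still need rendering separately.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan(root: Path, pattern: str = "**/*.qmd") -> dict[str, str]:
    """Map each source file (relative to root, POSIX-style) to its content hash."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.glob(pattern))
        if p.is_file()
    }


def changed_sources(root: Path, manifest: Path) -> tuple[list[str], dict[str, str]]:
    """Return (relative paths whose hash differs from the manifest, new manifest state)."""
    old = json.loads(manifest.read_text()) if manifest.exists() else {}
    new = scan(root)
    changed = [rel for rel, digest in new.items() if old.get(rel) != digest]
    return changed, new


def save_manifest(manifest: Path, state: dict[str, str]) -> None:
    # Call this only after the renders succeed, so failed files are retried next time.
    manifest.write_text(json.dumps(state, indent=2, sort_keys=True))


if __name__ == "__main__":
    import subprocess

    root, manifest = Path("."), Path(".render-manifest.json")
    changed, state = changed_sources(root, manifest)
    for rel in changed:
        subprocess.run(["quarto", "render", rel], check=True)
    save_manifest(manifest, state)
```

Hashing contents rather than comparing modification times means that a git checkout which touches files without changing them does not trigger a re-render.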

UPDATE: according to a core dev

In generalities, our runtime is roughly spent 2/3s inside Pandoc, and 1/3 in Deno. Our Pandoc filtering infrastructure is pretty extensive, and some of our early decisions there have performance consequences that we didn’t foresee: we’re now working on them. In Deno, our performance profile is relatively flat, and so the work is going to be more of the “continued small fixes” kind.

I suspect the friction might be that the default quarto workflow favours a small number of immaculate, unique snowflake documents, whereas I am more of a sit-on-the-snow-machine-and-make-a-blizzard kind of guy.

The simplest workaround seems to be to use quarto preview --no-serve, which only renders the recently changed things, and so is much faster than rendering 1000+ things. You might think that is a sufficient solution. Unfortunately not. Because:

  1. quarto preview is still not that fast. It takes 32 seconds on this machine, on a typical invocation, to decide which parts of this blog to render incrementally, which is already 5 or 6 times longer than blogdown took to finish an incremental render and build.
  2. I can leave quarto preview running, which amortises the start-up time, but this uses colossal amounts of RAM if the site is large; I guess it is serving the site from memory?
  3. This quarto preview server process burns a surprising amount of compute. It tanks my battery if I try to edit my blog on a long trip, much more rapidly than, for example, running a full-feature realtime audio workstation.
  4. The file watcher gets stuck sometimes, and does not notice changes. This annoyance is minor, but real, because I waste attention each time it happens trying to work out if the problem is quarto or me. Since this is an intermittent bug involving concurrency, I expect it to persist for a long time, based on experience of similar bugs in similar projects.
  5. Even if I leave quarto preview running it is still very slow, because it generates the index pages for this blog (which are large lists of 1000+ other blog posts) on every minor page change.
  6. The precise degree of slowness is particularly corrosive to my personal attention span. 30 seconds is long enough to drag productivity to a halt, but not long enough to make me go and do something else.
  7. A full re-render while running the preview server seems to lead to inconsistent and messy results, sometimes crashing the preview server, or leaving detritus lying around. So the server is not set-and-forget, but rather a thing I need to babysit, and make sure it does not clash with a full re-render.
  8. The quarto server itself is not that amazingly fast at serving files, for some reason. It seems to block a lot and take ages to serve me a page, even if it has already rendered the page.
  9. Sometimes, and I do not know why, the server decides a full re-render of the site is necessary, although I have done nothing, and suddenly previewing that one change is now a 12 minute wait.

4.1 Accelerate deployment

When deploying from the git repository to a static host such as netlify, the build time is unfortunately prohibitive. I only get a few free build hours per month, which basically restricts me to a weekly publication schedule.

We can economise on build-time by committing the raw site HTML/JS. This leads to a huge repository and horrible diffs and also tends to crash the preview server during merges. Merging can be made easier via the git merge theirs trick. This feels ugly though.

Maybe don’t publish from a git provider but rather use the publish command to upload files directly. Specifically, the variant that turns off the excessive, needy interaction:

quarto publish netlify --no-render --no-browser --no-prompt

This copies (in my case) thousands of files to the server every time I deploy, which feels like a waste of network time, but at least it is saving me time managing the merge failures and server compute budget.

5 Theming

Custom HTML theming is not too bad for simple CSS tweaks. Although the documentation is brusque, this part mostly “just works” in the sense that if I guess what to do it usually ends up being correct.

See

5.1 Template mechanics

General notes: there are two parallel template systems, Pandoc Templates and EJS Templates, which have a confusing and AFAICT undocumented separation of responsibilities.

  • although the pandoc templates are mentioned under the journal format, they are universal and apply to all formats. (Bigger lesson: The journal format documentation seems to function as the “generic advanced quarto” documentation and is much more general than you might assume)
  • EJS templates are website format specific.
  • although both EJS and pandoc formats include partial templates, these partials are not compatible or connected, and have different syntaxes. I suspect that means that if I wish to customise the metadata in a listing and in a specific page, I will end up implementing it twice, in two different syntaxes, in two different template systems.
  • The relationship can be complicated; for example even though the HTML templates are rendered by pandoc, the website system performs major surgery on them by a combination of EJS templating and javascript post-hoc modification. Discovering which line of HTML output is generated by which system is a forensic operation.

Gotchas:

For some reason I do not understand, in EJS templates it is best to wrap even plain HTML in a raw {=html} block:

```{=html}
<table>
<tbody>
<tr>
<th scope="row">Hello</th>
<td><strong style="background-color:purple; border-radius: 9px; padding: 5px;">text</strong></td>
<td>1</td>
</tr>
</tbody>
</table>
```

Symptoms of not doing that include batshit crazy bananas fuckery of an unpredictable nature, except when sometimes it just totally works as expected.

5.2 Listings

The next level of sophistication after customising CSS is customising content overviews.

Index pages are called “listings”, and customisation of listings is supported and reasonably powerful, but fragile; the errors that I get if I do something wrong are utterly baffling. See Document Listings for the basics and Custom Listings to get fancy.

tl;dr:

---
title: "Listing Example"
listing:
  contents:
    - "reports/*.qmd"
    - "lab-notes/*reports.qmd"
---

Various things about them are not obvious to me. Here are some discussions I am having about them:

If you can set up what you want using just front matter YAML config, things are simple. OTOH, for this blog I needed to use custom listing templates, and that got complicated.

Currently the template development workflow is stilted, since “resource files” such as custom templates are not watched in preview mode (UPDATE: they are actually watched as of v1.5). This necessitates a lot of restarting of the preview server to display updates. The best solution I have found is to constantly restart the preview server on a consistent port, telling it not to open a new browser, and to press refresh in the browser every time I kill one copy of the server, in a mindless loop (fish shell syntax):

while true; quarto preview --port 8887 --no-browser; end

5.3 Individual pages

OK, what if we do not want to change the CSS style or a custom listing, but to do something more complicated, like changing the layout of a single page?

At the single-page level we need to know about the (at least) two interacting template systems involved in websites per default, EJS and the pandoc template system. Poking around the code reveals that their interaction is messy and non-obvious to an outsider. Some stuff is generated by the lower-level pandoc templates, but this is then thoroughly transformed by the EJS website-mashing system. It isn’t really clear what to update to change what.

There is a system of template partials which should allow us to override small bits of the page for minor adjustments, but the documentation is incomplete. Custom templates are mentioned under HTML Options, and there is some incomplete documentation at Template partials, but it seems that the best reference for how to use them is the source code or perhaps the user forums. Templates for individual pages are complex; AFAICT the default HTML page for a single post is the pandoc HTML template, but then a whole bunch of EJS stuff gets smushed into that granddaddy pandoc template in a non-trivial manner. AFAICS, you can override the pandoc stuff by defining a custom template or template partial, but the EJS stuff is more of a look-but-do-not-touch thing that you modify through settings, unless we are talking about a listings page, in which case there is an EJS API which we are invited to fiddle with using a different syntax. Got it?

Gotcha: pandoc templates seem to include the similar-looking html.template and template.html. Which to use? AFAICT it is html.template; the other one is, I think, a copy of the pandoc default template, kept around for reference.

I am currently tracking the following forum discussions for help trying to improve the display of metadata on this blog:

After a while I settled on the following for title-block.html:

<header id="title-block-header">
$if(title)$<h1 class="title">$title$</h1>$endif$
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
$for(author)$
<p class="author">$author$</p>
$endfor$

$if(date)$
<p class="date"><span class="created">$date$ </span>$if(date-modified)$
<span class="modified">— $date-modified$</span>
$endif$</p>
$endif$
<span class="ratings">
    <span class="rating rating-usefulness-${if(usefulness)}${ usefulness  }${else}0${endif}"></span>
    <span class="rating rating-certainty-${if(certainty)}${ certainty  }${else}0${endif}"></span>
    <span class="rating rating-novelty-${if(novelty)}${ novelty  }${else}0${endif}"></span>
    <span class="rating rating-polish-${if(polish)}${ polish  }${else}0${endif}"></span>
</span>
$if(audience)$
<div class="audience">
    <span class="notification-title">Assumed audience:</span>
    <p>$audience$</p>
</div>
$endif$
$if(content-warning)$
<div class="content-warning">
    <span class="notification-title">Content warning:</span>
    <p>$content-warning$</p>
</div>
$endif$
$if(abstract)$
<div class="abstract">
<div class="abstract-title">$abstract-title$</div>
$abstract$
</div>
$endif$
</header>

5.4 Preview server

The quarto preview server invoked by quarto preview is, per default, slightly too clever for my taste.

Excessive cleverness 1: It tries to do something fancy with process management. I am not sure what the nature of the fanciness is, but the upshot is that the server is a mediocre citizen of the command-line environment. If I run it in the background it magically daemonises or something, which makes it hard to kill. If I run it in the foreground, it is reluctant to die when I press ctrl-c. This is especially annoying because sometimes the build process will hang and cannot be quit from the CLI. One way this seems to happen is if a template blows the EJS stack, e.g. because I am building a custom listing or something. The server process is a deno executable, so the following will salvage the situation:

killall -9 deno

However, if I am running other deno processes on my computer, this will kill those too. I do not otherwise use deno, so I have not solved that problem.

Alternatively, it seems to get the message if I kill the parent process, so keeping a shell JUST for the server, and then nuking the shell gets the job done.
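A less indiscriminate alternative to killall is to kill only deno processes whose command line mentions both quarto and the preview subcommand. This is a sketch under an assumption I have not checked across platforms: that the quarto launcher runs deno with the quarto script path and the word preview visible among its arguments. It relies on POSIX ps output (ps -eo pid=,args=) and SIGKILL, so it will not work on Windows.

```python
import os
import signal
import subprocess


def find_preview_pids(ps_output: str) -> list:
    """Pick out PIDs of deno processes that look like `quarto preview` servers.

    Expects lines of the form '<pid> <full command line>', as printed by
    `ps -eo pid=,args=`. The matching rule is an assumption about how the
    quarto launcher invokes deno.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        pid_text, args = parts
        if "deno" in args and "quarto" in args and "preview" in args.split():
            pids.append(int(pid_text))
    return pids


def kill_quarto_previews(sig: int = signal.SIGKILL) -> list:
    """Send `sig` to every matching process and return the PIDs signalled."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,args="], capture_output=True, text=True, check=True
    ).stdout
    pids = find_preview_pids(out)
    for pid in pids:
        os.kill(pid, sig)
    return pids


if __name__ == "__main__":
    print("killed", kill_quarto_previews())
```

Separating the parsing from the killing makes it easy to do a dry run first: print find_preview_pids on the ps output and eyeball the matches before sending any signals.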

OTOH, if I run the preview server at the same time as a render process, it will die spontaneously sometimes.

Excessive cleverness 2: I am discombobulated when the quarto server tries to persuade my browser to view the “latest” updated page, since I am usually editing a few pages at once, and do not enjoy having my 7 open tabs suddenly decide to show me the same thing, instead of the 7 different things I wished to see. Infuriatingly, the back button does not undo this. Avoid this behaviour with

quarto preview --no-navigate

Excessive cleverness 3: Quarto chooses a new random port for the server each time, which is cute, but makes those 7 preview tabs impossible to bookmark and terrible for my browser workflow. I fix a predictable port thusly:

quarto preview --port 8887

Putting these together, my invocation for a preview is

quarto preview --port 8887 --no-navigate --no-browser

We can equivalently encode that in a project setting in _quarto.yml:

project:
  type: website
  output-dir: _site
  preview:
    port: 8887
    browser: false
    navigate: false

5.5 Bootstrap, bootswatch, dark mode

There is a hairball of tangled theming and variable systems involved in choosing the styling of the page. I am trying desperately not to understand it, but unfortunately it is obtrusive. The key thing to realise is that there are SCSS variables that set the theme, and also CSS variables that set the theme, and working out which one to change to affect what, or whose variables get propagated where, is a kind of specialist engineering, where the “bootstrap” CSS themes clash with the CSS technology. I am not a neophyte to CSS; I’ve been doing it reluctantly for decades. This must be pure torture for people who do not have that background.

For one example, the navigation headers, for some unknowable reason, are not controlled by the SCSS variables; they decided that they were in “dark mode” and made themselves illegibly pale, even though I do not mention dark mode anywhere on the site and all the relevant colours in my stylesheet are dark. After trying to change many variable names to fix them, I settled upon this SCSS:

.navbar {
    // --bs-navbar-color: #050505;
    // --bs-nav-link-color: #050505;
    // --bs-navbar-color: $body-color;
    // --bs-nav-link-color: $body-color;
    font-family: $headings-font-family;
    background-image: $bg-shaded-image, $bg-image;
    background-color: $bg-color;
    color: $body-color;
    .navbar-brand{
        // Cannot fucking work out where the header color gets set to something dumb
        color: $body-color;
    }
    // Trying to eliminate that fucking pale header color die die die
    .navbar-nav .nav-link {
        // color: var(--bs-body-color) !important;
        color: $body-color !important;
    }
}

I have a vague suspicion that this leaves a half-digested bolus of undigestible CSS rules clogging the browser, but I have run out of care.

I switched different lines of this declaration on and off mindlessly until it worked. Key point: I will never support “light” and “dark” modes for this site. If that is your passion, write your own stylesheet.

6 Supporting scripts

The keyword for injecting scripts and other extra HTML into the page is includes, for example include-in-header or include-after-body.

8 Tips

9 Migrating from blogdown

A few people found it easy.

For my purposes I found it best to script a migration, so I could benefit from all the latest fun features. Here is the one-time migration script I used.

#! /usr/bin/env python
"""
walk tree, replace .Rmd with processed .qmd
"""
from pathlib import Path
import sys
from ruamel.yaml import YAML
import re

yaml = YAML(typ='rt')


def replace_internal_links(input_string):
    # Regex pattern to find internal links with optional leading '/',
    # and an optional fragment identifier
    pattern = r'\[([^\]]+)\]\((\.?\/.*?)(\.html)(#.*?)?\)'
    # Replacement pattern, including the fragment identifier if it exists
    replacement = r'[\1](\2.qmd\4)'

    # Replace the found patterns with the new format
    return re.sub(pattern, replacement, input_string)


def replace_math_delimiters(input_string):
    # Function to determine the replacement based on single-line or multi-line
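    def replacement(match):
        # Strip surrounding whitespace: pandoc only treats $...$ as math when
        # there is no space just inside the opening and closing $.
        text = match.group(1).strip()
        if '\n' in text:
            # Multi-line match: promote to display math
            return f'$$ {text} $$'
        else:
            # Single-line match: inline math
            return f'${text}$'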

    # Replace math: \( ... \) with $ ... $ or $$ ... $$ depending on single-line or multi-line
    math_pattern = r'\\\((.*?)\\\)'
    input_string = re.sub(math_pattern, replacement, input_string, flags=re.DOTALL)

    # Replace display math: \[ ... \] with $$ ... $$
    display_math_pattern = r'\\\[(.*?)\\\]'
    input_string = re.sub(display_math_pattern, r'$$\1$$', input_string, flags=re.DOTALL)

    return input_string

def read(fname):
    """Split a file into (metadata dict, body text)."""
    metadata = {}
    with open(fname, 'r', encoding='utf8') as fp:
        lines = fp.readlines()

    if len(lines) == 0:
        return {}, ""

    body_start = 0
    if lines[0] == '---\n':  # YAML header
        # Collect header lines up to the terminator (`---` or `...`).
        # With no terminator, the whole file is treated as header.
        to_parse = []
        body_start = len(lines)
        for i, line in enumerate(lines[1:], start=1):
            if line in ('---\n', '...\n'):
                # Do not include the terminator itself; the body starts after it.
                body_start = i + 1
                break
            to_parse.append(line)

        # An empty header parses to None, so fall back to an empty dict.
        parsed = yaml.load("".join(to_parse)) or {}
        for k in parsed:
            metadata[k.lower()] = parsed[k]

    else:
        # Fallback for files without a YAML header: leading `key: value`
        # lines are metadata, and the body starts at the first other line.
        for i, line in enumerate(lines):
            kv = line.split(':', 1)
            if len(kv) != 2:
                # Drop a blank separator line, but keep real content.
                body_start = i + 1 if line.strip() == '' else i
                break
            metadata[kv[0].lower()] = kv[1].strip()
        else:
            body_start = len(lines)

    return metadata, "".join(lines[body_start:])


def write(fname, metadata, content):
    with open(fname, 'w', encoding='utf8') as fp:
        fp.write('---\n')
        yaml.dump(
            metadata,
            fp,
        )
        fp.write('---\n')
        fp.write(content)


def massage_one_file(rmdname):
    """
    very minor tweaks to update for Quarto metadata.
    """
    stem = str(rmdname.stem)
    htmlname = rmdname.with_suffix('.html')
    yamlname = rmdname.with_suffix('.yaml')
    bibname = rmdname.with_suffix('.bib')
    is_listing = False
    is_tag = False

    if rmdname.parts[1] == 'tags':
        is_tag = True

    if stem.startswith('_index'):
        # ignore the path part, use the parent dirname as the base of a new file
        pathparts = rmdname.parts[1:-1]
        if len(pathparts):
            if is_tag:
                pathparts = tuple(['_tags', *pathparts[1:]])
            qmdname = Path(*pathparts).with_suffix('.qmd')
        else:
            qmdname = Path('index.qmd')
        is_listing = True
        indexpath = str(qmdname.stem)
    else:
        # Skip the first part and reconstruct the path
        qmdname = Path(*rmdname.parts[1:])

    qmdname = qmdname.with_suffix('.qmd')
    newyamlname = Path(*yamlname.parts[1:])
    newbibname = Path(*bibname.parts[1:])

    metadata, rmdcontent = read(rmdname)
    images = metadata.get('images', [])
    if len(images) > 0:
        metadata['image'] = images[0]
    if 'description' in metadata:
        #rename to 'subtitle'
        metadata['subtitle'] = metadata['description']
        del metadata['description']
    if 'modified' not in metadata and 'date' in metadata:
        metadata['date-modified'] = metadata['date']
    if 'modified' in metadata:
        metadata['date-modified'] = metadata['modified']
        del(metadata['modified'])
    if 'tags' in metadata:
        new_tags = [s.replace("_", " ") for s in metadata['tags']]
        # rename to 'categories'
        metadata['categories'] = new_tags
        del metadata['tags']
    if is_listing and not is_tag:
        metadata['listing'] = {
            'contents': indexpath,
            'feed': True,
        }

    qmdname.parent.mkdir(parents=True, exist_ok=True)
    qmdcontent = replace_internal_links(rmdcontent)
    qmdcontent = replace_math_delimiters(qmdcontent)
    write(qmdname, metadata, qmdcontent)
    print(f"writing {rmdname} to {qmdname}")

    if rmdname.is_file():
        rmdname.unlink()
    if htmlname.is_file():
        htmlname.unlink()
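    if yamlname.is_file():
        # Listing pages map to a different folder, which may not exist yet
        newyamlname.parent.mkdir(parents=True, exist_ok=True)
        yamlname.rename(newyamlname)
    if bibname.is_file():
        newbibname.parent.mkdir(parents=True, exist_ok=True)
        bibname.rename(newbibname)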


def main():
    glb0 = "content/**/*.Rmd"
    #TODO: check for non-relative paths
    # Materialise each glob into a list before moving files around,
    # so we never mutate the tree while it is still being walked.
    paths = list(Path('').glob(glb0))
    for fname in paths:
        massage_one_file(fname)
    # move the remains per default maybe that works OK
    glb1 = "content/**/*"
    paths = list(Path('').glob(glb1))
    for fname in paths:
        newfname = Path(*fname.parts[1:])
        newfname.parent.mkdir(parents=True, exist_ok=True)
        if fname.is_file():
            fname.rename(newfname)
            print(f"renamed {fname} to {newfname}")

if __name__ == "__main__":
    main()