Print stylesheets https://davidyat.es/2024/02/17/print-stylesheets/ Sat, 17 Feb 2024 12:59:57 +0200 https://davidyat.es/2024/02/17/print-stylesheets/

If you’re reading this on a device with a keyboard, press Ctrl+P now. On most browsers and systems, that should bring up a Print Preview dialogue for printing this webpage. You should notice quite a few differences between how this blog post appears in your browser and how it appears on the page. For one thing, the text is split into two columns, and for another, many parts of the page – the site header, footer and several buttons – have disappeared. What is this wizardry, you may ask? As the title of this post indicates, it’s all about print stylesheets.

Media queries and the promise of @page

Media queries have been an important tool for building responsive CSS since their introduction with CSS 3 in 2012. Many of their original uses have since been superseded by natively responsive constructs such as Flexbox, but they’re still the best way to make specific and potentially quite radical changes in layout between different screen sizes. For example, I use the rule below to display description lists in two columns on larger displays, as opposed to their browser-default vertical appearance:

@media (min-width: 600px) {
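  /* assumes the dl itself is laid out with display: grid elsewhere in the stylesheet */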
  dd { grid-column: 2; }
}

I’ve also used media queries (and CSS variables) to create a dark mode for this site without the need for a day/night button and all the extra JS that would go with that:

@media (prefers-color-scheme: dark) {
  /* ... */
}
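
A minimal sketch of that pattern – the variable names below are my own illustration, not this site’s actual stylesheet:

:root {
  --bg: #ffffff;
  --fg: #222222;
}

@media (prefers-color-scheme: dark) {
  :root {
    --bg: #1c1c1c;
    --fg: #dddddd;
  }
}

body {
  color: var(--fg);
  background-color: var(--bg);
}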

The snippets above all use media queries without a media type, but perhaps you’ve seen directives that look like this before:

@media screen and (min-width: 600px) {
  /* ... */
}

As it turns out, media queries are a modern extension to media types, which were first introduced in CSS 2. Before media queries, all you could do was write rules like this:

@media screen {
  body {
    color: black;
    background-color: white;
  }
}

Or this:

@media print {
  h1 {
    page-break-before: always;
  }
}

Wait a minute, page break? In CSS? Yes – the purpose of media types is to apply different styles for different media, including screens, printed pages and more1. Here’s an Eric Meyer blog post from January 2000 on writing stylesheet rules for print so that you don’t have to maintain a separate printer-friendly version of each page on your site (remember those?).

Print stylesheets have been with us since the early days of the web, but they’ve never been a very well-known or publicised feature of CSS. Apart from @media print and paged media properties dealing with page breaks, the other key element for creating a print stylesheet is @page, an at-rule for controlling aspects of individual printed pages.

When I was putting together this site’s footnote previews, I came across this Smashing Magazine article, which made it seem as though I might be able to implement per-page footnotes for printed versions of my site. However, a bit of experimenting followed by a more careful read of the article revealed that the majority of @page’s proposed features have not been implemented in any mainstream browser. As of now, you can configure how big your page is, what its margins should be, and whether it ought to be portrait or landscape, and you can also do different things with left and right pages. The footnote rules and the margin rules intended for creating page headers and footers do nothing.
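
To make that concrete, here’s a sketch using only the parts of @page that browsers do support – the sizes and margins are arbitrary examples of mine:

@page {
  size: A5 portrait; /* page dimensions and orientation */
  margin: 20mm 15mm; /* page margins */
}

/* different margins for left- and right-hand pages */
@page :left {
  margin-right: 25mm;
}

@page :right {
  margin-left: 25mm;
}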

So, after excitedly implementing a bunch of print stylesheet rules, I abandoned my attempt at full printer-friendliness and forgot about the subject for the next three years.

Enter Paged.js

Recently, a project unrelated to this website gave me cause to think about this print stylesheet stuff again. While browsers don’t support most of this functionality, it’s a defined specification that has been implemented by several libraries and paid services for turning HTML into PDFs and books. The website print-css.rocks provides a wealth of information on the subject as well as a handy index of features showing which implementation supports what. There’s also a companion playground site that lets you test out the different implementations.

As far as free-to-use implementations go, Paged.js appears to be the most fully featured and well documented. It can be used as a polyfill script or through the command line. The polyfill script converts the page where it’s injected into an on-screen simulation of a printed document – you end up with your entire webpage in a tiny corner with scrollbars and page separators. You can then save the page as a PDF (or print it directly) using Ctrl+P, and this time the preview will show all those nice @page features.
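
If you want to try the polyfill yourself, it’s a single script include – this is the unpkg URL given in the Paged.js documentation at the time of writing:

<script src="https://unpkg.com/pagedjs/dist/paged.polyfill.js"></script>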

My preferred method, though, is to use the CLI, which converts any HTML file to a PDF in one step:

pagedjs-cli input.html -o output.pdf

Under the hood, this injects the Paged.js polyfill script into input.html’s <head>, serves it to a headless Chromium and prints to PDF. Note that input.html should not already be using the polyfill – this causes crazy things to happen.

For the project I was working on, my pre-Paged.js print stylesheet had:

  • A couple of page-break-before rules for specific heading elements
  • @page size and margins
  • Various @media print rules that followed the opposite of responsive design principles – font-size specified in pt rather than em, and sizes of elements in fixed mm and cm. It’s rather freeing to write CSS for a page that you know won’t ever change in width or height. (A sketch of these rules follows below.)
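
Something like this – the selectors and values below are illustrative stand-ins of mine, not the actual project stylesheet:

@media print {
  h2 {
    page-break-before: always; /* start major sections on a new page */
  }
  body {
    font-size: 11pt; /* points, not ems */
  }
  figure {
    width: 120mm; /* fixed physical sizes */
  }
}

@page {
  size: A4;
  margin: 20mm;
}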

From there, I delved into Paged.js-specific features such as page headers and footers, and a table of contents complete with page numbers.

I also had to deal with a few bugs (mostly of my own making). Here’s what I learnt:

  1. Writing CSS for a JavaScript polyfill provided some harsh reminders about how forgiving browsers are in comparison. Leave the second colon off a pseudo-element (e.g. h1:before instead of h1::before) and your browser will understand what you mean and apply the rule, but Paged.js won’t. Similarly, the browser will let you increment a counter in a pseudo-element, whereas Paged.js will not (see the sketch after this list).
  2. MDN will tell you that page-break-after|before|inside is deprecated and you should use break-after|before|inside instead, but Paged.js still very much expects that preceding page-.
  3. There are many small differences between how a given browser will render a page for printing and how Paged.js will do it, beyond the additional feature support. For example, Chrome is a lot better at printing tables – it will automatically repeat table headers at the top of new pages for long-running tables,2 and is less prone to cutting off rows.3
  4. There are some non-obvious things Paged.js doesn’t support, such as CSS Grid, and some things it supports that Chrome doesn’t, such as element().4
  5. The most maddening thing about working with Paged.js is dealing with multi-page tables.5 At the time I’m writing this, the overflow detection is quite buggy, and a table with too much padding will just lose rows off the end of a page. It will still continue on the next page, though, so you’ll end up with, say, rows 1-8 on the first page and rows 11-14 on the second, with no sign of the missing two in the middle. In the end, I gave up on vertical padding in table cells beyond a pixel or two.6
  6. The second most maddening thing about working with Paged.js is dealing with multi-page codeblocks (<pre>). I had to set display: inline and then do all kinds of weird things to get the background colour back before they would split normally across pages like they do in Chrome’s print preview.
  7. The differences between Paged.js and browser printing mean that, at some point, you’re likely to end up with three sets of stylesheet rules: what to show on screen, what to show in browser print and what to show in Paged.js. And by the very nature of what Paged.js does – simulating @media print on @media screen – there’s no way to differentiate between those last two, so we can say goodbye to the dream of a single stylesheet.
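
To illustrate that first point, here’s the shape of rule that should satisfy both the browser and Paged.js – the selector and counter names are my own example:

h1 {
  counter-increment: chapter; /* increment on the element, not in the pseudo-element */
}

h1::before {
  /* double colon – Paged.js rejects the legacy single-colon form */
  content: "Chapter " counter(chapter) ": ";
}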

Final thoughts

My goal with making a print stylesheet for this site was to have some neat surprises for anyone who attempted to print a page from it. As of today, most of the neat surprises I would have liked to include would only be displayed through a third-party library, which would spoil them. Paged.js and its competitors are very much dev-oriented internal tools that wouldn’t make sense to implement here.

That said, this has been a very useful discovery for other projects, and I’m heartened to see such actively developed and feature-rich libraries for producing print media with HTML. I’ve done my time in the LaTeX mines and it is with a wealth of experience that I say I’d rather use HTML and CSS for my document finagling. So I’m glad there’s a decent way to do that for print (and fake print).7

  1. All media types except print, screen and all have since been deprecated.↩︎

  2. To do this in Paged.js, you need to use some JavaScript code from an issue on their GitLab. To run the code with pagedjs-cli, save it to a file and use the flag --additional-script repeatingTableHeaders.js.↩︎

  3. Both Chrome and Paged.js will sometimes split a single table cell across multiple pages, though, even with break-inside: avoid.↩︎

  4. Firefox remains the only major browser to support this crazy CSS feature (which is very useful in Paged.js for creating page headers and footers). ↩︎

  5. You could probably say the same thing about LaTeX and many other document formatting systems. Tables, man. ↩︎

  6. There’s a massive PR of pagination fixes open right now, so hopefully this gets resolved for the next release. ↩︎

  7. “Very well then I contradict myself…” ↩︎

Adventures in latent space https://davidyat.es/2023/10/19/latent-space/ Thu, 19 Oct 2023 11:27:05 +0200 https://davidyat.es/2023/10/19/latent-space/

I first wrote about Stable Diffusion last August, shortly after its initial public release, when Stability AI transformed AI image generation from a black-box gimmick behind a paywall into something that could be used and built on by anyone with a GitHub account. In the intervening year, that’s exactly what’s happened.

Even at the outset, Stable Diffusion gave budding synthographers a wealth of tools not available for DALL-E 2 or Midjourney. Being able to control settings like randomness seeds, samplers, step counts, resolution and CFG scale, plus the seemingly endless possibilities of both text and image prompts, was, it appeared, a recipe for creating just about anything. By altering text for the same seed, or running multiple iterations of img2img, the synthographer could gradually nudge an image with potential until it matched their vision. But it didn’t stop there. Updates and innovations have come so thick and fast that my every previous attempt at writing this post (for over a year) was undermined by some new advance or technique that demanded inclusion.

I won’t pretend to be comprehensive here, but I’ll try to cover major advances and useful techniques pioneered over the last year below. We’ll look at three aspects of image generation: text prompting, models, and image prompting.

Text prompting

The first new thing that got big after my last post was the negative prompt, i.e. an additional prompt specifying what should not be in the image. For example, if you got an image of a person with a hat, you could regenerate it with the same seed and the word “hat” in the negative prompt to get a similar image without the hat.1

There were also innovations in prompting itself, beyond the mere tweaking of words. SD has always weighted words at the start of a prompt more than words at the end, but the AUTOMATIC1111 webui implemented an additional way to control emphasis – words surrounded by () would be weighted more and words with [] weighted less. Add additional brackets to amplify the effect, or control it more precisely with syntax like this: (emphasised:1.5), (de-emphasised:0.5).

People also experimented with changes to the prompt as generation went on. In one method, alternating words, the prompt [grey|orange] cat would use grey cat for step 1, orange cat for step 2, grey cat for step 3, and so on, providing a way to blend concepts in a manner difficult to achieve with text alone. In another method, prompt editing, the first option would be used for N steps, and the second option would be used for the rest. This concept was taken further by Prompt Fusion, which allows more than one change and finer control of the interpolation between prompts.
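
In AUTOMATIC1111’s notation, as I understand it (other UIs may differ), the two methods look like this:

[grey|orange] cat
[grey:orange:10] cat

The first swaps between grey cat and orange cat on every step; the second uses grey cat for the first 10 steps and orange cat for the remainder.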

Prompt Fusion example

Regional prompting is another useful technique, confining distinct parts of a prompt to distinct areas of the image. Without it, Stable Diffusion will apply colours and details willy-nilly: “a woman with brown eyes wearing a blue dress” may generate a woman with blue eyes and a brown dress, or even a woman with a brown dress and green eyes standing in front of a blue background!

In the days of the 1.4 model, prompting often felt like the be-all and end-all of image generation. Prompting was the only interface exposed by the closed-source online image generators, DALL-E 2 and Midjourney, that immediately preceded SD 1.4’s public release. The model was a Library of Babel which theoretically contained every variation of every image that had ever been made or ever could be made, all of them locatable by entering just the right series of characters. By this may you contemplate the variation of the 600 million parameters.

But even with positive and negative prompts, word emphasis and de-emphasis, prompt editing and all kinds of theorising and experimentation with different words in different combinations, mere language was found wanting as the sole instrument for navigating latent space. It’s possible to get just what you want from prompting alone, in the same way that it’s possible to find a billionaire’s BTC private key in this index.

Models

Since my last post, Stability AI has released several models that improve on that initial model.ckpt (v1.4), most recently Stable Diffusion XL 1.0, a model trained on 1024x1024 images and thus capable of greater resolutions and detail right out of the gate. Where not otherwise noted, I’ve used this model to generate the images in this post.

But rather than wait on Stable Diffusion’s official model releases, people started doing their own training. This took a few different forms. One immediate demand was for the ability to add subjects to the training data – it’s trivial to generate recognisable pictures of celebrities with the base model, but what if you want to generate pictures of yourself or your friends (or your cat)? Enter Textual Inversion, followed by DreamBooth, followed by hypernetworks and LoRA. Some of these approaches create new model files, while others create small files called embeddings that can be used with different base models.

But there’s a lot more to be done with additional training than just adding new faces. CivitAI, the main repository for models and embeddings, contains a plethora of models trained to achieve specific styles and additional fidelity for particular subjects. Most of these are based on Stable Diffusion 1.5, and the most popular ones produce much better results in their specific areas (digital art, realism and anime, mostly). There are even embeddings for facial expressions and getting multiple angles of the same character in one image. Some technically inclined visual artists have even trained models to produce works in their own style.

Some models available on CivitAI

Training can be quite computationally intensive, and also requires curation of your own set of captioned images with the right dimensions, so it’s not for everyone. In some sense, training your own models and embeddings is just a more powerful way of using Stable Diffusion’s image input functionality, img2img. Which brings us to…

Image prompting

As we covered in my original SD post, img2img takes an input image in addition to a prompt, and uses the input image as the initial noise for generation. There are many different uses for this:

  • Generating variations of existing images, either with altered prompts or masking, or just different randomness.
  • Generating detailed pictures from crude drawings.
  • Changing an image to a different style.
  • Transferring a pose from one character to another.

Unfortunately, vanilla img2img doesn’t know which of these you’re trying to achieve with a given input, and the only extra variable you get to control is the number of diffusion steps to perform on top of the input image (often termed denoising). One solution to this problem is careful prompt engineering. Another solution is changing the way img2img works.

Last year, various methods were proposed for tweaking img2img to get more consistent results: Cross Attention Control, EDICT and img2img alternative are the ones I know of. The idea with all of these methods was to reverse the diffusion process and find the noise that would produce a given image, and then alter that noise to generate precise, minimal changes.

img2img alternative example

When this works, it works really well, but I haven’t had a lot of success with it in my own experiments.

The really useful and impressive advance in this space is ControlNet. ControlNet provides an additional input to the generation process in the form of a simplified version of a given image – this can take a variety of forms, but the most common are edge maps, depth maps and OpenPose skeletons. This produces outputs that are much closer to the inputs used and allows us to specify which aspect of an input image we care about. It can even be used without text prompts.

To use ControlNet, we first create a simplified map of a given input image using a preprocessor (edge-mapper, depth-mapper, pose-extractor, etc) and then pass that as an input to the generation process by applying it to the developing image at each step, either for the whole process or a portion of it, depending on how strong you want the ControlNet’s influence to be.

As we can see from the images below,2 each ControlNet model captures a different aspect of the input image. The canny edge model is concerned with preserving lines, the depth model with preserving shapes, and the pose model only with the human figure and its configuration of limbs.3

ControlNets can be combined, and additional models allow you to capture details such as colours, brightness and tiling patterns. The latter two have been combined to great effect to generate creative images that double as QR codes, as well as compositions like Spiral Town.

IP-Adapter is yet another image-input technique, in which input images are converted into tokens instead of being used as initial noise. This is a very good way of producing images based on a specific face without losing its likeness (and far quicker than training a DreamBooth model or LoRA).4

It’s also great for combining images.

All of the images above are one-shot generations for the sake of illustration, but any synthographer worth their burnt-out graphics card knows that to execute on a vision, you’re going to need multiple iterations. You might get a first pass from pure prompting, then use ControlNet to make some variations that preserve a pose from the original, then fix up some individual details with inpainting (or manual image editing), then expand it with outpainting, then combine it with something else through IP-Adapter, and on and on it goes. You might generate a scene’s background separately from its foreground elements, stick them together in an image editor, and then do a low denoising img2img pass to blend it all together.

None of this requires as much effort as, y’know, actually learning to draw or take good photographs, but it takes some ingenuity and experience (and more than a little luck with RNG) to coax attractive results from the machine.

Or so it seemed.

DALL-E 3

Despite being first on the scene, OpenAI’s DALL-E 2 soon lost ground to newer versions of Midjourney and Stable Diffusion. As OpenAI focused their efforts on ChatGPT, their image generator was left to languish. But that all changed at the start of this month with the public release of DALL-E 3 – first as a replacement for DALL-E 2 as the engine behind Bing Image Creator, and then as a ChatGPT plugin for users on a paid plan.

Just like its predecessor, DALL-E 3 takes in a natural language text prompt and generates four different images from it. It’s a lot better, both in terms of image quality and especially in terms of prompt understanding. Let’s compare outputs from the prompt I used at the start of my previous post:

the city of cape town as a martian colony artist's conception landscape painting

And what’s more, if you use DALL-E 3 with ChatGPT, it will write multiple detailed variations of your prompt and use those to generate the four images.5 Here are a couple of results from that, with ChatGPT’s prompts as their captions.

Artistic conception of Cape Town on Mars. The familiar landmarks of Cape Town are juxtaposed against the alien backdrop of the Red Planet. Bio-domes shelter the city, and advanced infrastructure hints at a thriving Martian colony. The vastness of the Martian landscape stretches out, with dunes and rock formations.

Futuristic landscape painting of Cape Town on Mars. The city’s skyline rises from the Martian surface, with its iconic landmarks protected by large transparent shields. Martian rovers and settlers can be seen, and the vast red desert of Mars extends to the horizon.

Some have suggested that this spells an end for prompt engineering, but I’m not so certain. ChatGPT’s expansions create attractive, detailed images, but it may extrapolate things that weren’t intended by the prompter. For playing around and making images to post on social media or in blogposts like this one, that’s fine, but it doesn’t make executing on a specific vision much easier.

Stable Diffusion XL has some limited ability to generate images containing coherent text, a word or two at most. More often than not, this means that random words from the prompt will be written somewhere on the image, necessitating the negative prompt “text”. DALL-E 3, on the other hand, will happily generate whole sentences with one or two errors at most, and only when explicitly asked to write something.

Stable Diffusion XL

DALL-E 3 in ChatGPT

DALL-E 3’s ability to create pleasant, mostly coherent images with plenty of fine detail from text prompts alone (note the application of the word “frantic” in the image above) makes many of the Stable Diffusion techniques above feel like kludgey workarounds compensating for an inferior text-to-image core. Not that there isn’t a place for different types of image inputs, custom training, and funky prompt manipulation, but all of these techniques and tools would be far more powerful and useful alongside a text-to-image model as powerful as DALL-E 3. The potential in pure text prompting is far from exhausted, and OpenAI’s competitors have their work cut out for them.

A clown doing yoga, no ControlNet needed

It’s been suggested that Stable Diffusion is a lot weaker than it might otherwise be due to the low quality of much of the LAION dataset it was trained on – many captions do not correspond to image content, and many images are small, badly cropped or of poor quality. With the advent of vision models, there’s a case to be made for using AI to recaption the whole dataset and training a new model on the result.6

That would be a good start, but I think it’s also likely that OpenAI is leveraging the same secret sauce present in ChatGPT to understand prompts. As far as they’re concerned, generating pictures is a side-effect of the true goal: creating a machine that understands us.


  1. Most of the time, at least. Diffusion models don’t actually understand English; it just feels like they do. ↩︎

  2. These were generated with SD 1.5 models, as I find most of the current SDXL ControlNet models somewhat lacking. ↩︎

  3. While the pose model captures the general configuration of human limbs, SD is not smart enough to know which limbs are which, or how many of each one a human should have. It gets especially confused with poses in which the person’s hands touch the floor – many discarded generations of the final image showed three-legged clowns. ↩︎

  4. Here again, I’ve used SD 1.5 because of model compatibility. ↩︎

  5. It’s possible that Bing Image Creator also does this under the hood. ↩︎

  6. Update (2023-10-20): according to this paper, this is pretty much exactly how DALL-E 3 was built. ↩︎

Review: The Excavation of Hob's Barrow https://davidyat.es/2023/08/11/review-hobs-barrow/ Fri, 11 Aug 2023 14:00:51 +0200 https://davidyat.es/2023/08/11/review-hobs-barrow/

The Excavation of Hob’s Barrow is a British folk horror point-and-click adventure game from Cloak and Dagger, published by Wadjet Eye. The latter is a long-time favourite studio of mine, as previous game reviews on this blog will attest.1

The protagonist of Hob’s Barrow is one Thomasina Bateman, an antiquarian primarily interested in digging up barrows (burial mounds) across England. The game is set in Bewley, a remote Midlands village surrounded by desolate moors, sometime during the Victorian Era. Thomasina2 receives a letter from a gentleman named Leonard Shoulder, who invites her to the village to excavate a local barrow. Upon her arrival, Thomasina meets a cast of eccentric and unwelcoming townsfolk, none of whom seem willing or able to tell her much about the supposedly famous landmark or her elusive contact Mr Shoulder. Being a stubborn sort, she presses on anyway, determined to find the barrow and uncover its secrets.

The eastern edge of Bewley.

Gameplay

The game takes place over a few days, during which Thomasina must explore the village and complete errands for its inhabitants so she can locate and ultimately excavate the titular Hob’s Barrow. The interface is standard for modern adventure games – left-click to interact, right-click to examine, and mouse over the top of the screen to see the inventory. Modern conveniences such as a fast-travel map and to-do list/quest log are welcome inclusions, and there’s also a neat time-saving mechanic that allows you to instantly exit a room by double-clicking.

Thomasina is an upper-crust lady of means, but for gameplay reasons runs out of cash early on, and so must embroil herself in some classic adventure game puzzle solving to get anything she wants – the plot advances through such staples as elaborate NPC distraction puzzles and long chains of fetch quests.

There’s also a bit of digging, as one might expect from the title.

Despite my love of the genre, I must confess a distaste for these sorts of puzzles, at least outside of comedic works like Sam & Max or Flight of the Amazon Queen. I’ve previously praised other Wadjet Eye games (Unavowed, Technobabylon, Blackwell Epiphany) for puzzles that felt like integrated parts of the story and game world. On this count Hob’s Barrow is not a complete success. The better puzzles are used to explore the game’s cast and deepen Thomasina’s relationships with them, and contain some touching character moments. Other puzzles feel a lot like padding. One chain, involving a broken fiddle bow, exists only to facilitate a set piece that, while appropriately unsettling, feels poorly integrated into the overall story. Another chain, involving a pail of milk, is yak-shaving exemplified and contains an NPC-distractions puzzle that felt a little too comical for the game’s general tone. But there are enough strong moments and interesting plot twists in this segment of the game for me to overlook even the weaker puzzles.

The final section of the game, which involves actual barrow excavation, consists of a different sort of traditional adventure game puzzle, but one which gels perfectly with the game’s tone and plot, and actually feels like the sort of thing a proto-archaeologist in this kind of story should be doing. Though I wouldn’t go so far as to say the game would have been better with more focus on this section – there’s an undeniable charm to the village of Bewley and its inhabitants that Hob’s Barrow would be much poorer without.

A garden in the sky.

My first playthrough took about ten hours. I’m compulsive about examining everything and exhausting all dialogue options, so that’s probably close to the upper bound of how long you can spend with this game if you don’t get seriously stuck.

Presentation

At first blush, the graphics are standard for a Wadjet Eye game – 2D pixel art that mimics VGA adventure games of the early-to-mid 1990s. It’s beautifully done, with many background animations that make every scene feel alive. More modern-looking particle and lighting effects are used to good effect for rain and fog.

Bewley at night.

The game distinguishes itself with detailed close-up animations for dramatic and creepy moments. While none of them are out-and-out jumpscares, they’re very unsettling.

Though a few of them are quite pleasant. Incidentally, this is the least creepy image of this cat in the whole game.

The voice acting is up to Wadjet Eye’s normal excellent standard, and it’s a welcome change of pace to hear British accents in one of these games. “There’s nowt for you ’ere, lass,” will be echoing in my brain for at least a few more weeks.

The music is understated and contributes to the atmosphere. Apart from background music, there are a couple of musical performances in the game, both of which are great, even if one feels a touch modern for Victorian England.

Story

Vague spoilers follow.

As is common for the horror genre, and especially for works inspired by Lovecraft, Hob’s Barrow does not have a happy ending. Choices made during gameplay lead to variations in character dialogue, but there’s only a single ending – nothing you can do will save Thomasina from the fate she hints at in the game’s sombre narration, framed as a letter to her mother written after the game’s events. It reminds me of another adventure game about a highly characterised adventurer archaeologist excavating a tomb.

Infocom’s Infidel, released in 1983, was likely the first adventure game to feature a strongly defined player character who meets an unpleasant end after the player “wins”. In that game, you take on the role of an assistant to a famous and successful adventurer-archaeologist in the mould of Indiana Jones. After receiving a call from a wealthy benefactor wanting to sponsor an expedition to an Egyptian pyramid while your boss is out, you decide to take on the job alone, without informing your boss about it. This doesn’t go well, and the game proper begins like this:

You wake slowly, sit up in your bunk, look around the tent, and try to ignore the pounding in your head, the cottony taste in your mouth, and the ache in your stomach. The droning of a plane’s engine breaks the stillness and you realize that things outside are quiet – too quiet. You know that this can mean only one thing: your workmen have deserted you. They complained over the last few weeks, grumbling about the small pay and lack of food, and your inability to locate the pyramid. And after what you stupidly did yesterday, trying to make them work on a holy day, their leaving is understandable.

And after some hours of deciphering pseudo-hieroglyphics, solving mechanical puzzles, and classic adventure game treasure hunting, it ends like this:

You lift the cover with great care, and in an instant you see all your dreams come true. The interior of the sarcophagus is lined with gold, inset with jewels, glistening in your torchlight. The riches and their dazzling beauty overwhelm you. You take a deep breath, amazed that all of this is yours. You tremble with excitement, then realize the ground beneath your feet is trembling, too.

As a knife cuts through butter, this realization cuts through your mind, makes your hands shake and cold sweat appear on your forehead. The Burial Chamber is collapsing, the walls closing in. You will never get out of this pyramid alive. You earned this treasure. But it cost you your life.

This was highly controversial and arguably led to Infidel’s commercial underperformance. While adventure game players would have been no strangers to a grisly death, for it to be presented as the game’s winning ending was a highly unwelcome surprise. It raised new questions about storytelling in games. Could “winning the game” be reconciled with experiencing an unhappy ending? And how should one reconcile the tragic arc of a highly characterised protagonist with the player’s agency?

Nearly forty years later, the ending of Hob’s Barrow has been similarly controversial, though it belongs to a much better game and story. Infidel tells the story of an irredeemable bastard getting his comeuppance, whereas the protagonist of Hob’s Barrow is an immensely likeable character whose undoing results from her love for her father.

While both stories have their endings firmly locked in from the start – Infidel because it starts after your character has already doomed himself to a lonely death in the desert, and Hob’s Barrow through its framing device – there’s a sense that you might have more control over Thomasina’s fate because you can direct her actions from the moment she arrives in Bewley. The developers could have included an option somewhere to make her wait for the next train out of town and leave the barrow unexcavated, though this wouldn’t make for much of a story.

Many games with well-defined protagonists have been made in the last forty years, and the compromise between this and player choice seems to be giving the player choices between different in-character options. I think most modern players can accept this, and that’s very much the case in Hob’s Barrow. When you encounter a drunkard near the start of the game, you choose whether Thomasina slaps him or fobs him off verbally, but there’s no option for her to kill him.

Charming.

We can apply the same logic to the overall narrative – Thomasina is driven by compulsions at the core of her personality to excavate the barrow despite the warnings and suspicious behaviour of the townsfolk – to run away would be out of character. I think this is what the writers were going for, and while I’m not opposed to it in principle, a couple of missteps make it less effective than it might have been.

Just before the game’s final segment, a friendly character indicates that he has something important to tell Thomasina that he’s been struggling to remember. The contents of his message are shown to us in a flashback sequence, which reveals a very subtly hinted-at twist. We then return to the present, where another character appears on the scene and shoos away the messenger. It is unclear whether we’re supposed to understand that he told Thomasina the contents of the flashback, or that he was shooed away before he could tell it to her. Neither interpretation is particularly satisfying.

If Thomasina is supposed to be ignorant of the flashback’s events, then it doesn’t fit into the framing device of the letter she’s writing. Besides that, there’s something extremely frustrating about having to direct a character’s actions while knowing that they’re missing an important bit of information you have no way of communicating. I don’t think dramatic irony works in an interactive story.

On the other hand, if Thomasina is supposed to have heard the message, then it’s bizarre that she has no reaction to it and never mentions it again. There’s a difference between having your fate bound up in a tragic personal flaw and just being stupid – imagine a version of Romeo and Juliet where Romeo receives the letter from Friar Laurence but still kills himself.

In the game’s commentary, writer and programmer Shaun Aitcheson mentions that the flashback was added pretty late in the game’s development, after it became clear that testers needed something more to understand the story. I think a better approach would have been to pepper the game with additional foreshadowing and only reveal the twist at the end. Sure, there probably would have been accusations that it came out of nowhere, but this game’s ending was always going to be controversial. Any twist ending, no matter how carefully written, will come off as incongruous to some and obvious to others.

For a tragic story about a flawed protagonist to work in a game, it’s essential that the player comes as close to that character as possible and really understands their perspective, to the extent of being blinded by it. Hob’s Barrow undermines itself by stepping away from that to telegraph its ending.

Conclusion

Despite my quibbles with some of the game’s puzzles and storytelling choices, I had a great time with it overall and would heartily recommend it to fans of indie adventure games and folk horror. That said, players who prioritise their own agency and ability to choose between multiple endings may want to steer clear.


  1. When I was looking into Cloak and Dagger’s previous work, I discovered that they were behind A Date in the Park, a free-to-play adventure game that I rather disliked and left a snippy negative Steam review for. This game is much better! ↩︎

  2. An old-fashioned female name often shortened to and now subsumed by variants of Tamsin. ↩︎

Review: The Lost Room https://davidyat.es/2023/08/06/review-the-lost-room/ Sun, 06 Aug 2023 18:53:50 +0200 https://davidyat.es/2023/08/06/review-the-lost-room/

The Lost Room is a television miniseries that ran on the Sci-Fi Channel in late 2006 to low ratings and mixed reviews. Intended as a launchpad for a longer series that was never made, it instead became a minor cult classic and (perhaps) the inspiration for a whole subgenre of internet horror fiction. Recently, I watched it for the second time.

The protagonist is a policeman, Joe Miller, who comes into possession of a motel room key with special properties. Turn the key in any compatible lock on any door, and that door will open into an empty motel room. If you enter the room and close the door, it won’t open into the same place you left behind. If you visualise another door anywhere in the world while holding the key, the door will open there. Not a bad way to get around.

The room in question

Early on in the series, it’s revealed that the key is one of many seemingly mundane objects with supernatural abilities, ranging from powerful to totally useless. There’s a bus ticket that teleports people to a bus stop in New Mexico, a pen that microwaves anything it touches, and a wristwatch that… hard boils eggs. There are about one hundred of these objects, and they all came from the motel room. They cannot be destroyed. When combined, they produce even stranger effects.

As Joe discovers more about the objects and the titular room, he comes into contact (and often conflict) with a myriad of different groups and individuals, all pursuing their own agendas: some are mere collectors, while others believe the objects to have divine significance. Before long, Joe crosses the wrong group, and his own quest to learn about the objects turns from an idle curiosity to a matter of extreme personal importance.

Joe’s primary facial expression throughout the series

Though it’s only three 90-minute episodes long, The Lost Room packs more backstory and plot developments into that runtime than lesser shows manage in entire seasons. The effects and interactions of the objects and rules that govern them are well thought-out and consistent, making most of the twists gratifying rather than arbitrary. The premise remains fresh even nearly twenty years later – this is a story about supernatural events that doesn’t lean on religion, witchcraft, aliens, folklore, or any other common tropes, choosing instead to make its own distinctly modern mythology. The worst criticism I can make is that the romantic subplot comes off as wooden and entirely perfunctory.

The miniseries ends with a few loose ends and an obvious sequel hook, betraying the creators’ intention to expand it, but still tells an essentially complete story and won’t leave you feeling ripped off.1

It’s often said that the Velvet Underground’s first album only sold a few thousand copies, but everyone who bought a copy started a band. The Lost Room appears to have had a similar effect on the world of internet horror fiction. Either that, or there was a case of creative synchronicity going around at the time. The series aired in December 2006, and in early 2007, both SCP-173 and The Holder of the End appeared on 4chan. Both of these pieces spawned long-running collaborative internet horror fiction projects built around numbered lists of objects with strange properties – namely, SCP and The Holders. The latter, with its exhortation that the objects “must never come together” and a collector named Legion, is especially synchronistic.

In any city, in any country, go to any mental institution or halfway house you can get yourself into.

You can get the gist of both projects by reading their first entries. SCP-173 is written like an internal memo or database record describing a numbered object. It starts with Special Containment Procedures, a description of how the object is to be contained – this builds up some tension and curiosity about what the object is and does, and why these measures are necessary. The containment procedures are followed by a terse description of the object, which is horrifying and bizarre, but detailed in clinical and meticulously precise language.

A format like this lends itself to imitation, and thus SCP was born. Each of the hundreds of entries describes a bizarre object or creature that has been captured and secured by the SCP Foundation, in the style of the original SCP-173. Longer entries add logs of experiments and interviews, and liberally employ redaction to enhance the “top secret” feeling of the documents.

The Holders series is written in a looser style, evoking urban legends. The first Holders entry, The Holder of the End, is schlocky and slightly overwritten, but kinda evocative all the same. Written in second person, it provides a list of instructions for retrieving an object from a mysterious individual who can be found in “any mental institution or halfway house you can get yourself into”,2 mostly consisting of things to say and things to avoid doing. The prize for following these instructions correctly is one of 538 objects which every entry reminds us “must never come together”.

With a line like “That object is 1 of 538”, The Holder of the End was begging for imitators. Most of the additional ~537 follow the same format, beginning with “In any city, in any country” and ending with “They must never come together.” As the entries go on, the rituals become more and more elaborate, as do the gruesome consequences for failure at any stage. But at a certain point, this becomes more comical than frightening.

Knock on the second door from the left twelve times in a perfect dodecahedron pattern and then cough at precisely 86dB. If you’re even one decibel off, a demon will appear and drag you to hell by your nostrils, where you will be tortured for all eternity. Every day your ears will be pulled off and your tongue cut out, only to grow back in a more hideous and mangled form the next day.

I made up the above quote to avoid picking on any particular entry, but a lot of the stories are like this. The longer Holders stories will have many of these kinds of paragraphs, showing plenty of enthusiasm but none of the restraint required for good horror writing. The lower quality SCPs often fail in similar ways, eschewing the terseness and fragmented feeling that is essential for the series’s atmosphere in service of the writer’s enthusiasm for sharing their cool monster.

What makes the original pieces in both series work for me is all that is left unsaid. The best microfiction implies infinitely more than it explicitly describes. SCP-173 hints at the existence of other strange objects, and of a vast bureaucracy that keeps them hidden from the world. The Holder of the End hints at mystical hidden places in everyday locations, and at people irrationally driven to brave terrible danger in service of collecting objects that should not be collected.

To go back to The Lost Room, much of the allure and mystery of the objects lies in their randomness and inexplicability. Both qualities inevitably diminish as the work continues. So perhaps it’s for the best that it ends where it does.


  1. The same can unfortunately not be said of the creator’s more recent television pilot, Parallels, which is deceptively packaged as a movie on some streaming services. I enjoyed it, but I guess the network wasn’t looking for an updated Sliders.↩︎

  2. These sorts of lists are a common creepypasta form outside of The Holders, being mostly elaborate variations on classic folklore rituals such as “go into a candle-lit bathroom at midnight and say Bloody Mary three times in the mirror”. ↩︎

The Last Battle https://davidyat.es/2023/07/26/narnia/ Wed, 26 Jul 2023 11:41:03 +0200 https://davidyat.es/2023/07/26/narnia/

Netflix is developing a new adaptation of CS Lewis’s classic children’s fantasy series, The Chronicles of Narnia. The series has been partially adapted before, into a BBC TV series in the 1980s and a film trilogy by Disney and Walden Media in the 2000s, which covered four and three of the seven books respectively. Little is known about the project, except that they have the rights to all seven books this time, so most articles about it are opinion and speculation. I read one such piece recently which really stuck in my craw.

Netflix’s Chronicles of Narnia Reboot Using The Original Books’ Ending Would Be A Huge Mistake

This article argues that the final book, The Last Battle, should be changed for the adaptation, essentially for commercial reasons. The core of the author’s argument is made beneath the Orwellian heading “The Chronicles Of Narnia’s Ending Doesn’t Work In 2023”:

The actual ending of The Chronicles of Narnia novels won’t work as the ending to the upcoming Netflix adaptation. In the realms of literature and fantasy, few works have captivated readers as enchantingly as The Chronicles of Narnia. The timeless tales of Narnia have always been steeped in allegory, mainly influenced by Lewis’s Protestant Christian beliefs. The story’s heavy religious undertones, particularly at the end of the final volume, The Last Battle, risk alienating those from other faiths. Even those who share that faith may not wish to see such a heavy-handed portrayal in an escapist fantasy series.

Apart from reading like it was generated by ChatGPT, this text makes me wonder if the writer is even familiar with Narnia. In the first1 and most well-known book, The Lion, the Witch and the Wardrobe, we are introduced to Aslan, the Great Lion, who is Narnia’s true king and its divine creator. At about the midpoint of the narrative, Aslan offers himself as a sacrifice in place of the traitor Edmund Pevensie. He is tied to a stone table and put to death by the evil White Witch. The next day, he comes back to life and ends the White Witch’s century-long tyranny over Narnia. Remind you of anything?

Christian themes are pervasive in western literature – at a certain level of abstraction everyone is Jesus in Purgatory. But you don’t have to go hunting for them in Narnia. Who Aslan was supposed to be was obvious to me at the age of six. Lewis was completely unsubtle about this aspect of the work, to the extent that even calling the series a Christian allegory would be too indirect.

If Aslan represented the immaterial Deity in the same way in which Giant Despair [a character in The Pilgrim’s Progress] represents despair, he would be an allegorical figure. In reality, however, he is an invention giving an imaginary answer to the question, ‘What might Christ become like if there really were a world like Narnia, and He chose to be incarnate and die and rise again in that world as He actually has done in ours?’ This is not allegory at all.

CS Lewis, Letter to Mrs Hook, December 1958

The idea that an adaptation should avoid or tone down the stories’ Christian themes is completely incoherent, whatever your disposition towards those themes, as they’re the very purpose of the work. Picking on just The Last Battle here seems oddly selective; really, the phrase “heavy-handed portrayal” describes Narnia far better than “escapist fantasy series”. Without the Christian elements, you simply don’t have Narnia.

A key difference (and source of disagreement) between Lewis and his friend Tolkien was that the latter strove to create a self-contained world and story that could be appreciated without reference to anything in the real world, and the former did, well, pretty much the opposite of that. So, with Lord of the Rings, any Christian themes are there for you to take or leave as you choose. By contrast, Lewis presents Narnia as a world parallel to our own, which human characters visit and have experiences in that are directly relevant to their lives in our world. He had very specific goals in writing Narnia and made them quite plain to the reader, all but spelling it out:

In your world, I have another name. You must learn to know me by it. That was the very reason why you were brought to Narnia, that by knowing me here for a little, you may know me better there.

Aslan, Voyage of the Dawn Treader

Tolkien vs Lewis: Allegories by RE Parrish

Narnia may not be the right property to adapt if you’re uncomfortable with overt Christian themes.

In the next paragraph of the article, we get a few more criticisms of The Last Battle:

Beyond the issue of religion, the ending of The Chronicles of Narnia is bleak and unsatisfying. The sudden demise of all but one of the main cast is disheartening. Moreover, the ending focuses on characters introduced only in the final book, creating a sense of disjointedness from the overarching narrative. If Netflix follows the outline of this original ending, it risks sparking controversy, like Game of Thrones. To avoid falling into the same pitfalls, The Chronicles of Narnia must navigate a delicate balance between preserving its timeless charm and crafting an ending that appeals to the diverse sensibilities of contemporary viewers.

Though this paragraph starts with the words “Beyond the issue of religion”, its contents belie that. Every issue cited in this paragraph comes back to religion. To judge the ending as “bleak and unsatisfying” is a highly materialist view that misses the point entirely, echoing the in-book sentiments of Griffle and his group of dwarves.

And to call it disjointed is to miss the actual overarching narrative, which is that of the world of Narnia, from its creation in the chronologically first book, The Magician’s Nephew, to its destruction in The Last Battle. Just as The Lion, the Witch and the Wardrobe parallels the death and resurrection of Jesus, these books parallel the creation story in Genesis and the apocalypse in Revelation. Narnia is not supposed to be about the lives of the four Pevensie siblings, and to compare Lewis’s careful artistic choices to the final season of Game of Thrones, which wasn’t even based on a finished book, is facile.

The rest of the article is in much the same vein, permeated by the claim that a faithful adaptation of the Narnia books would be rejected by audiences, who have “diverse sensibilities”, by which the author clearly means “intolerance for Christian children’s books written by a mid-century English professor”. But if that’s true, then why would Netflix even adapt the books in the first place? Surely Lewis’s views can’t be that intolerable to modern audiences if they already like his work and want to see a new adaptation of it.

There’s an underlying push to have a real story with a real message shorn of its identity and turned into a theme park. A desire for Narnia to become a safe, commercialised “universe” populated by bland, interchangeable content that conforms to every current moral fashion and keeps people paying their streaming subscriptions. To cynically use a classic and enduring children’s series as a mere brand. I would greatly prefer a thousand articles about how Narnia represents everything that is most hateful about religion to this mealy-mouthed appeal to commercial blandness.

The Last Battle opens with Shift the Ape bullying Puzzle the Donkey into wearing the skin of a dead lion and impersonating Aslan. Let us hope that Netflix does not intend to do this with the Narnia books. Better, then, that they can the project and adapt Philip Pullman instead. Or maybe they could create something original? Unless, of course, that also Doesn’t Work in 2023.


  1. By publication order, not chronological order. ↩︎

Postmortem: Ludum Dare 53 https://davidyat.es/2023/05/20/postmortem-ludum-dare-53/ Sat, 20 May 2023 18:11:22 +0200 https://davidyat.es/2023/05/20/postmortem-ludum-dare-53/

This is a postmortem for Causality Couriers, my entry into Ludum Dare 53. I’ve kept spoilers to a minimum, but you might want to play it before reading anyway (it’s pretty short).

It’s been eight years and twenty contests since the last time I successfully finished a game for Ludum Dare. I’ve attempted one or two in the interim, as well as some other jams, but that was the last time I was reasonably happy with the output. There are quite a few things that have to come together for me to actually complete one of these successfully.

  1. The theme has to be inspiring. This is perhaps more about my state of mind than whatever the theme is; sometimes, I just can’t come up with an idea I like for a given theme. Not that I don’t have strong opinions about what constitutes a good theme – see below.
  2. The jam needs to fall on a weekend when I have a relatively empty schedule.
  3. The idea I come up with needs to be doable in a weekend.
  4. The idea needs to be something I won’t completely change halfway through.

Ludum Dare 53 (theme: “Delivery”) kicked off on the weekend of 29 April, perfectly positioned between two public holidays. That satisfied criteria 1 and 2. Criteria 3 and 4 were not satisfied, but fortunately Ludum Dare has recently introduced a new category, Extra, which allows participants to submit their game any time during the three-week judging period that follows the main weekend. Games submitted in Extra are not in the running to win the Competition (48 hours), or even the Jam (72 hours), but can still get feedback and ratings, which is the main attraction in any case.


Concept

Much like my previous LD entry, I Hunger, Causality Couriers is text-based. This is one of the easier sorts of games to make in a short timeframe. The controls and gameplay, such as it is, come pre-defined: click on links or type into a parser. The content is easy to generate, for a writer – rooms, objects and characters can all be constructed in a few lines of text. Challenges arise and bugs appear, but if, like me, your main gamedev skills happen to be writing and programming, you’ll be completely in your element dealing with such things. Also, you can get away with leaving out things like graphics and sound, both of which would be indispensable for any other type of game.

I came across Robin Johnson’s Gruescript around the time it was initially released in late 2021 and had quite a bit of fun playing around with it – I even wrote a Vim syntax file for the language. The project I first started in it never quite came together, but the tool stuck in my mind as something I wanted to use. It strikes a very satisfactory balance between a hypertext tool like Twine and a heavyweight parser interactive fiction tool like Inform 7 – gameplay is mouse-driven1 and there’s a simple world model with rooms, objects and people. The language itself is terse and very fit for purpose.

Apart from using Gruescript, I wanted to experiment with AI, mostly by using Stable Diffusion to illustrate the game, but also by having ChatGPT as a creative assistant.2 It wasn’t any good at writing Gruescript, having essentially none in its corpus, but it was very handy for brainstorming. On the Saturday morning after the jam began, I asked it for some ideas.

None of these ideas were bad, but they all seemed a bit obvious, so I asked for some more:

I went with number 2 (though number 3 was a close second). A time-travelling courier bringing packages from the future to the past seemed like a cool concept with a lot of potential.

Story

In the initial version of the game, I had the player start off in a delivery hub with three unmarked parcels, all different sizes and shapes. The delivery hub also contained three different time portals, and the idea was that the parcels had all lost their tags and you had to guess which one should go where. It was going to be possible to deliver any of the three packages to any of the time periods, and depending on which one was delivered where, the present time would change and so would later time periods.

This, of course, was a classic combinatorial explosion. I started mapping out all the different possibilities and realised that not only would it be very time-consuming to do every different path justice, but it would ultimately make this game very similar to my last Ludum Dare game, I Hunger, in which a human society develops along different lines depending on what combinations of sacrifices are demanded by a volcano god. The other problem was that the different time periods, packages and changes to history were kind of arbitrary, and the story didn’t really feel coherent.

At this point I was halfway through the Jam period and found myself needing to scrap a lot of content and return to the drawing board. Luckily there was still the three-week Extra category, so I decided to go through with a radical rewrite and not pressure myself to finish it by the Monday evening. I pared the story down to a single package, and opted to start with the player collecting the package from a more interesting location than a delivery hub. The player would then proceed to deliver the package to one of the locations I’d already implemented, allowing me to salvage a fair amount of content from the original version.

I managed to get about 70% of the game implemented by the Monday evening, i.e. most of the content in the two main areas. But there was still the matter of the illustrations I wanted to add. Up until that point, I’d been developing in Gruescript’s online interface, as it allows for rapid testing and has some neat debugging features. However, it does not provide any means for editing the game’s HTML, which I would have to do if I wanted to include illustrations. So on Monday evening, I shifted to local development.

Engine hacking

Luckily, I’ve played around with local Gruescript development before,3 and the engine is simple and cleanly written enough to be very amenable to hacking on. When you export a game from the online interface, you get an HTML file containing three things:

  1. The HTML and CSS which make up the interface. This is mostly hardcoded, though Gruescript does let you change the colours of most things from your game code.
  2. The JavaScript that powers the game engine. This is pretty simple and easy to follow – many elements of Gruescript code map directly to JavaScript functions.
  3. Your own Gruescript code, included in a textarea at the bottom of the page. This code is evaluated when the page is loaded to create the game’s content.

The CSS wasn’t quite as responsive as I’d have liked, so I rewrote some of it to use flexbox. Given more time, I probably would have changed more of it. I also had to add an area for illustrations, and create a JavaScript function for changing the illustration. I made a few alterations to the game engine to do things like make the controls disappear when the game ends. At the last minute, just before publishing, I remembered that Gruescript includes save and load functionality, and had to make some quick hacks to ensure that functionality was illustration-aware. There were still a couple of bugs with these engine and interface changes, which I fixed4 after another Jam entrant brought them to my attention.

I didn’t want to have to edit my Gru code inside a massive HTML file, so I copied it to its own .gru file, and had ChatGPT write me a build script in Python, which just copied the latest contents of the .gru file into the HTML file’s textarea. With these few things in place, I had a solid local development workflow, and could continue development with the game in its intended interface.
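For the curious, such a script can be very short indeed. Below is a sketch of the idea rather than the actual script ChatGPT wrote for me – the file names are placeholders, and it assumes the exported page contains exactly one textarea holding the game code:

#!/usr/bin/env python3
# Sketch of a Gruescript build step: splice game.gru into the exported HTML.
# File names are hypothetical; assumes a single <textarea> wraps the game code.
import re
from pathlib import Path

gru = Path("game.gru").read_text(encoding="utf-8")
html = Path("index.html").read_text(encoding="utf-8")

# Use a function for the replacement so backslashes in the Gruescript
# aren't interpreted as regex escape sequences.
html = re.sub(
    r"(<textarea[^>]*>).*?(</textarea>)",
    lambda m: m.group(1) + gru + m.group(2),
    html,
    count=1,
    flags=re.DOTALL,
)

Path("index.html").write_text(html, encoding="utf-8")
print(f"Spliced {len(gru)} characters of Gruescript into index.html")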

Game code

Being a highly specialised DSL, Gruescript looks very different from a standard C-like language. It reminded me most of the language used by AGT,5 mixed with a bit of Inform 7, but far more terse than either. Its syntax is very simple, containing little nesting or even special characters. Rooms, objects and characters are defined in room and thing blocks:

# A ROOM
room before_barricade You're standing in front of an enormous barricade made of junk.
prop display Before the junk barricade
prop year Unknown year
tags start

# AN OBJECT
thing sword Courier's Blade
desc This is the first job you've had where a massive sword is considered required equipment. 
carried
tags portable

# A PERSON
thing fletcher shifty character
name shifty character
desc The man looks to be in his sixties, with a white beard and a mane of grey hair.
tags alive male conversation
loc wooden_shed
prop callme Fletcher
prop start_conversation "You must be the courier," says the man, looking you up and down. "Name's Fletcher." Your collection tag mentions a Darius Fletcher – this must be him.
prop end_conversation "G'bye then."

Each of these blocks starts with an identifier and a description, followed by a series of properties – the room’s exits, the object’s initial location, the text to use when starting a conversation with a person, and so on. Each room and thing can also have a set of tags: some of these are meaningful to the engine – portable, for example, means the thing can be picked up and placed in the player’s inventory – but you can also create your own tags for your own purposes. Custom attributes, which can be strings or numbers, are defined with prop.

Moving from object definitions to functions, we have the verb block. Here we can define what a given verb used with a given thing will do.

verb twist collection_tag
say Twisting a collection tag will instantly transport the holder to the place and time of the parcel's collection. But only once, and you've already used this one.

Verbs can also be defined generically, with the special variable $this used to refer to the thing involved.

verb twist
say You twist the $this.

A noun has a set of attribute definitions, and a verb has a sequence of commands to execute and expressions to evaluate. Initially, it was unclear to me how complex conditional logic was intended to work. A simple, single-path Gruescript verb definition might look like this (comments are prefaced with #):

verb eat lunchbox # defining a verb called eat on a thing called lunchbox
carried $this # is the player carrying the lunchbox?
has $this full # does the lunchbox have the tag 'full'?
untag $this full # remove the 'full' tag from the lunchbox
say You devour the banana, peanut butter and marmite sandwiches in your lunchbox. # print some text

The carried and has lines are expressions. If an expression evaluates to true, execution continues to the next line; if it evaluates to false, execution of the entire verb block ends. This means that if the player is carrying a full lunchbox, the whole block will execute and the final say line will be printed. If the player is not carrying the lunchbox, or has already eaten from it, nothing will happen.

Gruescript provides simple syntax for printing something in the event of an expression’s failure – just append : to the expression line and add the message after it.

verb eat lunchbox
carried $this: You'll need to pick it up first.
has $this full: The lunchbox is empty.
untag $this full
say You devour the banana, peanut butter and marmite sandwiches in your lunchbox.

This syntax is a bit confusing at first, but works well for a lot of adventure game puzzles, where you often have to take a number of intermediate steps to achieve some goal. However, I soon found cases where I needed to do more than just print a message when an expression was false. In some instances, the right solution was to flip the logic (e.g. !has $this full will be true if $this does not have the tag full), but in other instances I needed a sequence of multiple commands for both true and false cases.

After rereading the documentation and perusing some of Gruescript’s example code (especially the ~2000 LoC implementation of The Party Line), I realised that the solution was to create multiple verb blocks with opposite expressions.

verb eat lunchbox
carried $this
has $this full
untag $this full
say You devour the banana, peanut butter and marmite sandwiches in your lunchbox.

verb eat lunchbox
!carried $this
give $this
say You pick up the lunchbox and look inside.
has $this full: The lunchbox is empty.
untag $this full
say There are some banana, peanut butter and marmite sandwiches here. You devour them.

Per the documentation, when a verb button is clicked, Gruescript will evaluate every applicable verb block from specific to general, stopping when one succeeds (runs all the way to completion without a failed expression). The same is true for other procedural logic blocks in the language.
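To check that I had these semantics straight, I found it useful to think of each verb block as a function whose expressions are guards. The Python below is purely my own analogy – a toy model, not the engine’s actual code – but it captures the fall-through behaviour described above:

# Toy model of Gruescript verb dispatch (my analogy, not real engine code).
world = {"lunchbox": {"carried": True, "tags": {"full"}}}

def say(text):
    print(text)

def eat_when_full():
    box = world["lunchbox"]
    if not box["carried"]:            # failed expression: abandon this block
        return False
    if "full" not in box["tags"]:
        return False
    box["tags"].discard("full")       # untag $this full
    say("You devour the sandwiches in your lunchbox.")
    return True                       # ran to completion

def eat_fallback():
    say("You have nothing to eat.")
    return True

# Blocks are tried from most specific to most general; the first
# block that runs to completion wins, and dispatch stops.
for block in [eat_when_full, eat_fallback]:
    if block():
        break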

Once I understood that properly, I was able to use Gruescript pretty proficiently. The engine provides a js command for executing arbitrary JavaScript, which I used to change the current illustration and to force items into the holding section of the player’s inventory at certain moments – for example, to make the player hold the collection tag at the start of the game.

Overall I’d say I had a positive experience with the language, but I would have liked some syntax for embedding conditionals inside strings, like what Inform 7 does with square brackets:

Candy Storage is a room. "As you enter the room, the white cubes inside all turn to face you.[if unvisited] They're square candies that look round.[endif]"

This code will only print the second line of the room description the first time you enter the room. And you can add an else statement and nest the conditions for truly varied text. To do something similar in Gruescript, which only supports direct string interpolation, you need a bit more code and some creative hacks.

room candy_storage As you enter the room, the white cubes inside all turn to face you.{candy_storage.extra}
prop extra They're square candies that look round.

verb go
at candy_storage
tag candy_storage visited
write candy_storage.extra &zerowidthspace; # setting a variable with nothing will give it a value of 0. To avoid printing 0 in our room description, we need to set it to an invisible character instead.
continue

Gruescript was designed for more textually minimal games than Causality Couriers ended up being – games in the spirit of the highly terse Adventure International text adventures by Scott Adams – so it’s understandable that this wasn’t an implementation priority.

Ludum Dare encourages entrants to release their source code (this is a hard requirement for the 48-hour event), so I’ve made all of the code available in this GitHub repository.

Illustrations

I’ve been playing around with Stable Diffusion since its initial public release, and one of the purposes I’ve long planned for it is creating the art for a game. Illustrating a text adventure seemed like an appropriately humble first outing – as long as I could get pictures that roughly corresponded to my room descriptions, I could use them. There would be no great need to worry about animation or even (with a small enough game) creating consistent characters across different images.

Since my post about Stable Diffusion in August 2022, the tech has advanced in leaps and bounds. In the beginning, it was all about carefully crafting prompts to wring as much as possible out of the Stable Diffusion 1.4 model, but since then there’s been an explosion of custom models for different subjects and art styles, as well as sophisticated tools like ControlNet.

I usually generate a picture of a cat right after opening the SD UI to test that everything’s working.

So, armed with a custom model and a few months of prompt engineering experience, I went to work generating illustrations. I wanted one for each location, one for each character, and a few more for specific events. For most of the pictures, I played around with the prompt and random seeds until I got something that looked good, and then fed it into img2img at varying denoising strengths to finetune.

A couple of the pictures started out as crude GIMP sketches, but I have a habit of making these as terrible as possible, just so that I can feel astonished when the AI denoises them into something good. As a result, the images I liked best from this process were invariably the ones that bore the least resemblance to my initial sketch. Being more intentional with the initial sketch would definitely give me greater control over the final result. Here are some of the final pieces:

Illustrations for a text adventure are pretty low-stakes – the player does not need to physically move around in them and the game is entirely playable without them. Because of the lack of real practical constraints, and the limited amount of time I could (or wanted to) spend on any one image, I tended to be happy once I got something that broadly corresponded to the room or character description, even if it didn’t have all the details I would have wanted. The illustration didn’t need to be something the player scrutinised for gameplay-relevant details, it just had to create an impression. In a couple of cases, I modified the corresponding description to better fit an image I liked, but this didn’t happen too often.

The one case where I wasn’t able to get something I wanted was the interior of a tent in the second part of the game. This was probably the most specifically described room in the game, and I had a clear mental picture of it as being very sparsely furnished. I couldn’t get the AI to play along with this – it kept adding all sorts of junk to the scene, no matter how much I loaded the negative prompt. So in the end I just threw up my hands and generated a nice image of a tent exterior to use (pictured above). I could probably have gotten closer to my original vision with a decent sketch and some time dedicated to inpainting, but I didn’t feel that such a minor room (which the player doesn’t even have to visit) was worth the effort.

I did some minor clean-up and alterations on a couple of images – scrubbing a TV screen and manually adding bars to a prison cell. There were a couple of places where I superimposed a character image on an existing background, and I also used a GIMP filter or two when it was easier to do that than to try to get what I wanted out of prompting.

Overall, I’m satisfied with the illustrations I was able to put together in half a day of work, and feel that they add a lot to the game’s mood. In future, I’d like to spend more time with the process and use more and better input sketches. I’d also like to try a more demanding project, such as animated character sprites, or a navigable environment. I know it can be done!

Below, some cool pictures I didn’t end up using:

A lot of other games in this Ludum Dare made use of AI art to varying degrees, from a few posters in Phantom Package to most (all?) of the non-UI art in The Dirty Inbox. There was even a game for which all of the art, text and music was AI generated.

Release

A nice thing about doing Ludum Dare in 2023 is that most modern gamedev platforms can export to HTML5: Unity, GameMaker, Godot, RPGMaker, Adventure Game Studio, even PyGame all have some way to produce games for browsers. I still had to get the Windows VM out for a few games, but a surprising number of even quite graphically intensive games worked perfectly in my browser. Thank you WebAssembly!

My own entry was already a webpage, so no conversion was needed. Reception was quite positive, although, as mentioned above, there did turn out to be a bug with saving and restoring games, related to some of the interface changes I’d made to Gruescript’s default HTML. That was easy enough to fix once it was reported to me. Testing is always going to be less than perfectly thorough on a one-man, highly time-limited project.

A few commenters remarked on the lack of background music, and I recall getting this feedback last time as well. I did think about adding music and spent a bit of time scrolling through Incompetech, but ultimately it felt like any music I could add would be an afterthought and not really integrated. AI-generated music might have been fun to play with, but I didn’t think of that until literally right now.

After a couple of text games for Ludum, I think I’d like to do something more graphical next time. But then, you never know where the theme is going to take you…


  1. There are ways to show hyperlinks in an Inform game, but I’m not sure if you can disable the parser entirely. ↩︎

  2. This immediately disqualified me from the 48-hour event, as Ludum Dare has yet to formulate rules about LLM-assisted development and asks that developers keep anything of that nature to the Jam and Extra categories. Use of more primitive AI generation tools is permitted, though. I’m hopeful that this will change in future events. ↩︎

  3. I’ve even made a Vim plugin for highlighting Gruescript syntax. ↩︎

  4. In proper game jam spirit, I’ve restricted post-release updates to bug and spelling fixes. ↩︎

  5. Incidentally, this was the first language I ever tried programming in, sometime in primary school. I wrote many lines of AGT code but was unable to compile them because I didn’t understand file extensions. ↩︎

Reply via email

Walkthrough: Causality Couriers https://davidyat.es/2023/05/16/walkthrough-causality-couriers/ Tue, 16 May 2023 06:05:17 +0200 https://davidyat.es/2023/05/16/walkthrough-causality-couriers/

Now that Causality Couriers has been out for a week or so, it’s time to release the official walkthrough. The game has six endings, and in the spirit of 100% completion, we will visit each one, though not in order.

Before you read this walkthrough and spoil the story, ask yourself the following question:

Have I tried tampering with the parcel when there’s no one around? How hard have I tried?

If you’re satisfied with your answer to that question, read on.

Knock on the barricade and wave at the circular shadow when it appears. It’s a drone!

To get ending 1 of 6: Save here and dodge the drone repeatedly, until you get too tired and the drone kills you. Then restore your save and continue.

Strike the drone with your sword to dispatch it. Examine its remains, or do anything else for one turn, and an opening will appear in the barricade.

Go through the barricade and into the wooden shed. There you’ll meet Darius Fletcher, who has a parcel for you. Examine the walls of the shed and talk to Fletcher about various subjects. He will give you the parcel.

Remove the delivery tag from the parcel and examine it. How strange, it’s for someone in Ancient Egypt!

Twist the delivery tag to travel across time and space to the parcel’s intended recipient: an alien observing the construction of the Great Pyramid of Giza (observing, not helping). Before you do anything here, save your game. Your next few actions will determine which of the five remaining endings you can reach.

Talk to the alien and deliver the parcel.

The alien will open the parcel, drink the wine inside and instantly die. Oh no! But what’s worse, a horde of angry aliens will suddenly descend on the scene, appearing from portals. Unfortunately for you, they’ve never heard the phrase “don’t shoot the messenger”. Save here.

To get ending 3 of 6: Attack the alien hordes with your sword. It’s at least ten against one, so you won’t have a chance. At least you will feast in Valhalla.

To get ending 4 of 6: For a few turns, do anything except attack the alien hordes or escape to the desert. Look at the scenery and fiddle with your inventory until the aliens surround you.

Right after the aliens appear, go into the desert. If you move far enough away from a delivery or collection point, you will automatically be teleported back to your home time and place. After escaping the aliens, you’ll find yourself back in your apartment.

Turn on the television, and then try to turn it off again. A special broadcast appears, and you’re getting the headache you usually experience after one of your deliveries changes the past. The special broadcast appears to be about you.

From here, watch TV and look at the poster on your wall when it appears. You can proactively hand yourself in to the police outside, or you can sit and watch TV until they bust your door in. Either way, you’re going to prison.

In your cell, look at the poster and play a sad tune on your harmonica. A new collection tag will appear in the middle of the room. Pick it up and give it a twist to get ending 5 of 6. This is one of the two endings where you don’t die. Whether it’s a good one or not is up to you.

Restore the game from the save you made just after arriving in Egypt. Then, take a closer look at the parcel. It can be opened, but it wouldn’t be appropriate to do this in front of the alien, so duck inside the tent and do it there. You’ll need to click on open a few times, as your courier’s instincts make it very difficult to tamper with the mail.

Eventually, you’ll open the parcel and find a wine bottle inside. You can open this bottle as well, but it will also go against your courier’s instincts, so keep trying.

To get ending 2 of 6: Save here and drink the wine from the open bottle.

Empty the wine bottle. Then hold the parcel again and close it, which will make you pack away the wine.

Leave the tent and deliver the tampered parcel to the alien. It’s pretty weird to send empty bottles to people, but it seems to be a harmless prank. With your delivery done, you can wander off into the desert and be teleported home.

In your apartment, you’ll see a closed envelope on the coffee table. Pick the envelope up and open it. When you take the letter out and unfold it, a humanoid being of pure light will appear in front of you. Talk to the being – your dialogue choices will not affect the final outcome.

You will be informed that Causality Couriers is aware of your parcel tampering, but apart from fining you some portion of the fee you were to receive, they will impose no punishment. As long as you don’t interfere with the business operations of Causality Couriers or prevent the company from being founded, they don’t care if or how you change the past.

Once you’ve finished speaking to the being, they will disappear, leaving a new collection tag behind. Pick up and twist this tag for ending 6 of 6.

To get the Hustler Achievement, ask both Fletcher and Gthxrynn for payment. The game records how many times you shake the parcel and will tell you at the end – initially this was going to have some effect, but it got cut.

If you’re feeling proactive, you can also open the parcel in the first area of the game, by returning to the front of the barricade before you twist the delivery tag.

Reply via email

Game: Causality Couriers https://davidyat.es/2023/05/09/game-causality-couriers/ Tue, 09 May 2023 18:11:02 +0200 https://davidyat.es/2023/05/09/game-causality-couriers/

After an eight year break, I’ve written another text adventure game for Ludum Dare 53. I didn’t quite make the deadline for either the 48-hour Comp or the 72-hour Jam, but these days they have an Extra category with a very generous three week deadline, and I managed to squeeze into that.

Causality Couriers can be played over here. Its Ludum Dare page is here. It was made with Gruescript, a point-and-click text adventure creator.

CAUSALITY COURIERS

The development of this piece was an exercise in (1) using Gruescript and (2) using AI. Most noticeably, illustrations for the story have been generated with Stable Diffusion. All the prose is mine, but I used ChatGPT as a creative consultant throughout the writing process – it was the AI that came up with the notion of a time-travelling courier in the first place.

I have tested it a fair bit, but probably not exhaustively, so you may encounter some bugs. I’ll put up a more detailed post-mortem after Ludum Dare 53 officially ends on 20 May.

Reply via email

Nine years go by https://davidyat.es/2023/04/07/nine-years/ Fri, 07 Apr 2023 16:57:19 +0200 https://davidyat.es/2023/04/07/nine-years/

It’s been a dry year since my last blogiversary post. I’ve beaten the previous record of only four posts by publishing only two.

I’ve been working on a follow-up post about Stable Diffusion and that other AI thing everyone’s talking about at the moment, but every time I finish up a draft, some new development comes along that I have to include – LoRA, ControlNet, etcetera and so forth.1 At this rate, by the time I have that post ready, ChatGPT may be able to write something better.

Speaking of ChatGPT, in a previous retrospective post, I discussed the fetishisation of content creation as its own end:

A few years ago I read a short ebook that excitedly proclaimed it would teach the reader to write a book in a weekend. From what I recall of the content, the idea was to do a bunch of research on Saturday and then go for a walk on Sunday and narrate your book into a recorder app on your phone for later transcription (perhaps by a gig worker in SE Asia). Bam! You’ve written a book! You’re a writer!

A blog post I read recently proposed getting into the writing habit by assembling articles through (1) copy-pasting a whole bunch of paragraphs from your research about some topic, (2) ordering them and (3) systematically paraphrasing each one. Bam! You’ve written a series of regular blog posts! You’re an influencer!

I question the value of a book written in a weekend or a series of articles assembled through paraphrasing. In an overreaction to a perceived passivity in culture, the act of creation is fetishised even when the product is worthless. Writing down things you learn is often a useful way to properly understand and internalise them, but there’s more to creation than watering down things extracted from elsewhere into a shallow soup. You have to actually bring something of your own to the dish. And that’s hard work that most people can’t do on command and with regularity.

The pointlessness of this kind of empty content is even more apparent in a world where everyone with an internet connection has access to a powerful LLM.2 The writing method proposed above can be entirely replaced with a bit of prompting, followed by light edits. Depending on the output, you may have to manually fix up some hallucinations, but ChatGPT can probably help you with that as well. Forget writing a book in a weekend – you can now write a book in a couple of hours. But many things that can be generated by LLMs may not be worth generating.

Source: Marketoonist

ChatGPT has a lot of very real and useful applications such as writing boilerplate code and boilerplate copy. But boilerplate content exists to support something. For this reason, I don’t think LLMs will replace writers any more than diffusion models will replace artists. The most rote and dull forms of both kinds of work will be automated, but someone’s still going to have to guide the process. The most interesting and powerful results in all forms of AI content creation will continue to come from clever human beings guiding and shaping the AI’s output, using methods and workflows ultimately not that different from those of traditional writers and artists.

Despite what you may hear about the coming AI apocalypse, none of these things have their own will or vision. ChatGPT is ultimately just a fancy Markov chain, which requires human input as a starting point.

And that’s why I will continue writing this blog, at a rate of at least one post per year.


  1. There is also only so much time you can spend looking at demoniacally misshapen horrors, a large part of the AI image generation workflow. ↩︎

  2. Large language model, the class of thing that ChatGPT is. ↩︎

Reply via email

Goodbye Blackberry https://davidyat.es/2022/12/15/goodbye-blackberry/ Thu, 15 Dec 2022 23:20:22 +0200 https://davidyat.es/2022/12/15/goodbye-blackberry/

I recently took advantage of a Black Friday sale1 to buy myself a new smartphone. This had been a while coming, as my previous phone was on permanent battery saver, had stopped receiving Android updates or even security patches, and had never been quite the same since I replaced its screen earlier this year. Any sensible person in my position would have bought a new phone months ago, but you see, dear reader, my previous phone was a BlackBerry – “the last BlackBerry in the world,” I often called it.

The BlackBerry Key2

The Key2, released in 2018, is the latest BlackBerry model on the market. Barring a miracle, it will forever retain that title. Unlike classic BlackBerry phones, it runs Android – BlackBerry phones haven’t run BlackBerry’s custom QNX-based operating system2 since the unsuccessful BlackBerry 10 series of phones, some of which didn’t even have keyboards.

Before the Key2, I had a BlackBerry Priv, the first BlackBerry to run Android. I bought it pretty much as soon as I found out about it, having spent a desperate year exiled to Samsung after the untimely death of my last BlackBerry Curve.

The Priv remains the best phone I’ve ever had. I managed to squeeze five years of life out of it, by which time the battery was totally shot and the OS had long ceased receiving updates. The phone’s design was brilliant – at first glance, it looked like any other glass slab smartphone:

The BlackBerry Priv, seemingly a normal Android smartphone…

But with a subtle press and flick of the thumb, the screen would slide up to reveal a hardware keyboard!

Tada!

And throughout five years of daily use, heavily favouring the hardware keyboard over the onscreen one, the sliding mechanism remained smooth and fault-free.

The operating system was stock Android, with the addition of a very small collection of apps ported from the days of BlackBerry’s own OS – most notably, BlackBerry Hub, which combines notifications and messages from email, phone, messaging and social media apps into a single inbox view. I never got into the habit of actually using BlackBerry Hub in the same way I’d used it on older BlackBerries, but it and the other apps were unobtrusive and mostly didn’t attempt to duplicate Google functionality or make me sign up for online accounts, which is more than can be said for certain other Android OEMs.

The Key2’s brick design was less inspired than the Priv’s, but it worked well and looked good. The fingerprint scanner on the spacebar was a welcome addition – I only got a phone capable of fingerprint authentication after iPhone users had already moved on to FaceID – and the smaller screen size never caused any problems with apps or general phone usage (but then, I don’t watch much video on my phone). The OS was a bit newer than the Priv’s, and it came with a few more neat apps, such as Redactor.

Both phones managed to recapture the BlackBerry keyboard magic and combine it with the Android operating system in a way that felt totally natural and normal. Between these two phones, I haven’t had to use an onscreen keyboard since 2015.

But all good things must come to an end. One problem with buying phones with abnormal designs and form factors (and that no-one else owns) is that it’s very difficult to find covers or screen protectors for them. This was especially egregious for the BlackBerry Priv, which had not only a slide-out keyboard, but also a curved screen. Still, I managed to keep it in pristine condition for five years.

My Key2 was not so lucky. Around the middle of this year, I dropped it on some bricks and cracked the screen. A small crack at the top of the screen might have been survivable, but this crack was at the bottom of the screen, and rendered the back, home and app switcher buttons unusable.

So I ordered a replacement screen off Amazon. After paying for priority shipping and then waiting a week for the screen to get through customs anyway, I finally got it. I checked out some YouTube videos and, for about three seconds, considered doing the replacement myself. Then I came to my senses and took it to a repair shop. My cracked screen was replaced, and my phone appeared to be restored.

I’m not sure whether it was faulty workmanship by the shop, or a faulty part from Amazon, but over the next few weeks, I had to press harder and harder to get the back, home and app switcher buttons to register. Then, for a while, the app switcher button would press itself, and my phone would go through bouts of insanity, jerking between the switcher and my current app and even occasionally going split-screen. This was mercifully short-lived, as the buttons soon stopped working altogether. So now I was back in the same position as when my screen was cracked, albeit with less chance of getting bits of glass in my thumb.

Fortunately, Android on BlackBerry has a shortcuts menu in Settings that allows you to configure keyboard shortcuts for apps you use a lot (press W for WhatsApp, that kind of thing). You get a short and long press shortcut for all 26 letters of the alphabet, giving you a total of 52 app shortcuts. Unfortunately, these shortcuts don’t work if you’re already in an app, only if you’re on the home screen. Without the three buttons at the bottom of the screen, once I entered an app, I was unable to return to the home screen without rebooting my phone. However, it was possible to launch the action associated with any shortcut from the shortcuts menu itself.

My new home screen.

My workflow for switching apps became:

  1. Open the pull-down notifications screen.
  2. Tap the gear icon near the top.
  3. Tap Shortcuts & gestures, then Keyboard shortcuts.
  4. Find the right letter and launch its shortcut app.

This is pretty straightforward for things like “W for WhatsApp”, but gets a bit less so once you have more than two apps that start with a given letter. Luckily I don’t use too many different apps on a daily basis.

Another side-effect of having no app switcher is not being able to close apps. Luckily, when an Android device starts running out of RAM, the memory manager app pops up a notification to tell you, and from there you can force-kill some apps you’re not using until RAM usage returns to the green zone.

So my BlackBerry was technically usable, but between the quickly expiring battery, the lack of software updates, and this app-switching problem, I decided it was time to upgrade. I’m travelling internationally this December, for the first time since February 2020, and I didn’t want to traipse around Europe with a half-functioning phone.

Various promises have been made about a new BlackBerry model in 2021 or 2022, but all have come to naught. The BlackBerry company itself does not make phone OSes or phone hardware anymore. Physical keyboards are an extremely niche feature in the smartphone world of undifferentiated glass slabs.

The best option for a physical keyboard these days seems to be a phone from Unihertz. They have two main keyboard phone models, both funded partially by KickStarter campaigns. There’s the Unihertz Titan:

BlackBerry Curve: Battle Edition

As much as I like phones with keyboards, I still want something that vaguely resembles a modern smartphone. So there’s also the Unihertz Titan Slim, which was released this year:

Slim is a relative term.

Hmm. Looks familiar.

I seriously considered getting one of these to replace my Key2. At first glance, it’s the same phone, but with a modern operating system and (presumably) a functioning screen. However, a few points made me reconsider.

  • Aesthetically, I don’t like how they’ve moved the fingerprint sensor above the keyboard. Compared to the Key2, the design looks poorly thought-out and almost redundant.
  • The reviews of the phone are not great. The specs are nothing spectacular; the slippery plastic finish feels cheap, very unlike the matte BlackBerries; and even the keyboard is apparently kind of pokey – at the very least, it’s strictly inferior to a BlackBerry keyboard.
  • The Titan Slim doesn’t have a headphone jack. I feel very strongly that Apple’s decision to remove the headphone jack from their phones was a stupid one. I avoid Apple products to get away from design decisions like that, but unfortunately the industry seems determined to copy every one of their dumb ideas due to what I can only assume is an inferiority complex. I mean, come on, you’re making a phone for people who like physical keyboards! The Venn diagram for keyboard users and headphone jack appreciators is one circle.
  • Finally, any Unihertz device I bought would have to be imported, which would mean paying a whole lot in shipping and taxes and waiting a few weeks that I didn’t have. And all I’d have to show for that would be a worse version of my BlackBerry.

So I decided to return to the dark side. On Black Friday, I bought a Xiaomi. Just your bog-standard glass slab Android phone without a keyboard (but with a headphone jack!). Like most of Xiaomi’s range, it’s got surprisingly good specs for a surprisingly low price. It’s been great having a modern Android OS, masses of RAM and storage space, a functional screen, and a battery that can last me two and a half days. All the unremovable crapware that duplicates stock Android functionality is less great, but it’s not as bad as a Samsung, so I can hold my nose and live with it.

But having to type on a phone screen again truly feels like a step backwards. When I send messages now, it feels as if I’ve lost some essential motor function in my fingers. Compared to the rapidity and precision that my keyboard gave me when typing out messages, tapping on a phone screen is maddeningly slow, and autocomplete is wrong often enough to really irritate me. I’m starting to see why people send voice notes.

I typed a lot on the first few BlackBerries. I would compose blogposts, stories, notes, all kinds of stuff, with just my thumbs. There was a time when my thumb typing was faster than my finger typing. But I haven’t actually used a phone like that in over a decade, so on some level I have to acknowledge that a physical keyboard is not actually an essential phone feature for me. Ultimately, I can live with the imprecision and clumsiness of an on-screen keyboard, at least until someone makes an actually good keyboard phone with all of the essential ports.

Maybe we’ll get Neuralink first.


  1. I think it was around 2017 that the world’s marketing executives got together to decide that Black Friday should be an international event, despite there being no tradition of it outside the USA. In South Africa, it has been marked largely by humdrum specials and retailer disappointment. They pretend to have specials, and we pretend to buy them. ↩︎

  2. The last releases of BlackBerry OS reached end-of-life at the beginning of this year. But you can still develop for it if you really want to. ↩︎

Reply via email

Stable Diffusion https://davidyat.es/2022/08/31/stable-diffusion/ Wed, 31 Aug 2022 09:58:57 +0200 https://davidyat.es/2022/08/31/stable-diffusion/

On the 22nd of this month, Stability.ai released Stable Diffusion, a free, open-source AI tool for generating images from text prompts (and other images). Previously, access to this technology was available through OpenAI’s commercial, invite-only DALL-E 2 platform, Midjourney’s commercial Discord bot, and, in a significantly reduced free-to-use form, Craiyon/DALL-E Mini. Basically, you had to use a hosted service and pay per image, or use something too simple to be much use for anything more than amusing jokes to post on social media.

Stable Diffusion’s public release is a sea change. Now, anyone with a reasonably powerful computer can use this tech locally to generate strange and fantastical images until their graphics card burns out. While DALL-E 2 is still probably the more powerful system, able to generate better results from shorter prompts, outputs from Stable Diffusion can rival those of its commercial competitors with a bit of careful tweaking – tweaking made possible by users’ increased control over its parameters.

For comparison, two images made with the same prompt:

the city of cape town as a martian colony artist's conception landscape painting

DALL-E’s is more lush and naturalistic, whereas Stable Diffusion produced something very precise and, well, clearly computer-generated, though not without its charm. Moreover, the other three images DALL-E generated for this prompt are just as good, whereas I had to fiddle with different samplers and iteration counts to get something I liked as much from Stable Diffusion. But, crucially, those extra iterations didn’t cost me any credits.

Stable Diffusion really shines when you give it a style to imitate. This can be an artist’s name, an animation studio, or something like “comic book inks”. The best results often combine multiple artist names and styles. I quite liked what I got from having Flemish Renaissance painter Joachim Patinir redo my Martian Cape Town.

city of cape town, mars, martian colony, artist's conception, highly detailed, 8k, intricate, artstation, landscape, by Joachim Patinir
Stable Diffusion (Euler Ancestral sampler)

In addition to specifying a prompt, Stable Diffusion allows you to tweak the following parameters:

  • Seed: The random seed used to initialise the generation. By controlling the seed, you can create reproducible generations, which is key to experimenting with prompt and configuration variations – using slightly different prompts with the same seed gives you some idea of what part of the image each word is responsible for. This is exactly the kind of technical detail that DALL-E’s slick, user-friendly interface doesn’t let you touch.
  • Sampler: The k-diffusion sampler used to generate each iteration of the image from the last one. /u/muerilla provides a useful illustration of this on /r/StableDiffusion. I use k_euler_a most often because it produces the quickest results.
  • Steps: The number of sampling steps to take, also illustrated by the figure above. Initially, additional steps will lead to a more coherent and detailed image, but this reaches a point of diminishing returns somewhere in the hundreds of steps.
  • CFG scale: Also called the unconditional guidance scale, this is a number specifying how closely you would like your output to align with your prompt. The default value is 7.5. Lower numbers allow Stable Diffusion more artistic licence with the output, generally resulting in trippier images. Higher numbers ensure you’ll get what you asked for, but the results can sometimes feel crude or overcooked, reducing Stable Diffusion from a skilled artist to a Photoshop beginner. Generally, higher CFG scales work better with more detailed prompts and higher step counts. The images below have the same seed and prompt, but different CFG scales.
  • Dimensions: Image dimensions will significantly affect your output. Stable Diffusion was trained on 512*512 resolution images, so this is the recommended setting. Anything bigger than 1024*1024 is not officially supported and can cause tiling, and anything smaller than 384*384 will be too messy and artifacted to come out looking like anything. High output resolutions also require powerful graphics cards. My system, on the lowest end of the supported hardware, can’t handle anything bigger than 448*448.
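Incidentally, if you drive Stable Diffusion from code rather than a web UI, all of these knobs map onto ordinary function parameters. Here’s a rough sketch using Hugging Face’s diffusers library – not the setup I used, and the model ID and download requirements may differ on your machine, but the correspondence holds:

# Sketch: the parameters above, as arguments to a diffusers pipeline.
# (Downloading the CompVis weights may require a Hugging Face account.)
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator("cuda").manual_seed(1234)   # seed
image = pipe(
    "city of cape town, mars, martian colony, artist's conception",
    num_inference_steps=50,   # steps
    guidance_scale=7.5,       # CFG scale
    height=512,               # dimensions: the training resolution
    width=512,
    generator=generator,      # reproducible generations
).images[0]                   # the sampler corresponds to pipe.scheduler
image.save("martian-cape-town.png")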

Below, I’ve generated three images with the same seed, prompt and other configuration values, but different dimensions:

Stable Diffusion can generate more than just landscape paintings. It’s also pretty good at portraits, with enough of the right keywords.

Above, I’ve used the same seed with slightly different prompts to generate portraits, in the same style, of two people who could be siblings. DALL-E 2 and Midjourney both let you generate variations of images you like, but what’s actually varied is a bit random. The ability to tweak a prompt for the same seed gives you much more control over the direction a variant goes in. We can even get (roughly) the same character at different ages:

Things get a little dicier once you move from portraits to full bodies. Stable Diffusion is prone to generate extra arms and legs at lower CFG scales, and even otherwise perfect pictures usually come with deformed hands and feet.

Trying to get anything in motion – characters walking, eating, etc – often leads to horrific nightmare images not unlike the ones generated by Craiyon, but in ghastly high resolution. This is where Stable Diffusion’s other tool, the image to image generator, comes in. It’s a variation on the text to image generator that lets you specify an original input image to work with, in addition to a prompt. This can literally be a crude MS Paint drawing. I made a steampunk robot:

Digital artists are using this with sketches to great effect.

But you don’t just have to start with your own drawing. You can start with a photo, and convert it into a painting, or vice versa. You can also take output from text to image, draw some coloured blobs on the parts you want to change, and get image to image to redraw it.
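Image to image is exposed in much the same way. Another hedged sketch with diffusers – the input file name here is made up, and older versions of the library called the image argument init_image:

# Sketch: image-to-image with a crude input drawing and a prompt.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

init = Image.open("robot-sketch.png").convert("RGB").resize((512, 512))
image = pipe(
    prompt="steampunk robot, highly detailed, concept art",
    image=init,        # the starting sketch (file name is hypothetical)
    strength=0.6,      # denoising strength: 0 keeps the input, 1 ignores it
    guidance_scale=7.5,
).images[0]
image.save("steampunk-robot.png")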

Note how the final image even fixes the Batman’s double right foot on its own. We would probably need to go through a few more iterations to get the faces looking decent. For example, we could generate the faces on their own, shrink and paste them in, and then use image to image to produce a blended result. For more targeted fixes, some Stable Diffusion GUIs have built-in masking interfaces, where you can select a single part of an image to regenerate, avoiding the small changes you’re bound to get on multiple runs over the whole thing. And we can also use other AI tools like GFPGAN and ESRGAN to fix faces and increase resolution.

Writing prompts for the AI is a bit like learning a new language. A lot of the words you can use in prompts are straightforward and logical – “portrait of a man” will come out as you expect. But as you get into more complex territory, you have to learn the peculiar meaning of terms like “trending on artstation” and try to find ways to ask for different angles without getting a lot of cameras and fish eyes in your results. You can start to figure it out by doing a lot of trial and error with the same seed, adding and removing terms; by looking at what others have generated; by searching the training dataset; or even by generating prompts from images. And if none of that works, you can buy prompts other people have figured out.

Amusingly, terms like “masterpiece” and “award winning” also help to produce attractive outputs. “Just draw an award-winning masterpiece for me, would you?”

It’s amazing how far this technology has come from dog face mosaics. And it’s still developing at a break-neck pace. I mentioned using seeds and prompt variations to generate variations of the same character above, but there’s a much more robust way to do this called textual inversion, which you can test out now if you have a very beefy graphics card. Hopefully, DALL-E-style outpainting will also come to Stable Diffusion.

The question of copyright is a fraught one. Fundamentally, these AIs work by taking in a huge volume of images, most of them copyrighted, and mushing them together in a black box that no-one really understands. You could argue that this is similar to how a human artist might learn things, but these AIs far outstrip the ability of any human across many metrics. They are, at bottom, inhuman. And when you can get them to generate images drawn in the styles of living artists, well, not all of those artists are going to be receptive to that. Simon Willison brings up the point that AIs could be trained on public domain works only, but whether you as an individual use the ethically sourced AI or the standard one is going to be a personal choice.

I’ve personally been much more interested in using Stable Diffusion than I ever was in text generation systems like GPT-3 and AI Dungeon. I’m a far better writer than artist, so the idea of generating text from a prompt is an amusing diversion at best. If I want some text, I’ll just write it myself. Although some artists are already integrating AI into their workflows, others will probably have the same meh reaction to its applications in their field.

Whatever happens now, this toothpaste is not going back in the tube. The cloud continues to correlate its contents.

Reply via email

Eight bloggy years https://davidyat.es/2022/04/07/eight-years/ Thu, 07 Apr 2022 16:57:19 +0200 https://davidyat.es/2022/04/07/eight-years/

Between this retrospective and the last one, I’ve published exactly four posts on this blog. Suffice it to say, my attention over the past year has been devoted to other projects. I don’t think that’s a bad thing – my attitude towards blogging has long been to eschew any kind of self-imposed schedule or arbitrary rule, and beyond that, to avoid posting anything prematurely. If a post has to languish in drafts for years before I find the right way to phrase its final sentence, so be it!

The more of these retrospectives I do, the trickier it is to come up with something new to say. So let’s take another look at some old things I’ve had to say.

Top five posts (2014–2022)

According to my analytics, these are the five most popular posts of the last eight years (in descending order):

GPU passthrough (2016)

At one time, this was probably the most comprehensive and user-friendly guide to GPU passthrough online. I imagine it’s out of date now, but I’m still using the same setup as I was when I wrote it, and it’s kept working through a few OS upgrades. I’m not sure what would happen if you tried to set this up on more modern hardware, because I haven’t upgraded anything in my desktop PC since I wrote this. When I eventually do upgrade, I’ll probably publish a new version, especially if I end up switching to an Nvidia card.

RSS: nothing better (2017)

This one got to the top of Hacker News once, but still has an order of magnitude fewer views than the GPU passthrough article. I think it still holds up. I wrote it in a single sitting after being prompted to sign up for one too many email newsletters. Sadly, RSS has not supplanted email newsletters in the meantime, but the situation hasn’t worsened by too much. I’m able to subscribe to Substack newsletters with Fraidycat, so we can’t give up hope just yet.

Encrypting a second hard drive on Ubuntu (post-install) (2015)

I published this one because there was nothing else online that told you how to do it, and it’s been very useful for setting up new laptops over the years. The technology hasn’t changed, so I don’t think it’s in need of an update.

Creating a personal wiki with Vimwiki and Gollum (2017)

I had to submit a PR to Gollum to get this working the way I wanted it to. Since writing this tutorial, I’ve also used Roam and Obsidian for maintaining personal wikis. Roam is the most fully featured, but it’s browser-based and online, which can be a bit of a pain (and $15/month is quite pricey). Obsidian has some neat features, but it’s an Electron app, which feels wasteful. Ultimately, this Vimwiki/Gollum solution is still quite competitive. But the most important thing with a personal wiki, I think, is to pick one solution and use it for everything, rather than have all your notes scattered across half a dozen apps. Don’t make my mistake, kids.

Writing a LaTeX macro that takes a variable number of arguments (2016)

This was something I discovered while writing a fairly complex LuaLaTeX template, and it ended up being a pattern I reused a lot. I like the post a lot as a tutorial, and others have told me they do as well. It’s light, breezy, and finds space for a little bit of humour while still getting to the point quickly. Most importantly, it explains exactly what all of the code you have to write is doing and why it’s part of the solution. There’s nothing that bothers me more than a tutorial with magical code you just have to include without understanding why.

So it’s very sad that there’s a shortcoming in the macro implementation. Technically speaking, it consumes the entire line rather than just the arguments enclosed in {}s. Someone emailed me about this once with a solution, but it involved enclosing all the arguments in an outer {}, which I wasn’t a fan of because it looks like you’re just passing in one argument. Ultimately, I decided I was happy to live with the macro’s shortcoming for this reason, and to avoid complicating the tutorial.


The lesson from this trip through the archives is clear. To maximise engagement, I must write a post about implementing RSS updates for an encrypted personal wiki written in LaTeX and compiled to GPU-accelerated WebGL pages. Time to get to work.

Reply via email

Review: Existentially Challenged https://davidyat.es/2022/02/27/review-existentially-challenged/ Sun, 27 Feb 2022 10:32:46 +0200 https://davidyat.es/2022/02/27/review-existentially-challenged/ I first discovered Yahtzee Croshaw’s work through his highly acclaimed freeware point-and-click adventure game, 5 Days a Stranger. Being a teenager with an insatiable hunger for freeware adventure games, I soon played through the rest of his catalogue. Having a parallel insatiable hunger for free written content, I read through the archives of his comedy site and all of his free fiction, including two unpublished novels.1 Yahtzee’s prodigious output of freely available writing seemed squarely aimed at a nerdy teenage boy with lots of spare time and a mobile internet connection over which only text was affordable.2

Since then, I’ve kept up with Yahtzee’s work in the realms of both game development and fiction writing, but for a while I found myself less and less interested in each book he released.

I liked Mogworld, a comedic fantasy that takes place inside an off-brand World of Warcraft, but in some ways I felt like I was reading a third iteration of the two novels mentioned above.

Jam, the post-apocalyptic novel featuring man-eating jam, was okay – the central joke wore thin quite quickly and so did most of the others.

Will Save the Galaxy for Food, a comedic sci-fi work about space adventurers, just felt like a rehash of previous work, and at this point I thought that maybe I was just overexposed to Croshaw’s particular tics and subject matter and should stop buying his books. Particularly grating was his need to elaborate on every synonym as if his books are just extended Zero Punctuation episodes, showing how one can over-play to one’s strengths.

The next book he wrote was Differently Morphous. I bought it on Audible3 and it quickly became my favourite work of his. Rather than yet another sneering wise-guy protagonist in a semi-parodic fantasy/sci-fi setting, the story follows a nervous young Type-A woman in basically modern Britain, with a supernatural twist. Between a grounded setting in a real place the author knows well, a shake-up in the usual character roster, and a certain amount of tonal similarity to his horror games (his strongest work, IMO), Differently Morphous was a much-needed breath of fresh air in Yahtzee’s written oeuvre.

Existentially Challenged is the sequel to Differently Morphous and delivers much the same experience as the original. Characters are fleshed out a bit more, ongoing plots are furthered, and all this is illustrated with an oversupply of tortured metaphors and similes (although he has toned this down somewhat since Will Save the Galaxy for Food).

Few writers can resist making some reference to or comment on current events in their novels. Much of the humour in Differently Morphous comes from applying 2010s social justice language to supernatural beings – demon-possessed individuals are called “dual-consciousness” and the book’s title, “differently morphous”, is a euphemism used to refer to slime monsters. In Existentially Challenged, Yahtzee turns the clock back to the 2000s and tackles organised religion. A significant chunk of the book deals with the question of how today’s organised religions would react to incontrovertible proof of the existence of magic and supernatural beings.

Whatever strengths Yahtzee may possess, subtle and insightful cultural satire outside the narrow domain of video games is not one of them, but I think this is largely for lack of trying. An entire section of Jam revolves around a one-note joke about hipsters doing things ironically (tired even in 2012), and Existentially Challenged’s treatment of the divine is about as deep. On the side of organised religion, Yahtzee gives us a meek Anglican vicar who loses every argument and quickly replaces him with a caricature of an American evangelical preacher who can say about five different sentences, all of which contain the word “Lord”. And so the whole subject of religion versus magic ends up being little more than a plot device.

Now, I certainly wouldn’t expect Mr Croshaw to present a sympathetic portrayal of religion, but his treatment shows only the bare minimum of thought about a subject that plays such a large part in his plot and gives the book its very title. Disappointing, but then, we can’t expect too much from a guy who literally used to wear a fedora.

Missed opportunities aside, I enjoyed the central mystery plot of Existentially Challenged, which kept me guessing all the way to the final reveal, without sacrificing logic or previously established rules. I was also intrigued enough by elaborations on the series arc that I’m eager to pick up the third book in the series when it comes out in a few years, as opposed to the third book in the Jacques McKeown series, which I will also pick up, but mostly out of loyalty and appreciation of the many hours of free entertainment Yahtzee’s work has given me over the years.


  1. Around this same time, I read John Scalzi’s Agent to the Stars and Jason Pargin’s John Dies at the End. This was after exhausting myself on stuff from Project Gutenberg and looking for free fiction that was a bit more modern. ↩︎

  2. Croshaw is most famous for the Zero Punctuation video game review series, but mobile browsers didn’t support video in the 00s, and if they had I wouldn’t have wanted to spend the data. ↩︎

  3. Yahtzee’s fame largely rests on the rapid-fire narration of his video reviews, and so the author-narrated audio releases of his books tend to precede their written releases by a number of months. Fortunately he narrates them at a slower clip than his ZP videos. ↩︎

Review: Impostor Factory https://davidyat.es/2021/10/16/review-impostor-factory/ Sat, 16 Oct 2021 20:16:32 +0200 https://davidyat.es/2021/10/16/review-impostor-factory/

Impostor Factory is the third major entry in Freebird Games’s series about helping people with terminal illnesses die happily by altering their memories. I’ve been very positive about the first two entries in previous reviews, and this one continues the trend, though it has a few key differences.

The previous two games both started with Sigmund Corp employees Eva Rosalene and Neil Watts visiting an elderly client to alter his memories. Here, we play Quincy, a nondescript everyman, attending a party at a slightly rundown manor. We find out that the party is being held by two elderly scientists, for the purpose of presenting a new invention to investors. Quincy, occupation and history unknown, seems uncomfortable and out of place among the wealthy investors and eccentric scientists. He becomes still more uncomfortable when the murders begin.

And so Impostor Factory fakes us out a couple of times. First, it looks like it’s going to be a murder mystery. Minutes later, the previously deceased victims are alive again, and walking around like nothing happened. And then they’re dead again. And then they’re alive. And then it looks like the story is going to be a time loop murder mystery. And then it changes again, and we find out what’s really going on, and remind ourselves that this is, after all, a sequel to To the Moon and Finding Paradise. After the initial fake-outs, the story ends up being broadly similar to those of previous games, if somewhat darker, though still punctuated with light and even absurd moments.

If To the Moon was about achieving catharsis through memory alteration and Finding Paradise was about the needless anxiety Sigmund Corp’s service could create, Impostor Factory is a synthesis of both, with catharsis following anxiety. If you could redo your life over and over, what different choices would you make, and what does that say about the choices you originally made, and about the person you are? If you do enough differently, would you still be you?

The previous games were quite light on actual gameplay, and Impostor Factory streamlines this even further. The mini-games and matching puzzles of previous instalments have been elided – the closest thing we get to a puzzle is a sequence near the start where you have to coax a cat out of hiding to get a key.

There’s also a cat chase sequence that isn’t a puzzle, so don’t try to catch him.

The overall progression is also much more linear. In the previous games, there were a lot of relatively freeform sections, where you could click around and investigate things in your own order, but these are few and far between in Impostor Factory. Story-wise, this makes sense – unlike the previous player characters, Quincy is not a trained professional methodically moving through someone’s memories trying to alter them, but a hapless everyman trying to make sense of a confusing situation, who is led by other forces for much of the game.

Interactivity is sacrificed to give the story a stronger forward momentum, and that’s really been the main focus of all these games, so I won’t call it a major loss.

The art exceeds the bar set by Finding Paradise, with a large number of animations bringing a surprising amount of emotion to the tiny pixellated characters. Unless you really hate pixel art, this game is a constant delight to look at. And while the music once again fails to recapture the lightning in a bottle that was To the Moon’s soundtrack, it’s nonetheless fitting and a great accompaniment. A credits song by Laura Shigihara is notably absent for the first time in the series (a deliberate choice by Kan Gao).1

Once again, if you enjoyed the previous games in the series, you’ll probably enjoy this one too. It’s not a good one to start with, though – the story gets a bit inside baseball, and there’s a twist towards the end that will only be more confusing if you haven’t played them. So if you haven’t started the series, well, consider this a recommendation of all three games.

TIL this is a real Pizza Hut in Giza.


  1. This loss was so acutely felt that Pealeaf, who hums on the game’s credits track, produced a bonus song for the game. ↩︎

Research first https://davidyat.es/2021/06/30/research-first/ Wed, 30 Jun 2021 18:00:40 +0200 https://davidyat.es/2021/06/30/research-first/

I’ve quoted the story of Master Wq and the Markdown acolyte before on this blog:

A Markdown acolyte came to Master Wq to demonstrate his Vim plugin.

“See, master,” he said, “I have nearly finished the Vim macros that translate Markdown into HTML. My functions interweave, my parser is a paragon of efficiency, and the results nearly flawless. I daresay I have mastered Vimscript, and my work will validate Vim as a modern editor for the enlightened developer! Have I done rightly?”

Master Wq read the acolyte’s code for several minutes without saying anything. Then he opened a Markdown document, and typed:

:%!markdown

HTML filled the buffer instantly. The acolyte began to cry.

In addition to showing off one of the little features that make Vim such a great editor, this parable illustrates a broader point about modern software development that’s essential if you want to achieve practical things without wasting a lot of time. The point being that if you have a problem that appears to call for a code solution, you should always do a thorough evaluation of existing solutions before writing one yourself.
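
For anyone who hasn’t run into it before, the command isn’t Markdown-specific: % addresses every line in the buffer, and ! filters the addressed lines through an external program, replacing them with its output. Any range and any shell filter will do, for example:

" Convert the whole buffer from Markdown to HTML (assuming a
" Markdown-to-HTML converter named 'markdown' is on your PATH):
:%!markdown

" Sort and deduplicate lines 1 to 10:
:1,10!sort -u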

I’ve made the mistake of not doing so a few times myself. Most recently, I was working on a project that involved a fair bit of browser automation. Or rather, I looked at the problem, thought about similar problems I’d solved in the past and what I’d done to solve them, and came to the conclusion that browser automation was the correct approach. I decided all of this without making a single web search or questioning any of the assumptions I was making about the problem space.

In the abstract, if you need to write code to interface with a web-based system of some kind, there are four general approaches you can take, depending on the affordances and restrictions of the system and its ecosystem. In ascending order of both implementation difficulty and brittleness of the solution, these are (there’s a rough sketch of the middle two after the list):

  1. Find a library in your chosen programming language that wraps the target system’s API. This is the easiest and best approach, but it depends on the existence of such a library and such an API.
  2. Interact with the target system’s API through raw requests. This depends on the existence of such an API, and will be more or less difficult depending on the API’s documentation.
  3. Scrape the target system’s web pages. You have to do this if there isn’t an API that can give you the data you want in a nice format.
  4. Use browser automation to simulate a real user interacting with the system. You have to do this if there’s too much JavaScript for you to scrape anything meaningful.
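
Here’s that rough sketch of approaches 2 and 3 – in Python, with requests and Beautiful Soup standing in for whatever your language offers, and with made-up URLs and selectors:

# A sketch of approaches 2 and 3 against a hypothetical target.
import requests
from bs4 import BeautifulSoup

# Approach 2: the system exposes an API, so ask it for structured data.
api = requests.get("https://example.com/api/widgets", timeout=30)
api.raise_for_status()
widgets = api.json()  # clean JSON, no parsing required

# Approach 3: no API, so fetch the human-facing page and scrape it.
page = requests.get("https://example.com/widgets", timeout=30)
page.raise_for_status()
soup = BeautifulSoup(page.text, "html.parser")
names = [el.get_text(strip=True) for el in soup.select("li.widget h2")]

Approach 1 would collapse the first half into a couple of calls to someone else’s library; approach 4 would replace the second half with a driven browser.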

In this instance, I jumped directly to number 4 without properly interrogating whether 1 through 3 were possible, just because previous experience with similar but not identical problems told me they weren’t.

After wrestling with time delays and picking through HTML soup for a few days, I had a janky version of what I wanted that basically worked, provided the internet speed stayed roughly constant and you were prepared to wait a few minutes for the thing to run. It was at this point that I started searching around for alternate methods to achieve at least some of the things I’d implemented. I imagined that I might be able to engineer some kind of hybrid solution, perhaps between (4) and (2).
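
To give a feel for why it was janky: approach 4 means something like the sketch below (same hypothetical target and selector as above), where every element you touch needs an explicit wait for the page’s JavaScript to catch up.

# Approach 4 in miniature: drive a real browser and wait for the
# page's JavaScript to produce the element you want.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://example.com/widgets")
heading = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "li.widget h2"))
)
print(heading.text)
driver.quit()

Multiply that wait by every element you need, and run times measured in minutes explain themselves.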

Imagine my shock when I found a relevant tutorial that used method (1). There was a nice library for doing everything I wanted to do, written in the language I was using. I reimplemented everything in a few hours, using less code, and ended up with a solution that was infinitely faster and more reliable, allowing me to proceed with other aspects of the project far quicker than I’d expected to.

Lesson learnt: before writing any code, do some research. Search for what you’re trying to do, in whole and in part. Anticipate some of the searches you might make while writing your code, and make them in advance. Make sure that you eliminate the easy solutions before moving on to the hard ones. Don’t reimplement %!markdown in Vimscript, and don’t do with browser automation what you can accomplish with a library.
