Syntax highlighting with render hooks

After my previous post, I got to wondering if there were any other ways I could use Markdown render hooks to improve this blog’s appearance and editing experience. One thing that immediately came to mind was syntax highlighting for obscure languages.

Hugo uses Chroma for syntax highlighting, which supports a long list of languages. Support is quite comprehensive, even including Go HTML template syntax and HolyC. But it doesn't support everything. If you need highlighting for another language, you'll need to write a lexer for it, submit it as a PR to Chroma, wait for it to be merged, and then wait again for the new Chroma release to be incorporated into Hugo. Or at least, that is the proper way to do it.

The hacky way to do it is via code block render hooks and liberal use of replaceRE. This may be more appropriate if:

  1. The language you want highlighting for is one of your own making, maybe not even intended for public release.
  2. The snippets you want to highlight are simple and do not require a fully functional lexer.
  3. You just want a local solution that doesn’t rely on getting PRs approved and waiting for software updates.

As Hugo supports language-specific code block render hooks, it almost feels as though it's encouraging you to do this. I've written two: one for Gruescript¹, used mainly in this post:

# A ROOM
room before_barricade You're standing in front of an enormous barricade made of junk room.
prop display Before the junk barricade
prop year Unknown year
tags start

# AN OBJECT
thing sword Courier's Blade
desc This is the first job you've had where a massive sword is considered required equipment. 
carried
tags portable

# A PERSON
thing fletcher shifty character
name shifty character
desc The man looks to be in his sixties, with a white beard and a mane of grey hair.
tags alive male conversation
loc wooden_shed
prop callme Fletcher
prop start_conversation "You must be the courier," says the man, looking you up and down. "Name's Fletcher." Your collection tag mentions a Darius Fletcher – this must be him.
prop end_conversation "G'bye then."

…and one for Inform 7², used mainly in this post:

"The Lady and the Tiger"

The lady is on top of the tiger.
The tiger is in a room.
"This lady hails from Niger
Niamey, I presume."

The lady wears a smile.
Her arm is in a sling.
Riding is an action
applying to one thing.

"The lady on the tiger
Up and down they sped."
After the lady rides the tiger
Now it wears the smile instead.

The approach I took was as follows. First, create a render hook that replicates the HTML of a Chroma-highlighted code block (in layouts/_default/_markup/render-codeblock-mylang.html, where mylang is the language name given after the opening code fence):

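{{ $code := .Inner }}

{{ $highlightedCode := $code }}

{{/* .Inner is raw text: with safeHTML, any < or & in the code is emitted unescaped. */}}
{{/* Keep <pre>, <code> and the content on one line, since whitespace inside <pre> is rendered verbatim. */}}
<div class="highlight">
    <pre tabindex="0" class="chroma"><code class="language-MYLANG" data-lang="MYLANG">{{ $highlightedCode | safeHTML }}</code></pre>
</div>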
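To check the wrapper markup outside Hugo, the Go sketch below builds the same HTML with fmt.Sprintf. The function names are hypothetical, and the escaping step is an addition of this sketch, not part of the template above: because the template pipes the code through safeHTML, a `<`, `>` or `&` in a snippet would otherwise be parsed as markup. Double quotes are deliberately left unescaped so that a later rule looking for literal `"` characters still matches.

```go
package main

import (
	"fmt"
	"strings"
)

// htmlText escapes only &, < and >, so the snippet is displayed as text
// instead of being parsed as HTML. Quotes are left alone on purpose.
var htmlText = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")

// escapeCode is a hypothetical helper applied before any highlighting rules.
func escapeCode(code string) string {
	return htmlText.Replace(code)
}

// wrapChroma reproduces the render hook's wrapper markup. No whitespace is
// placed between <pre>, <code> and the content, because it would be rendered.
func wrapChroma(lang, highlighted string) string {
	return fmt.Sprintf(`<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-%s" data-lang="%s">%s</code></pre></div>`,
		lang, lang, highlighted)
}

func main() {
	fmt.Println(wrapChroma("mylang", escapeCode("a < b && c > d")))
}
```

If you escape first, remember that any later regex runs on the escaped text, so a rule that matches `&` or `;` would also see the entities.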

Next, add a series of replaceRE lines right after the initial assignment of $highlightedCode. Each one searches for a keyword or syntactic construct and wraps it in a <span> with the appropriate Chroma class. For example, here's a rule for highlighting double-quoted strings:

{{ $highlightedCode = replaceRE `"([^"]*)"` `<span class="s">"$1"</span>` $highlightedCode }}
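Hugo's replaceRE uses Go's regexp package (RE2 syntax), so a rule like this can be tried out in plain Go before it goes into the template. A minimal sketch, where highlightStrings is a hypothetical name:

```go
package main

import (
	"fmt"
	"regexp"
)

// stringRule matches a double-quoted run; inside a raw string and a
// character class, the quote needs no escaping.
var stringRule = regexp.MustCompile(`"([^"]*)"`)

// highlightStrings wraps each quoted run in a Chroma "s" (String) span,
// mirroring the replaceRE call above.
func highlightStrings(code string) string {
	return stringRule.ReplaceAllString(code, `<span class="s">"$1"</span>`)
}

func main() {
	// Prints: prop end_conversation <span class="s">"G'bye then."</span>
	fmt.Println(highlightStrings(`prop end_conversation "G'bye then."`))
}
```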

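For this to work, your site needs to be configured to emit highlighting classes rather than inline styles (noClasses = false under markup.highlight in the site configuration), along with a matching stylesheet, which hugo gen chromastyles can generate. You can use the comments in that generated stylesheet to work out which class names correspond to which token types (or just go by which colour combinations you like).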
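Keywords and comments follow the same pattern with different class names from the stylesheet: k for Keyword and c for Comment. The Go sketch below models a sequence of replaceRE lines as an ordered list of rules. The rule set is hypothetical: the keywords are taken from the Gruescript sample above, not from the actual hook, and they are only recognised at the start of a line.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a pattern with a replacement that wraps the match in a Chroma class.
type rule struct {
	re   *regexp.Regexp
	repl string
}

// gruescriptRules is a deliberately tiny, hypothetical rule set.
var gruescriptRules = []rule{
	// Whole-line "#" comments get the Comment class.
	{regexp.MustCompile(`(?m)^#.*$`), `<span class="c">$0</span>`},
	// A few line-initial keywords get the Keyword class.
	{regexp.MustCompile(`(?m)^(room|thing|prop|tags|desc|name|loc|carried)\b`), `<span class="k">$1</span>`},
}

// highlight applies each rule in order, like consecutive replaceRE lines.
func highlight(code string) string {
	for _, r := range gruescriptRules {
		code = r.re.ReplaceAllString(code, r.repl)
	}
	return code
}

func main() {
	fmt.Println(highlight("# AN OBJECT\nthing sword Courier's Blade\ncarried"))
}
```

Running it prints the comment line wrapped in a c span, and thing and carried each wrapped in a k span.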

There are obvious and severe limitations to using regular expressions to highlight code syntax. Go's regex engine does not support lookarounds, and even if it did, that way lies madness. You can get around some problems by rearranging the order of your replaceREs, but there's a hard limit to what you'll be able to achieve without proper parsing. The specific cases I've implemented work well enough with the snippets I've used them for, but are bound to fail for complex code that includes a lot of nesting (string interpolation, commented-out code, etc.). Even code that uses language keywords in strings will probably be highlighted incorrectly.
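The ordering problem is easy to reproduce. In the hypothetical Go sketch below, running the keyword rule before the string rule leaves class="k" in the text, and the string rule then treats "k" as a string literal, producing broken markup. Swapping the order fixes this particular case, but an unanchored keyword rule run after the string rule would start matching keywords inside already-highlighted strings, which is the failure mode described above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Two hypothetical rules, each equivalent to one replaceRE line.
var (
	keywordRule = regexp.MustCompile(`(?m)^(prop)\b`)
	stringRule  = regexp.MustCompile(`"([^"]*)"`)
)

// keywords wraps a line-initial "prop" in a Keyword span.
func keywords(s string) string {
	return keywordRule.ReplaceAllString(s, `<span class="k">$1</span>`)
}

// quoted wraps double-quoted text in a String span.
func quoted(s string) string {
	return stringRule.ReplaceAllString(s, `<span class="s">"$1"</span>`)
}

func main() {
	line := `prop end_conversation "G'bye then."`
	// Strings first, then keywords: well-formed output.
	fmt.Println(keywords(quoted(line)))
	// Keywords first, then strings: the "k" in class="k" is matched as a string.
	fmt.Println(quoted(keywords(line)))
}
```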

I’ve got one more render hook post coming after this one, showcasing an even crazier hack. It has to do with image render hooks. Stay tuned.


  1. Given that I'd written a Vim syntax highlighting file for the language, it seemed a pity not to have it highlighted on my blog.

  2. This was a very small one. Because Inform 7 is intended to mimic natural language, it uses very little highlighting.

