Research first

I’ve quoted the story of Master Wq and the Markdown acolyte before on this blog:

A Markdown acolyte came to Master Wq to demonstrate his Vim plugin.

“See, master,” he said, “I have nearly finished the Vim macros that translate Markdown into HTML. My functions interweave, my parser is a paragon of efficiency, and the results nearly flawless. I daresay I have mastered Vimscript, and my work will validate Vim as a modern editor for the enlightened developer! Have I done rightly?”

Master Wq read the acolyte’s code for several minutes without saying anything. Then he opened a Markdown document, and typed:
:%!markdown
HTML filled the buffer instantly. The acolyte began to cry.
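For readers who haven’t met it, `:%!markdown` pipes every line of the buffer through an external `markdown` program and replaces the buffer with whatever that program prints. A rough Python analogue of the same idea is sketched below. The helper name `filter_through` is made up for illustration, and the stand-in filter is a Python one-liner so the sketch runs without a Markdown converter installed.

```python
import subprocess
import sys


def filter_through(text: str, command: list[str]) -> str:
    """Send text to the command's stdin and return its stdout, like Vim's :%!cmd."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the filter exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical stand-in filter: upper-cases whatever it reads on stdin.
    upper = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    print(filter_through("hello, buffer\n", upper), end="")
```

If a `markdown` executable happens to be on your PATH, passing `["markdown"]` as the command should behave the way the parable describes, though that depends on which converter you have installed.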

In addition to showing off one of the little features that make Vim such a great editor, this little parable illustrates a broader point about modern software development, one that’s essential if you want to achieve practical things without wasting a lot of time: if you have a problem that appears to call for a code solution, you should always evaluate existing solutions thoroughly before writing one yourself.

I’ve made the mistake of not doing so a few times myself. Most recently, I was working on a project that involved a fair bit of browser automation. Or rather, I looked at the problem, thought about similar problems I’d solved in the past and what I’d done to solve them, and came to the conclusion that browser automation was the correct approach. I decided all of this without making a single web search or questioning any of the assumptions I was making about the problem space.

In the abstract, if you need to write code to interface with a web-based system of some kind, there are four general approaches you can take, depending on the affordances and restrictions of the system and its ecosystem. In ascending order of both implementation difficulty and brittleness of the solution, these are:

1. Find a library in your chosen programming language that wraps the target system’s API. This is the easiest and best approach, but it depends on the existence of such a library and such an API.
2. Interact with the target system’s API through raw requests. This depends on the existence of such an API, and will be more or less difficult depending on the API’s documentation.
3. Scrape the target system’s web pages. You have to do this if there isn’t an API that can give you the data you want in a nice format.
4. Use browser automation to simulate a real user interacting with the system. You have to do this if there’s too much JavaScript for you to scrape anything meaningful.
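To make the jump in brittleness concrete, here is a minimal sketch comparing the second and third approaches on the same record. Everything in it is invented for illustration: the JSON payload, the HTML page, and the names `title_from_api`, `TitleScraper` and `title_from_html` don’t come from any real system. It uses only Python’s standard library and makes no network requests.

```python
import json
from html.parser import HTMLParser
from typing import Optional

# Hypothetical payloads describing the same record.
API_RESPONSE = '{"id": 42, "title": "Research first"}'
HTML_PAGE = """
<html><body>
  <div class="post" data-id="42">
    <h1 class="title">Research first</h1>
  </div>
</body></html>
"""


def title_from_api(body: str) -> str:
    """Raw API approach: the data is already structured, so one lookup suffices."""
    return json.loads(body)["title"]


class TitleScraper(HTMLParser):
    """Scraping approach: walk the markup and depend on its tag and class names."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "h1" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = data.strip()


def title_from_html(page: str) -> Optional[str]:
    scraper = TitleScraper()
    scraper.feed(page)
    return scraper.title


if __name__ == "__main__":
    print(title_from_api(API_RESPONSE))
    print(title_from_html(HTML_PAGE))
    # A cosmetic change to the page silently breaks the scraper.
    print(title_from_html(HTML_PAGE.replace('class="title"', 'class="headline"')))
```

The scraper needs several times as much code as the API call, and it stops finding the title as soon as someone renames a CSS class. Browser automation adds timing and rendering on top of that fragility.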

In this instance, I jumped directly to number 4 without properly interrogating whether 1 through 3 were possible, just because previous experience with similar but not identical problems told me they weren’t.

After wrestling with time delays and picking through HTML soup for a few days, I had a janky version of what I wanted that basically worked, provided the internet speed stayed roughly constant and you were prepared to wait a few minutes for the thing to run. It was at this point that I started searching around for alternate methods to achieve at least some of the things I’d implemented. I imagined that I might be able to engineer some kind of hybrid solution, perhaps between (4) and (2).

Imagine my shock when I found a relevant tutorial that used method (1). There was a nice library for doing everything I wanted to do, written in the language I was using. I reimplemented everything in a few hours, using less code, and ended up with a solution that was infinitely faster and more reliable, allowing me to proceed with other aspects of the project far quicker than I’d expected to.

Lesson learnt: before writing any code, do some research. Search for what you’re trying to do, in whole and in part. Anticipate some of the searches you might do while writing your code, and make those in advance. Make sure that you eliminate the easy solutions before moving on to the hard ones. Don’t reimplement :%!markdown in Vimscript, and don’t do with browser automation what you can accomplish with a library.
