Source: Weirdly enough, I didn't even have to make this image. It was already part of the big icon collection I subscribe to.

Obligatory warning: This post is about some of the inner technical workings of the site. If that doesn't tickle your fancy, you might find this unfathomably boring.

Sometimes, when I've exhausted all other interesting or productive things to do, I'll blow the dust off the code that makes this site hum along, and tweak it in one way or another. I've got to say, it's a pretty thankless task. Brandon in 2015 didn't really know what he was doing when it came to building web services in Golang, so the core codebase has historically been…unkempt, to say the least. To say the most, it looked like it was haphazardly slapped together by a team of monkeys banging on typewriters in the wee morning hours at the end of a hackathon.

But with a few more years of professional experience working with these tools, and a couple late nights refactoring the site into submission, making changes and adding functionality isn't nearly as painful these days. So I decided to sit down and fix one of the weaker parts of the site: the search functionality.

I've talked about my original search implementation before. I was (slightly) young(er) and ambitious, and I wanted to do everything myself (except the frontend, because HTML/CSS/JavaScript are the devil). The search code looked something like this:

func buildATerribleSearchIndex() map[string]map[int64]bool {
  // Our "search index" is a big map from words to the post IDs that they
  // appear in.
  invIndex := make(map[string]map[int64]bool)

  for _, post := range getAllOfThePublishedPosts() {
    for _, word := range extractAllTheWordsFromThePostButDoItPoorly(post) {
      // If the word isn't in the index yet, make a submap for it.
      if _, ok := invIndex[word]; !ok {
        invIndex[word] = make(map[int64]bool)
      }
      invIndex[word][post.ID] = true
    }
  }

  return invIndex
}

I removed some of the error handling for brevity, and made the function names obnoxious because that's who I am, but this is basically what I had. There are a few major problems here:

  1. It's not persisted anywhere. It's stored in-memory. This was pure laziness on my part, and it means we have to rebuild the index every time our server dies. By virtue of the way App Engine manages server instances, it's pretty much guaranteed to kill your server randomly for fun and sport.
  2. We go through allllll the posts. Now I'm not the most prolific writer, but there are over 100 posts, and some of them are fairly long, and App Engine servers are small (by default), and the parsing/delimiting operation isn't exactly cheap. Reindexing all the posts before a page load was actually noticeably slow, on the order of a second.
  3. Index invalidation is hard. Say I want to edit a post (because someone points out I've spelled the word "truk" wrong, for example), how do I update this index? Well, I blow it away and start from scratch obviously! Instead of just removing references to the edited post and reindexing that, I reindex all the things. Again, because laziness.
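For the curious, here's roughly what the less lazy version of problem 3 could have looked like: drop every reference to the edited post, then reindex just that one post. This is a minimal sketch and not code the site ever ran. The `Post` and `Index` types, the method names, and the whitespace-based tokenizing are all assumptions made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Post is a hypothetical stand-in for the blog's real post type.
type Post struct {
	ID   int64
	Body string
}

// Index maps each word to the set of post IDs that contain it.
type Index map[string]map[int64]bool

// Remove deletes every reference to the given post ID and drops any word
// whose set of posts becomes empty.
func (idx Index) Remove(id int64) {
	for word, ids := range idx {
		delete(ids, id)
		if len(ids) == 0 {
			delete(idx, word) // deleting during range is safe in Go
		}
	}
}

// Add indexes a single post, splitting on whitespace (an assumption; the
// real tokenizer was more involved).
func (idx Index) Add(p Post) {
	for _, word := range strings.Fields(strings.ToLower(p.Body)) {
		if idx[word] == nil {
			idx[word] = make(map[int64]bool)
		}
		idx[word][p.ID] = true
	}
}

// Update reindexes one edited post without touching any other post.
func (idx Index) Update(p Post) {
	idx.Remove(p.ID)
	idx.Add(p)
}

func main() {
	idx := Index{}
	idx.Add(Post{ID: 1, Body: "my truk broke"})
	idx.Add(Post{ID: 2, Body: "my bike works"})

	// Fix the typo in post 1 and reindex only that post.
	idx.Update(Post{ID: 1, Body: "my truck broke"})

	fmt.Println(idx["truk"][1], idx["truck"][1], idx["my"][2])
	// prints: false true true
}
```

Note that `Remove` still walks every word in the index, so it's not free, but it avoids re-parsing every post. It also does nothing about problem 1: the index still lives in memory and vanishes when the server does.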

So, how do we solve any/all of those problems? Well, we put our huge ego aside and leave searching up to people that know what they're doing, let's say, Google, for example. Sure enough, Google has a search API for Go on App Engine, which is exactly what we want. Ripping out my own search code and subbing in calls to Google's API was fairly straightforward, with each post mapping to a unique document.

The result is a faster search system that's easier for me to maintain. The search results are also more accurate, thanks to Google's smarter tokenization rules. And since I'm not indexing anything crazy large, it's also free for me to run. Definitely a big win for the site.

There was just one little problem left to fix: keyword highlighting. When someone searches for something, they're probably interested in finding the text where that keyword occurs. Previously, I had home-rolled a simple (but highly questionable) highlighting system on the backend that would find the keyword and wrap it in a <span class="highlight"></span>. This worked except when your keyword also matched something inside an HTML tag (because I write my posts in normal HTML), at which point it would just unceremoniously clobber the HTML and mangle the search results.
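To make the failure concrete, here's a small sketch of the clobbering, plus one way a backend could sidestep it by only touching text that sits outside of tags. Neither function is the site's actual code; `naiveHighlight` and `highlightOutsideTags` are hypothetical names, and the tag detection is deliberately crude (it assumes no `>` inside attribute values, ignores comments and scripts, matches case-sensitively, and won't catch a keyword split across tags).

```go
package main

import (
	"fmt"
	"strings"
)

const (
	openTag  = `<span class="highlight">`
	closeTag = `</span>`
)

// naiveHighlight wraps every occurrence of keyword, including ones that
// happen to sit inside an HTML tag. This is the clobbering behavior.
func naiveHighlight(html, keyword string) string {
	return strings.ReplaceAll(html, keyword, openTag+keyword+closeTag)
}

// highlightOutsideTags wraps occurrences of keyword only in the text
// between tags, copying anything between '<' and '>' through untouched.
func highlightOutsideTags(html, keyword string) string {
	if keyword == "" {
		return html
	}
	var out strings.Builder
	for len(html) > 0 {
		// Everything up to the next '<' is plain text: highlight it.
		lt := strings.IndexByte(html, '<')
		if lt < 0 {
			lt = len(html)
		}
		out.WriteString(strings.ReplaceAll(html[:lt], keyword, openTag+keyword+closeTag))
		html = html[lt:]
		if html == "" {
			break
		}
		// Copy the tag verbatim, up to and including the closing '>'.
		gt := strings.IndexByte(html, '>')
		if gt < 0 {
			gt = len(html) - 1 // unterminated tag: copy the rest as-is
		}
		out.WriteString(html[:gt+1])
		html = html[gt+1:]
	}
	return out.String()
}

func main() {
	post := `<a class="span">a span</a>`
	fmt.Println(naiveHighlight(post, "span"))       // mangles the class attribute
	fmt.Println(highlightOutsideTags(post, "span")) // leaves the tag alone
}
```

A real fix on the backend would want a proper HTML parser rather than this byte scanning, which is a big part of why pushing the job to the frontend is appealing.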

I figured, if I'm putting in the effort to make search better, I might as well make the whole experience actually work. So I sprinkled some mark.js magic on the search page in the frontend, and all was well with the end-to-end search experience. Well, except for the fact that the search box got moved into obscurity at the bottom of the page after my last site redesign, but I'll probably fix that in a future update. Until then, feel free to scroll allllll the way to the bottom of the page and give the search functionality a try. As always, email me or drop a comment if you have any problems.

Next Up

Like I alluded to in my last post, I spent a lot of time over the past few months not writing blog posts, and instead building different websites as Christmas presents for my friends and family. This was mostly an excuse to learn how to use modern frontend technologies for building responsive single-page applications, including actual frontend tests and minified/productionized assets. I'm currently in the process of rewriting the blog as two separate pieces: an API server that speaks a basic CRUD protocol to serve posts, questions, etc., and a static frontend that makes calls to this API server. If all goes well, nobody will even notice when I roll it out.
