Public Learning

project-centric learning for becoming a software engineer

Week 3: A Recap


Continuing from last Sunday’s re-familiarization with React, I decided to create a small practical project. React is not hard to learn per se; most of its difficulty and complexity arises from handling bigger problems than what explanatory tutorials show.
I decided to write a reader application for Hacker News, my favorite place to read up on tech news (and one of the few remaining places on the internet where the comments are mostly still worth reading). I also knew that they provided an API for their content, so it felt like a good fit for what I was trying to achieve. I did not yet know that this harmless endeavour would provide much of last week’s coding work …

There’s not a lot to talk about for the React part. I decided to use the “old” class-based syntax for now, as I was already familiar with it. After watching the presentation of React’s new hook-based functional syntax (link), I felt a bit lukewarm about their new approach to doing things.
I’m not a big fan of leveraging ES6 JavaScript’s classes for components either, as it means dealing with syntactic sugar for what is something different under the hood. But now they were replacing it with … some more syntactic sugar. One of the pain points the React team identified in their shiny presentation was that developers would get confused by the use of this when passing class methods as props into components. I mean, there is nothing magical happening; you just have to bind those methods’ context to a specific component instance if you want to pass them into another one to be invoked there. Anyhow, I didn’t want to learn a completely new way of doing React as I’m not sure how much I’m going to need it, so I just stuck with what I had learned a couple of years ago. The scope of a Hacker News reader was small enough anyway not to run into many issues, or to even have to think about introducing a state management library like Redux.

The problems arose when I started reading up on the Hacker News API (documented here). Every news story is represented by an item with an id number. If you query the API for that item, you receive JSON like this:

  {
    "by" : "pjmlp",
    "descendants" : 981,
    "id" : 22075076,
    "kids" : [ 22078429, 22076393, 22075239, 22078182, ..., 22075256 ],
    "score" : 1220,
    "time" : 1579273048,
    "title" : "A Sad Day for Rust",
    "type" : "story",
    "url" : ""
  }

This looks fine until you notice that there are 981 comments on this story (indicated by the descendants field). But only the comments’ ids are returned (as kids - I omitted a bunch of those in the example above). So you’d have to query the API for all those ids to receive all the comment text and metadata, resulting in a whopping 981 separate HTTP requests to fetch everything.1 To make matters even worse, kids only contains the top-level comments (nowhere near 981). Each of those has another kids property with its direct child comments, those have kids again, and so on and so forth until you have the complete, nested comment structure. So you can only ever make a bunch of concurrent requests for one comment level at a time, because you don’t yet know the ids of their children.
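The recursion this forces on a client looks roughly like the sketch below. The fetchItem parameter is a stand-in for an HTTP call to the API’s item endpoint; it is injected here (my choice, not from the API docs) so that siblings can be fetched concurrently while each level still has to wait for its parent:

```javascript
// Fetch one item and, recursively, all of its descendant comments.
// Siblings are fetched in parallel via Promise.all, but a level's kids
// are only known once the parent item has arrived - which is exactly
// why loading 981 comments this way is so slow.
async function fetchCommentTree(id, fetchItem) {
  const item = await fetchItem(id);
  if (!item) return null; // deleted/dead items come back as null

  const kids = await Promise.all(
    (item.kids || []).map(kid => fetchCommentTree(kid, fetchItem))
  );
  return { ...item, kids: kids.filter(Boolean) };
}
```

Against the real API, fetchItem would wrap something like an axios GET for `/v0/item/<id>.json`.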

With an average API response received after ~300 ms, it became clear that loading all comments would never be fast enough to make my site usable. And that’s not even taking into account the DDoS-like hammering of their API. It didn’t take me long to decide on the only sensible solution to that problem: I had to build a local cache for all items.2

Making 30,000 HTTP Requests

Thankfully, the next item on the list of topics to explore this week was Redis, an in-memory data store that was perfectly suited for my intentions. I didn’t just want to create an HTTP proxy for the API (because it wouldn’t solve the problem of long initial loading times for each story); I wanted to preemptively cache all items from the latest stories. Then I could simply write a façade API with the same basic interface as the Hacker News one, which would be easy enough as the API mainly has one endpoint: returning items based on their id. Serving 900 requests from a machine in my LAN is still not optimal, but it should be fast enough in the age of Gigabit Ethernet.

My basic idea was this: create a cronjob-invoked cache builder that would regularly fetch all new items from the API and save them to a Redis store. Now Redis is pretty cool: since it holds everything in memory, it’s blazingly fast. As a key-value store with a few very helpful data structures for its values, lookup and insertion/deletion complexity would be O(1) for most of my use cases, allowing my API to complete a single HTTP request served from the cache in under 5 ms - that includes processing the request in Node, fetching the item data from Redis and sending back the response.
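A minimal sketch of what that façade lookup could boil down to, assuming items are cached as JSON strings under keys like `item:<id>` (the key naming and the injected store are my assumptions, not details from the project):

```javascript
// Look up a cached item by id. `store` is anything exposing an async
// get(key) - a node-redis client in production, a Map-backed stub in
// tests. The whole request is a single O(1) GET against Redis.
async function getCachedItem(store, id) {
  const raw = await store.get(`item:${id}`);
  return raw ? JSON.parse(raw) : null; // cache miss yields null, like the API
}
```

Because the function only depends on the `get` interface, the Redis client can be swapped for a stub without touching the lookup logic.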
I tried to estimate how many items I needed to initially cache (to at least fetch all stories and comments from the various top/new/best lists) and I arrived at a number of well over 30,000 items.
Well, if that’s how they design their API, so be it. My cache builder determines the range of all ids to fetch by finding the smallest story id in all lists and querying a helpful API endpoint that returns the current highest item id. My application then creates a queue (also managed by Redis, because hey, why not?) and pushes batches of ids into it. Those ids’ data gets fetched with a number of concurrent requests at certain intervals. I found 50 requests every two seconds to yield the best results. Any more and the axios library would start to occasionally throw connection errors.
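The batching and throttling described above can be sketched like this (a simplification: the real builder pulls batches from a Redis-managed queue, while here fetchBatch is an injected stand-in for firing 50 concurrent axios requests):

```javascript
// Split the full id range into batches of a fixed size.
function toBatches(ids, size = 50) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Drain the batches with a fixed delay between them - 50 requests every
// two seconds was the sweet spot before axios started throwing
// occasional connection errors.
async function drainQueue(ids, fetchBatch, delayMs = 2000) {
  for (const batch of toBatches(ids)) {
    await fetchBatch(batch); // e.g. Promise.all over 50 item requests
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
}
```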

There were only a couple of problems in what was otherwise a pretty straightforward project. The first one arose from me not realizing that certain ids sometimes return only a null response. I don’t know whether those are deleted or otherwise retracted items, but they confused my error handling. Normally, whenever a request might fail, I would requeue it. Since those null items would fail my guard clauses, they got requeued again and again, essentially sending my program into an endless loop (sorry, Hacker News, for trying to fetch those null items for, I don’t know, over an hour on Saturday?).
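In hindsight, the fix amounts to distinguishing three outcomes instead of two. Here is a sketch of that decision (the outcome names and the shape of the result object are mine, purely for illustration):

```javascript
// Decide what to do with the result of a single item fetch:
// - a failed request should be requeued and retried later
// - a null response is a dead/deleted item: remember it and never retry,
//   otherwise it gets requeued forever (the endless loop from above)
// - anything else is a real item to store in the cache
function classifyResult(outcome) {
  if (outcome.error) return 'requeue';
  if (outcome.item === null) return 'tombstone';
  return 'store';
}
```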
Error handling became another issue that gave me a few headaches. My inexperience with heavy asynchronous tasks in JavaScript led to some difficulties in trying to figure out where to catch and handle certain inevitable errors (like a failed request). I definitely need to read up some more on async patterns, especially when using the new async/await syntax. Right now my error handling looks as beautiful as a Morlock.
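One pattern that would have saved me some of those headaches is Promise.allSettled, which never rejects and so lets every request’s outcome be inspected in one place instead of scattering try/catch blocks through the call chain (a sketch, not the error handling the project actually uses):

```javascript
// Fetch a batch of ids concurrently and sort the outcomes: fulfilled
// results are kept, failed ids are collected for requeueing.
async function fetchBatch(ids, fetchItem) {
  const results = await Promise.allSettled(ids.map(id => fetchItem(id)));
  const ok = [];
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') ok.push(result.value);
    else failed.push(ids[i]); // requeue candidates
  });
  return { ok, failed };
}
```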

Still, in the end I achieved my goal: it takes around 20 minutes to build the initial data cache. Afterwards, fetching all new items every five minutes is no big deal and now I have my own local mirror of all Hacker News stories and comments from the last three months, together with a React app that displays them in my favored way. Well, I kind of neglected the React part, which is why I still have no way of displaying comments, but overall I am very pleased with how everything turned out. It was quite fun to build something with practical relevance.

Hacker News Reader App

Books and Docs

Redis managed to impress me, not just with how useful it is for so many use cases that I can already imagine, but also with how well laid-out its documentation is. The information for each of the available commands (like LPUSH here) is concise and explains everything necessary. Much to my delight, I also found that the company now standing behind Redis, Redis Labs, provides the excellent book Redis in Action for free on their site.

I also started reading The Pragmatic Programmer (The Pragmatic Bookshelf, 2019) this week. This book was gifted to me for Christmas by my sister, who does software engineering way beyond what I dabble in.
I have long resisted the temptation to read one of those “meta books” on programming, because I always felt that I lack the experience necessary to get the most out of them. I was only a few pages in, though, when I realized how much of the advice I could already apply in my daily work. Reading that book feels like discovering a treasure trove for programmers who consider themselves craftsmen/craftswomen, creating things to pride oneself on, preferring quality over quantity and deeply caring about both the process and the result.
I’m still far from where I want to be in that regard, but The Pragmatic Programmer offers some immensely helpful advice to pay heed to on the road.

Project Panic

In other news: work on my main project is going to start on January 27th.
Writing this down immediately induces a feeling best described as a mix of fear, panic and the sudden realization that there’s no more coffee. But if I want to maximize the effect of this three-month journey, I should get started on my real project as soon as possible. There is just one problem left: I still don’t have a good idea.
I spent a couple of hours last week scribbling down more ideas and evaluating some of them. The latter part is what I struggle with most. It’s hard to judge an early-phase idea for viability, especially if it involves technologies that I don’t know much about yet. How difficult is real-time socket communication? Would I be able to implement an openly published specification in the time that I have? Does it make sense to tackle something that already has a perfectly executed open-source solution?

My hesitation to write down even one of those barebones ideas demonstrates my insecurities. I don’t want to aim too high, but neither do I want to aim too low. Right now I don’t feel ready for anything, but I also freely admit that I could spend a year or two until I feel ready - but that’s time I simply don’t have.
So this week, I will take a few hours out of each day to focus on idea finding and evaluation again. And something will, nay, has to come out of it.

The rest of the time will be spent on the remaining high-priority items on my list of topics worth exploring: Docker (containerizing my Redis server is a natural next step), WebSockets, and Node.js worker threads. I’ll most likely also give my Hacker News project some more love in the evening hours, especially the frontend.

This is definitely going to be an interesting week.



Time spent this week: 46 hours3

  1. The documentation says that the API basically exposes their in-memory data structures and that it “isn’t so hot over the network”. I wholeheartedly agree. ↩︎

  2. Yes, the actual sensible solution would have been to use one of the third-party Hacker News APIs which provide a more sane data structure. Or to just not proceed at all. But where’s the fun in that? ↩︎

  3. I’m not going to mention again that this is just a rough estimate. This week included lots of coding for which there seems to be an inherent daily limit until I produce garbage that I’ll need to fix the next day. So I often took the evening hours off, which had an undeniably positive effect on my overall mental state. ↩︎