Public Learning

project-centric learning for becoming a software engineer

Week 7: A Recap

Regaining Confidence

Last week’s blog post must have read like a message from the deep pits of despair, and I certainly felt trapped down there. Since then, I have at least rediscovered a fresh source of determination and have started actually writing production-ready code. Not as much as I would like, but there is at least a public repository now that will be an incorruptible record of any further progress. And it might just be a footnote considering all the challenges I’m still facing, but I have decided to name my project 200 OK, a name that is supposed to mirror the flexibility of the final product.
There are still lingering doubts, but there is so much low-hanging fruit right now that I will not run out of code to write even if I keep evading the elephants in the room, with the structure of my data persistence layer being the most pressing problem to solve.

Dumping Data

In most classic web applications, the database is a known quantity. Most ORMs even have a hard requirement for pre-defined models that specify the exact schema in which data will be expected. For 200 OK, nothing is set in stone until I actually receive a POST request. This makes it surprisingly tricky to decide how to store user data.
For example, a user might create data by POSTing to a /users route. This is relatively easy to handle: in a document store like MongoDB, the users resource would become a collection, and every JSON payload received gets stored as a document inside that collection, adding only an id field (if not already present) to allow retrieval of that specific document later (via a GET request to /users/:id).
Matters get more complicated as soon as collections get nested: RESTful API design encourages nested parent-child relationships, so a user might also have comments associated with her, each one represented by the /users/:id/comments/:id route. Translating that into a database schema poses the question: how will the relationship between users and comments be represented?

In a relational SQL database, one would normally create two tables, users and comments, with a foreign key constraint linking each comment to a specific user. But the API might also receive a POST request to the /comments resource, a collection of comments that might be completely unrelated to the comments associated with each user. One solution might be to simply create tables with unique names (like users_comments for the user-specific comments and comments for the top-level collection of other comments) and implement multiple id fields to uniquely identify each comment.
However, the general problem of creating SQL tables on the fly remains: without any idea of the data that will get saved, the only option would be to dump everything into a JSON-typed column and add only primary and/or foreign keys. That hardly seems like proper SQL usage and is widely discouraged as bad practice.
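To illustrate what that on-the-fly approach would look like, here is a sketch that derives a unique table name from a nested route and builds the kind of CREATE TABLE statement the API would have to run. The SQL dialect (PostgreSQL-style JSONB) and all names are assumptions, not actual 200 OK code:

```javascript
// '/users/:id/comments' -> 'users_comments': drop parameters and join
// the remaining path segments into a unique table name.
function tableNameFromRoute(route) {
  return route
    .split('/')
    .filter((part) => part !== '' && !part.startsWith(':'))
    .join('_');
}

// With no schema known in advance, everything except the keys ends up
// in a single JSON column -- exactly the pattern that feels like
// discouraged SQL usage.
function createTableSql(route) {
  const table = tableNameFromRoute(route);
  return `CREATE TABLE IF NOT EXISTS ${table} (
  id UUID PRIMARY KEY,
  parent_id UUID,
  payload JSONB NOT NULL
)`;
}
```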

Another option could be to simply construct nested JSON properties. As soon as the first payload for a user comment is received, it gets added to a comments array property on that specific user. This simplifies handling the relationship between the data, but makes retrieval a messy effort: a request to /users/42 is supposed to return only the actual user data, not the data of every nested resource as well. So a pure JSON solution would require heavy filtering and selecting of JSON properties to return any requested data or subsets thereof.
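The filtering that a pure-JSON approach would force on every GET can be sketched as follows; how the set of nested collection names gets tracked is left open, and the function is a hypothetical illustration:

```javascript
// Strip nested collection properties from a document before returning
// it, so GET /users/42 answers with the user itself and not with every
// embedded resource.
function stripNestedCollections(doc, nestedNames) {
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    if (!nestedNames.has(key)) result[key] = value;
  }
  return result;
}
```

And that only covers the simple case; returning a subset of a nested collection would need yet more selection logic on top.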

Storing collections in a non-relational database (e.g. a document store like MongoDB) comes with the awkwardness of having to handle the relationships between collections like users and users_comments. If a user gets deleted, the associated comments need to be removed as well, a task ill-suited to a NoSQL database. That would require application logic to handle, adding responsibilities and complications to the API backend that should not live there.
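The cascade that the application layer would have to own looks roughly like this, modeled here on plain in-memory arrays instead of a real MongoDB connection; collection names mirror the users/users_comments example above:

```javascript
// Application-level cascade delete: with no foreign key constraints,
// orphaned comments must be cleaned up by hand after the user is gone.
function deleteUserCascade(db, userId) {
  db.users = db.users.filter((u) => u.id !== userId);
  db.users_comments = db.users_comments.filter((c) => c.userId !== userId);
}
```

Every new parent-child relationship would add another line of cleanup like this, which is precisely the kind of responsibility that should not accumulate in the API backend.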

There seems to be no obvious solution that is noticeably better than the approaches described above, so I find myself in a difficult spot. The aspects mentioned here are only a small part of my considerations; there is a Markdown file inside the repo that contains longer explanations and alternative approaches to the problem space.
I can implement a naive solution using a simple document store like MongoDB for now and separate it well enough to be replaceable later on, but eventually I will need a permanent solution, and I would prefer to find it before I start writing hundreds of lines of code for it.

Fantastic Errors And Where To Handle Them

Database issues aside, I spent quite a bit of time laying the proper groundwork for handling requests and responses to the API. The Design of Web APIs, the book mentioned two weeks ago, provided some good recommendations on how to properly respond to malformed, invalid or otherwise failing requests. Following the examples set by the book, I started implementing functionality to centrally handle response body construction in the case of an error, setting the correct (and helpful) status codes and headers so that no ambiguity remains in the case of a failed request.
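A minimal sketch of what centralized error responses could look like: map an error kind to a status code and a machine-readable body, so every failing request is answered consistently. The error kinds and body shape here are assumptions inspired by the book, not actual 200 OK code:

```javascript
// Central lookup from error kind to HTTP status code; unknown kinds
// fall back to a generic 500.
const ERROR_STATUS = {
  MALFORMED_BODY: 400,
  NOT_FOUND: 404,
  INTERNAL: 500,
};

// Build the full response in one place: status, headers and a
// machine-readable JSON body describing what went wrong.
function errorResponse(kind, detail) {
  return {
    status: ERROR_STATUS[kind] ?? 500,
    headers: { 'Content-Type': 'application/json' },
    body: { error: kind, detail },
  };
}
```

Route handlers would then hand any failure to this one function instead of assembling status codes and bodies ad hoc.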
The same goes for handling any errors that might occur at runtime. The worst thing that can happen to a Node.js app is to crash, so preventing that by trying to catch all errors is a given. The difficulty lies in determining how to respond to each error and how it may affect the running application.
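One common Node.js pattern for making that determination, sketched here as a hypothetical helper rather than anything prescribed by 200 OK: distinguish operational errors (a dropped database connection, bad input) that the app can safely answer, from programmer errors where the process state is suspect and a controlled restart is safer:

```javascript
// Operational errors are expected failures that the app can respond to
// without endangering the process.
class OperationalError extends Error {
  constructor(message) {
    super(message);
    this.operational = true;
  }
}

// Decide how a caught error should be handled: 'respond' with an error
// response, or 'crash' (and let a supervisor restart the process).
function reactionTo(err) {
  return err instanceof OperationalError || err.operational === true
    ? 'respond'
    : 'crash';
}
```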
This stuff may seem trivial, but it’s crucial for any application. Finding good advice on how to deal with database errors (like a dropped connection) and JavaScript exceptions is difficult, so I didn’t feel confident wiring those early parts of my code together. Still, with every line of code, I become more confident in the whole application, even though progress is still slower than I anticipated. Writing software is hard, y’all.



Time spent this week: 35 hours