How Does the raywenderlich.com Tech Stack Work for Books?

Want a sneak peek at part of the tech stack behind raywenderlich.com? Come learn how our online book reader works — it’s more complicated than you think! By Chris Belanger.

In all the excitement of launching our new online reading experience for books, we forgot that, for developers, half of the fun of a new technology is learning how it works underneath!

Although the front end of our online books looks and functions like you’d expect any modern website to, there are a lot of moving parts behind the scenes that transform books from raw Markdown into the online reading experience you can try for yourself.

I sat down with our CTO, Sam Davies, who explained a little about how the book repositories work, how we use continuous integration to build the books, and which new services we had to build at raywenderlich.com to bring you the online reading experience.

Book Markdown Repositories

Q. Hey Sam. We’ve had our books in Markdown for some time now, but what did you have to do to make the book repos work in this new model?

It was important from the outset that not only should the migration to our online books be as light-touch as possible for our book teams, but also that our legacy publishing chain continued to work.

The books are written in a variant of Markdown that very closely resembles CommonMark, so we were able to achieve the migration without making any changes to the book content at all. That meant we only had to make a couple of changes to the book repos to support the new publishing system.

A sample of the metadata from our book, Swift Apprentice

First, we added extensive metadata to each book to record details such as contributors, artwork assets and chapter descriptions. Second, we formalized the branching strategy so that it would be possible to definitively reason about where different editions of a book live.

As a side effect of this project, we were also able to split the private Markdown content of each book from the book’s sample projects and materials. A single book now has two repos: a private one for the Markdown source that represents the body of the book, and a public one for the sample projects. This gives us more manageably sized repos and a clear separation of concerns between them.

Despite aiming to minimize the repo changes required, in order to reduce disruption to our book teams, there was still a fairly involved migration effort to add the newly required metadata, and check that each repo conformed to the new paradigm.

GitHub Actions and CI

Q. So, a book starts life as a repo that has the book content stored as Markdown. Merging to the main branch triggers the CI that lints and builds the book, but can you tell me a little about what the CI does and what the stack looks like there?

We use GitHub Actions for CI, since it is easy to integrate and easy for our book teams to access.

The CI runs two processes: a linting pass, followed by a publication pass. During linting, we check some basic things: Is the metadata valid? Do all the images referenced in the Markdown exist? Does the referenced edition number match the git branch? We have plans to extend these operations in future to catch other common errors as we uncover them.
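The three linting checks described above can be sketched in a few lines of Ruby. This is only an illustration, not the actual robles code: the method name `lint_book`, the metadata keys, and the `editions/<number>` branch naming convention are all assumptions made for the example.

```ruby
require "set"

# Hypothetical lint pass. The required keys and the "editions/<n>" branch
# convention are assumptions for illustration, not the real robles rules.
REQUIRED_KEYS = %w[title edition contributors].freeze

def lint_book(metadata:, markdown:, available_images:, branch:)
  errors = []

  # 1. Is the metadata valid? Here "valid" only means required keys exist.
  missing = REQUIRED_KEYS.reject { |key| metadata.key?(key) }
  errors << "metadata missing keys: #{missing.join(', ')}" unless missing.empty?

  # 2. Do all the images referenced in the Markdown exist?
  referenced = markdown.scan(/!\[[^\]]*\]\(([^)\s]+)/).flatten
  available = available_images.to_set
  referenced.each do |path|
    errors << "missing image: #{path}" unless available.include?(path)
  end

  # 3. Does the referenced edition number match the git branch?
  expected_branch = "editions/#{metadata['edition']}"
  unless branch == expected_branch
    errors << "edition #{metadata['edition']} does not match branch #{branch}"
  end

  errors
end
```

An empty array means the book passed. Returning every error at once, rather than stopping at the first, lets an author fix several problems in a single pass.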

GitHub Actions makes linting and building the online books a breeze

Once the book passes the linting check, the publication pass renders the entire book. It converts the book’s Markdown into HTML, applying our in-house Markdown extensions to CommonMark as it goes, then packages the entire book into a JSON payload and sends it to our book data store, known as alexandria.
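To make the shape of the publication pass concrete, here is a rough Ruby sketch of packaging rendered chapters into a JSON payload. The payload field names and the `render_chapter` stub are assumptions: the real renderer handles CommonMark plus the in-house extensions, and the step that sends the payload to alexandria is left out because its API is not described here.

```ruby
require "json"

# Stand-in for a real Markdown renderer (hypothetical): wraps each
# blank-line-separated block in a <p> tag so the example needs no gems.
# It does not escape HTML or understand any Markdown syntax.
def render_chapter(markdown)
  markdown.split(/\n{2,}/).map { |block| "<p>#{block.strip}</p>" }.join("\n")
end

# Builds a payload hash for one book. The keys used here are illustrative
# assumptions, not the actual format that alexandria accepts.
def build_payload(metadata, chapters)
  rendered = chapters.each_with_index.map do |chapter, index|
    {
      "ordinal" => index + 1,
      "title" => chapter.fetch(:title),
      "body_html" => render_chapter(chapter.fetch(:markdown))
    }
  end

  { "metadata" => metadata, "chapters" => rendered }
end

payload = build_payload(
  { "title" => "Sample Book", "edition" => "1.0" },
  [{ title: "Hello", markdown: "First paragraph.\n\nSecond paragraph." }]
)
json = JSON.generate(payload)
```

Rendering everything up front and shipping a single document means the data store can accept a whole edition in one request, at the cost of a larger payload.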

The entire CI operation is wrapped up in a Dockerized app we call robles. This lets us be relatively agnostic in terms of our CI provider, but, more importantly, it lets our book teams run the linter and HTML live preview locally, which helps to speed up the writing and editing processes in our pipeline.

Linting and the Content Store

Q. Let’s talk a bit about those two new services you had to create: robles and alexandria. What tech are they built on? What purpose does each serve?

We build nearly all of our tools in Ruby, and robles and alexandria are no exception.

robles is named after Ángela Ruiz Robles, who invented the ebook. robles is a Ruby CLI app that can lint, render and publish the book repos, and lives inside a Docker container. It runs both in CI and locally, to provide authors with the local preview I mentioned earlier. Docker makes distribution and execution of robles considerably easier than having to deal with dependencies across a whole host of platforms.

alexandria is named after the famous library which may or may not have burnt down. It is a Ruby on Rails app that acts as the content store for all our books. It provides an API to which robles can publish the books, and also provides an admin user interface that displays the metadata and publication status of each of the books.

In addition to the data store, our alexandria service also provides admin functions on the backend

Now, alexandria isn’t the site that readers see when they look at one of our online books. The front end of raywenderlich.com is called carolus (after Johann Carolus, the publisher of the first newspaper), so alexandria also provides change notifications and an API from which carolus requests the data it displays to readers.

Front-End Rendering

Q. The books look great on the front end. How do you generate the rendered book content to display on the website?

The cool thing about displaying the book content is that it’s just HTML — and robles does an excellent job of generating that, so we’re pretty much done, right? Well, it’s not quite that simple.

In order to support some of the user experience around the online books, carolus (the site you see when you visit raywenderlich.com) processes the HTML it receives from alexandria, which is the book data store. We generate a table of contents from the Markdown headings in the chapter content, and then we break the entire book into separate paragraphs.

We then index these paragraphs to support our integrated full-book search, which helps provide accurate in-book search results. The paragraphs and their indexing are also an integral part of the highlighting and notebook feature.
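The processing described above (table of contents, paragraph splitting, and a search index) might look something like the following Ruby sketch. The method names and the index structure are assumptions for illustration, and regular expressions stand in for a proper HTML parser to keep the example free of dependencies; carolus itself may work quite differently.

```ruby
require "set"

# Removes inline tags such as <em> so only the text remains.
def strip_tags(html)
  html.gsub(/<[^>]+>/, "").strip
end

# Collects headings (h1 to h6) in document order as a flat table of contents.
def table_of_contents(html)
  html.scan(%r{<h([1-6])(?:\s[^>]*)?>(.*?)</h\1>}m).map do |level, text|
    { level: level.to_i, title: strip_tags(text) }
  end
end

# Splits chapter HTML into paragraph texts; the array position is the id.
def paragraphs(html)
  html.scan(%r{<p(?:\s[^>]*)?>(.*?)</p>}m).flatten.map { |text| strip_tags(text) }
end

# Builds an inverted index: word => set of paragraph ids containing it.
def build_index(paragraph_texts)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  paragraph_texts.each_with_index do |text, id|
    text.downcase.scan(/[a-z0-9']+/).each { |word| index[word] << id }
  end
  index
end

# Returns the ids of paragraphs that contain every word in the query.
def search(index, query)
  words = query.downcase.scan(/[a-z0-9']+/)
  return [] if words.empty?

  words.map { |word| index.fetch(word, Set.new) }.reduce(:&).to_a.sort
end
```

Because each paragraph has a stable id, the same ids could also serve as anchors for highlights and notes, which is one plausible reason the paragraphs underpin both search and the notebook feature.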