This post is a continuation of our series about paying down technical debt and re-architecting our platform. You can read the introductory post here: Paying Technical Debt to Focus on the Future.
Like many other companies that underwent massive growth in a short period, Trulia has accumulated a large, monolithic codebase. Developers working within this monolith became bogged down with bugs due to the increasingly intertwined code structure. We’d made some headway breaking things up into microservices, which has helped quite a bit on the backend, but our frontend code was still suffering. We needed a microservice approach for shipping pages and UI components.
The Legacy Platform
Before we began building our new platform, we thought about some of the problems within our existing system and set out some goals for our new platform:
The legacy platform's environment was difficult to replicate on a local machine. As a result, most developers would make changes locally and sync them up to the remote environment. This was a slow and painful process, especially when running build scripts for CSS and JS assets.
Platform Goal: Provide a fast, local, development environment.
Reusing Existing Solutions
As a workaround, teams would split their code into separate modules, each with its own environment. This made it difficult to share configuration or functionality across teams. Developers would recreate existing solutions because they either didn’t know those solutions existed or found that pulling in the functionality was more work than it was worth.
Platform Goal: Provide a consistent approach for developers to create or modify functionality for all teams to consume.
The structure of our existing platform made it difficult to add new tooling or modify existing tooling. Since most of the developers were working within the remote development environment, they were limited to the tools supported by that environment. Furthermore, implementing a new tool across the whole codebase required a lot more effort than doing so on a small subsection.
Platform Goal: Provide teams with a base level of tooling that they can extend within their own products.
And finally, the most important problem we needed to address was that our user experience reflected our code structure: navigating across pages was slow, and each page felt like it was built independently of the others. This was because few shared assets carried over when navigating from one page to another.
Platform Goal: Create a fast, seamless, user experience using PWA / SPA-like functionality.
The New Platform
With our new platform goals defined, let’s look at the two main concepts and technologies that make up our new frontend platform.
One of the potential downsides to a monolithic codebase is the difficulty in isolating changes and making smaller releases that don’t run the risk of breaking another team’s code. On the other hand, with teams working in separate codebases, it becomes difficult to build a seamless and cohesive experience across an app. The approach we ended up taking lands somewhere in the middle of these two.
To keep the experience seamless, we looked at the typical user’s journey through Trulia and found the average user navigates between the same few pages for most of their visit. For example, a user might enter on the homepage, execute a search on the search results page, and then alternate between the search and property pages. This collection of pages creates a natural grouping of code which we refer to as an island.
As we envisioned the flow of creating application islands to deliver microservice-oriented UI, it was clear that developers should not be concerned with all the common requirements of microservices, let alone the site-wide UI requirements for style, content, and telemetry. We decided to build a microservice chassis for our React applications.
A conventional microservice chassis allows developers to focus on the business problem by providing an implementation that fills basic service requirements such as externalized configuration, logging, health checks, metrics, and distributed tracing.
To ensure that our UI islands are consistent from a technical and UX design standpoint, our UI chassis includes tracking for UI analytics, the content header and footer, and the login dialog.
When we began efforts to break out the monolith into services, we ran into a new problem. Teams began to reinvent the wheel every time they had to build shared functionality. Additionally, our islands only contain the UI components specific to their pages, so we needed a way to augment those islands with all of this functionality. This is where the microservice chassis, our application shell, comes in.
The primary job of the app shell is to provide shared functionality to the islands and a layer of abstraction between our islands and the underlying technology. We all know frameworks and tooling in the frontend world change with the seasons. For this reason, we decided to use the app shell to insulate our applications from frequent, far-reaching changes so that, when the time comes, we can implement those changes in the app shell with minimal impact on the islands.
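To make the abstraction layer concrete, here is a minimal sketch of how an island could be written against a small contract that the app shell implements. This is an illustration under stated assumptions, not our actual implementation: the names `ShellServices`, `createShell`, `IslandPage`, and `searchPage` are all hypothetical, and the page returns a plain string rather than a React element to keep the example dependency-free.

```typescript
// Hypothetical contract between the app shell and an island.
// Islands only ever see this interface, never the underlying libraries.
interface ShellServices {
  track(event: string, payload?: Record<string, unknown>): void;
  config(key: string): string | undefined;
}

// An island page is written against the contract only (assumed shape).
type IslandPage = (services: ShellServices) => string;

interface TrackedEvent {
  event: string;
  payload: Record<string, unknown>;
}

// The shell owns the concrete implementations. Swapping the analytics
// or configuration backend later means changing only this function.
function createShell(
  settings: Record<string, string>,
  sink: TrackedEvent[]
): ShellServices {
  return {
    track: (event, payload = {}) => {
      sink.push({ event, payload });
    },
    config: (key) => settings[key],
  };
}

// A hypothetical island page that uses the shared functionality.
const searchPage: IslandPage = (services) => {
  services.track("page_view", { page: "search" });
  const title = services.config("siteName") ?? "Untitled";
  return `<h1>${title}: Search</h1>`;
};
```

Because `searchPage` depends only on `ShellServices`, a change of tracking library or configuration source would stay inside `createShell`, which is the kind of insulation described above.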
When deciding on the tools for building our new stack, one of the main criteria was to choose solutions that could provide a consistent developer experience across our islands. Frontend teams were facing several challenges that we were aiming to solve with newer tooling:
Building a server-side rendering solution that produces markup for a given React component isn’t difficult, but a solution that can pass that markup down to the client and hydrate the local state across pages is significantly more complex. Adding Webpack to the equation only compounds those challenges. We selected Next.js, a lightweight framework that provides an out-of-the-box solution, to address this issue. Working with ZEIT, the creators of Next.js, we were able to encapsulate those complexities while also gaining the ability to optimize interactions that span islands using the concept of ‘zones’, which we co-developed.
GraphQL combined with apollo-client removes the need for your frontend application to be concerned with how or where your data comes from. You define the schema of the data using GraphQL and apollo-client takes care of retrieving it from the GraphQL server. The GraphQL server also provides a layer of abstraction for our legacy services. Even if the frontend application and the backend services are not in sync with the expected response structure, we could fix that at the GraphQL layer and then update those services when the time comes. (We will have more information about our GraphQL implementation in an upcoming post.)
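As a rough illustration of fixing a response mismatch at the GraphQL layer, the sketch below maps a legacy service payload onto the shape a frontend schema might expect. Everything here is assumed for the example: the `LegacyListing` and `Property` field names, the `toProperty` mapper, and the `makeResolvers` helper are hypothetical, and no GraphQL library is used. The resolver map only follows the general "one function per field" style that GraphQL servers commonly use.

```typescript
// Hypothetical payload returned by a legacy backend service.
interface LegacyListing {
  prop_id: number;
  addr_line: string;
  price_usd_cents: number;
}

// Shape the frontend's GraphQL schema is assumed to expose.
interface Property {
  id: string;
  address: string;
  price: number; // whole US dollars
}

// The mismatch between the two shapes is reconciled in one place.
function toProperty(legacy: LegacyListing): Property {
  return {
    id: String(legacy.prop_id),
    address: legacy.addr_line.trim(),
    price: Math.round(legacy.price_usd_cents / 100),
  };
}

// Stand-in for whatever client actually calls the legacy service.
type FetchLegacy = (id: string) => Promise<LegacyListing>;

// Resolver map: the frontend asks for `property` and never sees
// the legacy field names.
function makeResolvers(fetchLegacy: FetchLegacy) {
  return {
    Query: {
      property: async (
        _parent: unknown,
        args: { id: string }
      ): Promise<Property> => toProperty(await fetchLegacy(args.id)),
    },
  };
}
```

If the legacy service later adopts the new field names, only `toProperty` needs to change, and the frontend queries stay the same.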
And at the core of all of this is React, which allows us to build encapsulated components that can easily be augmented using props and context.
Bringing It All Together
Once we had our islands and app shell in place, we needed to provide a way for the app shell to pull in the islands and hook into the pages and routes provided by the island. Our simple solution was to expose a configuration file that would inform the app shell about the island’s routes and their corresponding entry points.
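A configuration file like the one described above might look something like the following sketch. The field names (`routes`, `pattern`, `entryPoint`), the `:param` pattern syntax, and the `resolveRoute` helper are assumptions made for illustration; they are not the actual format our app shell consumes.

```typescript
// Hypothetical shape of the configuration an island exposes.
interface IslandRoute {
  pattern: string;    // URL pattern, e.g. "/property/:id"
  entryPoint: string; // module path of the page within the island
}

interface IslandConfig {
  name: string;
  routes: IslandRoute[];
}

interface RouteMatch {
  island: string;
  entryPoint: string;
  params: Record<string, string>;
}

// Compare a pattern and a path segment by segment.
// Segments starting with ":" capture the corresponding path value.
function matchPattern(
  pattern: string,
  path: string
): Record<string, string> | null {
  const patternParts = pattern.split("/").filter(Boolean);
  const pathParts = path.split("/").filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;

  const params: Record<string, string> = {};
  for (let i = 0; i < patternParts.length; i++) {
    const expected = patternParts[i];
    const actual = pathParts[i];
    if (expected.charAt(0) === ":") {
      params[expected.slice(1)] = decodeURIComponent(actual);
    } else if (expected !== actual) {
      return null;
    }
  }
  return params;
}

// The app shell walks every island's routes and returns the first match.
function resolveRoute(
  islands: IslandConfig[],
  path: string
): RouteMatch | null {
  for (const island of islands) {
    for (const route of island.routes) {
      const params = matchPattern(route.pattern, path);
      if (params) {
        return { island: island.name, entryPoint: route.entryPoint, params };
      }
    }
  }
  return null;
}

// Example configuration for one hypothetical island.
const exampleIslands: IslandConfig[] = [
  {
    name: "search",
    routes: [
      { pattern: "/", entryPoint: "pages/home" },
      { pattern: "/search", entryPoint: "pages/search" },
      { pattern: "/property/:id", entryPoint: "pages/property" },
    ],
  },
];
```

With this configuration, `resolveRoute(exampleIslands, "/property/123")` resolves to the `search` island's `pages/property` entry point with `id` set to `"123"`, while a path that no island declares returns `null` so the shell can fall back to other handling.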
Next, we needed to extend the entry points with the functionality provided by the app shell. Initially, we created an App component that was responsible for passing down all of the shared functionality, but with the release of Next.js 6.0, the project has incorporated this concept in the form of the _app.js template.
The app shell and island concepts have been proven in practice: we were able to do large-scale refactors of the code within the app shell with minimal changes on the side of the islands. As of this posting, we have three islands in production, and more under development. With each island we build, we are evaluating and improving on some of the design decisions we made early on.
Future posts in this series will dive into some of the specific implementations of the platform including how we use GraphQL, optimizing performance, and other learnings gained along the way.