Trulia has been pushing to migrate away from dedicated datacenter computing to cloud-based infrastructure for the past few years in order to scale with flexibility and better respond to customer needs. We were looking forward to the benefits of the new infrastructure but had to contend with some challenges. These included mitigating the higher price of cloud computing compared to the systems we owned in the datacenter, and adopting new technology to take advantage of the cloud environment.
In order to address these challenges, we undertook a new approach and began our migration to AWS Aurora. We dubbed our new relational database suite of services “Hermes” and were pleasantly surprised at just how much simpler and more efficient it’s been compared to the legacy database system running in the datacenter.
One of the advances we made is the use of Terraform (TF) to provision and manage our AWS resources. It lets us keep our infrastructure under version control as code, and it keeps our interaction with the AWS GUI to a minimum, which reduces mistakes. Our migration to AWS Aurora begins with deploying instances that act as a facade for the master databases that remain in the datacenter.
New cloud-based architecture
During the migration phase, our AWS Aurora databases only get updates via replicated writes from the MySQL DB master in the datacenter. We deployed ProxySQL to direct reads to AWS Aurora and writes to the datacenter. ProxySQL also helps to improve performance for applications by maintaining a pool of connections to databases in the datacenter. Since it can take over one hundred milliseconds to establish a connection from the AWS region into the datacenter, this provides significant performance gains.
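The read/write split and the benefit of connection pooling described above can be illustrated with a minimal sketch. This is not ProxySQL's implementation or our actual configuration: the backend names, the ordered regex rules, and the 5 ms query time are hypothetical values chosen for illustration. Only the roughly 100 ms connection setup cost comes from the measurements mentioned above.

```python
import re

# Hypothetical backend names; the real ProxySQL hostgroups are not given in this post.
WRITER_DATACENTER = "datacenter-mysql-master"
READER_AURORA = "aws-aurora"

# Ordered rules, first match wins. This mirrors the general idea of
# regex-based query rules, not any specific production rule set.
RULES = [
    # SELECT ... FOR UPDATE takes locks, so it is treated as a write.
    (re.compile(r"^\s*SELECT\b.*\bFOR\s+UPDATE\b", re.IGNORECASE | re.DOTALL),
     WRITER_DATACENTER),
    # Any other SELECT is a plain read and can be served by Aurora.
    (re.compile(r"^\s*SELECT\b", re.IGNORECASE), READER_AURORA),
]


def route(sql: str) -> str:
    """Return the backend a statement should be sent to."""
    for pattern, backend in RULES:
        if pattern.search(sql):
            return backend
    # Anything that is not a plain read goes to the master in the datacenter.
    return WRITER_DATACENTER


def mean_latency_ms(connect_ms: float, query_ms: float,
                    queries_per_connection: int) -> float:
    """Average per-query latency when one connection setup is shared
    across queries_per_connection queries."""
    return query_ms + connect_ms / queries_per_connection


if __name__ == "__main__":
    print(route("SELECT * FROM listings WHERE id = 1"))
    print(route("UPDATE listings SET price = 1 WHERE id = 1"))
    # Assumed 5 ms query time; 100 ms connection setup as noted above.
    print(mean_latency_ms(100, 5, 1))     # a new connection for every query
    print(mean_latency_ms(100, 5, 1000))  # a pooled connection reused 1000 times
```

Under these assumed numbers, a query that opens its own connection pays the full setup cost every time, while a pooled connection reused for a thousand queries adds only about 0.1 ms of setup cost per query.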
We set up our services and applications running in AWS to connect to ProxySQL over AWS PrivateLink via AWS Network Load Balancers (NLB), so they are not tied to any single instance. Our operations team only needs to ensure that an adequate number of instances are operational, rather than obsess over the health of an individual instance. The services and applications that access this core DB operate in their own AWS accounts for security.
This stands in stark contrast to our experience working with MySQL in our datacenter. At the time of writing, we maintain 14 different database clusters in our datacenter, all relying on full or partial replication from the master database. For each of these clusters, we must keep up with physical server maintenance, in addition to administering access and monitoring replication to ensure the data is current. Because the databases were developed by different teams with different requirements and levels of access, we ended up operating several one-off and outdated database hosts. This has led to a physical database architecture that’s excessively complex to maintain.
Current complex database architecture
As we implemented the AWS Aurora architecture, the Trulia Operations team was delighted at how simple a cloud-based database could be. Instead of maintaining a large number of MySQL database clusters, in the cloud we have essentially two. With Hermes, we no longer need to worry about scaling, as this is built into the Terraform code.
In addition to being simpler, Hermes is approximately 20 ms faster per query on average and, more importantly, its calls are error-free. This is in contrast to when AWS-deployed services used MySQL in the datacenter, where error rates sometimes reached 5% or more. Hermes is very durable as well; the Aurora DB instances are spun up via Auto Scaling groups in multiple availability zones, which provides 99.99% availability for the clusters. We have not yet had any scale-up events, since the two 8xlarge Aurora DB instances have provided enough capacity. Once the migration of the applications and services has made sufficient progress, we will make AWS Aurora the destination for DB writes. It will then become the master DB, and the MySQL DB instances in the datacenter will be retired.
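To put the 99.99% availability figure in concrete terms, the short calculation below converts an availability percentage into a yearly downtime budget. It is a back-of-the-envelope sketch that assumes a 365-day year; the function name is ours and the result is simple arithmetic, not a measured value for Hermes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap, 365-day year


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum yearly downtime, in minutes, allowed by an availability target."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


if __name__ == "__main__":
    # 99.99% availability leaves a budget of roughly 52.6 minutes per year.
    print(round(downtime_minutes_per_year(99.99), 2))
```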
The move to AWS RDS has been a great success for Trulia. We have rethought our architecture and created more efficient datastores that will replace those that we have been running in our datacenter. We plan to leverage this experience to migrate other datastores to AWS, where we can utilize the flexibility and durability of a cloud-based architecture.