Tuesday, October 6, 2015
Cloud native technologies like Kubernetes help you compose scalable services out of a sea of small logical units. In our last post, we introduced Vitess (an open-source project that powers YouTube's main database) as a way of turning MySQL into a scalable Kubernetes application. Our goal was to make scaling your persistent datastore in Kubernetes as simple as scaling stateless app servers - just run a single command to launch more pods. We've made a lot of progress since then (pushing over 2,500 new commits) and we're nearing the first stable version of the new, cloud native Vitess.
In preparation for the stable release, we've begun to publish alpha builds of Vitess v2.0.0. Some highlights of what's new since our earlier post include:
- Using the final Kubernetes 1.0 API.
- Official Vitess client libraries in Java, Python, PHP, and Go.
- Java and Go clients use the new HTTP/2-based gRPC framework.
- Can now run on top of MySQL 5.6, in addition to MariaDB 10.0.
- New administrative dashboard built on AngularJS.
- GTID-based reparenting for reversible, routine failovers.
- Simpler schema changes.
We've also been hard at work adding lots more documentation. In particular, the rest of this post will explore one of our new walkthroughs that demonstrates transparent resharding of a live database - that is, changing the number of shards without any code changes or noticeable downtime for the application.
Sharding is bitter medicine, as S. Alex Smith wrote. It complicates your application logic and multiplies your database administration workload. But sharding is especially important when running MySQL in a cloud environment, since a single node can only become so big. Vitess takes care of shard routing logic, so the data-access layer in your application stays simple. It also automates per-shard administrative tasks, helping a small team manage a large fleet.
The preferred sharding strategy in Vitess is what we call range-based shards. You can think of
the shards as being like the buckets of a hash table. We decide which bucket to place a record in based solely on its key, so we don't need a separate table that keeps track of which bucket each key is in.
To make it easy to change the number of buckets, we use consistent hashing. That means instead of using a hash function that maps each key to a bucket number, we use a function that maps each key to a randomly distributed (but consistent) value in a very large set - such as the set of all 8-byte sequences. Then we assign each bucket a range of these values, which we call keyspace IDs.
If you want to follow along with the new resharding walkthrough, you'll need to first bring up the cluster as described in the unsharded guide. Both guides use the same sample app, which is a Guestbook that supports multiple, numbered pages.
In the sample app code, you'll see a get_keyspace_id() function that transforms a given page number to the set of all 8-byte sequences, establishing the mapping we need for consistent hashing. In the unsharded case, these values are stored but not used. When we introduce sharding, page numbers will be evenly distributed (on average) across all the shards we create, allowing the app to scale to support arbitrary amounts of pages.
Before resharding, you'll see a single custom shard named "0" in the Vitess dashboard. This is what an unsharded keyspace looks like.
As you begin the resharding walkthrough, you'll bring up two new shards for the same keyspace. During resharding, the new shards will run alongside the old one, but they'll remain idle (Vitess will not route any app traffic to them) until you're ready to migrate. In the dashboard, you'll see all three shards, but only shard "0" is currently active.
Next, you'll run a few Vitess commands to copy the schema and data from the original shard. The key to live migration is that once the initial snapshot copy is done, Vitess will automatically begin replicating fresh updates on the original shard to the new shards. We call this filtered replication, since it distributes DMLs only to the shards to which they apply. Vitess also includes tools that compare the original and copied data sets, row-by-row, to verify data integrity.
Once you've verified the copy, and filtered replication has caught up to real-time updates, you can run the migrate command, which tells Vitess to atomically shift app traffic from the old shards to the new ones. It does this by disabling writes on the old masters, waiting for the new masters to receive the last events over filtered replication, and then enabling writes on the new masters. Since the process is automated, this typically only causes about a second of write unavailability.
Now you can tear down the old shard, and verify that only the new ones show up in the dashboard.
Note that we never had to tell the app that we were changing from one shard to two. The resharding process was completely transparent to the app, since Vitess automatically reroutes queries on-the-fly as the migration progresses.
At YouTube, we've used Vitess to transparently reshard (both horizontally and vertically) nearly all of our MySQL databases within the last year alone, and we have still more on the horizon as we continue to grow. See the full walkthrough instructions if you want to try it out for yourself.
The promise of sharding is that it allows you to scale write throughput linearly by adding more shards, since each shard is actually a separate database. The challenge in achieving that separation while still presenting a simple, unified view to the application is to avoid introducing bottlenecks. To demonstrate this scaling in the cloud, we've integrated the Vitess client with a driver for the Yahoo! Cloud Serving Benchmark (YCSB).
Below you can see preliminary results for scaling write throughput by adding more shards in Vitess running on Google Container Engine. For this benchmark, we pointed YCSB at the load balancer for our Vitess cluster and told it to send a lot of INSERT statements. Vitess took care of routing statements to the various shards.
The max throughput (QPS) for a given number of shards is the point at which round-trip write latency became degraded, which we define as >15ms on average or >50ms for the worst 1% of queries (99th percentile).
We also ran YCSB's "read mostly" workload (95% reads, 5% writes) to show how Vitess can scale read traffic by adding replicas. The max throughput here is the point at which round-trip read latency became degraded, which we define as >5ms on average or >20ms for the worst 1% of queries.
There's still a lot of room to improve the benchmarks (for example, by tuning the performance of MySQL itself). However, these preliminary results show that the returns don't diminish as you scale. And since you're scaling horizontally, you're not limited by the size of a single machine.
ConclusionWith the new cloud native version of Vitess moving towards a stable launch, we invite you to give it a try and let us know what else you'd like to see in the final release. You can reach us either on our discussion forum, or by filing an issue on GitHub. If you'd like to be notified of any updates on Vitess, you can subscribe to our low-frequency announcement list.
- Posted By Anthony Yeh, Software Engineer, YouTube