Cloud native technologies like Kubernetes help you compose scalable services out of a sea of small logical units. In our last post, we introduced Vitess (an open-source project that powers YouTube's main database) as a way of turning MySQL into a scalable Kubernetes application. Our goal was to make scaling your persistent datastore in Kubernetes as simple as scaling stateless app servers - just run a single command to launch more pods. We've made a lot of progress since then (pushing over 2,500 new commits) and we're nearing the first stable version of the new, cloud native Vitess.

Vitess 2.0
In preparation for the stable release, we've begun to publish alpha builds of Vitess v2.0.0. Some highlights of what's new since our earlier post include:

  • Using the final Kubernetes 1.0 API.
  • Official Vitess client libraries in Java, Python, PHP, and Go.
    • Java and Go clients use the new HTTP/2-based gRPC framework.
  • Can now run on top of MySQL 5.6, in addition to MariaDB 10.0.
  • New administrative dashboard built on AngularJS.
  • Built-in backup/restore, designed to plug into blob stores like Google Cloud Storage.
  • GTID-based reparenting for reversible, routine failovers.
  • Simpler schema changes.

We've also been hard at work adding lots more documentation. In particular, the rest of this post will explore one of our new walkthroughs that demonstrates transparent resharding of a live database - that is, changing the number of shards without any code changes or noticeable downtime for the application.

Vitess Sharding
Sharding is bitter medicine, as S. Alex Smith wrote. It complicates your application logic and multiplies your database administration workload. But sharding is especially important when running MySQL in a cloud environment, since a single node can only become so big. Vitess takes care of shard routing logic, so the data-access layer in your application stays simple. It also automates per-shard administrative tasks, helping a small team manage a large fleet.

The preferred sharding strategy in Vitess is what we call range-based shards. You can think of the shards as being like the buckets of a hash table. We decide which bucket to place a record in based solely on its key, so we don't need a separate table that keeps track of which bucket each key is in.

To make it easy to change the number of buckets, we use consistent hashing. That means instead of using a hash function that maps each key to a bucket number, we use a function that maps each key to a randomly distributed (but consistent) value in a very large set - such as the set of all 8-byte sequences. Then we assign each bucket a range of these values, which we call keyspace IDs.

Transparent Resharding
If you want to follow along with the new resharding walkthrough, you'll need to first bring up the cluster as described in the unsharded guide. Both guides use the same sample app, which is a Guestbook that supports multiple, numbered pages.

In the sample app code, you'll see a get_keyspace_id() function that maps a given page number to a value in the set of all 8-byte sequences, establishing the mapping we need for consistent hashing. In the unsharded case, these values are stored but not used. When we introduce sharding, page numbers will be evenly distributed (on average) across all the shards we create, allowing the app to scale to support an arbitrary number of pages.
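As a rough sketch of the idea (this is not the sample app's actual implementation; the hash choice and helper below are just one way to do it), a page number can be hashed into an 8-byte keyspace ID and then matched against the shard ranges:

```python
import hashlib
import struct

def get_keyspace_id(page):
    # Hash the 8-byte big-endian page number so consecutive page numbers
    # spread evenly across the keyspace (illustrative only; the sample
    # app's real function may use a different hash).
    packed = struct.pack('!Q', page)
    return hashlib.md5(packed).digest()[:8]

# Vitess names range-based shards by their keyspace-ID boundaries in hex:
# a two-way split gives shards "-80" and "80-".
SHARD_RANGES = [
    ('-80', b'', b'\x80'),    # keyspace IDs below 0x80...
    ('80-', b'\x80', None),   # keyspace IDs from 0x80... upward
]

def shard_for(keyspace_id, ranges=SHARD_RANGES):
    # Pick the shard whose [start, end) range covers this keyspace ID.
    for name, start, end in ranges:
        if keyspace_id >= start and (end is None or keyspace_id < end):
            return name

print(shard_for(get_keyspace_id(42)))  # prints "-80" or "80-"
```

Because the routing decision depends only on the key and the shard ranges, splitting a shard just means narrowing the ranges; no lookup table has to be rewritten.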

Before resharding, you'll see a single custom shard named "0" in the Vitess dashboard. This is what an unsharded keyspace looks like.

As you begin the resharding walkthrough, you'll bring up two new shards for the same keyspace. During resharding, the new shards will run alongside the old one, but they'll remain idle (Vitess will not route any app traffic to them) until you're ready to migrate. In the dashboard, you'll see all three shards, but only shard "0" is currently active.

Next, you'll run a few Vitess commands to copy the schema and data from the original shard. The key to live migration is that once the initial snapshot copy is done, Vitess will automatically begin replicating fresh updates on the original shard to the new shards. We call this filtered replication, since it distributes DMLs only to the shards to which they apply. Vitess also includes tools that compare the original and copied data sets, row-by-row, to verify data integrity.

Once you've verified the copy, and filtered replication has caught up to real-time updates, you can run the migrate command, which tells Vitess to atomically shift app traffic from the old shards to the new ones. It does this by disabling writes on the old masters, waiting for the new masters to receive the last events over filtered replication, and then enabling writes on the new masters. Since the process is automated, this typically only causes about a second of write unavailability.

Now you can tear down the old shard, and verify that only the new ones show up in the dashboard.

Note that we never had to tell the app that we were changing from one shard to two. The resharding process was completely transparent to the app, since Vitess automatically reroutes queries on-the-fly as the migration progresses.

At YouTube, we've used Vitess to transparently reshard (both horizontally and vertically) nearly all of our MySQL databases within the last year alone, and we have still more on the horizon as we continue to grow. See the full walkthrough instructions if you want to try it out for yourself.

Scaling Benchmarks
The promise of sharding is that it allows you to scale write throughput linearly by adding more shards, since each shard is actually a separate database. The challenge in achieving that separation while still presenting a simple, unified view to the application is to avoid introducing bottlenecks. To demonstrate this scaling in the cloud, we've integrated the Vitess client with a driver for the Yahoo! Cloud Serving Benchmark (YCSB).

Below you can see preliminary results for scaling write throughput by adding more shards in Vitess running on Google Container Engine. For this benchmark, we pointed YCSB at the load balancer for our Vitess cluster and told it to send a lot of INSERT statements. Vitess took care of routing statements to the various shards.
The max throughput (QPS) for a given number of shards is the point at which round-trip write latency became degraded, which we define as >15ms on average or >50ms for the worst 1% of queries (99th percentile).

We also ran YCSB's "read mostly" workload (95% reads, 5% writes) to show how Vitess can scale read traffic by adding replicas. The max throughput here is the point at which round-trip read latency became degraded, which we define as >5ms on average or >20ms for the worst 1% of queries.
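Put concretely, for each configuration we kept raising the offered load until latency crossed those limits. The thresholds in this small check are the ones quoted above:

```python
def latency_degraded(avg_ms, p99_ms, workload='write'):
    # Degradation limits used for these benchmarks: writes degrade past
    # 15 ms average or 50 ms at the 99th percentile; reads past 5 ms
    # average or 20 ms at the 99th percentile.
    limits = {'write': (15.0, 50.0), 'read': (5.0, 20.0)}
    avg_limit, p99_limit = limits[workload]
    return avg_ms > avg_limit or p99_ms > p99_limit

print(latency_degraded(12.0, 55.0))                  # True: p99 over the write limit
print(latency_degraded(4.0, 18.0, workload='read'))  # False: still healthy
```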
There's still a lot of room to improve the benchmarks (for example, by tuning the performance of MySQL itself). However, these preliminary results show that the returns don't diminish as you scale. And since you're scaling horizontally, you're not limited by the size of a single machine.

With the new cloud native version of Vitess moving towards a stable launch, we invite you to give it a try and let us know what else you'd like to see in the final release. You can reach us either on our discussion forum, or by filing an issue on GitHub. If you'd like to be notified of any updates on Vitess, you can subscribe to our low-frequency announcement list.

- Posted by Anthony Yeh, Software Engineer, YouTube

Cloud Spin, Part 1 and Part 2 introduced the Google Cloud Spin project, an exciting demo built for Google Cloud Platform Next, and how we built the mobile applications that orchestrated 19 Android phones to record simultaneous video. 
And now the last step is to retrieve the videos from each phone, find the frame corresponding to an audio cue in each video, and compile those images into a 180-degree animated GIF. This post explains the design decisions we made for the Cloud Spin back-end processing and how we built it.
The following figure shows a high-level view of the back-end design:
1. A mobile app running on each phone uploads the raw video to a Google Cloud Storage bucket.

2. An extractor process running on a Google Compute Engine instance finds and extracts the single frame corresponding to the audio cue.

3. A stitcher process running on an App Engine Managed VM combines the individual frames into a video that pans across a 180-degree view of an instant in time, and then generates a corresponding animated GIF.

How we built the Cloud Spin back-end services
After we’d designed what the back-end services would do, there were several challenges to solve as we built them. We had to figure out how to:
  • Store large amounts of video, frame, and GIF data
  • Extract the frame in each video that corresponds to the audio cue
  • Merge frames into an animated GIF
  • Make the video processing run quickly
Storing video, frame, and GIF data
We decided the best place to store incoming raw videos, extracted frames, and the resulting animated GIFs was Google Cloud Storage. It’s easy to use, integrates well with mobile devices, provides strong consistency, and automatically scales to handle large amounts of traffic, should our demo become popular. 
We also configured the Cloud Storage buckets with Object Change Notifications that kicked off the back-end video processing when the Recording app uploaded new video from the phones.
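As an illustration, reading and writing those objects from the Python back end takes only a few lines with the google-cloud-storage client (the bucket and object names here are invented):

```python
from google.cloud import storage

client = storage.Client()
videos = client.bucket('cloudspin-videos')   # hypothetical bucket names
frames = client.bucket('cloudspin-frames')

# Download a raw video uploaded by one of the phones...
videos.blob('session-42/camera-07.mp4').download_to_filename('camera-07.mp4')

# ...and later upload the frame the extractor pulled out of it.
frames.blob('session-42/camera-07.png').upload_from_filename('camera-07.png')
```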
Extracting a frame that corresponds to an audio cue
Finding the frame corresponding to the audio cue, or beep, poses challenges. Audio and video are recorded at different qualities and different sample rates, so it takes some work to match them up. We needed to find the frame that matched the noisiest section of the audio. To do so, we grouped the audio samples into frame intervals, each interval containing the audio that roughly corresponds to a single video frame. We computed the average noise of each interval by calculating the average of the squared amplitude of the samples. Once we identified the interval with the largest average noise, we extracted the corresponding video frame as a PNG file.
We wrote the extractor process in Python and used MoviePy, a module for video editing that uses the FFmpeg framework to handle video encoding and decoding.
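A condensed sketch of that extraction using MoviePy and NumPy (simplified, with illustrative names, and not the demo's actual code) looks like this:

```python
import numpy as np
from moviepy.editor import VideoFileClip

def extract_cue_frame(video_path, out_path, audio_fps=44100):
    """Save the video frame aligned with the loudest audio interval."""
    clip = VideoFileClip(video_path)
    samples = clip.audio.to_soundarray(fps=audio_fps)  # (n_samples, channels)

    # Group audio samples into intervals of one video frame each.
    samples_per_frame = int(audio_fps / clip.fps)
    n_frames = len(samples) // samples_per_frame
    intervals = samples[:n_frames * samples_per_frame].reshape(
        n_frames, samples_per_frame, -1)

    # Average squared amplitude per interval; the beep is the loudest one.
    noise = (intervals ** 2).mean(axis=(1, 2))
    cue_frame = int(np.argmax(noise))

    # Save that frame as a PNG.
    clip.save_frame(out_path, t=cue_frame / clip.fps)
```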
Merging frames into an animated GIF
The process of generating an animated GIF from a set of video frames can be done with only four FFmpeg commands, run by a Bash script. First we stitch all the frames together in order into a video, then extract a color palette from it, and finally use that palette to generate a lower-resolution GIF to upload to Twitter.
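The real script chains four FFmpeg invocations; the following sketch condenses the same idea into three subprocess calls (file names, resolution, and frame rate are made up):

```python
import subprocess

def frames_to_gif(frame_pattern='frame-%02d.png', fps=12):
    """Stitch frames into a video, then build a palette-optimized GIF."""
    # 1. Stitch the ordered frames into a short panning video.
    subprocess.check_call(['ffmpeg', '-y', '-framerate', str(fps),
                           '-i', frame_pattern, 'pan.mp4'])
    # 2. Compute an optimal 256-color palette from that video.
    subprocess.check_call(['ffmpeg', '-y', '-i', 'pan.mp4',
                           '-vf', 'palettegen', 'palette.png'])
    # 3. Use the palette to render a smaller GIF suitable for Twitter.
    subprocess.check_call(['ffmpeg', '-y', '-i', 'pan.mp4', '-i', 'palette.png',
                           '-lavfi', 'scale=480:-1:flags=lanczos[x];[x][1:v]paletteuse',
                           'spin.gif'])
```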
Making the video processing run quickly
Processing the videos one-by-one on a single machine would take longer than we wanted the next participants to have to wait to see their animated GIF. Cloud Spin takes 19 videos for each demo, one from each phone in the 180-degree arc. If extracting the synchronized frame from each video takes 5 seconds, and merging the frames takes 10 seconds, with serial processing the time between taking the demo shot and the final animated GIF would be (19 * 5s + 10s = 105s), almost two minutes!
We can make the process faster by parallelizing the frame extraction. If we use 19 virtual machines, one to process each video, the time between the demo shot and the animated GIF is only 15 seconds. To make this improvement work, we had to modify our design to handle synchronization of multiple machines.
Parallelizing the workload
We developed the extraction and stitching processes as independent applications. This made it easy to parallelize the frame extraction. We can run 19 extractors and one stitcher, each as a Docker container on Google Compute Engine.
But how do we make sure that each video is processed by one, and only one, extractor? Google Cloud Pub/Sub is a messaging system that solves this problem in a performant and scalable way. Using Cloud Pub/Sub, we can create a communication channel that is loosely coupled across subscribers. This means that the extractor and stitcher applications interact through Cloud Pub/Sub, with no assumptions about the underlying implementation of either application. This makes future evolutions of the infrastructure easier to implement.
The preceding diagram shows two Cloud Pub/Sub topics that act as processing queues for the extractor and the stitcher applications. Each time a mobile app uploads a new video, Cloud Pub/Sub publishes a message on the videos topic. The extractors subscribe to the videos topic. When a new message is published, the first extractor to pull it down holds a lease on the message while it processes the video to extract the frame corresponding to the audio cue. If the processing completes successfully, the extractor acknowledges the videos message to Cloud Pub/Sub and publishes a new message to the frames topic. If the extractor fails, its lease on the videos message expires and Cloud Pub/Sub redelivers the message, so another extractor can handle it.
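A stripped-down version of an extractor's consumption loop might look like the following, written against today's google-cloud-pubsub client rather than the library the demo originally used (project, topic, and subscription names are invented):

```python
from google.cloud import pubsub_v1

PROJECT = 'cloud-spin-demo'   # hypothetical project, topic, and subscription names
subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()
videos_sub = subscriber.subscription_path(PROJECT, 'videos-extractor')
frames_topic = publisher.topic_path(PROJECT, 'frames')

def extract_frame(video_uri):
    # Placeholder for the MoviePy-based extraction sketched earlier.
    return video_uri.replace('.mp4', '.png')

def handle_video(message):
    # Each message is leased to a single extractor; ack only on success
    # so that Pub/Sub redelivers the video to another extractor on failure.
    try:
        frame_uri = extract_frame(message.data.decode('utf-8'))
        publisher.publish(frames_topic, frame_uri.encode('utf-8'))
        message.ack()
    except Exception:
        message.nack()

# Streaming pull: blocks and dispatches messages to handle_video as they arrive.
streaming_pull = subscriber.subscribe(videos_sub, callback=handle_video)
streaming_pull.result()
```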
When a message is published on the frames topic, the stitcher pulls it down and waits until all of the frames of a single session are ready to be stitched together into an animated GIF. In order for the stitcher application to detect when it has all the frames, it needs a way to check the real-time status of all of the frames in a session.
Managing frame status with Firebase
Part 2 discussed how we orchestrated the phones to take simultaneous video using a Firebase database that provides real-time synchronization.
We also used Firebase to track the process of extracting the frame from each camera that corresponds to the audio cue. To do so, we added a status field to each extracted frame in the session as shown in the following screenshot.
When the Android phone takes the video, it sets this status to RECORDING, and then to UPLOADING when it uploads the video to Cloud Storage. The extractor process sets the status to READY when the frame matching the audio cue has been extracted. When all of the frames in a session are set to READY, the stitcher process combines the extracted frames into an animated GIF, stores the GIF in Cloud Storage, and records its path in Firebase.
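As one way to picture it, here is a minimal sketch using the firebase_admin Python SDK (not necessarily what the demo used; the database URL and paths are invented):

```python
import firebase_admin
from firebase_admin import credentials, db

firebase_admin.initialize_app(
    credentials.ApplicationDefault(),
    {'databaseURL': 'https://cloud-spin-demo.firebaseio.com'})  # hypothetical URL

def mark_frame_ready(session_id, camera_id, frame_path):
    # Called by an extractor once it has saved the cue frame.
    db.reference('sessions/%s/frames/%s' % (session_id, camera_id)).update(
        {'status': 'READY', 'path': frame_path})

def session_ready(session_id):
    # Stitcher check: all 19 frames in the session must be READY.
    frames = db.reference('sessions/%s/frames' % session_id).get() or {}
    return len(frames) == 19 and all(
        f.get('status') == 'READY' for f in frames.values())
```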
Having the status stored in Firebase made it possible for us to create a dashboard that showed, in real time, each step of the processing of the video taken by the Android phones and the resulting animated GIF.
We finished development of the Cloud Spin backend in time for the Google Cloud Platform Next events, and together with the mobile apps that captured the video, ran a successful demo. 
You can find more Cloud Spin animations on our Twitter feed: @googlecloudspin. We plan to release the complete demo code. Watch Cloudspin on GitHub for updates.
This is the final post in the three-part series on Google Cloud Spin. I hope you had as much fun discovering this demo as we did building it. Our goal was to demonstrate the possibilities of Google Cloud Platform in a fun way, and to inspire you to build something awesome!
- Posted by Francesc Campoy Flores, Google Cloud Platform

Google Cloud Launcher was created to provide customers an easy way to discover and deploy a variety of third-party solutions onto Google Cloud Platform. Today, we’re happy to announce new collaborations with Zend, NGINX, and Expert System as our first partners to contribute commercial solutions to our growing inventory.

You can now deploy commercially supported solutions with technical support included, such as Zend Server, NGINX Plus, and Cogito API Core, onto your choice of VM in just a few clicks. And to keep things simple, you will receive a unified bill from us for these services, along with your other Google Cloud Platform usage costs, at the end of the month.

Zend Server is an application server with a supported PHP runtime that can scale apps seamlessly across cloud resources, from the company most associated with PHP. Zend Server gives PHP developers and DevOps engineers amazing dev tools like Z-Ray, app deployment automation, performance monitoring, request analysis, and configuration management so apps run faster, scale better, and stay up longer.

"Zend Server on Google Cloud Platform will allow PHP developers to produce enterprise-grade applications in the very epicenter of disruption -- the cloud.," said Andi Gutmans, CEO and co-founder, Zend. "Our work with Google Cloud Platform gives developers the ammunition they need to create the quality, forward-looking applications businesses require to thrive."
NGINX Plus is a high performance, flexible, scalable, secure web accelerator and a web server. NGINX features include: Layer 4 / Layer 7 load balancing; application request routing; support for FastCGI, uwsgi, SCGI and memcached protocols; HTTP streaming media; SSL termination; bandwidth and request control; reverse proxy for HTTP, SMTP, IMAP and POP3; content caching; static content offload.

"We're excited to make NGINX Plus available on Google Cloud Platform as a fully supported solution," said Paul Oh, head of business development at NGINX, Inc. "Together, we're making it easier than ever before for organizations to deliver applications from the cloud with the speed and performance their customers have come to expect."
Through an understanding of the meaning of words in context, Cogito API Core identifies the entities present in documents, relations between the entities, concepts, and the semantically relevant information contained in a text.

“We are excited to offer a text analytics, semantic API through Google Cloud Platform, the most scalable and reliable infrastructure in the world”, said Marcello Pellacani, VP Strategic Partnerships, Expert System. “We’re looking forward to expanding the features and services available via Cloud Launcher, as APIs are a key element in developing and delivering world-class products.”

Moreover, Cloud Launcher can now be accessed directly in the Google Developers Console, creating a more unified experience that doesn’t require you to leave the management console to find and deploy a solution. This marks a next step toward our goal of surfacing the right solution at the right time for customers and making third-party solutions feel as native as possible.

Figure 1 - Cloud Launcher is a prominent menu item in Developers Console

Since our last update, we’ve also added Windows Active Directory, ASP.NET framework, and several popular open-source solutions that our users have requested, including Open edX and Redmine.

Expect more partners and solutions to be added as we expand our inventory of useful open-source and commercial options. Please continue to give us suggestions about products and services you love using or would like to see us include in Cloud Launcher!

- Posted by Leslie Lee, Product Manager