Yesterday, we announced that Google Cloud Platform big data services are taking a big step forward by allowing everyone to use big data the cloud way. Google BigQuery has many new features and is now available in European zones. These improvements were designed to extend BigQuery's performance and capabilities to give you greater peace of mind and control over your data.

European Data Location Control

You now have the option to store your BigQuery data in European locations. You keep the benefits of a fully managed service without low-level cluster maintenance headaches, and you gain geographic control over your data. Feel free to contact the Google Cloud Platform technical support team for details on how to set this up.

Streaming Inserts

One of BigQuery's most popular features is the ability to stream data into the service for real-time analysis. To allow such low-latency analysis on very high-volume streams, we've increased the default insert-rate limit from 10,000 rows per second, per table, to 100,000 rows per second, per table. In addition, the row-size limit has increased from 20 KB to 1 MB, and pricing will move from a per-row model to a per-byte model for better flexibility and scale.
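
To make the new limits concrete, here is a minimal sketch of the arithmetic. It does not call the BigQuery API. The helper names are hypothetical, the JSON-based size estimate is an assumption about how a row might be measured, and "1 MB" is assumed to mean 10^6 bytes.

```python
import json

# Limits quoted in the announcement above.
OLD_ROWS_PER_SECOND = 10_000
NEW_ROWS_PER_SECOND = 100_000
ROW_SIZE_LIMIT_BYTES = 1_000_000  # assumption: "1 MB" read as 10^6 bytes


def row_size_bytes(row: dict) -> int:
    """Approximate a row's size as the length of its compact UTF-8 JSON encoding.

    This is an illustrative estimate, not the service's own accounting.
    """
    return len(json.dumps(row, separators=(",", ":")).encode("utf-8"))


def fits_row_limit(row: dict, limit: int = ROW_SIZE_LIMIT_BYTES) -> bool:
    """Return True if the estimated row size is within the per-row limit."""
    return row_size_bytes(row) <= limit


def seconds_to_stream(row_count: int, rows_per_second: int) -> float:
    """Lower bound on the time to stream row_count rows into one table."""
    return row_count / rows_per_second


if __name__ == "__main__":
    event = {"user_id": 42, "action": "click"}  # hypothetical event row
    print(fits_row_limit(event))
    # 36 million rows into a single table, at the old and new default limits.
    print(seconds_to_stream(36_000_000, OLD_ROWS_PER_SECOND))
    print(seconds_to_stream(36_000_000, NEW_ROWS_PER_SECOND))
```

Under these assumptions, a backlog that needed at least an hour per table at the old default limit needs at least six minutes at the new one.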

Security Features

BigQuery can now tackle a wider range of enterprise applications with the addition of data expiration controls and row-level permissions. Row-level permissions eliminate the need to create different views for different users, allowing secure shared access to systems such as finance or HR. This ensures that you get the information that’s relevant to you. In addition, data in BigQuery will be encrypted at rest.

Google Cloud Platform Logging Integration

Google Cloud Logging provides a powerful set of tools for managing your operations and understanding the systems powering your business; now, it also lets your Google App Engine and Google Compute Engine applications stream their logs into BigQuery. This allows you to perform real-time analysis on your log data and gain insight into how your system is performing and how your users are behaving. By joining application logs with your marketing and partnership data, you can rapidly evaluate the effectiveness of your outreach, or apply context from user profile info into your application logs to quickly assess what behavior resulted from specific customer interactions, providing easy and immediate value to both system administrators and business analysts.

Frequently requested features

Additionally, we’ve implemented a number of new features you’ve been asking for. You can now:

For a full list of features, take a look at the release notes.

Unprecedented scale

BigQuery continues to provide exceptional scale and performance without requiring you to deploy, augment or update your own clusters. Instead, you can focus on getting meaningful insights from massive amounts of data. For example:

  • BigQuery absorbs real-time streams of customer data totaling more than 100 TB per day, which you can query immediately. All this data is in addition to the hundreds of terabytes loaded daily from other sources. If you have fast-moving, large-scale applications such as IoT, you can now make quick, accurate decisions against in-flight applications.
  • We have customers currently running queries that scan multiple petabytes of data or tens of trillions of rows using a simple SQL query, without ever having to worry about system provisioning, maintenance, fault-tolerance or performance tuning.

With BigQuery’s new features, you can analyze even more data and access it faster than before, in brand new ways. To get started, learn more about BigQuery, read the documentation, and try it out for yourself.

-Posted by Andrew Kowal, Product Manager

Big data applications can provide extremely valuable insights, but extracting that value often demands high overhead, including significant deployment, tuning, and operational effort, as well as juggling diverse systems and programming models. As a result, work other than the actual programming and data analysis dominates the time needed to build and maintain a big data application. The industry has come to accept these pains and inefficiencies as an unavoidable cost of doing business. We believe you deserve better.

In Google’s systems infrastructure team, we’ve been tackling challenging big data problems for more than a decade and are well aware of the difference that simple yet powerful data processing tools make. We have translated our experience from MapReduce, FlumeJava, and MillWheel into a single product, Google Cloud Dataflow. It's designed to reduce operational overhead and make programming and data analysis your only job, whether you’re a data scientist, data analyst or data-centric software developer. Along with other Google Cloud Platform big data services, Cloud Dataflow embodies the kind of highly productive and fully managed services designed to use big data, the cloud way.

Today we’re pleased to make Google Cloud Dataflow available in beta, for use by anyone on Google Cloud Platform. With Cloud Dataflow, you can:

  • Merge your batch and stream processing pipelines thanks to a unified and convenient programming model. The model and the underlying managed service let you easily express data processing pipelines, make powerful decisions, obtain insights and eliminate the switching cost between batch and continuous stream processing.
  • Finely tune the desired correctness model for your data processing needs through powerful API primitives for handling late arriving data. You can process data based on event time as well as clock time and gracefully deal with upstream data latency when processing data from unbounded sources.
  • Leverage a fully-managed service, complete with dynamically adaptive auto-scaling and auto-tuning, that offers attractive performance out of the box. Whether you’re a developer or systems operator, you no longer need to invest time worrying about resource provisioning or attempting to optimize resource usage. Automation, a fully managed service, and the programming model work together to significantly lower both CAPEX and OPEX.
  • Enjoy reduced complexity of managing and debugging highly parallelized processes with a simplified monitoring interface that’s logically mapped to your processing logic as opposed to how your code’s mapped to the underlying execution plane.
  • Benefit from integrated processing of data across the Google Cloud Platform with optimized support for services such as Google Cloud Storage, Google Cloud Datastore, Google Cloud Pub/Sub, and Google BigQuery.
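
As an illustration of the event-time idea in the list above, here is a minimal sketch in plain Python. It is not the Cloud Dataflow SDK. The window size, the allowed lateness, and the function names are all hypothetical. Events are grouped by the time they happened rather than the time they arrived, and data that arrives too long after its window has closed is set aside.

```python
from collections import defaultdict

WINDOW_SECONDS = 60    # hypothetical fixed window size
ALLOWED_LATENESS = 30  # hypothetical grace period after a window ends


def window_start(event_time: int) -> int:
    """Start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def count_by_event_time(events):
    """Count events per event-time window.

    events: iterable of (event_time, arrival_time) pairs, in seconds.
    Returns (counts keyed by window start, list of events dropped as too late).
    """
    counts = defaultdict(int)
    dropped = []
    for event_time, arrival_time in events:
        start = window_start(event_time)
        deadline = start + WINDOW_SECONDS + ALLOWED_LATENESS
        if arrival_time <= deadline:
            counts[start] += 1  # on time, or late but within the grace period
        else:
            dropped.append((event_time, arrival_time))
    return dict(counts), dropped


if __name__ == "__main__":
    # (event_time, arrival_time): the third event is late but accepted,
    # the fourth arrives long after its window closed.
    stream = [(5, 6), (59, 61), (50, 85), (10, 200), (65, 66)]
    print(count_by_event_time(stream))
```

The same function accepts a finite list (batch) or a generator that yields events as they arrive (stream), which is the spirit of a unified model, although a real service also has to handle state, triggers, and distribution.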

We’re also working with major open source contributors on maturing the Cloud Dataflow ecosystem. For example, we recently announced collaborations with Data Artisans for runtime support for Apache Flink and with Cloudera for runtime support for Apache Spark.

We’d like to thank our alpha users for their numerous suggestions, reports and support along this journey. Their input has certainly made Cloud Dataflow a better product. Now, during beta, everyone can use Cloud Dataflow and we continue to welcome questions and feedback on Stack Overflow. We hope that you’ll give Google Cloud Dataflow a try and enjoy big data made easy.

-Posted by Grzegorz Czajkowski, Director of Engineering

The promise of big data is faster and better insight into your business. Yet it often turns into an infrastructure project. Why? For example, you might be collecting a deluge of information and then correlating, enriching and attempting to extract real-time insights. Should you expect such feats, by their very nature, to involve a large amount of resource management and system administration? You shouldn’t. Not in the cloud. Not if you’re using big data the cloud way.

Big data the cloud way means being more productive when building applications, with faster and better insights, without having to worry about the underlying infrastructure. More specifically, it includes:

  • NoOps: Your cloud provider should worry about deploying, managing and upgrading infrastructure to make it scalable and reliable. “NoOps” means the platform handles such tasks and optimizations for you, freeing you up to focus on understanding and exploiting the value in your data.
  • Cost effectiveness: In addition to increased ease of use and agility, a “NoOps” solution provides clear cost benefits via the removal of operations work. But the cost benefits of big data the cloud way go even further: the platform auto-scales and optimizes your infrastructure consumption, and eliminates unused resources like idle clusters. You manage your costs by dialing up or down the number of queries and the latency of your processing based on your cost/benefit analysis. You should never have to re-architect your system to adjust your costs.
  • Safe and easy collaboration: You can share datasets from files in Google Cloud Storage or tables in Google BigQuery with collaborators inside or outside of your organization without the need to make copies or grant database access. There’s one version of the data – which you control – and authorized users can access it (at no cost to you) without affecting the performance of your jobs.

Google has been blazing the big data trail for the rest of the industry, so when you use Google Cloud Platform, big data the cloud way also means:

  • Cutting-edge features: Google Cloud Dataflow provides reliable, event-time-based stream processing, available by default with no extra work. But making stream processing easy and reliable doesn’t mean removing the option of running in batch. The same pipeline can execute in batch mode, which you can use to lower costs or analyze historical data. Now, consistently processing streaming data at large scale doesn’t have to be a complex and brittle endeavor that’s reserved for the most critical scenarios.

Google Cloud Platform delivers these characteristics by making data analysis quick, affordable and easy. Today, at the Hadoop Summit in Brussels, we announced that our big data services are taking a big step forward – allowing everyone to use big data the cloud way.

Google Cloud Dataflow now available in beta

Today, nothing stands between you and the satisfaction of seeing your processing logic, applied in your choice of streaming or batch mode, executed via a fully managed processing service. Just write a program, submit it, and Cloud Dataflow will do the rest. No clusters to manage – Cloud Dataflow will start the needed resources, autoscale them (within the bounds you choose), and terminate them as soon as the work is done. You can get started right now.

Google BigQuery has many new features and is now available in European zones

BigQuery, the quintessential cloud-native, API-driven service for SQL analytics, has new security and performance features. For example, the introduction of row-level permissions makes data sharing even easier and more flexible. With its ease of ingestion (we’ve raised the default ingestion limit to 100,000 rows per second per table), virtually unlimited storage, and fantastic query performance even for huge datasets, BigQuery is the ideal platform for storing, analyzing and sharing structured data. It also supports repeated records and querying inside JSON objects for loosely structured data. In addition, starting today, BigQuery offers the option to store your data in Google Cloud Platform European zones. You can contact Google technical support today to use this option.

A comprehensive set of big data services

Google Cloud Pub/Sub is designed to provide scalable, reliable and fast event delivery as a fully managed service. Along with BigQuery streaming ingestion and Cloud Dataflow stream processing, it completes the platform’s end-to-end support for low-latency data processing. Whether you’re processing customer actions, application logs or IoT events, Google Cloud Platform allows you to handle them in real time, the cloud way. Leave Google Cloud Platform in charge of all the scaling and administration tasks so you can focus on what needs to happen, not how.

Using big data the cloud way doesn’t mean that Hadoop, Spark, Flink and other open source tools originally created for on-premises can’t be used in the cloud. We’ve ensured that you can benefit from the richness of the open source big data ecosystem via native connectors to Google Cloud Storage and BigQuery along with an automated Hadoop/Spark cluster deployment.

Google BigQuery customer zulily joined us recently for a big data webinar to share their experience using big data the cloud way and how it helped them increase revenue and overall business visibility while decreasing their operating costs. If you’re interested in exploring these types of benefits for your own company, you can easily get started today by running your first query on a public dataset or uploading your own data.

Here’s a simplified illustration of how Google Cloud Platform data processing services relate to each other and support all stages of the data lifecycle:

Scuba equipment helps humans operate under water, but divers still fall hopelessly short of the efficiency and agility of marine creatures. When it comes to big data in the cloud, be a dolphin, not a scuba diver. Google Cloud Platform offers a set of powerful, scalable, easy to use and efficient big data services built for the cloud. Embrace big data, the cloud way, by taking advantage of them today.

Learn more about Google Cloud Platform’s big data solutions or get started with Dataflow and BigQuery today. We can’t wait to see what you achieve when you use big data the cloud way.

-Posted by William Vambenepe, Product Manager

A feature length animated movie takes up to 100 million compute hours to render. 100 million.

When you hear the two words “Google” and “media,” what pops into your mind? YouTube, right? Well, as I’m excited to explain, media means much more than “YouTube” at Google. The media and entertainment industry is a key area of focus for Google Cloud Platform. As I’ll be sharing in my keynote address for the cloud conference at the 97,000-attendee NAB Show in Las Vegas on Tuesday, we’re rapidly expanding our platform and our partner ecosystem to uniquely solve media-specific challenges. In addition to my keynote, Patrick McGregor and Todd Prives from my team are participating in panel sessions on cloud security and cloud rendering. And as part of the recent Virtual NAB conference, Jeff Kember and Miles Ward from Google shared their insights.

We’re witnessing massive changes in the ways media companies are creating, transforming, archiving and delivering content, using the power of the cloud.

We recognize that Google Cloud Platform best supports the media industry when we deliver capabilities that are tailored to specific workflow patterns. Great examples of these capabilities are our services for visual effects rendering. Aside from the skilled work that an artist puts into modeling, animating and compositing a realistic scene, the compute demands required to produce these images are often staggering. Even a relatively simple visual effects shot or animation can take several hours to render the 24 individual frames that make up one second of video.

Google Cloud Platform can greatly accelerate and simplify rendering while charging only for the processor cycles and bits that are consumed. For customers looking for an end-to-end rendering solution, we offer Google Zync Render. Launched in beta in the first quarter of 2015, Zync is a turnkey service for small and medium-sized studios. It integrates directly with existing on-premises software workflows to feel as natural and responsive as a local render farm. Also, through our collaborations with The Foundry and others, Google Cloud Platform provides tools used in the creation of some of the highest-grossing movies.

Zync Render Workflow

By using Google Cloud Platform’s cost-efficient compute and storage, studios can seamlessly extend their rendering pipelines to handle burst capacity needs and remove the bottlenecks typically associated with production deadlines. We’re already seeing great successes from media customers like Framestore, RodeoFX, iStreamPlanet, Panda, and Industriromantik.

We’ve also built compelling general platform capabilities that help media companies with all stages of workflow and the content lifecycle. One example is Google Cloud Nearline storage, which is a service that allows a virtually unlimited volume of data to be stored at very low costs with retrieval times on the order of seconds – not hours as you would experience with tape. This is ideal for media content archiving. We also recently launched 32-core VM instances for compute-intensive workloads that crunch large volumes of content. And, yesterday, we announced a collaboration with Avere Systems that enables us to bridge cloud storage and on-premises storage without impacting performance. This opens huge opportunities for creative collaboration and content production.

Please join us this week at NAB. We hope to see you in Las Vegas!

-Posted by Brian Stevens, VP for Google Cloud Platform

Today’s enterprise must focus on running key workloads both on-premises and in the cloud. You need to keep the quality of service high for end users in terms of network latency and reliability, while also ensuring efficiency and security for your company’s hybrid workloads – particularly workloads that are bandwidth-intensive or latency-sensitive. Raw performance, reliability and security have been major focus areas for Google from the start, and our goal with Google Cloud Platform is to share the benefits of continuous networking innovation with our customers.

We have four announcements today in support of two major technical goals. The first is to use Google’s global network footprint – over 70 points of presence across 33 countries – to serve users close to where they are, ensuring the same low latency and responsiveness customers can expect from Google’s own services. The second goal relates to enabling enterprises to run mission-critical workloads by connecting their on-premises infrastructure to Google’s network with enterprise-grade encryption.

Today we're announcing:

  • General Availability of Cloud DNS
  • Expansion of our load balancing solutions to 12 additional points of presence globally (Los Angeles, San Francisco, Chicago, Seattle, Dallas, Miami, London, Paris, Stockholm, Munich, Madrid, Lisbon)
  • Beta of VPN
  • 11 additional Carrier Interconnect service providers

Managed DNS

With Cloud DNS – our high performance, managed DNS solution for user-facing applications and services – you can host millions of zones and records and handle SLA-backed name-serving queries. For customers with more than 10,000 zones, our new pricing tier lowers the cost of ownership for large organizations operating DNS infrastructure at scale.

Global Load Balancing

Today’s connected user is accustomed to fast and responsive application services, be they web services accessed from a browser or apps on a mobile device. Latency (“lag”) is noticeable immediately, especially as users switch from a fast, optimized service to a slow one. With the expansion of Google’s load balancing solution to 12 additional locations, your workloads running on Google Cloud Platform are closer in proximity to your users who are making service requests from all over the globe.

Additional Carrier Interconnect service providers and VPN Beta

We continue to build on our goal of enabling enterprises to connect their on-premises infrastructure to Google’s network over encrypted channels to run data-intensive, latency-sensitive workloads. In addition to announcing the beta for Cloud VPN, we’re pleased to introduce 11 additional Carrier Interconnect service providers. Our growing list of technology partners extends our reach to customer locations globally while providing tailored connectivity and choice.

iStreamPlanet is one such customer who has taken advantage of our infrastructure breadth to make high-quality connections into the Google network. iStreamPlanet recently launched Aventus, its SaaS-based product that enables content owners to serve high-quality live video with simplicity to viewers across devices. Running on Google Cloud Platform, iStreamPlanet is able to create live video events for its customers in minutes rather than days, and has lowered bandwidth costs by more than 40 percent using Google Cloud Platform’s Direct Peering offering.

We’d also like to welcome CloudFlare as a Google Cloud Platform Technology Partner. CloudFlare provides website speed optimization, security and DDoS protection, as well as caching solutions over its globally distributed network. With nearly no setup required, CloudFlare reports speed optimizations that result in content loading twice as fast on average for visitors.

Google’s network, built out over the past 15 years, is a key enabler behind the services relied upon every day by our customers and our users – from Search to Maps, YouTube to Cloud Platform. We invite you to contact us to explore how we can make Google’s network an extension of your own, or to learn about your specific needs around serving your users wherever they may be globally. You can read more about Google Cloud Networking here.

-Posted by Morgan Dollard, Cloud Networking Product Management Lead

Today’s guest post comes from Ed Byrne, Director at Panda – a cloud-based video transcoding platform. To learn more about how Panda uses Google Cloud Platform, watch their case study video.

Panda makes it easy for video producers to encode their video in multiple formats for different mobile device screen sizes. But delivering blazing fast, high-quality videos to customers is no easy task – especially when your engineers are also dealing with infrastructure. Google Cloud Platform features like Live Migration and Autoscaler have allowed us to cut our infrastructure maintenance load to only half of a developer.

With more resources to direct at innovation, we can put our focus on our customers, making their experience better with new and improved features in Panda. In fact, since relying on Google Cloud Platform for underlying infrastructure, we’ve developed our frame rate conversion by motion compensation technology. Our customers love the video quality they get using this feature, and we’re so excited about it that we agreed to give you the lowdown on how it works.

Introduction to motion compensation

Motion compensation is a technique that was originally used for video compression, and now it’s used in virtually every video codec. Its inventors noticed that adjacent frames usually don’t differ too much (except for scene changes), and then used that fact to develop a better encoding scheme than compressing each frame separately. In short, motion-compensation-powered compression tries to detect movement that happens between frames and then use that information for more efficient encoding. Imagine two frames:
Panda on the left...
aaaand on the right
Now, a motion compensating algorithm would detect the fact that it’s the same panda in both frames, just in different locations:
First stage of motion compensation: motion detection
We’re still thinking about compression, so why would we want to store the same panda twice? Yep, that’s what motion-compensation-powered compression does – it stores the moving panda just once (usually, it would store the whole frame #1), but it adds information about movement. Then the decompressor uses this information to construct remaining information (frame #2 based on frame #1).

That’s the general idea, but in practice it’s not as smooth and easy as in the example. The objects are rarely the same, and usually some distortions and non-linear transformations creep in. Scanning for movements is very expensive computationally, so we have to limit the search space and optimize the code, even resorting to hand-written assembly.
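
The motion detection step can be sketched as an exhaustive block-matching search. This is a simplified illustration in plain Python, not Panda's encoder or any codec's actual implementation. The frames are tiny grayscale grids, the function names are hypothetical, and the cost measure (sum of absolute differences) is one common choice assumed here.

```python
def make_frame(width, height, x, y, size, value=255):
    """Black frame with one bright size x size square (our stand-in panda)."""
    frame = [[0] * width for _ in range(height)]
    for dy in range(size):
        for dx in range(size):
            frame[y + dy][x + dx] = value
    return frame


def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def find_motion(prev, curr, x, y, size, radius):
    """Find the offset (dx, dy) where the block at (x, y) in prev best matches curr.

    The search is limited to +/- radius pixels, which is the kind of
    search-space restriction that keeps the cost manageable.
    Returns ((dx, dy), cost).
    """
    height, width = len(curr), len(curr[0])
    best = (0, 0)
    best_cost = sad(prev, curr, x, y, x, y, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or nx + size > width or ny + size > height:
                continue  # candidate block would fall outside the frame
            cost = sad(prev, curr, x, y, nx, ny, size)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost


if __name__ == "__main__":
    frame1 = make_frame(8, 8, 1, 2, 2)  # square at (1, 2)
    frame2 = make_frame(8, 8, 4, 3, 2)  # same square moved to (4, 3)
    print(find_motion(frame1, frame2, 1, 2, 2, 3))
```

A compressor built on this idea would store frame #1 plus the motion vector and any residual difference instead of storing the moved block again. The nested loops also show why the search is expensive: the work grows with the square of the search radius for every block.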

Frame rate conversion by motion compensation

Motion compensation can be used for frame rate conversion too, often with really impressive results.

For illustration, let’s go back to the moving panda example. Let’s assume we want to change the frame rate from two frames per second (FPS) to three FPS. In order to maintain the video speed, each frame will be on screen for a shorter amount of time (0.33 sec instead of 0.5 sec).

One way to increase the number of frames is to duplicate a frame, resulting in three FPS, but the quality will suffer. As you can see, frame #1 has been duplicated:
Converting from 2 FPS to 3 FPS by duplicating frames
Yes, the output has three frames and the input has two, but the effect isn’t visually appealing. We need a bit of magic to create a frame that humans would see as naturally fitting between the two initial frames – the panda has to be in the middle. That’s a task motion compensation could deal with – detect the motion, but instead of using it for compression, create a new frame based on the gathered information. Here’s how it should work:
Converting from 2 FPS to 3 FPS by motion compensation: Panda's in the middle!
Notice that by creating a new frame, we keep our panda hero at the center.
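
The two approaches to the 2 FPS to 3 FPS example can be sketched in a few lines. This is a toy model in plain Python, not Panda's implementation. Each frame is reduced to the panda's (x, y) position, the pixel coordinates are hypothetical, and the movement between the two source frames is assumed to be linear.

```python
def frame_duration(fps):
    """Seconds each frame stays on screen at a given frame rate."""
    return 1.0 / fps


def by_duplication(first, second):
    """Turn two frames into three by repeating the first one."""
    return [first, first, second]


def by_motion_compensation(first, second, t=0.5):
    """Turn two frames into three by synthesizing an in-between frame.

    The object is placed a fraction t along the detected motion vector,
    which assumes simple linear movement between the source frames.
    """
    dx, dy = second[0] - first[0], second[1] - first[1]
    middle = (first[0] + t * dx, first[1] + t * dy)
    return [first, middle, second]


if __name__ == "__main__":
    left, right = (10, 40), (90, 40)  # hypothetical panda positions in pixels
    print(frame_duration(2), round(frame_duration(3), 2))
    print(by_duplication(left, right))          # panda stays put, then jumps
    print(by_motion_compensation(left, right))  # panda passes through the middle
```

With duplication the panda sits still for two frames and then jumps the full distance. With motion compensation it covers half the distance in each step, which is what the eye reads as smooth movement.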

Now for video examples, taken straight from a Panda encoder. Here’s what frame duplication (the bad guy) looks like in action (for better illustration, after converting FPS, we slowed down the video):


While the video on the left is very smooth, the frame-duplicated version on the right is jittery. Not great. Now, what happens when we use motion compensation (the good guy):


The movement is smooth and, apart from slight noise, we don’t catch a glimpse of any video artifacts.

There are other types of footage that fool the algorithm more easily. Motion compensation assumes simple, linear movement, so other kinds of image transformations can produce heavier artifacts that may or may not be acceptable, depending on the use case. Occlusions, refractions – you see these in water bubbles – and very quick movements, which means that too much happens between frames, are the most common examples of image transformations that can produce lower visual quality. Here’s a video full of occlusions and water:


Now let’s slow it down and see frame duplication and motion compensation side-by-side.


Motion compensation produces clear artifacts (those fake electric discharges), but still maintains higher visual quality than frame duplication.

The unanimous verdict of a short survey we shared in our office: motion compensation produces much better imaging than frame duplication.

Google Cloud Platform products like Google Compute Engine allowed us to improve performance in encoding by 30%, as well as shift our energy from focusing on underlying infrastructure to innovating for our customers. We’ve also been able to take advantage of sustained use discounts, which have helped lower our infrastructure costs, without the need to sign contracts or reserve capacity. Google’s network performance is also a huge asset for us, given video files are so large and we need to move them frequently. To learn more about how we’re using Cloud Platform, watch our video.

Panda’s excited to be at this year’s NAB show, one of the world’s largest gatherings of technologists and digital content providers. They’ll be in the StudioXperience area with Filepicker in the South Upper Hall, SU621.

Whether you’re managing a small website running on a single machine or a large service running on thousands of virtual machines, Chef makes it easy to automate configuration, deployment and management. Chef relies on reusable definitions called recipes for expressing system configuration, coupled with a client/server framework for distributing and enforcing the configuration. In a typical web service deployment, for example, you can use recipes to define how you’d like to configure the load balancer, Apache web servers and MongoDB servers, and then easily deploy the recipes to your service’s virtual machines. Chef has a large active community with 60,000 members and has been downloaded more than 12 million times.

Today, we’re making it even easier to deploy Open Source Chef on Google Compute Engine with Click to Deploy. Now you can quickly set up a Chef server to provision and manage resources on Compute Engine. This adds even more speed and ease to building and deploying cloud environments. “Fast high-quality software delivery requires automation and cloud technologies in combination with DevOps practices,” said Ken Cheney, vice president of business development for Chef. “Our integration with Google Cloud Platform delivers a seamless experience for rapidly building and deploying cloud environments to accelerate software development.”

Learn more about Chef or deploy a Chef server today. Please feel free to let us know what you think about this feature. You can also contact Chef for formal training. Deploy away!

-Posted by Pratul Dublish, Technical Program Manager