Google unveils a host of open data and AI advancements at Cloud Next

When it comes to tech products, concepts are often more elegant than reality. Capabilities and functions that sound logical and straightforward often prove to be much more complicated or arduous than they first appear.

Part of the problem, of course, is that many of the most advanced technologies are complex, and it can be very difficult to bring them to life. But an even more common problem is that prerequisites aren’t fully explained, or the number of steps required proves to be far greater than it first appears.

To put it simply, “the devil is in the details.”

This is true of many cloud and AI technologies. High-level product ideas, such as quickly analyzing any type of data to build artificial intelligence (AI) or machine learning (ML) models that run on new types of hardware accelerators, have been talked about for years.

As Google made clear through several announcements at its Cloud Next event, however, there are a lot of important details that need to be in place in order for these ideas to become reality.

To start with, not all data analysis tools and data platforms can work with any type of data. That’s why the ability to import, or ingest, data in new and different formats into a wider range of analytics tools is so important. Giving data platforms like Elastic access to data stored on Google Cloud, and adding support for Elastic to Google's newly expanded Looker line of business analytics tools, are just two of the many open data-related announcements made at Cloud Next.

Similarly, different types of data are often stored in different formats, and analytics tools have to specifically enable support for these data structures in order to make them more useful to a wider variety of users and application developers.

In the growing field of data lakehouses, for example, where large “lakes” of unstructured data, such as video and audio, can be queried with the kinds of tools found in structured data warehouses, the open-source Apache Iceberg table format is becoming increasingly popular.

That’s why Google added support for it and other formats, including Delta and Hudi, to its BigLake storage engine and added support for analyzing unstructured data to its BigQuery data analytics tools. Not only does this provide additional flexibility, but it also means unstructured data can leverage other Google Cloud Platform (GCP) BigQuery tools, including ML functions like speech recognition, computer vision, text processing and more.

Another important area of development has to do with the use of various types of hardware accelerator chips to improve AI model performance. Google has created several generations of TPUs (tensor processing units), for example, that offer important benefits for applications like AI model training or inference. In addition, there have been many recent announcements from established semiconductor companies like Intel, AMD, Nvidia, and Qualcomm as well as a slew of chip startups focused on this burgeoning area.

As you might expect, each of those chip companies uses different techniques to accelerate AI and ML models. What isn’t as clear is that the methods necessary to write software or create models for the different accelerators are also proprietary. As a result, it can be challenging for software developers and AI/ML model creators to take advantage of these different chips because of how difficult it can be to learn all these unique approaches.

To address this, a couple of Google’s more intriguing announcements from Cloud Next are the launch of a new industry consortium called the OpenXLA Project and the debut of new open-source software tools designed to ease the process of working with multiple different types of hardware accelerators.

OpenXLA is designed to give AI/ML developers more choice by providing connections between most of the popular front-end frameworks used for building AI models (including TensorFlow, PyTorch and JAX) and a host of different hardware accelerator backends. The initial software tools being released include an upgraded XLA compiler and a portable set of ML computing operations called StableHLO.
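To make the idea of a portable operation set more concrete, here is a minimal toy sketch in plain Python. It is not OpenXLA, XLA or StableHLO code: the `Op`, `Program`, `run`, `SCALAR_BACKEND` and `VECTOR_BACKEND` names are all hypothetical, invented for this illustration. It only shows the general pattern, assuming a front end that emits a small, framework-neutral list of operations and several back ends that each supply their own implementation of those operations.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Op:
    """One operation in the toy portable op set (hypothetical, not StableHLO)."""
    name: str                 # e.g. "add" or "multiply"
    inputs: Tuple[int, ...]   # indices of earlier values used as operands


@dataclass(frozen=True)
class Program:
    """A framework-neutral program: inputs first, then one new value per op."""
    num_inputs: int
    ops: Tuple[Op, ...]


def build_from_frontend() -> Program:
    """Stand-in for a front-end framework exporting f(x, y) = (x + y) * y."""
    return Program(
        num_inputs=2,
        ops=(
            Op("add", (0, 1)),       # value 2 = x + y
            Op("multiply", (2, 1)),  # value 3 = value 2 * y
        ),
    )


def run(program: Program,
        kernels: Dict[str, Callable[..., Any]],
        args: Sequence[Any]) -> Any:
    """Execute the same program with whichever back end's kernels are supplied."""
    if len(args) != program.num_inputs:
        raise ValueError("wrong number of inputs")
    values = list(args)
    for op in program.ops:
        operands = [values[i] for i in op.inputs]
        values.append(kernels[op.name](*operands))
    return values[-1]


# Two hypothetical "back ends": each implements the same op names differently.
SCALAR_BACKEND = {"add": operator.add, "multiply": operator.mul}
VECTOR_BACKEND = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "multiply": lambda a, b: [x * y for x, y in zip(a, b)],
}

program = build_from_frontend()
scalar_result = run(program, SCALAR_BACKEND, (2.0, 3.0))
vector_result = run(program, VECTOR_BACKEND, ([1, 2], [3, 4]))
```

The point of the pattern is that the program is written once, and supporting a new chip means supplying a new set of kernels rather than rewriting the model. Real compilers do far more than this sketch suggests, including optimization and hardware-specific code generation.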

Companies that have joined Google in the initiative include Intel, Amazon Web Services, AMD, Nvidia, Arm, Meta and more. The inclusion of Intel is interesting because in many ways, the goal of the OpenXLA Project is similar to that of Intel’s own oneAPI, which is aimed at letting developers use Intel’s several types of computing architectures, such as CPUs, GPUs, and Habana Gaudi AI accelerators, without having to learn how to program for each of the different chip types. OpenXLA takes that concept to an industry-wide level and, thanks to the inclusion of many key cloud computing players, should open up a number of important new opportunities and speed along the adoption of hardware accelerators.

As with many of the announcements Google made at Cloud Next, the real-world benefits of the OpenXLA Project and the tools associated with it will take some time to make a significant impact. In the big picture of tech industry trends, these tools may seem a bit modest on their own. Collectively, however, they represent very important steps forward and are indicative of the kinds of efforts Google is undertaking to make its tools more useful to a wider audience of people.

They also reflect a strong emphasis on open-source tools and Google's desire to make Google Cloud Platform and related offerings more transparent and more flexible. The process of leveraging all the technology tools that Google offers is still undoubtedly complex, but with the broad collection of announcements that the company unveiled at Cloud Next, it is clear that the company’s evolution as a major cloud provider continues to advance.