BigQuery's AI-assisted data preparation is now in preview
In today's data-driven world, the ability to efficiently transform raw data into actionable insights is paramount. However, preparing and processing data is often a significant undertaking. Google BigQuery's AI-assisted data preparation aims to change that: by automating tedious tasks and speeding up analytics, it helps businesses get more value from their data.
Now in preview, BigQuery data preparation provides a number of capabilities:
- AI-powered suggestions: BigQuery's AI-assisted data preparation uses Gemini to analyze your data and schema and provide intelligent suggestions for cleaning, transforming, and enriching the data. This significantly reduces the time and effort required for manual data preparation tasks.
- Data cleansing and standardization: Easily identify and rectify inconsistencies, missing values, and formatting errors in your data.
- Visual data pipelines: The intuitive, low-code visual interface helps both technical and non-technical users easily design complex data pipelines, and leverage BigQuery's rich and extensible SQL capabilities.
- Data pipeline orchestration: Automate the execution and monitoring of your data pipelines. The SQL generated by BigQuery data preparation can become part of a Dataform data engineering pipeline that you can deploy and orchestrate with CI/CD, for a shared development experience.
BigQuery data preparation helps you ensure the accuracy and reliability of your data, leading to more informed business decisions. BigQuery data preparation automates data quality checks and integrates with other Google Cloud services such as Dataform and Cloud Storage, providing a unified and scalable environment for your data needs.
How does it work?
Getting started is easy. When you sample a BigQuery table in BigQuery data preparation, Gemini in BigQuery uses state-of-the-art foundation models to evaluate the data and schema and generate data preparation recommendations, such as filter and transformation suggestions. For example, it can identify valid date formats by country and which columns can act as join keys, accelerating the data engineering process.
In one example (using synthetic data), the Birthdate column contains two different date formats and is of type STRING. BigQuery data preparation suggests: “Convert column Birthdate from type string to date with the following format(s): '%Y-%m-%d','%m/%d/%Y'”. After you apply the suggestion card, you can verify in the preview that the transformed column is now of type DATE.
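To make the suggestion concrete, here is a minimal Python sketch of what converting a string column with two date formats involves. It is not the SQL that BigQuery data preparation generates; the function name `parse_birthdate` and the sample values are hypothetical, and only the two format strings come from the suggestion card.

```python
from datetime import date, datetime
from typing import Optional

# Format strings taken from the suggestion card quoted in the text.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_birthdate(value: Optional[str]) -> Optional[date]:
    """Try each known format in order; return None if none of them match.

    Hypothetical helper for illustration only.
    """
    if value is None:
        return None
    text = value.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


# Hypothetical sample values standing in for the STRING column.
rows = ["1990-04-23", "04/23/1990", "not a date", None]
parsed = [parse_birthdate(r) for r in rows]
```

Returning `None` for values that match neither format keeps bad rows visible for review rather than failing the whole conversion, which is one reasonable choice for a cleaning step; a stricter pipeline might raise an error instead.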
With BigQuery’s AI-assisted data preparation, you can:
- Significantly reduce time spent discovering data quality issues and cleaning data by leveraging Gemini-assisted suggestion cards
- Customize your own suggestion cards by providing an example in the data grid
- Increase operational efficiency by deploying data preparation with incremental data processing
What BigQuery customers are saying
Customers are already solving numerous challenges with BigQuery data preparation.
GAF, a major manufacturer of roofing materials in North America, is adopting data preparation to create data transformation pipelines on BigQuery.
“GAF is looking to modernize the ETL infrastructure and adopt a BigQuery native, low-code solution. BigQuery data preparation will help our skilled business users and the analytics team in the data preparation processes for the enablement of self-service analytics.” - Puja Panchagnula, Management Director - Enterprise Data Management & Analytics, GAF
mCloud Technologies helps businesses in sectors like energy, buildings, and manufacturing to optimize the performance, reliability, and sustainability of their assets.
“We receive data feeds from our partners. BigQuery data preparation allows our product managers to prepare and operate the file data feeds with little to no help from our data engineering team.” - Jim Christian, Chief Product and Technology Officer, mCloud Technologies
Public Value Technologies is a joint venture between two German public broadcasting organizations (ARD).
“Public Value Technologies receives data feeds from our media partners for our data mesh solution and AI applications. BigQuery data preparation allows our data analysts and scientists to rapidly integrate the data feeds that standardize and preprocess the data in a low code way.” - Korbinian Schwinger, Team Lead Data Engineer, Public Value Technologies
Getting started
With its powerful AI capabilities, intuitive interface, and tight integration with the Google Cloud ecosystem, BigQuery data preparation is set to revolutionize the way organizations manage and prepare their data. By automating tedious tasks, improving data quality, and empowering users, this innovative solution reduces the time you spend preparing data and improves your productivity.
To get started with BigQuery data preparation, explore the following resources: