skip to Main Content
Welcome to Gimasys!
Hotline: +84 974 417 099 (HCM) | +84 987 682 505 (HN) gcp@gimasys.com

How to migrate data warehouse on premises to BigQuery on Google Cloud

Companies' Data teams face ongoing challenges in consolidating, processing, and making data useful. They deal with challenges such as a mix of ETL jobs, on-premises data warehouses with long ETL capacity limits, and growing demand from users. They also need to ensure that the downstream requirements of the ML, reporting and analysis are met with the data processing. And they need to plan for the future – how will more data be handled and how will new downstream groups be supported?

Why use BigQuery?

On premises data warehouses become difficult to scale so the biggest goal for most companies is to create a future system for secure, scalable and efficient data storage. cost. GCP's BigQuery is serverless, highly scalable, and cost-effective, and is a great fit for an EDW use case. It's a multi-cloud data warehouse designed for business agility. However, migrating a large, highly integrated data warehouse from an on-premise data warehouse to BigQuery not a simple conversion. You need to ensure that your downstream system doesn't crash as a result of inconsistent data set migrations, both during and after the migration. So you have to plan your move. 

Data warehouse migration strategy

The following steps are typical for a successful migration: 

    • Evaluate and plan: Pre-scoping to plan legacy data warehouse migrations 
      • Define data sets, patterns, and application accessibility 
      • Use tools and utilities to identify unknown levels of complexity and dependencies
      • Identify required application conversions and test
      • Determine initial processing and storage capacity for budget forecasting and capacity planning 
      • Consider growth and anticipated changes during the migration period
      • Formulate future state strategy and vision to guide design
    • Migration: Set up the GCP platform and start migrating
      • The cloud is being established, consider running centralized POCs to validate processes and data migration
      • Look for automated utilities to help with any forced code migrations
      • Plan to maintain data synchronization between the legacy EDW and the target for the duration of the migration. This becomes an important business process to keep the project on schedule.
      • Plans to integrate several enterprise tools to help existing teams scale both environments
      • Review current data access patterns among communities of EDW users and how they map to similar controls available in Big Query. 
      • Main scope includes code integration and data model transformation
      • Expect to refine capacity forecasts and refine attribution designs. In Big Query, there are many options for balancing cost and performance to maximize business value. For example, you can use on-demand or price permanent or a combination of both. 
  • Validate and check
    • Look for tools to enable intelligent, automated data validation 
    • Scope must include schema and data validation
    • Ideally, solutions would allow for continuous validation from source to target system during migration
    • Test complexity and duration will be driven by the number and complexity of applications using data from EDW and the rate of change of those applications 

The key to a successful migration is finding Google Cloud partners who have experience migrating EDW workloads. For example, our Google Cloud partner Datametica provide services and Specialized Movement Accelerator for each of these migration phases to help plan and execute the migration more efficiently

How to migrate data warehouse on premises to BigQuery on Google Cloud

 

Data Warehouse Migration: Things to Consider

  • Financial benefits of open source: Target moves to 'Open Source' where no services have license fees. For example: BigQuery use standard SQL; Cloud Composer managed by Apache Airflow, . based on Apache Beam. Considering this as managed services provides the financial benefits of open source, but avoids the burden of maintaining open source platforms internally. 
  • Serverless: Migrate to “serverless” big data services. The majority of services used in the proposed GCP data architecture scale on demand allowing for more cost-effective aligning with demand. Using fully managed services allows you to focus your engineering time on business process priorities, not building and maintaining infrastructure. 
  • Efficiency of a unified platform: Any data warehouse migration involves integration with services around the EDW for data ingest and pre-processing and advanced analysis on the data stored in the EDW to maximize business value. A cloud provider like GCP offers a full range of 'big data' services integrated and managed with machine learning integration. This can significantly reduce long-term TCO by increasing both operational efficiency and cost when compared to EDW specific point solutions. 
  • Establish a solid cloud foundation: From the beginning, take the time to design a secure platform that will serve the business and technical needs of the next workload. Key features include: Scalable Resource Hierarchy, Multi-layered Security, multi-tier networking and data center strategy, and automation using Infrastructure as code. Also allows time to integrate cloud-based services into existing enterprise systems such as CI/CD pipelines, monitoring, alerting, logging, process scheduling, and service request management service. 
  • Unlimited capacity expansion: Moving to the cloud sounds like a big step, but really think of it as adding more data centers to your teams. Of course, these data centers offer many new services that are difficult to develop in-house and offer virtually unlimited scalability with minimal upfront financial commitment. . 
  • Patient and temporary foundation: Migrating EDW is usually a long running project. Willing to design and operate transient platforms for data synchronization, validation, and application testing. Consider the impact on the up and down system. It makes sense to migrate and modernize these systems at the same time as the EDW migration as they can be both sources of data and sinks and may be facing similar growth challenges. Also available to meet new business requirements that develop during migration. Take advantage of the long window of time your existing operations teams learn new services from a partner leading the rollout so your teams are ready to take over after the move. 
  • Experienced Partner: Migrating EDW can be a huge undertaking with its challenges and risks during migration, but presents a tremendous opportunity to reduce costs, simplify operations, and provide significantly improved capacity. for internal and external EDW users. Choice suitable partner helps reduce technical and financial risks, and allows you to plan and can start taking advantage of these long-term benefits early in the migration process.

 

How to migrate data warehouse on premises to BigQuery on Google Cloud

 

Data warehouse migration architecture examples

 

  • Set of background elements. In GCP includes, IAM for authorization and access, cloud resource hierarchy, payment, networking, code pipelines, Infrastructure as code to use Cloud Build with Terraform ( GCP Foundation Toolkit), Cloud DNS and a Partner Dedicated Link to connect to existing data centers.
  • Enable security scanning and monitoring services before real user data is loaded with Cloud Operations for monitoring and logging and Security Command Center for security monitoring. 
  • Extract files from legacy EDW on premise and transfer to Cloud Storage and set up ongoing sync using BigQuery Transfer Services
  • From Cloud Storage, process data in Dataflow and Load/export data to BigQuery. 
  • Validate exports with Datametica's validation utilities running in a GKE and Cloud SQL cluster to audit and synchronize historical data as needed. Application teams check validated data sets during migration. 
  • Streamline the entire process using Cloud Composer, integrated with on-premises scheduling services as needed to leverage established processes and keep old and new systems in sync. 
  • Maintain close coordination with teams/services that ingest new data into EDW and downstream analytics teams that rely on EDW data for ongoing advanced analytics. 
  • Set up granular access controls for data sets and start making data available in BigQuery to existing application usage, visualization, and reporting tools using the BigQuery data connector for users to access and test 'down-stream'. 
  • Ascending Big Query's fixed rate processing capabilities to provide the most cost-effective use of resources during migration. 

To learn more about migrating from on premises Enterprise Data Warehouse (EDW) to Bigquery and GCP here.

 

 

 

Source: Gimasys

Back To Top
0974 417 099