Trong thế giới kinh doanh hiện đại, các giải pháp công nghệ đã trở thành…
Companies' Data teams face ongoing challenges in consolidating, processing, and making data useful. They deal with challenges such as a mix of ETL jobs, on-premises data warehouses with long ETL capacity limits, and growing demand from users. They also need to ensure that the downstream requirements of the ML, reporting and analysis are met with the data processing. And they need to plan for the future – how will more data be handled and how will new downstream groups be supported?
- Analyze Big Data with BigQuery and Google Sheets
- Introducing Datastream for Google BigQuery
- Optimizing BigQuery with data sources in Google Cloud VMware Engine
Why use BigQuery?
On premises data warehouses become difficult to scale so the biggest goal for most companies is to create a future system for secure, scalable and efficient data storage. cost. GCP's BigQuery is serverless, highly scalable, and cost-effective, and is a great fit for an EDW use case. It's a multi-cloud data warehouse designed for business agility. However, migrating a large, highly integrated data warehouse from an on-premise data warehouse to BigQuery not a simple conversion. You need to ensure that your downstream system doesn't crash as a result of inconsistent data set migrations, both during and after the migration. So you have to plan your move.
Data warehouse migration strategy
The following steps are typical for a successful migration:
- Evaluate and plan: Pre-scoping to plan legacy data warehouse migrations
- Define data sets, patterns, and application accessibility
- Use tools and utilities to identify unknown levels of complexity and dependencies
- Identify required application conversions and test
- Determine initial processing and storage capacity for budget forecasting and capacity planning
- Consider growth and anticipated changes during the migration period
- Formulate future state strategy and vision to guide design
- Migration: Setting up the platform GCP and start moving
- The cloud is being established, consider running centralized POCs to validate processes and data migration
- Look for automated utilities to help with any forced code migrations
- Plan to maintain data synchronization between the legacy EDW and the target for the duration of the migration. This becomes an important business process to keep the project on schedule.
- Plans to integrate several enterprise tools to help existing teams scale both environments
- Review current data access patterns among communities of EDW users and how they map to similar controls available in Big Query.
- Main scope includes code integration and data model transformation
- Expect to refine capacity forecasts and refine attribution designs. In Big Query, there are many options for balancing cost and performance to maximize business value. For example, you can use on-demand or price permanent or a combination of both.
- Evaluate and plan: Pre-scoping to plan legacy data warehouse migrations
- Validate and check
- Look for tools to enable intelligent, automated data validation
- Scope must include schema and data validation
- Ideally, solutions would allow for continuous validation from source to target system during migration
- Test complexity and duration will be driven by the number and complexity of applications using data from EDW and the rate of change of those applications
The key to a successful migration is finding Google Cloud partners who have experience migrating EDW workloads. For example, our Google Cloud partner Datametica provide services and Specialized Movement Accelerator for each of these migration phases to help plan and execute the migration more efficiently
Data Warehouse Migration: Things to Consider
- Financial benefits of open source: Target moves to 'Open Source' where no services have license fees. For example: BigQuery use standard SQL; Cloud Composer managed by Apache Airflow, . based on Apache Beam. Considering this as managed services provides the financial benefits of open source, but avoids the burden of maintaining open source platforms internally.
- Serverless: Migrate to “serverless” big data services. The majority of services used in the proposed GCP data architecture scale on demand allowing for more cost-effective aligning with demand. Using fully managed services allows you to focus your engineering time on business process priorities, not building and maintaining infrastructure.
- Efficiency of a unified platform: Bất kỳ quá trình di chuyển kho dữ liệu nào cũng liên quan đến việc tích hợp với các dịch vụ xung quanh EDW để nhập và xử lý trước dữ liệu và phân tích nâng cao trên dữ liệu được lưu trữ trong EDW để tối đa hóa giá trị kinh doanh. Một nhà cung cấp đám mây như GCP cung cấp toàn bộ các dịch vụ ‘dữ liệu lớn’ được tích hợp và quản lý bằng tích hợp machine learning. Điều này có thể làm giảm đáng kể TCO dài hạn bằng cách tăng cả hiệu quả hoạt động và chi phí khi so sánh với các giải pháp điểm cụ thể EDW.
- Establish a solid cloud foundation: From the beginning, take the time to design a secure platform that will serve the business and technical needs of the next workload. Key features include: Scalable Resource Hierarchy, Multi-layered Security, multi-tier networking and data center strategy, and automation using Infrastructure as code. Also allows time to integrate cloud-based services into existing enterprise systems such as CI/CD pipelines, monitoring, alerting, logging, process scheduling, and service request management service.
- Unlimited capacity expansion: Moving to the cloud sounds like a big step, but really think of it as adding more data centers to your teams. Of course, these data centers offer many new services that are difficult to develop in-house and offer virtually unlimited scalability with minimal upfront financial commitment. .
- Patient and temporary foundation: Migrating EDW is usually a long running project. Willing to design and operate transient platforms for data synchronization, validation, and application testing. Consider the impact on the up and down system. It makes sense to migrate and modernize these systems at the same time as the EDW migration as they can be both sources of data and sinks and may be facing similar growth challenges. Also available to meet new business requirements that develop during migration. Take advantage of the long window of time your existing operations teams learn new services from a partner leading the rollout so your teams are ready to take over after the move.
- Experienced Partner: Migrating EDW can be a huge undertaking with its challenges and risks during migration, but presents a tremendous opportunity to reduce costs, simplify operations, and provide significantly improved capacity. for internal and external EDW users. Choice suitable partner helps reduce technical and financial risks, and allows you to plan and can start taking advantage of these long-term benefits early in the migration process.
Data warehouse migration architecture examples
- Set of background elements. In GCP includes, IAM for authorization and access, cloud resource hierarchy, payment, networking, code pipelines, Infrastructure as code to use Cloud Build with Terraform ( GCP Foundation Toolkit), Cloud DNS and a Partner Dedicated Link to connect to existing data centers.
- Enable security scanning and monitoring services before real user data is loaded with Cloud Operations for monitoring and logging and Security Command Center for security monitoring.
- Trích xuất các tệp từ EDW kế thừa on premise và chuyển sang Cloud Storage và thiết lập đồng bộ hóa đang diễn ra bằng cách sử dụng BigQuery Transfer Services
- From Cloud Storage, process data in Dataflow and Load/export data to BigQuery.
- Validate exports with Datametica's validation utilities running in a cluster GKE and Cloud SQL to check and synchronize historical data as needed. Application teams check validated data sets during migration.
- Streamline the entire process using Cloud Composer, integrated with on-premises scheduling services as needed to leverage established processes and keep old and new systems in sync.
- Maintain close coordination with teams/services that ingest new data into EDW and downstream analytics teams that rely on EDW data for ongoing advanced analytics.
- Set up granular access controls for data sets and start making data available in BigQuery to existing application usage, visualization, and reporting tools using the BigQuery data connector for users to access and test 'down-stream'.
- Ascending Big Query's fixed rate processing capabilities to provide the most cost-effective use of resources during migration.
Learn more about migrating from on premises Enterprise Data Warehouse (EDW) to Bigquery and GCP here: https://cloud.google.com/architecture/dw2bq/dw-bq-migration-overview