This is the second part of a three-part series where I explain how you can build a cost-efficient and automated ML retraining system using Kubeflow Pipelines as the ML system orchestrator. In the first part, readily available here, I focused on:
Let’s now shift gears toward a slightly more advanced topic. This writing covers how to share data between pipeline components.
When we task a Kubeflow Pipeline with some work, it’s often possible to break this work into multiple small pieces of…
This is the first part of a three-part series where I explain how you can build a cost-efficient and automated ML retraining system with Kubeflow. Along the way, we’ll also pick up some best practices for building pipelines.
While Kubeflow Pipelines isn’t yet the most popular batch job orchestrator, a growing number of companies are adopting it to orchestrate and monitor their data and ML jobs. Kubeflow is designed to take advantage of Kubernetes’ strengths, and that’s what makes it so attractive.
According to a 2017 article by the MIT Sloan Management Review:
The gap between ambition and execution is large at most companies. Three-quarters of executives believe AI will enable their companies to move into new businesses. Almost 85% believe AI will allow their companies to obtain or sustain a competitive advantage. But only about one in five companies has incorporated AI in some offerings or processes.
Arguably, part of this gap is explained by ML models failing to transition from ML labs to high-value real-world products and services. What’s more, ML models that do make it to production…
If you are using GCP (Google Cloud Platform) to store your data, it’s very likely you are using BigQuery as your data warehouse or data lake solution (or as a part of it). Options for storing transactional data include Datastore, Cloud Spanner, and Cloud SQL. In many cases, especially when migrating from on-premises to GCP, Cloud SQL is a natural choice for transactional data because it offers managed MySQL and PostgreSQL.
There are many ways to synchronize your Cloud SQL data to BigQuery. One solution is to use BigQuery’s Cloud SQL federated queries feature…
Machine Learning Engineer / Data Engineer / Google Cloud Certified