Hands-on Tutorials

Learn how to implement a cost-efficient and automated model retraining solution with Kubeflow Pipelines — Part 2

Photo by JOHN TOWNER on Unsplash

This is the second part of a three-part series in which I explain how you can build a cost-efficient and automated ML retraining system using Kubeflow Pipelines as the ML system orchestrator. In the first part, I focused on:

  1. Building pipeline components
  2. Writing conditional components and externally triggered pipelines

Let’s now shift gears toward a slightly more advanced topic. This article covers how to share data between pipeline components.

Data sharing mastery is crucial

When we task a Kubeflow Pipeline with some work, it’s often possible to break this work into multiple small pieces of…


Learn how to implement a cost-efficient and automated model retraining solution with Kubeflow Pipelines — Part 1

Photo by JOHN TOWNER on Unsplash

This is the first part of a three-part series in which I explain how you can build a cost-efficient and automated ML retraining system with Kubeflow. Along the way, we’ll also pick up some best practices for building pipelines.

While Kubeflow Pipelines isn’t yet the most popular batch job orchestrator, a growing number of companies are adopting it to orchestrate and monitor their data and ML jobs. Kubeflow is designed to benefit from Kubernetes’ strengths, and that’s what makes it very attractive.

In this article, I’ll show you how you can build an automated and cost-efficient ML model retraining…


Hands-On model drift detection with Apache Airflow and model retraining with Google AI Platform

Photo by Sandra Tenschert on Unsplash

According to a 2017 article by the MIT Sloan Management Review:

The gap between ambition and execution is large at most companies. Three-quarters of executives believe AI will enable their companies to move into new businesses. Almost 85% believe AI will allow their companies to obtain or sustain a competitive advantage. But only about one in five companies has incorporated AI in some offerings or processes.

Arguably, part of this gap is explained by ML models failing to transition from ML labs to high-value, real-world products and services. What’s more, ML models that do make it to production…


Secure your data and workloads using private IP connectivity and Cloud SQL proxy

Photo by Scott Webb on Unsplash

If you are using GCP (Google Cloud Platform) to store your data, it’s very likely you are using BigQuery as your data warehouse or data lake solution (or as part of it). Options for storing transactional data include Datastore, Cloud Spanner and Cloud SQL. In many cases, especially when migrating from on-premises to GCP, Cloud SQL is a natural choice for transactional data as it offers managed MySQL and PostgreSQL.

There are many ways to synchronize your Cloud SQL data to BigQuery. One solution is to use BigQuery’s Cloud SQL federated queries feature…

Marc Djohossou

Machine Learning Engineer / Data Engineer / Google Cloud Certified
