Summary and Schedule
Already know how to train an ML model in Python but haven’t used the cloud? This hands-on workshop gets you running ML/AI workloads on Google Cloud Platform (GCP) — no prior cloud experience required. By the end, you’ll be able to move a local training workflow into GCP’s Vertex AI platform and take advantage of cloud-scale hardware and managed services.
What you’ll learn:
- Cloud-based notebooks — Set up a Vertex AI Workbench notebook as your development environment and cloud controller.
- Data in the cloud — Upload datasets to Cloud Storage and connect them to your training code.
- Scalable model training — Launch custom training jobs on cloud GPUs/CPUs with your own PyTorch (or other framework) code.
- Hyperparameter tuning — Run parallel tuning jobs in Vertex AI to efficiently search for optimal model settings.
- RAG pipelines — Build a retrieval-augmented generation pipeline using Google’s Gemini models with grounding via Google Search.
- Cost management — Monitor spending, set budget alerts, and clean up resources to avoid surprise bills.
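As a preview of the training workflow above, the "notebook as controller" pattern can be sketched with the `google-cloud-aiplatform` SDK. This is an illustrative sketch only: the project ID, staging bucket, script name, and container image below are placeholders, and running it requires an authenticated GCP environment.

```python
# Sketch of the "notebook as controller" pattern: a small Python process
# (e.g., a Workbench notebook) submits a managed training job to Vertex AI.
# All values below (project, bucket, script, image URI) are placeholders.

def submit_training_job(project_id: str, staging_bucket: str) -> None:
    # Imported lazily so this sketch can be read and parsed without the SDK
    # installed (pip install google-cloud-aiplatform).
    from google.cloud import aiplatform

    aiplatform.init(
        project=project_id,
        location="us-central1",
        staging_bucket=staging_bucket,
    )
    job = aiplatform.CustomTrainingJob(
        display_name="titanic-train",
        script_path="train.py",  # your local training script
        container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13:latest",
    )
    # Vertex AI provisions the hardware, runs the script, and tears it down.
    job.run(
        machine_type="n1-standard-4",
        accelerator_type="NVIDIA_TESLA_T4",
        accelerator_count=1,
        replica_count=1,
    )
```

Episodes 4 and 5 walk through this pattern step by step; you do not need to understand it before the workshop.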
Prerequisites
This workshop assumes you have a fundamental ML/AI background. Specifically, you should be comfortable with:
- Python — writing scripts, using packages like pandas and NumPy. New to Python? See the Intro to Python workshop.
- Core ML/AI concepts — train/test splits, overfitting, loss functions, hyperparameters. New to ML/AI? See the Intro to Machine Learning workshop.
- Training a model — you’ve trained at least one model in any framework (scikit-learn, PyTorch, TensorFlow, XGBoost, etc.).
- Command line basics — navigating directories, running commands in a terminal.
No prior GCP or cloud experience is required — that’s what this workshop teaches.
| Duration | Episode | Key Questions |
| --- | --- | --- |
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Overview of Google Cloud for Machine Learning and AI | Why would I run ML/AI experiments in the cloud instead of on my laptop or an HPC cluster? What does GCP offer for ML/AI, and how is it organized? What is the “notebook as controller” pattern? |
| 00h 12m | 2. Notebooks as Controllers | How do you set up and use Vertex AI Workbench notebooks for machine learning tasks? How can you manage compute resources efficiently using a “controller” notebook approach in GCP? |
| 00h 42m | 3. Data Storage and Access | How can I store and manage data effectively in GCP for Vertex AI workflows? What are the advantages of Google Cloud Storage (GCS) compared to local or VM storage for machine learning projects? How can I load data from GCS into a Vertex AI Workbench notebook? |
| 01h 32m | 4. Training Models in Vertex AI: Intro | What are the differences between training locally in a Vertex AI notebook and using Vertex AI-managed training jobs? How do custom training jobs in Vertex AI streamline the training process for various frameworks? How does Vertex AI handle scaling across CPUs, GPUs, and TPUs? |
| 02h 12m | 5. Training Models in Vertex AI: PyTorch Example | When should you consider a GPU (or TPU) instance for PyTorch training in Vertex AI, and what are the trade-offs for small vs. large workloads? How do you launch a script-based training job and write all artifacts (model, metrics, logs) next to each other in GCS without deploying a managed model? |
| 02h 42m | 6. Hyperparameter Tuning in Vertex AI: Neural Network Example | How can we efficiently manage hyperparameter tuning in Vertex AI? How can we parallelize tuning jobs to optimize time without increasing costs? |
| 03h 32m | 7. Retrieval-Augmented Generation (RAG) with Vertex AI | How do we go from “a pile of PDFs” to “ask a question and get a cited answer” using Google Cloud tools? What are the key parts of a RAG system (chunking, embedding, retrieval, generation), and how do they map onto Vertex AI services? How much does each part of this pipeline cost (VM time, embeddings, LLM calls), and where can we keep it cheap? |
| 04h 02m | 8. Bonus: CLI Workflows Without Notebooks | How do I submit Vertex AI training jobs from the command line instead of a Jupyter notebook? What does authentication look like when working outside of a Workbench VM? Can I manage GCS buckets, training jobs, and endpoints entirely from a terminal? |
| 04h 27m | 9. Resource Management & Monitoring on Vertex AI (GCP) | How do I monitor and control Vertex AI, Workbench, and GCS costs day-to-day? What specifically should I stop, delete, or schedule to avoid surprise charges? How do I set budget alerts so cost leaks get caught quickly? |
| 05h 07m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Setup (Complete Before the Workshop)
Before attending this workshop, you’ll need to complete a few setup steps to ensure you can follow along smoothly. The main requirements are:
- GCP Access – Use the shared Google Cloud project provided by RCI and ML+X (standard for UW-Madison workshops) or sign up for a personal GCP Free Tier account.
- Titanic Dataset – Download the required CSV files in advance.
- (Optional) Google Cloud Skills Boost — For a broader overview of GCP, visit the Getting Started with Google Cloud Fundamentals course.
- (Optional) GitHub Account — Only needed if you want to push your work back to a fork. See the GitHub PAT guide for details.
Details on each step are outlined below.
2. GCP Access
There are two ways to get access to GCP for this lesson. Please wait for a pre-workshop email from the instructor to confirm which option to choose.
Option 1) Shared Google Cloud Project (UW-Madison workshops)
When this workshop is taught at UW-Madison (e.g., Machine Learning Marathon, Research Bazaar), the instructors provide access to a shared GCP project courtesy of RCI (Research Cyberinfrastructure) and ML+X. You do not need to set up your own account or billing.
How access works: The instructors will add your Google account to a Google Group that has the necessary permissions on the shared project. Once you’re added, GCP will recognize your membership and grant access — this can take 5–15 minutes to propagate. If possible, the instructors will add you the day before the workshop so everything is ready at start time.
What to expect:
- During the lesson, you will log in with your Google account credentials and select the shared GCP project.
- If you can’t see the project right away, wait a few minutes for permissions to propagate. Try refreshing the page or opening an incognito/private browser window. Make sure you’re logged into the correct Google account (the one the instructors added to the group).
- This setup ensures that all participants have a consistent environment and avoids unexpected billing for attendees.
- Please use shared credits responsibly — they are limited and reused for future training events.
- Stay within the provided exercises and avoid launching additional compute-heavy workloads (e.g., training large language models).
- Do not enable additional APIs or services unless instructed.
Option 2) GCP Free Tier — Skip If Using Shared Project
If the instructors aren’t providing a shared account environment, please follow these instructions:
- Go to the GCP Free Tier page and click Get started for free.
- Complete the signup process. The Free Tier includes a $300 credit valid for 90 days and ongoing free usage for some smaller services.
- Once your account is ready, log in to the Google Cloud Console.
- During the lesson, we will enable only a few APIs (Compute Engine, Cloud Storage, and Notebooks).

Following the lesson should cost well under $15 total if you are using your own credits.
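Whichever access option you use, a quick way to confirm that your environment is authenticated and pointed at the right project is the `google-auth` library, which is installed alongside every Google Cloud client library. The helper below is a hypothetical convenience function, not part of the lesson code:

```python
def current_project():
    # google.auth.default() locates Application Default Credentials
    # (e.g., created by `gcloud auth application-default login`, or supplied
    # automatically inside a Workbench VM) and returns a
    # (credentials, project_id) tuple; project_id may be None if unset.
    import google.auth

    credentials, project_id = google.auth.default()
    return project_id
```

If this raises a `DefaultCredentialsError` or returns `None` during the workshop, ask an instructor to check your account setup.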
3. Download the Data
For this workshop, you will need the Titanic dataset, which can be used to train a classifier predicting survival.
Please download the following zip file (Right-click → Save as): data.zip

- Extract the zip folder contents (Right-click → Extract all on Windows; double-click on macOS).
- Save the two data files (train and test) somewhere easy to access, for example:
  - ~/Downloads/data/titanic_train.csv
  - ~/Downloads/data/titanic_test.csv
- In Episode 3, you will create a Cloud Storage bucket and upload this data to use with your notebook.
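If you prefer to script the extraction step, the Python standard library can do it. The sketch below builds a tiny stand-in `data.zip` first so that it is self-contained; in practice you would point `zip_path` at the archive you downloaded.

```python
import zipfile
from pathlib import Path

dest = Path("demo_data")
dest.mkdir(exist_ok=True)

# Stand-in for the downloaded data.zip (the real archive contains the full
# titanic_train.csv and titanic_test.csv files).
zip_path = dest / "data.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("titanic_train.csv", "PassengerId,Survived\n1,0\n")
    zf.writestr("titanic_test.csv", "PassengerId\n892\n")

# The actual extraction step: equivalent to "Extract all" in the file explorer.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(dest)
    extracted = sorted(zf.namelist())

print(extracted)  # ['titanic_test.csv', 'titanic_train.csv']
```

Either way, what matters for Episode 3 is that the two CSV files end up somewhere you can find them again.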
4. (Optional) Google Cloud Skills Boost — Getting Started with Google Cloud Fundamentals
If you want a broader introduction to GCP before the workshop, consider exploring the Getting Started with Google Cloud Fundamentals self-paced learning path. It covers the basics of the Google Cloud environment, including project structure, billing, IAM (Identity and Access Management), and common services like Compute Engine, Cloud Storage, and BigQuery. This step is optional but recommended if you want more context before diving into the ML/AI use cases.