Summary and Schedule

Already know how to train an ML model in Python but haven’t used the cloud? This hands-on workshop gets you running ML/AI workloads on Google Cloud Platform (GCP) — no prior cloud experience required. By the end, you’ll be able to move a local training workflow into GCP’s Vertex AI platform and take advantage of cloud-scale hardware and managed services.

What you’ll learn:

  • Cloud-based notebooks — Set up a Vertex AI Workbench notebook as your development environment and cloud controller.
  • Data in the cloud — Upload datasets to Cloud Storage and connect them to your training code.
  • Scalable model training — Launch custom training jobs on cloud GPUs/CPUs with your own PyTorch (or other framework) code.
  • Hyperparameter tuning — Run parallel tuning jobs in Vertex AI to efficiently search for optimal model settings.
  • RAG pipelines — Build a retrieval-augmented generation pipeline using Google’s Gemini models with grounding via Google Search.
  • Cost management — Monitor spending, set budget alerts, and clean up resources to avoid surprise bills.

Prerequisites

This workshop assumes you have a fundamental ML/AI background. Specifically, you should be comfortable with:

  • Python — writing scripts, using packages like pandas and NumPy. New to Python? See the Intro to Python workshop.
  • Core ML/AI concepts — train/test splits, overfitting, loss functions, hyperparameters. New to ML/AI? See the Intro to Machine Learning workshop.
  • Training a model — you’ve trained at least one model in any framework (scikit-learn, PyTorch, TensorFlow, XGBoost, etc.).
  • Command line basics — navigating directories, running commands in a terminal.

No prior GCP or cloud experience is required — that’s what this workshop teaches.

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

Setup (Complete Before the Workshop)

Before attending this workshop, you’ll need to complete a few setup steps to ensure you can follow along smoothly. The main requirements are:

  1. GCP Access — Use the shared Google Cloud project provided by RCI and ML+X (standard for UW-Madison workshops) or sign up for a personal GCP Free Tier account.
  2. Titanic Dataset — Download the required CSV files in advance.
  3. (Optional) Google Cloud Skills Boost — For a broader overview of GCP, visit the Getting Started with Google Cloud Fundamentals course.
  4. (Optional) GitHub Account — Only needed if you want to push your work back to a fork. See the GitHub PAT guide for details.

Details on each step are outlined below.

1. GCP Access

There are two ways to get access to GCP for this lesson. Please wait for a pre-workshop email from the instructor to confirm which option to choose.

Option 1) Shared Google Cloud Project (UW-Madison workshops)

When this workshop is taught at UW-Madison (e.g., Machine Learning Marathon, Research Bazaar), the instructors provide access to a shared GCP project courtesy of RCI (Research Cyberinfrastructure) and ML+X. You do not need to set up your own account or billing.

How access works: The instructors will add your Google account to a Google Group that has the necessary permissions on the shared project. Once you’re added, GCP will recognize your membership and grant access — this can take 5–15 minutes to propagate. If possible, the instructors will add you the day before the workshop so everything is ready at start time.

What to expect:

  • During the lesson, you will log in with your Google account credentials and select the shared GCP project.
  • If you can’t see the project right away, wait a few minutes for permissions to propagate. Try refreshing the page or opening an incognito/private browser window. Make sure you’re logged into the correct Google account (the one the instructors added to the group).
  • This setup ensures that all participants have a consistent environment and avoids unexpected billing for attendees.
  • Please use shared credits responsibly — they are limited and reused for future training events.
    • Stay within the provided exercises and avoid launching additional compute-heavy workloads (e.g., training large language models).
    • Do not enable additional APIs or services unless instructed.
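If you prefer the command line, you can also confirm your access from a terminal using the gcloud CLI. This is an optional sketch; `PROJECT_ID` is a placeholder for the actual project ID, which the instructors will share in the pre-workshop email.

```shell
# Authenticate with the Google account the instructors added to the group
gcloud auth login

# Point the CLI at the shared project (replace PROJECT_ID with the ID
# from the pre-workshop email)
gcloud config set project PROJECT_ID

# If permissions have propagated, this prints the project's metadata;
# a permission error means you may need to wait a few more minutes
gcloud projects describe PROJECT_ID
```

These commands are read-only checks and do not consume shared credits.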

Option 2) GCP Free Tier — Skip If Using Shared Project

If the instructors aren’t providing a shared project environment, please follow these instructions:

  1. Go to the GCP Free Tier page and click Get started for free.
  2. Complete the signup process. The Free Tier includes a $300 credit valid for 90 days and ongoing free usage for some smaller services.
  3. Once your account is ready, log in to the Google Cloud Console.
  4. During the lesson, we will enable only a few APIs (Compute Engine, Cloud Storage, and Notebooks).

Following the lesson should cost well under $15 total if you are using your own credits.
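For reference, the three APIs mentioned above can also be enabled from a terminal rather than the console. A hedged sketch (the service names are the standard GCP identifiers; run this only in your own Free Tier project, and only when the lesson calls for it):

```shell
# Enable the APIs used in this lesson (run once per project)
gcloud services enable \
    compute.googleapis.com \
    storage.googleapis.com \
    notebooks.googleapis.com

# List the APIs now active on the project to confirm
gcloud services list --enabled
```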

2. Download the Data

For this workshop, you will need the Titanic dataset, which you’ll use to train a classifier that predicts passenger survival.

  1. Please download the following zip file (Right-click → Save as):
    data.zip

  2. Extract the contents of the zip file (Right-click → Extract all on Windows; double-click on macOS).

  3. Save the two data files (train and test) somewhere easy to access, for example:

    • ~/Downloads/data/titanic_train.csv
    • ~/Downloads/data/titanic_test.csv

In Episode 3, you will create a Cloud Storage bucket and upload this data to use with your notebook.
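As a preview of that step, creating a bucket and uploading the files takes two commands with the gsutil CLI. The bucket name below is a hypothetical placeholder (bucket names must be globally unique); there is no need to run this before the workshop.

```shell
# Create a Cloud Storage bucket (replace my-titanic-bucket with a unique name)
gsutil mb gs://my-titanic-bucket

# Upload both CSVs into a data/ prefix inside the bucket
gsutil cp ~/Downloads/data/titanic_train.csv \
          ~/Downloads/data/titanic_test.csv \
          gs://my-titanic-bucket/data/
```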

3. (Optional) Google Cloud Skills Boost — Getting Started with Google Cloud Fundamentals

If you want a broader introduction to GCP before the workshop, consider exploring the Getting Started with Google Cloud Fundamentals self-paced learning path. It covers the basics of the Google Cloud environment, including project structure, billing, IAM (Identity and Access Management), and common services like Compute Engine, Cloud Storage, and BigQuery. This step is optional but recommended for those that want a broader overview of GCP before diving into ML/AI use-cases.