Glossary

Last updated on 2025-10-24

Understanding the terminology used in cloud computing and GCP is half the battle when working with Vertex AI and Workbench. Familiarity with these key concepts will help you navigate Google Cloud services, configure machine learning workflows, and troubleshoot issues more efficiently.

We encourage you to briefly study this glossary before the workshop and refer back to it as needed. While we’ll go over these terms throughout the workshop, early exposure will make the hands-on exercises smoother and faster.

Cloud Compute Essentials

  • Serverless: A way of running code without managing infrastructure. The cloud provider handles provisioning, scaling, and maintenance automatically, and you only pay when your code runs. In GCP, examples include Cloud Functions and Cloud Run, which can scale to zero when idle, as well as managed services such as Vertex AI Prediction.
  • Virtual Machine (VM): A software-based computer that runs on Google’s Compute Engine infrastructure. Each Vertex AI Workbench notebook is ultimately backed by a Compute Engine VM, even if the environment looks fully managed.
  • Instance: A running VM in the cloud. In GCP, instances are defined by machine families (e.g., N2, C2, A2, A3) and can be customized for CPU, memory, and GPU needs. These are the same instance types you select for Vertex AI Workbench notebooks.
  • Container: A lightweight, isolated environment that packages code and dependencies together. Containers ensure consistent execution and are the foundation of services like Vertex AI Workbench (notebook containers), Cloud Run, and Kubernetes Engine.
  • Docker: The most common container platform for building, shipping, and running containerized applications. Many GCP ML environments (like the TensorFlow or PyTorch Workbench images) are built as Docker containers hosted on Artifact Registry, the successor to the older Google Container Registry (GCR).
  • Elasticity: The ability of cloud resources to scale up or down automatically based on workload. GCP provides elasticity through autoscaling managed instance groups, Kubernetes Engine, and Vertex AI training services.

GCP General

  • Compute Engine (GCE): The core infrastructure service that provides customizable VMs. Vertex AI Workbench notebooks, training jobs, and many ML services are built on top of Compute Engine.
  • Vertex AI: A unified machine learning platform that integrates model training, tuning, deployment, and monitoring. It supports managed notebooks, training jobs, pipelines, and AutoML but can also run fully custom ML code.
  • Autoscaling: A feature that automatically adjusts the number of VM instances in a managed instance group based on utilization metrics such as CPU or memory.

Account Governance and Security

  • IAM (Identity and Access Management): GCP’s permission management system. It defines who (user, service account, or group) can access what resources (Vertex AI, Storage, Compute Engine) and with what level of privilege.
  • Service Account: A special Google identity used by applications and services to access GCP resources securely. For example, a Vertex AI Workbench notebook uses a service account to read data from Cloud Storage or launch training jobs.
    • Relation to Cloud Storage Policies: A service account only has the access its IAM roles grant. For a notebook to read or write a bucket, its service account must hold an appropriate role, granted either at the project level or directly on the bucket.
  • Bucket Policy (Cloud Storage IAM Policy): Defines who can read, write, or manage objects in a Google Cloud Storage bucket. These policies are project- and bucket-scoped and often reference service accounts.
  • Access Control Lists (ACLs): A legacy way to manage fine-grained access for specific Cloud Storage objects. ACLs are now largely superseded by IAM policies but can still appear in legacy datasets.
  • Organization Policy Service: A GCP feature for defining constraints across multiple projects (e.g., restricting region usage or service types). Similar in spirit to AWS Organizations, it supports centralized governance.
  • Quotas and Limits: GCP places default usage caps on resources (e.g., a maximum number of CPUs or GPUs per region). Quota increases can be requested from the Quotas page in the Cloud Console (IAM & Admin > Quotas), and understanding them helps prevent resource allocation failures.
  • Billing Alerts: GCP provides Budgets & Alerts to track project spending and receive email or Pub/Sub notifications when costs exceed thresholds.
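IAM policies ultimately boil down to a list of bindings, each pairing a role with a set of members. As a minimal sketch (the bucket name and service account below are hypothetical, and the live `google-cloud-storage` call is left commented out so the helper runs anywhere):

```python
# Sketch: inspecting a Cloud Storage bucket's IAM policy.
# The bucket name and service account are placeholder values.

def member_has_role(bindings, member, role):
    """Return True if `member` is granted `role` in a list of IAM bindings.

    `bindings` mirrors the shape of bucket IAM policies:
    a list of {"role": ..., "members": {...}} entries.
    """
    return any(b["role"] == role and member in b["members"] for b in bindings)

# Example bindings, shaped like a real bucket IAM policy:
bindings = [
    {
        "role": "roles/storage.objectViewer",
        "members": {"serviceAccount:notebook-sa@my-project.iam.gserviceaccount.com"},
    },
]

print(member_has_role(
    bindings,
    "serviceAccount:notebook-sa@my-project.iam.gserviceaccount.com",
    "roles/storage.objectViewer",
))  # -> True

# With credentials configured, the live policy could be fetched like this:
# from google.cloud import storage
# policy = storage.Client().bucket("my-bucket").get_iam_policy(
#     requested_policy_version=3)
# bindings = [{"role": b["role"], "members": set(b["members"])}
#             for b in policy.bindings]
```

Checking bindings this way is handy when a notebook's service account can't reach a bucket and you need to see which role is missing.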

Data Storage and Management

  • Cloud Storage (GCS): GCP’s object storage service for datasets, models, and artifacts. It’s highly scalable and the direct counterpart to AWS S3.
  • Bucket: A top-level container within Cloud Storage that holds data files (objects). Each file can be accessed via a unique URI in the form gs://your-bucket-name/path/to/file.csv.
  • GCS URI (Object URI): The unique path referencing an object in a Cloud Storage bucket, typically used by Vertex AI and Workbench for loading or saving data. Example: gs://ml-project-dataset/train.csv.
  • Persistent Disk (PD): Block storage volumes attached to VMs, including Workbench notebooks. They retain data between VM reboots and can be used to store local datasets, checkpoints, or outputs.
  • Filestore / Cloud Storage FUSE: Options for mounting network file systems or Cloud Storage buckets directly to your notebook’s filesystem.
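Vertex AI tools pass `gs://` URIs around constantly, and splitting one into its bucket and object path is plain string work. A minimal sketch (the URI below is a hypothetical example):

```python
# Sketch: splitting a gs:// URI into (bucket, object path).

def parse_gcs_uri(uri):
    """Split 'gs://bucket/path/to/file' into ('bucket', 'path/to/file')."""
    if not uri.startswith("gs://"):
        raise ValueError(f"Not a GCS URI: {uri}")
    bucket, _, blob = uri[len("gs://"):].partition("/")
    return bucket, blob

print(parse_gcs_uri("gs://ml-project-dataset/train.csv"))
# -> ('ml-project-dataset', 'train.csv')
```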

Vertex AI Workbench and Machine Learning Workflows

  • Vertex AI Workbench: A managed Jupyter notebook environment built on top of Compute Engine. It comes in Managed and User-Managed modes and is used to run ML experiments and manage data workflows interactively.
  • Workbench Notebook Instance: The actual VM running your notebook container. You configure its machine type (CPU/GPU), disk size, and region just like any other VM.
  • Controller: In this workshop, the notebook itself acts as the controller — it configures and runs training, tuning, and evaluation jobs through the Vertex AI SDK rather than performing all computation inside the notebook runtime.
  • Vertex AI Custom Job: A managed training job that executes your custom training code on dedicated Compute Engine instances. It’s equivalent to a SageMaker Training Job in AWS.
  • Hyperparameter Tuning Job: A Vertex AI service that automatically searches for the best model configuration by evaluating multiple trials with different hyperparameter sets.
  • Model Registry: Stores trained models for versioning, deployment, and comparison across experiments.
  • Endpoint (for Deployment): A deployed instance of a trained model that serves predictions through Vertex AI Prediction.
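To make the controller pattern above concrete: a Custom Job is described by worker pool specs (machine type, replica count, container image), which the notebook submits through the Vertex AI SDK. A hedged sketch follows; the dict builder mirrors the `worker_pool_specs` shape the SDK expects, and the SDK calls at the bottom are commented out because they need real credentials. The project, bucket, and image names are placeholders.

```python
# Sketch: building a worker pool spec for a Vertex AI Custom Job.
# All names (project, bucket, image) are hypothetical placeholders.

def worker_pool_spec(image_uri, machine_type="n1-standard-4",
                     replica_count=1, accelerator_type=None,
                     accelerator_count=0):
    """Build one entry of the worker_pool_specs list a CustomJob takes."""
    machine_spec = {"machine_type": machine_type}
    if accelerator_type:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return {
        "machine_spec": machine_spec,
        "replica_count": replica_count,
        "container_spec": {"image_uri": image_uri},
    }

spec = worker_pool_spec(
    "us-docker.pkg.dev/my-project/training/trainer:latest",  # hypothetical
    machine_type="n1-standard-8",
)
print(spec["machine_spec"]["machine_type"])  # -> n1-standard-8

# With the SDK installed and credentials configured, the notebook
# (acting as the controller) could then launch the job:
# from google.cloud import aiplatform
# aiplatform.init(project="my-project", location="us-central1")
# job = aiplatform.CustomJob(
#     display_name="demo-training",
#     worker_pool_specs=[spec],
#     staging_bucket="gs://my-project-staging",
# )
# job.run()
```

The key point is that the heavy computation happens on the dedicated training instances described by the spec, not inside the notebook runtime itself.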