Accessing and Managing Data in GCS with Vertex AI Notebooks
Last updated on 2025-08-27
Overview
Questions
- How can I load data from GCS into a Vertex AI Workbench
notebook?
- How do I monitor storage usage and costs for my GCS bucket?
- What steps are involved in pushing new data back to GCS from a notebook?
Objectives
- Read data directly from a GCS bucket into memory in a Vertex AI
notebook.
- Check storage usage and estimate costs for data in a GCS
bucket.
- Upload new files from the Vertex AI environment back to the GCS bucket.
Initial setup
Open JupyterLab notebook
Once your Vertex AI Workbench notebook instance shows as
Running, open it in JupyterLab. Create a new Python 3
notebook and rename it to Interacting-with-GCS.ipynb.
Set up GCP environment
Before interacting with GCS, we need to confirm that the notebook is authenticated and initialize the client library. Vertex AI Workbench instances authenticate automatically through their service account, so this is mostly a matter of verifying access.
PYTHON
import google.auth
import pandas as pd
from google.cloud import storage

# Step 1: Confirm the notebook's credentials and project.
# Workbench instances are already authenticated via their service account,
# so no interactive login is needed.
credentials, project_id = google.auth.default()
print("Authenticated to project:", project_id)

# Step 2: Initialize a GCS client
client = storage.Client()

# Step 3: List buckets in the current project to confirm access
buckets = list(client.list_buckets())
print("Buckets in project:")
for b in buckets:
    print("-", b.name)
Explanation of the pieces:
- google.auth.default(): Picks up the Application Default Credentials available to the notebook (in Workbench, the instance's service account) along with the current project ID, so the notebook can act on your behalf without an interactive login.
- storage.Client(): Creates a connection to Google Cloud Storage. All read/write actions will use this client.
- list_buckets(): Confirms which storage buckets your account can see in the current project.
This setup block gives the notebook everything it needs to read from and write to GCS.
Reading data from GCS
As with S3, you can either (A) read data directly from GCS into memory, or (B) download a copy into your notebook VM. Since we’re using notebooks as controllers rather than training environments, the recommended approach is reading directly from GCS.
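As a minimal sketch, here is how you might read a CSV object straight into a pandas DataFrame. The bucket name and object path below are placeholders; substitute your own. This reuses the client and pd from the setup block.
PYTHON
import io

# Placeholder names -- replace with your own bucket and object path
bucket_name = "your-bucket-name"
blob_path = "data/example.csv"

# (A) Read the object directly from GCS into memory
blob = client.bucket(bucket_name).blob(blob_path)
df = pd.read_csv(io.BytesIO(blob.download_as_bytes()))
print(df.shape)
df.head()

# (B) Or download a local copy onto the notebook VM instead
# blob.download_to_filename("example.csv")
Option (A) keeps everything in memory; option (B) is useful when a tool insists on a real file path.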
Checking storage usage of a bucket
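The client can report how much data a bucket currently holds by summing the sizes of its objects (blobs). This sketch assumes bucket_name is set as above; it also defines total_size_bytes, which the cost estimate below relies on.
PYTHON
# Sum the size (in bytes) of every object in the bucket
total_size_bytes = 0
object_count = 0
for blob in client.list_blobs(bucket_name):
    total_size_bytes += blob.size
    object_count += 1

print(f"Objects in bucket: {object_count}")
print(f"Total size: {total_size_bytes / (1024**3):.4f} GB")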
Estimating storage costs
PYTHON
storage_price_per_gb = 0.02 # $/GB/month for Standard storage
total_size_gb = total_size_bytes / (1024**3)
monthly_cost = total_size_gb * storage_price_per_gb
print(f"Estimated monthly cost: ${monthly_cost:.4f}")
print(f"Estimated annual cost: ${monthly_cost*12:.4f}")
For updated prices, see GCS Pricing.
Writing output files to GCS
PYTHON
# Create a sample file
with open("Notes.txt", "w") as f:
f.write("This is a test note for GCS.")
# Upload to bucket/docs/
bucket = client.bucket(bucket_name)
blob = bucket.blob("docs/Notes.txt")
blob.upload_from_filename("Notes.txt")
print("File uploaded successfully.")
List bucket contents:
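One way to confirm the upload is to list the objects under the docs/ prefix used above:
PYTHON
# List objects under the docs/ prefix to confirm the upload
for blob in client.list_blobs(bucket_name, prefix="docs/"):
    print(blob.name, f"({blob.size} bytes)")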
Challenge: Estimating GCS Costs
Suppose you store 50 GB of data in Standard storage
(us-central1) for one month.
- Estimate the monthly storage cost.
- Then estimate the cost if you download (egress) the entire dataset
once at the end of the month.
Hints
- Storage: $0.02 per GB-month
- Egress: $0.12 per GB
Solution
- Storage cost: 50 GB × $0.02/GB-month = $1.00
- Egress cost: 50 GB × $0.12/GB = $6.00
- Total: $7.00 for the month, including one full download
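If you prefer to check the arithmetic in code rather than by hand, a quick sketch:
PYTHON
data_gb = 50
storage_price_per_gb = 0.02  # $/GB-month, Standard storage
egress_price_per_gb = 0.12   # $/GB, internet egress

storage_cost = data_gb * storage_price_per_gb
egress_cost = data_gb * egress_price_per_gb
print(f"Storage: ${storage_cost:.2f}")
print(f"Egress:  ${egress_cost:.2f}")
print(f"Total:   ${storage_cost + egress_cost:.2f}")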
Key Points
- Load data from GCS into memory to avoid managing local copies when possible.
- Periodically check storage usage and costs to manage your GCS
budget.
- Use Vertex AI Workbench notebooks to upload analysis results back to GCS, keeping workflows organized and reproducible.