Slashing Your Cloud Bill đŸ’¸ - Part 1: Compute Costs

2026-01-01

As a business scales, cloud cost management becomes critical: we are constantly balancing profitability against the cost of running our systems.
If you run your workloads on a Kubernetes cluster (GKE), here are a few tips that can help keep your compute costs under control.

While the examples use GKE and gcloud commands, the principles apply to other cloud providers as well; just use the equivalent CLI tools available in your environment.


Simple tips:

  1. Create separate node pools for Spot nodes (AWS) / Preemptible nodes (GCP)

A Spot Instance uses spare EC2 capacity and is available for less than the On-Demand price.

Preemptible VM instances are available at a much lower price (up to a 91% discount) compared to the price of standard VMs.

Create dedicated node pools for stateless and non-critical workloads to take advantage of this lower-cost compute.
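To put the discount in perspective, here is some illustrative arithmetic (the on-demand hourly rate below is a made-up placeholder, not a real price quote):

```python
on_demand_hourly = 0.067  # hypothetical $/hour for a small VM
discount = 0.91           # the advertised "up to 91%" preemptible discount
hours_per_month = 730

preemptible_hourly = on_demand_hourly * (1 - discount)
monthly_savings = (on_demand_hourly - preemptible_hourly) * hours_per_month

print(f"${preemptible_hourly:.4f}/hr vs ${on_demand_hourly:.4f}/hr")
print(f"${monthly_savings:.2f} saved per node per month")
```

Multiply that by every node in a pool and the savings add up quickly.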


```shell
gcloud container node-pools create stateless-preemptible-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --machine-type=e2-standard-2 \
  --preemptible \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=2 \
  --scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring \
  --enable-autorepair \
  --enable-autoupgrade \
  --node-taints=workload=stateless:NoSchedule
```

  2. Use Kubernetes scheduling rules
    Apply node selectors, node affinity, and tolerations to ensure workloads run only on the intended node pools.


```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stateless-app
  template:
    metadata:
      labels:
        app: stateless-app
    spec:
      tolerations:
      - key: "workload"
        operator: "Equal"
        value: "stateless"
        effect: "NoSchedule"
      nodeSelector:
        cloud.google.com/gke-nodepool: stateless-preemptible-pool
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                - stateless-preemptible-pool
      containers:
      - name: app
        image: nginx:alpine
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "250m"
            memory: "256Mi"
```
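The taint on the node pool and the toleration in the manifest work as a handshake: the taint repels every pod that does not explicitly tolerate it. As a mental model, the check can be sketched in a few lines (a simplified sketch, not the scheduler's actual implementation, and covering only the Equal operator used here):

```python
def tolerates(taint: dict, tolerations: list) -> bool:
    """Simplified check: a pod tolerates a taint when one of its tolerations
    matches the taint's key, value, and effect with the Equal operator."""
    return any(
        t.get("operator") == "Equal"
        and t.get("key") == taint["key"]
        and t.get("value") == taint["value"]
        and t.get("effect") == taint["effect"]
        for t in tolerations
    )

taint = {"key": "workload", "value": "stateless", "effect": "NoSchedule"}
pod_tolerations = [
    {"key": "workload", "operator": "Equal", "value": "stateless", "effect": "NoSchedule"}
]
print(tolerates(taint, pod_tolerations))  # True: pod may land on the tainted pool
print(tolerates(taint, []))               # False: pods without the toleration are repelled
```

Note that the toleration only *allows* scheduling onto the pool; the nodeSelector and nodeAffinity are what actually *pin* the workload there.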

  3. Scale down non-essential workloads outside business hours
    Scale deployments to zero for applications not required outside business hours, such as self-hosted runners, internal tools and dashboards, and sandbox environments, and scale the Spot/Preemptible node pools hosting these applications to zero as well.

```python
import os
import json

from kubernetes import client, config
from google.cloud import container_v1

PROJECT_ID = os.getenv("PROJECT_ID")
CLUSTER_NAME = os.getenv("CLUSTER_NAME")

DEPLOYMENTS = json.loads(os.getenv("DEPLOYMENTS", "[]"))
NODE_POOLS = json.loads(os.getenv("NODE_POOLS", "[]"))


# ========================
# Kubernetes Deployment Scaling
# ========================

def scale_deployment(name: str, namespace: str, replicas: int):
    try:
        config.load_incluster_config()
    except config.ConfigException:
        config.load_kube_config()

    apps = client.AppsV1Api()
    body = {"spec": {"replicas": replicas}}

    resp = apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body=body,
    )
    print(f"[DEPLOYMENT] {namespace}/{name} → {resp.spec.replicas}")


# ========================
# GKE Node Pool Scaling
# ========================

def downsize_node_pool(location: str, name: str, count: int):
    gke = container_v1.ClusterManagerClient()

    node_pool_path = (
        f"projects/{PROJECT_ID}/locations/{location}"
        f"/clusters/{CLUSTER_NAME}/nodePools/{name}"
    )

    req = container_v1.SetNodePoolSizeRequest(
        name=node_pool_path,
        node_count=count,
    )

    op = gke.set_node_pool_size(request=req)
    print(f"[NODEPOOL] {location}/{name} → {count} nodes (op={op.name})")


for d in DEPLOYMENTS:
    scale_deployment(
        name=d["name"],
        namespace=d["namespace"],
        replicas=d["replicacount"],
    )

for np in NODE_POOLS:
    downsize_node_pool(
        location=np["location"],
        name=np["name"],
        count=np["count"],
    )
```



Package the script in a container image:
```dockerfile
# Use a slim, supported Python base image
FROM python:3.11-slim

# Prevent Python from writing .pyc files and buffer stdout/stderr
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

# Set working directory
WORKDIR /app

# Install system dependencies (only what is required)
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Copy dependency definition first (better layer caching);
# requirements.txt lists google-cloud-container and kubernetes
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY main.py .

# Run as non-root user (recommended)
RUN useradd -u 10001 appuser && chown -R appuser /app
USER appuser

# Entry point for the CronJob
CMD ["python", "main.py"]
```


  4. Automate the whole process
    Use a scheduled job (a Kubernetes CronJob) to automate this.

Create the service account and grant it the required permissions:

```shell
export PROJECT_ID="my-gcp-project"
export SA_NAME="gke-scaler-sa"
export SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
export CLUSTER_NAME="my-cluster"

# Create the GCP service account
gcloud iam service-accounts create ${SA_NAME} \
  --project=${PROJECT_ID} \
  --description="Service account for scaling GKE node pools and deployments" \
  --display-name="GKE Scaler"

# Allow it to resize node pools
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/container.clusterAdmin"

# Create the Kubernetes service account
kubectl create serviceaccount gke-scaler \
  --namespace <namespace>

# Grant it permission to scale deployments
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-scaler
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "list", "watch", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: deployment-scaler-binding
subjects:
- kind: ServiceAccount
  name: gke-scaler
  namespace: <namespace>
roleRef:
  kind: ClusterRole
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io
EOF

# Enable Workload Identity on the cluster
gcloud container clusters update $CLUSTER_NAME \
  --project=${PROJECT_ID} \
  --workload-pool=${PROJECT_ID}.svc.id.goog

# Link the Kubernetes service account to the GCP one
kubectl annotate serviceaccount gke-scaler \
  -n <namespace> \
  iam.gke.io/gcp-service-account=${SA_EMAIL}

gcloud iam service-accounts add-iam-policy-binding ${SA_EMAIL} \
  --project=${PROJECT_ID} \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[<namespace>/gke-scaler]"
```


Create the CronJob:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: "<CRON JOB NAME>"
spec:
  schedule: ""  # set per task, e.g. "0 19 * * 5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: gke-scaler
          restartPolicy: OnFailure
          containers:
          - name: scaler
            image: gcr.io/my-project/gke-scaler:latest
            env:
            - name: PROJECT_ID
              value: my-project
            - name: CLUSTER_NAME
              value: my-cluster
            - name: DEPLOYMENTS
              value: |
                [
                  {"namespace": "default", "name": "api", "replicacount": 0},
                  {"namespace": "tools", "name": "runner", "replicacount": 0}
                ]
            - name: NODE_POOLS
              value: |
                [
                  {"location": "us-central1", "name": "stateless-preemptible-pool", "count": 0}
                ]
```

Using the YAML format above, you can set up two cron tasks:

  1. One that scales deployment replica counts and node pool counts to zero after work hours on Friday evening: schedule: "0 19 * * 5"
  2. One that scales them back up on Monday morning: schedule: "0 8 * * 1". Just make sure to change "replicacount": 0 and "count": 0 to values greater than zero.
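As a sanity check before deploying, you can verify that a given timestamp matches a cron expression with a few lines of Python (a minimal sketch handling only the plain-number-or-* fields used here; note that in cron, day-of-week 5 is Friday and 1 is Monday):

```python
from datetime import datetime

def matches(schedule: str, dt: datetime) -> bool:
    """Check a datetime against a 5-field cron expression
    (plain numbers or '*' only; no ranges, lists, or steps)."""
    fields = schedule.split()
    # cron day-of-week: 0 = Sunday ... 6 = Saturday; Python's weekday(): Monday = 0
    values = [dt.minute, dt.hour, dt.day, dt.month, (dt.weekday() + 1) % 7]
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))

# Friday 2026-01-02 at 19:00 triggers the scale-down schedule
print(matches("0 19 * * 5", datetime(2026, 1, 2, 19, 0)))  # True
# Monday 2026-01-05 at 08:00 triggers the scale-up schedule
print(matches("0 8 * * 1", datetime(2026, 1, 5, 8, 0)))    # True
```

With both CronJobs in place, the non-essential workloads (and the nodes under them) cost nothing over the weekend.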