Slashing Your Cloud Bill đŸ’¸ - Part 1: Compute Costs

2026-01-01

As a business scales, cloud cost management becomes critical: we are constantly balancing profitability against the cost of running our systems.
If you run your workloads on a Kubernetes cluster (GKE), here are a few tips that can help keep your compute costs under control.

While the examples use GKE and gcloud commands, the principles apply to other cloud providers as well; just use the equivalent CLI tools available in your environment.


Simple tips:

  1. Create separate node pools for Spot nodes (AWS) / Preemptible nodes (GCP)

A Spot Instance uses spare EC2 capacity and is available for less than the On-Demand price.

Preemptible VM instances are available at a much lower price (up to a 91% discount) compared to the price of standard VMs.

Create dedicated node pools for stateless and non-critical workloads to take advantage of this lower-cost compute.
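To put the discount in perspective, here is some illustrative arithmetic (the on-demand hourly rate below is a made-up placeholder, not a real price quote):

```python
on_demand_hourly = 0.067  # hypothetical $/hour for a small VM
discount = 0.91           # the advertised "up to 91%" preemptible discount
hours_per_month = 730

preemptible_hourly = on_demand_hourly * (1 - discount)
monthly_savings = (on_demand_hourly - preemptible_hourly) * hours_per_month

print(f"${preemptible_hourly:.4f}/hr vs ${on_demand_hourly:.4f}/hr")
print(f"${monthly_savings:.2f} saved per node per month")
```

Multiply that by every node in a pool and the savings add up quickly.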


```shell
gcloud container node-pools create stateless-preemptible-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --machine-type=e2-standard-2 \
  --preemptible \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=2 \
  --scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring \
  --enable-autorepair \
  --enable-autoupgrade \
  --node-taints=workload=stateless:NoSchedule
```

  2. Use Kubernetes scheduling rules
    Apply node selectors, node affinity, and tolerations to ensure workloads run only on the intended node pools.


```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stateless-app
  template:
    metadata:
      labels:
        app: stateless-app
    spec:
      tolerations:
      - key: "workload"
        operator: "Equal"
        value: "stateless"
        effect: "NoSchedule"
      nodeSelector:
        cloud.google.com/gke-nodepool: stateless-preemptible-pool
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                - stateless-preemptible-pool
      containers:
      - name: app
        image: nginx:alpine
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "250m"
            memory: "256Mi"
```
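The taint on the node pool and the toleration in the manifest work as a handshake: the taint repels every pod that does not explicitly tolerate it. As a mental model, the check can be sketched in a few lines (a simplified sketch, not the scheduler's actual implementation, and covering only the Equal operator used here):

```python
def tolerates(taint: dict, tolerations: list) -> bool:
    """Simplified check: a pod tolerates a taint when one of its tolerations
    matches the taint's key, value, and effect with the Equal operator."""
    return any(
        t.get("operator") == "Equal"
        and t.get("key") == taint["key"]
        and t.get("value") == taint["value"]
        and t.get("effect") == taint["effect"]
        for t in tolerations
    )

taint = {"key": "workload", "value": "stateless", "effect": "NoSchedule"}
pod_tolerations = [
    {"key": "workload", "operator": "Equal", "value": "stateless", "effect": "NoSchedule"}
]
print(tolerates(taint, pod_tolerations))  # True: pod may land on the tainted pool
print(tolerates(taint, []))               # False: pods without the toleration are repelled
```

Note that the toleration only *allows* scheduling onto the pool; the nodeSelector and nodeAffinity are what actually *pin* the workload there.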

  3. Scale down non-essential workloads outside business hours
    Scale deployments to zero for applications not required outside business hours, such as self-hosted runners, internal tools and dashboards, and sandbox environments, and scale the Spot/Preemptible node pools hosting these applications to zero as well.

```python
import os
import json

from kubernetes import client, config
from google.cloud import container_v1

PROJECT_ID = os.getenv("PROJECT_ID")
CLUSTER_NAME = os.getenv("CLUSTER_NAME")

DEPLOYMENTS = json.loads(os.getenv("DEPLOYMENTS", "[]"))
NODE_POOLS = json.loads(os.getenv("NODE_POOLS", "[]"))


# ========================
# Kubernetes Deployment Scaling
# ========================

def scale_deployment(name: str, namespace: str, replicas: int):
    try:
        config.load_incluster_config()
    except config.ConfigException:
        config.load_kube_config()

    apps = client.AppsV1Api()
    body = {"spec": {"replicas": replicas}}

    resp = apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body=body,
    )
    print(f"[DEPLOYMENT] {namespace}/{name} → {resp.spec.replicas}")


# ========================
# GKE Node Pool Scaling
# ========================

def downsize_node_pool(location: str, name: str, count: int):
    gke = container_v1.ClusterManagerClient()

    node_pool_path = (
        f"projects/{PROJECT_ID}/locations/{location}"
        f"/clusters/{CLUSTER_NAME}/nodePools/{name}"
    )

    req = container_v1.SetNodePoolSizeRequest(
        name=node_pool_path,
        node_count=count,
    )

    op = gke.set_node_pool_size(request=req)
    print(f"[NODEPOOL] {location}/{name} → {count} nodes (op={op.name})")


for d in DEPLOYMENTS:
    scale_deployment(
        name=d["name"],
        namespace=d["namespace"],
        replicas=d["replicacount"],
    )

for np in NODE_POOLS:
    downsize_node_pool(
        location=np["location"],
        name=np["name"],
        count=np["count"],
    )
```



Package the script in a container image:
```dockerfile
# Use a slim, supported Python base image
FROM python:3.11-slim

# Prevent Python from writing .pyc files and buffer stdout/stderr
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

# Set working directory
WORKDIR /app

# Install system dependencies (only what is required)
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Copy dependency definition first (better layer caching);
# requirements.txt lists google-cloud-container and kubernetes
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY main.py .

# Run as non-root user (recommended)
RUN useradd -u 10001 appuser && chown -R appuser /app
USER appuser

# Entry point for the CronJob
CMD ["python", "main.py"]
```


  4. Automate the whole process
    Use a scheduled job (a Kubernetes CronJob) to automate this.

Create the service account and grant it the required permissions:

```shell
export PROJECT_ID="my-gcp-project"
export SA_NAME="gke-scaler-sa"
export SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
export CLUSTER_NAME="my-cluster"

# Create the GCP service account
gcloud iam service-accounts create ${SA_NAME} \
  --project=${PROJECT_ID} \
  --description="Service account for scaling GKE node pools and deployments" \
  --display-name="GKE Scaler"

# Allow it to resize node pools
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/container.clusterAdmin"

# Create the Kubernetes service account
kubectl create serviceaccount gke-scaler \
  --namespace <namespace>

# Grant it permission to scale deployments
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-scaler
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "list", "watch", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: deployment-scaler-binding
subjects:
- kind: ServiceAccount
  name: gke-scaler
  namespace: <namespace>
roleRef:
  kind: ClusterRole
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io
EOF

# Enable Workload Identity on the cluster
gcloud container clusters update $CLUSTER_NAME \
  --project=${PROJECT_ID} \
  --workload-pool=${PROJECT_ID}.svc.id.goog

# Link the Kubernetes service account to the GCP one
kubectl annotate serviceaccount gke-scaler \
  -n <namespace> \
  iam.gke.io/gcp-service-account=${SA_EMAIL}

gcloud iam service-accounts add-iam-policy-binding ${SA_EMAIL} \
  --project=${PROJECT_ID} \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[<namespace>/gke-scaler]"
```


Create the CronJob:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: "<CRON JOB NAME>"
spec:
  schedule: ""  # set per task, e.g. "0 19 * * 5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: gke-scaler
          restartPolicy: OnFailure
          containers:
          - name: scaler
            image: gcr.io/my-project/gke-scaler:latest
            env:
            - name: PROJECT_ID
              value: my-project
            - name: CLUSTER_NAME
              value: my-cluster
            - name: DEPLOYMENTS
              value: |
                [
                  {"namespace": "default", "name": "api", "replicacount": 0},
                  {"namespace": "tools", "name": "runner", "replicacount": 0}
                ]
            - name: NODE_POOLS
              value: |
                [
                  {"location": "us-central1", "name": "stateless-preemptible-pool", "count": 0}
                ]
```

Using the YAML format above, you can set up two cron tasks:

  1. One that scales deployment replica counts and node pool counts to zero after work hours on Friday evening: schedule: "0 19 * * 5"
  2. One that scales them back up on Monday morning: schedule: "0 8 * * 1". Just make sure to change "replicacount": 0 and "count": 0 to values greater than zero.
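As a sanity check before deploying, you can verify that a given timestamp matches a cron expression with a few lines of Python (a minimal sketch handling only the plain-number-or-* fields used here; note that in cron, day-of-week 5 is Friday and 1 is Monday):

```python
from datetime import datetime

def matches(schedule: str, dt: datetime) -> bool:
    """Check a datetime against a 5-field cron expression
    (plain numbers or '*' only; no ranges, lists, or steps)."""
    fields = schedule.split()
    # cron day-of-week: 0 = Sunday ... 6 = Saturday; Python's weekday(): Monday = 0
    values = [dt.minute, dt.hour, dt.day, dt.month, (dt.weekday() + 1) % 7]
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))

# Friday 2026-01-02 at 19:00 triggers the scale-down schedule
print(matches("0 19 * * 5", datetime(2026, 1, 2, 19, 0)))  # True
# Monday 2026-01-05 at 08:00 triggers the scale-up schedule
print(matches("0 8 * * 1", datetime(2026, 1, 5, 8, 0)))    # True
```

With both CronJobs in place, the non-essential workloads (and the nodes under them) cost nothing over the weekend.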