Deploy Airbyte to Kubernetes

Step-by-step production deployment of Airbyte 1.8.1 on GKE ARM64 infrastructure with Cloud SQL and Google Cloud Storage

TL;DR

Installing the Airbyte Helm v1 chart on ARM64 Kubernetes requires specific configuration for production readiness. Key points:

  • Use Airbyte Helm Chart v1.8.1 for stable ARM64 support
  • ARM64 tolerations required for all services to run on cost-effective ARM nodes
  • External Cloud SQL PostgreSQL recommended over in-cluster database
  • Google Cloud Storage integration for scalable state and log storage
  • High availability setup with 3 replicas for critical services
  • Resource limits tuned for ARM64 performance characteristics
  • Multi-tenant ready with proper authentication and security

Installation time: 15-30 minutes with pre-configured infrastructure. Cost savings: 20-40% compared to x86 equivalent setup.

Introduction

Airbyte on Kubernetes using Helm charts provides enterprise-grade data integration capabilities, but deploying it correctly on ARM64 infrastructure requires understanding the specific configuration nuances. This guide walks through a complete production installation of Airbyte Helm v1 chart on Google Kubernetes Engine (GKE) with ARM64 nodes, external Cloud SQL database, and Google Cloud Storage.

Unlike generic installation guides, this tutorial uses real-world production values and explains the rationale behind each configuration choice. If you need expert assistance with Docker and Kubernetes deployments, EaseCloud specializes in production-ready data platform implementations.

Infrastructure Overview

Our Production Setup

Before diving into the Helm installation, let's understand the infrastructure architecture we're deploying to:

Google Kubernetes Engine (GKE) Cluster:

  • Node Pool: ARM64 nodes (t2a-standard-16, Tau T2A processors)
  • Node Count: 4 nodes for high availability
  • Zones: Multi-zone deployment across us-central1
  • Networking: VPC-native with private nodes

External Dependencies:

  • Database: Cloud SQL PostgreSQL 15 (db-perf-optimized-N-4)
  • Storage: Google Cloud Storage buckets for logs, state, and workload output
  • DNS: Cloud DNS for custom domain (etl.yourdomain.com)
  • SSL: Let's Encrypt certificates via cert-manager

Security & Access:

  • Authentication: Instance admin with secure password management
  • Secrets: Google Secret Manager integration (wired up in the sketch after this list)
  • Network: Private cluster with authorized networks
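
One way to wire that Secret Manager integration, assuming the database password already lives in a secret named airbyte-db-password (a hypothetical name):

# Pull the password from Secret Manager and materialize it as the Kubernetes
# secret the chart expects (the airbyte namespace is created later in this guide)
gcloud secrets versions access latest --secret=airbyte-db-password | \
  kubectl create secret generic airbyte-cloudsql-secret \
    --from-file=database-password=/dev/stdin -n airbyte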

Why This Architecture?

This setup provides:

  • Cost Optimization: ARM64 nodes offer 20-40% cost savings
  • Scalability: External database and storage scale independently
  • Reliability: Multi-zone deployment with external managed services
  • Security: Private networking with managed SSL certificates
  • Performance: Optimized for high-throughput data pipelines

Prerequisites

Infrastructure Setup

  1. GKE Cluster with ARM64 nodes:
# For the multi-zone layout described above, use --region us-central1 instead
# of --zone (note that --num-nodes then applies per zone)
gcloud container clusters create airbyte-cluster \
  --zone us-central1-a \
  --machine-type t2a-standard-16 \
  --num-nodes 4 \
  --enable-ip-alias \
  --enable-network-policy \
  --enable-autoscaling \
  --min-nodes 2 \
  --max-nodes 8
  2. Cloud SQL PostgreSQL instance (database and user creation shown after this list):
gcloud sql instances create airbyte-postgres \
  --database-version POSTGRES_15 \
  --tier db-perf-optimized-N-4 \
  --region us-central1 \
  --storage-type SSD \
  --storage-size 100GB \
  --storage-auto-increase
  3. Google Cloud Storage buckets:
gsutil mb -l us-central1 gs://airbyte-storage-xxxx
  4. Required Tools:
# Install Helm (ARM64 build; writing to /usr/local/bin requires root)
curl -fsSL https://get.helm.sh/helm-v3.14.0-linux-arm64.tar.gz | \
  tar -xzO linux-arm64/helm | sudo tee /usr/local/bin/helm > /dev/null
sudo chmod +x /usr/local/bin/helm

# Install kubectl (if not already installed)
gcloud components install kubectl
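
With the Cloud SQL instance up, create the application database and user that values-v1.yaml references later (the password is a placeholder; keep the real one in Secret Manager):

gcloud sql databases create airbyte --instance airbyte-postgres
gcloud sql users create airbyte --instance airbyte-postgres \
  --password 'your-secure-password'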

Helm Repository Setup

Add Airbyte Helm Repository

# Add the official Airbyte Helm repository
helm repo add airbyte https://airbytehq.github.io/helm-charts

# Update repositories to get latest charts
helm repo update

# Verify Airbyte v1 chart availability
helm search repo airbyte/airbyte --versions | grep -E "^airbyte/airbyte[[:space:]]+1\."

Expected output:

airbyte/airbyte    1.8.1    1.8.1    Helm chart to deploy Airbyte in the cloud
airbyte/airbyte    1.8.0    1.8.0    Helm chart to deploy Airbyte in the cloud

Production Values Configuration

Understanding the values-v1.yaml Structure

Our production values-v1.yaml file is organized into several key sections. Let's break down each component:

Global Configuration

global:
  serviceAccountName: "airbyte-admin"
  edition: "community"
  airbyteUrl: "https://etl.yourdomain.com"

Key Points:

  • serviceAccountName: Dedicated service account for RBAC
  • edition: Community edition (free) vs Enterprise
  • airbyteUrl: Your custom domain for accessing Airbyte UI

Authentication Setup

global:
  auth:
    enabled: true
    instanceAdmin:
      secretName: "airbyte-auth-secrets"
      firstName: "Admin F"
      lastName: "Admin L"
      email: "admin@airbyte.example"
      passwordSecretKey: "instance-admin-password"

Why This Matters:

  • Enables secure authentication instead of open access
  • Uses Kubernetes secrets for password management
  • Configures initial admin user

External Database Configuration

global:
  database:
    type: "external"
    secretName: "airbyte-cloudsql-secret"
    host: "10.xx.x.x"  # Cloud SQL private IP
    port: "5432"
    database: "airbyte"
    user: "airbyte"
    passwordSecretKey: "database-password"

Critical for Production:

  • External database keeps Airbyte metadata independent of the cluster lifecycle (pod restarts, upgrades, even cluster re-creation)
  • Cloud SQL provides automated backups and high availability
  • Private IP ensures secure database connections

Storage Configuration

global:
  storage:
    type: "gcs"
    secretName: "airbyte-gcs-log-creds"
    bucket:
      log: "airbyte-storage-xxxx"
      state: "airbyte-storage-xxxx"
      workloadOutput: "airbyte-storage-xxxx"
    gcs:
      projectId: "prj-xxxx"
      credentialsJsonPath: "/secrets/gcs-log-creds/gcp.json"

Storage Strategy:

  • Single bucket with different prefixes for organization
  • Service account with minimal required permissions (provisioned in the sketch below)
  • Persistent storage for job state and logs
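
A sketch of provisioning that service account with bucket-scoped access, reusing the placeholder project and bucket names from above:

# Create a dedicated service account for Airbyte's GCS access
gcloud iam service-accounts create airbyte-gcs --display-name "Airbyte GCS access"

# Grant object-level access on the single bucket only
gsutil iam ch \
  serviceAccount:airbyte-gcs@prj-xxxx.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://airbyte-storage-xxxx

# Download a key to load into the airbyte-gcs-log-creds secret later
gcloud iam service-accounts keys create service-account-key.json \
  --iam-account airbyte-gcs@prj-xxxx.iam.gserviceaccount.com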

ARM64 Specific Configurations

Every service needs ARM64 tolerations:

global:
  tolerations:
    - key: kubernetes.io/arch
      operator: Equal
      value: arm64
      effect: NoSchedule

# Applied to each service (webapp, server, worker, etc.)
webapp:
  tolerations:
    - key: kubernetes.io/arch
      operator: Equal
      value: arm64
      effect: NoSchedule
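
GKE typically taints ARM64 node pools with kubernetes.io/arch=arm64:NoSchedule in mixed-architecture clusters; confirm what your nodes actually carry so the tolerations above match:

# Show the CPU architecture label on each node
kubectl get nodes -L kubernetes.io/arch

# Check for the arm64 taint the tolerations must match
kubectl describe nodes | grep -A 2 "Taints:"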

High Availability Setup

# Server HA configuration
server:
  replicaCount: 3
  podDisruptionBudget:
    enabled: true
    minAvailable: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - server
          topologyKey: kubernetes.io/hostname

HA Benefits:

  • 3 replicas ensure service availability during node failures
  • Pod disruption budgets prevent simultaneous pod evictions
  • Anti-affinity rules spread pods across different nodes (verify with the commands below)
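
After installation, you can confirm the replicas landed on distinct nodes and that the disruption budget is registered (the label selector assumes the affinity rule above):

kubectl get pods -n airbyte -l app.kubernetes.io/name=server -o wide
kubectl get pdb -n airbyte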

Resource Optimization for ARM64

global:
  env_vars:
    # Database connection optimization
    HIKARI_MAXIMUM_POOL_SIZE: "40"
    HIKARI_CONNECTION_TIMEOUT: "5000"
    HIKARI_IDLE_TIMEOUT: "300000"

    # JVM optimization for ARM64
    JAVA_OPTS: "-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Xms2g -Xmx3g -XX:+ExitOnOutOfMemoryError"
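
To confirm these variables actually reached the server process after install (the deployment name airbyte-server is an assumption; check with kubectl get deploy -n airbyte):

kubectl exec -n airbyte deploy/airbyte-server -- env | grep -E 'HIKARI|JAVA_OPTS'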

Job Resource Configuration

global:
  jobs:
    resources:
      requests:
        memory: 2Gi
        cpu: 500m
      limits:
        memory: 8Gi
        cpu: 4

    kube:
      nodeSelector:
        workload: airbyte-jobs
      tolerations:
        - key: airbyte-jobs
          operator: Equal
          value: "true"
          effect: NoSchedule

Job Optimization:

  • Dedicated node pool for data processing jobs (example pool creation below)
  • Resource limits prevent job pods from overwhelming nodes
  • Node selectors ensure jobs run on appropriate hardware
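
A sketch of creating that dedicated pool with the matching label and taint; the pool name and sizing are assumptions to adapt:

gcloud container node-pools create airbyte-jobs \
  --cluster airbyte-cluster \
  --zone us-central1-a \
  --machine-type t2a-standard-16 \
  --node-labels workload=airbyte-jobs \
  --node-taints airbyte-jobs=true:NoSchedule \
  --enable-autoscaling --min-nodes 0 --max-nodes 6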

Step-by-Step Installation

1. Create Namespace

kubectl create namespace airbyte

2. Create Required Secrets

Database Secret:

kubectl create secret generic airbyte-cloudsql-secret \
  --from-literal=database-password='your-secure-password' \
  -n airbyte

Authentication Secret:

kubectl create secret generic airbyte-auth-secrets \
  --from-literal=instance-admin-password='your-admin-password' \
  -n airbyte

GCS Service Account Secret:

kubectl create secret generic airbyte-gcs-log-creds \
  --from-file=gcp.json=/path/to/service-account-key.json \
  -n airbyte
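
Before installing, confirm all three secrets landed in the namespace:

kubectl get secrets -n airbyte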

3. Install Airbyte with Helm

helm install airbyte airbyte/airbyte \
  --version 1.8.1 \
  -n airbyte \
  --create-namespace \
  -f airbyte-k8s/helm/values/values-v1.yaml \
  --timeout 10m \
  --wait

Installation Parameters:

  • --version 1.8.1: Specific version for stability
  • --create-namespace: Auto-create namespace if it doesn't exist
  • --timeout 10m: Allow sufficient time for ARM64 image pulls
  • --wait: Wait for all resources to be ready

4. Verify Installation

# Check pod status
kubectl get pods -n airbyte

# Check services
kubectl get svc -n airbyte

# Check ingress
kubectl get ingress -n airbyte

Expected pod status (all Running):

NAME                                            READY   STATUS    RESTARTS
airbyte-server-75595f7b5b-4kvbq                 1/1     Running   0
airbyte-server-75595f7b5b-7gd2w                 1/1     Running   0
airbyte-server-75595f7b5b-86x2q                 1/1     Running   0
airbyte-webapp-7547f5b7d8-btbh7                 1/1     Running   0
airbyte-worker-5889c9d94f-mth5q                 1/1     Running   0
airbyte-temporal-75857855b4-2sskj               1/1     Running   0

5. Configure Ingress and SSL

The values file includes ingress configuration:

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
  hosts:
    - host: etl.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: etl-yourdomain-com-tls
      hosts:
        - etl.yourdomain.com

Ensure cert-manager and nginx-ingress are installed:

# Install nginx-ingress if not present
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install nginx-ingress ingress-nginx/ingress-nginx -n ingress-nginx --create-namespace

# Install cert-manager if not present
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.3 \
  --set crds.enabled=true
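
Once the ingress controller has an external IP, point Cloud DNS at it (the managed zone name is hypothetical; adjust to your setup):

# Get the external IP assigned to the ingress
kubectl get ingress -n airbyte \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'

# Create the A record in Cloud DNS
gcloud dns record-sets create etl.yourdomain.com. \
  --zone your-dns-zone --type A --ttl 300 --rrdatas <ingress-ip>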

Post-Installation Configuration

1. Access Airbyte UI

Once DNS is configured, access Airbyte at your custom domain:

  • URL: https://etl.yourdomain.com
  • Username: Your configured email
  • Password: From your auth secret (retrieve it as shown below)
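
If you did not record the admin password, read it back from the Kubernetes secret:

kubectl get secret airbyte-auth-secrets -n airbyte \
  -o jsonpath='{.data.instance-admin-password}' | base64 -d; echo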

2. Configure Your First Connection

  1. Add a Source: Choose from 300+ pre-built connectors
  2. Add a Destination: Configure your data warehouse or lake
  3. Create Connection: Set up sync schedule and configuration
  4. Test Sync: Run a test sync to verify everything works

3. Monitoring Setup

Monitor your Airbyte deployment:

# Check resource usage
kubectl top pods -n airbyte

# View logs
kubectl logs -n airbyte -l app.kubernetes.io/name=server --tail=100

# Monitor database connections
kubectl logs -n airbyte -l app.kubernetes.io/name=worker | grep -i connection

Troubleshooting Common Issues

ARM64 Image Pull Issues

If you encounter image pull errors:

# Check node architecture
kubectl get nodes -o wide

# Verify tolerations are applied
kubectl describe pod -n airbyte <pod-name>

Database Connection Issues

# Test database connectivity
kubectl run -it --rm debug --image=postgres:15 --restart=Never -- \
  psql -h <your-db-host> -U airbyte -d airbyte

Storage Access Issues

# Verify GCS permissions. The Airbyte server image does not ship gsutil, so
# run a one-off Cloud SDK pod instead (assumes credentials are available,
# e.g. via Workload Identity or an activated service account)
kubectl run -it --rm gcs-debug -n airbyte --image=google/cloud-sdk:slim \
  --restart=Never -- gsutil ls gs://your-bucket-name

Upgrading Airbyte

Safe Upgrade Process

# Check current version
helm list -n airbyte

# Backup current values
helm get values airbyte -n airbyte > backup-values.yaml

# Refresh the repo index so the newer chart version is visible
helm repo update

# Upgrade to newer version
helm upgrade airbyte airbyte/airbyte \
  --version 1.8.2 \
  -n airbyte \
  -f airbyte-k8s/helm/values/values-v1.yaml \
  --timeout 10m \
  --wait

Rollback if Needed

# View upgrade history
helm history airbyte -n airbyte

# Rollback to previous version
helm rollback airbyte <revision> -n airbyte

Need Expert Help with Your Airbyte Deployment?

Setting up production-ready Airbyte on ARM64 Kubernetes involves complex configuration management, security considerations, and performance optimization. Don't risk data pipeline downtime with trial-and-error deployments.

EaseCloud's experienced team has deployed Airbyte across diverse enterprise environments with zero-downtime requirements. Our expertise includes:

  • Production Airbyte installations on GKE, EKS, and AKS
  • ARM64 optimization for maximum cost efficiency
  • High availability architecture design and implementation
  • Database performance tuning for high-throughput pipelines
  • Security hardening and compliance configuration
  • 24/7 monitoring and support for mission-critical data operations

Ready to deploy Airbyte the right way from day one? Contact our Kubernetes experts for a consultation on your data platform requirements.


Frequently Asked Questions (FAQs)

1. Why use Helm v1 chart instead of v2 for Airbyte?

Helm chart v1 (Airbyte 1.8.x) is the stable, production-tested line with comprehensive ARM64 support. The v2 charts are newer and may still have compatibility gaps on ARM64, so v1 remains the safer choice for production deployments while v2 matures.

2. What are the minimum resource requirements for Airbyte on ARM64?

For production deployments, allocate at least:

  • GKE Cluster: 4 nodes of t2a-standard-8 (8 vCPUs, 32GB RAM each)
  • Database: Cloud SQL db-perf-optimized-N-4 (4 vCPUs, 32GB RAM) minimum
  • Storage: 100GB initial allocation with auto-scaling enabled
  • Network: Private cluster with authorized networks configured for control-plane access

3. How do I configure custom connectors on ARM64 infrastructure?

Custom connectors require ARM64-compatible Docker images. Build your connector images with:

FROM --platform=linux/arm64 airbyte/integration-base:1.8.1
# Your connector code here

Then publish the image to a registry your GKE cluster can pull from (for example, Artifact Registry) and register the connector in the Airbyte UI under Settings > Sources or Destinations, using the published image name and tag.
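
A minimal build-and-push sketch, assuming an Artifact Registry repository named airbyte in project prj-xxxx:

docker buildx build --platform linux/arm64 \
  -t us-central1-docker.pkg.dev/prj-xxxx/airbyte/my-connector:0.1.0 \
  --push .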

4. Can I use the same Cloud SQL instance for multiple Airbyte deployments?

Yes, but create separate databases within the same instance:

CREATE DATABASE airbyte_prod;
CREATE DATABASE airbyte_staging;
CREATE USER airbyte_prod_user WITH PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE airbyte_prod TO airbyte_prod_user;
-- PostgreSQL 15 no longer grants CREATE on the public schema by default
\c airbyte_prod
GRANT ALL ON SCHEMA public TO airbyte_prod_user;

Update your values.yaml database configuration accordingly for each deployment.

5. How do I backup and restore Airbyte configuration and metadata?

Airbyte metadata is stored in your Cloud SQL database, which has automated backups. For additional protection:

# Manual database backup
gcloud sql export sql airbyte-postgres \
  gs://your-backup-bucket/airbyte-backup-$(date +%Y%m%d).sql \
  --database=airbyte

# Backup Kubernetes secrets and configs
kubectl get secrets -n airbyte -o yaml > airbyte-secrets-backup.yaml
kubectl get configmaps -n airbyte -o yaml > airbyte-configs-backup.yaml
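
To restore from such a dump (bucket path and date are placeholders):

gcloud sql import sql airbyte-postgres \
  gs://your-backup-bucket/airbyte-backup-YYYYMMDD.sql --database=airbyte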

Additional Resources