
Managed SSL for a TCP Load Balancer in GKE


At Altenar, we often use RabbitMQ as an entry point for our products. In this specific case, we will discuss a system that calculates sports data and provides it to other systems both within the organization and outside the perimeter. When designing the system, the RabbitMQ component had to be reachable by clients outside the perimeter over TLS, with a certificate that is issued and renewed automatically.

For this project, we are using Google Cloud. Google-managed SSL certificates are, like Let’s Encrypt certificates, issued and renewed automatically, and you can attach one to any public HTTP load balancer in Google Cloud. However, they have several limitations, the key ones being:

  • They work only for HTTP(S) load balancers.

  • They work only for public (external) load balancers.

According to the RabbitMQ documentation, there are two common approaches to TLS for client connections:

  • Configure RabbitMQ to handle TLS connections itself.

  • Use a proxy or load balancer (such as HAProxy) to perform TLS termination of client connections and pass plain TCP connections to the RabbitMQ nodes.

Our idea was to use a Google-managed certificate with a TCP/SSL proxy load balancer in Google Cloud. Due to the limitations above, however, a managed certificate cannot be attached to a TCP load balancer directly. On the other hand, Google allows the same SSL certificate to be used by several load balancers, which is what we decided to explore. Overall, the blueprint looks like this:

  • We would expose a default dummy HTTP service over port 443, used solely for provisioning the SSL certificate and its automatic renewals.

  • We would add a separate TCP endpoint (port 5671, for AMQPS traffic) reusing the same IP address and SSL certificate.

Let’s examine each part of the solution in detail.


GKE Cluster

As the main hosting platform, we use GKE. We use the Google Terraform module for a private GKE cluster, configure Workload Identity, and install the AutoNEG controller in the cluster.

In this article, we won’t delve into the specifics of how Workload Identity works with AutoNEG, as both are used exactly as described in the official documentation.
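
For context, here is a minimal sketch of what that setup can look like in Terraform. The module version, region, network names, and the AutoNEG service account details are illustrative assumptions, not our production values:

# Private GKE cluster via the Google Terraform module (illustrative values)
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version = "~> 29.0"

  project_id = var.project
  name       = "${var.environment_name}-cluster"
  region     = "europe-west1"

  network           = "main-vpc"
  subnetwork        = "gke-subnet"
  ip_range_pods     = "gke-pods"
  ip_range_services = "gke-services"

  enable_private_nodes = true
  # enables Workload Identity on the cluster
  identity_namespace = "${var.project}.svc.id.goog"
}

# GCP service account that the AutoNEG controller acts as
resource "google_service_account" "autoneg" {
  project      = var.project
  account_id   = "autoneg"
  display_name = "AutoNEG controller"
}

# Workload Identity binding: lets the controller’s Kubernetes service account
# (namespace/name as in the AutoNEG installation docs) impersonate the GCP SA
resource "google_service_account_iam_member" "autoneg_wi" {
  service_account_id = google_service_account.autoneg.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project}.svc.id.goog[autoneg-system/autoneg-controller-manager]"
}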

With all preparations in place, we deploy a dummy “Hello World” service to the cluster and attach it to the Google load balancer via AutoNEG. Below is the YAML configuration for that:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gcp-tls-certificate-issuer
  labels:
    app: gcp-tls-certificate-issuer
  annotations:
    deployment.kubernetes.io/revision: '1'
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gcp-tls-certificate-issuer
  template:
    metadata:
      labels:
        app: gcp-tls-certificate-issuer
    spec:
      containers:
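        # minimal dummy web server answering on port 8888; it exists only as a
        # certificate-provisioning backend, so any small HTTP server will do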
        - name: ok
          image: assemblyline/ok:latest
          ports:
            - containerPort: 8888
              protocol: TCP
          imagePullPolicy: Always
          securityContext:
            capabilities:
              drop:
                - ALL
            runAsUser: 1000
            runAsGroup: 3000
            runAsNonRoot: true
            readOnlyRootFilesystem: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
---
apiVersion: v1
kind: Service
metadata:
  name: gcp-tls-certificate-issuer
  labels:
    app: gcp-tls-certificate-issuer
  annotations:
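    # create a standalone NEG for port 8888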
    cloud.google.com/neg: '{"exposed_ports": {"8888":{"name": "gcp-tls-certificate-issuer"}}}'
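    # register that NEG with the backend service created in Terraform below (names must match)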
    controller.autoneg.dev/neg: '{"backend_services":{"8888":[{"name":"envcode-rabbit-https-backend-service","max_connections_per_endpoint":10000}]}}'
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8888
      targetPort: 8888
  selector:
    app: gcp-tls-certificate-issuer
  clusterIP: 10.10.12.130
  clusterIPs:
    - 10.10.12.130
  type: ClusterIP

Notice the Service annotations: the first creates a standalone NEG for port 8888, and the second tells AutoNEG which backend service to register that NEG with. This is how the service is attached to the load balancer as a backend.

Google Load Balancer

The next part is managed outside GKE and created separately. A Google load balancer is not a single object but rather a set of different resources combined. Below is the Terraform code, with comments:

resource "google_compute_managed_ssl_certificate" "rabbitmq" {
  project = var.project
  name    = "${var.environment_name}-google-managed-certificate-rabbitmq"

  managed {
    domains = ["rabbitmq.example.com."]  # Replace with your domain
  }
}

# reserved global IP address, shared by both forwarding rules
resource "google_compute_global_address" "default" {
  project = var.project
  name    = "tcp-proxy-xlb-ip"
}

output "rabbitmq-ip" {
  value = google_compute_global_address.default.address
}

# forwarding rule for the TCP load balancer (RabbitMQ traffic on port 5671)
resource "google_compute_global_forwarding_rule" "default" {
  project               = var.project
  name                  = "${var.environment_name}-tcp-global-loadbalancer"
  provider              = google
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL"
  port_range            = "5671"
  target                = google_compute_target_ssl_proxy.default.id
  ip_address            = google_compute_global_address.default.id
}
# https://cloud.google.com/load-balancing/docs/ssl
# When you use Google-managed SSL certificates with SSL Proxy Load Balancing,
# the frontend port for traffic must be 443 to enable the Google-managed SSL
# certificates to be provisioned and renewed.

# forwarding rule for the HTTPS load balancer (port 443, used for certificate provisioning and renewal)
resource "google_compute_global_forwarding_rule" "https" {
  project               = var.project
  name                  = "${var.environment_name}-https-global-loadbalancer"
  provider              = google
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL"
  port_range            = "443"
  target                = google_compute_target_ssl_proxy.https.id
  ip_address            = google_compute_global_address.default.id
}

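# SSL proxy that terminates TLS for the RabbitMQ frontend (port 5671)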
resource "google_compute_target_ssl_proxy" "default" {
  project          = var.project
  name             = "${var.environment_name}-global-loadbalancer-tcp-proxy"
  backend_service  = google_compute_backend_service.default.id
  ssl_certificates = [google_compute_managed_ssl_certificate.rabbitmq.id]
}

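# SSL proxy for the dummy HTTPS frontend (port 443); this one lets Google provision and renew the certificate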
resource "google_compute_target_ssl_proxy" "https" {
  project          = var.project
  name             = "${var.environment_name}-global-loadbalancer-https-proxy"
  backend_service  = google_compute_backend_service.https.id
  ssl_certificates = [google_compute_managed_ssl_certificate.rabbitmq.id]
}

# backend service for RabbitMQ, populated by AutoNEG
resource "google_compute_backend_service" "default" {
  project = var.project
  name    = "${var.environment_name}-tcp-backend-service"
  protocol              = "TCP"
  port_name             = "tcp"
  load_balancing_scheme = "EXTERNAL"
  timeout_sec           = 10
  health_checks         = [google_compute_health_check.default.id]
  session_affinity      = "CLIENT_IP"

  # We don't want TF to remove whatever was configured by AutoNEG
  lifecycle {
    ignore_changes = [backend]
  }
}

# backend service for the dummy HTTPS endpoint, populated by AutoNEG
resource "google_compute_backend_service" "https" {
  project               = var.project
  # this name is what you reference in the Service annotations above
  name                  = "${var.environment_name}-https-backend-service"
  protocol              = "TCP"
  port_name             = "tcp"
  load_balancing_scheme = "EXTERNAL"
  timeout_sec           = 10
  health_checks         = [google_compute_health_check.https.id]

  # We don't want TF to remove whatever was configured by AutoNEG
  lifecycle {
    ignore_changes = [backend]
  }
}

resource "google_compute_health_check" "default" {
  project            = var.project
  name               = "tcp-proxy-health-check"
  description        = "Health check for the RabbitMQ backends"
  timeout_sec        = 1
  check_interval_sec = 1

  tcp_health_check {
    port = "5672" #use container port
  }
}

resource "google_compute_health_check" "https" {
  project            = var.project
  name               = "https-proxy-health-check"
  description        = "Health check for the dummy HTTPS backends"
  timeout_sec        = 1
  check_interval_sec = 1

  tcp_health_check {
    port = "8888" #use container port
  }
}


As you can see, we created one IP address and one SSL certificate and then used them in two forwarding rules. This is what allows a Google-managed SSL certificate to be used with a TCP load balancer: the 443 frontend keeps the certificate provisioned and renewed, while the 5671 frontend serves the RabbitMQ traffic.

Don’t forget to configure DNS and point the hostname at this IP address for the whole thing to work.
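
If your zone lives in Cloud DNS, the record can sit next to the rest of the Terraform. Here is a sketch, assuming a hypothetical managed zone named example-zone:

# hypothetical Cloud DNS record pointing the hostname at the reserved address;
# replace the zone and hostname with your own
resource "google_dns_record_set" "rabbitmq" {
  project      = var.project
  managed_zone = "example-zone"
  name         = "rabbitmq.example.com."
  type         = "A"
  ttl          = 300
  rrdatas      = [google_compute_global_address.default.address]
}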

Tip: GKE ships with the l7-default-backend deployment. It might have been enough to simply create a Service with the AutoNEG annotations and point it at that deployment’s pods. Try that and let me know in the comments if it works.
