Shivam Rastogi

Over-engineering an async AI voice chat app, on purpose

Seven repos, four languages, Apache Pulsar on Kubernetes — and the honest reason the architecture looks like this.

This is an async AI voice chat app. You record a voice message, send it to a friend, they listen and reply when they’re ready. In the background, OpenAI Whisper transcribes the audio and a separate service handles image processing. Conceptually, it’s a ten-line Django view with a file upload and a row in a message table.

That’s not what I built.

What I built is seven repos, four languages, Apache Pulsar running on Amazon EKS with end-to-end TLS, a Spring Boot image service, a Python Whisper transcription consumer, a Django web app deployed to ECS Fargate behind an AWS CDK CodePipeline, and a cert-manager setup issuing certificates from AWS Private CA to every pod in the cluster.

None of this is necessary. All of it is on purpose.

This post is the honest explanation for why a side project about voice messages looks like a distributed systems exam — and a tour of the architecture for anyone who wants to know what each piece does.

The honest reason: it’s a learning vehicle

The thing I never see in side-project blog posts is the admission that the project is actually a pretext. This app isn’t going to have users. It’s not a startup. It’s not a product. It’s a shape I chose so that I had a concrete reason to learn:

A simpler architecture would have taught me less. The over-engineering is the feature.

Once you accept that, the decisions stop looking strange.

What the app actually is

```mermaid
flowchart LR
    User([User]) -->|HTTPS| ALB[AWS ALB]
    ALB --> Django[Django Web App<br/>ECS Fargate]
    Django -->|Publish audio| Pulsar[Apache Pulsar<br/>on EKS]
    Django -->|Publish image| Pulsar
    Pulsar -->|Audio topic| Whisper[Whisper Consumer<br/>Python]
    Pulsar -->|Image topic| SpringBoot[Image Service<br/>Spring Boot]
    Whisper -->|Transcript| S3[(S3)]
    SpringBoot -->|Processed| S3
    Django -->|Read results| S3
```

The flow is:

  1. A user records an audio message and hits send in the Django UI
  2. Django uploads the audio to S3 and publishes a message to a Pulsar topic
  3. The Python Whisper consumer picks up the message, transcribes the audio, writes the transcript back
  4. If there’s an image attached, the Spring Boot service handles it in parallel via a separate topic
  5. The Django app reads back the results and displays them in the chat thread

Everything except the Django request handler is asynchronous. Everything except S3 runs in a container I own.
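Step 2 — upload the audio, then publish a pointer event to Pulsar — can be sketched like this. The payload fields, topic name, and service URL are illustrative placeholders, not the app’s real schema; the only real API here is the `pulsar-client` library.

```python
import json
import uuid


def make_audio_event(sender_id: str, s3_key: str) -> bytes:
    """Build the JSON event the consumers will receive.
    Field names are illustrative, not the app's actual schema."""
    return json.dumps({
        "message_id": str(uuid.uuid4()),
        "sender_id": sender_id,
        "s3_key": s3_key,       # pointer to the audio already uploaded to S3
        "kind": "audio",
    }).encode("utf-8")


def publish_audio(sender_id: str, s3_key: str, service_url: str, topic: str) -> None:
    """The S3 upload has already happened; publish a small pointer event,
    not the audio bytes themselves. Requires `pip install pulsar-client`;
    imported lazily so the payload helper above stays dependency-free."""
    import pulsar  # third-party Pulsar client, assumed installed
    client = pulsar.Client(service_url)
    try:
        producer = client.create_producer(topic)
        producer.send(make_audio_event(sender_id, s3_key))
    finally:
        client.close()
```

Publishing a pointer rather than the payload keeps messages small and lets S3 do what it’s good at; the consumers fetch the audio themselves.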

Why Pulsar (and not Kafka or RabbitMQ)

This is the single decision I’m asked about most. Honestly, any of the three would have worked for a chat app. Here’s why I picked Pulsar:

For a chat app with no users, none of this matters. For a learning vehicle, all of it matters.

The component tour

Django web app (Python)

The front door. Handles auth, the chat UI, and routing messages into Pulsar. Deployed on ECS Fargate (not EKS) because I wanted to keep the stateless web tier isolated from the stateful messaging infrastructure. The split was deliberate: Fargate is great for “ship a container, forget about it,” and EKS is where I put the stuff that actually benefits from Kubernetes primitives.

Deployed via an AWS CDK CodePipeline — Django code lands in GitHub, a pipeline builds a Docker image, pushes it to ECR, and deploys it to a Fargate service. The pipeline itself is babblebox-cdk-pipeline.
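For a sense of what that pipeline looks like in CDK, here is a minimal CDK v2 sketch of the self-mutating pipeline pattern. The stack name, GitHub repo path, and branch are placeholders, not the real values from babblebox-cdk-pipeline; treat it as a shape, not the repo’s code.

```python
# Minimal sketch of a CDK v2 self-mutating pipeline:
# GitHub source -> synth -> (pipeline deploys the app stages added to it).
from aws_cdk import Stack
from aws_cdk.pipelines import CodePipeline, CodePipelineSource, ShellStep
from constructs import Construct


class WebPipelineStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        CodePipeline(
            self, "Pipeline",
            synth=ShellStep(
                "Synth",
                # Placeholder repo/branch — not the actual project values.
                input=CodePipelineSource.git_hub("example/django-app", "main"),
                commands=[
                    "npm install -g aws-cdk",
                    "pip install -r requirements.txt",
                    "cdk synth",
                ],
            ),
        )
```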

Apache Pulsar cluster (EKS)

The backbone. Runs on Amazon EKS via the official Helm chart, with everything TLS-encrypted end-to-end. The interesting bits:

I spent more time on the TLS setup than on the messaging code. That was the point.
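From a client’s point of view, end-to-end TLS boils down to connecting over the `pulsar+ssl` scheme and trusting the private CA’s root certificate. A minimal sketch, with the URL and cert path as placeholders:

```python
def check_tls_url(service_url: str) -> bool:
    """A broker URL only carries TLS if it uses the pulsar+ssl scheme."""
    return service_url.startswith("pulsar+ssl://")


def make_tls_client(service_url: str, ca_cert_path: str):
    """Connect to a TLS-enabled Pulsar broker, verifying its certificate
    against the private CA root (the one cert-manager issues from).
    URL and path are placeholders; requires `pip install pulsar-client`."""
    import pulsar  # lazy import keeps check_tls_url dependency-free
    if not check_tls_url(service_url):
        raise ValueError("expected a pulsar+ssl:// URL for a TLS cluster")
    return pulsar.Client(
        service_url,
        tls_trust_certs_file_path=ca_cert_path,  # private CA root bundle
        tls_validate_hostname=True,              # reject mismatched certs
    )
```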

Whisper transcription consumer (Python)

Subscribes to the audio topic, fetches the referenced audio from S3, runs it through OpenAI’s Whisper model, and writes the transcript back. Packaged as a Docker container and deployed via whisper-pulsar-consumer-cdk.

The interesting part here isn’t the Whisper model — it’s learning how Pulsar subscription semantics work. Exclusive vs. shared vs. key-shared subscriptions. Acknowledgement modes. What happens when a consumer dies mid-processing. The chat app is incidental; the consumer lifecycle is the lesson.
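The lifecycle the paragraph above describes — shared subscription, explicit acks, redelivery on failure — looks roughly like this. The topic, subscription name, and event schema are placeholders; `transcribe` stands in for the Whisper call.

```python
import json


def parse_event(raw: bytes) -> str:
    """Pull the S3 key out of the event payload.
    Field name is illustrative, matching no particular schema."""
    return json.loads(raw.decode("utf-8"))["s3_key"]


def run_consumer(service_url: str, topic: str, transcribe) -> None:
    """Consume audio events on a shared subscription. Ack only after a
    successful transcription; nack on failure so Pulsar redelivers the
    message to another consumer in the group (this is what happens when
    a consumer dies mid-processing). Requires `pip install pulsar-client`."""
    import pulsar  # lazy import keeps parse_event testable without a broker
    client = pulsar.Client(service_url)
    consumer = client.subscribe(
        topic, "whisper-workers",
        consumer_type=pulsar.ConsumerType.Shared,  # competing consumers
    )
    try:
        while True:
            msg = consumer.receive()
            try:
                transcribe(parse_event(msg.data()))
                consumer.acknowledge(msg)           # done: drop from backlog
            except Exception:
                consumer.negative_acknowledge(msg)  # redeliver after delay
    finally:
        client.close()
```

A Shared subscription is the natural fit here: transcription is slow, so you want multiple workers pulling from one backlog, and an unacked message simply goes back into it.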

Spring Boot image service (Java)

The reason this is Java at all is that I wanted one service in a language I don’t use day-to-day. The Pulsar Java client is the most mature of Pulsar’s client libraries, Spring Boot is the default “production Java” framework, and the combination meant I’d learn something instead of cruising.

It consumes from an image topic, handles processing, and writes results back to S3. Less interesting than Whisper because the actual image work is boring — but the Java / Spring Boot / Pulsar integration was the point.

Infrastructure, in one sentence per component

Each one is a rabbit hole. Each rabbit hole is the point.

What I’d do differently

This is the section most side-project posts fake. I’m going to be direct:

This is the hub post. The individual deep dives live here:

And the repos: