CHUG

Yashaswi Mishra

February 2026

Source Code

Tech Stack

Frontend

TypeScript

Backend

PostgreSQL

Completion Status

Project completion100%

This project has reached its initial development goals.

CHUG: Streaming ETL That Handles Real-World Scale

Moving data from PostgreSQL to ClickHouse shouldn't require a PhD in distributed systems or an enterprise license. CHUG is a high-performance ETL pipeline designed for one thing: getting your data from point A to point B as fast as possible, with zero memory overhead.

The Problem

Analytics teams need data in ClickHouse for fast queries, but production data lives in PostgreSQL. The gap between these systems is where most ETL tools fall apart:

Load entire tables into memory, crash on large datasets
Complex DAG configurations for simple migrations
No visibility into progress or failures
Expensive enterprise pricing for basic functionality

CHUG takes a different approach: stream everything, keep memory constant, and provide real-time visibility into every operation.

Performance That Speaks for Itself

Benchmarked on real workloads:

Scenario	Throughput	Dataset
Local network	145,694 rows/sec	30M rows
Cross-region cloud	30,251 rows/sec	Remote sync
CDC detection latency	< 3 seconds	Polling-based

Memory usage remains flat regardless of table size. A 100GB migration uses the same memory as a 100MB one.

Architecture

CHUG is built on three pillars:

Streaming Extraction Data flows through the pipeline in configurable chunks. No loading entire tables into memory, no OOM kills at 3 AM.

Parallel Batch Insertion A 4-worker goroutine pool handles concurrent batch inserts to ClickHouse. Larger batches (2000+ rows) amortize network overhead effectively.

Connection Pooling Both source and target databases use connection pools, eliminating connection churn and improving throughput under sustained load.

Change Data Capture

CHUG supports continuous synchronization through delta column polling:

yaml
cdc:
  enabled: true
  poll_interval: 10s
  delta_column: updated_at

Monitor your PostgreSQL tables for changes and sync only what's new. Keep analytics data fresh without full table reloads.

Modern Web Interface

No more staring at terminal logs. CHUG includes a React-based web UI with:

Real-time progress tracking via WebSocket
Visual configuration for ingestion jobs
CDC status monitoring and controls
Connection testing for source and target databases

Built with React 18, TanStack Query, and Tailwind CSS.

CLI-First Design

For automation and scripting, CHUG provides a clean CLI interface:

bash
// Generate a sample configuration
chug sample-config > config.yaml

// Run ingestion from config
chug ingest --config config.yaml

// Start the web interface
chug web --port 8080

Zero-config quick start with sensible defaults. Full YAML configuration when you need fine-grained control.

Automatic Schema Mapping

PostgreSQL and ClickHouse have different type systems. CHUG handles the translation:

PostgreSQL	ClickHouse
UUID	UUID
TIMESTAMP	DateTime64
JSONB	String
TEXT[]	Array(String)
NUMERIC	Decimal

No manual type mapping required. Schema inference happens automatically during the first sync.

Deployment

Local Development

bash
docker-compose up -d

Spins up PostgreSQL, ClickHouse, and management UIs for testing.

Production Single binary, no runtime dependencies. Deploy in a container, run as a systemd service, or trigger via cron.

Why Go?

Go is uniquely suited for ETL workloads:

Goroutines make parallel batch processing trivial
Static binary deploys anywhere without dependencies
Memory efficiency keeps resource usage predictable
Strong typing catches schema issues at compile time
Performance approaches C++ for data-intensive operations

Technical Stack

Backend: Go, Gorilla WebSocket, Connection pooling for PostgreSQL and ClickHouse

Frontend: React 18, TypeScript, TanStack Query, Tailwind CSS, Lucide icons

Infrastructure: Docker Compose for local dev, single binary for production

Follow my journey

Buy me a coffeeSupport

Back to all projects