# CHUG: Streaming ETL That Handles Real-World Scale
Moving data from PostgreSQL to ClickHouse shouldn't require a PhD in distributed systems or an enterprise license. CHUG is a high-performance ETL pipeline designed for one thing: getting your data from point A to point B as fast as possible, with flat, predictable memory usage.
## The Problem
Analytics teams need data in ClickHouse for fast queries, but production data lives in PostgreSQL. The gap between these systems is where most ETL tools fall apart:
- Load entire tables into memory, crash on large datasets
- Complex DAG configurations for simple migrations
- No visibility into progress or failures
- Expensive enterprise pricing for basic functionality
CHUG takes a different approach: stream everything, keep memory constant, and provide real-time visibility into every operation.
## Performance That Speaks for Itself
Benchmarked on real workloads:
| Scenario | Result | Notes |
|---|---|---|
| Local network | 145,694 rows/sec | 30M rows |
| Cross-region cloud | 30,251 rows/sec | Remote sync |
| CDC detection latency | < 3 seconds | Polling-based |
Memory usage remains flat regardless of table size. A 100GB migration uses the same memory as a 100MB one.
## Architecture
CHUG is built on three pillars:
**Streaming Extraction.** Data flows through the pipeline in configurable chunks. No loading entire tables into memory, no OOM kills at 3 AM.

**Parallel Batch Insertion.** A 4-worker goroutine pool handles concurrent batch inserts to ClickHouse. Larger batches (2000+ rows) amortize network overhead effectively.

**Connection Pooling.** Both source and target databases use connection pools, eliminating connection churn and improving throughput under sustained load.
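The shape of the first two pillars can be sketched in a few lines of Go. This is a simplified illustration, not CHUG's actual code: the counter stands in for a real ClickHouse batch insert over a pooled connection.

```go
package main

import (
	"fmt"
	"sync"
)

// streamAndInsert simulates the pipeline shape: rows stream through a
// channel in fixed-size batches while a small worker pool consumes them
// concurrently. Incrementing the counter stands in for a real batch
// insert into ClickHouse.
func streamAndInsert(totalRows, batchSize, workers int) int {
	batches := make(chan []int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	inserted := 0

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				mu.Lock()
				inserted += len(batch) // real code: batch insert to ClickHouse
				mu.Unlock()
			}
		}()
	}

	// Extraction side: at most a handful of batches are in flight at
	// once, so memory is bounded by batchSize, not by table size.
	batch := make([]int, 0, batchSize)
	for row := 0; row < totalRows; row++ {
		batch = append(batch, row)
		if len(batch) == batchSize {
			batches <- batch
			batch = make([]int, 0, batchSize)
		}
	}
	if len(batch) > 0 {
		batches <- batch // flush the final partial batch
	}
	close(batches)
	wg.Wait()
	return inserted
}

func main() {
	fmt.Println(streamAndInsert(10000, 2000, 4)) // prints 10000
}
```

Because the channel is unbuffered, extraction naturally applies backpressure: if all four workers are busy inserting, the producer blocks instead of buffering rows in memory.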
## Change Data Capture
CHUG supports continuous synchronization through delta column polling:
```yaml
cdc:
  enabled: true
  poll_interval: 10s
  delta_column: updated_at
```
Monitor your PostgreSQL tables for changes and sync only what's new. Keep analytics data fresh without full table reloads.
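The query behind delta-column polling is simple: each tick, fetch only rows whose delta column has advanced past the last watermark. The sketch below is illustrative (table and column names are examples); CHUG's internals may differ.

```go
package main

import "fmt"

// buildDeltaQuery builds the incremental query a delta-column CDC
// poller would issue on each tick. $1 is bound to the watermark: the
// maximum delta-column value seen in the previous sync. A real poller
// would run this on a time.Ticker at poll_interval and advance the
// watermark after each successful batch.
func buildDeltaQuery(table, deltaColumn string) string {
	return fmt.Sprintf(
		"SELECT * FROM %s WHERE %s > $1 ORDER BY %s",
		table, deltaColumn, deltaColumn)
}

func main() {
	fmt.Println(buildDeltaQuery("events", "updated_at"))
}
```

Ordering by the delta column makes watermark advancement safe: the last row processed always carries the new high-water mark.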
## Modern Web Interface
No more staring at terminal logs. CHUG includes a React-based web UI with:
- Real-time progress tracking via WebSocket
- Visual configuration for ingestion jobs
- CDC status monitoring and controls
- Connection testing for source and target databases
Built with React 18, TanStack Query, and Tailwind CSS.
## CLI-First Design
For automation and scripting, CHUG provides a clean CLI interface:
```bash
# Generate a sample configuration
chug sample-config > config.yaml

# Run ingestion from config
chug ingest --config config.yaml

# Start the web interface
chug web --port 8080
```
Zero-config quick start with sensible defaults. Full YAML configuration when you need fine-grained control.
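For orientation, a full configuration in the spirit of `chug sample-config` might look like the fragment below. Only the `cdc` block appears in the docs above; the other keys (`source`, `target`, `batch_size`, `workers`) are illustrative assumptions, not CHUG's actual schema.

```yaml
# Illustrative config sketch; key names outside the cdc block are assumptions.
source:
  dsn: postgres://user:pass@localhost:5432/app
target:
  dsn: clickhouse://localhost:9000/analytics
batch_size: 2000   # rows per insert batch
workers: 4         # concurrent insert goroutines
cdc:
  enabled: true
  poll_interval: 10s
  delta_column: updated_at
```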
## Automatic Schema Mapping
PostgreSQL and ClickHouse have different type systems. CHUG handles the translation:
| PostgreSQL | ClickHouse |
|---|---|
| UUID | UUID |
| TIMESTAMP | DateTime64 |
| JSONB | String |
| TEXT[] | Array(String) |
| NUMERIC | Decimal |
No manual type mapping required. Schema inference happens automatically during the first sync.
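The table above boils down to a lookup. The sketch below is not CHUG's actual mapper; in particular, the `DateTime64` precision and `Decimal` precision/scale parameters are assumptions, and unknown types fall back to `String` as a safe ClickHouse catch-all.

```go
package main

import (
	"fmt"
	"strings"
)

// mapType translates a PostgreSQL column type to a ClickHouse
// equivalent, mirroring the mapping table. Precision parameters and
// the String fallback are illustrative choices, not CHUG's documented
// behavior.
func mapType(pgType string) string {
	switch strings.ToLower(pgType) {
	case "uuid":
		return "UUID"
	case "timestamp", "timestamptz":
		return "DateTime64(6)" // microsecond precision assumed
	case "json", "jsonb":
		return "String"
	case "text[]":
		return "Array(String)"
	case "numeric":
		return "Decimal(38, 10)" // precision/scale assumed
	default:
		return "String" // safe catch-all
	}
}

func main() {
	fmt.Println(mapType("JSONB"))  // String
	fmt.Println(mapType("TEXT[]")) // Array(String)
}
```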
## Deployment

### Local Development
```bash
docker-compose up -d
```
Spins up PostgreSQL, ClickHouse, and management UIs for testing.
### Production

Single binary, no runtime dependencies. Deploy in a container, run as a systemd service, or trigger via cron.
## Why Go?
Go is uniquely suited for ETL workloads:
- Goroutines make parallel batch processing trivial
- Static binary deploys anywhere without dependencies
- Memory efficiency keeps resource usage predictable
- Strong typing catches schema issues at compile time
- Performance approaches C++ for data-intensive operations
## Technical Stack

**Backend:** Go, Gorilla WebSocket, connection pooling for PostgreSQL and ClickHouse

**Frontend:** React 18, TypeScript, TanStack Query, Tailwind CSS, Lucide icons

**Infrastructure:** Docker Compose for local dev, single binary for production