Architecture Deep Dive

listmonk

High-performance self-hosted newsletter system · Go + PostgreSQL · github.com/knadh/listmonk

What is listmonk?

listmonk is a high-performance, self-hosted newsletter and mailing list manager. It ships as a single Go binary with an embedded Vue.js frontend, backed by PostgreSQL. It has demonstrated production workloads of 7+ million emails per campaign with peak RAM of ~57MB and fractional CPU usage. The project has 19k+ GitHub stars and is built by Kailash Nadh (CTO of Zerodha, India's largest stock broker).

Why Study This for System Design?

Single Binary Architecture
stuffbin embeds frontend assets, SQL, i18n into one binary. Zero external dependencies beyond Postgres.
Producer-Consumer Pipeline
Classic concurrent pipeline: DB → batch fetch → template render → rate limit → goroutine pool → SMTP.
Cursor-Based Pagination
Uses keyset pagination (last_subscriber_id) instead of OFFSET for O(1) page traversal at scale.
Pluggable Backends
Messenger interface, Media provider interface. Strategy pattern enables SMTP, HTTP webhooks, S3, filesystem.
Materialized Views
Dashboard stats precomputed via PostgreSQL materialized views. Cron-refreshed for large installations.
Graceful Hot Restart
SIGHUP → graceful shutdown → syscall.Exec() self-replace. Zero-downtime config reloads.

Tech Stack

| Layer | Technology | Role |
|---|---|---|
| Language | Go | Backend, CLI, campaign engine |
| Web Framework | labstack/echo v4 | HTTP routing, middleware |
| Database | PostgreSQL | All persistent state, JSONB attributes |
| DB Driver | jmoiron/sqlx + lib/pq | SQL execution, struct scanning |
| SQL Management | knadh/goyesql | Named SQL queries from .sql files |
| Config | knadh/koanf | Multi-source: TOML → env vars → DB |
| Frontend | Vue.js 3 + Buefy | Admin SPA dashboard |
| Asset Embed | knadh/stuffbin | Embed frontend/SQL/i18n into binary |
| Auth | Sessions + OIDC + RBAC | Cookie sessions, SSO, role perms |
| Templating | html/template + Sprig | Dynamic email templates, 100+ funcs |

System Architecture Diagram


Layered Architecture

listmonk follows a clean 4-layer architecture. Each layer has a single responsibility and communicates only with adjacent layers.

Layer 1: HTTP Handlers (cmd/*.go)

Thin handlers that parse HTTP requests (path params, query strings, JSON bodies), call the Core layer, and serialize responses. No business logic here. Each handler is a method on the App struct which holds references to all subsystems. Echo framework provides routing, middleware, and context.

Layer 2: Core Business Logic (internal/core/)

All domain operations: CRUD for subscribers, lists, campaigns, templates. The Core struct wraps the DB and query runner. This layer is pure Go with zero HTTP dependencies — it could be called from CLI, tests, or any other interface. It enforces validation, permission checks, and domain invariants.

Layer 3: Campaign Manager (internal/manager/)

The concurrent campaign processing engine. Runs as a long-lived goroutine that polls the DB for active campaigns. It owns the entire send pipeline: batch fetching, template rendering, rate limiting, worker pool dispatch, progress tracking, error handling. Completely rewritten in v3.0.0 for near-instant pause/cancel and lossless counting.

Layer 4: Data Layer (queries/*.sql + PostgreSQL)

All SQL lives in .sql files, loaded at startup via goyesql. No ORM. This gives full control over query optimization, PostgreSQL-specific features (JSONB, arrays, materialized views), and makes SQL reviewable and versionable. The sqlx library provides struct scanning.

Key Architectural Decision: The App Struct

type App struct {
    core       *core.Core           // Business logic layer
    fs         stuffbin.FileSystem   // Embedded filesystem (assets, SQL, i18n)
    db         *sqlx.DB              // PostgreSQL connection pool
    queries    *models.Queries       // Pre-loaded named SQL statements
    constants  *constants            // Runtime config snapshot
    manager    *manager.Manager      // Campaign processing engine
    importer   *subimporter.Importer // CSV/bulk import processor
    notifs     *notifs.Notifs        // Admin email notifications
    i18n       *i18n.I18n            // Internationalization
    bounceProc *bounce.Manager       // Bounce email processor
    captcha    *captcha.Captcha      // ALTCHA/hCaptcha
    auth       *auth.Auth            // Sessions + RBAC + OIDC
    events     *events.Events        // SSE event bus
    paginator  *paginator.Paginator  // Cursor/offset pagination
    log        *log.Logger
}

This is Go's idiomatic alternative to dependency injection containers. All subsystems are initialized in init.go and wired together via this struct. Handlers access them via a.core, a.manager, etc.

Database Schema (ER Diagram)

PostgreSQL schema with ~12 tables, JSONB for extensibility, and materialized views for analytics.

Key Database Design Decisions

1. Custom ENUM Types (12 types)

PostgreSQL ENUMs enforce valid states at the DB level: campaign_status has 6 states (draft, scheduled, running, paused, cancelled, finished), subscriber_status has 3 (enabled, disabled, blocklisted), and subscription_status tracks the per-list relationship (unconfirmed, confirmed, unsubscribed). This is a state machine enforced by the database, not application code.
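For illustration, the state machine that campaign_status encodes can be sketched in Go. The transition map below is a hypothetical reading of those statuses, not listmonk's actual rules (which live in the database and handlers):

```go
package main

import "fmt"

// Hypothetical transition map for campaign_status, for illustration only.
var campaignTransitions = map[string][]string{
	"draft":     {"scheduled", "running"},
	"scheduled": {"draft", "running", "cancelled"},
	"running":   {"paused", "cancelled", "finished"},
	"paused":    {"running", "cancelled"},
	// cancelled and finished are terminal states.
}

// canTransition reports whether moving from one status to another is allowed.
func canTransition(from, to string) bool {
	for _, next := range campaignTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("running", "paused"))   // a valid pause
	fmt.Println(canTransition("finished", "running")) // terminal, invalid
}
```

The advantage of the ENUM approach is that even buggy application code cannot write a status outside this set.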

2. JSONB for Flexible Attributes

subscribers.attribs stores arbitrary key-value data as JSONB, enabling schema-less subscriber segmentation. You can query attribs->>'city' = 'Atlanta' with GIN indexes. The settings table stores all app config as JSONB key-value pairs, enabling runtime configuration changes through the UI without schema migrations.

3. Junction Table Pattern (subscriber_lists)

Many-to-many with a composite primary key PK(subscriber_id, list_id). The junction table carries its own state (subscription_status) and metadata (meta JSONB). Separate indexes on each FK column for efficient queries in both directions.

4. Keyset Pagination Columns

campaigns.last_subscriber_id and max_subscriber_id enable cursor-based pagination for campaign sending. Instead of OFFSET N (O(N) scan), it uses WHERE id > last_subscriber_id ORDER BY id LIMIT batch_size (O(1) via index). Critical for sending campaigns to millions of subscribers.

5. Materialized Views for Dashboard

Three materialized views precompute expensive aggregations: mat_dashboard_counts (subscriber/list/campaign totals), mat_dashboard_charts (30-day click/view trends), mat_list_subscriber_stats (per-list subscriber counts by status). Refreshed on cron schedule via REFRESH MATERIALIZED VIEW CONCURRENTLY. Orders-of-magnitude speedup for large databases.

6. Soft References & Denormalization

campaign_views.subscriber_id is nullable with ON DELETE SET NULL — if a subscriber is deleted, their view records persist for analytics. campaign_lists.list_name denormalizes the list name so campaign history survives list deletion. Preserves historical accuracy while allowing entity cleanup.

7. Indexing Strategy

Strategic indexes on: email (case-insensitive unique via LOWER(email)), status columns (for filtered scans), composite (id, status) for campaign batch fetching, DATE(created_at) expression indexes for time-series analytics. Partial unique index on templates(is_default) WHERE is_default = true ensures only one default template.

8. UUID + Serial ID Dual Identity

Internal operations use fast integer serial IDs for joins and pagination. External/public-facing operations use UUIDs (subscriber unsubscribe links, campaign archives, media references). Best of both worlds: performance internally, security externally (UUIDs aren't guessable).

Campaign Processing Pipeline

The campaign engine is the heart of listmonk. Rewritten in v3.0.0 for lossless operation:

1. Scan DB
Manager goroutine polls DB every few seconds for campaigns with status='running' or 'scheduled' (past send_at).
2. Fetch Batch
Queries subscribers in batch_size chunks (default 1000). Uses last_subscriber_id cursor for efficient keyset pagination — no OFFSET.
3. Template Render
For each subscriber: render Go template with subscriber attrs, campaign data, tracking URLs. Sprig functions available.
4. Rate Limit
Token bucket / sliding window rate limiter (configurable). Controls msgs/sec globally across all campaigns.
5. Concurrent Send
N goroutine workers (app.concurrency) pull from channel. Each calls Messenger.Push(). SMTP connection pool per server.
6. Track Progress
Atomic counter updates to campaigns.sent. max_subscriber_id for crash recovery. last_subscriber_id for resumption.
7. Handle Errors
Per-message retries (max_msg_retries). Tracks cumulative errors. Auto-pauses campaign at max_send_errors threshold.
8. Finish/Archive
Status → 'finished'. Optional public archive. Materialized views refreshed for dashboard stats.

Concurrency Model Deep Dive

Producer-Consumer with Channels

// Simplified mental model of the campaign engine

// Producer: fetches subscriber batches from DB
go func() {
    for {
        batch := db.Query("SELECT ... WHERE id > $1 ORDER BY id LIMIT $2",
                          lastSubID, batchSize)
        if len(batch) == 0 {
            close(msgChan) // all subscribers processed
            break
        }
        for _, sub := range batch {
            msg := renderTemplate(campaign, sub)
            msgChan <- msg  // Send to worker pool; blocks if buffer is full
        }
        lastSubID = batch[len(batch)-1].ID
        updateProgress(campaign, lastSubID) // checkpoint for crash recovery
    }
}()

// Consumer pool: N goroutines sending messages
for i := 0; i < concurrency; i++ {
    go func() {
        for msg := range msgChan {
            rateLimiter.Wait()          // Token bucket
            err := messenger.Push(msg)  // SMTP/HTTP
            if err != nil {
                handleRetry(msg, err)
            }
            atomic.AddInt64(&sent, 1)
        }
    }()
}
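The mental model above can be made runnable. The following self-contained sketch swaps Postgres for an in-memory ID range and SMTP for a counter; all names here are illustrative, not listmonk's:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runPipeline is a toy version of the campaign engine: a producer walks
// subscriber IDs with a keyset cursor, a bounded worker pool "sends".
func runPipeline(totalSubs, batchSize, concurrency int) int64 {
	// Fake keyset-paginated fetch: WHERE id > $1 ORDER BY id LIMIT $2.
	fetchBatch := func(afterID int) []int {
		var batch []int
		for id := afterID + 1; id <= totalSubs && len(batch) < batchSize; id++ {
			batch = append(batch, id)
		}
		return batch
	}

	msgChan := make(chan int, batchSize)
	var sent int64
	var wg sync.WaitGroup

	// Consumer pool: N workers draining the channel.
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range msgChan {
				atomic.AddInt64(&sent, 1) // stands in for messenger.Push
			}
		}()
	}

	// Producer: batches via the cursor, checkpointing after each one.
	lastSubID := 0
	for {
		batch := fetchBatch(lastSubID)
		if len(batch) == 0 {
			break
		}
		for _, id := range batch {
			msgChan <- id // backpressure: blocks when the buffer is full
		}
		lastSubID = batch[len(batch)-1]
	}
	close(msgChan)
	wg.Wait()
	return atomic.LoadInt64(&sent)
}

func main() {
	fmt.Println(runPipeline(2500, 1000, 10)) // 2500
}
```

Note how closing the channel is the shutdown signal: workers exit their range loops, and the WaitGroup confirms every message was counted.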

Rate Limiting Strategies

| Strategy | Config | Behavior |
|---|---|---|
| Fixed Rate | app.message_rate = 10 | Max 10 msgs/second globally. Token bucket. |
| Sliding Window | sliding_window + duration + rate | Max N messages within rolling time window (e.g., 10,000/hour). |
| Per-SMTP Limits | smtp[].max_conns = 10 | Connection pool per SMTP server. Backpressure via channel blocking. |
| Error Threshold | app.max_send_errors = 1000 | Auto-pause campaign after N cumulative send failures. |

Crash Recovery & Resumption

The campaign stores last_subscriber_id after each batch. On restart, campaigns with status='running' are automatically resumed from the last checkpoint. The v3.0.0 rewrite ensures every single message is counted (not approximated), making pause/resume lossless. The to_send field is computed at campaign start as the total subscriber count for the target lists.

SMTP Connection Pool

Each configured SMTP server maintains a pool of max_conns persistent TCP connections. Connections are reused across messages (SMTP pipelining). idle_timeout and wait_timeout control connection lifecycle. Multiple SMTP servers can be load-balanced under the default "email" messenger, or targeted individually by naming them.

Go Patterns & Best Practices

Interface-Based Abstractions
// Messenger interface (Strategy pattern)
type Messenger interface {
    Name() string
    Push(msg Message) error
    Flush() error
    Close() error
}
// Implementations: email.Emailer, postback.Postback
Clean interfaces for pluggable backends. SMTP emailer and HTTP webhook postback both implement Messenger. Media storage uses the same pattern (filesystem vs S3).
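To show how the strategy pattern plays out, here is a toy in-memory backend satisfying the same interface. The Message fields are assumptions for the sketch:

```go
package main

import "fmt"

// Message is a simplified stand-in for the message type the interface uses.
type Message struct {
	To, Subject, Body string
}

// Messenger mirrors the interface shown above.
type Messenger interface {
	Name() string
	Push(msg Message) error
	Flush() error
	Close() error
}

// memMessenger is a toy backend filling the same slot that
// email.Emailer (SMTP) or postback.Postback (HTTP) would.
type memMessenger struct {
	sent []Message
}

func (m *memMessenger) Name() string           { return "memory" }
func (m *memMessenger) Push(msg Message) error { m.sent = append(m.sent, msg); return nil }
func (m *memMessenger) Flush() error           { return nil }
func (m *memMessenger) Close() error           { return nil }

func main() {
	// The campaign engine only sees the interface, so backends swap freely.
	var msgr Messenger = &memMessenger{}
	_ = msgr.Push(Message{To: "a@example.com", Subject: "hi"})
	fmt.Println(msgr.Name())
}
```

A backend like this is also handy in tests: the engine can be exercised end-to-end without touching a real SMTP server.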
Struct Composition over Inheritance
type App struct {
    core    *core.Core
    manager *manager.Manager
    db      *sqlx.DB
    // ... all subsystems
}
// Methods: func (a *App) GetSubscribers(...)
No inheritance. The App struct composes all subsystems. Each subsystem is independently testable. Handler methods receive the full app context.
Named SQL via goyesql
-- queries/queries.sql
-- name: get-subscriber
SELECT * FROM subscribers WHERE id = $1;

-- name: get-campaign-subscribers
SELECT ... WHERE id > $1 ORDER BY id LIMIT $2;
SQL lives in .sql files with named tags. Loaded at startup, bound to prepared statements. No ORM overhead. Full PostgreSQL feature access. SQL is code-reviewable.
internal/ Package Privacy
internal/
  core/          # Business logic (unexportable)
  manager/       # Campaign engine (unexportable)
  bounce/        # Bounce processing
  auth/          # Authentication
  media/         # Media storage
  subimporter/   # CSV import
Go's internal/ convention makes these packages importable only by listmonk itself. Prevents external packages from depending on internals. Clean public API boundary.
Graceful Shutdown Pattern
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGHUP)

go func() {
    <-sigChan
    srv.Shutdown(ctx)     // HTTP
    manager.Close()       // Campaigns
    db.Close()            // Postgres
    syscall.Exec(...)     // Self-replace
}()
SIGHUP triggers graceful shutdown: HTTP server drains, campaign manager flushes, DB connections close. Then syscall.Exec() self-replaces the process for hot restart.
Configuration Layering (koanf)
// Precedence (last wins):
// 1. CLI flags (--config, --install)
// 2. TOML file (config.toml)
// 3. Env vars (LISTMONK_*)
// 4. Database settings table

ko.Load(file.Provider("config.toml"),
        toml.Parser())
ko.Load(env.Provider("LISTMONK_", ...))
Multi-source config via koanf library. Static TOML for infra, env vars for containers, DB settings table for runtime-changeable options (via admin UI).

Additional Go Idioms Used

| Pattern | Where | Why |
|---|---|---|
| Functional Options | SMTP config, manager setup | Flexible initialization without constructor explosion |
| Context Propagation | HTTP handlers → Core → DB | Request-scoped deadlines, cancellation, auth info |
| sync.Once | Template compilation caching | Thread-safe lazy initialization of expensive resources |
| atomic Operations | Campaign sent counter | Lock-free concurrent counter updates in worker pool |
| embed.FS (stuffbin) | Static assets, SQL, i18n | Single binary deployment with zero external file deps |
| Error Wrapping | Throughout | fmt.Errorf with %w for error chain inspection |
| Table-Driven Tests | Core package | Declarative test cases with expected inputs/outputs |
| Middleware Chain | Echo middleware | Auth → CORS → logging → rate limit → handler |
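Two of these idioms, sync.Once for lazy template compilation and atomic counter updates, can be sketched together. This is illustrative, not listmonk's actual code:

```go
package main

import (
	"fmt"
	"html/template"
	"sync"
	"sync/atomic"
)

var (
	tmplOnce sync.Once
	tmpl     *template.Template
	sent     int64
)

// getTemplate compiles the template exactly once, even when called
// concurrently from many goroutines.
func getTemplate() *template.Template {
	tmplOnce.Do(func() {
		tmpl = template.Must(template.New("email").Parse("Hello {{.Name}}"))
	})
	return tmpl
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getTemplate()         // safe concurrent lazy init
			atomic.AddInt64(&sent, 1) // lock-free counter update
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&sent)) // 100
}
```

The same shape applies to any expensive, shared resource: the Once guards initialization, and atomics keep hot counters out of mutex contention.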

Scalability & Performance at 1M+ Scale

Proven Production Numbers

listmonk.app states: "A production instance sending 7+ million emails. CPU usage is a fraction of a single core with peak RAM of 57 MB."

What Makes It Fast?

| Technique | Impact | Details |
|---|---|---|
| Keyset Pagination | O(1) batch fetch | WHERE id > cursor instead of OFFSET. Constant time regardless of dataset size. |
| Batch Processing | Amortized DB cost | Default batch_size=1000. One query fetches 1000 subscribers. Reduces round-trips 1000x. |
| Connection Pooling | DB: 25, SMTP: 10/srv | max_open=25, max_idle=25 for Postgres. max_conns per SMTP server. Reuse over create. |
| Materialized Views | Instant dashboards | Pre-aggregated stats. REFRESH CONCURRENTLY allows reads during rebuild. |
| Template Caching | Zero recompile | Templates compiled once at startup, cached in memory. Re-compiled only on update. |
| Goroutine Pool | Bounded concurrency | Fixed pool (default 10 workers). No goroutine leak. Channel backpressure for flow control. |
| Streaming Export | Constant memory | Subscriber export writes CSV rows as they're fetched. No full-dataset buffering. |
| Expression Indexes | DATE() fast | idx_clicks_date ON (TIMEZONE('UTC', created_at)::DATE) avoids per-row function eval. |
| Single Binary | Fast startup | No file I/O for assets. stuffbin serves from memory. Startup in milliseconds. |
| Cache Slow Queries | Configurable | v3+ option: enable/disable query caching with custom cron interval for large DBs. |

Bottlenecks & Scaling Limits

| Bottleneck | Limit | Mitigation |
|---|---|---|
| Single Postgres | ~10M subs before slowdown | Materialized views, cache_slow_queries, read replicas |
| Single Process | No horizontal scaling built-in | Run --passive for read-only replicas. One active sender. |
| SMTP Rate Limits | Provider-imposed (SES: 14/sec) | Sliding window limiter, multiple SMTPs, message_rate config |
| Template Rendering | CPU-bound for complex templates | Keep templates simple. Goroutine pool bounds CPU. |
| link_clicks Table | Can grow to billions of rows | DATE expression index. Consider partitioning at scale. |

What Would You Do to Scale to 100M+ Subscribers?

Great system design interview follow-up:

Shard Postgres
Hash-based sharding on subscriber_id. Each shard handles batch fetches independently.
Add Message Queue
Kafka/SQS between batch fetch and send. Decouple producers from consumers. Enable multi-node senders.
Partition Analytics
Time-partition link_clicks, campaign_views by month. Auto-drop old partitions.
Horizontal Senders
Multiple sender nodes consuming from shared queue. Coordinator assigns campaign segments.
Cache Layer
Redis for hot subscriber lookups, template cache, rate limiter state across nodes.
CDN for Archives
Public campaign archives served from CDN. Static generation for high-traffic newsletters.

What Happens When 1 Million Requests Hit listmonk?

listmonk handles 6 fundamentally different request types. Each follows a different hot path through the system. Understanding these flows is critical for system design interviews — an interviewer will ask "walk me through what happens when..." and expect you to trace from TCP accept to DB write to response.

Key Insight: listmonk is NOT a typical CRUD API under load. Campaign sending is a push pipeline (server-initiated, async). Tracking pixels and link clicks are the real high-throughput inbound paths — these get 1M+ hits per campaign.

Flow 1: Tracking Pixel Open (Highest Volume — 1M+ per campaign)

When a subscriber opens an email, their client loads a 1x1 transparent PNG. This is the hottest path in the system.

1. EMAIL CLIENT ─── GET /campaign/{campaignUUID}/{subscriberUUID}/px.png ───▶ ECHO ROUTER
      │
      │  // No auth middleware — public endpoint. No session lookup.
2. ECHO ROUTER ─── matches route "/campaign/:campUUID/:subUUID/px.png" ───▶ HANDLER: handleCampaignPixel()
      │
      │  // Parse UUIDs from path params. Validate format (fast regex). No DB lookup yet.
3. HANDLER ─── checks privacy.disable_tracking setting ───▶ DECISION POINT
      │
      ├── IF tracking disabled: Return 1x1 PNG immediately. No DB write. O(1).
      │
      ├── IF privacy.individual_tracking = false:
      │     Insert into campaign_views with subscriber_id = NULL (anonymous).
      │     // Still counts the view, but can't attribute it to a subscriber.
      │
      └── IF individual tracking enabled:
            │
            ▼
4. DB INSERT ─── INSERT INTO campaign_views (campaign_id, subscriber_id, created_at) ───▶ POSTGRESQL
      │
      │  // campaign_id resolved from UUID via campaigns table lookup (indexed).
      │  // subscriber_id resolved from UUID via subscribers table lookup (indexed).
      │  // Two UUID→ID lookups + one INSERT. Total: 3 queries.
      │  // campaign_views has BIGSERIAL PK — write-optimized append-only table.
5. RESPONSE ─── 200 OK, Content-Type: image/png, body: 1x1 transparent PNG (68 bytes) ───▶ EMAIL CLIENT
| At 1M Pixel Requests | What Happens | Bottleneck |
|---|---|---|
| Echo HTTP Server | Goroutine-per-request model. 1M requests = 1M short-lived goroutines (~2KB stack each). Echo's radix tree router matches in O(1). No middleware overhead on public routes (no auth, no session). | Not a bottleneck — Go's HTTP server handles 100k+ req/s on a single core. |
| UUID → ID Lookups | Two indexed lookups: campaigns(uuid) and subscribers(uuid). Both have UNIQUE indexes. B-tree lookup = O(log N). | At 1M req/s this is 2M index lookups/sec — could become the bottleneck. Fix: cache the UUID→ID mapping in-process (Go map with RWMutex, or sync.Map). |
| campaign_views INSERTs | 1M INSERT operations. BIGSERIAL PK = append-only. No index updates except the campaign_id and subscriber_id FK indexes plus the DATE expression index. | High write pressure. PostgreSQL does ~10-50k INSERTs/sec depending on hardware. 1M requests would need batch inserts, async buffering, or a write-ahead table with periodic flush. |
| Connection Pool | 25 connections shared across all goroutines. Each INSERT holds a connection for ~1ms. Effective throughput: ~25,000 INSERTs/sec. | Pool exhaustion at high concurrency: goroutines block waiting for a free connection. Solution: increase max_open, or buffer writes in a Go channel and batch-insert. |
| Response | 68-byte PNG from memory (hardcoded). No disk I/O. No template rendering. Fastest possible response after the DB write. | Not a bottleneck. |
How You'd Optimize This at Scale

1. Buffer pixel events in a Go channel; batch-INSERT every 100ms or every 1,000 events, whichever comes first.
2. Cache UUID→ID mappings in-process with a TTL.
3. If tracking is anonymous, skip the subscriber UUID lookup entirely.
4. Use COPY instead of INSERT for batch writes (~10x faster).
5. Partition campaign_views by campaign_id or date.
6. Consider an async write: return 200 immediately and write to the DB in a background goroutine (accepting a small risk of data loss).

Flow 2: Link Click Tracking (High Volume — 100K+ per campaign)

Every link in a campaign email is wrapped. When a subscriber clicks, they hit listmonk first, which records the click then redirects to the actual URL.

1. BROWSER ─── GET /link/{linkUUID}/{campaignUUID}/{subscriberUUID} ───▶ ECHO ROUTER
      │
      │  // Public endpoint. No auth. Three UUIDs in path.
2. HANDLER ─── handleLinkRedirect()
      │
      ├── Resolve link UUID → link record (get actual URL)
      │     // links table: url TEXT NOT NULL UNIQUE. UUID indexed.
      │
      ├── Resolve campaign UUID → campaign_id
      │
      ├── Resolve subscriber UUID → subscriber_id (if individual tracking on)
      │
      ├── INSERT INTO link_clicks (campaign_id, link_id, subscriber_id)
      │     // BIGSERIAL PK. Indexes on campaign_id, link_id, subscriber_id, DATE.
      │     // 4 index updates per INSERT. Heavier than campaign_views.
      │
      └── 302 REDIRECT → actual URL
            // User sees the destination page. Redirect is instant.
| At 1M Click Requests | What Happens | Optimization |
|---|---|---|
| 3 UUID Lookups | Link, campaign, and subscriber UUIDs resolved to integer IDs. Three indexed lookups per request. | Cache link UUID→(id, url) in-process. Links are immutable once created — a perfect cache candidate with no invalidation needed. |
| link_clicks INSERT | Heavier than campaign_views: 4 index updates per row (campaign_id, link_id, subscriber_id, DATE expression index). | Batch inserts via a buffered channel. COPY for bulk writes. Consider an unlogged table for click data if durability isn't critical. |
| Redirect Latency | User-facing! The subscriber waits for the 302, and the DB insert is on the critical path — if the DB is slow, the user sees a delay before reaching the destination. | Move the INSERT off the critical path: write to an in-memory buffer, return the 302 immediately, flush to the DB asynchronously. Accept ~1s of data delay for an instant redirect. |
| Table Growth | link_clicks grows unboundedly: 1M clicks/campaign × 100 campaigns = 100M rows. The DATE expression index helps, but the table gets large. | Time-based partitioning (monthly). Drop partitions older than the retention period, or archive to cold storage. |

Flow 3: Campaign Send Pipeline (1M Outbound Emails)

This is not a request flow — it's a server-initiated push pipeline. But it's what people mean by "1M requests" in the context of listmonk.

MANAGER GOROUTINE (long-lived, polls DB every ~5s)
      │
      ├── SELECT campaigns WHERE status IN ('running','scheduled')
      │     // status column indexed. Cheap scan — usually 0-5 active campaigns.
      │
      ▼  For each active campaign:
BATCH PRODUCER (one goroutine per campaign)
      │
      │   // Loop until all subscribers processed:
      ├── SELECT subscribers WHERE id > last_subscriber_id
      │       AND id IN (subscriber_lists WHERE list_id IN campaign_lists)
      │       AND status = 'enabled'
      │       ORDER BY id LIMIT 1000   // ← keyset pagination, O(1) via PK index
      │
      │   // For each subscriber in batch:
      ├── TEMPLATE RENDER ─── Go html/template.Execute()
      │     // Inject: subscriber.name, subscriber.attribs, campaign.subject
      │     // Generate: tracking pixel URL, wrapped link URLs
      │     // Sprig functions available. Compiled template cached (sync.Once).
      │     // CPU cost: ~0.1ms per render for simple templates.
      │
      ├── msgChan <- message  // Send to buffered channel. Blocks if channel full (backpressure).
      │
      └── UPDATE campaigns SET last_subscriber_id = ?, sent = ?  // Checkpoint after batch

                          ║ channel (buffer = batch_size)
                          ▼
WORKER POOL (N = app.concurrency, default 10 goroutines)
      │
      │   // Each worker loops: for msg := range msgChan { ... }
      │
      ├── RATE LIMITER ─── rateLimiter.Wait()
      │     // Token bucket: app.message_rate tokens/sec
      │     // OR sliding window: app.message_sliding_window_rate per window_duration
      │     // Blocks goroutine until token available. Natural throttle.
      │
      ├── MESSENGER.Push(msg) ─── SMTP connection from pool
      │     // Acquires connection from pool (max_conns per SMTP server)
      │     // SMTP EHLO → MAIL FROM → RCPT TO → DATA → message body → QUIT
      │     // Connection reused for next message (persistent TCP). Pipelining.
      │     // On failure: retry up to max_msg_retries times
      │
      ├── atomic.AddInt64(&sent, 1)  // Lock-free counter update
      │
      └── ON ERROR: increment error counter. If errors > max_send_errors → PAUSE campaign

1M Emails — Time & Resource Breakdown

| Phase | Operations | Time @ Defaults | Resource |
|---|---|---|---|
| DB Fetch | 1M / 1000 batch_size = 1000 queries | ~1ms/query × 1000 = ~1 second total | 1 DB connection (sequential per campaign) |
| Template Render | 1M renders × ~0.1ms each | ~100 seconds (single-threaded per campaign) | CPU-bound. ~1 core for simple templates. |
| Rate Limit Wait | 1M msgs ÷ 10 msg/sec | ~27.8 hours at default message_rate=10 | Near-zero (goroutine sleep) |
| SMTP Send | 1M sends over reused connections × ~50ms avg | 10 workers × 50ms = ~200 msg/sec → ~83 minutes | 10 SMTP TCP connections |
| Progress Updates | 1000 checkpoint UPDATEs | ~1ms each = ~1 second total | 1 DB connection |
| Memory | 10 goroutines + 1000-msg channel buffer + template cache | — | ~50-60 MB peak (proven at 7M scale) |
The Real Bottleneck: Rate Limiting, Not Go
At default settings (message_rate=10), 1M emails takes ~28 hours. Go processes them in minutes — the rate limiter is the intentional brake. This is by design: SMTP providers (SES, SendGrid) impose rate limits (SES default: 14/sec). Sending faster than the provider allows means bounced connections and IP reputation damage. Production tuning: set message_rate=100-500 with SES production access, concurrency=50, multiple SMTP servers. 1M emails in ~30 minutes.
Tuned Configuration for 1M Emails
concurrency=50, message_rate=500, batch_size=5000, max_conns=20 per SMTP × 3 SMTP servers = 60 total connections. Estimated throughput: ~500 msgs/sec = 1M emails in ~33 minutes. Memory stays under 100MB. CPU usage: 2-3 cores.

Flow 4: Public Subscription Form Submit

When a user subscribes via a public form on your website. Lower volume but has more steps.

1. BROWSER ─── POST /subscription/form ───▶ ECHO ROUTER
      │
      ├── CAPTCHA VERIFY ─── ALTCHA proof-of-work check OR hCaptcha API call
      │     // ALTCHA: CPU verification, no external API call (~1ms)
      │     // hCaptcha: external HTTP call to verify token (~100-300ms)
      │
      ├── DOMAIN FILTER ─── check email domain against blocklist/allowlist
      │     // privacy.domain_blocklist: ["*.disposable.com", "tempmail.org"]
      │     // In-memory check. O(N) against list but lists are tiny.
      │
      ├── UPSERT SUBSCRIBER ─── INSERT ... ON CONFLICT (email) DO UPDATE
      │     // Case-insensitive: idx_subs_email ON LOWER(email)
      │     // If exists: update name/attribs. If new: create with UUID.
      │
      ├── INSERT subscriber_lists ─── subscribe to requested lists
      │     // Status = 'unconfirmed' for double-optin lists
      │     // Status = 'confirmed' for single-optin lists
      │
      ├── IF double-optin: SEND CONFIRMATION EMAIL
      │     // Render optin template with confirmation link
      │     // Push to SMTP via messenger (same pool as campaigns)
      │     // Confirmation link: /subscription/optin/{subscriberUUID}/{listUUID}
      │
      └── 200 OK ─── render success template page

Flow 5: REST API Request (Admin/Programmatic)

API-triggered operations: creating subscribers, managing lists, triggering transactional emails. Authenticated path.

1. CLIENT ─── GET /api/subscribers?page=1&per_page=50 ───▶ ECHO ROUTER
      │
      │  // Authorization header: "username:api-token" (base64)
2. MIDDLEWARE CHAIN
      │
      ├── Auth Middleware ─── validate API token
      │     // Lookup user by username in users table (indexed)
      │     // Verify token (bcrypt compare or direct match for API users)
      │     // Load user role + permissions from roles table
      │     // Set auth context on echo.Context
      │
      ├── Permission Check ─── does user have "subscribers:get_all" permission?
      │     // Check user_role permissions array. If list-scoped, filter by allowed lists.
      │
      └── CORS Middleware ─── check Origin header against security.cors_origins
      ▼
3. HANDLER ─── GetSubscribers()
      │
      ├── Parse query params ─── page, per_page, query, list_id, order_by
      │
      ├── Build SQL ─── dynamic WHERE clause from search/filter params
      │     // If "query" is plain text: ILIKE search on email and name
      │     // If "query" starts with SQL expression: parse as raw WHERE clause
      │     // Permission-filtered: only show subscribers in user's allowed lists
      │
      ├── EXECUTE SQL ─── via prepared statement from goyesql
      │     // Uses sqlx.Select() for struct scanning
      │     // COUNT(*) for total (separate query)
      │     // OFFSET/LIMIT pagination for API (not keyset — acceptable for admin UI)
      │
      └── 200 OK ─── JSON response with data[] + total + per_page + page
| At 1M API Requests | Bottleneck Analysis |
|---|---|
| Auth Overhead | Every API request hits the DB for user/role lookup — at 1M requests, that's 1M user queries + 1M role queries. Fix: cache authenticated sessions in-process with a TTL. API tokens are static — perfect for caching. |
| OFFSET Pagination | The API uses OFFSET/LIMIT (not keyset). page=10000&per_page=50 means scanning 500K rows; performance degrades linearly. Acceptable for the admin UI (low page numbers) but problematic for bulk API consumers. |
| No API Rate Limiting | There is no per-consumer rate limiting on the API, so a misbehaving client can exhaust the DB connection pool. Fix: per-token rate limiter middleware (token bucket or sliding window). |
| Connection Pool Contention | The API, campaign engine, tracking pixels, and bounce processor all share the same 25-connection pool. Under 1M API requests, campaign sending would slow down. Fix: separate pools or connection prioritization. |

Flow 6: Bounce Webhook (SES/SendGrid/Postmark)

After a campaign, bounce notifications flow back from email providers. Volume scales with send volume — expect 2-5% bounce rate.

1. SES/SENDGRID ─── POST /webhooks/bounce/{type} ───▶ ECHO ROUTER
      │
      ├── Authenticate webhook ─── verify provider-specific auth
      │     // SES: verify SNS signature. SendGrid: verify key. Postmark: basic auth.
      │
      ├── Parse bounce payload ─── extract: email, bounce type, campaign ID
      │     // Normalize across providers into internal bounce_type ENUM
      │
      ├── INSERT INTO bounces ─── (subscriber_id, campaign_id, type, source, meta)
      │     // Subscriber looked up by email. campaign_id nullable (might be unknown).
      │
      ├── CHECK THRESHOLD ─── count bounces for this subscriber by type
      │     // SELECT COUNT(*) FROM bounces WHERE subscriber_id = ? AND type = ?
      │     // Compare against configured actions: hard.count=1, soft.count=2
      │
      └── IF threshold exceeded: UPDATE subscribers SET status = 'blocklisted'
            // Or DELETE subscriber if action = 'delete'
            // Cascading: subscriber_lists entries also cleaned up (FK CASCADE)

Flow 7: Transactional Email API

Single message sends triggered by your application (welcome emails, password resets, order confirmations). Different from campaign sends — synchronous, one-at-a-time.

1. YOUR APP ─── POST /api/tx ───▶ ECHO ROUTER
      │
      │  // Body: { "subscriber_email": "...", "template_id": 5, "data": {...} }// Auth: API token (required)2. HANDLER
      │
      ├── Resolve subscriber ─── lookup by email or ID
      ├── Load TX template ─── from template cache (in-memory)
      ├── Render template ─── with subscriber data + custom data payload
      ├── messenger.Push(msg) ─── synchronous SMTP send
      │     // Blocks until SMTP ACK or error. Caller waits.
      │     // No rate limiting — each TX call = one immediate send.
      └── 200 OK ─── { "data": true }
At 1M TX Requests: What Breaks

Synchronous SMTP: Each TX call blocks until SMTP responds (~50-200ms). With 25 DB connections, max concurrent TX sends = ~25. At 200ms each = ~125 TX/sec. 1M would take ~2.2 hours. Fix: async queue with delivery confirmation callback.
No TX Rate Limiting: TX calls bypass the campaign rate limiter. A burst of 10K TX calls would exhaust SMTP connections. Fix: separate TX rate limiter or shared token bucket.
Shared SMTP Pool: TX emails share the same SMTP connection pool as campaigns. Heavy TX load can starve campaign sending. Fix: dedicated SMTP server for TX (use a named messenger).

Request Flow Summary — All 7 Flows at 1M Scale

Flow · Type · Volume Profile · Hot Path Cost · Primary Bottleneck · DB Queries/Req

Tracking Pixel · Inbound GET · 1M+ per large campaign · 2 UUID lookups + 1 INSERT · DB write throughput · 3
Link Click · Inbound GET → 302 · 100K-500K per campaign · 3 UUID lookups + 1 INSERT · DB write + redirect latency · 4
Campaign Send · Outbound push · 1M emails (async) · Batch fetch + render + SMTP · Rate limiter (intentional) · ~1 per 1000
Subscription · Inbound POST · 100-10K/day · CAPTCHA + upsert + optin email · CAPTCHA verification · 2-4
API CRUD · Inbound REST · Depends on integration · Auth + query + JSON marshal · Auth DB lookup (cacheable) · 3-5
Bounce Webhook · Inbound POST · 2-5% of send volume · Auth + INSERT + threshold check · Negligible at normal rates · 3-4
TX Email · Inbound POST → SMTP · Depends on app · Auth + render + sync SMTP · Synchronous SMTP blocking · 2-3

Will Goroutines Run Into Memory Errors?

Short answer: the campaign engine is safe, but the HTTP server is not protected. listmonk has two completely different goroutine models running simultaneously, and they have very different risk profiles.

Go Goroutine Memory Model — The Basics

Property · Value · Why It Matters

Initial Stack Size · 2 KB (Go 1.4+) · A goroutine starts tiny. 1000 idle goroutines = ~2 MB. Cheap to create.
Stack Growth · Grows dynamically up to 1 GB (default) · The stack doubles when needed (copy-on-grow). A goroutine doing real work (allocating buffers, rendering templates, building HTTP responses) can grow to 8-64 KB each.
Heap Allocations · Varies per goroutine workload · Template rendering, JSON marshaling, and SQL result scanning all allocate on the heap. These are the real memory consumers — not the goroutine stack itself.
GC Pressure · Go GC runs concurrently · A high allocation rate from 100K+ goroutines triggers frequent GC cycles. GC latency spikes (STW pauses ~1-5ms) can compound under load.
OS Threads · GOMAXPROCS (default = num CPUs) · Goroutines are multiplexed onto OS threads. 1M goroutines still only uses ~8-16 OS threads. This is NOT the bottleneck.

The Two Goroutine Models in listmonk

✓ SAFE: Campaign Engine (Bounded Pool)
Model: Fixed worker pool of N goroutines (default 10, configurable via app.concurrency).
Bounded by: Channel buffer (batch_size) + fixed worker count. Even sending 7M emails, there are only 10-50 goroutines alive.
Memory per campaign: ~10 workers × ~64KB stack + channel buffer of 1000 messages × ~2KB each = ~2.6 MB total.
Why it's safe: Producer blocks on channel send when buffer is full (backpressure). Workers block on rate limiter. Goroutine count never exceeds concurrency. You cannot OOM from campaign sending.
The pattern:
// Fixed pool — goroutine count = concurrency (constant)
for i := 0; i < concurrency; i++ {
    go worker(msgChan)  // Exactly N goroutines. No more.
}
// Producer blocks if channel full — natural backpressure
msgChan <- msg  // Blocks here, NOT by spawning new goroutines
⚠ UNBOUNDED: HTTP Server (Goroutine-Per-Request)
Model: Go's net/http server spawns one goroutine per incoming connection. Echo sits on top of this. No built-in limit.
Bounded by: Nothing in listmonk's code. If 100K tracking pixel requests arrive simultaneously, Go creates 100K goroutines.
Memory per request: ~8-64KB stack + ~2-10KB heap (UUID parsing, DB result scanning, response writing) = ~50-70KB per concurrent request.
At 100K concurrent: 100K × 70KB = ~7 GB. At 500K concurrent: ~35 GB. At 1M concurrent: ~70 GB → OOM on most machines.
The pattern:
// Go's net/http — UNBOUNDED goroutine creation
func (srv *Server) Serve(l net.Listener) error {
    for {
        conn, _ := l.Accept()
        go srv.serve(conn)  // New goroutine for EVERY connection!
    }                        // No limit. No backpressure.
}

When Does OOM Actually Happen?

The key distinction is concurrent vs total requests. 1M requests over an hour is fine. 1M requests in 1 second is a problem.

Scenario · Concurrent Goroutines · Memory · OOM Risk

Campaign send: 1M emails · 10-50 (fixed pool) · ~3-10 MB · None — bounded by design
Tracking pixels: 1M over 24 hours (~12 req/sec avg) · ~12-50 · ~1-3 MB · None — requests complete fast (~5ms each)
Tracking pixels: 1M in 1 hour (~278 req/sec avg) · ~278-1000 · ~20-70 MB · Low — manageable if DB keeps up
Tracking pixels: 1M in 1 minute (~16,667 req/sec) · ~5,000-50,000 · ~350 MB - 3.5 GB · Medium — DB becomes the bottleneck, goroutines pile up waiting for connections
Tracking pixels: 100K concurrent (spike / DDoS / viral email) · 100,000 · ~7 GB · High — goroutines block on the 25-conn DB pool, stack up in memory
Tracking pixels: 1M concurrent (extreme / unrealistic) · 1,000,000 · ~70 GB · OOM crash — Go will allocate until killed by the OS

The Real Danger: Goroutine Pile-Up on DB Pool

The OOM risk isn't from goroutine creation — it's from goroutine accumulation. Here's the cascade failure:

CASCADE FAILURE SCENARIO — 50K req/sec tracking pixel burst

t=0s    50,000 requests arrive. Go spawns 50,000 goroutines. (~3.5 GB)
          │
          ▼
t=0.001s All 50K goroutines try to acquire a DB connection from pool (max_open=25).
          25 goroutines get connections. 49,975 goroutines BLOCK waiting.
          │
          ▼
t=0.005s Each DB query takes ~5ms. First 25 connections freed. Next 25 goroutines proceed.
          But another 50K requests arrived! Now 99,950 goroutines blocked. (~7 GB)
          │
          ▼
t=1.0s   At 50K/sec inflow, 25 conn pool processes ~5000 req/sec (25 × 200 queries/sec).
          Deficit: 45,000 goroutines/sec accumulating. After 1 second: ~45K blocked.
          │
          ▼
t=10s    ~450K goroutines blocked. Memory: ~30 GB. GC thrashing. Latency spikes.
          │
          ▼
t=20s    ~900K goroutines. OOM killer triggers. Process killed.
          Campaign sending (which shares the same process) dies too.

What listmonk Does NOT Have (Protection Gaps)

Missing Protection · Consequence · What You'd Add

No HTTP connection limit · Unbounded goroutine creation on burst traffic · Echo middleware: middleware.RateLimiter() or a custom semaphore. Reject with 429 when concurrency exceeds a threshold. Or cap accepted connections with netutil.LimitListener (golang.org/x/net/netutil).
No request queue / shed · All requests are accepted even when the DB pool is full — they just block · Load shedding: if the DB pool queue > N, return 503 immediately. Fail fast instead of accumulating. context.WithTimeout on DB calls.
No goroutine budget · No visibility into goroutine count. No alarm threshold. · Expose runtime.NumGoroutine() as a Prometheus metric. Alert when > 10K. Circuit break at 50K.
No request timeout · If the DB is slow, goroutines hang indefinitely holding memory · Echo's middleware.TimeoutWithConfig(middleware.TimeoutConfig{Timeout: 5 * time.Second}). Or http.Server{ReadTimeout: 10s, WriteTimeout: 10s}.
No async write path · Every tracking pixel does a synchronous DB INSERT before responding · Write to an in-memory ring buffer. A background goroutine batch-flushes to the DB every 100ms. Decouple request handling from DB writes.

How You'd Fix This — Production-Grade Architecture

PRODUCTION FIX: Bounded Concurrency + Async Writes

// 1. Add server-level connection limits
srv := &http.Server{
    Addr:         ":9000",
    ReadTimeout:  10 * time.Second,   // Prevent slow-read attacks
    WriteTimeout: 15 * time.Second,   // Prevent goroutine hang on slow clients
    IdleTimeout:  60 * time.Second,   // Close idle keep-alive connections
    MaxHeaderBytes: 1 << 20,          // 1MB header limit
}

// 2. Add concurrency limiter middleware
sem := make(chan struct{}, 10000)      // Max 10K concurrent requests
e.Use(func(next echo.HandlerFunc) echo.HandlerFunc {
    return func(c echo.Context) error {
        select {
        case sem <- struct{}{}:
            defer func() { <-sem }()
            return next(c)
        default:
            return c.String(503, "server busy")  // Load shedding
        }
    }
})

// 3. Async tracking writes (decouple from request path)
trackChan := make(chan TrackEvent, 100000)  // Buffered channel

// Handler: write to channel, return immediately
func handlePixel(c echo.Context) error {
    select {
    case trackChan <- TrackEvent{campID, subID}:
        // Queued successfully
    default:
        // Channel full — drop event (acceptable for analytics)
    }
    return c.Blob(200, "image/png", pixel1x1)  // Instant response
}

// Background flusher: batch inserts every 100ms
go func() {
    ticker := time.NewTicker(100 * time.Millisecond)
    batch := make([]TrackEvent, 0, 1000)
    for {
        select {
        case ev := <-trackChan:
            batch = append(batch, ev)
            if len(batch) >= 1000 {
                flushBatch(batch)        // COPY INTO campaign_views
                batch = batch[:0]
            }
        case <-ticker.C:
            if len(batch) > 0 {
                flushBatch(batch)
                batch = batch[:0]
            }
        }
    }
}()

// 4. DB call timeouts
ctx, cancel := context.WithTimeout(c.Request().Context(), 3*time.Second)
defer cancel()
db.QueryContext(ctx, query, args...)   // Cancels if DB slow

// 5. Monitor goroutine count
go func() {
    for range time.Tick(5 * time.Second) {
        n := runtime.NumGoroutine()
        metrics.Gauge("goroutines", n)  // Prometheus metric
        if n > 50000 {
            log.Error("goroutine count critical", "count", n)
        }
    }
}()

Interview Answer: "How Does Go Handle 1M Concurrent Requests?"

The nuanced answer: Go's goroutine-per-request model works brilliantly when requests are fast (sub-millisecond). A goroutine costs ~2KB initially, so 10K concurrent = ~20MB — trivial. The danger is when goroutines block on shared resources — specifically the database connection pool. If your pool has 25 connections and 50K goroutines are waiting, you have 50K goroutines accumulating at ~50-70KB each = ~3.5GB. The goroutine count grows at (request rate - DB throughput) per second.

listmonk's campaign engine avoids this by using a fixed goroutine pool with channel backpressure — the producer-consumer pattern. But the HTTP server uses Go's default unbounded model with no concurrency limit, no request timeout, and no load shedding.

The production fix is three layers: (1) Server-level timeouts (ReadTimeout, WriteTimeout) prevent slow clients from holding goroutines. (2) Concurrency limiter middleware (semaphore channel) caps concurrent requests and returns 503 when saturated. (3) Async write path for hot paths (tracking pixels, link clicks) — decouple the DB write from the HTTP response using a buffered channel with a background batch flusher. This turns a 5ms synchronous DB call into a <50µs channel send.

Contention & Concurrency — Where Things Fight Over Shared Resources

Concurrency is about structure (multiple things can run). Contention is about conflict (multiple things fighting for the same resource). listmonk has both, and understanding the contention points is what separates a senior answer from a junior one in interviews.

Resource Contention Map — Every Shared Resource in listmonk

Shared Resource · Who Competes · Contention Type · Protection Mechanism · Risk Level

PostgreSQL Connection Pool (max_open = 25) · HTTP handlers, campaign engine, bounce processor, importer, cron jobs (matview refresh) · Mutex-like (pool internal lock) · Go's database/sql pool with an internal mutex. Goroutines block on db.Conn() when the pool is exhausted. · HIGH — This is the #1 contention point. All subsystems share one pool. No priority, no isolation.
SMTP Connection Pool (max_conns per server) · Campaign workers, TX email handler, optin confirmation sender, notification sender · Channel-based semaphore · A buffered channel acts as the connection pool. Workers block on channel receive when all connections are in use. · MEDIUM — Campaign workers dominate. TX emails can starve during active campaigns.
Campaign Message Channel (buffer = batch_size) · Batch producer (1 goroutine) vs worker pool (N goroutines) · Channel (CSP) · Go buffered channel. The producer blocks on send if full; workers block on receive if empty. Lock-free. · LOW — By design. The channel provides natural flow control. No contention, only coordination.
Campaign Sent Counter (shared int64) · All N worker goroutines (10-50 concurrent) · Atomic CAS · atomic.AddInt64(&sent, 1). Lock-free compare-and-swap at the hardware level. No mutex. · NONE — Atomic operations have negligible contention overhead. O(1) per operation regardless of concurrency.
Template Cache (compiled templates in memory) · Campaign workers (read) vs admin updating a template (write) · sync.Once / recompile · Templates are compiled once (sync.Once). On admin update: recompile and swap the pointer. Workers read stale data until the swap. No read-side lock. · NONE during normal operation. Momentary during recompile (the pointer swap is atomic).
Settings / Config (app.constants struct) · All handlers (read) vs settings update (write → restart) · Process restart · Settings changes trigger SIGHUP → full process restart. No concurrent read/write is possible — the entire process is replaced via syscall.Exec(). · NONE — listmonk avoids the problem entirely by restarting. No RWMutex needed.
SSE Event Bus (events.Events) · Campaign manager (publish) vs browser clients (subscribe) · Channel per subscriber · Each SSE client gets its own channel. The publisher fans out to all subscriber channels. No shared state between clients. · LOW — Fan-out pattern. The publisher may slow down if a client channel is full (slow consumer).
CSV Import Processor (single goroutine) · Import goroutine vs HTTP API (checking import status) · Mutex (likely) · The importer runs as a single goroutine. Status is checked via the API, likely guarded by a mutex or atomic for progress state. · LOW — Single writer, occasional reader. Minimal contention.

DB Pool Contention — The Priority Inversion Problem

This is the most important contention point in listmonk and a great interview discussion topic:

PRIORITY INVERSION: All subsystems share 25 DB connections

┌─────────────────────────────────────────────────────────┐
│                  sql.DB Pool (max_open=25)               │
│   ┌────┐┌────┐┌────┐┌────┐┌────┐  ...  ┌────┐          │
│   │conn││conn││conn││conn││conn│       │conn│  ×25     │
│   └──┬─┘└──┬─┘└──┬─┘└──┬─┘└──┬─┘       └──┬─┘          │
└──────┼─────┼─────┼─────┼─────┼──────────────┼───────────┘
       │     │     │     │     │              │
       ▼     ▼     ▼     ▼     ▼              ▼
  Campaign Campaign Pixel  Pixel  API          Bounce
  Batch1   Batch2   Track  Track  List          Check

Problem scenarios:

1. Campaign sending large batches holds connections for batch SELECT (~5-50ms).
   Meanwhile, admin API requests queue behind campaign queries.
   Admin dashboard feels slow during active campaigns.

2. 1M tracking pixel INSERTs saturate the pool.
   Campaign batch fetch can't get a connection.
   Campaign throughput drops. Send time extends.

3. Materialized view REFRESH (cron job) takes 30-60 seconds.
   Holds 1 connection during entire refresh.
   24 connections left for everything else.

4. Bulk subscriber import doing 10K upserts.
   Competes with campaign sending for connections.
   Both slow down.
Fix Strategy · How · Trade-off

Separate Connection Pools · Create 3 sql.DB instances: one for the campaign engine (10 conns), one for HTTP handlers (10 conns), one for background jobs (5 conns), each with an independent max_open. · More total connections to Postgres. Needs a max_connections increase on the DB side. Slightly more memory.
Connection Priority · A custom pool wrapper that reserves N connections for high-priority callers (the campaign engine). HTTP requests use the remainder. Implement with two semaphores. · Complex. Can cause HTTP starvation if the campaign engine is too aggressive.
Read Replica Split · Route all SELECT queries (subscriber lookups, dashboard, API reads) to a read replica. Writes (INSERTs, UPDATEs) go to the primary. · Replication lag (milliseconds). Needs application-level routing. listmonk doesn't support this natively.
Context Timeouts · context.WithTimeout(ctx, 3*time.Second) on all DB calls. If a connection isn't available within 3s, fail fast with 503 instead of blocking. · Requests fail under load instead of queuing. Better for latency SLOs. Some data operations may need longer timeouts.
PgBouncer · An external connection pooler between listmonk and Postgres with transaction-mode pooling: many application connections share a small set of Postgres connections. · Additional infrastructure. Transaction-mode pooling breaks session-level prepared statements. Adds ~0.1ms latency.

Row-Level Contention in PostgreSQL

Contention Point · Scenario · What PostgreSQL Does · Impact

campaigns row UPDATE · The campaign manager updates last_subscriber_id and sent after each batch; an admin simultaneously views campaign status. · Row-level lock (MVCC). The writer acquires a RowExclusiveLock; the reader sees the old snapshot (no block). Writers don't block readers. · None — MVCC handles this perfectly. Reads see a consistent snapshot.
subscribers UPSERT · CSV import upserts 10K subscribers while the public subscription form creates new subscribers. Both touch the idx_subs_email unique index. · Each INSERT/UPDATE acquires a row lock. Concurrent upserts on different emails: no conflict. Same email: one waits for the other's transaction to commit. · Low — conflicts only on the same email. Imports batch in transactions, so a stuck import blocks other writes to the same subscribers.
subscriber_lists INSERT · Two campaigns target overlapping lists, both reading subscriber_lists to find recipients. The campaign manager only reads; it doesn't write to this table during a send. · No contention — a campaign send only SELECTs from subscriber_lists. Subscription changes (add/remove) acquire row locks on specific (subscriber_id, list_id) pairs. · None — read-only during campaign processing.
campaign_views / link_clicks INSERT · Thousands of concurrent tracking pixel and link click INSERTs, all writing to the same tables. · Append-only tables with BIGSERIAL PKs. Each INSERT calls nextval() on the sequence (a lightweight lock) and takes index locks. No row-level conflicts. · The sequence lock becomes a bottleneck at very high insert rates (~50K+/sec). Fix: CACHE 100 on the sequence to reduce lock acquisitions, or batch inserts.
settings UPDATE · Admin saves settings while a campaign is reading config. · Settings are read at startup and cached in memory (app.constants). A DB write doesn't affect the running config; a full process restart is needed to pick up changes. · None — decoupled by design. Config is immutable during the process lifetime.
REFRESH MATERIALIZED VIEW CONCURRENTLY · A cron job refreshes dashboard stats while the admin views the dashboard. · The CONCURRENTLY keyword allows reads during the refresh (changes are applied as a diff rather than rewriting the view). Requires a UNIQUE index on the matview. · None for readers. The refresh itself holds an ExclusiveLock on the matview — two concurrent refreshes would block.
bounces threshold CHECK · Multiple bounce webhooks for the same subscriber arrive simultaneously. Each does SELECT COUNT(*) FROM bounces WHERE subscriber_id = ? then potentially UPDATE subscribers SET status = 'blocklisted'. · TOCTOU race condition: two webhooks both count the existing bounces, both insert, both check the threshold — the subscriber may take N+1 bounces before the blocklist triggers. · Minor — the subscriber gets one extra email before blocklisting. Not dangerous. Fix: SELECT ... FOR UPDATE on the subscriber row during bounce processing.

Go-Level Concurrency Primitives Used

Primitive · Where Used · Why This Choice · Contention Characteristics

Buffered Channel · Campaign message pipeline, SMTP connection pool, SSE event fan-out · CSP model. Decouples producer from consumer. Natural backpressure. No explicit locking needed. · Zero contention while the buffer is neither full nor empty. Contention only at the boundaries: the producer blocks when full (backpressure), the consumer blocks when empty (idle).
atomic.AddInt64 · Campaign sent counter, campaign error counter · Lock-free counter via a hardware CAS instruction. No goroutine blocking, ever. · Near-zero. CAS retries on contention (extremely rare, nanoseconds). Outperforms a mutex by 10-100x for simple counters.
sync.Once · Template compilation, one-time initialization · Thread-safe lazy init. The first caller executes; all others wait, then return the cached result. · First call: brief mutex hold during init. All subsequent calls: an atomic read (zero contention). Perfect for "compute once, read forever" patterns.
database/sql Pool · All PostgreSQL access · Built-in connection pooling. Thread-safe. Handles the connection lifecycle. · Internal mutex on the connRequests map. Under high concurrency, goroutines queue in FIFO order. This is the primary contention point in the entire system.
No explicit Mutex · listmonk avoids sync.Mutex and sync.RWMutex in hot paths, preferring channels, atomics, and immutable data (restart on config change). · Architectural choice: channels for coordination, atomics for counters, process restart for config. · Eliminates most mutex contention by design.

Concurrent Campaign Execution — Overlapping Lists

What happens when two campaigns target overlapping subscriber lists simultaneously?

SCENARIO: Campaign A targets List 1 (500K subs), Campaign B targets List 2 (500K subs)
          200K subscribers are on BOTH lists.

Campaign A goroutines:        Campaign B goroutines:
  1 batch producer               1 batch producer
  10 send workers                10 send workers
  ──────────────                 ──────────────
  11 goroutines                  11 goroutines    // Total: 22 goroutines

What happens to the 200K overlapping subscribers?

  ✓ Both campaigns independently fetch and send to them.
  ✓ The subscriber receives BOTH emails (intentional — different campaigns).
  ✓ No deduplication across campaigns (by design).
  ✓ No row locks conflict — campaigns only SELECT from subscriber_lists.
  ✓ Each campaign has independent last_subscriber_id cursor.
  ✓ Each campaign has independent sent counter (atomic).

Contention points:

  1. DB Pool: 22 goroutines competing for 25 connections.
     Batch fetches: 2 long-running SELECTs.
     Workers doing progress UPDATEs: occasional contention.
     Mitigation: workers mostly wait on SMTP (I/O bound), not DB.

  2. SMTP Pool: 20 workers sharing max_conns connections.
     If max_conns=10, workers queue for connections.
     Mitigation: each campaign can use different named SMTP servers.

  3. Rate Limiter: GLOBAL rate limit (message_rate) shared across campaigns.
     Two campaigns each wanting 100 msg/sec with rate=100 → each gets ~50.
     Campaign throughput halves with each concurrent campaign.

  4. No campaign-level resource isolation.
     A slow campaign (complex template) slows all campaigns
     by holding DB connections longer.

Race Conditions — Known & Potential

Race Condition · Severity · Description · Fix

Bounce TOCTOU · Low · Two bounce webhooks arrive for the same subscriber simultaneously. Both read count=0, both insert, both check the threshold — neither triggers the blocklist because each sees count=1 when threshold=2. The next bounce will trigger it. · SELECT ... FOR UPDATE on the subscriber row, or INSERT + SELECT COUNT in a single transaction with SERIALIZABLE isolation.
Campaign Status Transition · Low · Admin clicks "Pause" while the manager is updating the sent count. PostgreSQL row-level MVCC prevents corruption — the UPDATE acquires a row lock. But the pause might not take effect until the current batch completes. · Current behavior is acceptable. The campaign checks for a pause signal between batches. Near-instant in the v3.0.0 rewrite.
Duplicate Subscription · None · Two form submissions for the same email at the same time. INSERT ... ON CONFLICT (email) DO UPDATE handles this atomically — PostgreSQL serializes at the unique index level. · Already handled by the DB unique constraint + upsert. No application-level fix needed.
Matview Concurrent Refresh · Low · Two cron triggers fire simultaneously (unlikely but possible). REFRESH MATERIALIZED VIEW CONCURRENTLY acquires an ExclusiveLock — the second call blocks until the first completes. · Not harmful — just wastes time. Could add an application-level lock (pg_advisory_lock) to skip the refresh if one is already running.
Template Hot-Swap · None · Admin updates a template while a campaign is mid-send using that template. Campaign workers hold a reference to the compiled template in memory. The recompile creates a new object; the old one is GC'd after the campaign finishes. · Safe — Go's GC keeps the old template alive as long as goroutines reference it. New campaigns get the updated template; the in-flight campaign uses the old version.
Subscriber Delete During Send · None · Admin deletes a subscriber while a campaign is sending to them. The campaign already fetched the batch — the subscriber data is in memory. For the tracking INSERT, the subscriber_id FK's SET NULL handles it gracefully. · Already handled by schema design. ON DELETE SET NULL on campaign_views and link_clicks preserves analytics data.

Interview Answer: "How Do You Handle Contention in a Concurrent System?"

Layer the answer by resource type:

1. Application-level: listmonk uses channels (not mutexes) for coordination — the CSP model. The campaign engine is a bounded worker pool where backpressure is built into the channel. Counters use lock-free atomics. Config is immutable — changes trigger a full process restart, completely sidestepping read-write contention. This is an architectural decision to prefer simplicity over fine-grained locking.

2. Connection pool: The single shared DB pool (25 connections) is the primary contention point. All subsystems — HTTP handlers, campaign engine, bounce processor, cron jobs — compete for the same connections. Under high load, goroutines queue behind the pool's internal mutex. The fix is pool isolation: separate pools for campaign engine vs HTTP handlers, or a priority queue that reserves connections for critical paths.

3. Database-level: PostgreSQL MVCC eliminates most row-level contention — readers never block writers, writers never block readers. The real contention is on sequences (BIGSERIAL PK on high-insert tables like link_clicks) and unique index locks during concurrent upserts. Mitigate with sequence caching (CACHE 100) and batch inserts.

4. Cross-campaign: The global rate limiter is shared across all campaigns — two concurrent campaigns each get half the throughput. There's no per-campaign resource isolation. At scale, you'd partition resources per campaign: dedicated worker pools, separate rate limiters, and named SMTP servers for high-priority campaigns.

Interview Framing: "The system handles 1M requests across different hot paths. The campaign send is I/O-bound on SMTP with intentional rate limiting. Tracking pixels and link clicks are the surprise high-throughput paths — they're write-heavy, user-facing, and scale linearly with subscriber count. The architectural insight is that these are append-only writes to analytics tables — perfect candidates for buffered batch inserts, table partitioning, and async processing. The Go HTTP server and goroutine model are never the bottleneck; PostgreSQL write throughput is."

Resilience & High Availability

Failure Modes & Recovery

Failure · Impact · Recovery Mechanism

App Crash Mid-Campaign · Campaign paused at the last batch · last_subscriber_id is persisted per batch. Auto-resumes on restart.
SMTP Server Down · Messages fail for that server · Per-message retry. Auto-pause at the error threshold. Multiple SMTP fallback.
DB Connection Lost · All operations fail · sqlx auto-reconnects via the pool. max_open/max_idle/max_lifetime.
Bounce Flood · Sender reputation at risk · Auto-blocklist after N hard bounces. Configurable per bounce type.
Config Change Needed · Requires restart · SIGHUP hot restart: graceful shutdown → process self-replace. No downtime.
Campaign Stuck · Never finishes · --passive flag for read-only nodes. Admin can force a status change via the API.

What listmonk Does NOT Have (HA Gaps)

No Multi-Node Active-Active
Single process sends campaigns. --passive mode provides read replicas but only one sender.
No Built-in DB Replication
Relies on PostgreSQL's native streaming replication or managed services (RDS, Cloud SQL).
No Distributed Locking
Campaign ownership is implicit (single process). No Redis/etcd coordination.
No Circuit Breaker
SMTP failures use simple retry + threshold. No exponential backoff or circuit breaker.

Data Integrity Guarantees

PostgreSQL provides ACID transactions for all subscriber/list/campaign mutations. FK constraints with CASCADE ensure referential integrity. ENUM types enforce valid state transitions. The sent counter in v3.0.0 is exact (not approximated), ensuring no duplicate or missed sends on pause/resume.

Idempotent Upgrades

./listmonk --upgrade is idempotent. Running it multiple times has no side effects. Migrations use version checks and are applied sequentially. Critical for automated deployment pipelines (Kubernetes rollouts, CI/CD).

Design Principles & Patterns

Unix Philosophy
Single binary that does one thing well. PostgreSQL for storage, SMTP for delivery. Compose with external tools (Caddy, Nginx, SES, S3).
Convention over Configuration
Sensible defaults for everything (batch_size=1000, concurrency=10, port=9000). Override via config, env, or UI. Works out of the box.
Separation of Concerns
HTTP handlers know nothing about SQL. Core knows nothing about HTTP. Manager knows nothing about SMTP internals. Each layer has one job.
Strategy Pattern
Messenger interface (SMTP, HTTP webhook), Media provider (filesystem, S3). Swap implementations without changing callers. Open/Closed principle.
State Machine (DB-Enforced)
Campaign lifecycle: draft→running→scheduled→paused→cancelled→finished. PostgreSQL ENUMs prevent invalid transitions at storage level.
Fail-Safe Defaults
Privacy tracking off by default. Unsubscribe headers on. Blocklist on hard bounce. Conservative error thresholds. Safe by default.
Idempotent Operations
DB upgrades are idempotent. Subscriber import upserts. Campaign resume replays from checkpoint. Safe to retry any operation.
12-Factor App Compliance
Config in env vars. Stateless process (state in Postgres). Logs to stdout. Port binding. Dev/prod parity via Docker.

GoF & Architectural Patterns Catalog

Pattern · Category · Where in listmonk

Strategy · Behavioral · Messenger interface, Media provider interface
Observer · Behavioral · SSE events bus for real-time UI updates
Template Method · Behavioral · Go html/template with Sprig function injection
Producer-Consumer · Concurrency · Campaign batch fetch → channel → worker goroutines
Connection Pool · Creational · sqlx DB pool, SMTP connection pool per server
Repository · Structural · Core layer wraps all DB access behind domain methods
Facade · Structural · App struct provides a single entry point to all subsystems
Middleware Chain · Structural · Echo middleware: auth → CORS → logging → handler
Materialized View · Data · Pre-aggregated dashboard stats, refreshed on cron
Cursor Pagination · Data · Keyset pagination via the last_subscriber_id bookmark

Non-Functional Requirements — How listmonk Handles Them

NFRs are the make-or-break qualities that interviewers probe after your functional design. listmonk is an excellent case study because it's a production system handling millions of messages — every NFR decision below was battle-tested, not theoretical.

Security

NFR ConcernImplementationCode Reference
AuthenticationThree modes: password login (bcrypt-hashed), OIDC/SSO (Google, Microsoft, Apple), API tokens (username:token header). Sessions stored in PostgreSQL via simplesessions.internal/auth/auth.go, cmd/auth.go
Authorization (RBAC)Role-based access control with two role types: user roles (global permissions) and list roles (per-list permissions). Permissions defined in permissions.json. Each API endpoint checks permissions via middleware. Users can have different access levels per list.internal/auth/, roles table, permissions.json
CSRF ProtectionCookie-based sessions with SameSite attribute. OIDC flows use state parameter for CSRF prevention. Admin UI is SPA (same-origin API calls).cmd/auth.go
XSS PreventionCampaign preview iframes sandboxed. Custom CSS/JS injection scoped to admin/public separately. Go's html/template auto-escapes by default. v5.0.2 patched stored XSS via Sprig template injection.cmd/admin.go, security advisories
Secret ManagementPasswords masked in UI responses with placeholder characters. Backend merges existing passwords via UUID matching when masked values are submitted. SMTP passwords, S3 keys, OIDC secrets all masked.cmd/settings.go
CORSConfigurable allowed origins via security.cors_origins. Supports wildcard * or specific URLs. Validated and normalized on save.cmd/settings.go:261-280
CAPTCHATwo providers: ALTCHA (proof-of-work, privacy-friendly, no external calls) and hCaptcha. Protects public subscription forms from bot abuse.internal/captcha/
2FATOTP-based two-factor authentication for user accounts. Stored as twofa_type ENUM and twofa_key in users table.users table, cmd/auth.go
Sprig Template HardeningDangerous Sprig functions (env, expandenv) removed to prevent environment variable leakage from templates. Patched in v5.0.2 after CVE.internal/manager/manager.go

Privacy & GDPR Compliance

NFR ConcernImplementationConfig Key
Tracking Controlsprivacy.individual_tracking (off by default) controls per-subscriber open/click attribution. privacy.disable_tracking turns off all tracking entirely. When disabled, tracking pixels and link wrapping are skipped.privacy.individual_tracking, privacy.disable_tracking
Self-Service Data ExportSubscribers can export their own data (profile, subscriptions, campaign views, link clicks) via public pages. Exportable fields configurable via privacy.exportable.privacy.allow_export, privacy.exportable[]
Self-Service Data WipeSubscribers can request complete deletion of their data. Cascades via FK constraints to remove all associated records.privacy.allow_wipe
Self-Service BlocklistSubscribers can blocklist themselves, preventing any future emails. Status set to 'blocklisted' in DB.privacy.allow_blocklist
Subscription PreferencesSubscribers can manage their own list subscriptions via public preference pages.privacy.allow_preferences
Unsubscribe HeadersRFC 8058 List-Unsubscribe header added to all campaign emails by default. Required by Gmail/Yahoo for bulk senders.privacy.unsubscribe_header (default: true)
Domain FilteringBlocklist and allowlist for email domains. Supports wildcard patterns (*.example.com). Applied during subscription and import. Prevents abuse from disposable email domains.privacy.domain_blocklist[], privacy.domain_allowlist[]
IP RecordingOpt-in IP address recording on subscription confirmation. Off by default for privacy.privacy.record_optin_ip (default: false)
Data OwnershipSelf-hosted = you own all data. No third-party analytics. No external tracking pixels. PostgreSQL under your control.Architecture decision

Observability & Logging

NFR ConcernImplementationGap / Notes
LoggingStandard Go log.Logger to stdout. Structured log lines with timestamps. Campaign manager logs start/finish/errors per campaign with subscriber IDs.No structured JSON logging. No log levels (debug/info/warn/error). Basic but functional.
Dashboard AnalyticsMaterialized views provide: subscriber counts by status, campaign stats by status, 30-day link click and view trends, per-list subscriber breakdowns. Refreshed on cron.mat_dashboard_counts, mat_dashboard_charts, mat_list_subscriber_stats
Campaign TrackingPer-campaign: sent count, open rate (pixel tracking), click-through rate (link wrapping), bounce count by type. Real-time progress via SSE events.campaign_views, link_clicks, bounces tables
Health ChecksHTTP server responds on configured address. About endpoint exposes version, Go runtime stats (CPU, memory alloc, OS memory). DB connectivity implicit in operations.No dedicated /health or /readyz endpoint. Would need reverse proxy health checks.
Real-Time EventsSSE (Server-Sent Events) bus via internal/events/. Frontend receives live campaign progress updates, import status, notifications without polling.internal/events/events.go
Metrics / APMNot built-in. No Prometheus metrics endpoint, no OpenTelemetry instrumentation.Gap — would need external instrumentation for production monitoring at scale.

Operability & Deployment

NFR ConcernImplementation
Zero-Downtime Config ReloadSIGHUP signal triggers graceful shutdown (HTTP drain, campaign flush, DB close) then syscall.Exec() self-replaces process. Campaigns in progress: sets needsRestart flag and shows warning banner — admin restarts later.
Idempotent Migrations--upgrade is safe to run multiple times. Version-checked sequential migrations. Critical for CI/CD pipelines and Kubernetes rolling deployments.
Single Binary DistributionAll assets (frontend, SQL, i18n, templates) embedded via stuffbin. One binary + one config.toml + one PostgreSQL. No Node.js, no file dependencies.
Docker / KubernetesOfficial Docker image on DockerHub. docker-compose.yml included. Community Helm chart available. Environment variable configuration via LISTMONK_* prefix.
Passive Mode--passive flag runs the app without processing campaigns. Useful for read-only API replicas behind a load balancer while one instance handles sending.
Systemd IntegrationShips with listmonk.service and listmonk@.service (template unit for multiple instances). Production-ready process management.
Backup StrategyAll state in PostgreSQL — standard pg_dump for backups. Media on filesystem or S3 (backed up via provider tools). No application-level backup mechanism needed.

Performance (as NFR)

NFR ConcernImplementation
Throughput TuningThree knobs: app.concurrency (worker goroutines), app.message_rate (msgs/sec), app.batch_size (DB fetch size). All configurable via UI without code changes.
Resource Efficiency57MB peak RAM for 7M+ emails. Fractional CPU. Go's goroutines are ~2KB each vs threads at ~1MB. Connection pooling avoids socket exhaustion.
Slow Query Mitigationapp.cache_slow_queries enables cron-refreshed materialized views for expensive dashboard aggregations. Configurable interval (default: daily at 3 AM).
Connection LimitsDB: max_open=25, max_idle=25, max_lifetime=300s. SMTP: max_conns per server with idle_timeout and wait_timeout. Prevents resource exhaustion.
Streaming OperationsCSV subscriber export streams rows as fetched (no buffering entire dataset). Import processes in configurable batch sizes. Constant memory for large operations.

Internationalization (i18n)

NFR ConcernImplementation
Multi-Language SupportJSON language files in i18n/*.json. Loaded via stuffbin embedded filesystem. Backend uses internal/i18n package with T() and Ts() (with substitutions) functions.
Frontend LocalizationVue.js admin dashboard uses vue-i18n with the same JSON language files served from /admin/static/. Dynamically loaded based on user language setting.
Public Page LocalizationSubscription forms, unsubscribe pages, preference pages all localized. System email templates (opt-in confirmation, notifications) use L() function for translations.
Date/Time LocalizationDay.js configured with localized relative time strings. Absolute dates use translated day/month names from i18n files.

Maintainability & Testability

NFR ConcernImplementation
Code Organizationinternal/ packages enforce encapsulation. Layered architecture (handlers → core → DB) prevents spaghetti dependencies. Each package has a single responsibility.
SQL as First-Class ArtifactAll queries in version-controlled .sql files. No generated SQL. No ORM magic. Reviewable, diffable, optimizable independently of Go code.
Schema MigrationsVersioned migrations in internal/migrations/. Each version file (e.g., v5.1.0.go) contains the delta. Applied sequentially with version checks.
TestabilityCore business logic has zero HTTP dependencies — can be unit tested with a DB mock. Interface-based design (Messenger, Media provider) enables test doubles.
Dev Environment.devcontainer/ config for VS Code dev containers. Makefile with make dist build target. Docker Compose for local Postgres. Frontend hot-reload with Vue CLI.

Error Handling & Fault Tolerance

ConcernMechanismHow It Works
Per-Message RetrySMTP max_msg_retries (default: 2)Each failed message is retried N times before being counted as a send error. Retry happens immediately within the same worker goroutine. Failed messages don't block the channel — other workers continue sending.
Campaign Error Thresholdapp.max_send_errors (default: 1000)Cumulative send errors tracked per campaign via atomic counter. When threshold is hit, campaign auto-pauses — prevents burning through an entire list when SMTP is misconfigured or provider is throttling. Admin can investigate and resume.
SMTP Connection FailureConnection pool + wait_timeoutIf a connection dies mid-send, the pool creates a new one. wait_timeout prevents indefinite blocking waiting for a free connection. idle_timeout closes stale connections proactively.
Bounce ClassificationThree-tier: soft / hard / complaintEach bounce type has configurable count and action. Soft bounce: ignore until threshold (transient issues). Hard bounce: blocklist after 1 (permanent — invalid address). Complaint: blocklist after 1 (spam report). Actions: none, blocklist, delete.
Crash RecoveryCheckpoint via last_subscriber_idAfter each batch, campaign progress is persisted to DB. On process crash/restart, campaigns with status='running' auto-resume from the last checkpoint. The keyset cursor skips all checkpointed subscribers, so duplicate sends are limited to the batch in flight at crash time.
Subscriber Import ErrorsPer-row validation + summaryCSV import validates each row (email format, domain blocklist, required fields). Invalid rows are skipped with error details in import log. Valid rows are upserted. Import can be stopped/retried without corrupting data.
Template Render ErrorsPer-subscriber isolationIf a template fails to render for a specific subscriber (e.g., missing attribute), that single message is marked as error. Other subscribers are unaffected. Campaign continues processing.
DB Connection ExhaustionPool limits + lifetimemax_open=25 hard-caps total connections. max_lifetime=300s recycles connections preventing stale state. max_idle=25 keeps warm connections ready. sqlx auto-reconnects on transient failures.
Graceful DegradationCampaign pause + passive modeNo circuit breaker pattern, but the error threshold serves a similar purpose: after enough failures, the system stops trying (pauses campaign). --passive mode allows serving the UI/API while campaign sending is disabled.

What's Missing in Error Handling (Interview Talking Points)

No Exponential Backoff
Retries are immediate, not spaced with increasing delays. At scale, this can worsen SMTP server overload. Fix: implement exponential backoff with jitter (delay = baseDelay × 2^attempt + random jitter).
No Circuit Breaker
If SMTP is down, all workers keep hitting it until error threshold. Fix: per-SMTP circuit breaker (closed → open → half-open) that short-circuits after N consecutive failures.
No Dead Letter Queue
Failed messages are counted but not persisted for later retry. Once max_send_errors is hit, those messages are lost. Fix: DLQ table to store failed messages for manual/automated retry.
No Per-SMTP Health
All SMTP servers in the "email" pool are treated equally. A degraded server gets the same traffic as a healthy one. Fix: weighted round-robin with health scoring.

Go-Specific Scalability Patterns

PatternGo ImplementationWhy It Scales
Goroutine Worker PoolFixed-size pool (default 10) consuming from a buffered channel. for msg := range msgChan { ... }Goroutines are ~2-4KB stack (vs ~1MB threads). 10 goroutines handle millions of messages. Channel provides natural backpressure — if workers are busy, producer blocks on channel send. No unbounded goroutine creation.
Channel-Based Flow ControlBuffered channel between batch producer and send workers. Buffer size = batch_size.Channels are Go's CSP primitive. They handle synchronization, ordering, and backpressure without explicit locks. If SMTP is slow, channel fills up, producer pauses DB fetching automatically. Self-regulating.
Atomic Countersatomic.AddInt64(&sent, 1) for campaign progressLock-free concurrent increment. No mutex contention across 10+ workers updating the same counter. Hardware CAS instruction. O(1) regardless of worker count.
sync.Once for CachingTemplate compilation cached via sync.Once. Recompiled only on explicit template update.Thread-safe lazy initialization. First call compiles, subsequent calls return cached result. Zero allocation after first call. Critical for hot-path template rendering.
Context Cancellationcontext.Context propagated from HTTP request → Core → DB queryRequest-scoped timeouts and cancellation. If a client disconnects, the entire call chain cancels — DB query aborted, goroutine freed. Prevents resource leaks on slow queries or abandoned requests.
Connection Pool (database/sql)Go's sql.DB is already a connection pool. sqlx wraps it. max_open=25.Pool manages connection lifecycle, reuse, health checks. Concurrent goroutines share the pool safely. Idle connections kept warm. Lifetime rotation prevents stale TCP connections.
Embedded Filesystemstuffbin.FileSystem embeds all assets into the binary at compile time.Zero disk I/O for serving frontend assets. Memory-mapped access. No file descriptor overhead. Eliminates deployment complexity (no asset directory sync). Scales vertically with zero ops burden.
Streaming ResponseCSV export: csv.Writer wrapping http.ResponseWriter. Rows written as DB cursor advances.O(1) memory for exporting 10M subscribers. No buffering entire result set. HTTP chunked transfer encoding. Client sees data immediately. DB cursor keeps server-side state.

Go Scalability Mental Model for Interviews: listmonk proves that a single Go process with goroutine pools + channels + connection pooling can handle millions of operations. The key insight: Go's concurrency model maps perfectly to the producer-consumer pattern. The producer is I/O-bound (DB fetch), workers are I/O-bound (SMTP send), and channels decouple them. You don't need Kafka for this workload — Go channels are an in-process message queue. You add Kafka when you need multi-node distribution or replay guarantees.

High Availability (HA) — What Exists & What Doesn't

HA DimensionCurrent StateHow You'd Improve It (Interview Answer)
Process Availability✓ Hot restart via SIGHUP + syscall.Exec(). Systemd auto-restart on crash. Docker restart policies. Millisecond startup time.Add liveness/readiness probes for K8s. Currently no /healthz — requests to any endpoint implicitly confirm liveness.
Database HA◐ Relies on external PostgreSQL HA (RDS Multi-AZ, Patroni, pg_auto_failover). Connection pool handles transient failures.For self-hosted: Patroni + pgBouncer. For cloud: RDS/Cloud SQL with read replicas. listmonk's --passive mode can point to a read replica.
Campaign Continuity✓ Checkpoint-based resume. Campaigns survive process restart. last_subscriber_id persisted per batch. Status remains 'running' in DB.For zero-gap: WAL-based approach — log each message to a journal before sending, mark complete after ACK. Current approach has a small window of potential duplicates within a batch.
Horizontal Scaling (Read)--passive mode serves API/UI without processing campaigns. Multiple passive instances behind a load balancer.Add session stickiness or shared session store (Redis/PostgreSQL sessions already in DB). API token auth is stateless and scales naturally.
Horizontal Scaling (Write/Send)✗ Single sender process. No campaign sharding, no distributed locking, no work stealing.Campaign partitioning: assign subscriber ID ranges to sender nodes. Distributed lock (etcd/Redis) for campaign ownership. Message queue (Kafka/SQS) between fetch and send.
SMTP Failover◐ Multiple SMTP servers can be configured. Default "email" messenger load-balances across all enabled servers. But no active health checking or automatic failover — if one server is slow, it still gets traffic.Add per-server health scoring. Circuit breaker per SMTP. Weighted routing based on success rate. Remove unhealthy servers from pool temporarily.
Data Durability✓ All state in PostgreSQL with ACID guarantees. FK constraints prevent orphaned data. WAL provides crash consistency. Standard pg_dump for backups.Point-in-time recovery via WAL archiving. Cross-region replication for DR. Media on S3 with cross-region replication.
Zero-Downtime Deploys✓ Idempotent --upgrade. SIGHUP hot restart. Campaigns auto-resume post-restart. Docker rolling update compatible.Blue-green deployment: run new version in passive mode, verify, then switch active sender. K8s rolling update with readiness probe.

HA Architecture for 99.9% Uptime — What You'd Propose in an Interview

┌──────────────┐     ┌──────────────────────────────────────────────┐
│   Load       │     │  Application Tier                            │
│   Balancer   │────▶│  ┌─────────────┐  ┌─────────────┐           │
│   (Nginx/    │     │  │ listmonk    │  │ listmonk    │           │
│    ALB)      │     │  │ (active     │  │ (passive    │           │
│              │     │  │  sender)    │  │  read-only) │  × N      │
└──────────────┘     │  └──────┬──────┘  └──────┬──────┘           │
                     └─────────┼────────────────┼──────────────────┘
                               │                │
                     ┌─────────▼────────────────▼──────────────────┐
                     │  Database Tier                               │
                     │  ┌──────────┐    ┌──────────┐               │
                     │  │ Postgres │───▶│ Postgres │               │
                     │  │ Primary  │    │ Replica  │  (streaming)  │
                     │  └──────────┘    └──────────┘               │
                     └─────────────────────────────────────────────┘

Active sender: processes campaigns, handles writes
Passive instances: serve UI/API reads, handle public pages
Primary DB: all writes, campaign state
Replica DB: passive instances read from here

NFR Summary Matrix — Interview Quick Reference

When an interviewer asks "how would you handle X?" — point to these concrete implementations:

✓ Strong
Security (RBAC, OIDC, 2FA, CAPTCHA), Privacy/GDPR (full self-service), Operability (single binary, hot restart, idempotent upgrades), i18n, Performance tuning, Data integrity (ACID, FK, ENUMs), Crash recovery (checkpoint + auto-resume), Go concurrency (goroutine pool + channels + atomics), Zero-downtime deploys, Data durability
◐ Adequate
Logging (stdout, no levels), Error handling (retry + threshold auto-pause, but no backoff), HA for reads (passive mode), SMTP failover (multi-server but no health scoring), DB HA (relies on external Postgres replication), Testability (interface-based but limited visible test coverage)
✗ Gaps
No Prometheus/OpenTelemetry, no structured logging, no /healthz endpoint, no circuit breaker, no exponential backoff, no dead letter queue, no distributed tracing, no horizontal send scaling, no per-API-consumer rate limits, no per-SMTP health scoring

The gaps are intentional trade-offs for simplicity. In an interview, acknowledge them and propose solutions: "listmonk prioritizes operational simplicity — a single binary serving 7M+ emails. At scale, I'd add: /healthz endpoint for K8s probes, Prometheus metrics via expvar or echo middleware, structured logging with slog (Go 1.21+), circuit breakers per SMTP server using sony/gobreaker, exponential backoff with jitter on retries, and a dead letter table for failed messages. For horizontal send scaling, I'd introduce Kafka between the batch producer and send workers, with campaign partition assignment via etcd distributed locks."

System Design Interview Cheat Sheet

Use listmonk as a reference when answering questions about designing email/notification systems, producer-consumer pipelines, or self-hosted SaaS alternatives.

If Asked: "Design a Newsletter/Email System"

1. Data Model: Subscribers (with JSONB attrs for segmentation), Lists (many-to-many via junction table with subscription status), Campaigns (state machine with 6 states), Templates (Go/Jinja templating).
2. Send Pipeline: Producer fetches batches via keyset pagination (WHERE id > cursor). Workers consume from buffered channel. Rate limiter (token bucket + sliding window). SMTP connection pool per server.
3. Tracking: Pixel tracking for opens (1x1 transparent PNG). Link wrapping for click tracking. Privacy toggle to disable/anonymize. Expression indexes on DATE for time-series queries.
4. Bounce Handling: Webhook receivers for SES/SendGrid/Postmark. POP/IMAP mailbox scanning. Auto-blocklist on hard bounce. Configurable thresholds per bounce type.
5. Scale: Materialized views for dashboards. Batch processing amortizes DB cost. Connection pooling. Single process handles 7M+ emails. For 100M+: shard DB, add message queue, horizontal senders.
6. Reliability: Checkpoint-based crash recovery (last_subscriber_id). Idempotent resume. Error threshold auto-pause. ACID transactions for mutations. Graceful hot restart via SIGHUP.

Key Talking Points

TopicWhat to Saylistmonk Reference
Why PostgreSQL?JSONB for flexible schemas without NoSQL complexity. ENUMs for state machines. Materialized views for read optimization. ACID for correctness.subscribers.attribs JSONB, campaign_status ENUM, mat_dashboard_counts
Why not an ORM?Named SQL via goyesql gives full PostgreSQL feature access. No N+1 queries. SQL is reviewable, optimizable. Complex queries don't fit ORM patterns.queries/*.sql loaded at startup
Cursor vs OffsetOFFSET scans N rows then discards. Cursor (keyset) uses WHERE id > X which hits the index directly. O(1) vs O(N). Critical at scale.campaigns.last_subscriber_id
Rate LimitingToken bucket for steady rate. Sliding window for burst control. Per-connection limits for SMTP backpressure. Layered approach.message_rate, sliding_window, max_conns
Single Binary Trade-offsPro: zero-dep deployment, fast startup, simple ops. Con: vertical scaling only. Good for 80% of use cases.stuffbin embeds all assets
When to Add a QueueCurrent: in-process channel. At scale: Kafka/SQS decouples fetch from send, enables multi-node senders, provides replay.Campaign manager uses Go channels

Quick-Reference: Numbers to Know

57 MB
Peak RAM for 7M+ emails
1000
Default batch size
10
Default worker concurrency
25
Default DB connection pool
12
PostgreSQL ENUM types
3
Materialized views
6
Campaign states
~12
Database tables

Related System Design Problems

If you study listmonk deeply, you can answer variations of these interview questions: Design a notification system (email/SMS/push), Design a mailing list manager, Design a campaign analytics platform, Design a self-hosted SaaS tool, Design a producer-consumer pipeline with rate limiting, Design a system with crash recovery and exactly-once processing, How would you handle millions of concurrent email sends, Design a multi-tenant newsletter platform.