What is listmonk?
listmonk is a high-performance, self-hosted newsletter and mailing list manager. It ships as a single Go binary with an embedded Vue.js frontend, backed by PostgreSQL. Production deployments have sent 7+ million emails per campaign with peak RAM of ~57 MB and fractional CPU usage. The project has 19k+ GitHub stars and is built by Kailash Nadh (CTO of Zerodha, India's largest stock broker).
Why Study This for System Design?
Tech Stack
| Layer | Technology | Role |
|---|---|---|
| Language | Go | Backend, CLI, campaign engine |
| Web Framework | labstack/echo v4 | HTTP routing, middleware |
| Database | PostgreSQL | All persistent state, JSONB attributes |
| DB Driver | jmoiron/sqlx + lib/pq | SQL execution, struct scanning |
| SQL Management | knadh/goyesql | Named SQL queries from .sql files |
| Config | knadh/koanf | Multi-source: TOML → env vars → DB |
| Frontend | Vue.js 3 + Buefy | Admin SPA dashboard |
| Asset Embed | knadh/stuffbin | Embed frontend/SQL/i18n into binary |
| Auth | Sessions + OIDC + RBAC | Cookie sessions, SSO, role perms |
| Templating | html/template + Sprig | Dynamic email templates, 100+ funcs |
System Architecture Diagram
Layered Architecture
listmonk follows a clean 4-layer architecture. Each layer has a single responsibility and communicates only with adjacent layers.
Layer 1: HTTP Handlers (cmd/*.go)
Thin handlers that parse HTTP requests (path params, query strings, JSON bodies), call the Core layer, and serialize responses. No business logic here. Each handler is a method on the App struct which holds references to all subsystems. Echo framework provides routing, middleware, and context.
Layer 2: Core Business Logic (internal/core/)
All domain operations: CRUD for subscribers, lists, campaigns, templates. The Core struct wraps the DB and query runner. This layer is pure Go with zero HTTP dependencies — it could be called from CLI, tests, or any other interface. It enforces validation, permission checks, and domain invariants.
Layer 3: Campaign Manager (internal/manager/)
The concurrent campaign processing engine. Runs as a long-lived goroutine that polls the DB for active campaigns. It owns the entire send pipeline: batch fetching, template rendering, rate limiting, worker pool dispatch, progress tracking, error handling. Completely rewritten in v3.0.0 for near-instant pause/cancel and lossless counting.
Layer 4: Data Layer (queries/*.sql + PostgreSQL)
All SQL lives in .sql files, loaded at startup via goyesql. No ORM. This gives full control over query optimization, PostgreSQL-specific features (JSONB, arrays, materialized views), and makes SQL reviewable and versionable. The sqlx library provides struct scanning.
Key Architectural Decision: The App Struct
type App struct {
core *core.Core // Business logic layer
fs stuffbin.FileSystem // Embedded filesystem (assets, SQL, i18n)
db *sqlx.DB // PostgreSQL connection pool
queries *models.Queries // Pre-loaded named SQL statements
constants *constants // Runtime config snapshot
manager *manager.Manager // Campaign processing engine
importer *subimporter.Importer // CSV/bulk import processor
notifs *notifs.Notifs // Admin email notifications
i18n *i18n.I18n // Internationalization
bounceProc *bounce.Manager // Bounce email processor
captcha *captcha.Captcha // ALTCHA/hCaptcha
auth *auth.Auth // Sessions + RBAC + OIDC
events *events.Events // SSE event bus
paginator *paginator.Paginator // Cursor/offset pagination
log *log.Logger
}
This is Go's idiomatic alternative to dependency injection containers. All subsystems are initialized in init.go and wired together via this struct. Handlers access them via a.core, a.manager, etc.
Database Schema (ER Diagram)
PostgreSQL schema with ~12 tables, JSONB for extensibility, materialized views for analytics:
Key Database Design Decisions
1. Custom ENUM Types (12 types)
PostgreSQL ENUMs enforce valid states at the DB level: campaign_status has 6 states (draft, scheduled, running, paused, cancelled, finished), subscriber_status has 3 (enabled, disabled, blocklisted), and subscription_status tracks the per-list relationship (unconfirmed, confirmed, unsubscribed). This is a state machine enforced by the database, not application code.
2. JSONB for Flexible Attributes
subscribers.attribs stores arbitrary key-value data as JSONB, enabling schema-less subscriber segmentation. You can query attribs->>'city' = 'Atlanta' with GIN indexes. The settings table stores all app config as JSONB key-value pairs, enabling runtime configuration changes through the UI without schema migrations.
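As a sketch (the index name and query shapes are illustrative, not copied from listmonk's schema files), the JSONB pattern looks like:

```sql
-- Hypothetical GIN index for containment queries on attribs:
CREATE INDEX idx_subs_attribs ON subscribers USING GIN (attribs);

-- Containment query (can use the GIN index):
SELECT email FROM subscribers WHERE attribs @> '{"city": "Atlanta"}';

-- Key extraction as text; fast only if paired with an
-- expression index on (attribs->>'city'):
SELECT email FROM subscribers WHERE attribs->>'city' = 'Atlanta';
```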
3. Junction Table Pattern (subscriber_lists)
Many-to-many with a composite primary key PK(subscriber_id, list_id). The junction table carries its own state (subscription_status) and metadata (meta JSONB). Separate indexes on each FK column for efficient queries in both directions.
4. Keyset Pagination Columns
campaigns.last_subscriber_id and max_subscriber_id enable cursor-based pagination for campaign sending. Instead of OFFSET N (an O(N) scan that reads and discards N rows), it uses WHERE id > last_subscriber_id ORDER BY id LIMIT batch_size, an O(log N) B-tree seek whose cost is independent of how deep into the table the cursor is. Critical for sending campaigns to millions of subscribers.
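Side by side, as a sketch (table and column names as above, not listmonk's exact queries):

```sql
-- Offset pagination: the executor must walk and discard 1,000,000 rows first.
SELECT * FROM subscribers ORDER BY id OFFSET 1000000 LIMIT 1000;

-- Keyset pagination: one B-tree seek to the cursor, then read 1000 rows.
SELECT * FROM subscribers WHERE id > 1000000 ORDER BY id LIMIT 1000;
```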
5. Materialized Views for Dashboard
Three materialized views precompute expensive aggregations: mat_dashboard_counts (subscriber/list/campaign totals), mat_dashboard_charts (30-day click/view trends), mat_list_subscriber_stats (per-list subscriber counts by status). Refreshed on cron schedule via REFRESH MATERIALIZED VIEW CONCURRENTLY. Orders-of-magnitude speedup for large databases.
6. Soft References & Denormalization
campaign_views.subscriber_id is nullable with ON DELETE SET NULL — if a subscriber is deleted, their view records persist for analytics. campaign_lists.list_name denormalizes the list name so campaign history survives list deletion. Preserves historical accuracy while allowing entity cleanup.
7. Indexing Strategy
Strategic indexes on: email (case-insensitive unique via LOWER(email)), status columns (for filtered scans), composite (id, status) for campaign batch fetching, DATE(created_at) expression indexes for time-series analytics. Partial unique index on templates(is_default) WHERE is_default = true ensures only one default template.
8. UUID + Serial ID Dual Identity
Internal operations use fast integer serial IDs for joins and pagination. External/public-facing operations use UUIDs (subscriber unsubscribe links, campaign archives, media references). Best of both worlds: performance internally, security externally (UUIDs aren't guessable).
Campaign Processing Pipeline
The campaign engine is the heart of listmonk. Rewritten in v3.0.0 for lossless operation:
Concurrency Model Deep Dive
Producer-Consumer with Channels
// Simplified mental model of the campaign engine
// Producer: fetches subscriber batches from DB
go func() {
    for {
        batch := db.Query("SELECT ... WHERE id > ? LIMIT ?",
            lastSubID, batchSize)
        if len(batch) == 0 {
            close(msgChan) // done: lets the workers' range loops exit
            return
        }
        for _, sub := range batch {
            msg := renderTemplate(campaign, sub)
            msgChan <- msg // Send to worker pool; blocks when buffer is full (backpressure)
        }
        lastSubID = batch[len(batch)-1].ID
        updateProgress(campaign, lastSubID)
    }
}()
// Consumer pool: N goroutines sending messages
for i := 0; i < concurrency; i++ {
go func() {
for msg := range msgChan {
rateLimiter.Wait() // Token bucket
err := messenger.Push(msg) // SMTP/HTTP
if err != nil {
handleRetry(msg, err)
}
atomic.AddInt64(&sent, 1)
}
}()
}
Rate Limiting Strategies
| Strategy | Config | Behavior |
|---|---|---|
| Fixed Rate | app.message_rate = 10 | Max 10 msgs/second globally. Token bucket. |
| Sliding Window | sliding_window + duration + rate | Max N messages within rolling time window (e.g., 10000/hour). |
| Per-SMTP Limits | smtp[].max_conns = 10 | Connection pool per SMTP server. Backpressure via channel blocking. |
| Error Threshold | app.max_send_errors = 1000 | Auto-pause campaign after N cumulative send failures. |
Crash Recovery & Resumption
Each campaign row stores last_subscriber_id, updated after every batch. On restart, campaigns with status='running' are automatically resumed from the last checkpoint. The v3.0.0 rewrite ensures every single message is counted (not approximated), making pause/resume lossless. The to_send field is computed at campaign start as the total subscriber count for the target lists.
SMTP Connection Pool
Each configured SMTP server maintains a pool of max_conns persistent TCP connections. Connections are reused across messages (SMTP pipelining). idle_timeout and wait_timeout control connection lifecycle. Multiple SMTP servers can be load-balanced under the default "email" messenger, or targeted individually by naming them.
Go Patterns & Best Practices
// Messenger interface (Strategy pattern)
type Messenger interface {
Name() string
Push(msg Message) error
Flush() error
Close() error
}
// Implementations: email.Emailer, postback.Postback
type App struct {
core *core.Core
manager *manager.Manager
db *sqlx.DB
// ... all subsystems
}
// Methods: func (a *App) GetSubscribers(...)
-- queries/queries.sql

-- name: get-subscriber
SELECT * FROM subscribers WHERE id = $1;

-- name: get-campaign-subscribers
SELECT ... WHERE id > $1 ORDER BY id LIMIT $2;
internal/
  core/          # Business logic
  manager/       # Campaign engine
  bounce/        # Bounce processing
  auth/          # Authentication
  media/         # Media storage
  subimporter/   # CSV import
# Packages under internal/ cannot be imported from outside the module (Go's internal rule).
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGHUP)
go func() {
<-sigChan
srv.Shutdown(ctx) // HTTP
manager.Close() // Campaigns
db.Close() // Postgres
syscall.Exec(...) // Self-replace
}()
// Precedence (last wins):
// 1. CLI flags (--config, --install)
// 2. TOML file (config.toml)
// 3. Env vars (LISTMONK_*)
// 4. Database settings table
ko.Load(file.Provider("config.toml"),
toml.Parser())
ko.Load(env.Provider("LISTMONK_", ...))
Additional Go Idioms Used
| Pattern | Where | Why |
|---|---|---|
| Functional Options | SMTP config, manager setup | Flexible initialization without constructor explosion |
| Context Propagation | HTTP handlers → Core → DB | Request-scoped deadlines, cancellation, auth info |
| sync.Once | Template compilation caching | Thread-safe lazy initialization of expensive resources |
| atomic Operations | Campaign sent counter | Lock-free concurrent counter updates in worker pool |
| embed.FS (stuffbin) | Static assets, SQL, i18n | Single binary deployment with zero external file deps |
| Error Wrapping | Throughout | fmt.Errorf with %w for error chain inspection |
| Table-Driven Tests | Core package | Declarative test cases with expected inputs/outputs |
| Middleware Chain | Echo middleware | Auth → CORS → logging → rate limit → handler |
Scalability & Performance at 1M+ Scale
Proven Production Numbers
listmonk.app states: "A production instance sending 7+ million emails. CPU usage is a fraction of a single core with peak RAM of 57 MB."
What Makes It Fast?
| Technique | Impact | Details |
|---|---|---|
| Keyset Pagination | Near-constant batch fetch | WHERE id > cursor instead of OFFSET. Index-seek cost stays flat regardless of dataset size. |
| Batch Processing | Amortized DB cost | Default batch_size=1000. One query fetches 1000 subscribers. Reduces round-trips 1000x. |
| Connection Pooling | DB: 25, SMTP: 10/srv | max_open=25, max_idle=25 for Postgres. max_conns per SMTP server. Reuse over create. |
| Materialized Views | Instant dashboards | Pre-aggregated stats. REFRESH CONCURRENTLY allows reads during rebuild. |
| Template Caching | Zero recompile | Templates compiled once at startup, cached in memory. Re-compiled only on update. |
| Goroutine Pool | Bounded concurrency | Fixed pool (default 10 workers). No goroutine leak. Channel backpressure for flow control. |
| Streaming Export | Constant memory | Subscriber export writes CSV rows as they're fetched. No full-dataset buffering. |
| Expression Indexes | DATE() fast | idx_clicks_date ON (TIMEZONE('UTC', created_at)::DATE) avoids per-row function eval. |
| Single Binary | Fast startup | No file I/O for assets. stuffbin serves from memory. Startup in milliseconds. |
| Cache Slow Queries | Configurable | v3+ option: enable/disable query caching with custom cron interval for large DBs. |
Bottlenecks & Scaling Limits
| Bottleneck | Limit | Mitigation |
|---|---|---|
| Single Postgres | ~10M subs before slowdown | Materialized views, cache_slow_queries, read replicas |
| Single Process | No horizontal scaling built-in | Run --passive for read-only replicas. One active sender. |
| SMTP Rate Limits | Provider-imposed (SES: 14/sec) | Sliding window limiter, multiple SMTPs, message_rate config |
| Template Rendering | CPU-bound for complex templates | Keep templates simple. Goroutine pool bounds CPU. |
| link_clicks Table | Can grow to billions of rows | DATE expression index. Consider partitioning at scale. |
What Would You Do to Scale to 100M+ Subscribers?
Great system design interview follow-up:
What Happens When 1 Million Requests Hit listmonk?
listmonk handles seven fundamentally different request types. Each follows a different hot path through the system. Understanding these flows is critical for system design interviews — an interviewer will ask "walk me through what happens when..." and expect you to trace from TCP accept to DB write to response.
Key Insight: listmonk is NOT a typical CRUD API under load. Campaign sending is a push pipeline (server-initiated, async). Tracking pixels and link clicks are the real high-throughput inbound paths — these get 1M+ hits per campaign.
Flow 1: Tracking Pixel Open (Highest Volume — 1M+ per campaign)
When a subscriber opens an email, their client loads a 1x1 transparent PNG. This is the hottest path in the system.
1. EMAIL CLIENT ─── GET /campaign/{campaignUUID}/{subscriberUUID}/px.png ───▶ ECHO ROUTER
   │
   │  // No auth middleware — public endpoint. No session lookup.
   ▼
2. ECHO ROUTER ─── matches route "/campaign/:campUUID/:subUUID/px.png" ───▶ HANDLER: handleCampaignPixel()
   │
   │  // Parse UUIDs from path params. Validate format (fast regex). No DB lookup yet.
   ▼
3. HANDLER ─── checks privacy.disable_tracking setting ───▶ DECISION POINT
   │
   ├── IF tracking disabled: Return 1x1 PNG immediately. No DB write. O(1).
   │
   ├── IF privacy.individual_tracking = false:
   │     Insert into campaign_views with subscriber_id = NULL (anonymous).
   │     // Still counts the view, but can't attribute it to a subscriber.
   │
   └── IF individual tracking enabled:
         │
         ▼
4. DB INSERT ─── INSERT INTO campaign_views (campaign_id, subscriber_id, created_at) ───▶ POSTGRESQL
   │
   │  // campaign_id resolved from UUID via campaigns table lookup (indexed).
   │  // subscriber_id resolved from UUID via subscribers table lookup (indexed).
   │  // Two UUID→ID lookups + one INSERT. Total: 3 queries.
   │  // campaign_views has a BIGSERIAL PK — write-optimized, append-only table.
   ▼
5. RESPONSE ─── 200 OK, Content-Type: image/png, body: 1x1 transparent PNG (68 bytes) ───▶ EMAIL CLIENT
| At 1M Pixel Requests | What Happens | Bottleneck |
|---|---|---|
| Echo HTTP Server | Goroutine-per-request model. 1M requests = 1M goroutines (but short-lived, ~2KB each). Echo's radix tree router matches in O(1). No middleware overhead on public routes (no auth, no session). | Not a bottleneck — Go HTTP server handles 100k+ req/s on a single core. |
| UUID → ID Lookups | Two indexed lookups: campaigns(uuid) and subscribers(uuid). Both have UNIQUE indexes. B-tree lookup = O(log N). | At 1M req/s this is 2M index lookups/sec. Could become the bottleneck. Fix: cache UUID→ID mapping in-process (Go map with RWMutex, or sync.Map). |
| campaign_views INSERTs | 1M INSERT operations. BIGSERIAL PK = append-only. No index updates except on campaign_id and subscriber_id FK indexes. DATE expression index updated per row. | High write pressure. PostgreSQL can do ~10-50k INSERTs/sec depending on hardware. 1M requests would need: batch inserts, async buffering, or write-ahead table with periodic flush. |
| Connection Pool | 25 connections shared across all goroutines. Each INSERT holds a connection for ~1ms. Effective throughput: ~25,000 INSERTs/sec. | Pool exhaustion at high concurrency. Goroutines block waiting for free connection. Solution: increase max_open, or buffer writes in a Go channel and batch-insert. |
| Response | 68-byte PNG from memory (hardcoded). No disk I/O. No template rendering. Fastest possible response after DB write. | Not a bottleneck. |
Flow 2: Link Click Tracking (High Volume — 100K+ per campaign)
Every link in a campaign email is wrapped. When a subscriber clicks, they hit listmonk first, which records the click then redirects to the actual URL.
1. BROWSER ─── GET /link/{linkUUID}/{campaignUUID}/{subscriberUUID} ───▶ ECHO ROUTER
   │
   │  // Public endpoint. No auth. Three UUIDs in path.
   ▼
2. HANDLER ─── handleLinkRedirect()
   │
   ├── Resolve link UUID → link record (get actual URL)
   │     // links table: url TEXT NOT NULL UNIQUE. UUID indexed.
   │
   ├── Resolve campaign UUID → campaign_id
   │
   ├── Resolve subscriber UUID → subscriber_id (if individual tracking on)
   │
   ├── INSERT INTO link_clicks (campaign_id, link_id, subscriber_id)
   │     // BIGSERIAL PK. Indexes on campaign_id, link_id, subscriber_id, DATE.
   │     // 4 index updates per INSERT. Heavier than campaign_views.
   │
   └── 302 REDIRECT → actual URL
         // User sees the destination page. Redirect is instant.
| At 1M Click Requests | What Happens | Optimization |
|---|---|---|
| 3 UUID Lookups | Link, campaign, and subscriber UUIDs resolved to integer IDs. Three indexed lookups per request. | Cache link UUID→(id, url) in-process. Links are immutable once created — perfect cache candidate with no invalidation needed. |
| link_clicks INSERT | Heavier than campaign_views: 4 index updates per row (campaign_id, link_id, subscriber_id, DATE expression index). | Batch inserts via buffered channel. COPY command for bulk. Consider unlogged table for click data if durability isn't critical. |
| Redirect Latency | User-facing! The subscriber waits for the 302. DB insert is on the critical path — if DB is slow, user sees delay before reaching destination. | Move INSERT off the critical path: write to in-memory buffer, return 302 immediately, flush to DB async. Accept ~1s data delay for instant redirect. |
| Table Growth | link_clicks grows unboundedly. 1M clicks/campaign × 100 campaigns = 100M rows. DATE expression index helps but table gets large. | Time-based partitioning (monthly). Drop partitions older than retention period. Or archive to cold storage. |
Flow 3: Campaign Send Pipeline (1M Outbound Emails)
This is not a request flow — it's a server-initiated push pipeline. But it's what people mean by "1M requests" in the context of listmonk.
MANAGER GOROUTINE (long-lived, polls DB every ~5s)
   │
   ├── SELECT campaigns WHERE status IN ('running','scheduled')
   │     // status column indexed. Cheap scan — usually 0-5 active campaigns.
   │
   ▼
For each active campaign:

BATCH PRODUCER (one goroutine per campaign)
   │
   │  // Loop until all subscribers processed:
   ├── SELECT subscribers WHERE id > last_subscriber_id
   │     AND id IN (subscriber_lists WHERE list_id IN campaign_lists)
   │     AND status = 'enabled'
   │     ORDER BY id LIMIT 1000   // ← keyset pagination via PK index
   │
   │  // For each subscriber in batch:
   ├── TEMPLATE RENDER ─── Go html/template.Execute()
   │     // Inject: subscriber.name, subscriber.attribs, campaign.subject
   │     // Generate: tracking pixel URL, wrapped link URLs
   │     // Sprig functions available. Compiled template cached (sync.Once).
   │     // CPU cost: ~0.1ms per render for simple templates.
   │
   ├── msgChan <- message
   │     // Send to buffered channel. Blocks if channel full (backpressure).
   │
   └── UPDATE campaigns SET last_subscriber_id = ?, sent = ?   // Checkpoint after batch
   ║
   ║  channel (buffer = batch_size)
   ▼
WORKER POOL (N = app.concurrency, default 10 goroutines)
   │
   │  // Each worker loops: for msg := range msgChan { ... }
   │
   ├── RATE LIMITER ─── rateLimiter.Wait()
   │     // Token bucket: app.message_rate tokens/sec
   │     // OR sliding window: app.message_sliding_window_rate per window_duration
   │     // Blocks goroutine until token available. Natural throttle.
   │
   ├── MESSENGER.Push(msg) ─── SMTP connection from pool
   │     // Acquires connection from pool (max_conns per SMTP server)
   │     // SMTP EHLO → MAIL FROM → RCPT TO → DATA → message body → QUIT
   │     // Connection reused for next message (persistent TCP). Pipelining.
   │     // On failure: retry up to max_msg_retries times
   │
   ├── atomic.AddInt64(&sent, 1)   // Lock-free counter update
   │
   └── ON ERROR: increment error counter. If errors > max_send_errors → PAUSE campaign
1M Emails — Time & Resource Breakdown
| Phase | Operations | Time @ Defaults | Resource |
|---|---|---|---|
| DB Fetch | 1M / 1000 batch_size = 1000 queries | ~1ms/query × 1000 = ~1 second total | 1 DB connection (sequential per campaign) |
| Template Render | 1M renders × ~0.1ms each | ~100 seconds (single-threaded per campaign) | CPU-bound. ~1 core for simple templates. |
| Rate Limit Wait | 1M msgs ÷ 10 msg/sec | ~27.7 hours at default message_rate=10 | Near-zero (goroutine sleep) |
| SMTP Send | 1M connections (reused) × ~50ms avg | 10 workers × 50ms = ~200 msg/sec → ~83 minutes | 10 SMTP TCP connections |
| Progress Updates | 1000 checkpoint UPDATEs | ~1ms each = ~1 second total | 1 DB connection |
| Memory | 10 goroutines + 1000-msg channel buffer + template cache | — | ~50-60 MB peak (proven at 7M scale) |
A tuned example: concurrency=50, message_rate=500, batch_size=5000, max_conns=20 per SMTP × 3 SMTP servers = 60 total connections. Estimated throughput: ~500 msgs/sec = 1M emails in ~33 minutes. Memory stays under 100MB. CPU usage: 2-3 cores.

Flow 4: Public Subscription Form Submit
When a user subscribes via a public form on your website. Lower volume but has more steps.
1. BROWSER ─── POST /subscription/form ───▶ ECHO ROUTER
   │
   ├── CAPTCHA VERIFY ─── ALTCHA proof-of-work check OR hCaptcha API call
   │     // ALTCHA: CPU verification, no external API call (~1ms)
   │     // hCaptcha: external HTTP call to verify token (~100-300ms)
   │
   ├── DOMAIN FILTER ─── check email domain against blocklist/allowlist
   │     // privacy.domain_blocklist: ["*.disposable.com", "tempmail.org"]
   │     // In-memory check. O(N) against list but lists are tiny.
   │
   ├── UPSERT SUBSCRIBER ─── INSERT ... ON CONFLICT (email) DO UPDATE
   │     // Case-insensitive: idx_subs_email ON LOWER(email)
   │     // If exists: update name/attribs. If new: create with UUID.
   │
   ├── INSERT subscriber_lists ─── subscribe to requested lists
   │     // Status = 'unconfirmed' for double-optin lists
   │     // Status = 'confirmed' for single-optin lists
   │
   ├── IF double-optin: SEND CONFIRMATION EMAIL
   │     // Render optin template with confirmation link
   │     // Push to SMTP via messenger (same pool as campaigns)
   │     // Confirmation link: /subscription/optin/{subscriberUUID}/{listUUID}
   │
   └── 200 OK ─── render success template page
Flow 5: REST API Request (Admin/Programmatic)
API-triggered operations: creating subscribers, managing lists, triggering transactional emails. Authenticated path.
1. CLIENT ─── GET /api/subscribers?page=1&per_page=50 ───▶ ECHO ROUTER
   │
   │  // Authorization header: "username:api-token" (base64)
   ▼
2. MIDDLEWARE CHAIN
   │
   ├── Auth Middleware ─── validate API token
   │     // Lookup user by username in users table (indexed)
   │     // Verify token (bcrypt compare or direct match for API users)
   │     // Load user role + permissions from roles table
   │     // Set auth context on echo.Context
   │
   ├── Permission Check ─── does user have "subscribers:get_all" permission?
   │     // Check user_role permissions array. If list-scoped, filter by allowed lists.
   │
   └── CORS Middleware ─── check Origin header against security.cors_origins
   ▼
3. HANDLER ─── GetSubscribers()
   │
   ├── Parse query params ─── page, per_page, query, list_id, order_by
   │
   ├── Build SQL ─── dynamic WHERE clause from search/filter params
   │     // If "query" is plain text: ILIKE search on email and name
   │     // If "query" starts with SQL expression: parse as raw WHERE clause
   │     // Permission-filtered: only show subscribers in user's allowed lists
   │
   ├── EXECUTE SQL ─── via prepared statement from goyesql
   │     // Uses sqlx.Select() for struct scanning
   │     // COUNT(*) for total (separate query)
   │     // OFFSET/LIMIT pagination for API (not keyset — acceptable for admin UI)
   │
   └── 200 OK ─── JSON response with data[] + total + per_page + page
| At 1M API Requests | Bottleneck Analysis |
|---|---|
| Auth Overhead | Every API request hits the DB for user/role lookup. At 1M req: 1M user queries + 1M role queries. Fix: cache authenticated sessions in-process with TTL. API tokens are static — perfect for caching. |
| OFFSET Pagination | API uses OFFSET/LIMIT (not keyset). page=10000&per_page=50 means scanning 500K rows. Degrades linearly. Acceptable for admin UI (low page numbers) but problematic for bulk API consumers. |
| No API Rate Limiting | No per-consumer rate limiting on the API. A misbehaving client can exhaust the DB connection pool. Fix: per-token rate limiter middleware (token bucket or sliding window). |
| Connection Pool Contention | API, campaign engine, tracking pixels, and bounce processor all share the same 25-connection pool. Under 1M API requests, campaign sending would slow down. Fix: separate pools or connection prioritization. |
Flow 6: Bounce Webhook (SES/SendGrid/Postmark)
After a campaign, bounce notifications flow back from email providers. Volume scales with send volume — expect 2-5% bounce rate.
1. SES/SENDGRID ─── POST /webhooks/bounce/{type} ───▶ ECHO ROUTER
   │
   ├── Authenticate webhook ─── verify provider-specific auth
   │     // SES: verify SNS signature. SendGrid: verify key. Postmark: basic auth.
   │
   ├── Parse bounce payload ─── extract: email, bounce type, campaign ID
   │     // Normalize across providers into internal bounce_type ENUM
   │
   ├── INSERT INTO bounces ─── (subscriber_id, campaign_id, type, source, meta)
   │     // Subscriber looked up by email. campaign_id nullable (might be unknown).
   │
   ├── CHECK THRESHOLD ─── count bounces for this subscriber by type
   │     // SELECT COUNT(*) FROM bounces WHERE subscriber_id = ? AND type = ?
   │     // Compare against configured actions: hard.count=1, soft.count=2
   │
   └── IF threshold exceeded: UPDATE subscribers SET status = 'blocklisted'
         // Or DELETE subscriber if action = 'delete'
         // Cascading: subscriber_lists entries also cleaned up (FK CASCADE)
Flow 7: Transactional Email API
Single message sends triggered by your application (welcome emails, password resets, order confirmations). Different from campaign sends — synchronous, one-at-a-time.
1. YOUR APP ─── POST /api/tx ───▶ ECHO ROUTER
   │
   │  // Body: { "subscriber_email": "...", "template_id": 5, "data": {...} }
   │  // Auth: API token (required)
   ▼
2. HANDLER
   │
   ├── Resolve subscriber ─── lookup by email or ID
   │
   ├── Load TX template ─── from template cache (in-memory)
   │
   ├── Render template ─── with subscriber data + custom data payload
   │
   ├── messenger.Push(msg) ─── synchronous SMTP send
   │     // Blocks until SMTP ACK or error. Caller waits.
   │     // No rate limiting — each TX call = one immediate send.
   │
   └── 200 OK ─── { "data": true }
| At 1M TX Requests | What Breaks |
|---|---|
| Synchronous SMTP | Each TX call blocks until SMTP responds (~50-200ms). With 25 DB connections, max concurrent TX sends = ~25. At 200ms each = ~125 TX/sec. 1M would take ~2.2 hours. Fix: async queue with delivery confirmation callback. |
| No TX Rate Limiting | TX calls bypass the campaign rate limiter. A burst of 10K TX calls would exhaust SMTP connections. Fix: separate TX rate limiter or shared token bucket. |
| Shared SMTP Pool | TX emails share the same SMTP connection pool as campaigns. Heavy TX load can starve campaign sending. Fix: dedicated SMTP server for TX (use named messenger). |
Request Flow Summary — All 7 Flows at 1M Scale
| Flow | Type | Volume Profile | Hot Path Cost | Primary Bottleneck | DB Queries/Req |
|---|---|---|---|---|---|
| Tracking Pixel | Inbound GET | 1M+ per large campaign | 2 UUID lookups + 1 INSERT | DB write throughput | 3 |
| Link Click | Inbound GET → 302 | 100K-500K per campaign | 3 UUID lookups + 1 INSERT | DB write + redirect latency | 4 |
| Campaign Send | Outbound push | 1M emails (async) | Batch fetch + render + SMTP | Rate limiter (intentional) | ~1 per 1000 |
| Subscription | Inbound POST | 100-10K/day | CAPTCHA + upsert + optin email | CAPTCHA verification | 2-4 |
| API CRUD | Inbound REST | Depends on integration | Auth + query + JSON marshal | Auth DB lookup (cacheable) | 3-5 |
| Bounce Webhook | Inbound POST | 2-5% of send volume | Auth + INSERT + threshold check | Negligible at normal rates | 3-4 |
| TX Email | Inbound POST → SMTP | Depends on app | Auth + render + sync SMTP | Synchronous SMTP blocking | 2-3 |
Will Goroutines Run Into Memory Errors?
Short answer: the campaign engine is safe, but the HTTP server is not protected. listmonk has two completely different goroutine models running simultaneously, and they have very different risk profiles.
Go Goroutine Memory Model — The Basics
| Property | Value | Why It Matters |
|---|---|---|
| Initial Stack Size | 2 KB (Go 1.4+) | A goroutine starts tiny. 1000 idle goroutines = ~2 MB. Cheap to create. |
| Stack Growth | Dynamically grows up to 1 GB (default) | Stack doubles when needed (copy-on-grow). A goroutine doing real work (allocating buffers, rendering templates, building HTTP responses) can grow to 8-64 KB each. |
| Heap Allocations | Varies per goroutine workload | Template rendering, JSON marshaling, SQL result scanning all allocate on the heap. These are the real memory consumers — not the goroutine stack itself. |
| GC Pressure | Go GC runs concurrently | High allocation rate from 100K+ goroutines triggers frequent GC cycles. GC latency spikes (STW pauses ~1-5ms) can compound under load. |
| OS Threads | GOMAXPROCS (default = num CPUs) | Goroutines are multiplexed onto OS threads. 1M goroutines still only uses ~8-16 OS threads. This is NOT the bottleneck. |
The Two Goroutine Models in listmonk
Model 1: The campaign engine, a fixed worker pool (size = app.concurrency).

Bounded by: channel buffer (batch_size) + fixed worker count. Even sending 7M emails, there are only 10-50 goroutines alive.

Memory per campaign: ~10 workers × ~64KB stack + channel buffer of 1000 messages × ~2KB each = ~2.6 MB total.

Why it's safe: the producer blocks on channel send when the buffer is full (backpressure). Workers block on the rate limiter. Goroutine count never exceeds concurrency. You cannot OOM from campaign sending.

The pattern:
// Fixed pool — goroutine count = concurrency (constant)
for i := 0; i < concurrency; i++ {
go worker(msgChan) // Exactly N goroutines. No more.
}
// Producer blocks if channel full — natural backpressure
msgChan <- msg // Blocks here, NOT by spawning new goroutines
Model 2: The HTTP server, one goroutine per connection with no upper bound. Go's net/http server spawns a goroutine for every incoming connection. Echo sits on top of this. No built-in limit.

Bounded by: nothing in listmonk's code. If 100K tracking pixel requests arrive simultaneously, Go creates 100K goroutines.

Memory per request: ~8-64KB stack + ~2-10KB heap (UUID parsing, DB result scanning, response writing) = ~50-70KB per concurrent request.

At 100K concurrent: 100K × 70KB = ~7 GB. At 500K concurrent: ~35 GB. At 1M concurrent: ~70 GB → OOM on most machines.
The pattern:
// Go's net/http — UNBOUNDED goroutine creation
func (srv *Server) Serve(l net.Listener) error {
for {
conn, _ := l.Accept()
go srv.serve(conn) // New goroutine for EVERY connection!
} // No limit. No backpressure.
}
When Does OOM Actually Happen?
The key distinction is concurrent vs total requests. 1M requests over an hour is fine. 1M requests in 1 second is a problem.
| Scenario | Concurrent Goroutines | Memory | OOM Risk |
|---|---|---|---|
| Campaign send: 1M emails | 10-50 (fixed pool) | ~3-10 MB | None — bounded by design |
| Tracking pixels: 1M over 24 hours (~12 req/sec avg) | ~12-50 | ~1-3 MB | None — requests complete fast (~5ms each) |
| Tracking pixels: 1M in 1 hour (~278 req/sec avg) | ~278-1000 | ~20-70 MB | Low — manageable if the DB keeps up |
| Tracking pixels: 1M in 1 minute (~16,667 req/sec) | ~5,000-50,000 | ~350 MB - 3.5 GB | Medium — the DB becomes the bottleneck; goroutines pile up waiting for connections |
| Tracking pixels: 100K concurrent (spike / DDoS / viral email) | 100,000 | ~7 GB | High — goroutines block on the 25-conn DB pool and stack up in memory |
| Tracking pixels: 1M concurrent (extreme / unrealistic) | 1,000,000 | ~70 GB | OOM crash — Go will allocate until killed by the OS |
The Real Danger: Goroutine Pile-Up on DB Pool
The OOM risk isn't from goroutine creation — it's from goroutine accumulation. Here's the cascade failure:
```text
CASCADE FAILURE SCENARIO — 50K req/sec tracking pixel burst

t=0s      50,000 requests arrive. Go spawns 50,000 goroutines. (~3.5 GB)
          │
          ▼
t=0.001s  All 50K goroutines try to acquire a DB connection from the pool
          (max_open=25). 25 goroutines get connections. 49,975 BLOCK waiting.
          │
          ▼
t=0.005s  Each DB query takes ~5ms. The first 25 connections are freed and
          the next 25 goroutines proceed. But another 50K requests have
          arrived — now ~99,950 goroutines are blocked. (~7 GB)
          │
          ▼
t=1.0s    At 50K/sec inflow, the 25-conn pool processes ~5,000 req/sec
          (25 × 200 queries/sec). Deficit: ~45,000 goroutines/sec accumulating.
          │
          ▼
t=10s     ~450K goroutines blocked. Memory: ~30 GB. GC thrashing. Latency spikes.
          │
          ▼
t=20s     ~900K goroutines. The OOM killer triggers. Process killed.
          Campaign sending (which shares the same process) dies too.
```
What listmonk Does NOT Have (Protection Gaps)
| Missing Protection | Consequence | What You'd Add |
|---|---|---|
| No HTTP connection limit | Unbounded goroutine creation on burst traffic | Echo middleware: middleware.RateLimiter() or a custom semaphore. Reject with 429 when concurrent > threshold. Or cap accepted connections with netutil.LimitListener (golang.org/x/net/netutil). |
| No request queue / shed | All requests accepted even when DB pool is full — they just block | Load shedding: if DB pool queue > N, return 503 immediately. Fail fast instead of accumulate. context.WithTimeout on DB calls. |
| No goroutine budget | No visibility into goroutine count. No alarm threshold. | Expose runtime.NumGoroutine() as Prometheus metric. Alert when > 10K. Circuit break at 50K. |
| No request timeout | If the DB is slow, goroutines hang indefinitely, holding memory | Echo's middleware.TimeoutWithConfig with a ~5s timeout. Or http.Server{ReadTimeout: 10 * time.Second, WriteTimeout: 10 * time.Second}. |
| No async write path | Every tracking pixel does a synchronous DB INSERT before responding | Write to in-memory ring buffer. Background goroutine batch-flushes to DB every 100ms. Decouple request handling from DB writes. |
How You'd Fix This — Production-Grade Architecture
PRODUCTION FIX: Bounded Concurrency + Async Writes

```go
// 1. Server-level connection limits
srv := &http.Server{
	Addr:           ":9000",
	ReadTimeout:    10 * time.Second, // Prevent slow-read attacks
	WriteTimeout:   15 * time.Second, // Prevent goroutine hang on slow clients
	IdleTimeout:    60 * time.Second, // Close idle keep-alive connections
	MaxHeaderBytes: 1 << 20,          // 1MB header limit
}

// 2. Concurrency limiter middleware
sem := make(chan struct{}, 10000) // Max 10K concurrent requests
e.Use(func(next echo.HandlerFunc) echo.HandlerFunc {
	return func(c echo.Context) error {
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			return next(c)
		default:
			return c.String(503, "server busy") // Load shedding
		}
	}
})

// 3. Async tracking writes (decouple from request path)
trackChan := make(chan TrackEvent, 100000) // Buffered channel

// Handler: write to channel, return immediately
func handlePixel(c echo.Context) error {
	select {
	case trackChan <- TrackEvent{campID, subID}: // Queued successfully
	default: // Channel full — drop event (acceptable for analytics)
	}
	return c.Blob(200, "image/png", pixel1x1) // Instant response
}

// Background flusher: batch inserts every 100ms
go func() {
	ticker := time.NewTicker(100 * time.Millisecond)
	batch := make([]TrackEvent, 0, 1000)
	for {
		select {
		case ev := <-trackChan:
			batch = append(batch, ev)
			if len(batch) >= 1000 {
				flushBatch(batch) // COPY INTO campaign_views
				batch = batch[:0]
			}
		case <-ticker.C:
			if len(batch) > 0 {
				flushBatch(batch)
				batch = batch[:0]
			}
		}
	}
}()

// 4. DB call timeouts
ctx, cancel := context.WithTimeout(c.Request().Context(), 3*time.Second)
defer cancel()
db.QueryContext(ctx, query, args...) // Cancels if the DB is slow

// 5. Monitor goroutine count
go func() {
	for range time.Tick(5 * time.Second) {
		n := runtime.NumGoroutine()
		metrics.Gauge("goroutines", n) // Prometheus metric
		if n > 50000 {
			log.Error("goroutine count critical", "count", n)
		}
	}
}()
```
Interview Answer: "How Does Go Handle 1M Concurrent Requests?"
listmonk's campaign engine avoids this by using a fixed goroutine pool with channel backpressure — the producer-consumer pattern. But the HTTP server uses Go's default unbounded model with no concurrency limit, no request timeout, and no load shedding.
The production fix is three layers: (1) Server-level timeouts (ReadTimeout, WriteTimeout) prevent slow clients from holding goroutines. (2) Concurrency limiter middleware (semaphore channel) caps concurrent requests and returns 503 when saturated. (3) Async write path for hot paths (tracking pixels, link clicks) — decouple the DB write from the HTTP response using a buffered channel with a background batch flusher. This turns a 5ms synchronous DB call into a <50µs channel send.
Contention & Concurrency — Where Things Fight Over Shared Resources
Concurrency is about structure (multiple things can run). Contention is about conflict (multiple things fighting for the same resource). listmonk has both, and understanding the contention points is what separates a senior answer from a junior one in interviews.
Resource Contention Map — Every Shared Resource in listmonk
| Shared Resource | Who Competes | Contention Type | Protection Mechanism | Risk Level |
|---|---|---|---|---|
| PostgreSQL Connection Pool max_open = 25 | HTTP handlers, Campaign engine, Bounce processor, Importer, Cron jobs (matview refresh) | Mutex-like (pool internal lock) | Go's database/sql pool with internal mutex. Goroutines block on db.Conn() when pool exhausted. | HIGH — This is the #1 contention point. All subsystems share one pool. No priority, no isolation. |
| SMTP Connection Pool max_conns per server | Campaign workers, TX email handler, Optin confirmation sender, Notification sender | Channel-based semaphore | Buffered channel acts as connection pool. Workers block on channel receive when all connections in use. | MEDIUM — Campaign workers dominate. TX emails can starve during active campaigns. |
| Campaign Message Channel buffer = batch_size | Batch producer (1 goroutine) vs Worker pool (N goroutines) | Channel (CSP) | Go buffered channel. Producer blocks on send if full. Workers block on receive if empty. Lock-free. | LOW — By design. Channel provides natural flow control. No contention, only coordination. |
| Campaign Sent Counter shared int64 | All N worker goroutines (10-50 concurrent) | Atomic CAS | atomic.AddInt64(&sent, 1). Lock-free compare-and-swap at hardware level. No mutex. | NONE — Atomic operations have zero contention overhead. O(1) per operation regardless of concurrency. |
| Template Cache compiled templates in memory | Campaign workers (read) vs Admin updating template (write) | sync.Once / recompile | Templates compiled once (sync.Once). On admin update: recompile and swap pointer. Workers read stale until swap. No read-side lock. | NONE during normal operation. Momentary during recompile (new pointer swap is atomic). |
| Settings / Config app.constants struct | All handlers (read) vs Settings update (write → restart) | Process restart | Settings changes trigger SIGHUP → full process restart. No concurrent read/write possible — the entire process is replaced via syscall.Exec(). | NONE — listmonk avoids the problem entirely by restarting. No RWMutex needed. |
| SSE Event Bus events.Events | Campaign manager (publish) vs Browser clients (subscribe) | Channel per subscriber | Each SSE client gets its own channel. Publisher fans out to all subscriber channels. No shared state between clients. | LOW — Fan-out pattern. Publisher may slow if a client channel is full (slow consumer). |
| CSV Import Processor single goroutine | Import goroutine vs HTTP API (checking import status) | Mutex (likely) | Importer runs as a single goroutine. Status checked via API. Likely uses a mutex or atomic for progress state. | LOW — Single writer, occasional reader. Minimal contention. |
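The channel-based semaphore from the SMTP pool row can be sketched in isolation. `smtpConn` and `runWorkers` are hypothetical stand-ins, not listmonk's actual types; the point is that a pre-filled buffered channel bounds how many connections can be checked out at once:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// smtpConn stands in for a pooled SMTP connection (hypothetical type).
type smtpConn struct{ id int }

// runWorkers has `workers` goroutines share a channel-based pool of
// `maxConns` connections and returns the peak number checked out at once.
func runWorkers(workers, maxConns int) int64 {
	pool := make(chan *smtpConn, maxConns)
	for i := 0; i < maxConns; i++ {
		pool <- &smtpConn{id: i} // pre-fill: buffered channel as semaphore
	}
	var inUse, peak int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := <-pool // blocks while all maxConns connections are in use
			n := atomic.AddInt64(&inUse, 1)
			for { // record the high-water mark of concurrent checkouts
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			// ... deliver a message over c ...
			atomic.AddInt64(&inUse, -1)
			pool <- c // return the connection for the next waiting worker
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(runWorkers(20, 3) <= 3) // prints true: checkouts never exceed pool size
}
```

This is also why TX emails can starve during a campaign: campaign workers and the TX path block on the same channel receive.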
DB Pool Contention — The Priority Inversion Problem
This is the most important contention point in listmonk and a great interview discussion topic:
```text
PRIORITY INVERSION: All subsystems share 25 DB connections

┌─────────────────────────────────────────────────────────┐
│                sql.DB Pool (max_open=25)                │
│   ┌────┐┌────┐┌────┐┌────┐┌────┐  ...  ┌────┐           │
│   │conn││conn││conn││conn││conn│       │conn│  ×25      │
│   └──┬─┘└──┬─┘└──┬─┘└──┬─┘└──┬─┘       └──┬─┘           │
└──────┼─────┼─────┼─────┼─────┼────────────┼─────────────┘
       │     │     │     │     │            │
       ▼     ▼     ▼     ▼     ▼            ▼
  Campaign Campaign Pixel Pixel API      Bounce
   Batch1   Batch2  Track Track List     Check

Problem scenarios:

1. Campaign sending large batches holds connections for the batch SELECT
   (~5-50ms). Meanwhile, admin API requests queue behind campaign queries.
   The admin dashboard feels slow during active campaigns.
2. 1M tracking-pixel INSERTs saturate the pool. The campaign batch fetch
   can't get a connection. Campaign throughput drops. Send time extends.
3. A materialized-view REFRESH (cron job) takes 30-60 seconds and holds
   1 connection during the entire refresh. 24 connections left for
   everything else.
4. A bulk subscriber import doing 10K upserts competes with campaign
   sending for connections. Both slow down.
```
| Fix Strategy | How | Trade-off |
|---|---|---|
| Separate Connection Pools | Create 3 sql.DB instances: one for campaign engine (10 conns), one for HTTP handlers (10 conns), one for background jobs (5 conns). Each with independent max_open. | More total connections to Postgres. Needs max_connections increase on DB side. Slightly more memory. |
| Connection Priority | Custom pool wrapper that reserves N connections for high-priority callers (campaign engine). HTTP requests use remaining. Implement with two semaphores. | Complex. Can cause HTTP starvation if campaign is too aggressive. |
| Read Replica Split | Route all SELECT queries (subscriber lookups, dashboard, API reads) to a read replica. Writes (INSERTs, UPDATEs) go to primary. | Replication lag (milliseconds). Needs application-level routing. listmonk doesn't support this natively. |
| Context Timeouts | context.WithTimeout(ctx, 3*time.Second) on all DB calls. If a connection isn't available within 3s, fail fast with 503 instead of blocking. | Requests fail under load instead of queuing. Better for latency SLOs. Some data operations may need longer timeouts. |
| PgBouncer | External connection pooler between listmonk and Postgres. Transaction-mode pooling multiplexes 25 application connections into 100+ Postgres connections. | Additional infrastructure. Transaction mode doesn't support session-level prepared statements. Adds ~0.1ms latency. |
Row-Level Contention in PostgreSQL
| Contention Point | Scenario | What PostgreSQL Does | Impact |
|---|---|---|---|
| campaigns row UPDATE | Campaign manager updates last_subscriber_id and sent after each batch. Admin simultaneously views campaign status. | Row-level lock (MVCC). Writer acquires RowExclusiveLock. Reader sees old snapshot (no block). Writers don't block readers. | None — MVCC handles this perfectly. Reads see consistent snapshot. |
| subscribers UPSERT | CSV import upserting 10K subscribers while public subscription form creates new subscribers. Both touch idx_subs_email unique index. | Each INSERT/UPDATE acquires row lock. Concurrent upserts on different emails: no conflict. Same email: one waits for other's transaction to commit. | Low — conflicts only on same email. Import batches in transactions, so a stuck import blocks other writes to same subscribers. |
| subscriber_lists INSERT | Two campaigns targeting overlapping lists. Both reading subscriber_lists to find recipients. Campaign manager only reads; doesn't write to this table during send. | No contention — campaign send only SELECTs from subscriber_lists. Subscription changes (add/remove) acquire row locks on specific (subscriber_id, list_id) pairs. | None — read-only during campaign processing. |
| campaign_views / link_clicks INSERT | Thousands of concurrent tracking pixel and link click INSERTs. All writing to the same tables. | Append-only tables with BIGSERIAL PK. Each INSERT acquires a nextval() on the sequence (lightweight lock) + index locks. No row-level conflicts. | Sequence lock is a bottleneck at very high insert rates (~50K+/sec). Fix: CACHE 100 on sequence to reduce lock acquisitions. Or batch inserts. |
| settings UPDATE | Admin saves settings while campaign is reading config. | Settings are read at startup and cached in-memory (app.constants). DB write doesn't affect running config. Full process restart needed to pick up changes. | None — decoupled by design. Config is immutable during process lifetime. |
| REFRESH MATERIALIZED VIEW CONCURRENTLY | Cron job refreshes dashboard stats while admin views dashboard. | CONCURRENTLY keyword allows reads during refresh. Creates new version of matview, swaps atomically. Requires UNIQUE index on matview. | None for readers. The refresh itself holds an ExclusiveLock on the matview — two concurrent refreshes would block. |
| bounces threshold CHECK | Multiple bounce webhooks for same subscriber arrive simultaneously. Each does SELECT COUNT(*) FROM bounces WHERE subscriber_id = ? then potentially UPDATE subscribers SET status = 'blocklisted'. | TOCTOU race condition. Two webhooks both count 0 bounces, both insert, both check threshold — subscriber may get N+1 bounces before blocklist triggers. | Minor — subscriber gets one extra email before blocklist. Not dangerous. Fix: SELECT ... FOR UPDATE on subscriber row during bounce processing. |
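The `SELECT ... FOR UPDATE` fix for the bounce TOCTOU row can be illustrated with an in-process analogue, where a per-subscriber mutex plays the role of the row lock. Types and the `processBounces` harness are hypothetical, not listmonk's code:

```go
package main

import (
	"fmt"
	"sync"
)

// sub models one subscriber's bounce state. The mutex serializes the
// check-then-act the way a FOR UPDATE row lock would in Postgres.
type sub struct {
	mu          sync.Mutex
	bounces     int
	blocklisted int // how many times the blocklist action fired
}

func (s *sub) recordBounce(threshold int) {
	s.mu.Lock() // "FOR UPDATE": one webhook at a time per subscriber
	defer s.mu.Unlock()
	s.bounces++
	if s.bounces == threshold { // check-then-act is now atomic
		s.blocklisted++
	}
}

// processBounces simulates n bounce webhooks arriving concurrently.
func processBounces(n, threshold int) (int, int) {
	s := &sub{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); s.recordBounce(threshold) }()
	}
	wg.Wait()
	return s.bounces, s.blocklisted
}

func main() {
	b, bl := processBounces(2, 2)
	fmt.Println(b, bl) // prints 2 1: the blocklist fires exactly once
}
```

Without the lock, both webhooks could pass the threshold check before either's insert is visible, which is exactly the race the table describes.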
Go-Level Concurrency Primitives Used
| Primitive | Where Used | Why This Choice | Contention Characteristics |
|---|---|---|---|
| Buffered Channel | Campaign message pipeline SMTP connection pool SSE event fan-out | CSP model. Decouples producer from consumer. Natural backpressure. No explicit locking needed. | Zero contention when buffer isn't full/empty. Contention only at boundaries: producer blocks when full (backpressure), consumer blocks when empty (idle). |
| atomic.AddInt64 | Campaign sent counter Campaign error counter | Lock-free counter. Hardware CAS instruction. No goroutine blocking, ever. | Near-zero. CAS retry on contention (extremely rare, nanoseconds). Outperforms mutex by 10-100x for simple counters. |
| sync.Once | Template compilation One-time initialization | Thread-safe lazy init. First caller executes, all others wait then return cached result. | First call: brief mutex hold during init. All subsequent calls: atomic read (zero contention). Perfect for "compute once, read forever" patterns. |
| database/sql Pool | All PostgreSQL access | Built-in connection pooling. Thread-safe. Handles connection lifecycle. | Internal mutex on connRequests map. Under high concurrency, goroutines queue in FIFO order. This is the primary contention point in the entire system. |
| No explicit Mutex | — | listmonk avoids sync.Mutex and sync.RWMutex in hot paths. Prefers channels, atomics, and immutable data (restart on config change). | Architectural choice: channels for coordination, atomics for counters, process restart for config. Eliminates most mutex contention by design. |
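The atomic-counter row can be verified in a few lines: ten goroutines increment one shared counter with `atomic.AddInt64` and no updates are lost, with no mutex anywhere. (The `count` harness is illustrative, not listmonk's code.)

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// count increments a shared counter from `workers` goroutines using the
// same lock-free primitive the campaign sent counter relies on.
func count(workers, perWorker int) int64 {
	var sent int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&sent, 1) // single hardware atomic add, no lock
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&sent)
}

func main() {
	fmt.Println(count(10, 1000)) // prints 10000: no lost updates
}
```

Replacing `atomic.AddInt64` with an unsynchronized `sent++` would lose increments under the race detector, which is why the table rates this primitive's contention as "near-zero" rather than "none to worry about".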
Concurrent Campaign Execution — Overlapping Lists
What happens when two campaigns target overlapping subscriber lists simultaneously?
```text
SCENARIO: Campaign A targets List 1 (500K subs), Campaign B targets List 2
(500K subs). 200K subscribers are on BOTH lists.

Campaign A goroutines:        Campaign B goroutines:
  1 batch producer              1 batch producer
  10 send workers               10 send workers
  ──────────────                ──────────────
  11 goroutines                 11 goroutines     // Total: 22 goroutines

What happens to the 200K overlapping subscribers?
✓ Both campaigns independently fetch and send to them.
✓ The subscriber receives BOTH emails (intentional — different campaigns).
✓ No deduplication across campaigns (by design).
✓ No row locks conflict — campaigns only SELECT from subscriber_lists.
✓ Each campaign has an independent last_subscriber_id cursor.
✓ Each campaign has an independent sent counter (atomic).

Contention points:
1. DB pool: 22 goroutines competing for 25 connections. Batch fetches are
   2 long-running SELECTs; workers doing progress UPDATEs contend
   occasionally. Mitigation: workers mostly wait on SMTP (I/O bound), not DB.
2. SMTP pool: 20 workers sharing max_conns connections. If max_conns=10,
   workers queue for connections. Mitigation: each campaign can use
   different named SMTP servers.
3. Rate limiter: the GLOBAL rate limit (message_rate) is shared across
   campaigns. Two campaigns each wanting 100 msg/sec with rate=100 each
   get ~50. Campaign throughput halves with each concurrent campaign.
4. No campaign-level resource isolation. A slow campaign (complex template)
   slows all campaigns by holding DB connections longer.
```
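Contention point 3, the shared global rate limit, can be modeled as a single token budget drawn down by every campaign. `drainBudget` is a simplified stand-in for `message_rate`, not the real limiter:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// drainBudget has `campaigns` goroutines draw sends from one shared token
// budget. Total sends are capped by the budget regardless of how many
// campaigns run, so each concurrent campaign gets a smaller share.
func drainBudget(campaigns, budget int) int64 {
	tokens := make(chan struct{}, budget)
	for i := 0; i < budget; i++ {
		tokens <- struct{}{}
	}
	close(tokens) // range below drains the remaining buffered tokens

	var total int64
	var wg sync.WaitGroup
	for c := 0; c < campaigns; c++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range tokens { // every campaign competes for the same tokens
				atomic.AddInt64(&total, 1) // one "send" per token
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(drainBudget(2, 100)) // prints 100: the global cap, split between campaigns
}
```

Per-campaign isolation would mean one budget per campaign rather than one shared budget, which is exactly the partitioning suggested later in this section.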
Race Conditions — Known & Potential
| Race Condition | Severity | Description | Fix |
|---|---|---|---|
| Bounce TOCTOU | Low | Two bounce webhooks arrive for same subscriber simultaneously. Both read count=0, both insert, both check threshold — neither triggers blocklist because each sees count=1 when threshold=2. Next bounce will trigger it. | SELECT ... FOR UPDATE on subscriber row. Or INSERT + SELECT COUNT in a single transaction with SERIALIZABLE isolation. |
| Campaign Status Transition | Low | Admin clicks "Pause" while manager is updating sent count. PostgreSQL row-level MVCC prevents corruption — the UPDATE acquires a row lock. But the pause might not take effect until the current batch completes. | Current behavior is acceptable. Campaign checks for pause signal between batches. Near-instant in v3.0.0 rewrite. |
| Duplicate Subscription | None | Two form submissions for same email at the same time. INSERT ... ON CONFLICT (email) DO UPDATE handles this atomically — PostgreSQL serializes at the unique index level. | Already handled by DB unique constraint + upsert. No application-level fix needed. |
| Matview Concurrent Refresh | Low | Two cron triggers fire simultaneously (unlikely but possible). REFRESH MATERIALIZED VIEW CONCURRENTLY acquires ExclusiveLock — second call blocks until first completes. | Not harmful — just wastes time. Could add application-level lock (pg_advisory_lock) to skip if already running. |
| Template Hot-Swap | None | Admin updates template while campaign is mid-send using that template. Campaign workers hold reference to compiled template in memory. Template recompile creates new object; old one is GC'd after campaign finishes. | Safe — Go's GC keeps old template alive as long as goroutines reference it. New campaigns get the updated template. In-flight campaign uses the old version. |
| Subscriber Delete During Send | None | Admin deletes subscriber while campaign is sending to them. Campaign already fetched the batch — subscriber data is in memory. DB INSERT for tracking: subscriber_id FK SET NULL handles gracefully. | Already handled by schema design. ON DELETE SET NULL on campaign_views and link_clicks preserves analytics data. |
Interview Answer: "How Do You Handle Contention in a Concurrent System?"
1. Application-level: listmonk uses channels (not mutexes) for coordination — the CSP model. The campaign engine is a bounded worker pool where backpressure is built into the channel. Counters use lock-free atomics. Config is immutable — changes trigger a full process restart, completely sidestepping read-write contention. This is an architectural decision to prefer simplicity over fine-grained locking.
2. Connection pool: The single shared DB pool (25 connections) is the primary contention point. All subsystems — HTTP handlers, campaign engine, bounce processor, cron jobs — compete for the same connections. Under high load, goroutines queue behind the pool's internal mutex. The fix is pool isolation: separate pools for campaign engine vs HTTP handlers, or a priority queue that reserves connections for critical paths.
3. Database-level: PostgreSQL MVCC eliminates most row-level contention — readers never block writers, writers never block readers. The real contention is on sequences (BIGSERIAL PKs on high-insert tables like link_clicks) and unique-index locks during concurrent upserts. Mitigate with sequence caching (CACHE 100) and batch inserts.
4. Cross-campaign: The global rate limiter is shared across all campaigns — two concurrent campaigns each get half the throughput. There's no per-campaign resource isolation. At scale, you'd partition resources per campaign: dedicated worker pools, separate rate limiters, and named SMTP servers for high-priority campaigns.
Interview Framing: "The system handles 1M requests across different hot paths. The campaign send is I/O-bound on SMTP with intentional rate limiting. Tracking pixels and link clicks are the surprise high-throughput paths — they're write-heavy, user-facing, and scale linearly with subscriber count. The architectural insight is that these are append-only writes to analytics tables — perfect candidates for buffered batch inserts, table partitioning, and async processing. The Go HTTP server and goroutine model are never the bottleneck; PostgreSQL write throughput is."