
Connection Pool Monitoring: Finding the Real Bottlenecks

PG Monitoring Team · March 25, 2026 · 7 min read

The Connection Problem Nobody Talks About

"Too many connections" is a symptom. The real questions are:

  • Which queries hold connections longest?
  • Are connections idle, active, or waiting?
  • Is the pool too small, or are queries too slow?
  • What's the optimal pool size for your workload?

Most monitoring tools just show connection count. That's useless for diagnosis.

What Traditional Tools Show You

pg_stat_activity

SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
 count 
-------
    87

Okay, 87 active connections. Now what? Which ones matter? Which are problematic?
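A slightly richer query at least breaks the raw count down by state and shows how long the oldest connection has sat in each one. This is a minimal sketch against a stock pg_stat_activity (the backend_type filter assumes PostgreSQL 10+):

```sql
-- Group client connections by state, with the longest time spent in each state.
SELECT state,
       count(*)                  AS connections,
       max(now() - state_change) AS longest_in_state
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;
```

It's a start, but you still have to interpret the output yourself, query by query.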

CloudWatch / Datadog RDS Metrics

Show: DatabaseConnections metric over time

Alert when: Connections > 80% of max

Problem: By the time you alert, users are already getting "too many connections" errors. You're reactive.

pgBouncer Statistics

If you're using pgBouncer, you get pool stats:

SHOW POOLS;
 database | cl_active | cl_waiting | sv_active | sv_idle 
----------+-----------+------------+-----------+---------
 mydb     |        45 |         12 |        20 |        5

Better, but still limited:

  • No correlation with query performance
  • No wait time analysis
  • No optimal sizing recommendations

PG Monitoring's Connection Intelligence

1. Connection State Breakdown

Total connections: 94/100 (94%)

Breakdown:
├─ active: 67 (71%) 
│  ├─ idle in transaction: 12 (problematic)
│  ├─ long running (>30s): 8 (investigate)
│  └─ normal queries: 47
├─ idle: 23 (24%)
│  ├─ idle < 1min: 18 (normal)
│  └─ idle > 5min: 5 (leak?)
└─ waiting: 4 (4% - pool bottleneck)

Wait time analysis:
- Average wait: 230ms
- p95 wait: 1.2s
- Max wait: 4.7s (connection storm at 14:23)
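The two problematic buckets in the breakdown above — idle in transaction, and active for more than 30 seconds — can be pulled directly from pg_stat_activity. A rough sketch of the underlying query:

```sql
-- Flag idle-in-transaction sessions and queries running longer than 30s.
SELECT pid, usename, application_name,
       now() - xact_start  AS txn_age,
       now() - query_start AS query_age,
       state
FROM pg_stat_activity
WHERE state = 'idle in transaction'
   OR (state = 'active' AND now() - query_start > interval '30 seconds')
ORDER BY txn_age DESC NULLS LAST;
```

PG Monitoring runs this kind of classification continuously and tracks the wait-time percentiles for you.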

2. Query-to-Connection Correlation

Find which queries consume the most connection-seconds:

Top connection consumers (last hour):
1. reports/monthly_summary: 23 connections × 45s avg = 1,035 conn-seconds
2. analytics/user_cohort: 15 connections × 32s avg = 480 conn-seconds  
3. api/order_history: 67 connections × 0.8s avg = 54 conn-seconds

Optimization opportunity:
Move reports to read replica or background job.
Would free up 40% of connection capacity.
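Connection-seconds are just total execution time per query, so you can approximate this yourself with the pg_stat_statements extension. A sketch, assuming PostgreSQL 13+ (older versions call the column total_time instead of total_exec_time):

```sql
-- Approximate connection-seconds per normalized query:
-- a connection is held for at least as long as the query executes.
SELECT left(query, 60)                              AS query,
       calls,
       round((total_exec_time / 1000)::numeric, 1)  AS conn_seconds,
       round((mean_exec_time / 1000)::numeric, 2)   AS avg_seconds
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

What this misses is the attribution to endpoints (reports/monthly_summary, etc.) and the sizing recommendation that follows from it.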

3. Pool Size Recommendations

Dynamic calculation based on actual usage patterns:

Current: max_connections = 100
Analysis:
- Peak active: 89 (89% utilization)
- Average active: 34
- Wait queue peaks: 3x daily at 09:00, 14:00, 17:00
- Wait time during peaks: 2.1s average

Recommendation:
- Immediate: Increase to 150 (adds headroom)
- Better: Implement connection pooling (pgBouncer)
  Suggested: pool_size = 40, max_client_conn = 200
  Expected: 60% connection reduction, wait time → 50ms

Cost comparison:
- RDS next tier: +$180/month
- pgBouncer on existing instance: $0
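For reference, the suggested numbers map onto a pgBouncer config roughly like this. The host, port, and database name are placeholders; transaction pooling is one reasonable choice here, not the only one:

```ini
; pgbouncer.ini — sketch matching the sizing above (values are examples)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction     ; return server connections at transaction end
default_pool_size = 40      ; server connections per database/user pair
max_client_conn = 200       ; client-side connections accepted
```

Note that transaction pooling breaks session-level features (prepared statements, advisory locks), so verify your application is compatible before switching.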

4. Connection Leak Detection

Idle connection analysis:
5 connections idle > 5 minutes:
- App server: web-03
- First seen: 14:23:17 (same time as deployment)
- Query last run: BEGIN; (never committed!)

Diagnosis: Application deployment didn't close transaction
Action: Terminate connections, fix app rollback logic
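Terminating the leaked connections is a one-liner once you can identify them. A sketch using the same 5-minute threshold as the analysis above:

```sql
-- Kill connections that have sat idle in an open transaction for > 5 minutes.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 minutes';
```

The harder part — correlating the leak with the deployment that caused it — is what the timeline analysis is for.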

Case Study: The Connection Storm

A SaaS app kept hitting "too many connections" errors during peak hours.

Initial response: Keep increasing max_connections (100 → 150 → 200)

Problem: PostgreSQL performance degraded with more connections (context switching overhead)

PG Monitoring analysis revealed:

Root cause: 3 analytics queries running synchronously
- Each took 30-45 seconds
- During peak: 20+ concurrent analytics requests
- Connection pile-up: 60+ waiting
- Transaction locks held for 45s → other queries blocked

Real problem: Not connection count, but connection duration
Solution: Move analytics to read replica with dedicated pool
Result: 
- Primary connections: 200 → 45
- Wait times: 2.1s → 50ms
- Query performance: +40% (no lock contention)

Feature Comparison

 Feature                            | pg_stat_activity | CloudWatch | pgBouncer | PG Monitoring
------------------------------------+------------------+------------+-----------+---------------
 Connection count                   | ✓                | ✓          | ✓         | ✓
 State breakdown (active/idle/wait) | ✓                | —          | ✓         | ✓
 Wait time tracking                 | —                | —          | Partial   | ✓
 Query correlation                  | Manual           | —          | —         | ✓
 Pool sizing recommendations        | —                | —          | —         | ✓
 Leak detection                     | —                | —          | —         | ✓
 Cost optimization                  | —                | —          | —         | ✓

The Bottom Line

Traditional tools: "You have 94 connections."

PG Monitoring: "94 connections with 4 waiting (230ms avg). Top consumer is reports/monthly_summary using 23 connections for 45 seconds each. 40% of your capacity consumed by 3 queries. Move to read replica to eliminate waits and save $180/month on RDS tier upgrade."

Ready to experience better PostgreSQL monitoring?

Join thousands of teams who switched from traditional tools to PG Monitoring's AI-powered platform.

Talk to us