A Different Mental Model: MPP
Netezza is a massively parallel processing (MPP) appliance. Data is spread across many data slices, and every slice processes its share in parallel. As a result, the most important tuning decision is not which index to build but how the data is distributed across slices.
The Distribution Key Decides Everything
CREATE TABLE sales (
    sale_id BIGINT,
    customer_id BIGINT,
    amount NUMERIC(12,2)
) DISTRIBUTE ON (customer_id);
Two failure modes to avoid:
- Data skew: a key with few distinct values (or many nulls) piles rows onto a handful of slices. Those slices become the bottleneck while the rest sit idle. Check with nz_skew.
- Redistribution on join: if two joined tables use different distribution keys, Netezza must redistribute one across the network at query time. Co-locate joins by distributing both tables on the join column.
Pick a high-cardinality column that is frequently used in joins. DISTRIBUTE ON RANDOM avoids skew, but because rows with the same join key then land on arbitrary slices, joins against the table need data movement at query time. Use it only for staging.
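Both failure modes can be illustrated with a small simulation. The sketch below is plain Python, not Netezza: the slice count, the synthetic rows, and the use of SHA-256 as the hash are all assumptions made for illustration, and the appliance's real hash function and slice count will differ. It compares the skew of a high-cardinality key with a low-cardinality one, then counts how many rows would sit on the "wrong" slice for a join on customer_id.

```python
import hashlib
from collections import Counter

NUM_SLICES = 8  # hypothetical slice count, chosen only for illustration


def slice_for(key, num_slices=NUM_SLICES):
    """Map a key to a slice using SHA-256 as a stand-in for the real hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def skew_ratio(keys, num_slices=NUM_SLICES):
    """Rows on the busiest slice divided by the mean; 1.0 is perfectly even."""
    counts = Counter(slice_for(k, num_slices) for k in keys)
    total = sum(counts.values())
    return max(counts.values()) / (total / num_slices)


def rows_to_move(rows, stored_on, join_on):
    """Count rows whose current slice differs from the slice the join needs."""
    return sum(
        1 for r in rows if slice_for(r[stored_on]) != slice_for(r[join_on])
    )


# Synthetic rows: 100,000 sales spread over 20,000 customers and 3 regions.
rows = [
    {"sale_id": 1_000_000 + i, "customer_id": i % 20_000, "region": i % 3}
    for i in range(100_000)
]

high_card_skew = skew_ratio(r["customer_id"] for r in rows)
low_card_skew = skew_ratio(r["region"] for r in rows)
colocated_moves = rows_to_move(rows, "customer_id", "customer_id")
mismatched_moves = rows_to_move(rows, "sale_id", "customer_id")

print(f"skew on customer_id: {high_card_skew:.2f}")
print(f"skew on region:      {low_card_skew:.2f}")
print(f"rows moved, stored on customer_id: {colocated_moves}")
print(f"rows moved, stored on sale_id:     {mismatched_moves}")
```

With only three distinct region values, at most three of the eight simulated slices receive any rows, so the busiest slice holds several times the mean. A table already stored on the join column needs no movement, while one stored on an unrelated column has most of its rows on a different slice from their join partners.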
Zone Maps Replace Indexes
Netezza has no traditional indexes. Instead, zone maps automatically record the min/max of certain columns per extent, letting the system skip extents that cannot match a WHERE range. The idea is conceptually similar to a BRIN index in PostgreSQL. Zone maps work best when data is naturally ordered, so loading in date order can make date-range queries dramatically faster.
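The effect of load order on extent skipping can be sketched in a few lines of Python. This is a toy model under stated assumptions: the extent size of 1,000 rows, the day-number column, and the helper names build_zone_map and extents_to_scan are all hypothetical, and real extents are sized in bytes rather than rows.

```python
import random

EXTENT_SIZE = 1_000  # rows per extent; hypothetical, for illustration only


def build_zone_map(values, extent_size=EXTENT_SIZE):
    """Return one (min, max) pair per extent of consecutive rows."""
    zone_map = []
    for start in range(0, len(values), extent_size):
        chunk = values[start:start + extent_size]
        zone_map.append((min(chunk), max(chunk)))
    return zone_map


def extents_to_scan(zone_map, lo, hi):
    """Count extents whose [min, max] range overlaps the filter [lo, hi]."""
    return sum(1 for zmin, zmax in zone_map if zmax >= lo and zmin <= hi)


# 100 days of data, 1,000 rows per day, stored as day numbers 0..99.
ordered = [day for day in range(100) for _ in range(1_000)]

# The same rows, loaded in arbitrary order (fixed seed for repeatability).
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)

# A one-week range filter, equivalent to: day BETWEEN 10 AND 16.
ordered_scan = extents_to_scan(build_zone_map(ordered), 10, 16)
shuffled_scan = extents_to_scan(build_zone_map(shuffled), 10, 16)

print("extents scanned, ordered load: ", ordered_scan)
print("extents scanned, shuffled load:", shuffled_scan)
```

In the ordered load each extent covers a single day, so the filter touches only the extents for the requested week. In the shuffled load nearly every extent spans the full range of days, so the min/max check rules out almost nothing.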
GROOM and Statistics
GROOM TABLE sales; -- reclaim space from deleted/updated rows
GENERATE STATISTICS ON sales; -- keep the optimizer honest
Updates and deletes leave logically deleted rows behind until GROOM reclaims them, much as VACUUM does in PostgreSQL. Stale statistics produce bad plans, so regenerate them after large loads.
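Why space keeps growing until GROOM runs can be shown with a toy model. The ToyTable class below is hypothetical and says nothing about Netezza's on-disk format. It only assumes that a delete marks a row rather than removing it, and that an update behaves as a delete plus an insert of the new version.

```python
class ToyTable:
    """Toy model of a table where deletes only mark rows as deleted."""

    def __init__(self):
        self.rows = []  # each entry is [value, is_deleted]

    def insert(self, value):
        self.rows.append([value, False])

    def delete(self, predicate):
        for row in self.rows:
            if not row[1] and predicate(row[0]):
                row[1] = True  # logical delete: the row still occupies space

    def update(self, predicate, new_value):
        # Modeled as a logical delete plus an insert of the new version.
        matches = [r for r in self.rows if not r[1] and predicate(r[0])]
        for row in matches:
            row[1] = True
            self.rows.append([new_value(row[0]), False])

    def groom(self):
        # Physically drop every row that was marked as deleted.
        self.rows = [r for r in self.rows if not r[1]]

    def visible(self):
        return sum(1 for r in self.rows if not r[1])

    def stored(self):
        return len(self.rows)


table = ToyTable()
for amount in range(1_000):
    table.insert(amount)

table.update(lambda v: True, lambda v: v + 1)  # touch every row once
stored_before_groom = table.stored()
visible_before_groom = table.visible()

table.groom()
stored_after_groom = table.stored()

print("visible rows:", visible_before_groom)
print("stored rows before groom:", stored_before_groom)
print("stored rows after groom: ", stored_after_groom)
```

A single full-table update doubles the stored row count in this model while the visible count stays the same, which is the kind of growth that regular grooming is meant to undo.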
Fast Bulk Loading with nzload
nzload -t sales -df /data/sales.csv \
-delim ',' -skipRows 1 -maxErrors 100 \
-bf /data/sales.bad
Load into a table that is already distributed correctly, in the natural order of your most common range filter, then GENERATE STATISTICS. Loading first and distributing later wastes a full redistribution pass.
The Constant: Watch Your Skew and Stats
Whether the engine is Netezza, PostgreSQL or something else, the operational truths rhyme: uneven data layout and stale statistics quietly turn fast queries slow. The monitoring discipline PG Monitoring brings to PostgreSQL — track plan changes, catch the regression the day it appears — applies directly to any analytical platform you run alongside it.