2025-08-05 · 12 min read
Software Engineering

The Coming Death of Denormalization: How Differential Dataflow Will Save Us All

Or: How I Learned to Stop Worrying and Love the Incremental Computation
Friends, developers, database administrators: lend me your ears. I come not to praise denormalization, but to bury it. For too long have we suffered under its tyrannical reign. Too long have we manually maintained redundant data across tables, praying to the consistency gods that our application logic doesn't fail us. But lo, a new dawn approaches. The age of perfect, consistent materialized views is upon us, powered by the dark magic of Differential Dataflow.
May god have mercy on all of us.

The Original Sin: Why We Denormalized in the First Place

Let me paint you a picture of simpler times. You have a beautifully normalized database schema. Your computer science professor would weep with joy at its third normal form perfection. Users table here, Orders table there, Products over yonder. Everything in its place, and a place for everything.
Then reality hits you like a freight train.
Your seemingly innocent query needs data from four different tables:
sql
SELECT
    u.name,
    u.email,
    -- DISTINCT: the joins yield one row per order item, not per order
    COUNT(DISTINCT o.id) AS order_count,
    SUM(p.price * oi.quantity) AS total_spent,
    AVG(p.rating) AS avg_product_rating
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.name, u.email;
Your database coughs, sputters, and takes 30 seconds to return results. Your API timeout is 5 seconds. Your users are already closing the tab and tweeting about how your app is "literally unusable."
So what do you do? You make a deal with the devil. You create a user_stats table:
sql
CREATE TABLE user_stats (
    user_id INT PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255),
    order_count INT,
    total_spent DECIMAL(10,2),
    avg_product_rating DECIMAL(3,2),
    last_updated TIMESTAMP
);
You denormalize. You duplicate data. You violate everything sacred about database design. But your queries are fast now, and that's all that matters.

The Curse of Manual Maintenance

But here's where the nightmare begins. You're now responsible for keeping this denormalized view in sync with reality. Every time someone places an order, you need to:
  1. Insert the order
  2. Insert the order items
  3. Update the user_stats table
  4. Pray that nothing fails in between
You write triggers. You write background jobs. You implement eventual consistency and tell yourself it's fine that the numbers are sometimes wrong for a few minutes. You add a "last_updated" timestamp and show a little spinner when the data might be stale.
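The least-bad version of those four steps swaps the prayer for a transaction, so the order and the stats at least fail together. Here is a minimal sketch using Python's built-in sqlite3 module. The cut-down schema (prices stored on order_items, amounts in cents) and the make_db and place_order helpers are assumptions made for illustration, not code from any real system.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a cut-down, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE order_items (order_id INTEGER, price_cents INTEGER, quantity INTEGER);
        CREATE TABLE user_stats (
            user_id INTEGER PRIMARY KEY,
            order_count INTEGER NOT NULL DEFAULT 0,
            total_spent_cents INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO user_stats (user_id) VALUES (1);
    """)
    return conn


def place_order(conn, user_id, order_id, items, fail_before_stats=False):
    """Run steps 1-3 in one transaction. items is a list of (price_cents, quantity)."""
    total = sum(price * qty for price, qty in items)
    with conn:  # commits on success, rolls back if anything below raises
        conn.execute(
            "INSERT INTO orders (id, user_id) VALUES (?, ?)", (order_id, user_id)
        )
        conn.executemany(
            "INSERT INTO order_items (order_id, price_cents, quantity) VALUES (?, ?, ?)",
            [(order_id, price, qty) for price, qty in items],
        )
        if fail_before_stats:  # stand-in for a crash between steps 2 and 3
            raise RuntimeError("simulated failure")
        conn.execute(
            "UPDATE user_stats SET order_count = order_count + 1, "
            "total_spent_cents = total_spent_cents + ? WHERE user_id = ?",
            (total, user_id),
        )
```

If the simulated failure fires, the order rows roll back with everything else, so user_stats never drifts from the orders table. The catch is that every code path that touches orders, prices, or ratings has to remember to do the same dance.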
Your codebase becomes littered with sync logic:
python
def update_user_stats(user_id):
    # Oh god, here we go again
    stats = calculate_user_stats(user_id)  # Does the 4 joins

    # What if this fails?
    db.execute("""
        UPDATE user_stats
        SET order_count = %s,
            total_spent = %s,
            avg_product_rating = %s,
            last_updated = NOW()
        WHERE user_id = %s
    """, (stats['order_count'], stats['total_spent'],
          stats['avg_product_rating'], user_id))

    # What if the user placed another order while we were calculating?
    # What if we're running this in parallel?
    # What if... what if... what if...
You've traded query complexity for operational complexity. You've replaced slow reads with the constant anxiety of inconsistent data.
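Those "what if" comments aren't paranoia. Here is a minimal sketch of the stale-write race, with plain dicts standing in for the database and the two sync workers interleaved by hand so the outcome is deterministic. The functions here are toy stand-ins invented for illustration, and only order_count and total_spent are tracked.

```python
# Toy in-memory "tables": user_id -> list of order totals, and the stats copy.
orders = {42: [10.0, 20.0]}
user_stats = {}


def calculate_user_stats(user_id):
    """Stand-in for the expensive 4-join query."""
    totals = orders[user_id]
    return {"order_count": len(totals), "total_spent": sum(totals)}


def write_user_stats(user_id, stats):
    """Stand-in for the UPDATE user_stats statement."""
    user_stats[user_id] = stats


# Worker A starts a sync and computes a snapshot...
snapshot_a = calculate_user_stats(42)

# ...meanwhile the user places another order, and worker B syncs it correctly.
orders[42].append(5.0)
write_user_stats(42, calculate_user_stats(42))

# Worker A finally writes its (now stale) snapshot over B's correct result.
write_user_stats(42, snapshot_a)

truth = calculate_user_stats(42)
print("stored:", user_stats[42])
print("actual:", truth)
```

Nothing crashed and nothing logged an error. The stored row simply describes a world that no longer exists, and it stays that way until the next sync happens to touch that user.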

[Diagram, two panels. "Your Soul for Sale": Beautiful 3NF Schema 😇, Query takes 30 seconds, API Timeout 💀, Users Rage Quit, Twitter: 'Literally Unusable', Panic Sets In. "The Devil's Bargain": CREATE TABLE user_stats 😈, Fast Queries ⚡, Data Duplication 👹, Sync Hell 🔥, Your Sanity 🪦.]

[Sequence diagram between User, API, DB, Stats, and Developer's Brain. The user places an order; INSERT order and INSERT order_items report ✅ Success; UPDATE user_stats hits a 🔥 DEADLOCK. The developer's brain spirals (What if it fails? What if parallel updates? What if calculation is stale? What if... what if...) into a 😱 3 AM PANIC ATTACK. The user asks "Where's my order?" and gets 🤷‍♂️ Eventually consistent™.]

Enter Differential Dataflow: The Savior We Don't Deserve

Differential Dataflow is here to absolve us of our sins. Created by Frank McSherry and others, it's a computational framework that can maintain complex views over changing data with perfect consistency and remarkable efficiency.
The key insight is deceptively simple: instead of recomputing everything when data changes, compute only what's different. But unlike your hacky incremental update scripts, Differential Dataflow does this correctly and automatically.
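To make "compute only what's different" concrete, here is a toy sketch in plain Python (this is not the Differential Dataflow API). Each change arrives as a record plus a diff of +1 (insert) or -1 (retract), and the view is maintained by applying diffs instead of rescanning every order. Differential Dataflow's real updates also carry a logical timestamp, which this sketch leaves out. The names UserStatsView and recompute are made up for illustration, and amounts are in cents to keep the arithmetic exact.

```python
from collections import defaultdict


class UserStatsView:
    """Per-user order_count and total_spent, maintained from diffs."""

    def __init__(self):
        self.order_count = defaultdict(int)
        self.total_spent_cents = defaultdict(int)

    def apply(self, changes):
        """Apply (user_id, amount_cents, diff) updates; diff is +1 or -1."""
        for user_id, amount_cents, diff in changes:
            self.order_count[user_id] += diff
            self.total_spent_cents[user_id] += diff * amount_cents
            if self.order_count[user_id] == 0:  # user has no orders left
                del self.order_count[user_id]
                del self.total_spent_cents[user_id]

    def rows(self):
        """Current view contents: user_id -> (order_count, total_spent_cents)."""
        return {u: (n, self.total_spent_cents[u]) for u, n in self.order_count.items()}


def recompute(orders):
    """The batch version: rescan every (user_id, amount_cents) order."""
    result = {}
    for user_id, amount_cents in orders:
        count, total = result.get(user_id, (0, 0))
        result[user_id] = (count + 1, total + amount_cents)
    return result


view = UserStatsView()
view.apply([(1, 1000, +1), (1, 250, +1), (2, 400, +1)])
view.apply([(1, 250, -1)])  # an order is cancelled: retract it

# The incrementally maintained view matches a from-scratch recomputation.
assert view.rows() == recompute([(1, 1000), (2, 400)])
```

The work per change is proportional to the size of the change, not the size of the orders table. That is easy for a sum and a count; the hard part is getting the same property for arbitrary joins and aggregations without hand-writing the update rule each time.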
Here's the magic: you define your computation once, as if it were a batch computation over static data. Differential Dataflow figures out how to incrementally maintain the results as the input changes. It's like having a perfectly competent junior developer who never sleeps, never makes mistakes, and actually understands distributed systems.
rust
// Define your view declaratively.
// (Schematic: each join matches on the collection's key, so a real
// pipeline re-keys with .map() between joins.)
let user_stats = users
    .join(&orders)
    .join(&order_items)
    .join(&products)
    .reduce(|_key, vals, output| {
        // Your aggregation logic here
        // This looks like batch processing code
        // But it runs incrementally!
    });
When a new order comes in, Differential Dataflow:
  1. Figures out exactly which parts of the computation are affected
  2. Computes only the deltas (changes)
  3. Updates the materialized view with perfect consistency
  4. Does all of this faster than you can blink
No more manual sync logic. No more eventually consistent nightmares. No more explaining to your CEO why the dashboard numbers don't add up.
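Joins follow the same rule as aggregates. When one input changes, the change only needs to be matched against the current contents of the other input, instead of re-running the whole join. Here is a toy sketch of that idea in plain Python (again, not the Differential Dataflow API; IncrementalJoin and its methods are invented for illustration). It uses signed multiplicities so inserts and retractions share one code path.

```python
from collections import Counter, defaultdict


class IncrementalJoin:
    """Maintains left JOIN right on a shared key, one delta at a time."""

    def __init__(self):
        self.left = defaultdict(Counter)    # key -> {value: multiplicity}
        self.right = defaultdict(Counter)   # key -> {value: multiplicity}
        self.output = Counter()             # (key, left_value, right_value) -> multiplicity

    def update_left(self, key, value, diff):
        # A left-side change joins only against the right rows with the same key.
        for rval, rmult in self.right[key].items():
            self._emit((key, value, rval), diff * rmult)
        self._bump(self.left, key, value, diff)

    def update_right(self, key, value, diff):
        # Symmetric: a right-side change joins only against matching left rows.
        for lval, lmult in self.left[key].items():
            self._emit((key, lval, value), diff * lmult)
        self._bump(self.right, key, value, diff)

    def _emit(self, row, diff):
        self.output[row] += diff
        if self.output[row] == 0:
            del self.output[row]

    @staticmethod
    def _bump(index, key, value, diff):
        index[key][value] += diff
        if index[key][value] == 0:
            del index[key][value]


join = IncrementalJoin()
join.update_left(1, "alice", +1)        # users, keyed by user_id
join.update_right(1, "order-100", +1)   # orders, keyed by user_id
join.update_right(1, "order-101", +1)
join.update_right(1, "order-100", -1)   # order cancelled: retract it

assert dict(join.output) == {(1, "alice", "order-101"): 1}
```

The per-key indexes on each side are what make a delta cheap to process. They play roughly the role that arrangements play in Differential Dataflow, though the real thing also tracks time and shares indexes across operators.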

[Diagram, two panels. "Traditional Hellscape": Order Placed → Recalculate EVERYTHING → 30 second query → Update user_stats → Prayer to consistency gods 🙏. "Differential Dataflow Magic": Order Placed → Δ = What Changed → Update = Old + Δ → Instant & Correct ✨, alongside the nodes Complex Join?, Window Function?, and Recursive CTE?]

The False Prophets and the True Messiah

But wait, you say. Haven't we heard this song before? Didn't Materialize promise us salvation? Doesn't ClickHouse have materialized views?
Oh, sweet summer child. Let me tell you why those were but shadows on the cave wall, and why Feldera is the real fucking deal.

The Pretenders to the Throne

Yes, Materialize exists. Yes, ClickHouse can materialize some things. But here's the dirty secret: they could only handle the easy stuff. Got a simple aggregation? Sure. A basic join? Maybe. But that gnarly 7-way join with window functions, CTEs, and a subquery that would make a DBA weep? Good luck with that.
These tools were like bringing a butter knife to a gunfight. They'd materialize your toy examples beautifully, but the moment you tried to replace your actual production denormalization – you know, the stuff that's keeping your business running – they'd tap out faster than a freshman at their first frat party.
The result? You'd end up with two problems:
  1. Your original denormalization nightmare
  2. A half-implemented "modern" solution that only works for 20% of your use cases
⚠️ The Partial Solution Trap
The worst position to be in is having a "modern" solution that only handles simple cases. You end up maintaining both systems, doubling your complexity, and explaining to stakeholders why some views are real-time and others are "eventually consistent" (aka broken half the time).

Enter Feldera: The Chosen One

But lo, from the depths of academic computer science comes Feldera. And they didn't just bring promises and venture capital. They brought something far more powerful: a mathematical proof.
That's right. A proof. Not "it works pretty well in our benchmarks." Not "we support most common SQL patterns." A mathematical fucking proof that they can incrementally maintain ANY standard SQL query.
ANY. SQL. QUERY.
Let that sink in. That impossible aggregation with multiple GROUP BYs? Covered. That recursive CTE that calculates organizational hierarchies? Done. That window function monstrosity that even you don't fully understand anymore? Feldera's got your back.
And here's the kicker: it's not just theory. The madlads actually built it. Code that runs. In production. Today.

[Diagram: "What They Promise vs What You Need". Simple COUNT(*) queries and a basic JOIN on two tables: ClickHouse ✅, Materialize ✅, Demo Success. Your actual production query (7-way JOIN with 3 window functions, 2 CTEs, GROUP BY ROLLUP, and a HAVING clause from hell): Everyone 💀, Back to Denormalization. Feldera ✅ (Mathematical Proof): 🎉 ACTUALLY WORKS.]

Why This Changes Everything

The difference between "we can materialize some views" and "we can materialize ANY view" is the difference between a toy and a revolution. It's the difference between "nice demo" and "holy shit, we can actually kill denormalization."
With Feldera, you can finally:
  • Take that heinous denormalized table that powers your dashboard and replace it with a real-time materialized view
  • Maintain perfect consistency across ALL your derived data, not just the simple stuff
  • Delete 90% of your data synchronization code
  • Stop settling for eventual consistency. Sure, it's "acceptable," in the same way that a root canal is "acceptable." But what if you could have strong consistency AND better performance? Why make the tradeoff when you don't have to?

The Inevitable Rise

Is Feldera everywhere yet? No. Rome wasn't built in a day, and the entire database industry won't transform overnight. But make no mistake: its rise is inevitable.
Why? Because math doesn't lie. When you have a provably correct solution to a problem that's plagued our industry for decades, adoption isn't a matter of "if" – it's a matter of "when."
The early adopters are already moving. They're deleting their sync jobs, burning their denormalization scripts, and sleeping soundly for the first time in years. Soon, the rest of the industry will follow. Not because it's trendy, but because it's correct.

The New World Order

Imagine a world where:
  • You can have arbitrarily complex views that update in real-time
  • Join performance is a solved problem because joins are pre-computed
  • You never have to choose between normalization and performance
  • Your materialized views are always, always consistent
  • You can sleep at night without dreaming about data synchronization bugs
This isn't science fiction. Engines in this family, from Materialize (built on Differential Dataflow) to Feldera, are making this a reality today. They're turning the impossible into the inevitable.

🔥 Before: The Code Genocide 🔥
  • sync_user_stats.py: 500 lines
  • update_triggers.sql: 300 lines
  • eventual_consistency_handler.js: 800 lines
  • retry_logic.py: 200 lines
  • data_reconciliation.sh: 400 lines
  • please_work.py: 1000 lines
  • 3am_hotfix.sql: 150 lines
🐛 Bug Reports
Total: 3,350 lines of sync hell

💡 After: The Solution
CREATE MATERIALIZED VIEW user_stats AS ...
✅ Done. Go home. Sleep well.

The Numbers That Matter

Differential Dataflow
  • Query Time: 5ms ⚡
  • Update Delay: <10ms ⚡
  • Consistency: ALWAYS ✅
  • Dev Sanity: Restored 😌
  • Bugs/Month: 0 🎯

Manual Denormalization
  • Query Time: 5ms ⚡
  • Sync Delay: 2-300s 🐌
  • Consistency: Eventually™ 🤷
  • Dev Sanity: 404 ❌
  • Bugs/Month: 12-15 🐛

The Death Rattle of Denormalization

So here we stand, at the precipice of a new era. Denormalization, that necessary evil we've carried for decades, is drawing its last breath. Its death won't be sudden – legacy systems will haunt us for years to come. But make no mistake: its days are numbered.
To the denormalized tables in production: we thank you for your service. You were a hack, but you were our hack. You kept our applications fast when we had no better options. But your time has come.
To the junior developers who will never know the pain of manual cache invalidation: count yourselves lucky. You're entering a world where materialized views just work, where consistency is guaranteed, and where you can focus on building features instead of maintaining data copies.
And to Differential Dataflow and its implementations: welcome. We've been waiting for you. Please, save us from ourselves.

Epilogue: A Prayer for the Future

As we stand on the brink of this new age, let us bow our heads in reverence:
O Differential Dataflow, harbinger of consistency,
Deliver us from denormalization.
Grant us the wisdom to recognize your power,
And the courage to abandon our hacky ways.
May our views be forever materialized,
Our queries forever fast,
And our data forever consistent.
In the name of the Delta, the Frontier, and the Holy Arrangement,
Amen.