If you've ever built something slightly bigger than a todo app, you've probably hit that moment where two different services need to agree on something together. And of course, they decide to disagree at the worst possible time.
Imagine you're running a tiny online store: one service handles payments, another handles inventory. Someone buys a laptop. Payment succeeds… but inventory fails… and now you've charged the customer for a ghost laptop that doesn't exist. Not ideal. Your support inbox will cry.
That messy moment, when a multi-step operation across different systems needs to act like one operation, is exactly why distributed transactions exist.
So what's the actual problem here?
When everything is inside the same database, life is simple: wrap it in a transaction, hit commit, and the DB ensures things happen atomically. Either everything succeeds, or everything rolls back.
But once you split your system into multiple services or multiple databases, there's no single magical "transaction manager" holding everything together. Every service is living its own life, responding at its own speed, failing in its own creative ways.
The core nightmare is this:
- What if one part of a distributed operation succeeds and another fails?
- How do you avoid half-done work leaking into the real world?
- You want everything to succeed together, or fail together, even though everyone is running separately.
So how do we fix it?
Well, you can either:
- Give up and shove everything back into one database (tempting, honestly), OR
- Build a protocol where all the participants can agree before committing.
Enter Two-Phase Commit (2PC) - the classic "let's take a vote" approach to distributed transactions.
2PC: The Group Project of Distributed Systems
If you've ever survived a college group project, you already understand 2PC emotionally.
There's a coordinator (your over-organized friend) and participants (everyone else pretending to care). The coordinator asks everyone if they're ready before turning in the project.
It works like this:
Phase 1: "Can you do this?" - the Prepare phase
The coordinator pings every participant:
Hey, can you commit your part? Don't do it yet - just say yes or no.
Each participant checks everything internally:
- "Do I have the required data?"
- "Is my local transaction valid?"
- "Am I sure I won't blow up in the next 10 seconds?"
If yes, they reply "I'm prepared."
If not, they reply "Nope, I'm out."
Importantly: When a participant says "I'm prepared," it locks the resources it needs. It's basically saying: "I'm ready and holding everything-just waiting for your signal."
Phase 2: "Alright, go!" - the Commit phase
If everyone voted "yes," the coordinator broadcasts:
Cool, let's commit.
All participants finalize their local transactions. The world continues in harmony.
If anyone voted "no," the coordinator says:
Abort mission.
Everyone rolls back.
Simple idea. Elegant. Old-school. And still used in many systems.
But… 2PC isn't perfect
It works, but it's also a little fragile - like that one friend who shuts down when plans change.
The biggest problem? The coordinator.
If it dies at the wrong time, participants might get stuck holding locks, unsure whether to commit or abort. They can't guess, because guessing leads to inconsistent data - the exact thing we're avoiding.
This is why you'll hear people say things like:
- 2PC can block forever.
- 2PC adds latency.
- 2PC is sensitive to network hiccups.
- 2PC scales… meh.
It's reliable, but it's also strict, slow-ish, and not great at handling failure gracefully.
So why do people still use it?
Because sometimes you need strict atomicity across systems, and you don't mind the overhead.
Databases like Postgres, MySQL, and many transaction managers support it. Some message brokers use variations of it. It's one of those "boring but solid" protocols that gets the job done when consistency really matters.
It's like an old, sturdy Toyota Corolla of distributed systems - not fancy, not the most modern, but dependable.
TL;DR - the human version
Distributed transactions exist because once your data is spread across multiple services, nothing automatically behaves atomically anymore. 2PC tries to fix this by asking everyone to agree before committing.
Phase 1: "Are you ready?"
Phase 2: "Okay, do it."
It works. It's safe. But it can block and doesn't love failure scenarios.
Modern systems often use alternatives like idempotent operations, outbox patterns (i'm using this in a video transcoder system), etc., especially in microservices - but 2PC is still very much part of the distributed systems toolbox.
