A Surprisingly Small Change
In the very early days of megabus, with the database server under heavy load, and the primary bottleneck, I’d cobbled together a quick mirroring system to buy us some time. I didn’t think it would actually work, but all it did was mirror every write to a second database server.
I knew that it was temporary - a sticking plaster which wasn’t that sticky. For the most part, the worst issue was some of the IDs would be reversed in the follower. At least, that was the worst, until we started getting overbookings.
It happened under specific circumstances - high load, busy routes.
The year was 2004, and PostgreSQL was around 7.3. I had built megabus.com and the database server was the main choke point. I was the only person working on building it and I needed replication.
PostgreSQL did not support replication at this time and there were no clear solutions on the market. pgpool was out there, but it was a personal project and there was a server limit of 2 — we definitely needed more. Slony was not quite released yet, and it looked like a lot of operational overhead. I really wanted something simpler.
I scoured the web, looking for a simpler solution until I stumbled across dbmirror.
The project contains a trigger written in C, a mirroring Perl script
and some setup SQL along with documentation.
The key features the system has are:
-Support for multiple slaves.
-Preserves transactions while mirroring.
-Per table selection of what tables get mirrored.\
source, but it ticked some boxes for me — particularly the multiple slaves and preservation of transactions.
My first step was to use the product as is, and it was a perfect fit for disaster recovery. I built and deployed the C module, and the perl () script for offsite backup.
The C module would intercept every write operation and write a table which captured the change data. The perl script would then replicate these changes, in my case to another database server across the internet — in the office. There would sometimes be a delay of a few minutes before everything was replicated, but that was ok.
The next step was to figure out how to make this work in a multi-master style. There was of course a fundamental problem. DBMirror, the perl script, ran separately and replicated the data “at leisure.” I needed each transaction to be replicated immediately after write. Otherwise, subsequent queries would give stale data.
I spent a bit of time understanding the code, but was at a loss as to how to bridge that gap. With few options left, I reached out to the author Steven Singer. He turned out to be really friendly and helpful.
The main problem with multi-master was that PostgreSQL at the time did not support two-phase commits.
There is no ‘ON COMMIT’ trigger in postgresql and even if there was, if you commit the transaction on the remote system first and the commit on the local system fails later, you can’t rollback the remote system. Alternatively if you commit the local system first and the remote fails, you’re out of luck. You need 2-phase commit for this.
I didn’t need that level of correctness. It was more important that I get a working, reliable system than one that was fully correct.
He also suggested PGCluster, but that was a fork of PostgreSQL focused more on HA than read-scaling. He also pointed out that it was not possible to replicate within a transaction — due to the isolation of ACID.
The final suggestion was to trim the perl script down to replicate a single transaction, then call it on commit.
I now had enough detail to solve the problem. I understood when and how the replication data was generated, and what I needed to do to replicate it.
Shelling out from PHP to run a perl script felt clunky at best, and highly unreliable at worst. I already had a DB layer, and what I could do was add the DBMirror.pl logic in there. What I needed to do was keep transactions intact, executed against the primary, then on commit, replicate those changes to the other servers.
All the changes went into my DB layer. It was a surprisingly small change, for a mighty difference.
With the new system, all the writes went to a primary server, and the reads were weighted across all the servers. Under load, no reads were routed to the primary — it needed all the juice just to write and replicate.
When this new system went live, I still remember the sense of relief. The previous write-mirroring system had kept me up at night - it was scary! This system felt stable and reliable, and there would never be another overbooking attributed to the replication.
This system, which was cobbled together, supported megabus.com from 2004 till 2010, at which point it switched over to the Java EE ticketing system.
PostgreSQL released native streaming replication in version 9.0, in 2010.
