Monday, September 29, 2008

Falcon in London

[I'm still playing catch-up after a five-month blogging hiatus]

By May of 2008, the Falcon team's technical center of gravity lay squarely within the European continent, so it was only fair and sensible that we gather where the combined travel effort would be at a minimum.

This was to be the first assemblage of the entire team, including new team members. We were also promised (and later cruelly denied) a rare sighting of Hakan K., who was unable to attend the all-hands staff meeting in January or the UC in April, as well as appearances by MC Brown and James Day.

MC did an outstanding job as a one-man advance team, deftly applying his hitherto unknown logistical genius to the task of arranging accommodations at a reasonably priced, centrally located, well-appointed and distinctly British hotel... "Holiday Inn," I think, was the name.

Big News

We started the week in London on an interesting note: Jim announced that he was stepping down from the Falcon project in order to pursue his cloud database idea. This wasn't a complete shock, because Jim had been itching to explore the concept for some time, but it was still big news.

The timing of the announcement was pretty good, because Falcon was nearly feature complete and the team had gained enough expertise to carry on. We were also relieved to find that Ann Harrison (Jim's wife) intended to remain with Falcon, because she has a holographic knowledge of much of the architecture and would provide a reliable back channel of communication to Jim--reliable because they share an office and she can, and does, holler questions to him over the partition.

We also had a full-time project manager and three new engineers, so Falcon was clearly in capable hands. The only remaining issue was whether Jim could arrange a deal with Sun to continue assisting the project (which has since been worked out, so not much has changed on a day-to-day basis.)

Falcon Immersion

A key item on the agenda was Falcon training for the new team members. Jim provided an excellent, in-depth review of the salient aspects of the Falcon design, reinforced by line-by-line walkthroughs of certain core operations.

I took notes which will ultimately make their way into Falcon design documents and even the code itself (the stringent use of source code comments within Falcon warrants further discussion, as does the overall coding style. I will address this in a future post.)

It was a long week of long days, and we covered a considerable amount of material (I will also cover some of these in future posts):

  • Record insert process, from StorageInterface to data page
  • Serial Log record formats
  • Page Cache design and areas for optimization
  • Internal SQL engine
  • Low memory operations and record cache maintenance
  • Falcon on-disk structure
  • Falcon indexes, design and operation
  • Page types: Data, Index, Inventory, Overflow, Blobs
  • Engine initialization sequence
  • Falcon handler and the MySQL Storage Interface API
  • Overview of the Falcon Storage classes (StorageConnection, StorageTable, etc.)
  • Synchronization objects
  • Replication: Statement-based vs Row-based

Statement-based Replication

So let's explore that last item a bit.

You know how it goes: Someone asks an innocent question in a room full of people, and before you know it, an hour of unintended discussion ensues. Such was the case with Falcon and replication.

The question was: Why doesn't Falcon support statement-based replication? The complete answer is complicated and worthy of a separate post, so I will only summarize here.

Falcon supports row-based replication (RBR), where every row event on the master lands in the binlog and is subsequently pulled in by the slaves. One advantage of RBR is also its greatest disadvantage: everything is logged. That is fine except for unqualified statements like "DELETE FROM T1", which log every row in the table, or for operations that are later rolled back.
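To put rough numbers on that trade-off, here is a toy C++ model of what lands in the log for an unqualified DELETE FROM T1 against a 1,000-row table. The BinlogEvent type and both logging functions are hypothetical and have nothing to do with MySQL's actual binlog format (real row events pack many row images into each event), but the shape of the result holds: one entry under statement-based logging, and a volume that grows with the row count under row-based logging.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a binlog entry; real binlog events are binary
// structures with headers, table maps, and row images.
struct BinlogEvent {
    std::string kind;  // "statement" or "row"
    std::string text;  // statement text or a description of the row change
};

// Statement-based logging: the statement itself is the event, no matter
// how many rows it touches.
std::vector<BinlogEvent> logStatementBased(const std::string& statement) {
    return {BinlogEvent{"statement", statement}};
}

// Row-based logging (simplified to one entry per row): every affected row
// is recorded individually.
std::vector<BinlogEvent> logRowBased(const std::vector<int>& deletedKeys) {
    std::vector<BinlogEvent> events;
    events.reserve(deletedKeys.size());
    for (int key : deletedKeys) {
        events.push_back(BinlogEvent{"row", "delete key " + std::to_string(key)});
    }
    return events;
}
```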

On the other hand, statement-based replication does not handle non-deterministic or concurrent operations very well, and must therefore rely upon the assumption that two statements won't have any interaction. Ann puts this very succinctly:

"Falcon does not and will not support statement based logging because Falcon transactions are not serializable, and in the absence of serializable transactions, statement based logging does not produce consistent results."
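Ann's point is easiest to see with a toy example. Assume a snapshot-style isolation in which each transaction reads the database as of its start; the Db struct and the functions below are hypothetical and are not Falcon code. Transaction A runs INSERT INTO t1 VALUES (42) while transaction B, which started earlier, runs INSERT INTO t2 SELECT COUNT(*) FROM t1. On the master, B counts rows from its older snapshot. A commits first, so the slave replays A's statement and then B's, and B's count now includes A's row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy database: t1 holds values, t2 holds counts computed from t1.
struct Db {
    std::vector<int> t1;
    std::vector<std::size_t> t2;
};

// INSERT INTO t1 VALUES (v)
void insertIntoT1(Db& db, int v) { db.t1.push_back(v); }

// INSERT INTO t2 SELECT COUNT(*) FROM t1, with the SELECT evaluated
// against whatever view of t1 the transaction is allowed to see.
void insertCountIntoT2(Db& db, const Db& view) { db.t2.push_back(view.t1.size()); }

// Master: B's snapshot is taken before A commits, so B never sees A's row,
// even though A commits (and is binlogged) first.
Db runOnMaster(const Db& initial) {
    Db db = initial;
    const Db snapshotForB = initial;
    insertIntoT1(db, 42);                 // A commits first
    insertCountIntoT2(db, snapshotForB);  // B reads its older snapshot
    return db;
}

// Slave: replays the logged statements one after another in binlog order,
// so B's SELECT sees A's row.
Db replayOnSlave(const Db& initial) {
    Db db = initial;
    insertIntoT1(db, 42);
    insertCountIntoT2(db, db);
    return db;
}
```

Starting from t1 = {1, 2, 3}, the master ends up with 3 in t2 and the slave with 4. In a serializable execution, B would either see A's row or commit before A, and replaying in commit order would reproduce the master's result.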

As I understand it, InnoDB gets around the problem by locking records touched by subqueries, obviously at the expense of concurrency. For Falcon to support SBR, the same would hold true, requiring several expensive changes that would also affect performance:

  • Query expressions that drive data changes must be locking
  • Autoincrement must be serialized
  • Implement next-key and end-of-table locks
  • Other stuff

This is not the complete list of changes that we'd have to make to support SBR, nor does it take into account potential enhancements to MySQL replication, but it captures the essence of the issue.
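Of those items, serialized autoincrement is the easiest to picture. One plausible reason SBR needs it: the slave regenerates autoincrement values when it replays an INSERT, so each statement's values have to come out as a predictable, contiguous block rather than interleaved with other sessions' values. Here is a minimal C++ sketch, assuming a single table-level generator guarded by a mutex; the class name is hypothetical and this is neither Falcon nor InnoDB code.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical table-level autoincrement generator. Each statement reserves
// its whole block of values under one lock, so the block is contiguous and
// cannot be interleaved with values handed to concurrent statements.
class SerializedAutoIncrement {
public:
    // Reserve 'count' consecutive values; returns the first one.
    std::uint64_t reserve(std::uint64_t count) {
        std::lock_guard<std::mutex> guard(mutex_);
        const std::uint64_t first = next_;
        next_ += count;
        return first;
    }

private:
    std::mutex mutex_;
    std::uint64_t next_ = 1;
};
```

This sketch holds the lock only briefly because it assumes the row count is known up front. For something like INSERT ... SELECT, where it is not, the lock would have to be held for the whole statement, and that is where the concurrency cost comes from.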

(My interest in replication stems from a two-year stint on the replication project at Pervasive Software; however, that was a peer-to-peer implementation employing a change-capture mechanism rather than binlogs--a fundamentally different approach.)

Falcon Schedule

[Disclaimer: I cannot disclose product release dates. Sorry.]

Release planning is a black art, subject to the hazards of miscalculation, fuzzy estimations and plain old bad luck. Our biggest question at the London meeting was how to coordinate the performance optimization effort with that of fixing stability and managing the bug load. Assigning a block of time to any one of these tasks, much less all three, is not only difficult but outright dangerous if not done carefully. But it had to be done, and so we did.

Assigning priorities is a classic problem in software development, but the textbook approach doesn't always apply. For example, performance optimization should be done late in the project, right? Ok, but what if performance is a major feature, or what if performance in some cases is really bad? Isn't that a bug? And what if the performance fix requires a change to a core component, such as the synchronization model or caching algorithm? Wouldn't you want that sort of change in earlier rather than later?

I suppose with well-defined release criteria and well-defined performance metrics and well-defined defect criteria and well-defined quality metrics, the answers to such questions might be readily apparent, but alas, ours is not the tidy, frictionless universe of textbook theory. No, ours is a messy, non-Newtonian hodgepodge of money, marketing and mission creep, an existential riot fueled by personality, tempered by perception and begging for order in an unordered cosmos.

But I digress.

Actually, we really do have well-defined release criteria and even some pretty good performance criteria, although "good performance" has been something of a moving target for Falcon. The biggest unknown was the time required to bring the bug count under control.

[Disclaimer: I am wholly unqualified to comment upon MySQL 5.1]

After considerable and sometimes heated debate, we did manage to hammer out a shaky consensus on a GA date, saddled with preconditions and "only ifs" as it was. The Falcon schedule has since been revised, and last week in Riga it was thoroughly rehashed again within the context of the Server 6.0 release schedule.

And that's all I have to say about that.

Pilgrimage

The stark reality of the Falcon team meetings is that we spend 80% of the time sitting around a conference table screaming at each other.

No, that's not true. Let me try again.

The reality of our meetings is that we spend a long week of long days in a conference room digging into technical stuff. Despite the travel overhead, I find these meetings to be extremely productive and personally energizing.

In London, there were only two things I absolutely had to do: Visit the British Museum, and, well, see for yourself.

I won't bore you with museum photos, but I did manage to rope Kevin, Vlad and Vlad's wife, Tatyana, into making the pilgrimage with me to Abbey Road Studios. Tatyana was kind enough to shoot the three of us walking across the zebra crossing. Unfortunately, some bearded dude jumped in front of us while we crossed.

He didn't say much. I think he was French.

Saturday, September 27, 2008

Troublemakers, Part I

Philip Stoev, Software Engineer

The mission of the System QA team is to beat the living daylights out of a MySQL release before it is set free into the world. Philip Stoev is on the System QA team, not the Falcon team, but bless his Bulgarian heart, he's caused more trouble on Falcon than anyone in recent memory, except perhaps Peter Z.

The "trouble" of which I speak is best illustrated by this vignette: Imagine that HMS Falcon is on a shakedown cruise in the Mediterranean. Weather is clear, crew is happy, course is plotted. Midshipman Stoev voices a concern.


"I beg your pardon, Captain, but I believe the ship has run aground."

"Nonsense, Mr. Stoev. The ship is quite sound. She floats upright, the sails are full and the crew is happy with rum and song."

"Yes, of course, Captain, but if you'll please have a look belowdecks, sir. The bilges are awash, I'm afraid, and the keel is broken."

"Nonsense, Mr. Stoev, you imagine the worst. Now, hand me my spyglass, if you please, and go on about your duties."

"As you wish, sir."

Get the picture?

Philip does his job so well that he can, with the power of a two-line email, stop an entire release dead in its tracks. Last year, this caused tremendous frustration because System QA didn't begin testing until after a release clone-off, which is perilously late in the release cycle. One or two showstopper bugs could, and did, gum up the works for weeks, even months.

Since then we've fixed the process, and now System QA tests are run regularly against pre-release code. There even exists an array of Pushbuild2 servers dedicated solely to System QA stress tests, so each push into the tree results in a battery of stress test executions.

"Executions", indeed. These unassuming, cold-hearted tests, cobbled together from Perl and PHP, efficiently render the very fabric of our precious little engine into hot, screaming shards of digital shrapnel.

Case in point: falcon_online_alter.
This little gem is designed to "exercise" Falcon's online alter capability in the same sense that having your shirt pulled over your head and being kicked down the stairs is "exercise". If I sound bitter, it's only because I've been chasing stress test bugs for the better part of two months.

Let's take an example:

ALTER TABLE t1 ADD INDEX i1 (s1), ADD INDEX i2 (s2);

In MySQL 6.0, "ONLINE" is implicit. Prior to the ONLINE ALTER implementation in Falcon, ADD INDEX worked like this:

1. Get shared lock on table T1
2. Create temporary table #sql-abc with new attribute (e.g. index)
3. Copy records from T1 to #sql-abc (index created during copy)
4. Get exclusive lock on T1
5. Rename T1 to #sql-xyz
6. Rename #sql-abc to T1
7. Drop #sql-xyz

This is a horrendously inefficient way to create an index, so Falcon now creates them online such that a new index is created and populated in place--no temporary tables, no copying, etc. (the same is true with ADD COLUMN, although it's temporarily disabled due to a bug.)


So, with online alter, ADD INDEX works like this:

1. Server queries Falcon to see if it can do the operation online
2. Falcon says, "Yeah, sure, go for it."
3. Server commands online alter
4. Falcon creates new index
5. Falcon populates new index

Falcon briefly locks the table during the DDL phase, but then normal operations resume, even while the index is populated. Pretty cool, right?
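The five-step handshake above can be modeled as a toy engine class. Every name below (ToyEngine, checkAddIndex, createIndex, populateIndex, serverAddIndex) is hypothetical and deliberately not the MySQL 6.0 storage engine API; the sketch only shows the shape of the exchange: ask, create the index in place, populate it, with no temporary table and no table copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class AlterSupport { Online, CopyRequired };

struct Index {
    std::string name;
    std::vector<int> keys;  // sorted entries built from the indexed column
};

class ToyEngine {
public:
    // 'rows' holds the values of the single indexed column (s1).
    explicit ToyEngine(std::vector<int> rows) : rows_(std::move(rows)) {}

    // Steps 1-2: the server asks; this toy engine always says yes.
    AlterSupport checkAddIndex() const { return AlterSupport::Online; }

    // Step 4: create the empty index. In Falcon this is the brief window
    // when the table is locked; locking is not modeled here.
    void createIndex(const std::string& name) { indexes_.push_back({name, {}}); }

    // Step 5: populate from the existing rows in place. Concurrent DML
    // during population is not modeled either.
    void populateIndex(const std::string& name) {
        for (Index& idx : indexes_) {
            if (idx.name == name) {
                idx.keys = rows_;
                std::sort(idx.keys.begin(), idx.keys.end());
            }
        }
    }

    const Index* findIndex(const std::string& name) const {
        for (const Index& idx : indexes_) {
            if (idx.name == name) return &idx;
        }
        return nullptr;
    }

    std::size_t rowCount() const { return rows_.size(); }

private:
    std::vector<int> rows_;
    std::vector<Index> indexes_;
};

// Server side of ALTER TABLE t1 ADD INDEX i1 (s1): step 3 issues the
// commands once the engine agrees to work online.
bool serverAddIndex(ToyEngine& engine, const std::string& name) {
    if (engine.checkAddIndex() != AlterSupport::Online) {
        return false;  // would fall back to the copy-and-rename path (not modeled)
    }
    engine.createIndex(name);
    engine.populateIndex(name);
    return true;
}
```

Compared with the seven-step copy-and-rename sequence, the table's rows never move; only the new index structure is built.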

Well, sure, except when Philip runs his random query generator that issues thousands of random ALTER operations emanating from many client threads. This is a wholly unnatural act--I mean, what sane application performs thousands of ALTERs from multiple clients? It's nuts, just nuts.

But, and this is a big but, this test finds stuff. All sorts of stuff. Imagine putting a beautiful Ford 4.6L 3-valve SOHC engine up on a test stand, bolting it down, then running it at 8,000 RPM until it flies apart. Automotive engineers actually do this, so why not software engineers? We need to know where the weak spots are and, ideally, fix them. And so we do.

Chill/Thaw

Philip also has a chill/thaw stress test, falcon_chill_thaw. This, too, is pure, resident evil, and I mean this in the kindest way.

The Falcon "chill/thaw" mechanism works like this: Each record in the record cache consists of a "system" part and a "data" part. When the total amount of record data associated with an active transaction exceeds a predefined threshold, Falcon "chills" all of the records in the transaction by writing the data portion of each record to the serial log and freeing it from the record cache. When necessary, Falcon "thaws" individual records by restoring their data portions from the serial log.


By default, the falcon_record_chill_threshold is 5MB, which means that when the size of a transaction exceeds 5MB, Falcon moves the record data for the transaction into the serial log. For the falcon_chill_thaw stress tests, Philip cranks the record chill threshold down to 1 byte, which means every record for every transaction is chilled and thawed countless times.

Again, this is a wholly unnatural act, no doubt illegal in the state (not country) of Georgia, and, in fact, I believe we are going to disallow chill thresholds lower than 32K. However, this test has exposed flaws in the chill/thaw code path that ordinarily would never see the light of day.
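Here is a toy C++ model of the chill/thaw bookkeeping described above, assuming a single transaction that tracks how many bytes of record data it holds in the cache. Record, ToyTransaction, and the std::map standing in for the serial log are hypothetical simplifications, not Falcon's classes. The proposed 32K floor is shown as a separate helper, since at the time of writing it is only planned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Proposed (not yet enforced) floor on the chill threshold.
constexpr std::size_t kProposedMinChillThreshold = 32 * 1024;

std::size_t clampChillThreshold(std::size_t requested) {
    return std::max(requested, kProposedMinChillThreshold);
}

struct Record {
    int id;            // "system" part: always stays in the record cache
    std::string data;  // "data" part: emptied while the record is chilled
    bool chilled;
};

class ToyTransaction {
public:
    explicit ToyTransaction(std::size_t chillThreshold) : threshold_(chillThreshold) {}

    // Add a record; if cached data now exceeds the threshold, chill every
    // record in the transaction.
    void addRecord(int id, std::string data) {
        cachedBytes_ += data.size();
        records_.push_back(Record{id, std::move(data), false});
        if (cachedBytes_ > threshold_) chillAll();
    }

    // Thaw one record by restoring its data part from the serial log.
    const std::string& thaw(int id) {
        Record& rec = find(id);
        if (rec.chilled) {
            rec.data = serialLog_.at(id);
            rec.chilled = false;
            cachedBytes_ += rec.data.size();
        }
        return rec.data;
    }

    bool isChilled(int id) { return find(id).chilled; }
    std::size_t cachedBytes() const { return cachedBytes_; }

private:
    void chillAll() {
        for (Record& rec : records_) {
            if (!rec.chilled) {
                serialLog_[rec.id] = std::move(rec.data);  // "write" data part to the log
                rec.data.clear();  // moved-from string: make it definitely empty
                rec.chilled = true;
            }
        }
        cachedBytes_ = 0;
    }

    Record& find(int id) {
        for (Record& rec : records_) {
            if (rec.id == id) return rec;
        }
        throw std::out_of_range("unknown record id");
    }

    std::size_t threshold_;
    std::size_t cachedBytes_ = 0;
    std::vector<Record> records_;
    std::map<int, std::string> serialLog_;  // stand-in for the serial log
};
```

With a threshold of 1 byte, as in falcon_chill_thaw, nearly every addRecord call chills the whole transaction and nearly every later read has to thaw, which is why that test works the code path so hard.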

I have in a logfile a beautiful example of three threads trying to thaw the same record. Falcon is designed to resolve this intrinsic race condition with a compare-and-swap of the data pointer in Record::thaw(). However, the code path leading to the CAS is not (yet) properly reentrant, so we get all sorts of interesting behavior as a consequence of multiple concurrent thaws.
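The compare-and-swap idea itself is simple enough to sketch on its own. In this toy C++ version (ToyRecord, readFromSerialLog, and the free function thaw are hypothetical, not Falcon's Record::thaw()), each thread that finds the record chilled builds its own copy of the data, but only one compare_exchange_strong on the data pointer succeeds; the losers discard their copies and use the winner's. The wins counter exists only so the outcome can be checked. What the sketch deliberately leaves out is the hard part: making the path leading up to the CAS, and the freeing of data when a record is chilled again, safe under concurrency.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Toy record whose data part is published through an atomic pointer.
// nullptr means "chilled": the data lives only in the (simulated) serial log.
struct ToyRecord {
    std::atomic<const std::string*> data{nullptr};
    ~ToyRecord() { delete data.load(); }
};

// Stand-in for reading a record's data back from the serial log.
const std::string* readFromSerialLog() { return new std::string("payload"); }

// Any number of threads may race to thaw; exactly one CAS installs its copy.
const std::string* thaw(ToyRecord& rec, std::atomic<int>& wins) {
    const std::string* current = rec.data.load(std::memory_order_acquire);
    if (current != nullptr) return current;  // already thawed by someone else

    const std::string* mine = readFromSerialLog();
    const std::string* expected = nullptr;
    if (rec.data.compare_exchange_strong(expected, mine,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
        wins.fetch_add(1);
        return mine;  // our copy is now the record's data
    }
    delete mine;      // lost the race: discard our copy
    return expected;  // on failure, 'expected' holds the winner's pointer
}
```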

On a live system, this would likely manifest in seemingly random, untraceable errors.
So, it's better to find this stuff now, but it is necessarily time-consuming.

The simple fact remains, however, that Philip is responsible for exposing some serious synchronization issues within Falcon, and for that I am grateful.

I think.

New Falcon Engineers

Apart from a new project manager, the Sun acquisition also gifted our beloved little project with three Sun engineers from DBTG. I will spare the reader personal chagrin and preemptively counter the reflexive incantation of Brooks's Law by stating up front that, in the case of Falcon, you would be wrong. "Mythical Man-Month" my ass, we needed the help.

Naturally, we don't want just anyone hacking Falcon (yes, we're open source, but we have to ship for heaven's sake), and so we didn't get just anyone; we got three top-notch engineers, each of whom brings something to the table.

(Free idea: "Heaven's Sake", a tangy Japanese alcoholic beverage made from fermented rice, available at select wine shops near you.)

Olav Sandstaa, Senior Software Engineer

Here is Olav's official bio: "Olav has developed database systems for the last eight years. He worked on the internode communication system, the database kernel as well as performance optimization of HADB. Lately, he has worked as team lead for the performance team for Java DB. He has also made key contributions to design, architecture and technology evaluations for Project Cloudberry."

"Olav has a master's degree on the topic of parallel execution of relation algebra database operations and a PhD in storage systems for digital video archives from the database research group at the Norwegian University of Science and Technology."

What this actually means is that Olav--sorry, Doctor Olav--is smarter than you. And me. What this also means is that he is really good at Scrabble, Norwegian Database Edition.

The cool thing about Olav is that he doesn't wear his impressive credentials on his sleeve, though he does have "Database God" tattooed behind his left ear (not seen in photo.)

Last week in Riga, I asked Olav what HADB was. He said, "It's like MySQL Cluster, except better." Ouch.


John Embretsen, Software Engineer

Again, I will punt and provide John's official bio:

"John joined Sun in 2005, after graduating from the Norwegian University of Science and Technology. He has been working on Java DB QE/QA, including functional, long-running, usability and compliance testing. He recently gained committer privileges in the Apache Derby community, and have lately been working on adding JMX management and monitoring features to Java DB."

John's official bio fails to mention that he is also the tallest member of the Falcon team (sorry, Kevin), and has an impressive record playing Norwegian basketball, which in the U.S. we call "basketball". At right is a photo of John in a typical just-got-back-from-rigorous-mountain-hiking-and-now-I-will-sing pose.

Fortunately for us, John accepted a role in Falcon QA, where help was desperately needed.
One of his first tasks was to dig through the suite of testcases relegated to a vile little backwater we call "falcon_team". These are tests that fail for no apparent reason on the pushbuild regression servers, so to keep the matrix green we simply move uncooperative tests out of the way so they can be dealt with later. Well, later is now, because "falcon_team" is our mutant cousin thumping around in the attic, and it's time to walk up those stairs.

Which brings to mind one test that John insists on holding up to the cold light of day, a test that previously failed intermittently with odd warnings and an error. He now reports that after I implemented the ONLINE ALTER features in Falcon, this test still fails but the row numbers are off by one.

This seemingly innocuous observation carries little meaning to the untrained eye; however, a set of discrete database operations resulting in row numbers being off by one is like finding an extra proton in an atom--you can't simply brush such a thing aside. The magnitude of the discrepancy is small, but the implication of the discrepancy is immense. I am, perhaps, being a bit hyperbolic (hydrochloric?), but the essential fact is that this means more digging into a feature that should've been wrapped up weeks ago. Thanks, John. Love ya, bro.


Lars-Erik Bjørk, Software Engineer


"Lars-Erik joined Sun in July 2006, shortly after graduating from the Norwegian University of Science and Technology. He has been working 18 months on HADB kernel development and 4 months on PostgreSQL."

Ok, that's pretty cool: HADB kernel development and PostgreSQL experience. Some of you may also recall that Lars-Erik had a previous career in the music industry as the lead vocalist in the group "A-ha", which found brief popularity in the '80s.

Following the dizzying spike of international adoration garnered from the grammatically curious "Take On Me," Lars-Erik left the band to pursue a career in software engineering. Astute readers will note that Lars-Erik remains physically unchanged from his pop music days, which he attributes to a superior genetic makeup, far-infrared saunas, rock-filtered Norwegian spring water and volcanic mineral baths.

Since joining Falcon in May, Lars-Erik has been working on a variety of bugs. Most recently (in fact, so recently that he doesn't know it yet), he's working on a couple of chill/thaw bugs and another with the Falcon interface to the Information Schema. (Lars-Erik, if you're reading this, talk to Kevin.)



Stalled Thread

Strong start. Zero follow-up. Let's get this going again, shall we?
It's been a busy five months since the last post, so here's an overview of what we've been up to:

May: New Falcon Engineers
Synergy happens. Details in the next post.

May: Falcon Meeting in London
Big news from Jim. Training for the new folks. Details to follow.

June: Falcon 6.0.5 Alpha
Team is busy with bug fixes trying to keep the release train rolling.

July: Falcon Meeting in Boston
Got the band back together, this time on our turf.

August: Falcon 6.0.6 Alpha
Another busy month of bug fixing. Feature complete, finally. Some performance stuff, too.

September: MySQL/Sun Meeting in Riga
All-hands engineering meeting, very productive for Falcon. Details to follow.