February 14th, 2008
Releasing software is a black art. It takes a little luck and a little magic. Doing it on time is a major headache.
“Any day now, any day now,
I shall be released.” - Bob Dylan, Before the Flood
To quote Orson Welles in a famous advertising campaign for Paul Masson, “We will sell no wine before its time.”
Software doesn’t work that way. While there are perfectionists in every walk of life, wringing out the very last .01% of bugs in any software will take 99.99% of its total development time. It’s the modern software industry’s dirty little secret: all software has bugs. It’s a constant war between product and time. You either have 1) a fixed deadline with a not-ready-for-prime-time software product or 2) a slipped deadline with a slightly less not-ready-for-prime-time product. The battle’s always the same, but a good software operations manager knows when to throw in the towel and wait for the next round. As the Hitchhiker’s Guide to the Galaxy teaches, always know where your towel is.
In fact everything I learned about production software releases can probably be traced back to the Hitchhiker’s Guide. More specifically, the 3rd book in the series, “Life, the Universe, and Everything” (New York: Harmony Books, 1982, Douglas Adams). The most relevant lesson is Bistromathematics. It can be summed up this way:
- Numbers written on restaurant bills within the confines of restaurants do not follow the same mathematical laws as numbers written on any other pieces of paper in any other parts of the Universe.
- The amount owed on the check is never the same amount as the sum of the amounts owed as calculated by the individuals.
At the end of the meal, there are bargainings, discussions, negotiations, written calculations, rationalizations, justifications, and prognostications. In the microseconds before the waiter approaches the table to collect the bill, all of the numbers and participants magically reconcile themselves without anyone having to run to the ATM, run to the cash register to break a $20, or feel unfairly imposed upon by having to contribute an extra buck or two.
Our last software release was like that. We had a fixed deadline, multiple streams of development going on, multiple sub-projects, bugs, fixes, personnel changes, late checkins, and no clear view of how we were ever going to pull it all together in time. Just when we thought it was going to be infinitely improbable we were going to finish on time–poof–something happened. We ran out of bugs to fix and everything magically fell into place.
“It’s done.”
“It’s done?”
“It’s done.”
“Wow. It’s done.”
In the last release of our software, we had 42 release notes. Coincidentally, “42” is also the answer in the Hitchhiker’s Guide to Life, the Universe, and Everything. Somehow, while we weren’t paying attention, busy ripping out the guts of the old platform and replacing it, refactoring the code, optimizing the performance, and adding new features, the bugs all just disappeared.
We all stood back, breathed a sigh of relief, and marvelled at the magic.
Everyone else, on the other hand, said, “Oh, you just got lucky on that release. Let’s see you do that again with the next one!”
Little luck. Little magic.
Greg
~ : ~
January 29th, 2008
A couple of months ago, the product guys at my company decided to use Dell as our hardware platform and custom factory integration partner. (“Dude! You’re getting a Dell!”) It turns out that the Google enterprise search yellow box is produced by the very same people. Dell also seems to have shipped tens of millions of machines through this process, which tends to iron out a lot of kinks.

The first things that popped into the rest of the company’s collective noodles were questions and concerns about product definitions, platform configurations, worldwide support, parts replacement, and steps for factory hardware and software installation. The first thought that popped into software engineering’s collective mind was, “Can we get discounts on computers?”
It turns out we can. A deep philosophical debate then ensued. XPS 720 versus 720HC (factory overclocked, with hybrid liquid-radiator thermoelectric cooling and control circuitry). Vista 32 versus Vista x64. Core 2, X2, or Quad? ATI versus Nvidia, Radeon versus GeForce, FireGL versus Quadro? Nobody discussed the prices.
After cleaning all the drool off the monitors, everyone on the engineering staff came to the same conclusion on one thing: the new Dell Crystal 22″ HD flat panel monitors are the coolest. If you haven’t seen one of them, for about $1,200 undiscounted you can get on a list to hopefully backorder one sometime in your lifetime. Deep blacks, glossy technicolor hues, floating screen inside elegant metal and crystal smoked glass, webcam, speakers, and bragging rights.
In case there was any doubt why someone would pay $1,200 for a 1680×1050 22-inch display, I’ve included Dell’s own banner below.

For goodness’ sake, the thing even took home the Best of CES Innovations Award for 2008. You’ll be hard pressed to find anyone at our company who would disagree with that.
Our CFO and VP of Finance did.
Apparently it’s a luxury far beyond the reach of mere VPs. Not to be deterred, I quickly hatched a plan. I convinced our glorious CEO that this was a CEO monitor. All the CEOs were getting them this year. I told him it looks really bad when some big important customer comes into his little CEO office, sits across his little CEO desk from him, and can’t see through the edges of his monitor, has to listen to his Webcast from lame speakers cluttering up his “productive area”, can’t take pictures or video from his 2-megapixel Webcam of the customer promising to buy our software, or can’t view our online Website and brochures in stunning dark blacks, sharp images, crisp text, and brilliant color saturation with life-like detail.
He told me he’d think about it, so I bribed his administrative assistant with chocolate, made her fill out the purchase order, and found my asbestos coat for the flaming I’ll get for rogue purchasing.
Now all I need to do is pray that Dell comes out with a 27″ Crystal flat panel monitor sometime soon and hope I get the hand-me-down.
~ : ~
January 25th, 2008
The world loves statistics. Especially stand-out, world-beating ones. This is exceptionally true when it comes to cars and computers.
World’s fastest production car? Don’t try to nail it down. Things change pretty fast in that industry. The Guinness Book of World Records has validated, revalidated, and is now in the process of validating yet another record. On any given day, the title shifts. To ensure apples-to-apples comparisons, the records body follows a strict methodology. The procedure as outlined by Guinness involves putting a GPS tracking system on one of these cars, sending it out on a predetermined course, and then having it turn around and drive in the opposite direction within one hour. Top speeds from each run are averaged to obtain the official speed record.
What are the stats?
- Shelby Supercars (SSC) has a world record run of 257 mph in speed testing of its 1183 horsepower, twin-turbo V8 Ultimate Aero TT as tracked by a Dewetron GPS system. (author note: Nice stats!)
This beats the previous claims of the Koenigsegg CCR at 242 mph and the Bugatti Veyron’s unofficial speed of 253 mph. Not content with that, SSC has tested the Ultimate Aero in a wind tunnel at speeds up to 273 mph, where it remained aerodynamically stable.
Every once in a while, despite the statistics, there is an underdog that has the extra sizzle factor and promise of things yet to come that wins the hearts of the true aficionado. Mine’s the Veyron.

Unfortunately, as much as I’d like to spend all my time at the Bugatti factory driving these glorious machines, my day job is in the Security Information and Event Management space (SIEM, Gartner 5/2007). In this position, I do, however, get the need for speed and the ability to do something about it with a crack engineering staff. In between daily operations, sometimes I daydream about world records.
To recap, the making of an interesting world record would need:
- Something to shoot for–like some published industry statistics
- Methodology–some way to compare apples to apples
- Someone to compare to–anyone want to compete for pink slips?
- Sizzle–you could have the world’s best record, but make sure it’s something that a customer would care about
At the end of 2007, the performance statistics across all SIEM vendors for processing events-per-second (EPS), correlated-events-per-second (CEPS), and complex/real-world correlated-events-per-second (CCEPS) on a single machine were:
- EPS: 20,000
- CEPS: 10,000
- CCEPS: 5,000
That’s not bad for a Volkswagen (author note: Bugatti is owned by VW). But speed addicts and enterprise customers need more. To be able to process more events, one option is to split the network and security event information onto multiple machines. With today’s blended security attacks, however, splitting out data geographically or organizationally can lead to a false sense of security.
For instance, a hotel chain or a fast-food franchise network with several thousand networked locations could easily fall into this trap. A multi-faceted attack could individually test over time the security-in-depth at hundreds of different points and not be detected without proper correlation. Rolling all of the attacks up at a later point in time could result in a very effective, damaging, and expensive attack. Instead of seeing the pattern of testing against their network defenses, the company would never even know what hit them.
For 1,000 locations, each location spitting out a modest 1,000 events per second (EPS), they would need approximately 50 machines just to log the events at 2007 rates. Even with 50 machines, the correlation among all the devices and data sources would not be in real-time. In order to do correlated events per second (CEPS), you would theoretically need the strength, speed, and intelligence of 100 machines. Still, there is the remaining problem of how to feed all that data into the same place so it can get properly correlated. That adds a whole new level of architectural complexity to your solution. The next step would be to add multiple tiers of systems which distill the raw information to the next tier (and the next one) until you finally can guarantee all the information coming in is properly analyzed and correlated with all the rest.
This turns out to be a very high bar to jump over. You can kiss your assets goodbye trying to do that in real-time.
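The machine counts quoted for the 1,000-location example come down to simple division. Here is a minimal sketch in Python, assuming the end-of-2007 single-machine rates listed earlier in this post. The helper name `machines_needed` is made up for illustration, and the model ignores the overhead of moving data between machines, so treat the results as a floor rather than a sizing guide.

```python
import math

# End-of-2007 single-machine rates quoted in this post.
EPS_PER_MACHINE = 20_000    # raw events per second
CEPS_PER_MACHINE = 10_000   # correlated events per second


def machines_needed(locations: int, eps_per_location: int, rate_per_machine: int) -> int:
    """Smallest whole number of machines able to absorb the total event load.

    Hypothetical helper: assumes load spreads perfectly evenly across machines.
    """
    total_eps = locations * eps_per_location
    return math.ceil(total_eps / rate_per_machine)


total_load = 1_000 * 1_000  # 1,000 locations at 1,000 EPS each
print(f"Total load: {total_load:,} events per second")
print("Machines just to log:", machines_needed(1_000, 1_000, EPS_PER_MACHINE))
print("Machines to correlate:", machines_needed(1_000, 1_000, CEPS_PER_MACHINE))
```

Under those assumptions the total load is a million events per second, which is why the 1 million figure matters for a customer of this size.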
The traditional solution is to throw more hardware at the problem: more horsepower, more CPUs per box, and so on. Using Moore’s law as a guideline, and assuming that a doubling of transistors on a wafer every 18 months leads to a doubling of performance, that hotel or hamburger chain would have to wait about 10.5 years for the processing power to catch up.
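One way to arrive at a figure like 10.5 years is sketched below. It rests on assumptions of mine, not a formal derivation: performance doubles in whole 18-month steps, the starting point is the 10,000 CEPS single-machine figure, and the target is the chain's 1,000,000 events per second. The function name `years_to_catch_up` is hypothetical.

```python
import math


def years_to_catch_up(current_rate: float, target_rate: float,
                      months_per_doubling: int = 18) -> float:
    """Years until one machine reaches target_rate, doubling in whole steps.

    Crude model (an assumption): performance doubles once per period and
    nothing improves between doublings.
    """
    speedup = target_rate / current_rate
    # Whole doublings required; never negative if we are already fast enough.
    doublings = max(0, math.ceil(math.log2(speedup)))
    return doublings * months_per_doubling / 12


# 10,000 CEPS today versus 1,000 locations x 1,000 events per second.
print(years_to_catch_up(10_000, 1_000_000))
```

A 100-fold gap needs seven doublings, because six only gets you to 64 times, and seven 18-month periods is ten and a half years.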

There’s some hope. As faster hardware architectures come on board, there’s a trend to multi-core and multi-CPU models. A high-end Dell PowerEdge box right now comes with dual quad-core Xeons (author note: for the non-computer geeks, that means 8 really fast cores). If you add to that various specialized processors like network accelerators, encryption accelerators, pattern matching accelerators, and disk performance and storage accelerators, you can start to stomp out a few of the artificial hardware performance barriers.
At the end of the day, there are respectable gains, but software gains still remain unexploited.
We’ve decided to fundamentally break that model. Imagine a supercar, but instead of having a single 1,100 horsepower engine, you had 8 x 400 horsepower engines that you could fully exploit with up to another 1,000 x 10 horsepower specialized engines for each wheel. How you would selectively use that power would change dramatically. With a little coordination and a little more smarts, aka “software”, our combination of off the shelf and commodity computing parts changes how SIEM software works. Every little horsie is now a capability, available for negotiated sale or rent to whichever software service is in need of it most at the time. Believe me, security event management for large enterprises can gobble a lot of it and still be hungry.
Initial tests of our shiny new service-oriented software architecture (SOA), combined with our lateral-thinking hardware configuration, have yielded extremely interesting results. Not only can we configure and add capabilities to our SIEM on the fly, the performance has leaped off the curve. Our first pass shows 3-5 times the industry average performance on one machine. One special controlled test using real devices and data showed a 1,600 times speedup–that part of our software is definitely not going to be a bottleneck.
Instead of dreaming about the French countryside, rolling hills, open highways, and the roar of a supercar, we’ve been dreaming about how far we can push this new service oriented software architecture. Numbers of 1 million correlated events per second (CEPS) have been whispered around the hallways.
1 million correlated events per second would allow either of the aforementioned customers to fully correlate, in real time, very large numbers of events per second from any of their networked devices. For the first time, they would have a SIEM that could fully scale to the needs of their business: completely, with defense in depth, and end to end.
So gazing down the road for 2008, foot hovering over the accelerator, we have:
- A shot at 1M CEPS
- A way to benchmark how many things are thrown at our box
- Published performance numbers for the SIEM industry
- And any number of customers who have had to accept incomplete real-time correlation across their whole enterprise
That sounds like a world record in the making to me.
Greg
~ : ~