Killer Code

What we code and how we code can have serious consequences

It came without warning. Hundreds of megatons of thermonuclear warheads were unleashed, with catastrophic consequences. In a flash, several world capitals became molten heaps of rubble. It was only through heroic efforts and brave decision making that the missile barrage was halted.

It took a few weeks after that fateful day to find the flaw. A poorly constructed function, just a few lines of code, enabled a cascade of errors that caused the early-warning system to register multiple false alarms. The AI-enabled War Operation Plan Response system went into full defense mode and launched a counter-strike.

Of course, this never happened. There have been incidents, though, that brought the world close to full-scale nuclear war. One such incident happened on September 26, 1983, when a Soviet early-warning system registered a US missile attack. A level-headed lieutenant colonel, Stanislav Petrov, realized the data made no sense and held off on launching a counter-strike:

“It was subsequently determined that the false alarms were caused by a rare alignment of sunlight on high-altitude clouds and the satellites’ Molniya orbits, an error later corrected by cross-referencing a geostationary satellite.”

Sometimes disaster cannot be avoided. Two Boeing 737 MAX jets crashed within five months of each other, killing everyone on board. All 737 MAX jets were subsequently grounded and an investigation was launched. The culprit? The Maneuvering Characteristics Augmentation System software, otherwise known as MCAS.

To be fair, many decisions beyond the software contributed to the catastrophe. Reusing the same base plane with larger engines to save cost and speed delivery forced a whole series of compromises. Those compromises created the need for MCAS, a poor band-aid over more systemic issues.

As the investigation has progressed, other flaws have been discovered and questionable practices uncovered. One issue that has gained attention is the use of low-cost outsourcing:

“It was controversial because it was far less efficient than Boeing engineers just writing the code…it took many rounds going back and forth because the code was not done correctly.”

While outsourcers were not connected to MCAS per se, they did touch significant portions of the hundreds of millions of lines of code used to fly the plane. On paper, the costs of outsourcing made economic sense, but the added time training, communicating, and correcting mistakes contributed to cost and deadline overruns in other programs such as the 787 Dreamliner.

The pressure to compete with Airbus, however, pushed aside any concerns about outsourcing. Said one former Boeing engineer:

“Boeing was doing all kinds of things, everything you can imagine, to reduce cost, including moving work from Puget Sound, because we’d become very expensive here.”

Boeing business heads viewed the development work as simply of the “maintain” variety:

“Rabin, the former software engineer, recalled one manager saying at an all-hands meeting that Boeing didn’t need senior engineers because its products were mature.”

In other words, engineering was viewed as a commodity. In that light, outsourcing made a ton of sense, as maintaining or tweaking code requires only junior skills. This is the typical business view of software development: it is just a time and cost equation.

In an infamous rant on Agile and Scrum practices, Michael O. Church shared his thoughts on business-driven engineering. In his view, it produces neither quality software nor happy developers. He contrasts that with engineering-led development:

“When engineers run engineering and set priorities, everyone wins: engineers are happier with the work they’re assigned (or, better yet, self-assigning) and the business is getting a much higher quality of engineering.”

When you create work environments that devalue software development, the result is lower productivity and worse code quality. It is not just the heavy use of outsourcing either. Open office spaces, abuse of Agile, lack of recognition, unrealistic deadlines, and other practices all contribute to low-morale engineering cultures.

Do you think software is going to be more or less complex over the next decade? The number of questions and answers on Stack Overflow continues to grow steadily. GitHub now has over 100 million repositories, up from fewer than 100,000 ten years ago. Every company under the sun is publishing APIs. There are currently 33 zettabytes of data in the world, with 2.5 quintillion bytes more generated every day. We are awash in complexity.

[Figure: The number of repositories on GitHub is rapidly increasing]

The reason for this growth is the insatiable desire for digital services, first in the consumer space and now in the corporate world. Technology is at the core of every new business initiative and opportunity. Even with this realization, organizations still treat the people who write software with little regard. Code is treated like any widget or cog, something you buy by the line.

Unfortunately, weak engineering culture, outsourced work, and broken software delivery practices have real consequences. Even in the best software organizations, fatal bugs can crop up, as they did with Uber’s self-driving car technology or when cancer patients received lethal doses of radiation therapy. A sobering stat from the Sustainable Computing Consortium says that there are as many as 20 to 30 bugs per 1,000 lines of code.

The irony is that writing software has never seemed easier. With abstractions, you can write a fairly complex application in hours using little more than APIs and a UI framework. But none of these abstractions is perfect. Joel Spolsky called them “leaky abstractions”, and they can often fail in ways that we cannot predict.

Every company that cares about delivering quality digital products needs to establish an engineering culture where development is led by developers. Joel suggests creating a “development abstraction layer” that allows teams to focus on creating high-quality software. Traditional command-and-control structures do not do this. Agile and Scrum teams do not do this. The whole business-driven engineering movement in enterprises does nothing other than produce poor-quality code at faster intervals.

How would you characterize your organization’s development team? What do you think could be done to foster a supportive engineering-driven culture?
