Bright magenta, cyan, and dark brown diagonal brushstrokes on a black background

War rooms

Who needs theatre


Despite all the emphasis on being rational and logical, technology professionals love being overly dramatic.

All hands on deck in a shared space to solve a customer problem?

Let’s call it a war room.

Many revolutions around the Bhagwan Surya ago, I was a full stack developer at a robotics company.

You get to see a ton of interesting scenarios when your software interacts with the real world instead of digital spreadsheets.

Industrial spaces aren’t known to be on the cutting edge of web technologies. Joys of getting logs by sending someone physically on-site to pull them from the bare metal.

For example, we can do all the testing we want in the other building to make sure the trucks start braking on time to stop at the right place. But not a lot of prep you can do for if there’s a warehouse in a climate that’s humid enough to add some liquid to the floors. Whole lotta calculations from the hardware and vision teams to get that resolved.

This war room was a different customer.

This theatre did not have wet floors. Instead it had a different problem. A lot of trucks plus a complicated intersection.

Early morning message in the $l@ck. Someone on the team looks into it. Assumes it’s minor and tells them to reboot the box.

Default fix isn’t working. Trucks are still backing up.

More eyes start looking, and soon enough we’re all filling up the room we normally interviewed candidates in going line by line through the logs.

A significant portion of us had the traditional background, either computer science or electrical engineering.

The system’s architecture looked sound. Messages were being sent and read correctly. But no one was moving through the space.

So we started trying to poke apart the algorithm.

Enter our hero. One of the devs on the team had a Ph.D. in Math. Still an incredibly competent technologist, but they also had a complementary lens to view problems through.

As far as they could tell, the logic was sound. Both the architecture and the algorithm were fine. Working off two postulates instead of one.

They noticed something. Our database was dropping connections.

It’s been over half a decade now, so rusty on the fine details. But at a high level, Postgres could not keep up with the number. Too many were trying to be made, the system was not configured to support it, and would then block.

So from Django’s point of view, the database was down. Thus when the trucks sent a message over to the monolith asking for permission to enter, it got a negative response.

Something that sat in the config, not the code.

Skipping over all the details of how we actually shipped the fix.

Not saying every dev should go back to school for a doctorate.