Write the Docs: Brian L. Troutwine – Instrumentation as Living Documentation: Teaching Humans About Complex Systems

I’m at Write the Docs today in Portland and will be posting notes from sessions throughout the day. These are all posted right after a talk finishes so they’re rough around the edges.

Brian kicked off the slate of talks after the morning break. He’s a software engineer who focuses on computer-to-computer programming rather than end-user software. He primarily works on real-time and distributed systems. A lot of this is for advertising companies and takes the form of real-time bidding on ads.

Advertising has become an engineering-driven industry. It’s lowered the cost of advertising and allowed people to spend as little as $10 a month on advertising for their business. The defining features of the problem in this world are its super low latency (less than 100ms per transaction) and its global, highly concurrent nature. The complex systems that handle this have complex organizations around them. They’re tightly coupled to external systems and have non-linear feedback built into them.

Bad things happen when these complex systems fail. You’ve built them to solve wicked problems, and when they fail they sometimes create even worse problems than the ones they originally solved. Ultimately, “our ability to create large and complex systems fools us into believing that we’re also entitled to understand them.” To work toward that understanding we write documentation.

The issue is that complex systems are fiendishly difficult to communicate about, and the gap in understanding is hard to bridge with documentation. Any miscommunication around a complex system is an accident in the making. Documentation can help reduce these accidents by giving everyone a greater shared understanding of the system. Without knowing how a system should behave you cannot really understand how it shouldn’t behave.

An issue with understanding through documentation is that the docs get out of date. Complex systems evolve, and written words “rot” as the system moves on. Sometimes this is because engineers fail to update the docs when they change the system. Other times engineers are unaware of how the system is actually used. Either way, the docs end up contributing to misunderstanding.

Normal accidents happen as well. Every system has some failure intrinsic to itself. No matter how hard we try we can never eliminate these failures; we can only choose how to react when such an accident occurs.

How can we build better systems that distribute knowledge and better avoid accidents in real time? Brian’s argument is for instrumentation. Instrumentation reflects the reality of the system, and it lets users and engineers explore the system as it actually exists. This exploration, done honestly, guides us to a new, better understanding of the system. Instrumentation also democratizes the organization around a complex system.
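As an illustration of the idea (not code from the talk), here is a minimal Python sketch of instrumenting the “less than 100ms per transaction” budget mentioned earlier. `LatencyRecorder`, `timed`, and `summary` are hypothetical names; a real system would export these samples to a metrics backend rather than keep them in an in-memory list.

```python
import math
import statistics
import time
from contextlib import contextmanager

# Budget taken from the "less than 100ms per transaction" figure in the talk.
LATENCY_BUDGET_MS = 100.0


class LatencyRecorder:
    """Hypothetical in-process recorder of per-transaction latency samples."""

    def __init__(self, budget_ms: float = LATENCY_BUDGET_MS) -> None:
        self.budget_ms = budget_ms
        self.samples_ms: list[float] = []

    def record(self, elapsed_ms: float) -> None:
        self.samples_ms.append(elapsed_ms)

    @contextmanager
    def timed(self):
        # Measure wall-clock time around one transaction, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record((time.perf_counter() - start) * 1000.0)

    def summary(self) -> dict:
        if not self.samples_ms:
            return {"count": 0, "over_budget": 0}
        ordered = sorted(self.samples_ms)
        # Nearest-rank p99: smallest sample with at least 99% of samples at or below it.
        p99_index = max(0, math.ceil(0.99 * len(ordered)) - 1)
        return {
            "count": len(ordered),
            "median_ms": statistics.median(ordered),
            "p99_ms": ordered[p99_index],
            "over_budget": sum(1 for s in ordered if s > self.budget_ms),
        }


recorder = LatencyRecorder()
for elapsed in [12.0, 40.0, 95.0, 130.0]:
    recorder.record(elapsed)
print(recorder.summary())
```

The point of the sketch is that the numbers describe the system as it actually runs: a doc might say “bids complete in under 100ms,” but the `over_budget` count shows how often that is really true.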

Troubleshooting complex systems can sometimes be limited by not having enough information. If you’re relying on external services there’s a level of opaqueness that you cannot see through. But instrumentation is not a panacea. It can also create an overload of information that overwhelms your ability to make sense of the system. Too much information hampers interpretation.

On a darker note, instrumentation can be used for undesirable purposes. Surveillance is one form of this. Complex organizations with excessive instrumentation can do evil things.

What we should do is write documentation with the instrumentation in mind. Procedure manuals and visualizations, for example, reduce the amount of complex knowledge required. The more contextual layers you add, the more you reduce the “big boards with blinky lights” issue. Instrumentation is like a suit: it needs to fit your mind. Cross-checks and documented error margins mitigate instrument deficiency. Checklists that reference instrumentation at decision points are key. They pinpoint where attention should flow and how to process what you see into informed decisions.
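To make the checklist idea concrete, here is a hypothetical Python sketch (not from the talk) of a checklist step that names the instrument it depends on, cross-checks it against a second instrument within a documented error margin, and only then applies a documented decision threshold. `Reading`, `ChecklistStep`, `cross_check`, `run_step`, and the metric names are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    source: str   # which instrument produced the value
    value: float


def cross_check(a: Reading, b: Reading, margin: float) -> bool:
    """True if two instruments agree within a documented relative error margin."""
    baseline = max(abs(a.value), abs(b.value))
    if baseline == 0:
        return True
    return abs(a.value - b.value) / baseline <= margin


@dataclass
class ChecklistStep:
    prompt: str           # the question the operator is answering
    instrument: str       # the dashboard or metric this step refers to
    threshold: float      # documented decision point
    action_if_over: str
    action_otherwise: str

    def decide(self, reading: Reading) -> str:
        return self.action_if_over if reading.value > self.threshold else self.action_otherwise


def run_step(step: ChecklistStep, primary: Reading, secondary: Reading, margin: float) -> str:
    # Refuse to act on a reading that a second instrument does not corroborate.
    if not cross_check(primary, secondary, margin):
        return (f"Instruments disagree beyond {margin:.0%}: "
                f"reconcile {primary.source} and {secondary.source} first")
    return step.decide(primary)


step = ChecklistStep(
    prompt="Is the bid error rate elevated?",
    instrument="bid_error_rate",
    threshold=0.05,
    action_if_over="Page the on-call engineer",
    action_otherwise="Continue monitoring",
)
print(run_step(step, Reading("load_balancer", 0.07), Reading("app_counter", 0.068), 0.10))
print(run_step(step, Reading("load_balancer", 0.07), Reading("app_counter", 0.02), 0.10))
```

The first call prints “Page the on-call engineer”; the second reports that the instruments disagree. Encoding the error margin in the checklist, rather than leaving it in someone’s head, is one way documentation can cover for a deficient instrument.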

So, instrumentation addresses the problems of documentation and documentation addresses the problems of instrumentation. Together they help guide and ensure the quality and understanding of complex systems.