Summary and Review of Software Design X-Rays by Adam Tornhill

Software Design X-Rays #

(chapter 1) Why Technical Debt Isn’t Technical #

Technical debt is a metaphor that lets developers explain the need for refactorings and communicate technical trade-offs to business people.

  1. Keep a decision log

Human memory is fragile and cognitive biases are real, so a project decision log will be a tremendous help keeping track of your rationale for accepting technical debt. Jotting down decisions on a wiki or shared document helps you maintain knowledge over time.

Michael Feathers, in his groundbreaking book Working Effectively with Legacy Code, describes legacy code as code without tests. Technical debt, on the other hand, often occurs in the very test code intended to raise the quality of the overall system!

In addition, legacy code is an undesirable after-the-fact state, whereas technical debt may be a strategic choice. “Let’s design a legacy system,” said absolutely no one ever.

Interest Rate Is a Function of Time #

Just because some code is bad doesn’t mean it’s technical debt. It’s not technical debt unless we have to pay interest on it, and interest rate is a function of time.

Why We Mistake Organizational Problems for Technical Issues #

Your Mental Models of Code #

As we learn a topic we build mental representations of that domain. Psychologists refer to such mental models as schemas. A schema is a theoretical construct used to describe the way we organize knowledge in our memory and how we use that knowledge for a particular event. You can think of a schema as a mental script implemented in neurons rather than code.

Understanding code also builds on schemas. You have general schemas for syntactic and semantic knowledge, like knowing the construction order of a class hierarchy in C++ or how to interpret Haskell. These schemas are fairly stable and translate across different applications you work on. You also have specific schemas to represent the mental model of a particular system or module. Those schemas represent your domain expertise. Building expertise means evolving better and more efficient mental models. (See “Software Design: Cognitive Aspects” for a summary of research on schemas in program comprehension and “Cognitive Psychology” for a pure psychological view of expertise.)

Building efficient schemas takes time and it’s hard cognitive work for everything but the simplest programs. That task gets significantly harder when applied to a moving target like code under heavy development. In the project that tried to compress its time line from one year to three months by adding more people, the developers found the code hard to understand because code they wrote one day looked different three days later after being worked on by five other developers. Excess parallel work leads to development congestion, which is intrinsically at odds with mastery of the code.

Quality Suffers with Parallel Development #

Organizational factors are some of the best predictors of defects:

The more parallel development, the more process, coordination, and communication we need.


(chapter 2) Identify Code with High Interest Rates #

As we inspect the code, perhaps months or years later, we should be careful to not judge the original programmers, but rather use the information we gather as a way forward.

  1. Heuristics

Just remember that—like all models of complex processes—complexity trends are heuristics, not absolute truths.

  1. Lines of code is the easiest code complexity predictor we can use.
    1. cloc --by-file --out=../cloc_narwhal.txt --exclude-dir=node_modules,out,native .
  2. Indentation gives you a lot more data while still being rather easy to calculate, but it requires consistently formatted code.

First, the actual complexity number represents the number of logical indentations, so it makes little sense to discuss thresholds or compare complexity values across languages. It’s a trend that’s important, not the absolute values.
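The indentation measure is easy to sketch yourself. Below is a minimal, hypothetical implementation (not the book's tooling), assuming one logical indentation per tab or per four leading spaces:

```python
def indentation_complexity(source, spaces_per_indent=4):
    """Count logical indentations across all non-blank lines."""
    total = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no complexity
        leading = len(line) - len(line.lstrip(' \t'))
        # treat each tab as one logical indent, spaces in groups of four
        tabs = line[:leading].count('\t')
        spaces = leading - tabs
        total += tabs + spaces // spaces_per_indent
    return total

code = """def f(x):
    if x:
        return 1
    return 0
"""
print(indentation_complexity(code))  # 4 (depths 0 + 1 + 2 + 1)
```

Running it over each revision of a file in the git history gives the complexity trend the book describes; comparing the raw numbers across languages would be meaningless, as noted above.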

  1. Change frequency—a proxy for technical debt interest rate

git log --format=format: --name-only | egrep -v '^$' | sort | uniq -c | sort -r > ../narwhal_frequencies.txt

Hotspots #

A hotspot is complicated code that you have to work with often. Hotspots are calculated by combining the two metrics we’ve explored:

  1. Calculating the change frequency of each file as a proxy for interest rate
  2. Using the lines of code as a simple measure of code complexity
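The two metrics can be combined in a few lines. Here is a hedged sketch with made-up file names; multiplying change frequency by size is just one simple scoring choice, while the book primarily visualizes the two dimensions side by side:

```python
def hotspots(change_freq, lines_of_code):
    """Rank files by change frequency weighted by size.

    change_freq:   {filename: commits touching the file} (interest-rate proxy)
    lines_of_code: {filename: LOC}                       (complexity proxy)
    """
    scores = {
        f: change_freq[f] * lines_of_code.get(f, 0)
        for f in change_freq
    }
    # highest score first: frequently changed AND large
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical numbers, e.g. parsed from the cloc and git outputs above.
freq = {'core/engine.py': 42, 'util/strings.py': 30, 'docs/readme.md': 50}
loc = {'core/engine.py': 2000, 'util/strings.py': 150, 'docs/readme.md': 80}
print(hotspots(freq, loc)[0][0])  # core/engine.py tops the list
```

Note how the frequently changed but small readme doesn't become a hotspot: both dimensions have to be high.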

git log #

git log --pretty=format:'[%h] %aN %ad %s' --date=short --numstat > narwhal.log

Or even better:

git log --pretty=format:'[%h] %aN %ad %s' --date=short --numstat -- . ":(exclude)loc/*" > narwhal.log

Analysis with code-maat #


lein run -l ../narwhal/narwhal.log -c git > organizational_metrics.csv

lein run -l ../narwhal/narwhal.log -c git -a coupling > coupling.csv & code coupling.csv

Exercises #




Chapter 3 Coupling in Time: A Heuristic for the Concept of Surprise #

TL;DR: It’s an interesting idea, but it seems to make sense only once in a while (every half a year? every year?); it doesn’t seem that useful at, e.g., every retrospective. Looking for surprises can be really useful, but most of the time it just confirms what developers already feel. It gives numbers to intuitions, so correlating those numbers with bugs or something similar might make it easier to convince the business that refactoring those issues is important.

Copy-paste isn’t a problem in itself; copying and pasting may well be the right thing to do if the two chunks of code evolve in different directions. If they don’t—that is, if we keep making the same changes to different parts of the program—that’s when we get a problem.
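Coupling in time can be approximated straight from a commit log: count how often two files appear in the same commit. This is a rough sketch of the idea with a made-up log, not code-maat's exact formula:

```python
from collections import Counter
from itertools import combinations

def temporal_coupling(commits):
    """commits: list of sets, each the files changed in one commit.
    Returns {(file_a, file_b): degree of coupling}, where degree is
    shared commits divided by the pair's average revision count."""
    pairs = Counter()
    revisions = Counter()
    for files in commits:
        revisions.update(files)
        for a, b in combinations(sorted(files), 2):
            pairs[(a, b)] += 1
    return {
        pair: count / ((revisions[pair[0]] + revisions[pair[1]]) / 2)
        for pair, count in pairs.items()
    }

log = [  # hypothetical commits
    {'order.py', 'order_test.py'},
    {'order.py', 'order_test.py'},
    {'order.py', 'invoice.py'},
]
coupling = temporal_coupling(log)
print(coupling[('order.py', 'order_test.py')])  # 0.8
```

A high degree between files with no structural dependency is the "surprise" the chapter is after: logically coupled code that the compiler can't see.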

Surprisingly, most of our work as developers doesn’t involve writing code. Rather, most of our time is spent understanding existing code.

Narwhal coupling from the beginning of time: https://docs.google.com/spreadsheets/d/1TiDEGkKFqfuEyCcnQGg6Ijc6di5uXQVNty3NMyQKdPQ/edit?usp=sharing

Exercises #




Chapter 4 Pay Off Your Technical Debt #

(…) proximity—a much underused design principle.

The Principle of Proximity #

Reminds me of https://wiki.c2.com/?CommonClosurePrinciple

The principle of proximity focuses on how well organized your code is with respect to readability and change. Proximity implies that functions that are changed together are moved closer together. Proximity is both a design principle and a heuristic for refactoring hotspots toward code that’s easier to understand.

You see an example of such code duplication in the figure, and the gut reaction is to extract the commonalities into a shared abstraction. In many cases that’s the correct approach, but sometimes a shared abstraction actually makes the code less maintainable.

To abstract means to take away. As we raise the abstraction level through a shared method, the two test cases lose their communicative value. Unit tests serve as an excellent starting point for newcomers in a codebase. When we take abstractions too far we lose that advantage by obscuring the behaviour we want to communicate through the tests.


There are several good books that help you refactor existing code. Refactoring: Improving the Design of Existing Code and Working Effectively with Legacy Code are both classics that offer practical and proven techniques. Refactoring for Software Design Smells: Managing Technical Debt is a new addition that is particularly valuable if you work with object-oriented techniques.

Splinter pattern #

Here are the steps behind an iterative splinter refactoring:

  1. Ensure your tests cover the splinter candidate. If you don’t have an adequate test suite—few hotspots do—you need to create one, as discussed in _Build Temporary Tests as a Safety Net_.
  2. Identify the behaviors inside your hotspot. This step is a code-reading exercise where you look at the names of the methods inside the hotspot and identify code that forms groups of behaviors.
  3. Refactor for proximity. You now form groups of functions with related behavior inside the larger file, based on the behaviors you identified earlier. This proximity refactoring makes your next step much easier.
  4. Extract a new module for the behavior with the most development activity. Use an X-Ray analysis to decide where to start, then copy-paste your group of methods into a new class while leaving the original untouched. Remember to put a descriptive name on your new module to capture its intent.
  5. Delegate to the new module. Replace the body of the original methods with delegations to your new module. This allows you to move forward at a fast pace, which limits the risk for conflicting changes by other developers.
  6. Perform the necessary regression tests to ensure you haven’t altered the behavior of the system. Commit your changes once those tests pass.
  7. Select the next behavior to refactor and start over at step 4. Repeat the splinter steps until you’ve extracted all the critical hotspot methods you identified with your X-Ray analysis.
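Steps 4 and 5 are the heart of the pattern. A tiny sketch with hypothetical names (PaymentRules, OrderProcessor); the point is that the original hotspot keeps its public surface while its body becomes a delegation:

```python
class PaymentRules:
    """New module extracted from the hotspot (step 4);
    owns one group of related behavior."""
    def discount_for(self, order):
        return 0.1 if order['total'] > 100 else 0.0

class OrderProcessor:
    """The original hotspot. Its method signatures are untouched,
    so concurrent changes by other developers still merge cleanly."""
    def __init__(self):
        self._payment_rules = PaymentRules()

    def discount_for(self, order):
        # step 5: body replaced with a delegation to the new module
        return self._payment_rules.discount_for(order)

print(OrderProcessor().discount_for({'total': 150}))  # 0.1
```

Because callers never notice the change, each splinter iteration stays small and safe to commit on its own (step 6).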

Separate Code with Mixed Content #

Reduce Debt by Deleting Cost Sinks #

As you see in the figure, the ratio between the amount of source code versus test code is unbalanced. The second warning sign is that the complexity trends show different patterns for the hotspot and its corresponding unit test. This is a sign that the test code isn’t doing its job by growing together with the application code, and a quick code inspection is likely to confirm those suspicions.

This situation happens when a dedicated developer attempts to introduce unit tests but fails to get the rest of the organization to embrace the technique. Soon you have a test suite that isn’t updated beyond the initial tests, yet needs to be tweaked in order to compile so that the automated build passes.

You won’t get any value out of such unit tests, but you still have to spend time just to make them build. A simple cost-saving measure is to delete such unit tests, as they do more harm than good.

Turn Hotspot Methods into Brain-Friendly Chunks #

The advantage of a refactoring like the splinter pattern is that it puts a name on a specific concept. Naming our programming constructs is a powerful yet simple technique that ties in to the most limiting factor we have in programming—our working memory.

Working memory is a cognitive construct that serves as the mental workbench of your brain. It lets you integrate and manipulate information in your head. Working memory is also a strictly limited resource and programming tasks stretch it to the maximum.

We saw back in _Your Mental Models of Code_ that optimizing code for programmer understanding is one of the most important choices we can make. This implies that when we’re writing code our working memory is a dimensioning factor that’s just as important as any technical requirement. Since we, at the time of this writing, unfortunately can neither patch nor upgrade human working memory, we need to work around that mental bottleneck rather than tackle it with brute force. Let’s get some inspiration from chess masters to see how it’s done.

Next books: #

Example #


Chapter 5 The Principles of Code Age #

Code age is a much-underused driver of software design that strengthens our understanding of the systems we build. Code age also helps us identify better modular boundaries, suggests new libraries to extract, and highlights stable aspects of the solution domain.

Stabilize Code by Age #

Buildings change over time to adapt to new uses, and different parts of a building change at different rates, much like software. This led the writer Stewart Brand to remark that a building tears itself apart “because of the different rates of change of its components.” (See How Buildings Learn: What Happens After They’re Built.)

The forces that tear codebases apart are the frailties of human memory and the need to communicate knowledge across time and over corporate boundaries.

The age of code is a factor that should—but rarely does—drive the evolution of a software architecture. Designing with code age as a guide means that we

  1. organize our code by its age;
  2. turn stable packages into libraries; and
  3. move and refactor code we fail to stabilize.

How to calculate the age of code? #

Fetch the last modification date of a file in the repository:

git log -1 --format="%ad" --date=short -- activerecord/lib/active_record/base.rb
2016-06-09

git log -1 --format="%ad" --date=short -- activerecord/lib/active_record/gem_version.rb
2017-03-22

(…) we retrieve a list of all files in the repository, fetch their last modification date, and finally calculate the age of each file.

  1. git ls-files
  2. Use git log to get the last modification date
  3. Get age in months of each file (calculate)
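Step 3 is simple date arithmetic. A minimal sketch, assuming the ISO dates that `git log --date=short` prints (run over each file listed by `git ls-files`):

```python
from datetime import date

def age_in_months(last_modified, today):
    """Approximate age in calendar months, given the last commit date
    from `git log -1 --format="%ad" --date=short -- <file>`."""
    last = date.fromisoformat(last_modified)
    return (today.year - last.year) * 12 + (today.month - last.month)

# base.rb from the example above, measured at a hypothetical date
print(age_in_months('2016-06-09', date(2017, 3, 22)))  # 9
```

In a real script you'd loop over `git ls-files` output and call `git log -1` per file via subprocess; the date math is the only non-obvious part.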

The Three Generations of Code #

The code age analysis was inspired by the work of Dan North, who introduced the idea of short software half-life as a way to simplify code. North claims that we want our code to be either very recent or old, and the kind of code that’s hard to understand lies in between these two extremes.

Back in 1885 the psychologist Hermann Ebbinghaus published his pioneering work on how human memory functions. (See Über das Gedächtnis. Untersuchungen zur experimentellen Psychologie.)

The next figure shows the Ebbinghaus forgetting curve, where we quickly forget information learned at day one. To retain the information we need to repeat it, and with each repetition we’re able to improve our performance by remembering more.

Now, think back to North’s claim that code should be either recent or old. This works as a design principle because it aligns with the nature of the Ebbinghaus forgetting curve. Recent code is what we extend and modify right now, which means we have a fresh mental model of the code and we know how it achieves its magic. In contrast, old code is by definition stable, which means we don’t have to modify it, nor do we have to maintain any detailed information about its inner workings. It’s a black box.

The Ebbinghaus forgetting curve also explains why code that’s neither old nor recent is troublesome; such code is where we’ve forgotten much detail, yet we need to revisit the code at times. Each time we revisit mid-aged code we need to relearn its inner workings, which comes at a cost of both time and effort.

There’s also a social side to the age of code in the sense that the older the code, the more likely the original programmer has left the organization. This is particularly troublesome for the code in between—the code we fail to stabilize—because it means that we, as an organization, have to modify code we no longer know. David Parnas labeled such modifications “ignorant surgery” as a reference to changing code whose original design concept we fail to understand.

Your Best Bug Fix Is Time #

The risk of a new bug decreases with every day that passes, due to the interesting fact that the risk of software faults declines with the age of the code. A team of researchers noted that a module that is a year older than a similar module has roughly one-third fewer faults. (See Predicting fault incidence using software change history.)

Test cases tend to grow old in the sense that they become less likely to identify failures. (See Do System Test Cases Grow Old?.) Tests are designed in a context and, as the system changes, the tests have to evolve together with it to stay relevant.

Even when a module is old and stable, bad code may be a time bomb and we might defuse it by isolating that code in its own library. The higher-level interface of a library serves as a barrier to fend off ignorant surgeries.

Refactor Toward Code of Similar Age #

Code age, like many of the techniques in this book, is a heuristic. That means the analysis results won’t make any decisions for us, but rather will guide us by helping us ask the right questions. One such question is if we can identify any high-level refactoring opportunities that allow us to turn a collection of files into a stable package—that is, a mental chunk.

Back in _Signal Incompleteness with Names_, we saw that generic module names like str_util.cc signal low cohesion. Given the power of names—they guide usage and influence our thought processes—such modules are quite likely to become a dumping ground for a mixture of unrelated functions. This is a problem even when most of the existing functions in such utility-style files are stable, as the module acts like a magnet that attracts more code. This means we won’t be able to stabilize the strings package unless we introduce new modular boundaries.


The analysis reveals a large discrepancy in age between the different files, as some haven’t been touched in a decade while multibytecodec.c has been modified recently. Code that changes at different rates within the same package is a warning sign.

The age-driven separation of the codec mechanism from the language mappings also follows the common closure principle, which states that classes/files that change together should be packaged together. (See Clean Architecture: A Craftsman’s Guide to Software Structure and Design.)

Make sure that code is still in use before you extract it into a library. I’ve seen several commercial codebases where the only reason a package stabilizes is that the code is dead. In this case it’s a quick win since you can just delete the code. Remember, deleted code is the best code.

Scale from Files to Systems #

Code age also guides code reorganizations toward the common closure principle, which is basically a specialization of the more general concept of cohesion applied on the package level. As a nice side effect, new programmers who join your organization experience less cognitive load, as they can now focus their learning efforts on specific parts of the solution domain with a minimum of distracting code.

Exercises #

Earlier in this chapter we suggested a hypothetical refactoring of TensorFlow’s strings package. That package is located under TensorFlow’s core/lib structure. In the TensorFlow analysis you will see that there is another core package nested inside the core structure.

Next books: #

Chapter 6 Spot Your System’s Tipping Point #

Changes and new features often become increasingly difficult to implement over time, and many systems eventually reach a tipping point beyond which the codebase gets expensive to maintain. Since code decay is a gradual process, that tipping point is often hard to spot when you’re in the middle of the work on a large and growing codebase.

Is Software Too Hard? #

I spent six years of my career studying psychology at the university. During those years I also worked as a software consultant, and the single most common question I got from the people I worked with was why it’s so hard to write good code. This is arguably the wrong question because the more I learned about cognitive psychology, the more surprised I got that we’re able to code at all. Given all the cognitive bottlenecks and biases of the brain—such as our imperfect memory, restricted attention span, and limited multitasking abilities—coding should be too hard for us. The human brain didn’t evolve to program.

Of course, even if programming should be too hard for us, we do it anyway. We pull this off because we humans are great at workarounds, and a lot of the practices we use to structure code are tailor-made for this purpose. Abstraction, cohesion, and good naming help us stretch the amount of information we can hold in our working memory and serve as mental cues to help us counter the Ebbinghaus forgetting curve. We use similar mechanisms to structure our code at a system level. Functions are grouped in modules, and modules are aggregated into subsystems that in turn are composed into a system. When we succeed with our architecture, each high-level building block serves as a mental chunk that we can reason about and yet ignore its inner details. That’s powerful.

The first challenge has to do with the amount of information we can keep up with, as few people in the world can fit some million lines of code in their head and reason efficiently about it. A system under active development is also a moving target, which means that even if you knew how something worked last week, that code might have been changed twice since then by developers on three separate teams located in different parts of the world.

As a project grows beyond 12 or 15 developers, coordination, motivation and communication issues tend to cause a significant cost overhead. We’ve known that since Fred Brooks stressed the costs of communication efforts on tasks with complex interrelationships—the majority of software tasks—in The Mythical Man-Month: Essays on Software Engineering back in the 1970s.

(…) it’s often even more important to know if a specific part of the code is a coordination bottleneck. And in this area, supporting tools have been sadly absent.

Number of contributors #

git shortlog -s | wc -l
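The shortlog above counts contributors for the whole repository; to spot a coordination bottleneck in a specific part of the code, you can count distinct authors per file instead. A sketch that parses the output of `git log --format='--%aN' --name-only` (the leading `--` marks author lines; the format string and sample log are my assumptions, and file names starting with `--` would confuse it):

```python
from collections import defaultdict

def authors_per_file(log_text):
    """Count distinct authors per file; most authors (and thus
    most coordination need) first."""
    authors = defaultdict(set)
    current = None
    for line in log_text.splitlines():
        if line.startswith('--'):
            current = line[2:]      # an author marker line
        elif line.strip():
            authors[line].add(current)  # a file touched by that author
    return sorted(((f, len(a)) for f, a in authors.items()),
                  key=lambda kv: kv[1], reverse=True)

log = """--alice
core/engine.py

--bob
core/engine.py
util/strings.py

--alice
core/engine.py
"""
print(authors_per_file(log))  # [('core/engine.py', 2), ('util/strings.py', 1)]
```

Files near the top of that list are where parallel development, and with it the defect risk from the organizational factors above, concentrates.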

Next books: #

← Home