Summary and Review of Software Design X-Rays by Adam Tornhill

Groundbreaking book about software.

Software Design X-Rays #

Exercises #

A list of all exercises is available on Adam Tornhill’s website.

Chapter 1 Why Technical Debt Isn’t Technical #

Technical debt is a metaphor that lets developers explain the need for refactorings and communicate technical trade-offs to business people.

  1. Keep a decision log

Human memory is fragile and cognitive biases are real, so a project decision log will be a tremendous help keeping track of your rationale for accepting technical debt. Jotting down decisions on a wiki or shared document helps you maintain knowledge over time.

Michael Feathers, in his groundbreaking book Working Effectively with Legacy Code, describes legacy code as code without tests. Technical debt, on the other hand, often occurs in the very test code intended to raise the quality of the overall system!

In addition, legacy code is an undesirable after-the-fact state, whereas technical debt may be a strategic choice. “Let’s design a legacy system,” said absolutely no one ever.

Interest Rate Is a Function of Time #

Just because some code is bad doesn’t mean it’s technical debt. It’s not technical debt unless we have to pay interest on it, and interest rate is a function of time.

Why We Mistake Organizational Problems for Technical Issues #

Your Mental Models of Code #

As we learn a topic we build mental representations of that domain. Psychologists refer to such mental models as schemas. A schema is a theoretical construct used to describe the way we organize knowledge in our memory and how we use that knowledge for a particular event. You can think of a schema as a mental script implemented in neurons rather than code.

Understanding code also builds on schemas. You have general schemas for syntactic and semantic knowledge, like knowing the construction order of a class hierarchy in C++ or how to interpret Haskell. These schemas are fairly stable and translate across different applications you work on. You also have specific schemas to represent the mental model of a particular system or module. Those schemas represent your domain expertise. Building expertise means evolving better and more efficient mental models. (See “Software Design: Cognitive Aspects” for a summary of research on schemas in program comprehension and “Cognitive Psychology” for a pure psychological view of expertise.)

Building efficient schemas takes time and it’s hard cognitive work for everything but the simplest programs. That task gets significantly harder when applied to a moving target like code under heavy development. In the project that tried to compress its time line from one year to three months by adding more people, the developers found the code hard to understand because code they wrote one day looked different three days later after being worked on by five other developers. Excess parallel work leads to development congestion, which is intrinsically at odds with mastery of the code.

Quality Suffers with Parallel Development #

Organizational factors are some of the best predictors of defects:

The more parallel development, the more process, coordination, and communication we need.

Books #

Chapter 2 Identify Code with High Interest Rates #

As we inspect the code, perhaps months or years later, we should be careful to not judge the original programmers, but rather use the information we gather as a way forward.

  1. Heuristics

Just remember that—like all models of complex processes—complexity trends are heuristics, not absolute truths.

  1. Lines of code is the easiest code complexity predictor we can use.
    1. cloc --by-file --out=../cloc_narwhal.txt --exclude-dir=node_modules,out,native .
  2. The indentation will give you a lot more data while still being rather easy to calculate. But, it requires good code quality.

First, the actual complexity number represents the number of logical indentations, so it makes little sense to discuss thresholds or compare complexity values across languages. It’s a trend that’s important, not the absolute values.
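
A minimal Python sketch of the indentation measure, assuming one logical indentation equals four spaces or one tab (the function names and the four-space convention are assumptions, not the book's tooling):

```python
import statistics


def indentation_levels(source: str, spaces_per_level: int = 4) -> list[int]:
    """Return the logical indentation level of every non-blank line."""
    levels = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        # Assumption: one tab counts the same as `spaces_per_level` spaces.
        expanded = line.expandtabs(spaces_per_level)
        leading = len(expanded) - len(expanded.lstrip(" "))
        levels.append(leading // spaces_per_level)
    return levels


def indentation_complexity(source: str, spaces_per_level: int = 4) -> dict:
    """Summarize the indentation of one file revision."""
    levels = indentation_levels(source, spaces_per_level)
    if not levels:
        return {"lines": 0, "total": 0, "mean": 0, "max": 0}
    return {
        "lines": len(levels),
        "total": sum(levels),
        "mean": statistics.mean(levels),
        "max": max(levels),
    }
```

Running this over each historic revision of a file gives a series of values. As noted above, the shape of that series matters more than any single number.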

  1. Change frequency—a proxy for technical debt interest rate

git log --format=format: --name-only | grep -v '^$' | sort | uniq -c | sort -rn > ../narwhal_frequencies.txt

Hotspots #

A hotspot is complicated code that you have to work with often. Hotspots are calculated by combining the two metrics we’ve explored:

  1. Calculating the change frequency of each file as a proxy for interest rate
  2. Using the lines of code as a simple measure of code complexity
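
The combination of the two metrics can be sketched in Python. This is a rough illustration, not the book's tooling: it assumes the change frequencies come from `git log --format=format: --name-only` output and that the lines of code per file are already available as a dictionary (for example, parsed from a cloc report). Ranking by revisions first and size second is also an assumption; other weightings are possible.

```python
from collections import Counter


def change_frequencies(name_only_log: str) -> Counter:
    """Count how many commits touched each file.

    Expects the text produced by `git log --format=format: --name-only`,
    where every non-blank line is a file path.
    """
    return Counter(
        line.strip() for line in name_only_log.splitlines() if line.strip()
    )


def rank_hotspots(frequencies, lines_of_code, top=10):
    """Return (path, revisions, lines) tuples, most likely hotspots first."""
    candidates = [
        (path, revisions, lines_of_code[path])
        for path, revisions in frequencies.items()
        if path in lines_of_code  # skip files deleted since they were changed
    ]
    # Assumption: sort by change frequency, using size as the tie-breaker.
    candidates.sort(key=lambda c: (c[1], c[2]), reverse=True)
    return candidates[:top]
```

Files that appear in the log but not in the size data are dropped, since deleted files cannot be hotspots anymore.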

git log #

git log --pretty=format:'[%h] %aN %ad %s' --date=short --numstat > narwhal.log

Or even better:

git log --pretty=format:'[%h] %aN %ad %s' --date=short --numstat -- . ":(exclude)loc/*" > narwhal.log
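
To work with the resulting log outside code-maat, it can be parsed with a few lines of Python. The sketch below assumes exactly the format string used above (`[%h] %aN %ad %s` with `--date=short` and `--numstat`); the `Commit` class and `parse_log` function are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Header line: "[<short hash>] <author name> <YYYY-MM-DD> <subject>"
HEADER = re.compile(
    r"^\[(?P<rev>[0-9a-f]+)\] (?P<author>.+?) "
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<subject>.*)$"
)


@dataclass
class Commit:
    rev: str
    author: str
    date: str
    subject: str
    changes: list = field(default_factory=list)  # (added, deleted, path)


def parse_log(text: str) -> list[Commit]:
    """Parse the log into commits with their per-file line counts."""
    commits = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            commits.append(Commit(**header.groupdict()))
            continue
        parts = line.split("\t")
        if len(parts) == 3 and commits:
            added, deleted, path = parts
            # Binary files report "-" instead of line counts; store 0 for them.
            commits[-1].changes.append((
                int(added) if added.isdigit() else 0,
                int(deleted) if deleted.isdigit() else 0,
                path,
            ))
    return commits
```

The parsed commits can then feed change frequency, coupling, or author analyses without calling git again.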

Analysis with code-maat #

https://github.com/adamtornhill/code-maat

lein run -l ../narwhal/narwhal.log -c git > organizational_metrics.csv

lein run -l ../narwhal/narwhal.log -c git -a coupling > coupling.csv && code coupling.csv

Exercises #

The following exercises let you uncover technical debt in popular open source projects. You also learn how the combination of hotspots and complexity trends lets you follow up on the improvements you make in the code. That is, instead of focusing on problems, you get to use the analysis techniques to identify code that has been refactored.

Remember the document linked in How Should You Read This Book?, which specifies a single page with all the exercise URLs. It’ll save you from having to type out all URLs in case you’re reading the print version.

Find Refactoring Candidates in Docker #

Follow Up on Improvements to Rails #

Chapter 3 Coupling in Time: A Heuristic for the Concept of Surprise #

TL;DR: It’s an interesting idea, but it seems to make sense only once in a while (every half a year, every year?). It doesn’t seem that useful to run, e.g., at every retrospective. Looking for surprises can be really useful, but most of the time it just confirms what developers already feel. It gives numbers to intuitions, so maybe correlating those with bugs or something similar would make it easier to convince the business that refactoring those issues is important?

Copy-paste isn’t a problem in itself; copying and pasting may well be the right thing to do if the two chunks of code evolve in different directions. If they don’t—that is, if we keep making the same changes to different parts of the program—that’s when we get a problem.
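
To make "we keep making the same changes to different parts of the program" measurable, here is a small Python sketch of a change coupling calculation. It assumes each commit is given as a set of file paths, and it assumes the degree of coupling is the number of shared commits divided by the average number of revisions of the two files. That formula is an assumption for illustration; tools such as code-maat may define it differently.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits, min_shared=2):
    """Return (file_a, file_b, shared_commits, degree_percent) tuples.

    `commits` is an iterable of collections of file paths that were
    changed together in one commit.
    """
    revisions = Counter()
    shared = Counter()
    for files in commits:
        files = sorted(set(files))
        revisions.update(files)
        shared.update(combinations(files, 2))  # every pair changed together

    results = []
    for (a, b), together in shared.items():
        if together < min_shared:
            continue  # a single shared commit is likely a coincidence
        average = (revisions[a] + revisions[b]) / 2
        results.append((a, b, together, round(100 * together / average)))
    return sorted(results, key=lambda r: (r[3], r[2]), reverse=True)
```

Pairs with a high degree that have no obvious reason to change together are the surprises worth inspecting.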

Surprisingly, most of our work as developers doesn’t involve writing code. Rather, most of our time is spent understanding existing code.

Narwhal coupling from the beginning of time: https://docs.google.com/spreadsheets/d/1TiDEGkKFqfuEyCcnQGg6Ijc6di5uXQVNty3NMyQKdPQ/edit?usp=sharing

Exercises #

Once you start to apply change coupling analyses to your own code, you’ll discover that the information is useful beyond uncovering technical debt. The following exercises let you explore different use cases for the analysis information. You also get to fill in the missing piece in our ASP.NET Core MVC case study as you uncover software clones in application code.

Learn from the Change Patterns in a Codebase #

Detect Omissions with Internal Change Coupling #

Kill the Clones #

Chapter 4 Pay Off Your Technical Debt #

(…) proximity—a much underused design principle.

The Principle of Proximity #

Reminds me of https://wiki.c2.com/?CommonClosurePrinciple

The principle of proximity focuses on how well organized your code is with respect to readability and change. Proximity implies that functions that are changed together are moved closer together. Proximity is both a design principle and a heuristic for refactoring hotspots toward code that’s easier to understand.

You see an example of such code duplication in the figure, and the gut reaction is to extract the commonalities into a shared abstraction. In many cases that’s the correct approach, but sometimes a shared abstraction actually makes the code less maintainable.

To abstract means to take away. As we raise the abstraction level through a shared method, the two test cases lose their communicative value. Unit tests serve as an excellent starting point for newcomers in a codebase. When we take abstractions too far we lose that advantage by obscuring the behaviour we want to communicate through the tests.

https://en.wikipedia.org/wiki/Principles_of_grouping#Proximity

There are several good books that help you refactor existing code. Refactoring: Improving the Design of Existing Code and Working Effectively with Legacy Code are both classics that offer practical and proven techniques. Refactoring for Software Design Smells: Managing Technical Debt is a new addition that is particularly valuable if you work with object-oriented techniques.

Splinter pattern #

Here are the steps behind an iterative splinter refactoring:

  1. Ensure your tests cover the splinter candidate. If you don’t have an adequate test suite (few hotspots do), you need to create one, as discussed in _Build Temporary Tests as a Safety Net_.
  2. Identify the behaviors inside your hotspot. This step is a code-reading exercise where you look at the names of the methods inside the hotspot and identify code that forms groups of behaviors.
  3. Refactor for proximity. You now form groups of functions with related behavior inside the larger file, based on the behaviors you identified earlier. This proximity refactoring makes your next step much easier.
  4. Extract a new module for the behavior with the most development activity. Use an X-Ray analysis to decide where to start, then copy-paste your group of methods into a new class while leaving the original untouched. Remember to put a descriptive name on your new module to capture its intent.
  5. Delegate to the new module. Replace the body of the original methods with delegations to your new module. This allows you to move forward at a fast pace, which limits the risk for conflicting changes by other developers.
  6. Perform the necessary regression tests to ensure you haven’t altered the behavior of the system. Commit your changes once those tests pass.
  7. Select the next behavior to refactor and start over at step 4. Repeat the splinter steps until you’ve extracted all the critical hotspot methods you identified with your X-Ray analysis.
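
Steps 4 and 5 above can be illustrated with a tiny Python sketch. All class and method names are hypothetical: `OrderService` plays the role of the hotspot and `ReportFormatter` is the new module extracted from it.

```python
class ReportFormatter:
    """Step 4: the extracted module, named after the behavior it captures."""

    def format_row(self, values):
        return ", ".join(str(value) for value in values)

    def format_table(self, rows):
        return "\n".join(self.format_row(row) for row in rows)


class OrderService:
    """The original hotspot: its public methods stay, so callers are unaffected."""

    def __init__(self):
        self._formatter = ReportFormatter()

    # Step 5: the method bodies are replaced with delegations.
    def format_row(self, values):
        return self._formatter.format_row(values)

    def format_table(self, rows):
        return self._formatter.format_table(rows)
```

Because the hotspot keeps its interface, each splinter can be committed on its own once the regression tests pass, and callers can migrate to the new module later.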

Separate code with Mixed Content #

Reduce Debt by Deleting Cost Sinks #

As you see in the figure, the ratio between the amount of source code versus test code is unbalanced. The second warning sign is that the complexity trends show different patterns for the hotspot and its corresponding unit test. This is a sign that the test code isn’t doing its job by growing together with the application code, and a quick code inspection is likely to confirm those suspicions.

This situation happens when a dedicated developer attempts to introduce unit tests but fails to get the rest of the organization to embrace the technique. Soon you have a test suite that isn’t updated beyond the initial tests, yet needs to be tweaked in order to compile so that the automated build passes.

You won’t get any value out of such unit tests, but you still have to spend time just to make them build. A simple cost-saving measure is to delete such unit tests, as they do more harm than good.

Turn Hotspot Methods into Brain-Friendly Chunks #

The advantage of a refactoring like the splinter pattern is that it puts a name on a specific concept. Naming our programming constructs is a powerful yet simple technique that ties in to the most limiting factor we have in programming—our working memory.

Working memory is a cognitive construct that serves as the mental workbench of your brain. It lets you integrate and manipulate information in your head. Working memory is also a strictly limited resource and programming tasks stretch it to the maximum.

We saw back in Your Mental Models of Code that optimizing code for programmer understanding is one of the most important choices we can make. This implies that when we’re writing code our working memory is a dimensioning factor that’s just as important as any technical requirements. Since we, at the time of this writing, unfortunately can neither patch nor upgrade human working memory, we need to work around that mental bottleneck rather than tackle it with brute force. Let’s get some inspiration from chess masters to see how it’s done.

Next books: #

Example #

https://codescene.io/projects/1716/jobs/4314/results/files/internal-temporal-coupling?file-name=EntityFrameworkCore/test/EFCore.SqlServer.FunctionalTests/QuerySqlServerTest.cs

Chapter 5 The Principles of Code Age #

Code age is a much-underused driver of software design that strengthens our understanding of the systems we build. Code age also helps us identify better modular boundaries, suggests new libraries to extract, and highlights stable aspects of the solution domain.

Stabilize Code by Age #

Buildings change over time to adapt to new uses, and different parts of a building change at different rates, much like software. This led the writer Stewart Brand to remark that a building tears itself apart “because of the different rates of change of its components.” (See How Buildings Learn: What Happens After They’re Built.)

The forces that tear codebases apart are the frailties of human memory and the need to communicate knowledge across time and over corporate boundaries.

The age of code is a factor that should—but rarely does—drive the evolution of a software architecture. Designing with code age as a guide means that we

  1. organize our code by its age;
  2. turn stable packages into libraries; and
  3. move and refactor code we fail to stabilize.

How to calculate the age of code? #

Fetch the last modification date of the files in a repository:

git log -1 --format="%ad" --date=short -- activerecord/lib/active_record/base.rb
2016-06-09

git log -1 --format="%ad" --date=short -- activerecord/lib/active_record/gem_version.rb
2017-03-22

(…) we retrieve a list of all files in the repository, fetch their last modification date, and finally calculate the age of each file.

  1. git ls-files
  2. Use git log to get the last modification date
  3. Get age in months of each file (calculate)
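
The three steps can be sketched in Python as follows. This is a minimal sketch with hypothetical function names; it counts whole elapsed months (an assumption about rounding), and `last_modified_dates` simply shells out to the two git commands listed above.

```python
import subprocess
from datetime import date


def age_in_months(last_modified: date, today: date) -> int:
    """Whole months elapsed between the last modification and today."""
    months = (today.year - last_modified.year) * 12 + (today.month - last_modified.month)
    if today.day < last_modified.day:
        months -= 1  # the last month has not fully elapsed yet
    return max(months, 0)


def code_ages(last_modified_by_file: dict, today: date) -> dict:
    """Step 3: map each path to its age, given ISO dates (YYYY-MM-DD)."""
    return {
        path: age_in_months(date.fromisoformat(day), today)
        for path, day in last_modified_by_file.items()
    }


def last_modified_dates(repo: str = ".") -> dict:
    """Steps 1 and 2: list tracked files and ask git for each last commit date."""
    def git(*args):
        result = subprocess.run(
            ["git", "-C", repo, *args], capture_output=True, text=True, check=True
        )
        return result.stdout

    dates = {}
    for path in git("ls-files").splitlines():
        day = git("log", "-1", "--format=%ad", "--date=short", "--", path).strip()
        if day:  # files that were never committed have no log entry
            dates[path] = day
    return dates
```

Calling `code_ages(last_modified_dates("path/to/repo"), date.today())` gives the age of every tracked file. It starts one git process per file, so it is slow on large repositories.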

The Three Generations of Code #

The code age analysis was inspired by the work of Dan North, who introduced the idea of short software half-life as a way to simplify code. North claims that we want our code to be either very recent or old, and the kind of code that’s hard to understand lies in between these two extremes.

Back in 1885 the psychologist Hermann Ebbinghaus published his pioneering work on how human memory functions. (See Über das Gedächtnis. Untersuchungen zur experimentellen Psychologie.)

The next figure shows the Ebbinghaus forgetting curve, where we quickly forget information learned at day one. To retain the information we need to repeat it, and with each repetition we’re able to improve our performance by remembering more.

Now, think back to North’s claim that code should be either recent or old. This works as a design principle because it aligns with the nature of the Ebbinghaus forgetting curve. Recent code is what we extend and modify right now, which means we have a fresh mental model of the code and we know how it achieves its magic. In contrast, old code is by definition stable, which means we don’t have to modify it, nor do we have to maintain any detailed information about its inner workings. It’s a black box.

The Ebbinghaus forgetting curve also explains why code that’s neither old nor recent is troublesome; such code is where we’ve forgotten much detail, yet we need to revisit the code at times. Each time we revisit mid-aged code we need to relearn its inner workings, which comes at a cost of both time and effort.

There’s also a social side to the age of code in the sense that the older the code, the more likely the original programmer has left the organization. This is particularly troublesome for the code in between—the code we fail to stabilize—because it means that we, as an organization, have to modify code we no longer know. David Parnas labeled such modifications “ignorant surgery” as a reference to changing code whose original design concept we fail to understand.

Your Best Bug Fix Is Time #

The risk of a new bug decreases with every day that passes. That’s due to the interesting fact that the risk of software faults declines with the age of the code. A team of researchers noted that a module that is a year older than a similar module has roughly one-third fewer faults. (See Predicting fault incidence using software change history.)

Test cases tend to grow old in the sense that they become less likely to identify failures. (See Do System Test Cases Grow Old?.) Tests are designed in a context and, as the system changes, the tests have to evolve together with it to stay relevant.

Even when a module is old and stable, bad code may be a time bomb and we might defuse it by isolating that code in its own library. The higher-level interface of a library serves as a barrier to fend off ignorant surgeries.

Refactor Toward Code of Similar Age #

Code age, like many of the techniques in this book, is a heuristic. That means the analysis results won’t make any decisions for us, but rather will guide us by helping us ask the right questions. One such question is if we can identify any high-level refactoring opportunities that allow us to turn a collection of files into a stable package—that is, a mental chunk.

Back in _Signal Incompleteness with Names_, we saw that generic module names like str_util.cc signal low cohesion. Given the power of names (they guide usage and influence our thought processes), such modules are quite likely to become a dumping ground for a mixture of unrelated functions. This is a problem even when most of the existing functions in such utility-style files are stable, as the module acts like a magnet that attracts more code. This means we won’t be able to stabilize the strings package unless we introduce new modular boundaries.

https://codescene.io/projects/1693/jobs/4253/results/files/hotspots?file-name=cpython/Modules/cjkcodecs/multibytecodec.c

The analysis reveals a large discrepancy in age between the different files, as some haven’t been touched in a decade while multibytecodec.c has been modified recently. Code that changes at different rates within the same package is a warning sign that means either of the following:

The age-driven separation of the codec mechanism from the language mappings also follows the common closure principle, which states that classes/files that change together should be packaged together. (See Clean Architecture: A Craftsman’s Guide to Software Structure and Design.)

Make sure that code is still in use before you extract it into a library. I’ve seen several commercial codebases where the only reason a package stabilizes is that the code is dead. In this case it’s a quick win since you can just delete the code. Remember, deleted code is the best code.

Scale from Files to Systems #

Code age also guides code reorganizations toward the common closure principle, which is basically a specialization of the more general concept of cohesion applied on the package level. As a nice side effect, new programmers who join your organization experience less cognitive load, as they can now focus their learning efforts to specific parts of the solution domain with a minimum of distracting code.

Exercises #

As we saw in this chapter, a common reason that we fail to stabilize a piece of code is that it’s low on cohesion and, hence, has several reasons to change. In these exercises you get the opportunity to investigate a package, uncover parts with low cohesion, and suggest new modular boundaries. You also get to pick up a loose end and come up with a deeper measure of code age that addresses the shortcomings we noted.

Cores All the Way Down #

Deep Mining: The Median Age of Code #

So far in the book we’ve used variations on the git log command for our data mining. That strategy works surprisingly well in providing us with the bulk of information we need. But for more specific analyses we need to dig deeper.

One such analysis is a possible extension to the age analysis in this chapter, where we used a shallow measure for code age. Ideally, we’d like to complement our age metric with a second one that goes deeper. One promising possibility is to calculate the median age of the lines of code inside a file. A median code age value would be much less sensitive to small changes and likely to provide a more accurate picture. How would you calculate the median age of code?

Hint: The key to successful data mining is to have someone else do the job for us. Thus, look to outsource the bulk of the job to some of Git’s command-line tools that operate on individual files. There are multiple solutions.
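
One possible solution, sketched in Python: let `git blame --line-porcelain <file>` report the commit behind every line, then take the median of the `committer-time` values. This is only one of several approaches, and the function names are hypothetical.

```python
import statistics
from datetime import datetime, timezone


def line_timestamps(blame_porcelain: str) -> list[int]:
    """Extract one commit timestamp per source line.

    Expects the output of `git blame --line-porcelain <file>`, which repeats
    the commit metadata (including `committer-time <epoch>`) for every line.
    Source lines themselves start with a tab, so they cannot be confused
    with metadata lines.
    """
    return [
        int(line.split()[1])
        for line in blame_porcelain.splitlines()
        if line.startswith("committer-time ")
    ]


def median_age_in_days(blame_porcelain: str, now: datetime) -> float:
    """Median age of the lines in a file, measured in days before `now`."""
    stamps = line_timestamps(blame_porcelain)
    if not stamps:
        raise ValueError("no blamed lines found")
    return (now.timestamp() - statistics.median(stamps)) / 86400
```

Pass a timezone-aware `now`, such as `datetime.now(timezone.utc)`, so the comparison with git's epoch timestamps does not depend on the local timezone.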

Next books: #

Chapter 6 Spot Your System’s Tipping Point #

Changes and new features often become increasingly difficult to implement over time, and many systems eventually reach a tipping point beyond which the codebase gets expensive to maintain. Since code decay is a gradual process, that tipping point is often hard to spot when you’re in the middle of the work on a large and growing codebase.

Is Software Too Hard? #

I spent six years of my career studying psychology at the university. During those years I also worked as a software consultant, and the single most common question I got from the people I worked with was why it’s so hard to write good code. This is arguably the wrong question because the more I learned about cognitive psychology, the more surprised I got that we’re able to code at all. Given all the cognitive bottlenecks and biases of the brain—such as our imperfect memory, restricted attention span, and limited multitasking abilities—coding should be too hard for us. The human brain didn’t evolve to program.

Of course, even if programming should be too hard for us, we do it anyway. We pull this off because we humans are great at workarounds, and a lot of the practices we use to structure code are tailor-made for this purpose. Abstraction, cohesion, and good naming help us stretch the amount of information we can hold in our working memory and serve as mental cues to help us counter the Ebbinghaus forgetting curve. We use similar mechanisms to structure our code at a system level. Functions are grouped in modules, and modules are aggregated into subsystems that in turn are composed into a system. When we succeed with our architecture, each high-level building block serves as a mental chunk that we can reason about and yet ignore its inner details. That’s powerful.

The first challenge has to do with the amount of information we can keep up with, as few people in the world can fit some million lines of code in their head and reason efficiently about it. A system under active development is also a moving target, which means that even if you knew how something worked last week, that code might have been changed twice since then by developers on three separate teams located in different parts of the world.

As a project grows beyond 12 or 15 developers, coordination, motivation and communication issues tend to cause a significant cost overhead. We’ve known that since Fred Brooks stressed the costs of communication efforts on tasks with complex interrelationships—the majority of software tasks—in The Mythical Man-Month: Essays on Software Engineering back in the 1970s.

(…) it’s often even more important to know if a specific part of the code is a coordination bottleneck. And in this area, supporting tools have been sadly absent.

Number of contributors #

git shortlog -s | wc -l

Next books: #

Exercises: #

Run a subsystem analysis of the arch package and identify its top hotspot. Dig deeper with an X-Ray, look at the code, and come up with a prioritized refactoring target.

Perform an X-Ray on the file and look for internal change coupling that we could eliminate by introducing shared abstractions for similar code. If you succeed, you get a quick win since you manage to reduce the overall complexity of the file.

Explore the complexity trends of the logical components in PhpSpreadsheet. Look at the coevolution of application code and test code. Do the trends indicate that unit tests are actively maintained, or are there signs of worry? Think about what the warning signs would look like in terms of trends.

Chapter 7 Beyond Conway’s Law #

In Part I we saw that a software project often mistakes organizational problems for technical issues, and treats the symptoms instead of the root cause. This misdirection happens because the organization that builds the system is invisible in our code. We can’t tell from the code alone if a piece of code is a productivity bottleneck for five different teams. In this chapter we close this knowledge gap as we use version-control data to measure team efficiency and detect parts of the code with excess coordination needs.

We’ll use this information to see how well our current system aligns with Conway’s law, which states that “a design effort should be organized according to the need for communication.” (See How do committees invent?)

Software Architecture Is About Making Choices #

Software architecture is as much about boxes and arrows as archeology is about shovels. While sketching boxes may be useful as part of a discussion, the real software architecture manifests itself as a set of principles and guidelines rather than a static structure captured in PowerPoint. Such architectural principles work as constraints that limit our design choices to ensure consistency and ease of reasoning in the resulting solution.

A software architecture also goes beyond purely technical concerns, as it needs to address the collaborative model of the people building the system. The general idea is to minimize the coordination and synchronization needs between different teams to achieve short lead

Interteam communication is an inevitable aspect of building large systems, and thus ease of communication should be a key nonfunctional requirement of any architecture. These claims are supported by empirical research, which reports that gaps between the required coordination between developers and the actual coordination result in an increase in software defects. The same research also shows that development productivity increases with better socio-technical congruence. (See Coordination Breakdowns and Their Impact on Development Productivity and Software Failures for the research findings.) Congruence means that the actual coordination needs are matched with appropriate coordinating actions, which is a strong case for aligning your architecture and organization since coordination costs increase with organizational distance. Such coordination costs also increase with the number of developers, so let’s look into that topic.

Measure Coordination Needs #

In a groundbreaking study, researchers at Microsoft used organizational metrics such as the number of authors, the number of ex-authors, and organizational ownership to measure how well these factors predict the failure proneness of the resulting code. The research shows that organizational factors are better predictors of defects than any property of the code itself, be it code complexity or code coverage. (See The Influence of Organizational Structure on Software Quality for the research.)

The number of authors behind each component provides a shallow indication of coordination needs, and is just a starting point. The quality risks we’ve discussed are not so much about how many developers have to work with a particular piece of code. Rather, it’s more important to uncover how diffused their contributions are, and once more we turn to research for guidance.

In a fascinating study on the controversial topic of code ownership, a research team noted that the number of minor contributors to a module has a strong positive correlation to defects. That is, the more authors that make small contributions, the higher the risk for bugs. Interestingly, when there’s a clear main developer who has written most of the code, the risk for defects is lower, as illustrated by the following figure. (See Don’t Touch My Code! Examining the Effects of Ownership on Software Quality.)
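
A minimal Python sketch of how such ownership numbers could be derived from the commit authors of one module. The 5 percent cut-off for a "minor contributor" is an assumption chosen for illustration, and the function name is hypothetical.

```python
from collections import Counter


def ownership(commit_authors, minor_threshold=0.05):
    """Summarize ownership of a module from the author of each commit to it."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no commits given")
    main_developer, main_commits = counts.most_common(1)[0]
    # Assumption: a minor contributor made less than `minor_threshold`
    # of all commits to the module.
    minors = sorted(
        author for author, n in counts.items() if n / total < minor_threshold
    )
    return {
        "main_developer": main_developer,
        "ownership": main_commits / total,
        "minor_contributors": minors,
    }
```

A module with a high ownership value and few minor contributors matches the low-risk pattern described above; many minor contributors are the signal to look closer.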

Based on that research alone we can’t tell why having more minor developers of a module leads to more defects. However, given what we’ve learned so far, some of the effect is likely due to increased coordination needs combined with an incomplete understanding of the existing design and problem domain.

React to Developer Fragmentation #

Open source development may be different from many closed source settings, as it encourages contributions to all parts of the code. However, there’s evidence to suggest that this collaboration comes with a quality cost. One study on Linux found that code written by many developers is more likely to have security flaws. (See Secure open source collaboration: an empirical study of Linus’ law.) The paper introducing our fractal value metric evaluated it on the Mozilla project, and found a strong correlation between the fractal value of a module and the number of reported bugs. (See Fractal Figures: Visualizing Development Effort for CVS Entities.)
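
For reference, a small Python sketch of the fractal value, assuming the definition FV = 1 - sum((a_i / nc)^2), where a_i is the number of commits by author i and nc is the total number of commits to the module. Treat the formula as an assumption and check it against the cited paper before relying on it.

```python
def fractal_value(commits_per_author: dict) -> float:
    """Fragmentation of development effort for one module.

    0.0 means a single author made every commit; values approaching 1.0
    mean the work is spread thinly across many authors.
    """
    total = sum(commits_per_author.values())
    if total == 0:
        raise ValueError("no commits given")
    return 1 - sum((n / total) ** 2 for n in commits_per_author.values())
```

For example, two authors with five commits each give 0.5, while four authors with one commit each give 0.75.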

Whenever you find code with a high fractal value, use the data to do the following:

Many fundamental problems in large-scale software development stem from a mindset where programmers are treated as interchangeable cogs: generic resources ready to be moved around and thrown at new problems in different areas. The research we just covered suggests that such a view is seriously flawed. Not all code changes are equal, and the programmer making the change is just as important from a quality perspective as the code itself.

Code Ownership and Diffusion of Responsibility #

So far we’ve discussed coordination needs mainly in terms of quality: the more developers who touch a piece of code, the higher the risk for defects. But coordination also has a very real direct cost, which is what social psychologists call process loss.

Process loss is a concept that social psychologists borrowed from the field of mechanics. The idea is that just as a machine cannot operate at 100 percent efficiency all the time (due to physical factors like friction and heat loss), neither can a team. Part of a team’s potential productivity is simply lost. (See Group Process and Productivity for the original research.)

The kind of process loss that occurs depends on the task, but in a brain-intensive collaboration like software, most process loss is due to communication and coordination overhead. Process loss may also be driven by motivation losses and other social group factors. These are related to a psychological phenomenon called diffusion of responsibility.

To counter the diffusion of responsibility we need to look for structural solutions. One way of producing personal responsibility is privatizing, which is an effective technique for managing shared resources in the real world. (See The commons dilemma: A simulation testing the effects of resource visibility and territorial division for research on how groups benefit from privatization.)

Immutable Design #

Providing a clear ownership model also helps address hotspots. I analyze codebases as part of my day job, and quite often I come across major hotspots with low code quality that still attract 10

It’s quite clear that this code is a problem, and when we investigate its complexity trends we frequently see that those problems have been around for years, significantly adding to the cost and displeasure of the project. New code gets shoehorned into a seemingly immutable design, which has failed to evolve with the system.

At the same time, such code is often not very hard to refactor, so why hasn’t that happened? Why do projects allow their core components to deteriorate in quality, year after year?

Code Ownership Means Responsibility #

Code ownership can be a controversial topic as some organizations move to models where every developer is expected to work on all parts of the codebase. The idea of code ownership evokes the idea of development silos where knowledge is isolated in the head of a single individual. So let’s be clear about this: when we talk ownership, we don’t mean ownership in the sense of “This is my code—stay away.” Rather, ownership is a mechanism to counter the diffusion of responsibility, and it suggests that someone takes personal responsibility for the quality and future of a piece of code.

That “someone” can be an individual, a pair, or a small team in a larger organization. I’ve also seen organizations that successfully adopt an open source–inspired ownership model where a single team owns a piece of code, yet anyone can—and is encouraged to—contribute to that code. The owning team, however, still has the final say on whether to accept the contributions.

Provide Broad Knowledge Boundaries #

The effects we discuss are all supported by data, and whether we like it or not, software development doesn’t work well with lots of minor contributors to the same parts of the code. We’ve seen some prominent studies that support this claim, and there is further research in Code ownership and software quality: a replication study, which shows that code ownership correlates with code quality. This research is particularly interesting since it replicates an earlier study, Don’t Touch My Code! Examining the Effects of Ownership on Software Quality, which claims that the risk for defects increases with the number of minor developers in a component.

Of course, these findings don’t mean you should stop sharing knowledge between people and teams. Quite the contrary: we need to distinguish our operational boundaries (the parts where we’re responsible and write most of the code) from the knowledge boundaries of each team (the parts of the code we understand and are relatively familiar with).

Whereas Conway’s law implies that our communication works best with well-defined operational boundaries, broader knowledge boundaries make interteam communication easier since we share parts of each other’s context. There’s also evidence that broader knowledge boundaries provide our organization with a competitive advantage, enabling us to see opportunities and benefit from innovations outside our area of the code. (See The Mirroring Hypothesis: Theory, Evidence, and Exceptions for a summary of 142 empirical studies on the topic.)

Operational boundaries (the areas where you change code) should be smaller than knowledge boundaries (the areas you know about).

There are several techniques for broadening your knowledge boundaries, such as inviting people from other teams to code reviews and scheduling recurring sessions where you present walkthroughs of a solution or design. You may also choose to encourage people to rotate teams. When combined, these techniques give your teams a fresh perspective on their work and help foster a culture of shared goals.

Introduce New Teams to Take on Shared Responsibilities #

Code like the Legacy Plugin is both a cost sink and a quality risk, so it’s important to get it back on track. The first step is to grant someone ownership over the code and ensure that person gets the necessary time to address the most critical parts. Social code analysis helps us with this task too.

Architectural building blocks tend to get defined early in a product’s life cycle, and as the code evolves it’s likely that new boundaries are needed, for both components and teams. Unfortunately, this is an aspect that organizations often fail to react to, and the consequences are developer congestion and coordination bottlenecks in the codebase. Such problems sneak up on us, which is why we need to measure and visualize.

Social Groups: The Flip Side to Conway’s Law #

Conway’s law is a great observation from the dawn of software development that has received renewed interest over the past few years, mostly as a way to sell the idea of microservices. But from a psychological perspective Conway’s law is an oversimplification. Team work is much more multifaceted. The law also involves a trade-off: we minimize communication needs between teams, but that win comes with

The flip side is the direct social costs of isolating teams with distinct areas of responsibility, and if we’re unaware of these social costs they will translate into real costs in terms of both money and a dysfunctional culture.

Motivation Losses in Teams #

A few years ago I worked with a team that was presented with a challenging task. During the past year the team had focused on making its work more predictable. It had learned to narrow down and prioritize tasks and to limit excess parallel development, and it had invested in a strong integration-test suite. It had been a bumpy ride, but life started to look bright until one day the team’s sprint was halted and a rapid change of plans was ordered.

Suddenly the team had to start work on a feature completely unrelated to all other recent work, and a tight deadline was enforced. Since no one had the required domain expertise and the software lacked the proper building blocks, the team had to sacrifice both short- and long-term quality goals to meet the deadline, only to be surprised that the completed feature wasn’t delivered to any customers. The reason that the feature suddenly gained importance and intense management focus was that someone had a bonus depending on it. The bonus goals were set two years earlier, before a single line of code had been written. The manager got his bonus, but the project suffered and was eventually canceled. The cause wasn’t so much the accumulated technical debt, which could have been countered, but rather the motivational losses among the team members.

This story presents the dangers of making people feel like their contributions are dispensable, a factor that’s known to encourage social loafing. Social loafing is a type of motivation loss that may occur when we feel that the success of our team depends little on our actual effort. We pretend to do our part of the work, when in reality we just try to look busy and hope our peers keep up the effort. It’s a phenomenon that occurs both for simple motor tasks, like rope-pulling, and for cognitive tasks like template metaprogramming in C++.

It doesn’t take extreme situations like the previous story to get social loafing going in a team. If the goals of a particular project aren’t clearly communicated or if arbitrary deadlines are enforced, people lose motivation in the task. Thus, as a leader you need to communicate why some specific task has to be done or why a particular deadline is important, which serves to increase the motivation for the person doing the job.

Social loafing is also related to the diffusion of responsibility that we discussed earlier in the sense that social loafing becomes a viable alternative only when you feel anonymous and your contributions aren’t easily identifiable. Therefore, social loafing and the resulting process loss increase with group size, a phenomenon known as the Ringelmann effect. Thus, part of the increased communication costs on a software project with excess staffing is likely to be Ringelmann-driven social loafing rather than true coordination needs.

Several factors can minimize the risk of social loafing:

Don't Turn Knowledge Maps into Performance Evaluations

Us and Them: The Perils of Interteam Conflicts #

Exercises #

The following exercises are designed to let you explore architectural hotspots on your own. By working through the exercises you also get the opportunity to explore an additional usage of complexity trends to supervise unit test practices.

Prioritize Hotspots in CPU Architectures #

Get a Quick Win #

Supervise Your Unit Test Practices #

Next books #

Chapter 8 Toward Modular Monoliths through the Social View of Code #

A worse but learned and understood design may trump its cleaner replacement.

Dodge the Silver Bullet #

Whatever architectural decisions we make, they’re likely to be invalidated over time, simply because an organization isn’t static.

The Trade-Off Between Architectural Refinements and Replacement Systems #

In some situations the rewrite choice has already been made for you by the passage of time; for example, when you’re stuck with obsolete technologies like 4GL languages that only compile to 32-bit native applications. A rewrite is also the right decision when the existing technology puts hard limitations on your system’s performance, if it’s no longer supported, or if it’s hard to recruit and retain staff due to an unattractive programming language. (VB6, we’re looking at you—again.)

Layered Architectures and the Cost of Consistency #

In my day job I’ve analyzed dozens of layered architectures, and in general the degree of coupling goes from 30 percent in stable applications where most changes are bug fixes, to 70 percent in codebases that grow new features. Let’s consider the impact.

A layered architecture enforces the same change pattern on all end-user features in the codebase. It’s a consistent design for sure, but that consistency doesn’t serve us well with regard to maintenance. When adding a new feature, no matter how insignificant from a user’s perspective, you need to visit every single layer for a predictable tweak to the code, often just passing on data from one level in the hierarchy to the next. It’s mundane and time consuming.
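To make the change pattern concrete, here is a minimal sketch of how coupling between two layers could be quantified. It assumes an illustrative definition (the share of commits touching either layer that touch both), which is not necessarily the metric behind the 30 and 70 percent figures above. The layer names and the commit data are hypothetical.

```python
def layer_coupling(commits, layer_a, layer_b):
    """Percentage of commits touching layer_a or layer_b that touch both.

    `commits` is an iterable of sets, each holding the layers one commit changed.
    This definition is an assumption made for illustration.
    """
    either = both = 0
    for layers in commits:
        in_a, in_b = layer_a in layers, layer_b in layers
        if in_a or in_b:
            either += 1
        if in_a and in_b:
            both += 1
    return 100.0 * both / either if either else 0.0


# Hypothetical history: most feature commits ripple through several layers.
commits = [
    {"controllers", "services", "repositories"},
    {"controllers", "services"},
    {"services"},
    {"controllers", "services", "repositories"},
    {"views"},
]

coupling = layer_coupling(commits, "controllers", "services")
print(f"controllers <-> services: {coupling:.0f}%")
```

In a real analysis the sets would be derived from version-control history by mapping each changed file path to its layer.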

Package by component #

Package by component is a pattern captured by Simon Brown, (...)
The core idea is to make components an architectural building block that combines application logic and data-access logic, if needed.

Package by feature #

The package by feature pattern presents another architectural alternative that enables a high-level consistency without enforcing a specific technical design like traditional layers do. Package by feature takes a domain-oriented approach where each user-facing feature becomes a high-level building block.
Just like its component-based cousin, the package by feature pattern also makes it straightforward to align your architecture and organization. The main difference between the patterns is that the UI becomes part of each feature in package by feature, whereas it’s a separate concern in package by component.

The architectural paradigm data, context, and interaction (DCI) provides a clear separation between the data/domain model (what the system is) and its features (what the system does). In short, DCI separates your data objects from the feature-specific behaviors, which are expressed in object roles, and different use cases express their context by combining specific object roles.
The novelty of the DCI pattern is its context-specific role objects, which give you a place for all those use case–specific details and tricky special cases that otherwise wreak havoc on your data model. Since DCI is a use case–centric pattern it enables independently developable parts with clear operational boundaries. The DCI pattern isn’t as well known as the other architectures we’ve discussed, but it’s a paradigm worth studying in more depth as a promising refactoring goal when modularizing legacy monoliths. (Lean Architecture for Agile Software Development contains a detailed description of DCI and is a highly recommended read.)

Discover Bounded Contexts Through Change Patterns #

Bounded context is a pattern from domain-driven design (DDD) where multiple context-specific models are preferred over a global, shared data model. (See Domain-Driven Design: Tackling Complexity in the Heart of Software.)

The Perils of Feature Teams #

The slow pace of feature growth wasn’t due to bad code quality, and the architecture couldn’t be blamed either, as it revealed a modular component-based system with sane boundaries and dependencies. Odd. However, once we took a social view of the system a more worrisome architectural view arose. By applying the concept of knowledge maps on the team level—an idea that we touched on in the previous chapter—it became obvious that there weren’t any clear operational boundaries between the teams. In the next figure, which shows the team contributions over the past three months, you see that it’s hard to spot any patterns in the distribution of each team’s work. Sure, some team may be a major contributor to some parts, but in general this does look chaotic.

By using the historic lines of contributed code, our metric reflects such knowledge retention. Git lets us mine the number of added and deleted lines of code for each modified file through its --numstat option. We use the same algorithm as in Analyze Operational Team Boundaries to map individuals to teams. The only difference is that our input data is more detailed this time around.
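The mining step could be sketched roughly as follows. This is a minimal illustration, not the tooling used in the book: it assumes the log was produced with `git log --numstat --pretty=format:'--%an'` (a format chosen here so that each author name sits on a line starting with `--`), and the author-to-team mapping and sample log are hypothetical.

```python
from collections import defaultdict

# Hypothetical author-to-team mapping; replace with your organization's data.
TEAMS = {"alice": "Payments", "bob": "Payments", "carol": "Search"}


def team_contributions(log_text, teams):
    """Sum added lines per file and team from `git log --numstat` output."""
    added = defaultdict(lambda: defaultdict(int))
    team = None
    for line in log_text.splitlines():
        if line.startswith("--"):
            # Author line written by our assumed --pretty format.
            team = teams.get(line[2:].strip(), "Unknown")
            continue
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":
            continue  # blank separator line, or a binary file ("-\t-\tpath")
        added_lines, _deleted_lines, path = parts
        added[path][team] += int(added_lines)
    return added


def main_team(added_by_team):
    """Return the team that contributed the most added lines to one file."""
    return max(added_by_team.items(), key=lambda item: item[1])[0]


# Made-up sample in the assumed log format.
sample = (
    "--alice\n10\t2\tsrc/pay.py\n\n"
    "--carol\n3\t1\tsrc/pay.py\n40\t0\tsrc/search.py\n"
)
result = team_contributions(sample, TEAMS)
for path, by_team in result.items():
    print(path, dict(by_team), "main team:", main_team(by_team))
```

Only added lines are counted here; deleted lines are parsed but ignored, which is a simplification you may want to revisit.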

Visualizing code deletion as progress could do much good for our industry.

But even in a feature-oriented context there’s a cut-off point where the codebase can’t afford more people working on it, as there will always be dependencies between different features, and more fine-grained components only accentuate that. As feature implementations start to ripple across team boundaries, your lead times increase one synchronization meeting after the other.

First of all, this pattern is reminiscent of the speedup in parallel computing captured in Amdahl’s law, where the theoretical speedup is limited by the serial part of the program, as shown in the following figure.
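For reference, Amdahl’s law can be written as a small function. This is a sketch for intuition only; treating developers as "workers" and coordination as the "serial part" is the analogy from the text, and the 0.8 parallel fraction is an arbitrary illustrative value.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup when only `parallel_fraction` of the work
    can be spread across `workers`; the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# With 80% parallelizable work the speedup never reaches 1 / 0.2 = 5,
# no matter how many workers are added.
for workers in (1, 2, 8, 64):
    print(workers, round(amdahl_speedup(0.8, workers), 2))
```

The curve flattens quickly: most of the achievable gain arrives with the first handful of workers.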
An even more serious problem is that as your organization grows, code-reviewer fatigue becomes real, as there are only so many lines of code you can review each day. Beyond that point you’re likely to slip, which results in increased lead times and bugs that pass undetected to production.

Exercises #

Doing high-level refactorings will never become easy, and like any other skill, we need to practice it. The following exercises give you an opportunity to experiment with the techniques on your own. You also get a chance to investigate a component-oriented architecture, which makes an interesting contrast to the change patterns we saw in layered codebases.

Detect Components Across Layers #

Investigate Change Patterns in Component-Based Codebases #

Books #

Chapter 9 Systems of Systems: Analyzing Multiple Repositories and Microservices #

Analyze Code in Multiple Repositories #

The core idea behind microservices is to structure your system as a set of loosely coupled services, which—ideally—are independently deployable and execute in their own environment. Different services exchange information via a set of well-defined protocols, and the communication mechanism can be both synchronous, as in a blocking request-response, or asynchronous.

Compare Hotspots Across Repositories #

Microservices take the idea of team autonomy to an extreme, which indeed limits coordination bottlenecks in the code itself. However, as Susan Fowler points out in Production-Ready Microservices: Building Standardized Systems Across an Engineering Organization, a microservice never exists in isolation and it interacts with services developed by other teams. Those are conflicting forces.

As an example, let’s say you’ve identified a number of services with low cohesion. The impact is hard to explain in nontechnical terms, but showing a visualization where one microservice is 10 times the size of the others is an intuitive and powerful demonstration.

Track Change Patterns in Distributed Systems #

If low cohesion is problematic, strong coupling is the cardinal sin that grinds microservice development to a halt.

However, the long lead times weren’t due to slow development or a complex process, but rather were a consequence of the way the system and organization were structured. When one team did its “simple tweak” it had to request a change to another API owned by a different team. And that other team had to go to yet another team, that in turn had to convince the database administrators, which ostensibly is the place where change requests go to die.

Detect Implicit Dependencies Between Microservices #

In the simplest case we consider different commits part of the same logical change set if they are authored by the same person on the same day, and that algorithm is typically implemented using a sliding window. In a large system this gives us lots of change coupling, so we need to prioritize the results. The concept of surprise works well here too, so let’s focus on the coupling that crosses service boundaries as such dependencies are contrary to the philosophy of autonomous microservices.
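A minimal sketch of the simplest case described above: commits by the same author on the same day are merged into one logical change set, and only coupling that crosses service boundaries is counted. It groups by calendar day rather than using a true sliding window, and it assumes that the top-level directory of each path names the service. The commit data is made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def logical_change_sets(commits):
    """Merge commits with the same (author, date) into one set of files."""
    groups = defaultdict(set)
    for author, date, files in commits:
        groups[(author, date)].update(files)
    return list(groups.values())


def service_of(path):
    # Assumption: the first path segment identifies the service.
    return path.split("/", 1)[0]


def cross_service_coupling(change_sets):
    """Count how often two different services change in the same change set."""
    pairs = Counter()
    for files in change_sets:
        services = sorted({service_of(f) for f in files})
        pairs.update(combinations(services, 2))
    return pairs


# Hypothetical history as (author, date, files) tuples.
commits = [
    ("alice", "2024-03-01", ["billing/api.py"]),
    ("alice", "2024-03-01", ["dashboard/metrics.js"]),
    ("bob", "2024-03-01", ["billing/db.py"]),
    ("alice", "2024-03-02", ["billing/api.py", "dashboard/metrics.js"]),
]

coupling = cross_service_coupling(logical_change_sets(commits))
for (left, right), count in coupling.most_common():
    print(f"{left} <-> {right}: {count} shared change sets")
```

Changes that stay within a single service produce no pairs, so the result contains only the surprising, boundary-crossing coupling.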

Packaging the JavaScript files responsible for rendering the metrics together with the services that produce them solves the issue by reducing a systemwide implicit dependency to a local relationship within the same component. Sweet.

Detect Microservices Shotgun Surgery #

Such coupling is basically shotgun surgery on an architectural scale. (Shotgun surgery was introduced in Refactoring: Improving the Design of Existing Code [FBBO99] to describe changes that involve many small tweaks to different classes.) You want to change a single business capability and you end up having to modify five different services. That’s expensive.
There are several root causes for microservices shotgun surgery:

When you detect dependencies between code owned by different teams you have a number of options:

Measure Technical Sprawl #

Four decades ago, Manny Lehman started documenting a series of observations on how software evolves, and his writings became known as Lehman’s laws. (See On Understanding Laws, Evolution, and Conservation in the Large-Program Life Cycle.) One of the laws states the need for conservation of familiarity, which means that everyone involved in the life cycle of a system must maintain a working knowledge of the system’s behavior and content.
The main reasons for diminishing knowledge of a system are high turnover of personnel and, as Lehman points out, excessive growth of the codebase.

Just a couple of years ago microservices launched on the same trajectory, and one early selling point was that each team was free to choose its own technology and programming language. The consequences of unrestricted technology adoption became known as technical sprawl. Technical sprawl comes in different forms, and the most obvious form is when our services use different libraries, frameworks, and infrastructures. This sprawl will slow down the development of the system and diminish our mastery of it. We avoid these dangers by standardizing our microservice ecosystem; Production-Ready Microservices: Building Standardized Systems Across an Engineering Organization comes with a good set of practical advice in this area.

Sure, a good developer can learn the basics of any programming language in a week, but the mastery required to tweak and debug production code needs time and experience. While rewriting a service in another language is doable—at least as long as the service is truly micro—it has no value from a business perspective. It’s a hard sell.

Turn Prototyping into Play #

We humans learn by doing, and prototyping different solutions gives you feedback to base decisions on. Unless you prototype a problem connected to a specific technology (for example, performance optimizations or scalability), use your prototypes as a learning vehicle. (Years ago I learned Common Lisp this way.) The strategy has the advantage of fueling the intrinsic motivation of developers and gives your organization a learning opportunity that you can’t afford on production code. Besides, no manager will mistake that Common Lisp-based prototype for production-ready code.

Exercises #

We covered a lot of ground in this chapter as we focused both on gaining situational awareness of existing problems and on getting guidance that makes it easier to understand existing code. In the following exercises you get the opportunity to try a technique from each of those categories.

Support Code Reading and Change Planning #

Combine Technical and Social Views to Identify Communities #

Analyze Your Infrastructure #

Books #

Chapter 10 An Extra Team Member: Predictive and Proactive Analyses #

There’s a common belief in our industry that technical debt sneaks into a codebase over time. However, recent research disagrees and suggests that many problematic code smells are introduced upon creation, and future evolution of the code merely continues to dig that hole deeper. This means we need a way to catch potential problems early, ideally before they enter our master branch.

Detect Deviating Evolutionary Patterns #

While the size of gc.cpp is on the extreme edge of the scale, far too many organizations find themselves in similar situations where parts of the code cannot be refactored without significant risk. Thus it pays off to investigate ways of detecting code decay and future maintenance problems early.

When Code Turns Bad #

In a fascinating study, a team of researchers investigated 200 open source projects to find out When and Why Your Code Starts to Smell Bad. The study identified cases of problematic code such as Blob classes that represent units with too many responsibilities, classes with high cyclomatic complexity, tricky spaghetti code, and so on, and in all fairness gc.cpp ticks most of those boxes.
The researchers then backtracked each of those code problems to identify the commit that introduced the root cause. The surprising conclusion is that such problems are introduced already upon the creation of those classes! Really.
This finding should impact how we view code; it’s easy to think that code starts out fine and then degrades over time.

Instead of waiting for the completion of a feature, make it a practice to present and discuss each implementation at one-third completion. Focus less on details and more on the overall structure, dependencies, and how well the design aligns with the problem domain. Of course, one-third completion is subjective, but it should be a point where the basic structure is in place, the problem is well understood, and the initial test suite exists. At this early stage, a rework of the design is still a viable alternative and catching potential problems here has a large payoff.
If you do one-third code walkthroughs—and you really should give it a try—start from the perspective of the test code. As we saw earlier in this book, there is often a difference in quality between test code and application code. Complicated test code is also an indication that something is not quite right in the design of the application code; if something is hard to test, it will be hard to use from a programmer’s point of view, and thus a future maintenance issue.

Identify Steep Increases in Complexity #

When you investigate your complexity trend warnings, you’re likely to come across the following scenarios:

Identify the Experts #

If you’ve ever worked in an organization that is located across multiple sites, you probably noted that distribution comes at a cost. What may be surprising is how significant that cost is. Research on the subject reports that distributed work items take an average of two and a half times longer to complete than tasks developed by a colocated team. (See the research in An Empirical Study of Speed and Communication in Globally Distributed Software Development.)

The previously mentioned research explains that in a distributed setting, the absence of informal discussions in the hallway makes it harder for distant colleagues to know who has expertise in different areas. In such organizations, knowledge maps gain importance.
In Build Team Knowledge Maps, we saw how knowledge maps help us measure aspects like Conway’s law by mapping individual contributions to organizational units. If we skip that step and retain the information about individual authors, we get a powerful communication tool that lets us locate the experts.

Power Laws Are Everywhere #

We’ve already seen that hotspots work so well because the development activity in a codebase isn’t uniform, but forms a power law distribution. We see a similar distribution when it comes to individual author contributions, as shown in the following figure with an example from Kotlin.

This means that in your own codebase, you’re likely to see that a surprisingly small number of people have written most of the code. (You can have a look at your author distribution by typing the command git shortlog -s | sort -r.)
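To put a number on that skew, the output of `git shortlog -s` can be summarized in a few lines. This is a sketch under stated assumptions: the input is the plain text of `git shortlog -s` (a commit count followed by the author name on each line), the sample data is made up, and it measures commits per author, which is only a proxy for the amount of code written.

```python
def top_author_share(shortlog_text, top_n):
    """Fraction of all commits made by the `top_n` most active authors."""
    counts = []
    for line in shortlog_text.splitlines():
        if not line.strip():
            continue
        count, _name = line.strip().split(None, 1)
        counts.append(int(count))
    counts.sort(reverse=True)
    total = sum(counts)
    return sum(counts[:top_n]) / total if total else 0.0


# Made-up sample in the layout printed by `git shortlog -s`.
sample = "   120\tAda\n    15\tGrace\n    10\tLinus\n     5\tKen\n"
share = top_author_share(sample, 1)
print(f"Top author share of commits: {share:.0%}")
```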

Your Code Is Still a Crime Scene #

My previous book, Your Code as a Crime Scene, introduced concepts from forensic psychology as a means to understand the evolution of large-scale codebases. Forensics was a metaphor drawn from where it all started. Years ago I did a geographical offender profile on a series of burglaries in my hometown, Malmö, Sweden.

The software industry has improved dramatically during the two decades I’ve been part of it, and there’s no sign it will stop. But it’s also an industry that keeps repeating avoidable mistakes by isolating its influences to technical fields. Large-scale software development has as much in common with the social sciences as with any engineering discipline. This means we could benefit from tapping into the vast body of research that social psychologists have produced over the past decades.

Exercises #

In these final exercises you get an opportunity to look for early warnings of potential future quality problems. You also get to experiment with a proactive usage of the social analysis techniques as a way to facilitate communication, as well as to reason about offboarding risks.

Early Warnings in Legacy Code #

Find the Experts #

Offboarding: What If? #

Books #


