Thoughts on team leadership and metrics

I realise that in four or so years of this site I've never talked properly about team leadership, which is a bit of an omission given how much time I've devoted to organisational design and how closely the two are related.

I'm going to split this across a few articles - this one is mostly about metrics - but there are some themes which will keep coming up. One of them is this: Teams are important. Your organisational design might be the successor to the Netflix or Spotify models, but if you don't have great teams it's going to fail. Conversely, you can have the most backward and regressive organisation going but if you have a great team they're still going to find a way to deliver.

Building and maintaining great teams requires leadership. You can hire great people, but if you don't lead them effectively then the usual outcome is they'll spend most of their time arguing, get annoyed, and then either check out or quit. I've spent a lot of time rescuing teams and individuals from this state. But how do you avoid getting into that state?

Well, it's a complicated subject. But one starting point is to give yourself an understanding of how the team is performing. This also implies you need some way to manage that performance, and some metrics to measure it with. That raises a question - who are we measuring, and how?

Succeed and fail as a team

This, for me, is Principle Number One. Everything you do is about making the team the atomic unit, not the individual. This has big consequences for how you measure performance - because any individual-based metric you introduce takes you further from that goal: you're giving your team members incentives to optimise for themselves rather than for the team.

How do you handle this tension? You want to measure the team as an atomic unit, but you also need to defend the team against brilliant jerks or people who don't pull their weight. The way to do this is to split your performance metrics as follows:

  • Team metrics are concerned with outcomes and results.
  • Individual metrics are restricted to "team-positive" and "team-negative" behaviours.

Good team metrics will be those specifically related to operational and business outcomes, e.g.

  • Number of escaped defects.
  • Availability SLA achieved.
  • Sprint commitments met.
  • Product objectives delivered.

(More on good metrics later).

Individual metrics will be around things correlated with good or bad team contribution:

  • Frequency of pull requests.
  • Proportion of features paired vs. solo.
  • Number of unexplained standup absences.
  • Team feedback acted upon.

Ideally you should have only a small number of individual metrics, and most of them should centre on feedback given by other team members. This seems counter-intuitive. Why does it work?

Metric-induced Individualism

I'll use a counter-example to explain this. Let's say you want bug-free code. You decide to set some individual objectives. Developers will be measured on how many bugs they produce. Testers will be measured on how many bugs they catch. I've worked with plenty of managers who take this approach, often justifying it with phrases like, "healthy tension".

What happens? The team breaks down. The testers report absurd bugs or bring up the same known issue every sprint. The developers close them all as "not a bug". Everyone is arguing with each other and not getting work done. Defects are still escaping, because it's easier to achieve the metric by going after the easy, quick-to-find bugs rather than the difficult and serious ones. Worse, you can get informal horse trading where your tester doesn't report bugs on their drinking buddy's code, and suddenly your developers are falling out with each other as well.

So let's say you have a rethink, and give everybody the same metric. You don't want it to be something that can be gamed easily, so you decide on escaped defects as the individual measure.

What happens? The arguments get even worse! Now every escaped defect is a finger-pointing exercise. He didn't test it properly. Well she didn't code it right. Well who touched it last? Who gets the black mark this time? If you're stack ranking your people (bear in mind that "all employee review scores must cluster around an average" is an implicit stack rank) then congratulations, you just broke your team. Everyone is now working for themselves.

The solution is to make these complex team dynamics work for you.

The power of team dynamics

In my scenario, the solution is to give the entire team the same metric: number of escaped defects.

What you have now is a shared team goal. It doesn't matter whose fault it is (or isn't), the entire team is rewarded or penalised. This means if I'm a developer and I'm a little nervous about the code I've just written, it's not in my interest to rush it through PR and testing in the hope that not catching the bugs becomes someone else's fault. It's in my interest to point my reviewer at the bit I'm worried about, and let my tester know which parts to give extra attention to.
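To make that concrete, here's a minimal sketch of what a shared escaped-defects metric might look like in practice (the field names are hypothetical): defects are simply counted per sprint for the whole team, and there's deliberately no field recording whose change caused them.

```python
from collections import Counter

# Hypothetical defect records exported from an issue tracker.
# Note there is deliberately no "caused by" field - the count belongs to the team.
defects = [
    {"sprint": "2024-07", "found_in": "production"},
    {"sprint": "2024-07", "found_in": "staging"},     # caught before release, doesn't count
    {"sprint": "2024-08", "found_in": "production"},
]

# Escaped defects per sprint, attributed to the team as a whole.
escaped_per_sprint = Counter(
    d["sprint"] for d in defects if d["found_in"] == "production"
)
print(escaped_per_sprint)  # Counter({'2024-07': 1, '2024-08': 1})
```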

This is the point where I get a raised eyebrow, shortly followed by a simple question:

"What about Don Do-Nothing, who coasts along on the team's achievements?"

I'll get to him in a bit. Actually, Don is not usually your problem. Your problem in this structure is Jerry the Brilliant Jerk. Because at some point, Jerry the Jerk is going to get you in a meeting room, and unload this on you:

"It's not fair! I'm the best programmer by far, and I'm being held back by these imbeciles who don't even know what the decorator pattern is."

This is why we don't have brilliant jerks. Well, that's a bit of a glib answer, and in reality you need to coach the hypothetical Jerry. A team with a single high-performing "rock star" is not an effective team; you're always one holiday, illness or unfortunate incident with a mangle away from catastrophic loss of productivity. This means that if Jerry believes the rest of the team are not at his level, it is on him to bring them to his level - this is the team-positive approach, rather than the individualistic one.

Jerry may quit at this point, by the way. That's totally fine. I've never felt any great loss in this situation. Brilliant jerks are not useful for you.

Teams heal themselves

I deliberately didn't address the case of Don Do-Nothing, so it's time to do that.

Firstly, a surprise: in a high-performing team, Don Do-Nothing is extremely rare. One of the most damaging things managers tend to do with low performers and lazy employees is stick them out of the way in a silo on some kind of punishment duty (BAU requests, bug fixing, writing an internal tool by themselves...) until they're ready for prime time again. Actually, the solution is to move them to your highest-performing team.

Most people don't want to let the side down. But they also need to see that they're letting the side down. Your developer who plods through the day slowly delivering low-quality code is never going to learn that they're slow or their output is low-quality stuck out on support, or in a team where the average isn't much better. They are going to learn if they're in a team which turns out lean, well-optimised functions in the space of a coffee break.

The other thing is that teams with a shared goal will bring this to your attention. If the team is the right size, my experience is that this is a positive discussion, too - the impact on their objectives of one person under-performing is large enough that they want to do something about it, but not so large that they're clamouring to have the person booted out of the team. They genuinely want to help.

Team-level objectives: summary

In summary, team-level objectives keep your team together and working towards a common goal. Effectively supported, they result in the highest-performing member bringing up the average level of the team, and then the team bringing up the level of its lowest performer.

Managing individuals within the team

This team-focused approach is great and all, but at some point you're going to need to deal with things at the individual level. It's not just about comp and promotions - while your team may have a common goal, the individuals within it are going to have their own needs in terms of development, coaching and performance.

What we have to square here is the idea of "team-positive" and "team-negative". This can be hard for people who are used to a traditional tournament or stack-rank organisation, because there are behaviours these organisations see as "good" which are actually "team-negative". An example is the developer who always claims the difficult tasks because they're the quickest; it might be "good" for the individual and the velocity (in the short term at least), but it's not good for the team as no-one's learning anything.

Team-positive behaviours are things like:

  • Helps people in the team learn things.
  • Studies and practices things the team are using.
  • Merges their code frequently to avoid conflicts.
  • Provides useful feedback on pull requests.
  • Provides quick feedback on pull requests.
  • Raises blockers and impediments early.
  • Invites other team members to pair and mob.
  • Provides concise, useful ideas in meetings.
  • Fixes other team members' problems.
  • Gives team useful, constructive feedback.
  • Understands and supports the team's process (Scrum/Kanban/XP/whatever).

Team-negative behaviours are things like:

  • Works alone on features.
  • Merges their code at the last minute.
  • Rejects pull requests without explanation.
  • Gives noncommittal, vague updates during standup.
  • Hides blockers and impediments.
  • Refuses to compromise on opinions.
  • Says, "that's not my job".
  • Doesn't understand or work with the team's process.
  • Vetoes a great hire because they're worried about competition.
  • Only works on exciting or interesting tasks.
  • Insults or offends other team members.
  • Wastes time summarising others or arguing in meetings.
  • Refuses to study or learn things the team are using.
  • Complains about other team members.

You get the picture. Team-negative behaviours are a mixture of the straight-up negative, and things that are fine if you're a lone wolf working on something you own completely, but are extremely harmful in a team.

So how do we deal with this? Well, the first part is to have a small number of individual metrics that the team agree on. These are individual behaviours which the team think are important, like frequency of pull requests, percentage of features paired vs. solo, or unexplained standup absences. Keep this set small, and make it one the team can sign up to.

The second part is to use the team. Everyone should be giving regular feedback on their peers, based on how they are contributing (or not) to the team goals. A good format is Start/Stop/Continue:

  • What should this person start doing?
  • What should this person stop doing?
  • What should this person continue doing?

This also gives us a really good metric for a team-positive behaviour: number of peer feedbacks given. This is worth evaluating because people are generally reticent about giving feedback, especially when it comes to the "stop doing" ones. Organisations who've been through this journey tend to find they need to anonymise feedback at first until everyone's comfortable with the idea.
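As a sketch of how lightweight this can be, here's one hypothetical way to capture Start/Stop/Continue feedback and derive the "number of peer feedbacks given" metric from it (the names and fields are illustrative, not a prescription):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Feedback:
    """One piece of Start/Stop/Continue peer feedback (illustrative shape only)."""
    author: str    # who gave the feedback - this drives the "feedbacks given" metric
    subject: str   # who the feedback is about
    kind: str      # "start", "stop" or "continue"
    note: str

feedback_log = [
    Feedback("alice", "bob", "start", "pairing on the deployment scripts"),
    Feedback("bob", "alice", "continue", "quick, useful pull request reviews"),
    Feedback("carol", "bob", "stop", "merging large branches at the last minute"),
]

# Team-positive individual metric: how many peer feedbacks each person has given.
given = Counter(item.author for item in feedback_log)
print(given)  # Counter({'alice': 1, 'bob': 1, 'carol': 1})
```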

Should we measure people directly on this feedback? Give them a score along the lines of positive feedback as a percentage of all feedback? No. If we do, we're stepping onto a path that ends up back at stack ranking and broken teams.

Individual Development

What's important is not necessarily how good or bad someone's feedback is, but their ability to act on it. A junior developer might receive a ton of "stop" feedback on code quality issues whereas a senior may only receive one or two, but if the junior fixes all of those things while the senior refuses to change, the junior is doing a better job of being a member of the team. They're addressing what the team needs from them.

So what does the leader do in all this? Other than acting as the collection point for feedback if you anonymise it, I find they have an important job enriching the feedback and making it useful:

  • Extract common themes from specific feedback.
  • Connect those to the team member's own view of their progress.
  • Help the team member create achievable goals.
  • Support the team member in achieving them.

This also includes bringing an expert second opinion. An example from teams I've managed is one where the team gave a lot of negative feedback about one member's code - because he had a much better understanding of the language and used features they didn't know about. In that case your goal gets spun round into teaching the rest of the team what idiomatic code looks like, and the leader has a big supporting role to play in backing up their team member.

One of the hardest things with this is keeping the goals achievable and short in timescale. People have a tendency to react to a suggestion they should start challenging themselves by setting a goal to climb Everest by the end of the year. A good leader will talk them down to the more realistic and near-term aim of climbing a local hill by the end of the week. This may seem facetious but it becomes very important when you're working on things like communication style which require a lot of effort to change and are best accomplished in small, understandable steps.

This, by the way, is where the evaluation comes in. Once every couple of months you should have a larger session where you look at how many goals were set, how many were achieved, and how many of the chosen feedback items were addressed. (When I get a piece of "Bob should stop doing" or "Sarah should start doing" feedback, I like to check back with the person who raised it to see whether that's actually happened, to make sure the goals are useful.)
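For that session the arithmetic is deliberately simple - something like this hypothetical rollup, where each goal was set from a feedback item and marked achieved or not:

```python
def review_summary(goals):
    """Summarise a review period: goals set, goals achieved, and the hit rate.
    Each goal is a (description, achieved) pair - an illustrative shape only."""
    achieved = sum(1 for _, done in goals if done)
    return {
        "set": len(goals),
        "achieved": achieved,
        "rate": achieved / len(goals) if goals else 0.0,
    }

print(review_summary([
    ("raise blockers in standup rather than sitting on them", True),
    ("pair on at least half of new features", True),
    ("give written PR feedback within one working day", False),
]))
# {'set': 3, 'achieved': 2, 'rate': 0.666...}
```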

Good Metrics

What we can draw from this is that if the team is measured as a whole, and individuals are measured on their ability to address peer feedback, then the metrics we choose matter a great deal.

So what makes a good team metric?

  • It is within the team's control or influence.
  • It is not dependent on a single individual.
  • It is not trivially game-able.
  • It is closely tied to an organisational goal or outcome.

Let's go through these points, then look at some examples.

Within the team's control or influence

One of the fastest ways to demoralise a team is to measure them on something they can't affect. For example, giving a team a revenue target when they rely on a separate sales organisation to drive new business. Or setting a reliability SLA when the biggest driver of system outages is a separate operations team.

There's an argument here for outcome ownership, but when a team can't have it they also shouldn't be measured on it. Putting the team in a position where 20% of their bonus is dependent on whether Greg in ops picks up his pager at 3am is a recipe for a demoralised team. It's also a recipe for a demoralised Greg, who may wonder why other people get to benefit from his sleepless nights.

The majority of a team's evaluative criteria need to be things they have direct control over. They can't change whether Greg answers his pager, but they can change how many defects escape to production, or ensure that all new feature endpoints can be served to 100 concurrent users in under 10ms.

This is all neat and reductive, but sometimes you might want a team to accomplish something that does depend on other people. Maybe you want your top DevOps team to share some Terraform love with the teams who are getting started. The key here is to make sure they can still influence the result - achieving the goal still requires the other teams to want to learn, but your team can influence this by how much support and engagement they provide.

A good balance is that 75-80% of a team's goals are ones where they have control over the outcome, with the remaining 20-25% being ones where they have influence, structured in a way that encourages collaboration between teams.

Not dependent on an individual

Even if a goal is entirely within a team's control, it will still be problematic if only one individual in the team can make it happen. If the team run their own infrastructure but Claire is the only one with the out-of-hours pager and access to fix things, then setting an availability SLA means the entire team relies on Claire to achieve that objective. Whether it's achieved or not, you're going to get resentment in at least one direction.

The good thing is that unlike the previous point, this can usually be resolved without needing a lot of careful organisational design and upheaval. It ought to be within the team's control to fix anything which is dependent on a single individual. This can drive the feedback-based individual cycle too; if only one person can administer the Kubernetes cluster they should be giving feedback suggesting other team members start pairing on deployments, and receiving feedback suggesting they start running some lunch and learn sessions!

Note this doesn't mean absolutely everyone in the team can contribute to every possible goal - merely that the team feel they can support each other in achieving it and it's not dependent on a single individual.

Not trivially game-able

Good engineers look for simple solutions to problems. That doesn't stop at the problems you ask them to solve. Everyone knows what will go wrong with a "lines of code per day" metric, and it's rarely used these days, so let's consider one which still is: code churn.

Code churn is typically expressed as the number of deleted or changed lines of code relative to the number of new lines.

If you give a team an objective of minimising code churn, the easiest way for them to achieve it is to never delete any code, and to always add new methods rather than adapt existing ones. The net effect is that despite measuring against an objective correlated with a clean codebase, you end up with one that is messy - full of redundant code and unnecessary duplication.

The trick to getting out of game-able metrics is to look at what they're trying to drive, and keep following the stack. Code churn is supposed to indicate a code base that is messy and hard to work with. Messy code is correlated with unexpected bugs and unpredictable delivery velocity. So why not measure escaped defects and consistency of sprint velocity? (Be warned that the latter is game-able if your estimation and planning discipline is lax.)
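One hedged way to express "consistency of sprint velocity" as a number is the coefficient of variation across recent sprints - purely a sketch, and only meaningful if the estimation discipline mentioned above holds:

```python
from statistics import mean, pstdev

def velocity_consistency(velocities):
    """1.0 means every sprint delivered the same number of points; values
    approaching 0 mean velocity is swinging wildly. A sketch, not a standard."""
    avg = mean(velocities)
    if avg == 0:
        return 0.0
    return 1 - (pstdev(velocities) / avg)

print(round(velocity_consistency([21, 23, 22, 20, 24]), 2))  # steady team, ~0.94
print(round(velocity_consistency([8, 40, 12, 35, 10]), 2))   # erratic team, ~0.35
```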

Closely tied to organisational goals/outcomes

Your company prospectus probably doesn't mention code churn or how many points are delivered each sprint. Your investors and customers don't care about these things. Instead, you have new features, new products, and perhaps some quality and availability SLAs to back these up. Maybe you talk about how fast and responsive your products are.

If you have a solid organisational design where teams fully own outcomes then these things can be tied to an individual team. Which... well. What better team objectives than the things you want the team to achieve? So an example set might be:

  • Deliver the holographic airport map feature.
  • Deliver the "which airport am I in?" reminder.
  • Availability objective: 99.95% or greater = 100%, 99.2% or lower = 0%
  • Improve 99th percentile API response time: <80ms = 100%, >200ms = 0%
  • Keep the app size small: 10MB or less = 100%, 30MB or more = 0%
  • No more 1-star reviews mentioning "set my device on fire": 0 = 100%, 10 = 0%

Note the linearity between "achieved" and "not". You don't want the team to give up ever doing performance optimisation because of that one widely-used endpoint that won't go below 90ms.
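If it helps, that scoring can be expressed as a simple linear interpolation between a floor (0%) and a target (100%). The helper below is a sketch using the example thresholds above, not a prescribed formula:

```python
def objective_score(value, floor, target):
    """Score an objective linearly between floor (0%) and target (100%).
    Works for lower-is-better metrics too: just pass floor > target."""
    lo, hi = sorted((floor, target))
    clamped = min(max(value, lo), hi)   # cap the score at 0% and 100%
    return (clamped - floor) / (target - floor)

# Availability of 99.6% lands partway between the 99.2% floor and 99.95% target.
print(round(objective_score(99.6, floor=99.2, target=99.95), 2))   # ~0.53
# 99th percentile latency of 120ms, where <80ms = 100% and >200ms = 0%.
print(round(objective_score(120, floor=200, target=80), 2))        # ~0.67
```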

You do need to pay careful attention to "control" and "influence" here, though. For example, you might work on a project with an organisation whose release cadence is far slower and more unpredictable than your own. If this is the case you may choose not to measure your team on getting to production, but instead on correlated drivers such as how quickly they resolve blockers, average ticket age, time tickets spend in a waiting state during sprint, etc.
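Those drivers are straightforward to pull out of ticket history. As an illustration (with hypothetical status names - real boards will differ), here's a sketch of the "time spent in a waiting state" measure:

```python
from datetime import datetime

def hours_waiting(transitions):
    """Total hours a ticket spent in a 'waiting' state, given an ordered list of
    (timestamp, status) transitions. Status names are hypothetical."""
    total = 0.0
    waiting_since = None
    for ts, status in transitions:
        if status == "waiting" and waiting_since is None:
            waiting_since = ts
        elif status != "waiting" and waiting_since is not None:
            total += (ts - waiting_since).total_seconds() / 3600
            waiting_since = None
    return total

ticket_history = [
    (datetime(2024, 7, 1, 9, 0), "in progress"),
    (datetime(2024, 7, 1, 15, 0), "waiting"),    # blocked on the other organisation
    (datetime(2024, 7, 3, 11, 0), "in progress"),
    (datetime(2024, 7, 4, 17, 0), "done"),
]
print(hours_waiting(ticket_history))  # 44.0
```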

Useful resources

There's a lot there, so I think I'll stop at this point and save the rest for another article. However, before I finish up I want to point at a few sources of inspiration and useful tools:

  • A lot of this thinking is influenced by the original Netflix Culture Deck.
  • Similarly, the Spotify Talent Snapshot is similar to a lot of approaches I like to use.
  • Most performance management tools are horrible, clunky and mired in a world where reviews are done yearly and stack ranked, but I've been using Betterworks recently and it supports a lot of what I've talked about, particularly short cycles, tying to organisational objectives and open peer feedback.
  • Plandek can automate gathering a lot of the metrics I talk about for people using JIRA and Git. I believe they even have a "succeed and fail as a team" mode in recent iterations.
  • I haven't talked much about the frequency at which these things should happen, except by implication, as I didn't want to retread my own wittering about Performance Reviews.

That's all for now, and as ever I'm always interested in hearing how others out there approach this problem. And apologies to those I wasn't able to manage in this style, either because I was still figuring it out or I was constrained in how much of it I could implement.