This is my fourth post in a critical series on Evidence-Based Management (EBM). This series discusses a number of objections that I have against EBM: a quality movement that intends to improve managerial decision-making by urging managers to use the best available (scientific) evidence (Pfeffer & Sutton, 2006). In my first post, I introduced the series and summarized my objections. In my second post, I discussed the history of EBM and how Scrum.org applies it to software development. My previous post focused on the problematic definition of evidence that underlies EBM. In this post, I will argue that this problematic definition opens the door to another problem: the potential for manipulation.
Evidence-Based Management is built on the principle that we should seek out evidence to justify our beliefs, especially when they inform managerial decisions. Questions like ‘should we implement pair coding to decrease the number of defects in production code?’, ‘what team size is optimal in terms of the value a team delivers?’ and ‘will Scrum help us to innovate more rapidly?’ can be answered based on personal experience, gut feeling or whatever competitors are doing. But we can also try to answer these questions with more objective data, like organizational metrics or scientific research. The assumption here is that objective data increases the quality of the resulting decisions by eliminating psychological biases, like personal preferences, politics and agendas.
Although this sounds great in theory, putting it into practice isn't all that easy. Just having a bunch of numbers or metrics on paper doesn't constitute quality data. You need valid and reliable measurements in order to gather strong, meaningful data; otherwise you're just fooling yourself and those involved. In my previous post, I argued that this is very hard to do even in scientific organizational research, let alone in the context of your own organization.
Now, we can simply choose to accept this limitation and assume that weak data is still better than no (objective) data at all. And although this position has its merits, it does allow subjectivity, politics and personal agendas to creep back into decision-making in more subtle and less transparent ways. This may actually turn out to be more harmful than what we're trying to avoid, as I will try to show in this post. For if we lower the bar for what constitutes reliable and valid data, we enter a gray area where pretty much any position can be justified.
A case in point ...
In my previous post, I referenced three scientific studies that can help to inform management decisions. The studies showed that ‘pair coding improves code quality’ (Hannay, Arisholm & Sjøberg, 2009), that ‘gradual implementation of downsizing strategies increases organizational improvement’ (Cameron, Freeman & Mishra, 1995), and that ‘using Agile methodologies increases the chance of project success by 30%’ (Standish Group, 2013). I concluded that, since there is scientific evidence to back up these practices, it is justified to implement them.
But how strong is this evidence, really? Take pair coding. The meta-analysis in the referenced study does show a significant positive effect on code quality, but the authors conclude that the effect is very small and subject to a considerable amount of variance caused by a number of known and unknown moderating factors (Hannay, Arisholm & Sjøberg, 2009).

Or take the success rate of Agile. In the most recent iteration of the Chaos Report (Standish Group, 2014), the authors point out that this effect completely disappears when controlling for project size (“Size of a project trumps methodology”, ibid). The Chaos Reports are also frequently criticized for the validity of their research in general and for the unwillingness of the Standish Group to share their data and methods for external verification (Eveleens & Verhoef, 2010; Zvegintzov, 1998). How trustworthy is this research, really?

Finally, take the advice to gradually downsize your organization. Although there is certainly a moderate effect, the authors (Cameron, Freeman & Mishra, 1995) conclude that more research is urgently needed to test whether these results generalize to organizations beyond the ones that participated. They also wonder if the results are influenced by moderating factors, like culture and life-cycle stage.

Although one could use these studies to conclude that 'pair coding improves quality', 'agile methodologies increase project success' and 'gradual downsizing trumps big-bang approaches', a more careful and thorough reading of the source material offers a more nuanced perspective on how strong this evidence really is.
Although these examples can be construed as ‘direct evidence’ to justify a particular belief (like 'pair coding improves quality'), they are clearly only circumstantial at best, a point that is also acknowledged within EBM (Thompson et al., 2005). But why should that stop someone from ignoring this nuance and using this 'evidence' to justify his or her belief? After all, people are more inclined to believe a supposedly objective source, even if the quality of evidence in that source doesn't hold up to scrutiny. Too cynical? Just take a look at popular media, where this happens all the time. Consider the intense debates around global warming, the question of whether vaccines cause autism, or the latest research on the health effects of certain foods. People can offer you ample 'proof', 'evidence' and 'research' to back up any position in these debates.
The problem with circumstantial evidence is that you can easily prove any position with low-quality data. The aforementioned studies can be used to support a decision to stop pair coding (‘See, there’s almost no effect’) or Scrum (‘So we can do waterfall as long as the project is small’), but they can also be used to start implementing these practices for the opposite reasons ('See, there is a significant effect'). This kind of ‘cherry picking’ has become increasingly easy with the rise of the internet; there is proof to be found for pretty much any belief. And this may not even be intentional, as people tend to seek out evidence that supports their decision. This is a well-known psychological effect called ‘confirmation bias’ (Baron, 2000). On a side note: actively trying to falsify one's position would be a more useful approach (ibid). Another problem is that weak data can easily be interpreted differently by changing definitions. This is especially problematic with vague business terminology like 'value', 'effectiveness' and 'success', which is hard to define and even harder to measure.
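To make the cherry-picking problem concrete, here is a toy simulation (the numbers are invented for illustration, not taken from any of the studies above). Suppose pair coding yields a genuinely tiny improvement that is drowned out by noise, roughly the picture the meta-analysis paints. If many small 'studies' each measure this effect, both camps will find studies that 'prove' their position:

```python
# Toy simulation (hypothetical numbers): a tiny true effect measured
# by many small, noisy studies points in both directions at once.
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.1   # assumed true benefit, in standard deviations
SAMPLE_SIZE = 10    # teams measured per simulated "study"
N_STUDIES = 100     # number of simulated studies

observed = []
for _ in range(N_STUDIES):
    # each study averages a few noisy team-level measurements
    diffs = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(SAMPLE_SIZE)]
    observed.append(statistics.mean(diffs))

pro = sum(1 for d in observed if d > 0)   # "See, pair coding works!"
con = N_STUDIES - pro                     # "See, pair coding is useless!"
print(f"{pro} studies favour pair coding, {con} favour dropping it")
```

Because the assumed true effect (0.1 standard deviations) is small relative to the noise, a substantial share of the simulated studies point in the wrong direction, so selectively citing either subset can 'justify' either conclusion.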
How does this relate to research in your own organization?
The above examples focused on scientific research, which is favored by evidence-based practices as the primary source of 'strong evidence'. Another source of evidence comes from one's own organization: metrics, financial results and other kinds of measurements. This is what Scrum.org is trying to achieve with their Agility Path framework (Scrum.org, 2014). In my previous post I argued that this kind of (local) evidence is likely to be even weaker than scientific studies, for the simple reason that reliable and valid measures are even harder to get right in a living, changing organization. Without sufficient statistical and methodological controls, causal conclusions are impossible to draw.
The core of the problem: weak evidence and manipulation
My worry is this: Evidence-Based Management will be used to present weak data as 'strong and objective evidence' for a particular belief, thereby implying that it is justified to hold that belief and that others should therefore embrace it as well. After all, calling something 'evidence' is a strong normative statement. And since the workplace is not a scientific arena with proper counter-balances, like peer review, critical appraisal of methodology and statistics, and academic debate, most people will simply accept what they're being told without question, like we do most of the time when presented with 'evidence' through documentaries, blogs and popular media. Instead of empowering decision-making by making it more objective, I am worried that an Evidence-Based approach will make it less open for debate, less open for interpretation and offer less room for alternate perspectives. This way 'knowledge' truly becomes power, but power of the manipulative sort; especially if this knowledge is based on evidence that is weak and circumstantial at best.
I am not arguing that you should stop searching for data to inform your decisions altogether. Making an informed decision is always a good thing, and Evidence-Based Management does well to remind us that we should always aim for the best information we can find. But the complex reality of a living organization does not warrant calling some information 'evidence', thereby making it somehow more true than other information. This will only serve to stifle debate and push aside the personal and human aspects of decision-making. The personal biases that EBM is trying to push out of the equation certainly have their value, after all. Instead of trying to find the truth or being right, decision-making should be more about finding shared interpretation and meaning in the complex reality in which we live. This seems to me like a far more sensible, and far more human, way to decide on where to go next.
And if that is the appeal that Evidence-Based Management is truly making, I'm perfectly happy to go along with it. But the terminology of Evidence-Based Management, and how advocates write about it, suggests otherwise. And if shared interpretation really is the point, wouldn't that make Evidence-Based Management superfluous, and guilty of turning the average manager into a "simplistic straw man of ignorance"?
Baron, J. (2000). Thinking and Deciding (3rd ed.). New York: Cambridge University Press.
Cameron, K. S., Freeman, S. L. & Mishra, A. K. (1995). Downsizing and redesigning organizations. In Huber, G. P. & Glick, W. H. (Eds.), Organizational Change and Redesign: Ideas and Insights for Improving Performance. New York: Oxford University Press.
Hannay, J. E., Dybå, T., Arisholm, E. & Sjøberg, D. I. K. (2009). The effectiveness of pair programming: a meta-analysis. Information and Software Technology, 51(7), pp. 1110-1122.
Pfeffer, J. & Sutton, R. I. (2006). Hard Facts, Dangerous Half-Truths and Total Nonsense: Profiting from Evidence-Based Management. Boston, MA: Harvard Business School Press.
Rousseau, D. M. (2006). Is there such a thing as ‘evidence-based management’? Academy of Management Review, Vol. 31, No. 2, pp. 256-269.
Scrum.org (2014). Empirical Management Explored: Evidence-Based Management for Software Organizations. Retrieved August 9, 2014 from https://www.scrum.org/Portals/0/Documents/Community%20Work/Empirical-Management-Explored.pdf.
Standish Group (2013). CHAOS Manifesto 2013. Retrieved August 10, 2014 from http://www.versionone.com/assets/img/files/CHAOSManifesto2013.pdf.
Thompson, B., Diamond, K. E., McWilliam, R., Snyder, P. & Snyder, S. W. (2005). Evaluating the quality of evidence from correlational research for evidence-based practice. Exceptional Children, Vol. 71, No. 2, pp. 181-194.