The outcomes?
Twenty teams found a statistically significant positive effect while nine did not, and despite all teams working from the same data set, effect sizes ranged from 0.89 to 2.93 in odds-ratio units (where 1.0 would indicate no effect).
Why so many differences?
Because results depend a great deal on each team's chosen analytic strategy, which in turn is shaped by their statistical comfort, the choices they make, and how those interact with their pre-existing working theories (see the sketch below).
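To make that concrete, here's a minimal sketch in Python, using simulated data and hypothetical variable names rather than the study's actual analyses, of how two defensible model specifications can yield different odds ratios for the very same data set:

```python
# Minimal sketch (not the study's analyses): the same simulated data analysed
# with two justifiable logistic-regression specifications can produce
# noticeably different odds ratios. All variable names here are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
exposure = rng.binomial(1, 0.5, n)                   # predictor of interest
covariate = rng.normal(0, 1, n) + 0.8 * exposure     # covariate correlated with exposure
logit = -1.0 + 0.2 * exposure + 0.6 * covariate
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# "Team A": unadjusted model
model_a = sm.Logit(outcome, sm.add_constant(exposure)).fit(disp=0)

# "Team B": adjusts for the covariate
X_b = sm.add_constant(np.column_stack([exposure, covariate]))
model_b = sm.Logit(outcome, X_b).fit(disp=0)

print("Team A odds ratio:", np.exp(model_a.params[1]).round(2))
print("Team B odds ratio:", np.exp(model_b.params[1]).round(2))
# Two defensible analytic choices, one data set, two different effect estimates.
```

Neither model is "wrong" in any obvious way; the point is simply that reasonable, justifiable choices about what to adjust for can move the estimated effect considerably.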
Now, these results weren't incentivized examples of p-hacking. The authors of the study point out that the variability arose from "justifiable, but subjective, analytic decisions", and while there's no obvious way to ensure a researcher has chosen the right methodology for their study, the authors suggest that,
"transparency in data, methods, and process gives the rest of the community opportunity to see the decisions, question them, offer alternatives, and test these alternatives in further research".Something all the more important in cases where authors might in fact have biases the would incentivize them to favour a particular outcome, and why I wish I was offered more in the way of stats and critical appraisal in medical school (and maybe less in the way of embryology for instance).
[Photo by Timur Saglambilek from Pexels]