Surely, God loves the .06 (blob) nearly as much as the .05.

Image Credit: Dan Goldstein

“We are not interested in the logic itself, nor will we argue for replacing the .05 alpha with another level of alpha, but at this point in our discussion we only wish to emphasize that dichotomous significance testing has no ontological basis. That is, we want to underscore that, surely, God loves the .06 nearly as much as the .05. Can there be any doubt that God views the strength of evidence for or against the null as a fairly continuous function of the magnitude of p?”

Rosnow, R.L. & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.

This colorful quote came to mind while discussing significance testing procedures with colleagues over lunch. In cognitive neuroscience, with its enormous glut of opaque data, we are constantly confronted with exactly these kinds of seemingly absurd, yet important, statistical decisions. Should one correct p-values over an entire research lifetime, as our resident methodology expert often suggests? I love this suggestion; imagine an academia where the fossilized experts (no offense, experts) are tossed aside for the newest and greenest researchers, whose pool of p-values remains untapped!
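To see how quickly that would bite, here is a toy sketch (plain Python, a naive Bonferroni scheme, and made-up test counts; not a procedure anyone actually follows) of what a per-test threshold corrected over an entire career would look like:

```python
# Toy sketch of "lifetime" Bonferroni correction: every test ever run tightens the per-test alpha.
# All numbers are hypothetical; this is the joke made literal, not a recommended procedure.

FAMILYWISE_ALPHA = 0.05

def lifetime_threshold(tests_run_so_far: int) -> float:
    """Per-test alpha after Bonferroni-correcting over an entire career of tests."""
    return FAMILYWISE_ALPHA / max(tests_run_so_far, 1)

for career_tests in (1, 100, 10_000, 1_000_000):
    print(f"{career_tests:>9} career tests -> significant only if p < {lifetime_threshold(career_tests):.1e}")
```

A few thousand tests in, nothing short of astronomically small p-values would survive, which is of course the point of the joke.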

Really, though: just how many a priori anatomical hypotheses should one have sealed up in envelopes? As one colleague joked, it seems advantageous to keep a drawer full of wild speculations sealed away in case one’s whole-brain analysis fails to yield results. Of course we should follow best scientific and statistical practice as far as it will take us, but in truth a researcher often arrives at these obscure impasses, thousands of dollars of scanning funding already spent, trying to decide whether or not they really predicted a given region’s involvement. In these circumstances it has even been argued that there is a certain ethical obligation to explore one’s data rather than simply discard every finding that does not fit the hypothesis. While I do not fully endorse that claim, I believe it is worth considering. Further, I suspect that a vast majority of the field, from the top institutions to the most obscure, regularly dips into these murky ethical waters.

This is one area in which I hope “data-driven” science, in the mold of the Human Genome and Human Connectome projects, can succeed. It also points to a desperate need for publishing reform; surely what matters is not how many blobs fall on one side of an arbitrary distinction, but rather a full and accurate depiction of one’s data and its implications. In a perfect world, we would not need to obscure the truth hidden in these massive datasets while we hunt for sufficiently low p-values.

Rather, we should publish a clear record showing exactly what was done, what correlated with what, and where both significance and non-significance lie. Perhaps one day we might dream of combing through such datasets and actually explaining what drove the .06s versus the .05s. For now, however, we must be careful not to look at our uncorrected statistical maps, for that way voodoo surely lies! And that is perhaps the greatest puzzle of all: two identical datasets. In one case the researcher writes down, “blobs A, B, and C I shall see,” and then runs ROI analyses on those regions, finding them significant. In the other, the researcher first examines the uncorrected map, notices blobs A, B, and C, and then conducts the same region-of-interest analysis. In both cases the data and the results are the same. Yet one is classic statistical voodoo, double dipping, and the other perfectly valid hypothesis testing. It seems our truth criterion lies not only in our statistics but also, in some way, in the epistemological ether.
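The puzzle can also be made concrete with a small simulation. The sketch below is a toy example of my own (assuming numpy and scipy; pure-noise data and made-up sample sizes), comparing a voxel chosen before looking at the data against whichever voxel looked best in an uncorrected first pass and was then “confirmed” on the same data:

```python
# Toy simulation of the double-dipping puzzle: identical null data, two selection procedures.
# Assumes numpy and scipy; all sizes are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_subjects, n_voxels = 2000, 20, 500

a_priori_hits, peeking_hits = 0, 0
for _ in range(n_experiments):
    data = rng.normal(0.0, 1.0, size=(n_subjects, n_voxels))  # pure noise: no true effect anywhere

    # Route 1: "blobs A, B, and C I shall see" -- test a voxel chosen before seeing the data.
    _, p = stats.ttest_1samp(data[:, 0], 0.0)
    a_priori_hits += p < 0.05

    # Route 2: peek at the uncorrected map, pick the best-looking voxel, then test it on the same data.
    t_vals, _ = stats.ttest_1samp(data, 0.0)
    best = int(np.argmax(np.abs(t_vals)))
    _, p = stats.ttest_1samp(data[:, best], 0.0)
    peeking_hits += p < 0.05

print(f"A priori ROI false-positive rate:   {a_priori_hits / n_experiments:.3f}")   # stays near 0.05
print(f"Peek-then-test false-positive rate: {peeking_hits / n_experiments:.3f}")    # approaches 1.0
```

With a few hundred voxels of pure noise, the peek-then-test route comes out “significant” in nearly every simulated experiment while the pre-specified ROI stays at the nominal 5%: identical data, wildly different error rates, which is exactly what the double-dipping prohibition guards against.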

Of course, it’s really more of a pragmatic distinction than an ontological one. The voodoo label serves not to separate true results from false ones but to discourage researchers from risky practices that inflate the rate of false positives. All in all, I agree with Dorothy Bishop: we need to stop chasing the novel and typically spurious, and begin to share and investigate our data in ways that create lasting, informative truths. The brain is simply too complex and expensive an object of study to let these practices build into an inevitable file drawer of doom. It infuriates me how frustratingly opaque many published studies are, even in top journals, about the precise methods and analyses that went into the paper. Wouldn’t we all rather share our data and help explain it cohesively? I dread the coming collision between the undoubtedly monolithic iceberg of unpublished negative findings and spurious positive findings on one side and our most trusted brain-mapping paradigms on the other.

6 thoughts on “Surely, God loves the .06 (blob) nearly as much as the .05.”

  1. I’ve made the same “let’s do lifelong p-value correction” gag myself a fair number of times (p < .001, Bonferroni corrected... damn, this gets tougher every time)!

    On the point you raise about two identical datasets yielding the same result via either a principled analysis or a fishing expedition: I think the argument may be that in the latter case it’s not clear how one should compute the p-values, since the probability space differs wildly. I’m reminded of a recent blog post by John Kruschke in which he argued that the probability space differs when you had set out to recruit 20 people and recruited 20, compared to when you had set out to recruit as many people as you could in two weeks and just happened to recruit 20 (a minimal coin-flip sketch of this intention-dependence appears after this comment thread). In a way, though, what this suggests to me is that we’re focusing on the wrong thing with p-values, because they’re so beholden to the intentions of the researcher (I am probably mangling some statistical vocabulary here, but bear with me). The reality is that…

    Regarding the sharing of data and cohesively finding explanations: I think the F1000 Research site that was recently announced sounds like it will provide the framework to do exactly that. In principle it sounds great. One thing I really like is the idea that somebody could come along and say “hey, you say your experiment is testing exactly x, y, and z; that’s great, but it’s also testing a, b, and c”. It appears to me that there are many valid a priori hypotheses that could be examined using existing experimental designs that any given researcher is simply not going to know about. For example, Researcher A is interested in ERPs, attention, and emotion, runs Experiment Y, tests the ERPs for effects of the experimental manipulation, etc. Researcher B reads this and says “I bet this design would also show effects in oscillatory power in the beta band”. Instead of collecting new data and running a whole new experiment to test this single hypothesis, Researcher B now just does it, or suggests that Researcher A does it. Sure, they can then go on to run another experiment to test it properly, but at least they will have an effect size estimate or some such, which will help in the design.

    • “One thing I really like is the idea that somebody could come along and say ‘hey, you say your experiment is testing exactly x, y, and z; that’s great, but it’s also testing a, b, and c’. […]”

      This is exactly what I’m talking about. Given that most of our funding comes from public sources, I also feel we have some obligation towards this sort of thing. It’s an elegant way around the fishing problem: have other people test their a priori hypotheses in your dataset. I don’t think it’s too hard to imagine some kind of system in which impact-factor-esque rewards were given both for the discoveries themselves and for the data contributions that made them possible. That way we could incentivize not only discovery but also the creation of rich, high-quality data.
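Picking up the Kruschke point from the first comment above: his textbook illustration uses coin flips rather than recruitment windows, but the logic is the same. Here is a minimal sketch (assuming scipy; the 7-heads-in-24-flips numbers are illustrative and not from the post) of how the very same observed data yield different p-values under different stopping intentions:

```python
# Same observed data (7 heads in 24 flips), two stopping intentions, two different p-values.
# Illustrative numbers; assumes scipy is available.
from scipy import stats

heads, flips = 7, 24
theta_null = 0.5

# Intention 1: "flip exactly 24 times" -> one-sided binomial tail probability of 7 or fewer heads.
p_fixed_n = stats.binom.cdf(heads, flips, theta_null)

# Intention 2: "flip until the 7th head" -> probability of needing 24 or more flips,
# i.e. 17 or more tails before the 7th head (negative binomial tail).
p_fixed_heads = stats.nbinom.sf(flips - heads - 1, heads, theta_null)

print(f"Fixed-N intention:     p = {p_fixed_n:.3f}")
print(f"Fixed-heads intention: p = {p_fixed_heads:.3f}")
```

The data are identical; only the intention that generated them differs, yet the tail probability differs substantially, which is the sense in which p-values are beholden to the researcher’s intentions.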

  2. I’m generally an advocate of a thorough exploration of the data, perhaps even at lower thresholds than one would conventionally publish. I see nothing particularly wrong with this, as long as it’s done in a principled manner, and the provenance of any given result is clear in the write-up.

    There are currently lots of initiatives going on which are trying to get brain-imaging people to share data in some way, and that’s great; however, the technical and conceptual difficulties are substantial. Ideally there would be a single resource where neuroscientists could deposit their data and have it accurately catalogued and retained with all the relevant supporting information (subject and experimental details); however, that seems some way off at the moment.

    • I’ve long thought that academia could benefit greatly from an open-source data repository. It could include rated reviews of datasets as well as comment and discussion threads, themselves organized by user ratings. In that way researchers could both peer-review datasets and their findings, and collaborate extensively to get the most out of our data.

  3. You need to invest in some principled EDA methods that let you explore your data without crossing the inferential line. Two books on the topic that I can recommend are:

    http://www.amazon.com/Exploratory-Analysis-Edition-Chapman-Computer/dp/1439812209

    and

    http://www.amazon.com/Exploring-Data-Engineering-Sciences-Medicine/dp/0195089650

    The latter book has an associated blog, which is a fascinating read on exploratory methods, including discussion of the fabulous violin plot (a minimal example is sketched below):

    http://exploringdatablog.blogspot.com
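Since the violin plot gets a mention just above, here is a minimal exploratory sketch of my own (assuming numpy and matplotlib; the data are fabricated purely for illustration), the kind of look-at-the-whole-distribution plot this sort of EDA recommends before any inferential test:

```python
# Minimal exploratory sketch: violin plots of fabricated per-condition data,
# showing full distributions rather than just means and error bars.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
conditions = {
    "control": rng.normal(0.0, 1.0, 40),
    "condition A": rng.normal(0.3, 1.0, 40),
    "condition B": rng.normal(0.3, 2.5, 40),   # same mean shift as A, very different spread
}

fig, ax = plt.subplots()
ax.violinplot(list(conditions.values()), showmedians=True)
ax.set_xticks(range(1, len(conditions) + 1))
ax.set_xticklabels(list(conditions.keys()))
ax.set_ylabel("response (arbitrary units)")
ax.set_title("Exploratory view: distributions, not just means")
plt.show()
```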
