The Wild West of Publication Reform Is Now

It’s been a while since I’ve tried on my publication-reform revolutionary hat (it comes in red!), but tonight as I was winding down I came across a post I simply could not resist. Titled “Post-publication peer review and the problem of privilege” and written by evolutionary ecologist Stephen Heard, the post argues that we should be cautious of post-publication review schemes insofar as they may bring about a new era of privilege in research consumption. Stephen writes:

“The packaging of papers into conventional journals, following pre-publication peer review, provides an important but under-recognized service: a signalling system that conveys information about quality and breadth of relevance. I know, for instance, that I’ll be interested in almost any paper in The American Naturalist*. That the paper was judged (by peer reviewers and editors) suitable for that journal tells me two things: that it’s very good, and that it has broad implications beyond its particular topic (so I might want to read it even if it isn’t exactly in my own sub-sub-discipline). Take away that peer-review-provided signalling, and what’s left? A firehose of undifferentiated preprints, thousands of them, that are all equal candidates for my limited reading time (such that it exists). I can’t read them all (nobody can), so I have just two options: identify things to read by keyword alerts (which work only if very narrowly focused**), or identify them by author alerts. In other words, in the absence of other signals, I’ll read papers authored by people who I already know write interesting and important papers.”

In a nutshell, Stephen turns the entire argument for PPPR and publishing reform on its head. High impact[1] journals don’t represent elitism; rather, they give the no-name rising young scientist a chance to have their work read and cited. This argument really made me pause for a second, as it represents the polar opposite of almost my entire worldview on the scientific game and academic publishing. In my view, top-tier journals represent an entrenched system of elitism masquerading as meritocracy. They make arbitrary, journalistic decisions that exert intense power over career advancement. If anything, the self-publication revolution represents the ability of a ‘nobody’ to shake the field with a powerful argument or study.

Needless to say I was at first shocked to see this argument supported by a number of other scientists on Twitter, who felt that it represented “everything wrong with the anti-journal rhetoric” spouted by loons such as myself. But then I remembered that in fact this is a version of an argument I hear almost weekly when similar discussions come up with colleagues. Ever since I wrote my pie-in-the-sky self-publishing manifesto (don’t call it a manifesto!), I’ve been subjected (and rightly so!) to a kind of trial-by-peers as a de facto representative of the ‘revolution’. Most recently I was even cornered at a holiday party by a large and intimidating physicist who yelled at me that I was naïve and that “my system” would never work, for almost the exact reasons raised in Stephen’s post. So let’s take a look at what these common worries are.

The Filter Problem

Bar none, the most common complaint I hear when talking about various forms of publication reform is the “filter problem”. Stephen describes the fear quite succinctly: how will we ever find the stuff worth reading when the data deluge hits? How can we sort the wheat from the chaff if journals don’t do it for us?

I used to take this problem seriously, and tried to dream up all kinds of neato reddit-like schemes to solve it. But the truth is, it just represents a way of thinking that is rapidly becoming irrelevant. Journal-based indexing isn’t a useful way to find papers. It is one signal in a sea of information, and it isn’t at all clear what it actually represents. I suspect that the people who worry most about the filter problem tend to be more senior scientists who already struggle to keep up with the literature. For one thing, science is marked by an incessant march towards specialization. The notion that hundreds of people must read and cite our work for it to be meaningful is largely poppycock. The average paper is mostly technical, incremental, and obvious in nature. This is absolutely fine and necessary – not everything can be groundbreaking, and even the breakthroughs must be vetted in projects that are by definition less so. For the average paper, then, being regularly cited by 20-50 people is damn good and likely represents the total target audience in that topic area. If you network with those people using social media and traditional conferences, it really isn’t hard to get your paper into their hands.

Moreover, the truly groundbreaking stuff will find its audience no matter where it is published. We solve the filter problem every single day, by publicly sharing and discussing papers that interest us. Arguing that we need journals to solve this problem ignores the fact that they obscure good papers behind meaningless brands, and more importantly, that scientists are perfectly capable of identifying excellent papers from content alone. You can smell a relevant paper from a mile away – regardless of where it is published! We don’t need to wait for some pie-in-the-sky centralised service to solve this ‘problem’ (although someday, once the dust settles, I’m sure such things will be useful). Just go out and read some papers that interest you! Follow some interesting people on Twitter. Develop a professional network worth having! And don’t buy into the idea that the whole world must read your paper for it to be worth it.

The Privilege Problem 

OK, so let’s say you agree with me up to this point. Using some combination of email, social media, alerts, and RSS, you feel fully capable of finding relevant stuff for your research (I do!). But you’re worried about this brave new world where people archive any old rubbish they like and embittered post-docs descend to sneer gleefully at it from the dark recesses of PubPeer. Won’t the new system be subject to favouritism, cults of personality, and the privilege of the elite? As Stephen says, isn’t it likely that popular researchers will have their papers reviewed and promoted while all the rest fade into the background?

The answer is yes and no. As I’ve said many times, there is no utopia. We can and must fight for a better system, but cheaters will always find a way[2]. No matter how much transparency and rigor we implement, someone is going to find a loophole. And the oldest of all loopholes is good old human corruption and hero worship. I’ve personally advocated for a day when data, code, and interpretation are all separately publishable, citable items that each contribute to one’s CV. In this brave new world, PPPRs would be performed by ‘review cliques’ who build up their reputation as reliable reviewers by consistently giving high marks to science objects that go on to garner acclaim, are rarely retracted, and perform well on various meta-analytic robustness indices (reproducibility, transparency, documentation, novelty, etc). They won’t replace or supplant pre-publication peer review; rather, we can ‘let a million flowers bloom’. I am all for a continuum of rigor, ranging from preregistered, confirmatory research with pre- and post-publication peer review, to fully exploratory, data-driven science that is simply uploaded to a repository with a ‘use at your peril’ warning. We don’t need to pit one reform tool against another; the brave new world will be a hybrid mixture of every tool we have at our disposal. Such a system would be massively transparent, but of course not perfect. We’d gain a cornucopia of new metrics by which to weigh and reward scientists, but assuredly some clever folks would benefit more than others. We need to be ready when that day comes, aware of whatever pitfalls may beset our brave new science.

Welcome to the Wild West

Honestly though, all this kind of talk is almost beside the point. We all have our own opinions of what will be the best way to do science, or what will happen. For my own part I am sure some version of this sci-fi depiction is inevitable. But it doesn’t matter, because the revolution is here, it’s now, and it’s changing the way we consume and produce science right before our very eyes. Every day a new preprint lands on Twitter with a massive splash. Just last week in my own field of cognitive neuroscience, a preprint on problems in cluster inference for fMRI rocked the field, threatening to undermine thousands of existing papers while generating heated discussion in labs around the world. The week before that, #cingulategate erupted when PNAS published a paper which was met with instant outcry and roundly debunked by an incredible series of thorough post-publication reviews. A multitude of high-profile fraud cases have been exposed, and careers ended, via anonymous comments on PubPeer. People are out there, right now, finding and sharing papers, discussing the ones that matter, and arguing about the ones that don’t. The future is now, and we have almost no idea what shape it is taking, who the players are, or what it means for the future of funding and training. We need to stop acting like this is some fantasy future 10 years from now; we have entered the wild west and it is time to discuss what that means for science.

Author’s note: In case it isn’t clear, I’m quite glad that Stephen raised the important issue of privilege. I am sure that there are problems to be rooted out and discussed along these lines, particularly in terms of the way PPPR and filtering are accomplished now in our wild west. What I object to is the idea that the future will look like it does now; we must imagine a future where science is radically improved!

[1] I’m not sure if Stephen meant high impact in the usual sense, as I don’t know the impact factor of The American Naturalist; maybe he just meant ‘journals I like’.

[2] Honestly, this is where we need to discuss changing the hyper-capitalist system of funding and incentives surrounding publication, but that is another post entirely! Maybe people wouldn’t cheat so much if we didn’t pit them against a thousand other scientists in a no-holds-barred cage match to the death.

oh BOLD where art thou? Evidence for a “mm-scale” match between intracortical and fMRI measures.

A frequently discussed problem with functional magnetic resonance imaging is that we don’t really understand how the hemodynamic ‘activations’ measured by the technique relate to actual neuronal phenomena. This is because fMRI measures the Blood-Oxygenation-Level Dependent (BOLD) signal, a complex vascular response to neuronal activity. As such, neuroscientists can easily get worried about all sorts of non-neural contributions to the BOLD signal, such as subjects gasping for air, pulse-related motion artefacts, and other generally uninteresting effects. We can even start to worry that the BOLD signal may not actually measure any particular aspect of neuronal activity at all, but rather some overly diluted, spatially unconstrained filtered version of it that simply lacks the key information for understanding brain processes.

Given that we generally use fMRI over neurophysiological methods (e.g. M/EEG) when we want to say something about the precise spatial generators of a cognitive process, addressing these ambiguities is of utmost importance. Accordingly, a variety of recent papers have utilized multi-modal techniques, for example combining optogenetics, direct recordings, and fMRI, to assess which kinds of neural events contribute to alterations in the BOLD signal and its spatial (mis)localization. Now a paper published today in NeuroImage addresses this question by combining high-resolution 7-tesla fMRI with electrocorticography (ECoG) to determine the spatial overlap of finger-specific somatomotor representations captured by the two measures. Starting from the title’s claim that “BOLD matches neuronal activity at the mm-scale”, we can already be sure this paper will generate a great deal of interest.

From Siero et al (In Press)

As shown above, the authors managed to record high-resolution (1.5 mm) fMRI in two subjects implanted with 23 x 11 mm intracranial electrode arrays during a simple finger-tapping task. Motor responses from each finger were recorded and used to generate somatotopic maps of brain responses specific to each finger. This analysis was repeated in both ECoG and fMRI, which were then spatially co-registered to one another so the authors could directly compare the spatial overlap between the two methods. What they found appears, at first glance, to be quite impressive:
From Siero et al (In Press)

Here you can see the color-coded t-maps for the BOLD activations to each finger (top panel, A), the differential contrast contour maps for the ECoG (middle panel, B), and the maximum activation foci for both measures with respect to the electrode grid (bottom panel, C), in two individual subjects. Comparing the spatial maps for both the index finger and thumb suggests a rather strong consistency, both in terms of the topology of each effect and the location of their foci. Interestingly, the little-finger measurements seem somewhat more displaced, although similar topographic features can be seen in both. Siero and colleagues further compute the spatial correlation (Spearman’s R) across measures for each individual finger, finding an average correlation of .54, with a range of .31 to .81 – a moderately high degree of overlap between the measures. Finally, the optimal amount of shift needed to minimize the spatial difference between the measures was computed and found to be between 1 and 3.1 millimetres, suggesting a slight systematic bias between the ECoG and fMRI foci.
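For readers who want a feel for this kind of comparison, here is a minimal sketch of how one might compute a Spearman spatial correlation and a brute-force “optimal shift” between two co-registered 2-D activation maps. The arrays, grid spacing, and search range below are invented placeholders, not the authors’ actual data or pipeline.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from scipy.stats import spearmanr

def spatial_overlap(bold_map, ecog_map, max_shift_mm=4.5, mm_per_voxel=1.5):
    """Spearman correlation between two co-registered 2-D activation maps,
    plus the in-plane translation (in mm) that maximizes their agreement.
    Inputs are hypothetical arrays sampled on the same grid."""
    rho, p = spearmanr(bold_map.ravel(), ecog_map.ravel())

    # Brute-force search over small translations of the ECoG map
    best_rho, best_shift = rho, (0.0, 0.0)
    steps = np.arange(-max_shift_mm, max_shift_mm + mm_per_voxel, mm_per_voxel)
    for dy in steps:
        for dx in steps:
            shifted = nd_shift(ecog_map, (dy / mm_per_voxel, dx / mm_per_voxel), order=1)
            r, _ = spearmanr(bold_map.ravel(), shifted.ravel())
            if r > best_rho:
                best_rho, best_shift = r, (dy, dx)
    return rho, p, best_rho, best_shift

# Toy example: a noisy copy standing in for the finger-specific t-maps
rng = np.random.default_rng(0)
bold = rng.normal(size=(16, 8))
ecog = bold + rng.normal(scale=0.5, size=(16, 8))
print(spatial_overlap(bold, ecog))
```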

Are ‘We the BOLD’ ready to break out the champagne and get back to scanning in comfort, spatial anxieties at ease? While this is certainly a promising result, suggesting that the BOLD signal indeed captures functionally relevant neuronal parameters with reasonable spatial accuracy, it should be noted that the result is based on a very best-case scenario, and that a considerable degree of unique spatial variance remains between the two methods. The data presented by Siero and colleagues reflect a number of crucial acquisition and pre-processing choices that are likely to influence the results: the high spatial resolution, the manual removal of draining veins, the restriction of the analysis to grey-matter voxels only, and the lack of spatial smoothing all make it difficult to generalize from these results to the standard 3-tesla whole-brain pipeline. Indeed, even under these best-case criteria, the results still indicate up to 3 mm of systematic bias in the fMRI localization. We can be glad the bias was systematic rather than random, but 3 mm is still quite a lot in the brain. On this point, the authors note that the stability of the bias may point towards a systematic mis-registration of the ECoG and fMRI data and/or possible rigid-body deformations introduced by the implantation of the electrodes, issues that could be addressed in future studies. Ultimately it remains to be seen whether similar reliability can be obtained for less robust paradigms than finger wagging, or in standard, less optimal imaging scenarios. But for now I’m happy to let fMRI have its day in the sun, give or take a few millimetres.

Siero, J. C. W., Hermes, D., Hoogduin, H., Luijten, P. R., Ramsey, N. F., & Petridou, N. (2014). BOLD matches neuronal activity at the mm scale: A combined 7T fMRI and ECoG study in human sensorimotor cortex. NeuroImage. doi:10.1016/j.neuroimage.2014.07.002

 

#MethodsWeDontReport – brief thought on Jason Mitchell versus the replicators

This morning Jason Mitchell self-published an interesting essay espousing his views on why replication attempts are essentially worthless. At first I was merely interested in the fact that what would obviously become a topic of heated debate was self-published, rather than going through the long slog of a traditional academic medium. Score one for self-publication, I suppose. Jason’s argument is essentially that null results don’t yield anything of value and that we should be improving the way science is conducted and reported rather than publicising our nulls. I found particularly interesting his short list of examples of things that he sees as critical to experimental results but that nevertheless go unreported:

These experimental events, and countless more like them, go unreported in our method section for the simple fact that they are part of the shared, tacit know-how of competent researchers in my field; we also fail to report that the experimenters wore clothes and refrained from smoking throughout the session.  Someone without full possession of such know-how—perhaps because he is globally incompetent, or new to science, or even just new to neuroimaging specifically—could well be expected to bungle one or more of these important, yet unstated, experimental details.

While I don’t agree with the overall logic or conclusion of Jason’s argument (I particularly like Chris Said’s Bayesian response), I do think it raises some important, or at least interesting, points for discussion. For example, I agree that there is loads of potentially important stuff that goes on in the lab, particularly with human subjects and large scanners, that isn’t reported. I’m not sure to what extent that stuff can or should be reported, and I think that’s one of the interesting and under-examined topics in the larger debate. I tend to lean towards the stance that we should report just about anything we can – but of course publication pressures and tacit norms mean most of it won’t be published. And probably at least some of it doesn’t need to be? But which things exactly? And how do we go about reporting stuff like how we respond to random participant questions regarding our hypothesis?

To find out, I’d love to see a list of things you can’t or don’t regularly report, using the #methodswedontreport hashtag. Quite a few are starting to show up – most are funny or outright snarky (as seems to be the general mood of the response to Jason’s post), but I think a few are pretty common lab occurrences and are even thought-provoking in terms of their potentially serious experimental side-effects. Surely we don’t want to report all of these ‘tacit’ skills in our burgeoning method sections; the question is which ones need to be reported, and why they are important in the first place.

Birth of a New School: PDF version and Scribus Template!

As promised, today we are releasing a copy-edited PDF of my “Birth of a New School” essay, as well as a Scribus template that anyone can use to quickly create their own professional-quality PDF manuscripts. Apologies for the lengthy delay, as I’ve been in the middle of a move to the UK. We hope folks will iterate and optimize these templates for a variety of purposes, especially post-publication peer review, commentary, pre-registration, and more. Special thanks to collaborator Kate Mills, who used Scribus to create the initial layout. You might notice we deliberately styled the manuscript around the format of one of those Big Sexy Journals (see if you can guess which one). I’ve heard this elaborate process is supposed to cost somewhere in the tens of thousands of dollars per article, so I guess I owe Kate a few lunches! Seriously though, the entire copy-editing and formatting process only took about 3 or 4 hours total (most of which was just getting used to the Scribus interface), less than the time you would spend formatting and reformatting your article for a traditional publisher. With a little practice, Scribus or similar tools can be used to quickly turn out a variety of high-quality article types.

Here is the article on Figshare, and the direct download link:

The formatted manuscript. Easy!

What do you think? Personally, I’m really pleased with it! We’ve also gone ahead and uploaded the Scribus template to Figshare. You can use this to easily publish your own post-publication peer reviews, commentaries, and whatever else you like. Just copy-paste your own text into the text fields, replace the images, upload to Figshare or a similar service, and you are good to go! In general, Scribus is a really awesome open-source tool for publishing, both easy to learn and cross-platform. Another great alternative is Fidus. For now we’re still not exactly sure how to generate citations – in theory, if you format your manuscripts according to these guidelines, Google Scholar will pick them up anywhere on the net and generate alerts. For now we are recommending everyone upload their self-publications to Figshare or a similar service, as they are already working on a streamlined citation-generation scheme. We hope you find these useful; now go out and publish some research!

The template:

An easy to use Scribus template for self-publishing
Our Scribus template, for quick creation of research proofs.

Birth of a New School: How Self-Publication can Improve Research

Edit: click here for a PDF version and citable figshare link!

Preface: What follows is my attempt to imagine a radically different future for research publishing. Apologies for any overlooked references – the following is meant to be speculative and purposely walks the line between paper and blog post. Here is to a productive discussion regarding the future of research.

Our current systems of producing, disseminating, and evaluating research could be substantially improved. For-profit publishers enjoy extremely high taxpayer-funded profit margins. Traditional closed-door peer review is creaking under the weight of an exponentially growing knowledge base, delaying important communications and often resulting in seemingly arbitrary publication decisions1–4. Today’s young researchers are frequently dismayed to find the painstaking work they put into producing quality reviews overlooked or discouraged by journalistic editorial practices. In response, the research community has risen to the challenge of reform, giving birth to an ever-expanding multitude of publishing tools: statistical methods to detect p-hacking5, numerous open-source publication models6–8, and innovative platforms for data and knowledge sharing9,10.

While I applaud the arrival and intent of these tools, I suspect that ultimately publication reform must begin with publication culture – with the very way we think of what a publication is and can be. After all, how can we effectively create infrastructure for practices that do not yet exist? Last summer, shortly after igniting #pdftribute, I began to think more and more about the problems confronting the publication of results. After months of conversations with colleagues I am now convinced that real reform will come not in the shape of new tools or infrastructures, but rather in the culture surrounding academic publishing itself. In many ways our current publishing infrastructure is the product of a paper-based society keen to produce lasting artifacts of scholarly research. In parallel, the arrival of the networked society has led to an open-source software community in which knowledge is not a static artifact but rather an ever-expanding, living document of intelligent productivity. We must move towards “research 2.0” and beyond11.

From Wikipedia to GitHub, open-source communities are changing the way knowledge is produced and disseminated. Already this movement has begun to reach academia, with researchers across disciplines flocking to social media, blogs, and novel communication infrastructures to create a new movement of post-publication peer review4,12,13. In math and physics, researchers have already embraced self-publication, uploading preprints to the online repository arXiv, and more and more disciplines are using the site to archive their research. I believe that the inevitable future of research communication lies in this open-source metaphor, in the form of pervasive self-publication of scholarly knowledge. The question is thus not where we are going, but rather how we prepare for this radical change in publication culture. In asking these questions I would like to imagine what research will look like 10, 15, or even 20 years from today. This post is intended as a first step towards bringing to light specific ideas for how this transition might be facilitated. Rather than a prescriptive essay, this is merely an attempt to imagine what that future may look like. I invite you to treat what follows as an ‘open beta’ for these ideas.

Part 1: Why self-publication?

I believe the essential metaphor is within the open-source software community. To this end over the past few months I have  feverishly discussed the merits and risks of self-publishing scholarly knowledge with my colleagues and peers. While at first I was worried many would find the notion of self-publication utterly absurd, I have been astonished at the responses – many have been excitedly optimistic! I was surprised to find that some of my most critical and stoic colleagues have lost so much faith in traditional publication and peer review that they are ready to consider more radical options.

The basic motivation for research self-publication is pretty simple: research papers cannot be properly evaluated without first being read. Now, by evaluation, I don’t mean for the purposes of hiring or grant giving committees. These are essentially financial decisions, e.g. “how do I effectively spend my money without reading the papers of the 200+ applicants for this position?” Such decisions will always rely on heuristics and metrics that must necessarily sacrifice accuracy for efficiency. However, I believe that self-publication culture will provide a finer grain of metrics than ever dreamed of under our current system. By documenting each step of the research process, self-publication and open science can yield rich information that can be mined for increasingly useful impact measures – but more on that later.

When it comes to evaluating research, many admit that there is no substitute for opening up an article and reading its content – regardless of the journal. My prediction is that, as post-publication peer review gains acceptance, some tenured researcher or brave young scholar will eventually decide to simply self-publish her research directly onto the internet, and when that research goes viral, the resulting deluge of self-publications will be overwhelming. Of course, busy lives require heuristic decisions, and it’s arguable that publishers provide this editorial service. While I will address this issue specifically in Part 3, for now I want to point out that growing empirical evidence suggests that our current publisher/impact-based system provides an unreliable heuristic at best14–16. Thus, my essential reason for supporting self-publication is that, in the worst-case scenario, self-publications simply come with the disclaimer: “read the contents and decide for yourself.” As self-publishing practices are established, it is easy to imagine that these difficulties will be largely mitigated by self-published peer reviews and novel infrastructures supporting these interactions.

Indeed, with a little imagination we can picture plenty of potential benefits of self-publication to offset the risk that we might read poor papers. Researchers spend exorbitant amounts of their time reviewing, commenting on, and discussing articles – most of that rich content and meta-data is lost under the current system. By documenting the research process more thoroughly, the ensuing flood of self-published data can support new quantitative metrics of reviewer trust, and be further utilized to develop rich information about new ideas and data in near real-time. To give just one example, we might calculate how many subsequent citations or retractions the work endorsed by a particular reviewer goes on to generate, yielding a reviewer impact factor and reliability index. The more aspects of research we publish, the greater the data-mining potential. Incentivizing in-depth reviews that add clarity and conceptual content to research, rather than merely knocking down or propping up equally imperfect artifacts, will ultimately improve research quality. By self-publishing well-documented, open-sourced pilot data and accompanying digital reagents (e.g. scripts, stimulus materials, protocols, etc.), researchers can get instant feedback from peers, preventing uncounted research dollars from being wasted. Previously closed-door conferences can become live records of new ideas and conceptual developments as they unfold. The metaphor here is research as open source – an ever-evolving, living record of knowledge as it is created.
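To make that kind of metric concrete, here is a toy sketch of a “reviewer reliability index”. Every field name, threshold, and number below is an invented placeholder for illustration; any real metric would need far more careful definition and validation.

```python
from dataclasses import dataclass

@dataclass
class ReviewedItem:
    reviewer_score: float   # mark the reviewer gave (0-1)
    later_citations: int    # citations the item went on to receive
    retracted: bool

def reviewer_reliability(items, citation_threshold=20):
    """Toy index: among items the reviewer endorsed (score >= 0.7), the
    fraction that went on to do well (well-cited, never retracted), minus
    the fraction that were later retracted. Thresholds are arbitrary."""
    endorsed = [i for i in items if i.reviewer_score >= 0.7]
    if not endorsed:
        return 0.0
    hits = sum(1 for i in endorsed
               if i.later_citations >= citation_threshold and not i.retracted)
    misses = sum(1 for i in endorsed if i.retracted)
    return (hits - misses) / len(endorsed)

history = [ReviewedItem(0.9, 55, False), ReviewedItem(0.8, 3, True),
           ReviewedItem(0.4, 40, False), ReviewedItem(0.75, 25, False)]
print(reviewer_reliability(history))  # 2 hits, 1 retraction out of 3 endorsements
```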

Now, let’s contrast this model with the current publishing system. Every publisher (including open-access ones) obliges researchers to adhere to arbitrarily varying formatting constraints, presentation rules, submission and acceptance fees, and review cultures. Researchers perform reviews for free, often on publicly subsidized work, so that publishers can then turn around and sell the finished product back to those same researchers (and the public) at an exorbitant mark-up. These constraints introduce lengthy delays – ranging from 6+ months in the sciences all the way up to two years in some humanities disciplines. By contrast, how you self-publish your research is entirely up to you – where, when, how, the formatting, and the openness. Put simply, if you could publish your research how and when you wanted, and have it generate the same “impact” as traditional venues, why would you use a publisher at all?

One obvious reason to use publishers is copy-editing, i.e. the creation of pretty manuscripts. Another is the guarantee of high-profile distribution. Indeed, under the current system these are legitimate worries. While it is already possible to produce reasonably formatted papers yourself, an open-source, easy-to-use copy-editing tool is ultimately needed to make self-publication mainstream. Innovators like figshare are already leading the way in this area. In the next section, I will try to theorize some different ways in which self-publication can overcome these and other potential limitations, in terms of specific applications and guidelines for maximizing the utility of self-published research. To do so, I will outline a few specific cases with the most potential for self-publication to make a positive impact on research right away, and hopefully illuminate the ‘why’ question a bit further with some concrete examples.

 Part 2: Where to begin self-publishing

What follows is the “how-to” part of this document. I must preface by saying that although I have written so far with researchers across the sciences and humanities in mind, I will now focus primarily on the scientific examples with which I am more experienced.  The transition to self-publication is already happening in the forms of academic tweets, self-archives, and blogs, at a seemingly exponential growth rate. To be clear, I do not believe that the new publication culture will be utopian. As in many human endeavors the usual brandism3, politics, and corruption can be expected to appear in this new culture. Accordingly, the transition is likely to be a bit wild and woolly around the edges. Like any generational culture shift, new practices must first emerge before infrastructures can be put in place to support them. My hope is to contribute to that cultural shift from artifact to process-based research, outlining particularly promising early venues for self-publication. Once these practices become more common, there will be huge opportunities for those ready and willing to step in and provide rich informational architectures to support and enhance self-publication – but for now we can only step into that wild frontier.

In my discussions with others I have identified three particularly promising areas where self-publication is either already contributing or can begin contributing to research. These are: the publication of exploratory pilot data, post-publication peer reviews, and trial pre-registration. I will cover each in turn, attempting to provide examples and templates where possible. Finally, Part 3 will examine some common concerns with self-publication. In general, I think that successful reforms should resemble existing research practices as much as possible: publication solutions are most effective when they resemble daily practices that are already in place, rather than forcing individuals into novel practices or infrastructures with an unclear time commitment. A frequent criticism of current solutions, such as the comment sections on Frontiers and PLOS ONE or the newly developed PubPeer, is that they are rarely used by the general academic population. It is reasonable to conclude that this is because already over-worked academics currently see little plausible benefit from contributing to these discussions given the current publishing culture (worse still, they may fear other negative repercussions, discussed in Part 3). Thus a central theme of the following examples is that they attempt to mirror practices in which many academics are already engaged, with complementary incentive structures (e.g. citations).

Example 1: Exploratory Pilot Data 

This past summer witnessed a fascinating clash of research cultures, with the eruption of intense debate between pre-registration advocates and pre-registration skeptics. I derived some useful insights from both sides of that discussion. Many were concerned about what would happen to exploratory data under these new publication regimes. Indeed, a general worry with existing reform movements is that they appear to emphasize a highly conservative and somewhat cynical “perfect papers” culture. I do not believe in perfect papers – the scientific model is driven by replication and discovery. No paper can ever be 100% flawless – otherwise there would be no reason for further research! Inevitably, some will find ways to cheat the system. Accordingly, reform must incentivize better reporting practices over stricter control, or at least strike a balance between the two extremes.

Exploratory pilot data are an excellent avenue for this. By their very nature such data are not confirmatory – they are exciting precisely because they do not conform neatly to prior predictions. Such data benefit from rapid communication and feedback. Imagine an intuition-based project – a side or pet project conducted on the fly, for example. The researcher might feel that the project has potential, but also knows that there could be serious flaws. Most journals won’t publish these kinds of data. Under the current system these data are lost, hidden, obscured, or otherwise forgotten.

Compare this to a self-publication world: the researcher can upload the data, document all the protocols, make the presentation and analysis scripts open-source, and provide some well-written documentation explaining why she thinks the data are of interest. Some intrepid graduate student might find it and follow up with a valuable control analysis, pointing out an excellent feature or fatal flaw, which he can then upload as a direct citation of the original data. Both publications are citable, giving credit to originator and reviewer alike. Armed with this new knowledge, the original researcher could now pre-register an altered protocol and conduct a full study on the subject (or alternatively, abandon the project entirely). In this exchange, it is likely that hundreds of hours and research dollars will have been saved. Additionally, the entire process will have been documented, making it both citable and minable for impact metrics. Tools already exist for each of these steps – but largely cultural fears prevent this from happening. How would it be perceived? Would anyone read it? Will someone steal my idea? To better frame these issues, I will now examine a self-publication practice that has already emerged in force.
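One low-tech way to make such a pilot upload self-describing (and minable later) is to ship a small machine-readable metadata record alongside the data and scripts. The field names and file names below are purely hypothetical illustrations, not any existing standard.

```python
import json

# Hypothetical metadata record accompanying a self-published pilot dataset.
# Field names and file names are illustrative only; no standard is implied.
pilot_record = {
    "title": "Pilot: effect X in task Y (exploratory)",
    "status": "exploratory pilot - use at your peril",
    "authors": ["Original Researcher"],
    "data_files": ["pilot_data.csv"],
    "analysis_scripts": ["analysis.py"],
    "stimulus_materials": ["task_script.py"],
    "protocol": "protocol.md",
    "known_limitations": ["small n", "no active control group"],
    "license": "CC-BY",
    "citable_doi": None,  # filled in by the repository (e.g. figshare) on upload
}

with open("pilot_metadata.json", "w") as f:
    json.dump(pilot_record, f, indent=2)
```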

 Example 2: Post-publication peer review

This is a particularly easy case, precisely because high-profile scholars are already regularly engaged in the practice. As I’ve frequently joked on Twitter, we’re rapidly entering an era where publishing in a glam-mag offers no guarantee of impact if the paper itself isn’t worthwhile – you may as well hang a target on your head for post-publication peer reviewers. However, I want to emphasize the positive benefits and not just the conservative controls. Post-publication peer review (PPPR) has already begun to change the way we view research, with reviewers adding lasting content to papers, enriching the conclusions one can draw, and pointing out novel connections that the authors themselves did not draw out. Here I like to draw an analogy to the open-source movement, where code (and its documentation) is forkable, versioned, and open to constant revision – never static but always evolving.

Indeed, just last week PubMed launched its new “PubMed Commons” system, an innovative PPPR comment system whereby any registered person (with at least one paper on PubMed) can leave scientific comments on articles. Inevitably, the reception on Twitter and Facebook mirrored previous attempts to introduce infrastructure-based solutions – mixed excitement followed by a lot of bemused cynicism. “Bring out the trolls,” many joked. To wit, a brief scan of the average comment on another platform, PubPeer, revealed a generally (but not entirely) poor level of comment quality. While many comments seem to be on topic, most had little to no formatting and were given with little context. At times comments can seem trollish, pointing out minor flaws as if they render the paper worthless. In many disciplines, like my own, few comments could be found at all. This compounds the central problem with PPPR: why would anyone engage with such a system if the primary result is poorly formed nitpicking of your research? The essential problem here is again incentive: for reviews to be of high quality, reviewers need a reason to invest in them. We need a culture of PPPR that values positive and negative comments equally. This is common to both traditional and self-publication practices.

To facilitate easy, incentivized self-publication of comments and PPPRs, my colleague Hauke Hillebrandt and I have attempted to create a simple template that researchers can use to quickly and easily publish these materials. The idea is that by using these templates and uploading them to figshare or similar services, Google Scholar will automatically index them as citations, provide citation alerts to the original authors, and even include the comments in its h-index calculation. This way researchers can begin to get credit for what they are already doing, in an easy to use and familiar format. While the template isn’t quite working yet (oddly enough, Scholar is counting citations from my blog, but not the template), you can take a look at it here and maybe help us figure out why it isn’t working! In the near future we plan to get this working, and will follow-up this post with the full template, ready for you to use.
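Since part of the goal is for Scholar to fold these indexed comments into its h-index calculation, it is worth recalling what that number actually is: the largest h such that h of your items have at least h citations each. A few lines make it concrete (the citation counts below are invented).

```python
def h_index(citation_counts):
    """Largest h such that h items have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Toy record: four papers plus two indexed PPPR comments (3 and 1 citations)
print(h_index([25, 12, 7, 5, 3, 1]))  # -> 4
```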

Example 3: Pre-registration of experimental trials

As my final example, I suggest that for many researchers, self-publication of trial pre-registrations (PR) may be an excellent way to test the waters of PR in a format with a low barrier to entry. Replication attempts are a particularly promising venue for PR, and self-publication of such registrations is a way to quickly move from idea to registration to collection (as in the above pilot data example), while ensuring that credit for the original idea is embedded in the infamously hard to erase memory of the internet.

One benefit of self-publishing PRs, rather than relying on for-profit publishers, is that the PR templates themselves can easily be open-sourced, allowing various research fields to generate community-based, specialized templates adhering to the needs of that field. Self-published PRs, as well as high-quality templates, can be cited – incentivizing the creation and dissemination of both. I imagine the rapid emergence of specialized templates within each community, tailored to the needs of that research discipline.

Part 3: Criticism and limitations

Here I will close by considering some common concerns with self-publication:

Quality of data

A natural worry at this point is quality control. How can we be sure that what is published without the seal of peer review isn’t complete hooey? The primary response is that we cannot – just as we cannot be sure that peer-reviewed materials are of high quality without first reading them ourselves. Still, it is for this reason that I tried to suggest a few particularly ripe venues for self-publication of research. The cultural zeitgeist supporting full-blown scholarly self-publication has not yet arrived, but we can already begin to prepare for it. With regard to filtering noise, I argue that by coupling post-publication peer review and social media, quality self-publications will rise to the top. Importantly, this issue points towards flaws in our current publication culture. In many research areas there are effects that are repeatedly published but that few believe, largely due to the presence of biases against null findings. Self-publication aims to make as much of the research process publicly available as possible, preventing this kind of knowledge from slipping through the editorial cracks and improving our ability to evaluate the veracity of published effects. If such data are reported cleanly and completely, existing quantitative tools can further incorporate them to better estimate the likelihood of p-hacking within a literature. That leads to the next concern – quality of presentation.
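As a rough illustration of what such quantitative tools do, here is a crude check in the spirit of p-curve-style analyses (reference 5 above): under a flat, no-evidential-value curve, significant p-values are roughly uniform between 0 and .05, so about half should fall below .025; a surplus of very low p-values suggests evidential value, while a deficit (mass piling up just under .05) is the pattern associated with p-hacking. This is a simplification, not the published method, and the p-values are invented.

```python
from scipy.stats import binomtest

def crude_pcurve_check(p_values):
    """Among significant results (p < .05), test whether the share of
    p-values below .025 exceeds the 50% expected under a flat curve.
    A sketch in the spirit of the p-curve binomial test, not the full method."""
    sig = [p for p in p_values if p < 0.05]
    n_low = sum(1 for p in sig if p < 0.025)
    return binomtest(n_low, n=len(sig), p=0.5, alternative='greater')

# Invented p-values harvested from a hypothetical literature
reported = [0.003, 0.012, 0.021, 0.034, 0.041, 0.048, 0.002, 0.018]
print(crude_pcurve_check(reported))
```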

Hemingway's thoughts on data.

Quality of presentation

Many ask: how in this brave new world will we separate signal from noise? I am sure that every published researcher already receives at least a few garbage citations a year from obscure places in obscure journals with little relevance to the actual article contents. But, so the worry goes, what if we are deluged with a vast array of poorly written, poorly documented, self-published crud? How would we separate the signal from the noise?

The answer is Content, Presentation, and Clarity. These must be treated as central guidelines for self-publication to be worth anyone’s time. The internet memesphere has already generated one rule for ranking interest: content rules. Content floats and is upvoted; blogspam sinks and is downvoted. This is already true for published articles – Twitter, Reddit, Facebook, and email circles help us separate the wheat from the chaff at least as much as impact factor does, if not more. But presentation and clarity are equally important. Poorly conducted research is either not shared at all, or shared only to be picked apart. Similarly, poorly written self-publications, or poorly documented data and reagents, are unlikely to generate positive feedback, much less impact-generating eyeballs. I like to imagine a distant future in which self-publication has given rise to a new generation of well-regarded specialists: reviewers who are prized for their content, presentation, and clarity; coders who produce cleanly documented pipelines; behaviorists producing powerful and easily customized paradigm scripts; and data-collection experts who produce the smoothest, cleanest data around. All of these future specialists will be able to garner impact for the things they already do, incentivizing each step of the research process rather than only the end product.

Being scooped, intellectual credit

Another common concern is: “what if my idea/data/pilot is scooped?” I acknowledge that, particularly in these early days, the decision to self-publish must be weighed against this possibility. However, I must also point out that under the current system authors must likewise weigh the decision to develop an idea in isolation against the benefits of communicating with peers and colleagues. Both paths have risks and benefits – working in isolation makes it easy to over-estimate the quality or impact of an idea or project. The decision to self-publish must similarly be weighed against the need for feedback. Furthermore, a self-publication culture would allow researchers to move more quickly from project to publication, ensuring that they are readily credited for their work. And again, as research culture continues to evolve, I believe this concern will increasingly fade. It is notoriously difficult to erase information from the internet (see the “Streisand effect”) – there is no reason why self-published ideas and data cannot generate direct credit for their authors. Indeed, I envision a world in which these contributions can themselves be independently weighted and credited.

 Prevention of cheating, corruption, self-citations

To some, this will be an inevitable point of departure. Without our time-tested guardian of peer review, what is to prevent a flood of outright fabricated data? My response is: what prevents outright fabrication under the current system? To misquote Jeff Goldblum in Jurassic Park, cheaters will always find a way. No matter how much we tighten our grip, there will be those who respond to the pressures of publication with deliberate misconduct. I believe that the current publication system directly incentivizes such behavior by valuing end product over process. By creating incentives for low-barrier post-publication peer review, pre-registration, and rich pilot-data publication, researchers are given the opportunity to generate impact for each step of the research process. When faced with the choice between risking the vast penalties of cheating one’s way out of a null finding and simply doing one’s best to turn those data into something useful for someone, I suspect most people will choose the honest and less risky option.

Corruption and self-citations are perhaps a subtler, more sinister factor. In my discussions with colleagues, a frequent concern is that there is nothing to prevent high-impact “rich club” institutions from banding together to provide glossy post-publication reviews, citation farming, or promoting one another’s research to the top of the pile regardless of content. I again answer: how is this any different from our current system? Papers are submitted to an editor who makes a subjective evaluation of the paper’s quality and impact, before sending it to four out of a thousand possible reviewers who will make an opaque decision about the content of the paper. Sometimes this system works well, but increasingly it does not2. Many have witnessed great papers rejected for political reasons, or poor ones accepted for the same. Lowering the barrier to post-publication peer review means that even when these factors drive a paper to the top, it will be far easier to contextualize that research with a heavy dose of reality. Over time, I believe self-publication will incentivize good research. Cheating will always be a factor – and this new frontier is unlikely to be a utopia. Rather, I hope to contribute to the development of a bridge between our traditional publishing models and a radically advanced, not-too-distant future.

Conclusion

Our current systems of producing, disseminating, and evaluating research increasingly seem to be out of step with cultural and technological realities. To take back the research process and bolster the ailing standard of peer review, I believe research will ultimately adopt an open and largely publisher-free model. In my view, these new practices will be entirely complementary to existing solutions such as the p-curve5, open-source publication models6–8, and innovative platforms for data and knowledge sharing such as PubPeer, PubMed Commons, and figshare9,10. The next step from here will be to produce usable templates for self-publication. You can expect to see a PDF version of this post in the coming weeks as a further example of self-publishing practices. In attempting to build a bridge to the coming technological and social revolution, I hope to inspire others to join in the conversation so that we can improve all aspects of research.

 Acknowledgments

Thanks to Hauke Hillebrandt, Kate Mills, and Francesca Fardo for invaluable discussion, comments, and edits of this work. Many of the ideas developed here were originally inspired by this post envisioning a self-publication future. Thanks also to PubPeer, PeerJ,  figshare, and others in this area for their pioneering work in providing some valuable tools and spaces to begin engaging with self-publication practices.

Addendum

Excellent resources already exist for many of the ideas presented here. I want to give special notice to researchers who have already begun self-publishing their work, whether as preprints, archives, or direct blog posts. Parallel publishing is an attractive transitional option, in which researchers pre-publish their work for immediate feedback before submitting it to a traditional publisher. Special notice should be given to Zen Faulkes, whose excellent pioneering blog posts demonstrated that it is reasonably easy to self-produce well-formatted publications. Here are a few pioneering self-published papers you can use as examples – feel free to add your own in the comments:

The distal leg motor neurons of slipper lobsters, Ibacus spp. (Decapoda, Scyllaridae), Zen Faulkes

http://neurodojo.blogspot.dk/2012/09/Ibacus.html

Eklund, Anders (2013): Multivariate fMRI Analysis using Canonical Correlation Analysis instead of Classifiers, Comment on Todd et al. figshare.

http://dx.doi.org/10.6084/m9.figshare.787696

Automated removal of independent components to reduce trial-by-trial variation in event-related potentials, Dorothy Bishop

http://bishoptechbits.blogspot.dk/2011_05_01_archive.html

Deep Impact: Unintended consequences of journal rank

Björn Brembs, Marcus Munafò

http://arxiv.org/abs/1301.3748

A novel platform for open peer to peer review and publication:

http://thewinnower.com/

A platform for open PPPRs:

https://pubpeer.com/

Another PPPR platform:

http://f1000.com/

References

1. Henderson, M. Problems with peer review. BMJ 340, c1409 (2010).

2. Ioannidis, J. P. A. Why Most Published Research Findings Are False. PLoS Med 2, e124 (2005).

3. Peters, D. P. & Ceci, S. J. Peer-review practices of psychological journals: The fate of published articles, submitted again. Behav. Brain Sci. 5, 187 (2010).

4. Hunter, J. Post-publication peer review: opening up scientific conversation. Front. Comput. Neurosci. 6, 63 (2012).

5. Simonsohn, U., Nelson, L. D. & Simmons, J. P. P-Curve: A Key to the File Drawer. (2013). at <http://papers.ssrn.com/abstract=2256237>

6.  MacCallum, C. J. ONE for All: The Next Step for PLoS. PLoS Biol. 4, e401 (2006).

7. Smith, K. A. The frontiers publishing paradigm. Front. Immunol. 3, 1 (2012).

8. Wets, K., Weedon, D. & Velterop, J. Post-publication filtering and evaluation: Faculty of 1000. Learn. Publ. 16, 249–258 (2003).

9. Allen, M. PubPeer – A universal comment and review layer for scholarly papers? | Neuroconscience on WordPress.com. Website/Blog (2013). at <https://neuroconscience.com/2013/01/25/pubpeer-a-universal-comment-and-review-layer-for-scholarly-papers/>

10. Hahnel, M. Exclusive: figshare a new open data project that wants to change the future of scholarly publishing. Impact Soc. Sci. blog (2012). at <http://eprints.lse.ac.uk/51893/1/blogs.lse.ac.uk-Exclusive_figshare_a_new_open_data_project_that_wants_to_change_the_future_of_scholarly_publishing.pdf>

11. Yarkoni, T., Poldrack, R. A., Van Essen, D. C. & Wager, T. D. Cognitive neuroscience 2.0: building a cumulative science of human brain function. Trends Cogn. Sci. 14, 489–496 (2010).

12. Bishop, D. BishopBlog: A gentle introduction to Twitter for the apprehensive academic. Blog/website (2013). at <http://deevybee.blogspot.dk/2011/06/gentle-introduction-to-twitter-for.html>

13. Hadibeenareviewer. Had I Been A Reviewer on WordPress.com. Blog/website (2013). at <http://hadibeenareviewer.wordpress.com/>

14. Tressoldi, P. E., Giofré, D., Sella, F. & Cumming, G. High Impact = High Statistical Standards? Not Necessarily So. PLoS One 8, e56180 (2013).

15.  Brembs, B. & Munafò, M. Deep Impact: Unintended consequences of journal rank. (2013). at <http://arxiv.org/abs/1301.3748>

16.  Eisen, J. A., Maccallum, C. J. & Neylon, C. Expert Failure: Re-evaluating Research Assessment. PLoS Biol. 11, e1001677 (2013).


When is expectation not a confound? On the necessity of active controls.

Learning and plasticity are hot topics in neuroscience. Whether exploring old-world wisdom or new-age science fiction, the possibility that playing videogames might turn us into attention superheroes or that practicing esoteric meditation techniques might heal troubled minds is an exciting avenue for research. Indeed, findings suggesting that exotic behaviors or novel therapeutic treatments might radically alter our brains (and behavior) are ripe for sensational science-fiction headlines purporting vast brain benefits. For those of you not totally bored of methodological crises, here we have one brewing anew. You see, the standard recommendation for those interested in intervention research is the active-controlled experimental design. Unfortunately, in both clinical research on psychotherapy (including meditation) and the more sci-fi areas of brain training and gaming, use of active controls is rare at best compared to the more convenient (but causally uninformative) passive control group. Now a new article in Perspectives on Psychological Science suggests that even standard active controls may not be sufficient to rule out confounds in the treatment effect of interest.

Why is that? And why exactly do we need active controls in the first place? As the authors clearly point out, what you want to show with such a study is the causal efficacy of the treatment of interest. Quite simply, that means the thing you think should have some interesting effect should actually be causally responsible for creating that effect. If you want to argue that standing upside down for twenty minutes a day will make me better at playing videogames in Australia, it must be shown that it is actually standing upside down that causes my improved performance down under. If my improved performance on Minecraft Australian Edition is simply a product of my belief in the power of standing upside down, or my expectation that standing upside down is a great way to best kangaroo-creepers, then we have no way of determining what actually produced that performance benefit. Research on placebos and the power of expectations shows that these kinds of subjective beliefs can have a big impact on everything from attentional performance to mortality rates.

Useful flowchart from Boot et al on whether or not a study can make causal claims for treatment.

Typically researchers attempt to control for such confounds through the use of a control group performing a task as similar as possible to the intervention of interest. But how do we know participants in the two groups don’t end up with different expectations about how they should improve as a result of the training? Boot et al point out that without actually measuring these variables, we have no way of knowing for sure that expectation biases don’t produce our observed improvements. They then provide a rather clever demonstration of their concern, in an experiment where participants viewed videos of various cognitive tests as well as videos of a training task they might later receive – in this case either the first-person shooter Unreal Tournament or the spatial puzzle game Tetris. The participants in each group were then asked which tests they thought they would do better on as a result of the training shown. Importantly, the authors show not only that UT and Tetris led to significantly different expectations, but also that those expected benefits were specific to the domains of the trained and tested tasks. Thus participants who watched the action-intensive Unreal Tournament videos expected greater improvements on tests of reaction time and visual performance, whereas participants viewing Tetris rated themselves as likely to do better on tests of spatial memory.

This is a critically important finding for intervention research. Many researchers, myself included, have often thought of expectation and demand-characteristic confounds in a rather general way. Until recently I wouldn’t have expected the expectation bias to go much beyond a general “I’m doing something effective” belief. Boot et al show that our participants are a good deal cleverer than that, forming expectations for improvement that map onto specific dimensions of training. This means that to the degree that an experimenter’s hypothesis can be discerned from either the training or the test, participants are likely to form unbalanced expectations.

The good news is that the authors provide several reasonable fixes for this dilemma. The first is simply to measure participants' expectations, specifically in relation to the measures of interest. Another useful suggestion is to run pilot studies ensuring that the two treatments do not evoke differential expectations, or similarly to check that your outcome measures are not subject to these biases. Boot and colleagues throw the proverbial glove down, daring readers to attempt experiments where the "control condition" actually elicits greater expectations yet the treatment effect is preserved. Further common concerns, such as worries about balancing false positives against false negatives, are addressed at length.
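As a rough sketch of the first fix, assuming you collect a simple expectation rating for each outcome measure in each arm (the variables below are hypothetical), the minimal check is that the arms don't differ on those ratings:

```python
# Minimal sketch of the "measure expectations" fix: before interpreting an
# outcome difference, check that the two arms expected similar improvement
# on the specific outcome measure (invented data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
expect_treatment = rng.normal(5.2, 1.0, 40)  # 1-7 expectation ratings, treatment arm
expect_control   = rng.normal(4.9, 1.0, 40)  # same rating, active-control arm

t, p = stats.ttest_ind(expect_treatment, expect_control)
print(f"expectation difference: t = {t:.2f}, p = {p:.3f}")
# Note: a non-significant difference is only weak evidence of equivalence; a pilot
# powered for an equivalence test (e.g., TOST) would be a stronger check.
```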

The entire article is a great read, timely and full of excellent suggestions for caution in future research. It also brought something I've been chewing on for some time quite clearly into focus. From the general perspective of learning and plasticity, I have to ask: at what point is an expectation no longer a confound? Boot et al give an interesting discussion on this point, in which they suggest that even in the case of balanced expectations and positive treatment effects, an expectation-dependent response (in which outcome correlates with expectation) may still give cause for concern about the causal efficacy of the trained task. This is a difficult question that I believe ventures far into the territory of what exactly constitutes the minimal necessary features for learning. As the authors point out, placebo and expectation effects are "real" products of the brain, with serious consequences for behavior and treatment outcome. Yet even in the medical community there is a growing understanding that such effects may be essential parts of the causal machinery of healing.

Possible outcome of a training experiment, in which the control shows no dependence between expectation and outcome (top panel) and the treatment of interest shows dependence (bottom panel). Boot et al suggest that such a case may invalidate causal claims for treatment efficacy.

To what extent might this also be true of learning or cognitive training? For sure we can assume that expectations shape training outcomes; otherwise the whole point about active controls would be moot. But can one really have meaningful learning if there is no expectation to improve? I realize that from an experimental/clinical perspective, the question is not "is expectation important for this outcome" but "can we observe a treatment outcome when expectations are balanced". Still, when we begin to argue that the observation of expectation-dependent responses in a balanced design might invalidate our outcome findings, I have to wonder if we are at risk of valuing methodology over phenomena. If expectation is a powerful, potentially central mechanism in the causal apparatus of learning and plasticity, we shouldn't be surprised when even efficacious treatments are modulated by such beliefs. In the end I am left wondering if this is simply an inherent limitation in our attempt to apply the reductive apparatus of science to increasingly holistic domains.
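For concreteness, the expectation-dependent response shown in the figure above could be operationalized as a group-by-expectation interaction on the outcome; here is a minimal sketch with invented data:

```python
# Sketch of the "expectation-dependent response" check discussed above:
# does improvement track expectation within one arm but not the other? (invented data)
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)
n = 40
df = pd.DataFrame({
    "group": ["treatment"] * n + ["control"] * n,
    "expectation": np.r_[rng.normal(5, 1, n), rng.normal(5, 1, n)],  # balanced by design
})
# Hypothetical: outcome depends on expectation only in the treatment arm
df["improvement"] = np.where(df["group"] == "treatment",
                             0.6 * df["expectation"], 0.0) + rng.normal(0, 1, 2 * n)

# A reliable group x expectation interaction is the pattern in the figure above
print(ols("improvement ~ C(group) * expectation", data=df).fit().summary().tables[1])
```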

Please do read the paper, as it is an excellent treatment of a critically ignored issue in the cognitive and clinical sciences. Anyone undertaking related work should expect this reference to appear in reviewers' replies in the near future.

EDIT:
Professor Simons, a co-author of the paper, was nice enough to answer my question on Twitter. Simons pointed out that a study that balanced expectations, found group outcome differences, and further found correlations of those differences with expectation could conclude that the treatment was causally efficacious, but that it also depends on expectations (effect + expectation). This would obviously be superior to an unbalanced design or one without measurement of expectations, as it would actually tell us something about the importance of expectation in producing the causal outcome. Be sure to read through the very helpful FAQ they've posted as an addendum to the paper, which covers these questions and more in greater detail. Here is the answer to my specific question:

What if expectations are necessary for a treatment to work? Wouldn’t controlling for them eliminate the treatment effect?

No. We are not suggesting that expectations for improvement must be eliminated entirely. Rather, we are arguing for the need to equate such expectations across conditions. Expectations can still affect the treatment condition in a double-blind, placebo-controlled design. And, it is possible that some treatments will only have an effect when they interact with expectations. But, the key to that design is that the expectations are equated across the treatment and control conditions. If the treatment group outperforms the control group, and expectations are equated, then something about the treatment must have contributed to the improvement. The improvement could have resulted from the critical ingredients of the treatment alone or from some interaction between the treatment and expectations. It would be possible to isolate the treatment effect by eliminating expectations, but that is not essential in order to claim that the treatment had an effect.

In a typical psychology intervention, expectations are not equated between the treatment and control condition. If the treatment group improves more than the control group, we have no conclusive evidence that the ingredients of the treatment mattered. The improvement could have resulted from the treatment ingredients alone, from expectations alone, or from an interaction between the two. The results of any intervention that does not equate expectations across the treatment and control condition cannot provide conclusive evidence that the treatment was necessary for the improvement. It could be due to the difference in expectations alone. That is why double blind designs are ideal, and it is why psychology interventions must take steps to address the shortcomings that result from the impossibility of using a double blind design. It is possible to control for expectation differences without eliminating expectations altogether.

Can compassion be trained like a muscle? Active-controlled fMRI of compassion meditation.

Among the cognitive training literature, meditation interventions are unusual in that they often emphasize emotional or affective processing at least as much as classical 'top-down' attentional control. From a clinical and societal perspective, the idea that we might be able to "train" our "emotion muscle" is an attractive one. Recently much has been made of the "empathy deficit" in the US, ranging from empirical studies suggesting a relationship between quality of care and declining caregiver empathy, to a recent push by President Obama to emphasize the deficit in numerous speeches.

While much of the training literature focuses on cognitive abilities like sustained attention and working memory, many of those investigating meditation training have begun to study the plasticity of affective function, myself included. A recent study by Helen Weng and colleagues in Wisconsin investigated just this question, asking whether compassion ("loving-kindness") meditation can alter altruistic behavior and its associated neural processing. Her study is one of the first of its kind in that, rather than merely comparing groups of advanced practitioners and controls, she utilized a fully randomized, active-controlled design to see whether compassion responds to brief training in novices while controlling for important confounds.

As many readers should be aware, a chronic problem in training studies is a lack of properly controlled longitudinal designs. At best, many rely on "passive" or "no-contact" controls who merely complete both measurements without receiving any training. Even in the best of circumstances, "active" controls are often poorly matched to whatever is being emphasized and tested in the intervention of interest. While having both groups do "something" is better than a passive or no-control design, problems still arise if the measure of interest is mismatched to the demand characteristics of the study. Stated simply, if your condition of interest receives attention training and attention tests, and your control condition receives dieting instruction or relaxation, you can expect group differences to be confounded by an explicit "expectation to improve" in the group of interest.

In this regard Weng et al present an almost perfect example of everything a training study should be. Both interventions were delivered via professionally made audio CDs (you can download them yourselves here!), with participants' daily practice experiences recorded online. The training materials were remarkably well matched to the tests of interest, and extra care was taken to ensure that the primary measures were not presented in a biased way. The only thing they could have done further would be to single-blind the study (making sure the experimenters didn't know each participant's group identity), but given how difficult it is to blind these kinds of studies I don't blame them for not undertaking such a manipulation. In all, the study is extremely well controlled for research in this area, and I recommend it as a guideline for best practices in training research.

Specifically, Weng et al tested the impact of loving-kindness compassion meditation or emotion-reappraisal training on an fMRI emotion regulation task and a behavioral economic game measuring altruistic behavior. For the fMRI task, participants viewed emotional pictures (IAPS) depicting suffering or neutral scenarios and practiced either a compassion meditation or reappraisal strategy to regulate their emotional response, before and after training. After the follow-up scan, good old-fashioned experimental deception was used to administer an economic "dictator game" that was ostensibly not part of the primary study and involved real live players (both deceptions).

For those not familiar with the dictator game, the concept is essentially that a participant watches a "dictator" endowed with $100 give "unfair" offers to a "victim" without any money. Weng et al took great care in contextualizing the test purely in economic terms, limiting demand confounds:

Participants were told that they were playing the game with live players over the Internet. Effects of demand characteristics on behavior were minimized by presenting the game as a unique study, describing it in purely economic terms, never instructing participants to use the training they received, removing the physical presence of players and experimenters during game play, and enforcing real monetary consequences for participants’ behavior.

This is particularly important, as without these simple manipulations it would be easy for stodgy reviewers like myself to worry about subtle biases influencing behavior on the task. Equally important is the content of the two training programs. If, for example, Weng et al had used memory or attention training as their active-control condition, it would be difficult not to worry that behavioral differences were due to one group expecting a more emotional consequence of the study, and hence acting more altruistically. In the supplementary information, Weng et al describe the two training protocols in great detail:

Compassion

… Participants practiced compassion for targets by 1) contemplating and envisioning their suffering and then 2) wishing them freedom from that suffering. They first practiced compassion for a Loved One, such as a friend or family member. They imagined a time their loved one had suffered (e.g., illness, injury, relationship problem), and were instructed to pay attention to the emotions and sensations this evoked. They practiced wishing that the suffering were relieved and repeated the phrases, “May you be free from this suffering. May you have joy and happiness.” They also envisioned a golden light that extended from their heart to the loved one, which helped to ease his/her suffering. They were also instructed to pay attention to bodily sensations, particularly around the heart. They repeated this procedure for the Self, a Stranger, and a Difficult Person. The Stranger was someone encountered in daily life but not well known (e.g., a bus driver or someone on the street), and the Difficult Person was someone with whom there was conflict (e.g., coworker, significant other). Participants envisioned hypothetical situations of suffering for the stranger and difficult person (if needed) such as having an illness or experiencing a failure. At the end of the meditation, compassion was extended towards all beings. For each new meditation session, participants could choose to use either the same or different people for each target category (e.g., for the loved one category, use sister one day and use father the next day).

Reappraisal

… Participants were asked to recall a stressful experience from the past 2 years that remained upsetting to them, such as arguing with a significant other or receiving a lower-than- expected grade. They were instructed to vividly recall details of the experience (location, images, sounds). They wrote a brief description of the event, and chose one word to best describe the feeling experienced during the event (e.g., sad, angry, anxious). They rated the intensity of the feeling during the event, and the intensity of the current feeling on a scale (0 = No feeling at all, 100 = Most intense feeling in your life). They wrote down the thoughts they had during the event in detail. Then they were asked to reappraise the event (to think about it in a different, less upsetting way) using 3 different strategies, and to write down the new thoughts. The strategies included 1) thinking about the situation from another person’s perspective (e.g., friend, parent), 2) viewing it in a way where they would respond with very little emotion, and 3) imagining how they would view the situation if a year had passed, and they were doing very well. After practicing each strategy, they rated how reasonable each interpretation was (0 = Not at all reasonable, 100 = Completely reasonable), and how badly they felt after considering this view (0 = Not bad at all, 100 = Most intense ever). Day to day, participants were allowed to practice reappraisal with the same stressful event, or choose a different event. Participants logged the amount of minutes practiced after the session.

In my view the active control is extremely well designed for the fMRI and economic tasks, with both training methods explicitly focusing on participants altering their emotional response to other individuals. In tests of self-rated efficacy, both groups showed significant decreases in negative emotion, further confirming the active control. Interestingly, when Weng et al compared self-ratings over time, only the compassion group showed a significant reduction from the first half of the training sessions to the last. I'm not sure this constitutes a limitation, as Weng et al further report that on each individual training day the reappraisal group reported significant reductions, but that those reductions did not differ significantly over time. They explain this as likely due to the fact that the reappraisal group frequently changed emotional targets, whereas the compassion group had the same 3 targets throughout the training. Either way, the important point is that both groups self-reported similar overall reductions in negative emotion during the course of the study, strongly supporting the active control.

Now what about the findings? As mentioned above, Weng et al tested participants before and after training on an fMRI emotion regulation task. After the training, all participants performed the “dictator game”, shown below. After rank-ordering the data, they found that the compassion group showed significantly greater redistribution:

The dictator task (left) and increased redistribution (right).
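As a rough illustration of the behavioral comparison (the specific rank-based test below is my choice for the sketch, not necessarily the one used in the paper, and the dollar amounts are invented):

```python
# Sketch of a rank-based group comparison of redistribution amounts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
redistribution_compassion = rng.gamma(2.0, 2.0, 20)    # invented dollar amounts
redistribution_reappraisal = rng.gamma(2.0, 1.2, 20)

# Rank-based tests (or rank-transforming the data) guard against the skew
# typical of economic-game measures.
u, p = stats.mannwhitneyu(redistribution_compassion, redistribution_reappraisal,
                          alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, p = {p:.3f}")

# Ranks of the pooled scores can also serve as a covariate in downstream
# (e.g., neuroimaging) models, as described below.
pooled_ranks = stats.rankdata(np.r_[redistribution_compassion, redistribution_reappraisal])
```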

For the fMRI analysis, they analyzed BOLD responses to negative vs neutral images at both time points, subtracted the beta coefficients, and then entered these images into a second-level design matrix testing the group difference, with the rank-ordered redistribution scores as a covariate of interest. They then tested for areas showing group differences in the correlation between redistribution scores and changes in BOLD response to negative vs neutral images (pre vs post), across the whole brain and in several ROIs, while properly correcting for multiple comparisons. Essentially this analysis asks: where in the brain do task-related changes in BOLD correlate more strongly with the redistribution score in one group than in the other? For the group x covariate interaction they found significant differences (increased BOLD-covariate correlation) in the right inferior parietal cortex (IPC), a region of the parietal attention network, shown in the left-hand panel:
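To unpack the logic of that group-by-covariate test, here is a single-ROI sketch (the real analysis is voxelwise with proper multiple-comparison correction; all values below are invented):

```python
# Single-ROI sketch of the second-level logic: does the relationship between
# the change in negative-vs-neutral BOLD and redistribution differ by group?
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(5)
n = 20
df = pd.DataFrame({
    "group": ["compassion"] * n + ["reappraisal"] * n,
    # Redistribution ranks computed over the pooled sample (1..2n), shuffled here
    "redistribution_rank": rng.permutation(np.arange(1, 2 * n + 1)).astype(float),
})
# Invented ROI data: post-minus-pre contrast estimates (negative vs neutral),
# coupled to redistribution only in the compassion group
df["delta_bold"] = np.where(df["group"] == "compassion",
                            0.05 * df["redistribution_rank"], 0.0) + rng.normal(0, 0.5, 2 * n)

# The group x covariate interaction term mirrors the whole-brain test described above
fit = ols("delta_bold ~ C(group) * redistribution_rank", data=df).fit()
print(fit.params)
```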

Increased correlation between negative vs neutral imagery related BOLD and redistribution scores (left), connectivity with DLPFC (right).

They further extracted signal from the IPC cluster and entered it into a conjunction analysis, testing for areas showing significant correlation with the IPC activity, and found a strong effect in right DLPFC (right panel). Finally, they performed a psychophysiological interaction (PPI) analysis with the right DLPFC activity as the seed, to determine regions showing significant task-modulated connectivity with that DLPFC activity. They found increased emotion-modulated DLPFC connectivity to the nucleus accumbens, a region involved in encoding positive rewards (below, right).
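For readers unfamiliar with PPI, here is a bare-bones sketch of the idea (a real pipeline would deconvolve the seed signal, convolve with the HRF, and include nuisance regressors; the signals below are simulated):

```python
# Bare-bones PPI sketch: the regressor of interest is the product of the
# psychological variable (task condition) and the physiological seed signal.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_trs = 200
psych = np.tile(np.r_[np.ones(10), -np.ones(10)], n_trs // 20)  # negative vs neutral blocks
seed = rng.normal(0, 1, n_trs)                                  # simulated DLPFC seed timecourse
ppi = psych * seed                                              # the interaction regressor

# Hypothetical target region whose coupling with the seed is stronger during negative blocks
target = 0.3 * seed + 0.4 * ppi + rng.normal(0, 1, n_trs)

X = sm.add_constant(np.column_stack([psych, seed, ppi]))
fit = sm.OLS(target, X).fit()
print(dict(zip(["const", "psych", "seed", "ppi"], np.round(fit.params, 2))))
# A reliable 'ppi' coefficient indicates task-modulated connectivity with the seed.
```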

PPI shows increased emotion-modulated connectivity of nucleus accumbens and DLPFC.

Together these results implicate training-related increases in BOLD response to emotional stimuli within the parietal attention network, along with increased parietal connectivity to regions involved in cognitive control and reward processing, in the observed differences in altruistic behavior. The authors conclude that compassion training may alter emotional processing through a novel mechanism in which top-down central-executive circuits redirect emotional information to areas associated with positive reward, reflecting the emphasis in compassion meditation on increasing positive emotion toward the aversive states of others. A fitting and interesting conclusion, I think.

Overall, the study should receive high marks for its excellent design and appropriate statistical rigor. There is quite a bit of interesting material in the supplementary info, a strategy I dislike, but that is no fault of the authors considering the publishing journal (Psych Science). The question itself is also quite novel: to date only one previous active-controlled study has investigated the role of compassion meditation in empathy-related neuroplasticity. However, that study compared compassion meditation with a memory-strategy course, which (in my opinion) exposes it to serious criticism regarding demand characteristics. The authors do reference that study, but only briefly, to state that both studies support a role for compassion training in altering positive emotion. Personally I would have appreciated a more thorough comparison, though I suppose I can go and do that myself if I feel so inclined :).

The study does have a few limitations worth mentioning. One thing that stood out to me was that the authors never report the results of the overall group mean contrast for negative vs neutral images. I would have liked to know whether the regions showing increased correlation with redistribution actually showed higher overall mean activation increases during emotion regulation. However, as the authors clearly had quite specific hypotheses, leading them to restrict their alpha to 0.01 (due to testing 1 whole-brain contrast and 4 ROIs), I can see why they left this out. Given the strong results of the study, it would in retrospect perhaps have been more prudent to skip the ROI analysis (which didn't seem to find much) and instead focus on the whole-brain results. I can't blame them for including the ROIs, however, as it is surprising not to see anything going on in the insula or amygdala for this kind of training. It is also a bit unclear to me why the DLPFC was used as the PPI seed rather than the primary IPC cluster, although I am somewhat unfamiliar with the conjunction-connectivity analysis used here. Finally, as the authors themselves point out, a major limitation of the study is that the redistribution measure was collected only at time two, preventing a comparison to baseline for this measure.

Given the methodological state of the topic (quite poor, generally speaking), I am willing to grant them these mostly minor caveats. Of course, without a baseline altruism measure it is difficult to make a strong conclusion about the causal impact of the meditation training on altruistic behavior, but at least their neural data are shielded from this concern. So while we can't exhaustively conclude that compassion can be trained, the results of this study certainly suggest it is possible and perhaps even likely, providing a great starting point for future research. One interesting thing for me was the difference in DLPFC. We also found task-related increases in dorsolateral prefrontal cortex following active-controlled meditation, although in the left hemisphere and for a very different kind of training and task. One other recent study of smoking cessation also reported alterations in DLPFC following mindfulness training, leading me to wonder if we're seeing the emergence of an empirical consensus for this region's specific involvement in meditation training. Another interesting point for me was that affective regulation here seems to involve primarily top-down or attention-related neural correlates, suggesting that bottom-up processing (insula, amygdala) may be more resilient to brief training, something we also found in our study. I wonder if the group mean-contrasts would have been revealing here (i.e. whether there were differences in bottom-up processing that don't correlate with redistribution). Altogether a great study that raises the bar for training research in cognitive neuroscience!