Neurovault: a must-use tool for every neuroimaging paper!

Something that has long irked me about cognitive neuroscience is the way we share our data. I still remember the very first time I opened a brain imaging paper and was struck dumbfounded by the practice of listing activation results in endless p-value tables and selective 2D snapshots. How could anyone make sense of data this way? Now, having several years of experience creating such papers, I am only more dumbfounded that we continue to present our data like this. What purpose is served by taking a beautiful 3-dimensional result and filtering it through an awkward foci ‘photoshoot’? While there are some conventions that improve the 2D presentation of 3D brain maps, for example showing only peak activations and including glass brains, these are imperfect solutions – ultimately the best way to assess the topology of an effect is to examine the full 3D result directly.

Just imagine how much better every fMRI paper would be if, instead of a 20+ row table and a few selective snapshots, results were displayed in a simple 3D viewing widget right in the paper. Readers could assess the underlying effects at whatever statistical threshold they feel is most appropriate, and PDF versions could be printed at a particular coordinate and threshold specified by the author. Reviewers and readers alike would get a much fuller picture of the data, and meta-analysis would be vastly improved by the widespread uploading of well-categorized contrast images. Moreover, all of this can be achieved without worries about privacy or intellectual property, using only group-level contrast images, which are inherently free of identifying features and contain only those effects included in the published manuscript!

Now imagine my surprise when I learned that, thanks to Chris Gorgolewski and colleagues, all of this is already possible! Chris pioneered the development of neurovault.org, an extremely easy-to-use data-sharing site backed by the International Neuroinformatics Coordinating Facility. To use it, researchers simply create a new ‘collection’ for their study and then upload whatever images they like. Within about 15 minutes I was able to upload both the T- and contrast images from my group-level analysis, complete with as little or as much meta-data as I felt like including. Collections can be easily linked to paper DOIs and marked as in-review, published, and so on. Collections and entries can be edited or added to at any time, and the site allows quick documentation of imaging data at any desired level, from entire raw imaging datasets to condition-specific group contrast images. Better still, neurovault seamlessly displays these images on a 3D MNI standard brain with flexible options for thresholding, and through a hookup to neurosynth.org can even find meta-analytic feature loadings for your images! Check out the t-map display and feature loadings for the stimulus intensity contrast from my upcoming somatosensory oddball paper, which correctly identified the modality of stimulation!

T-map in the neurovault viewer.
Decoded features for my contrast image, with accurate detection of stimulation modality!
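For those who prefer scripting the process, NeuroVault also exposes a REST API. Below is a minimal, purely illustrative sketch of what a programmatic upload might look like – the endpoint paths, field names, and authentication header are my assumptions based on typical REST conventions, so check https://neurovault.org/api/ for the actual interface before adapting anything like this:

```python
# Purely illustrative sketch of a scripted NeuroVault upload. The endpoint paths,
# field names, and token header are assumptions, not the documented interface --
# see https://neurovault.org/api/ before adapting this.
import requests

API = "https://neurovault.org/api"
HEADERS = {"Authorization": "Bearer YOUR-PERSONAL-API-TOKEN"}  # assumed auth scheme

# 1. Create a new collection for the study.
collection = requests.post(
    f"{API}/collections/",
    headers=HEADERS,
    json={"name": "Somatosensory oddball study"},
).json()

# 2. Upload a group-level statistical image into that collection.
with open("group_tmap_stimulus_intensity.nii.gz", "rb") as img:
    requests.post(
        f"{API}/collections/{collection['id']}/images/",
        headers=HEADERS,
        files={"file": img},
        data={"name": "Stimulus intensity contrast (T-map)",
              "map_type": "T",
              "modality": "fMRI-BOLD"},
    )
```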

Neurovault.org doesn’t yet support embedding the viewer, but it is easy to imagine that, with collaboration from publishers, future versions could be embedded directly within the HTML full-text of imaging papers. For now, the site provides the perfect solution for researchers looking to make their data available to others and to present their results more fully, simply by providing supplementary links either to the neurovault collection or directly to individual viewer results. This is a tool that everyone in cognitive neuroscience should be using – I fully intend to do so in all my future papers!

Effective connectivity or just plumbing? Granger Causality estimates highly reliable maps of venous drainage.

Update: for an excellent response to this post, see the comment by Anil Seth at the bottom of this article. Also don’t miss the extended debate regarding the general validity of causal methods for fMRI that followed this post on Russ Poldrack’s blog.

While the BOLD signal can be a useful measurement of brain function when used properly, the fact that it indexes blood flow rather than neural activity raises more than a few significant concerns. That is to say, when we make inferences on BOLD, we want to be sure the observed effects are causally downstream of actual neural activity, rather than the product of physiological noise such as fluctuations in breath or heart rate. This is a problem for all fMRI analyses, but it is particularly tricky for resting-state fMRI, where we are interested in signal fluctuations that fall in the same frequency range as respiration and pulse. Now a new study has extended these troubles to Granger causality modelling (GCM), a lag-based method for estimating causal interactions between time series that is popular in the resting-state literature. Just how bad is the damage?

In an article published this week in PLOS ONE, Webb and colleagues analysed over a thousand scans from the Human Connectome database, examining the reliability of GCM estimates and the proximity of the major ‘hubs’ identified by GCM to known major arteries and veins. The authors first found that GCM estimates were highly robust across participants:

Plot showing robustness of GCM estimates across 620 participants. The majority of estimated causes did not show significant differences within or between participants (black datapoints).

They further report that “the largest [most robust] lags are for BOLD Granger causality differences for regions close to large veins and dural venous sinuses”. In other words, although the major ‘upstream’ and ‘downstream’ nodes estimated by GCM are highly robust across participants, regions primarily influencing other regions (i.e. causal outflow) map onto major arteries, whereas regions primarily receiving ‘inputs’ (i.e. causal inflow) map onto veins. This pattern of ‘causation’ is very difficult to explain as anything other than a non-neural artifact: the regions mostly ‘causing’ activity in others are exactly where fresh blood enters the brain, and the regions primarily being influenced by others are areas of major blood drainage. Check out the arteriogram and venogram provided by the authors:

Depiction of major arteries (top image) and veins (bottom). Note overlap with areas of greatest G-cause (below).

Compare the above to their thresholded z-statistic map for significant Granger causality; white areas show significant G-causation overlapping with an arteriogram mask, green areas overlap with a venogram mask:

From the paper:
“Figure 5. Mean Z-statistic for significant Granger causality differences to seed ROIs. Z-statistics were averaged for a given target ROI with the 264 seed ROIs to which it exhibited significantly asymmetric Granger causality relationship. Masks are overlaid for MRI arteriograms (white) and MRI venograms (green) for voxels with greater than 2 standard deviations signal intensity of in-brain voxels in averaged images from 33 (arteriogram) and 34 (venogram) subjects. Major arterial inflow and venous outflow distributions are labeled.”
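To make that masking rule concrete, here is a minimal sketch (not the authors’ code) of how one might build a venogram mask at the 2-SD threshold described in the caption and quantify its overlap with a thresholded Granger causality map. File names and the z-threshold are placeholders, and all images are assumed to be in the same space:

```python
# Sketch: build a vascular mask by thresholding an averaged venogram at 2 SD of
# in-brain intensity, then compute its overlap with a thresholded GC z-map.
# File names are placeholders; all images are assumed to be co-registered.
import numpy as np
import nibabel as nib

veno = nib.load("mean_venogram.nii.gz").get_fdata()
zmap = nib.load("gc_zstat.nii.gz").get_fdata()
brain = nib.load("brain_mask.nii.gz").get_fdata() > 0

# Venogram mask: voxels brighter than mean + 2 SD of in-brain signal intensity.
mu, sd = veno[brain].mean(), veno[brain].std()
vein_mask = (veno > mu + 2 * sd) & brain

# 'Causal inflow' voxels: significantly negative Granger asymmetry (placeholder threshold).
inflow = (zmap < -3.1) & brain

overlap = np.logical_and(inflow, vein_mask).sum() / inflow.sum()
print(f"Proportion of significant inflow voxels inside the venogram mask: {overlap:.2%}")
```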

It’s fairly obvious from the above that a significant proportion of the areas typically G-causing other areas overlap with arteries, whereas areas typically being G-caused by others overlap with veins. This is a serious problem for GCM of resting-state fMRI; worse, these effects were also observed across a comprehensive range of task-based fMRI data. The authors come to the grim conclusion that “Such arterial inflow and venous drainage has a highly reproducible pattern across individuals where major arterial and venous distributions are largely invariant across subjects, giving the illusion of reliable timing differences between brain regions that may be completely unrelated to actual differences in effective connectivity”. Importantly, this isn’t the first time GCM has been called into question. A related concern is the impact of spatial variation in the lag between neural activation and the BOLD response (the ‘hemodynamic response function’, HRF) across the brain. Previous work using simultaneous intracranial and BOLD recordings has shown that, due to these lags, GCM can estimate a causal pattern of A then B when the actual neural activity was B then A.

This is because GCM acts in a relatively simple way: given two time series (A & B), if the future of B is better predicted from the past of both A and B than from the past of B alone, then A is said to G-cause B (a toy simulation of this logic appears after the quote below). However, as we’ve already established, BOLD is a messy and complex signal in which neural activity is filtered through slow blood fluctuations that must be carefully mapped back onto neural activity using deconvolution methods. Thus, what looks like A then B in BOLD can actually be due to differences in HRF lag between regions – GCM is blind to this, as it does not consider the underlying process producing the time series. Worse, while this particular problem can be addressed by combining GCM (which is naïve to the underlying cause of the analysed time series) with an approach that deconvolves each voxel-wise time series with a canonical HRF, the authors point out that such an approach would not resolve the concern raised here, namely that Granger causality largely picks up macroscopic temporal patterns of blood in- and out-flow:

“But even if an HRF were perfectly estimated at each voxel in the brain, the mechanism implied in our data is that similarly oxygenated blood arrives at variable time points in the brain independently of any neural activation and will affect lag-based directed functional connectivity measurements. Moreover, blood from one region may then propagate to other regions along the venous drainage pathways also independent of neural to vascular transduction. It is possible that the consistent asymmetries in Granger causality measured in our data may be related to differences in HRF latency in different brain regions, but we consider this less likely given the simpler explanation of blood moving from arteries to veins given the spatial distribution of our results.”
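To make the lag-based logic concrete, here is a toy simulation of the simplest bivariate case described above. This is a pedagogical sketch only – not the pipeline used by Webb and colleagues – in which B is constructed to lag A, and the Granger causality magnitude is computed as the log ratio of restricted to full model residual variances:

```python
# Toy illustration of the lag-based logic behind Granger causality: A is said to
# G-cause B if adding A's past improves the prediction of B beyond B's own past.
import numpy as np

rng = np.random.default_rng(0)
n, lag = 1000, 1

# Simulate "A then B": B is partly driven by the previous sample of A.
A = rng.standard_normal(n)
B = np.zeros(n)
for t in range(1, n):
    B[t] = 0.5 * B[t - 1] + 0.4 * A[t - 1] + rng.standard_normal()

def residual_variance(y, predictors):
    """Variance of the residuals after least-squares regression of y on predictors."""
    X = np.column_stack(predictors + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ beta)

y = B[lag:]
restricted = residual_variance(y, [B[:-lag]])            # B's own past only
full = residual_variance(y, [B[:-lag], A[:-lag]])        # B's past plus A's past

# Positive values indicate that A's past helps predict B, i.e. A G-causes B.
print("GC magnitude (A -> B):", np.log(restricted / full))
```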

As for correcting for these effects, the authors suggest that a nuisance-variable approach estimating vascular effects related to pulse, respiration, and breath-holding may be effective. However, they caution that the effects observed here (large-scale blood inflow and drainage) take place over a timescale an order of magnitude slower than actual neural differences, and that this approach would need extremely precise estimates of the associated nuisance waveforms to prevent confounded connectivity estimates. For now, I’d advise readers to be critical of what can actually be inferred from GCM until further research can be done, preferably using multi-modal methods capable of directly measuring the impact of vascular confounds on GCM estimates. Indeed, although I suppose I am a bit biased, I have to ask whether it wouldn’t be simpler to just use Dynamic Causal Modelling, a technique explicitly designed for estimating causal effects between BOLD timeseries, rather than a method originally designed to estimate influences between financial stocks.
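For completeness, here is a bare-bones sketch of the nuisance-regression idea the authors mention – projecting physiological waveforms out of a voxel time series before any lag-based analysis. The variable names are placeholders and the waveforms are assumed to be already resampled to the scan’s TR and temporally aligned:

```python
# Sketch of the nuisance-regression idea: regress physiological waveforms
# (cardiac, respiratory) out of each voxel time series before any lag-based
# connectivity analysis. Variable names are placeholders.
import numpy as np

def remove_nuisance(timeseries, nuisance_regressors):
    """Return the residual of a voxel time series after regressing out nuisance signals."""
    X = np.column_stack(list(nuisance_regressors) + [np.ones(len(timeseries))])
    beta, *_ = np.linalg.lstsq(X, timeseries, rcond=None)
    return timeseries - X @ beta

# voxel_ts: (n_timepoints,) BOLD series; cardiac / respiration: waveforms resampled to the TR.
# cleaned = remove_nuisance(voxel_ts, [cardiac, respiration])
```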

References for further reading:

Friston, K. (2009). Causal modelling and brain connectivity in functional magnetic resonance imaging. PLoS biology, 7(2), e33. doi:10.1371/journal.pbio.1000033

Friston, K. (2011). Dynamic causal modeling and Granger causality Comments on: the identification of interacting networks in the brain using fMRI: model selection, causality and deconvolution. NeuroImage, 58(2), 303–5; author reply 310–1. doi:10.1016/j.neuroimage.2009.09.031

Friston, K., Moran, R., & Seth, A. K. (2013). Analysing connectivity with Granger causality and dynamic causal modelling. Current opinion in neurobiology, 23(2), 172–8. doi:10.1016/j.conb.2012.11.010

Webb, J. T., Ferguson, M. A., Nielsen, J. A., & Anderson, J. S. (2013). BOLD Granger Causality Reflects Vascular Anatomy. (P. A. Valdes-Sosa, Ed.) PLoS ONE, 8(12), e84279. doi:10.1371/journal.pone.0084279

Chang, C., Cunningham, J. P., & Glover, G. H. (2009). Influence of heart rate on the BOLD signal: the cardiac response function. NeuroImage, 44(3), 857–69. doi:10.1016/j.neuroimage.2008.09.029

Chang, C., & Glover, G. H. (2009). Relationship between respiration, end-tidal CO2, and BOLD signals in resting-state fMRI. NeuroImage, 47(4), 1381–93. doi:10.1016/j.neuroimage.2009.04.048

Lund, T. E., Madsen, K. H., Sidaros, K., Luo, W.-L., & Nichols, T. E. (2006). Non-white noise in fMRI: does modelling have an impact? Neuroimage, 29(1), 54–66.

David, O., Guillemain, I., Saillet, S., Reyt, S., Deransart, C., Segebarth, C., & Depaulis, A. (2008). Identifying neural drivers with functional MRI: an electrophysiological validation. PLoS biology, 6(12), 2683–97. doi:10.1371/journal.pbio.0060315

Update: This post continued into an extended debate on Russ Poldrack’s blog, where Anil Seth made the following (important) comment 

Hi this is Anil Seth.  What an excellent debate and I hope I can add few quick thoughts of my own since this is an issue close to my heart (no pub intended re vascular confounds).

First, back to the Webb et al paper. They indeed show that a vascular confound may affect GC-FMRI but only in the resting state and given suboptimal TR and averaging over diverse datasets.  Indeed I suspect that their autoregressive models may be poorly fit so that the results rather reflect a sort-of mental chronometry a la Menon, rather than GC per se.
In any case the more successful applications of GC-fMRI are those that compare experimental conditions or correlate GC with some behavioural variable (see e.g. Wen et al. http://www.ncbi.nlm.nih.gov/pubmed/22279213).  In these cases hemodynamic and vascular confounds may subtract out.
Interpreting findings like these means remembering that GC is a description of the data (i.e. DIRECTED FUNCTIONAL connectivity) and is not a direct claim about the underlying causal mechanism (e.g. like DCM, which is a measure of EFFECTIVE connectivity).  Therefore (model light) GC and (model heavy) DCM are to a large extent asking and answering different questions, and to set them in direct opposition is to misunderstand this basic point.  Karl, Ros Moran, and I make these points in a recent review (http://www.ncbi.nlm.nih.gov/pubmed/23265964).
Of course both methods are complex and ‘garbage in garbage out’ applies: naive application of either is likely to be misleading or worse.  Indeed the indirect nature of fMRI BOLD means that causal inference will be very hard.  But this doesn’t mean we shouldn’t try.  We need to move to network descriptions in order to get beyond the neo-phrenology of functional localization.  And so I am pleased to see recent developments in both DCM and GC for fMRI.  For the latter, with Barnett and Chorley I have shown that GC-FMRI is INVARIANT to hemodynamic convolution given fast sampling and low noise (http://www.ncbi.nlm.nih.gov/pubmed/23036449).  This counterintuitive finding defuses a major objection to GC-fMRI and has been established both in theory, and in a range of simulations of increasing biophysical detail.  With the development of low-TR multiband sequences, this means there is renewed hope for GC-fMRI in practice, especially when executed in an appropriate experimental design.  Barnett and I have also just released a major new GC software which avoids separate estimation of full and reduced AR models, avoiding a serious source of bias afflicting previous approaches (http://www.ncbi.nlm.nih.gov/pubmed/24200508).
Overall I am hopeful that we can move beyond premature rejection of promising methods on the grounds they fail when applied without appropriate data or sufficient care.  This applies to both GC and fMRI. These are hard problems but we will get there.

Birth of a New School: How Self-Publication can Improve Research

Edit: click here for a PDF version and citable figshare link!

Preface: What follows is my attempt to imagine a radically different future for research publishing. Apologies for any overlooked references – the following is meant to be speculative and purposely walks the line between paper and blog post. Here is to a productive discussion regarding the future of research.

Our current systems of producing, disseminating, and evaluating research could be substantially improved. For-profit publishers enjoy extremely high taxpayer-funded profit margins. Traditional closed-door peer review is creaking under the weight of an exponentially growing knowledge base, delaying important communications and often resulting in seemingly arbitrary publication decisions1–4. Today’s young researchers are frequently dismayed to find their painstaking work producing quality reviews overlooked or discouraged by journalistic editorial practices. In response, the research community has risen to the challenge of reform, giving birth to an ever-expanding multitude of publishing tools: statistical methods to detect p-hacking5, numerous open-source publication models6–8, and innovative platforms for data and knowledge sharing9,10.

While I applaud the arrival and intent of these tools, I suspect that ultimately publication reform must begin with publication culture – with the very way we think of what a publication is and can be. After all, how can we effectively create infrastructure for practices that do not yet exist? Last summer, shortly after igniting #pdftribute, I began to think more and more about the problems confronting the publication of results. After months of conversations with colleagues I am now convinced that real reform will come not in the shape of new tools or infrastructures, but rather in the culture surrounding academic publishing itself. In many ways our current publishing infrastructure is the product of a paper-based society keen to produce lasting artifacts of scholarly research. In parallel, the exponential arrival of the networked society has led to an open-source software community in which knowledge is not a static artifact but rather an ever-expanding, living document of intelligent productivity. We must move towards “research 2.0” and beyond11.

From Wikipedia to Github, open-source communities are changing the way knowledge is produced and disseminated. Already this movement has begun to reach academia, with researchers across disciplines flocking to social media, blogs, and novel communication infrastructures to create a new movement of post-publication peer review4,12,13. In math and physics, researchers have already embraced self-publication, uploading preprints to the online repository arXiv, with more and more disciplines using the site to archive their research. I believe the inevitable future of research communication lies in this open-source metaphor, in the form of pervasive self-publication of scholarly knowledge. The question is thus not where we are going, but rather how we prepare for this radical change in publication culture. In asking these questions I would like to imagine what research will look like 10, 15, or even 20 years from today. This post is intended as a first step towards bringing to light specific ideas for how this transition might be facilitated. Rather than being a prescriptive essay, it is merely my attempt to imagine what that future may look like. I invite you to treat what follows as an ‘open beta’ for these ideas.

Part 1: Why self-publication?

I believe the essential metaphor is found in the open-source software community. To this end, over the past few months I have feverishly discussed the merits and risks of self-publishing scholarly knowledge with my colleagues and peers. While at first I worried many would find the notion of self-publication utterly absurd, I have been astonished at the responses – many have been excitedly optimistic! I was surprised to find that some of my most critical and stoic colleagues have lost so much faith in traditional publication and peer review that they are ready to consider more radical options.

The basic motivation for research self-publication is pretty simple: research papers cannot be properly evaluated without first being read. Now, by evaluation, I don’t mean for the purposes of hiring or grant giving committees. These are essentially financial decisions, e.g. “how do I effectively spend my money without reading the papers of the 200+ applicants for this position?” Such decisions will always rely on heuristics and metrics that must necessarily sacrifice accuracy for efficiency. However, I believe that self-publication culture will provide a finer grain of metrics than ever dreamed of under our current system. By documenting each step of the research process, self-publication and open science can yield rich information that can be mined for increasingly useful impact measures – but more on that later.

When it comes to evaluating research, many admit that there is no substitute for opening up an article and reading its content – regardless of journal. My prediction is that as post-publication peer review gains acceptance, some tenured researcher or brave young scholar will eventually decide to simply self-publish her research directly onto the internet, and when that research goes viral, a deluge of self-publications will follow. Of course, busy lives require heuristic decisions, and it’s arguable that publishers provide this editorial service. While I will address this issue specifically in Part 3, for now I want to point out that growing empirical evidence suggests that our current publisher/impact-based system provides an unreliable heuristic at best14–16. Thus, my essential reason for supporting self-publication is that even in the worst-case scenario, self-publications come with the disclaimer: “read the contents and decide for yourself.” As self-publishing practices become established, it is easy to imagine that these difficulties will be largely mitigated by self-published peer reviews and novel infrastructures supporting these interactions.

Indeed, with a little imagination we can picture plenty of potential benefits of self-publication to offset the risk that we might read poor papers. Researchers spend exorbitant amounts of their time reviewing, commenting on, and discussing articles – most of that rich content and meta-data is lost under the current system. In documenting the research practice more thoroughly, the ensuing flood of self-published data can support new quantitative metrics of reviewer trust, and be further utilized in the development of rich information about new ideas and data in near real-time. To give just one example, we might calculate how many subsequent citations or retractions a particular reviewer generates, yielding a reviewer impact factor and reliability index. The more aspects of research we publish, the greater the data-mining potential. Incentivizing in-depth reviews that add clarity and conceptual content to research, rather than merely knocking down or propping up equally imperfect artifacts, will ultimately improve research quality. By self-publishing well-documented, open-sourced pilot data and accompanying digital reagents (e.g. scripts, stimulus materials, protocols, etc), researchers can get instant feedback from peers, preventing uncounted research dollars from being wasted. Previously closed-door conferences can become live records of new ideas and conceptual developments as they unfold. The metaphor here is research as open-source – an ever-evolving, living record of knowledge as it is created.
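As a purely hypothetical illustration of the kind of metric such meta-data could support, the following toy snippet computes a crude ‘reviewer impact factor’ from invented citation and retraction counts – the field names and numbers are made up for the example:

```python
# Purely hypothetical toy metric: mine self-published reviews for how often the work a
# reviewer endorsed was subsequently cited versus retracted. All names and numbers invented.
reviews = [
    {"reviewer": "A", "endorsed_citations": 120, "endorsed_retractions": 0},
    {"reviewer": "A", "endorsed_citations": 45,  "endorsed_retractions": 1},
    {"reviewer": "B", "endorsed_citations": 10,  "endorsed_retractions": 0},
]

def reviewer_impact(name, records):
    """Citations generated per endorsed paper, penalized by any retractions."""
    mine = [r for r in records if r["reviewer"] == name]
    citations = sum(r["endorsed_citations"] for r in mine)
    retractions = sum(r["endorsed_retractions"] for r in mine)
    return citations / len(mine) / (1 + retractions)

print(reviewer_impact("A", reviews))  # 41.25
print(reviewer_impact("B", reviews))  # 10.0
```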

Now, let’s contrast this model with the current publishing system. Every publisher (including open-access ones) obliges researchers to adhere to arbitrarily varied formatting constraints, presentation rules, submission and acceptance fees, and review cultures. Researchers perform reviews for free on often publicly subsidized work, so that publishers can then turn around and sell the finished product back to those same researchers (and the public) at an exorbitant mark-up. These constraints introduce lengthy delays – ranging from 6+ months in the sciences all the way up to two years in some humanities disciplines. By contrast, how you self-publish your research is entirely up to you – where, when, how, the formatting, and the openness. Put simply, if you could publish your research how and when you wanted, and have it generate the same “impact” as traditional venues, why would you use a publisher at all?

One obvious reason to use publishers is copy-editing, i.e. the creation of pretty manuscripts. Another is the guarantee of high-profile distribution. Under the current system these are legitimate worries. While it is possible to produce reasonably formatted papers on your own, an open-source, easy-to-use copy-editing tool is ideally needed to facilitate mainstream self-publication. Innovators like figshare are already leading the way in this area. In the next section, I will try to theorize some different ways in which self-publication can overcome these and other potential limitations, in terms of specific applications and guidelines for maximizing the utility of self-published research. To do so, I will outline a few specific cases with the most potential for self-publication to make a positive impact on research right away, and hopefully illuminate the ‘why’ question a bit further with some concrete examples.

 Part 2: Where to begin self-publishing

What follows is the “how-to” part of this document. I must preface by saying that although I have written so far with researchers across the sciences and humanities in mind, I will now focus primarily on the scientific examples with which I am more experienced.  The transition to self-publication is already happening in the forms of academic tweets, self-archives, and blogs, at a seemingly exponential growth rate. To be clear, I do not believe that the new publication culture will be utopian. As in many human endeavors the usual brandism3, politics, and corruption can be expected to appear in this new culture. Accordingly, the transition is likely to be a bit wild and woolly around the edges. Like any generational culture shift, new practices must first emerge before infrastructures can be put in place to support them. My hope is to contribute to that cultural shift from artifact to process-based research, outlining particularly promising early venues for self-publication. Once these practices become more common, there will be huge opportunities for those ready and willing to step in and provide rich informational architectures to support and enhance self-publication – but for now we can only step into that wild frontier.

In my discussions with others I have identified three particularly promising areas where self-publication is either already contributing or can begin contributing to research. These are: the publication of exploratory pilot-data, post-publication peer reviews, and trial pre-registration. I will cover each in turn, attempting to provide examples and templates where possible. Finally, Part 3 will examine some common concerns with self-publication. In general, I think that successful reforms should resemble existing research practices as much as possible: publication solutions are most effective when they resemble daily practices that are already in place, rather than forcing individuals into novel practices or infrastructures with an unclear time-commitment. A frequent criticism of current solutions such as the comments section on Frontiers, PLOS One, or the newly developed PubPeer, is that they are rarely used by the general academic population. It is reasonable to conclude that this is because already over-worked academics currently see little plausible benefit from contributing to these discussions given the current publishing culture (worse still, they may fear other negative repercussions, discussed in Part 3). Thus a central theme of the following examples is that they attempt to mirror practices in which many academics are already engaged, with complementary incentive structures (e.g. citations).

Example 1: Exploratory Pilot Data 

This past summer witnessed a fascinating clash of research cultures, with the eruption of intense debate between pre-registration advocates and pre-registration skeptics. I derived some useful insights from both sides of that discussion. Many were concerned about what would happen to exploratory data under these new publication regimes. Indeed, a general worry with existing reform movements is that they appear to emphasize a highly conservative and somewhat cynical “perfect papers” culture. I do not believe in perfect papers – the scientific model is driven by replication and discovery. No paper can ever be 100% flawless – otherwise there would be no reason for further research! Inevitably, some will find ways to cheat the system. Accordingly, reform must incentivize better reporting practices over stricter control, or at least strike a balance between the two extremes.

Exploratory pilot data are an excellent avenue for this. By their very nature such data are not confirmatory – they are exciting precisely because they do not conform well to prior predictions. Such data benefit from rapid communication and feedback. Imagine an intuition-based project – a side or pet project conducted on the fly, for example. The researcher might feel that the project has potential, but also knows that there could be serious flaws. Most journals won’t publish these kinds of data. Under the current system these data are lost, hidden, obscured, or otherwise forgotten.

Compare this to a self-publication world: the researcher can upload the data, document all the protocols, make the presentation and analysis scripts open-source, and provide some well-written documentation explaining why she thinks the data are of interest. Some intrepid graduate student might find it and follow up with a valuable control analysis, pointing out an excellent feature or fatal flaw, which he can then upload as a direct citation to the original data. Both publications are citable, giving credit to originator and reviewer alike. Armed with this new knowledge, the original researcher could now pre-register an altered protocol and conduct a full study on the subject (or alternatively, abandon the project entirely). In this exchange, hundreds of hours and research dollars will likely have been saved. Additionally, the entire process will have been documented, making it both citable and minable for impact metrics. Tools already exist for each of these steps – but cultural fears largely prevent this from happening. How would it be perceived? Would anyone read it? Will someone steal my idea? To better frame these issues, I will now examine a self-publication practice that has already emerged in force.

 Example 2: Post-publication peer review

This is a particularly easy case, precisely because high-profile scholars are already regularly engaged in the practice. As I’ve frequently joked on twitter, we’re rapidly entering an era where publishing in a glam-mag has no impact guarantee if the paper itself isn’t worthwhile – you may as well hang a target on your head for post-publication peer reviewers. However, I want to emphasize the positive benefits and not just the conservative controls. Post-publication peer review (PPPR) has already begun to change the way we view research, with reviewers adding lasting content to papers, enriching the conclusions one can draw, and pointing out novel connections that were not extrapolated upon by the authors themselves. Here I like to draw an analogy to the open source movement, where code (and its documentation) is forkable, versioned, and open to constant revision – never static but always evolving.

Indeed, just last week PubMed launched their new “PubMed Commons” system, an innovative PPPR comment system whereby any registered person (with at least one paper on PubMed) can leave scientific comments on articles. Inevitably, the reception on twitter and Facebook mirrored previous attempts to introduce infrastructure-based solutions – mixed excitement followed by a lot of bemused cynicism – “bring out the trolls,” many joked. To wit, a brief scan of the average comment on another platform, PubPeer, revealed a generally (but not entirely) poor level of comment quality. While many comments seem to be on topic, most had little to no formatting and were given with little context. At times comments can seem trollish, pointing out minor flaws as if they render the paper worthless. In many disciplines like my own, few comments could be found at all. This compounds the central problem with PPPR: why would anyone acknowledge such a system if the primary result is poorly formed nitpicking of your research? The essential problem here is again incentive – for reviews to be of high quality, there need to be incentives. We need a culture of PPPR that values positive and negative comments equally. This is common to both traditional and self-publication practices.

To facilitate easy, incentivized self-publication of comments and PPPRs, my colleague Hauke Hillebrandt and I have attempted to create a simple template that researchers can use to quickly and easily publish these materials. The idea is that by using these templates and uploading them to figshare or similar services, Google Scholar will automatically index them as citations, provide citation alerts to the original authors, and even include the comments in its h-index calculation. This way researchers can begin to get credit for what they are already doing, in an easy to use and familiar format. While the template isn’t quite working yet (oddly enough, Scholar is counting citations from my blog, but not the template), you can take a look at it here and maybe help us figure out why it isn’t working! In the near future we plan to get this working, and will follow-up this post with the full template, ready for you to use.

Example 3: Pre-registration of experimental trials

As my final example, I suggest that for many researchers, self-publication of trial pre-registrations (PR) may be an excellent way to test the waters of PR in a format with a low barrier to entry. Replication attempts are a particularly promising venue for PR, and self-publication of such registrations is a way to quickly move from idea to registration to collection (as in the above pilot data example), while ensuring that credit for the original idea is embedded in the infamously hard to erase memory of the internet.

One benefit of self-publishing PRs, rather than relying on for-profit publishers, is that the PR templates themselves can easily be open-sourced, allowing various research fields to generate community-based, specialized templates adhering to the needs of that field. Self-published PRs, as well as high-quality templates, can be cited – incentivizing the creation and dissemination of both. I imagine specialized templates would rapidly emerge within each community, tailored to the needs of that research discipline.

Part 3: Criticism and limitations

Here I will close by considering some common concerns with self-publication:

Quality of data

A natural worry at this point is quality control. How can we be sure that what is published without the seal of peer review isn’t complete hooey? The primary response is that we cannot, just as we cannot be sure that peer-reviewed materials are of high quality without first reading them ourselves. Still, it is for this reason that I tried to suggest a few particularly ripe venues for self-publication of research. The cultural zeitgeist supporting full-blown scholarly self-publication has not yet arrived, but we can already begin to prepare for it. With regard to filtering noise, I argue that by coupling post-publication peer review and social media, quality self-publications will rise to the top. Importantly, this issue points towards flaws in our current publication culture. In many research areas there are effects that are repeatedly published but that few believe, largely due to the presence of biases against null findings. Self-publication aims to make as much of the research process publicly available as possible, preventing this kind of knowledge from slipping through the editorial cracks and improving our ability to evaluate the veracity of published effects. If such data are reported cleanly and completely, existing quantitative tools can further incorporate them to better estimate the likelihood of p-hacking within a literature. That leads to the next concern – quality of presentation.

Hemingway's thoughts on data.

Quality of presentation

Many ask: how in this brave new world will we separate signal from noise? I am sure that every published researcher already receives at least a few garbage citations a year from obscure places in obscure journals with little relevance to actual article contents. But, so the worry goes, what if we are deluged with a vast array of poorly written, poorly documented, self-published crud? How would we separate the signal from the noise?

The answer is Content, Presentation, and Clarity. These must be treated as central guidelines for self-publication to be worth anyone’s time. The Internet memesphere has already generated one rule for ranking interest: content rules. Content floats and is upvoted; blogspam sinks and is downvoted. This is already true for published articles – twitter, reddit, facebook, and email circles help us separate the wheat from the chaff at least as much as impact factor, if not more. But presentation and clarity are equally important. Poorly conducted research is not shared, or is shared only to be criticized. Similarly, poorly written self-publications or poorly documented data and reagents are unlikely to generate positive feedback, much less impact-generating eyeballs. I like to imagine a distant future in which self-publication has given rise to a new generation of well-regarded specialists: reviewers who are prized for their content, presentation, and clarity; coders who produce cleanly documented pipelines; behaviorists producing powerful and easily customized paradigm scripts; and data collection experts who produce the smoothest, cleanest data around. All of these future specialists will be able to garner impact for the things they already do, incentivizing each step of the research process rather than only the end product.

Being scooped, intellectual credit

Another common concern is “what if my idea/data/pilot is scooped?” I acknowledge that, particularly in these early days, the decision to self-publish must be weighed against this possibility. However, I must also point out that under the current system authors must already weigh the decision to develop an idea in isolation against the benefits of communicating with peers and colleagues. Both have risks and benefits – the quality or impact of an idea or project developed in isolation is easily over-estimated. The decision to self-publish must similarly be weighed against the need for feedback. Furthermore, a self-publication culture would allow researchers to move more quickly from project to publication, ensuring that they are readily credited for their work. And again, as research culture continues to evolve, I believe this concern will increasingly fade. It is notoriously difficult to erase information from The Internet (see the “Streisand effect”) – there is no reason why self-published ideas and data cannot generate direct credit for their authors. Indeed, I envision a world in which these contributions can themselves be independently weighted and credited.

 Prevention of cheating, corruption, self-citations

To some, this will be an inevitable point of departure. Without our time-tested guardian of peer review, what is to prevent a flood of outright fabricated data? My response is: what prevents outright fabrication under the current system? To misquote Jeff Goldblum in Jurassic Park, cheaters will always find a way. No matter how much we tighten our grip, there will be those who respond to the pressures of publication by deliberate misconduct. I believe that the current publication system directly incentivizes such behavior by valuing end product over process. By creating incentives for low-barrier post-publication peer review, pre-registration, and rich pilot data publication, researchers are given the opportunity to generate impact for each step of the research process. When faced with a choice between the steep penalties of cheating to dress up a null finding and doing one’s best to turn those data into something useful for someone, I suspect most people will choose the honest and less risky option.

Corruption and self-citations are perhaps a subtler, more sinister factor. In my discussions with colleagues, a frequent concern is that there is nothing to prevent high-impact “rich club” institutions from banding together to provide glossy post-publication reviews, citation farming, or promoting one another’s research to the top of the pile regardless of content. I again answer: how is this any different from our current system? Papers are submitted to an editor who makes a subjective evaluation of the paper’s quality and impact, before sending it to four out of a thousand possible reviewers who will make an opaque decision about the content of the paper. Sometimes this system works well, but increasingly it does not2. Many have witnessed great papers rejected for political reasons, or poor ones accepted for the same. Lowering the barrier to post-publication peer review means that even when these factors drive a paper to the top, it will be far easier to contextualize that research with a heavy dose of reality. Over time, I believe self-publication will incentivize good research. Cheating will always be a factor – and this new frontier is unlikely to be a utopia. Rather, I hope to contribute to the development of a bridge between our traditional publishing models and a radically advanced not-too-distant future.

Conclusion

Our current systems of producing, disseminating, and evaluating research increasingly seem to be out of step with cultural and technological realities. To take back the research process and bolster the ailing standard of peer review, I believe research will ultimately adopt an open and largely publisher-free model. In my view, these new practices will be entirely complementary to existing solutions such as the p-curve5, open-source publication models6–8, and innovative platforms for data and knowledge sharing such as PubPeer, PubMed Commons, and figshare9,10. The next step from here will be to produce useable templates for self-publication. You can expect to see a PDF version of this post in the coming weeks as a further example of self-publishing practices. In attempting to build a bridge to the coming technological and social revolution, I hope to inspire others to join in the conversation so that we can improve all aspects of research.

 Acknowledgments

Thanks to Hauke Hillebrandt, Kate Mills, and Francesca Fardo for invaluable discussion, comments, and edits of this work. Many of the ideas developed here were originally inspired by this post envisioning a self-publication future. Thanks also to PubPeer, PeerJ, figshare, and others in this area for their pioneering work in providing some valuable tools and spaces to begin engaging with self-publication practices.

Addendum

Excellent resources already exist for many of the ideas presented here. I want to give special notice to researchers who have already begun self-publishing their work, whether as preprints, archives, or direct blog posts. Parallel publishing is an attractive transitional option, in which researchers pre-publish their work for immediate feedback before submitting it to a traditional publisher. Special notice should be given to Zen Faulkes, whose excellent pioneering blog posts demonstrated that it is reasonably easy to self-produce well-formatted publications. Here are a few pioneering self-published papers you can use as examples – feel free to add your own in the comments:

The distal leg motor neurons of slipper lobsters, Ibacus spp. (Decapoda, Scyllaridae), Zen Faulkes

http://neurodojo.blogspot.dk/2012/09/Ibacus.html

Eklund, Anders (2013): Multivariate fMRI Analysis using Canonical Correlation Analysis instead of Classifiers, Comment on Todd et al. figshare.

http://dx.doi.org/10.6084/m9.figshare.787696

Automated removal of independent components to reduce trial-by-trial variation in event-related potentials, Dorothy Bishop

http://bishoptechbits.blogspot.dk/2011_05_01_archive.html

Deep Impact: Unintended consequences of journal rank

Björn Brembs, Marcus Munafò

http://arxiv.org/abs/1301.3748

A novel platform for open peer to peer review and publication:

http://thewinnower.com/

A platform for open PPPRs:

https://pubpeer.com/

Another PPPR platform:

http://f1000.com/

References

1. Henderson, M. Problems with peer review. BMJ 340, c1409 (2010).

2. Ioannidis, J. P. A. Why Most Published Research Findings Are False. PLoS Med 2, e124 (2005).

3. Peters, D. P. & Ceci, S. J. Peer-review practices of psychological journals: The fate of published articles, submitted again. Behav. Brain Sci. 5, 187 (2010).

4. Hunter, J. Post-publication peer review: opening up scientific conversation. Front. Comput. Neurosci. 6, 63 (2012).

5. Simonsohn, U., Nelson, L. D. & Simmons, J. P. P-Curve: A Key to the File Drawer. (2013). at <http://papers.ssrn.com/abstract=2256237>

6.  MacCallum, C. J. ONE for All: The Next Step for PLoS. PLoS Biol. 4, e401 (2006).

7. Smith, K. A. The frontiers publishing paradigm. Front. Immunol. 3, 1 (2012).

8. Wets, K., Weedon, D. & Velterop, J. Post-publication filtering and evaluation: Faculty of 1000. Learn. Publ. 16, 249–258 (2003).

9. Allen, M. PubPeer – A universal comment and review layer for scholarly papers? | Neuroconscience on WordPress.com. Website/Blog (2013). at <https://neuroconscience.com/2013/01/25/pubpeer-a-universal-comment-and-review-layer-for-scholarly-papers/>

10. Hahnel, M. Exclusive: figshare a new open data project that wants to change the future of scholarly publishing. Impact Soc. Sci. blog (2012). at <http://eprints.lse.ac.uk/51893/1/blogs.lse.ac.uk-Exclusive_figshare_a_new_open_data_project_that_wants_to_change_the_future_of_scholarly_publishing.pdf>

11. Yarkoni, T., Poldrack, R. A., Van Essen, D. C. & Wager, T. D. Cognitive neuroscience 2.0: building a cumulative science of human brain function. Trends Cogn. Sci. 14, 489–496 (2010).

12. Bishop, D. BishopBlog: A gentle introduction to Twitter for the apprehensive academic. Blog/website (2013). at <http://deevybee.blogspot.dk/2011/06/gentle-introduction-to-twitter-for.html>

13. Hadibeenareviewer. Had I Been A Reviewer on WordPress.com. Blog/website (2013). at <http://hadibeenareviewer.wordpress.com/>

14. Tressoldi, P. E., Giofré, D., Sella, F. & Cumming, G. High Impact = High Statistical Standards? Not Necessarily So. PLoS One 8, e56180 (2013).

15.  Brembs, B. & Munafò, M. Deep Impact: Unintended consequences of journal rank. (2013). at <http://arxiv.org/abs/1301.3748>

16.  Eisen, J. A., Maccallum, C. J. & Neylon, C. Expert Failure: Re-evaluating Research Assessment. PLoS Biol. 11, e1001677 (2013).


When is expectation not a confound? On the necessity of active controls.

Learning and plasticity are hot topics in neuroscience. Whether exploring old-world wisdom or new-age science fiction, the possibility that playing videogames might turn us into attention superheroes or that practicing esoteric meditation techniques might heal troubled minds is an exciting avenue for research. Indeed, findings suggesting that exotic behaviors or novel therapeutic treatments might radically alter our brain (and behavior) are ripe for sensational science-fiction headlines purporting vast brain benefits. For those of you not totally bored of methodological crises, here we have one brewing anew. You see, the standard recommendation for those interested in intervention research is the active-controlled experimental design. Unfortunately, in both clinical research on psychotherapy (including meditation) and more sci-fi areas of brain training and gaming, use of active controls is rare at best compared to the more convenient (but causally uninformative) passive control group. Now a new article in Perspectives on Psychological Science suggests that even standard active controls may not be sufficient to rule out confounds in the treatment effect of interest.

Why is that? And why exactly do we need active controls in the first place? As the authors clearly point out, what you want to show with such a study is the causal efficacy of the treatment of interest. Quite simply, that means the thing you think should have some interesting effect must actually be causally responsible for creating that effect. If I want to argue that standing upside down for twenty minutes a day will make me better at playing videogames in Australia, it must be shown that it is actually standing upside down that causes my increased performance down under. If my improved performance on Minecraft Australian Edition is simply a product of my belief in the power of standing upside down, or my expectation that standing upside down is a great way to best kangaroo-creepers, then we have no way of determining what actually produced that performance benefit. Research on placebos and the power of expectations shows that these kinds of subjective beliefs can have a big impact on everything from attentional performance to mortality rates.

Useful flowchart from Boot et al on whether or not a study can make causal claims for treatment.

Typically researchers attempt to control for such confounds through the use of a control group performing a task as similar as possible to the intervention of interest. But how do we know participants in the two groups don’t end up with different expectations about how they should improve as a result of the training? Boot et al point out that without actually measuring these variables, we have no way of knowing for sure that expectation biases don’t produce our observed improvements. They then provide a rather clever demonstration of their concern, in an experiment where participants viewed videos of various cognition tests as well as videos of a training task they might later receive – in this case either the first-person shooter Unreal Tournament or the spatial puzzle game Tetris. Finally, they asked the participants in each group which tests they thought they’d do better on as a result of the training. Importantly, the authors show not only that UT and Tetris led to significantly different expectations, but also that those expected benefits were specific to the modality of the trained and tested tasks. Thus participants who watched the action-intensive Unreal Tournament videos expected greater improvements on tests of reaction time and visual performance, whereas participants viewing Tetris rated themselves as likely to do better on tests of spatial memory.

This is a critically important finding for intervention research. Many researchers, myself included, have often thought of expectation and demand-characteristic confounds in a rather general way. Until recently I wouldn’t have expected the expectation bias to go much beyond a general “I’m doing something effective” belief. Boot et al show that our participants are a good deal cleverer than that, forming expectations-for-improvement that map onto specific dimensions of training. This means that to the degree that an experimenter’s hypothesis can be discerned from either the training or the test, participants are likely to form unbalanced expectations.

The good news is that the authors provide several reasonable fixes for this dilemma. The first is simply to measure participants’ expectations, specifically in relation to the measures of interest. Another useful suggestion is to run pilot studies ensuring that the two treatments do not evoke differential expectations, or similarly to check that your outcome measures are not subject to these biases. Boot and colleagues throw down the proverbial gauntlet, daring readers to attempt experiments where the “control condition” actually elicits greater expectations yet the treatment effect is preserved. Further common concerns, such as worries about balancing false positives against false negatives, are addressed at length.
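To make the first suggestion concrete, here is a minimal sketch of the kind of check one might run on pilot data – comparing expectation-for-improvement ratings between treatment and control groups. The ratings are invented purely for illustration:

```python
# Sketch of the pilot check Boot et al recommend: collect expectation-for-improvement
# ratings for each outcome measure in both groups and test whether they differ.
# Ratings below are invented purely for illustration.
import numpy as np
from scipy import stats

# 1-7 ratings of expected improvement on (say) a visual attention test.
treatment_expectations = np.array([6, 5, 7, 6, 5, 6, 7, 5])
control_expectations   = np.array([4, 5, 3, 4, 5, 4, 3, 4])

t, p = stats.ttest_ind(treatment_expectations, control_expectations)
print(f"t = {t:.2f}, p = {p:.3f}")
# A significant difference here warns that any group difference on this outcome
# could reflect differential expectations rather than the treatment itself.
```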

The entire article is a great read, timely and full of excellent suggestions for caution in future research. It also brought something I’ve been chewing on for some time quite clearly into focus. From the general perspective of learning and plasticity, I have to ask at what point an expectation stops being a confound. Boot et al give an interesting discussion on this point, in which they suggest that even in the case of balanced expectations and positive treatment effects, an expectation-dependent response (in which outcome correlates with expectation) may still give cause for concern as to the causal efficacy of the trained task. This is a difficult question that I believe ventures far into the territory of what exactly constitutes the minimal necessary features for learning. As the authors point out, placebo and expectation effects are “real” products of the brain, with serious consequences for behavior and treatment outcome. Yet even in the medical community there is a growing understanding that such effects may be essential parts of the causal machinery of healing.

Possible outcome of a training experiment, in which the control shows no dependence between expectation and outcome (top panel) and the treatment of interest shows dependence (bottom panel). Boot et al suggest that such a case may invalidate causal claims for treatment efficacy.

To what extent might this also be true of learning or cognitive training? For sure we can assume that expectations shape training outcomes, otherwise the whole point about active controls would be moot. But can one really have meaningful learning if there is no expectation to improve? I realize that from an experimental/clinical perspective, the question is not “is expectation important for this outcome” but “can we observe a treatment outcome when expectations are balanced”. Still when we begin to argue that the observation of expectation-dependent responses in a balanced design might invalidate our outcome findings, I have to wonder if we are at risk of valuing methodology over phenomena. If expectation is a powerful, potentially central mechanism in the causal apparatus of learning and plasticity, we shouldn’t be surprised when even efficacious treatments are modulated by such beliefs. In the end I am left wondering if this is simply an inherent limitation in our attempt to apply the reductive apparatus of science to increasingly holistic domains.

Please do read the paper, as it is an excellent treatment of a critically ignored issue in the cognitive and clinical sciences. Anyone undertaking related work should expect this reference to appear in reviewers’ replies in the near future.

EDIT:
Professor Simons, a co-author of the paper, was nice enough to answer my question on twitter. Simons pointed out that a study that balanced expectations, found group outcome differences, and further found correlations of those differences with expectation could conclude that the treatment was causally efficacious, but also that it depends on expectations (effect + expectation). This would obviously be superior to an unbalanced design, or one without measurement of expectation, as it would actually tell us something about the importance of expectation in producing the causal outcome. Be sure to read through the very helpful FAQ they’ve posted as an addendum to the paper, which covers these questions and more in greater detail. Here is the answer to my specific question:

What if expectations are necessary for a treatment to work? Wouldn’t controlling for them eliminate the treatment effect?

No. We are not suggesting that expectations for improvement must be eliminated entirely. Rather, we are arguing for the need to equate such expectations across conditions. Expectations can still affect the treatment condition in a double-blind, placebo-controlled design. And, it is possible that some treatments will only have an effect when they interact with expectations. But, the key to that design is that the expectations are equated across the treatment and control conditions. If the treatment group outperforms the control group, and expectations are equated, then something about the treatment must have contributed to the improvement. The improvement could have resulted from the critical ingredients of the treatment alone or from some interaction between the treatment and expectations. It would be possible to isolate the treatment effect by eliminating expectations, but that is not essential in order to claim that the treatment had an effect.

In a typical psychology intervention, expectations are not equated between the treatment and control condition. If the treatment group improves more than the control group, we have no conclusive evidence that the ingredients of the treatment mattered. The improvement could have resulted from the treatment ingredients alone, from expectations alone, or from an interaction between the two. The results of any intervention that does not equate expectations across the treatment and control condition cannot provide conclusive evidence that the treatment was necessary for the improvement. It could be due to the difference in expectations alone. That is why double blind designs are ideal, and it is why psychology interventions must take steps to address the shortcomings that result from the impossibility of using a double blind design. It is possible to control for expectation differences without eliminating expectations altogether.

Can compassion be trained like a muscle? Active-controlled fMRI of compassion meditation.

Among the cognitive training literature, meditation interventions are particularly unique in that they often emphasize emotional or affective processing at least as much as classical ‘top-down’ attentional control. From a clinical and societal perspective, the idea that we might be able to “train” our “emotion muscle” is an attractive one. Recently much has been made of the “empathy deficit” in the US, ranging from empirical studies suggesting a relationship between quality-of-care and declining caregiver empathy, to a recent push by President Obama to emphasize the deficit in numerous speeches.

While much of the training literature focuses on cognitive abilities like sustained attention and working memory, many investigating meditation training have begun to study the plasticity of affective function, myself included.  A recent study by Helen Weng and colleagues in Wisconsin investigated just this question, asking if compassion (“loving-kindness”) meditation can alter altruistic behavior and associated neural processing. Her study is one of the first of its kind, in that rather than merely comparing groups of advanced practitioners and controls, she utilized a fully-randomized active-controlled design to see if compassion responds to brief training in novices while controlling for important confounds.

As many readers should be aware, a chronic problem in training studies is a lack of properly controlled longitudinal design. At best, many rely on “passive” or “no-contact” controls who merely complete both measurements without receiving any training. Even in the best of circumstances “active” controls are often poorly matched to whatever is being emphasized and tested in the intervention of interest. While having both groups do “something” is better than a passive or no-control design, problems may still arise if the measure of interest is mismatched to the demand characteristics of the study.  Stated simply, if your condition of interest receives attention training and attention tests, and your control condition receives dieting instruction or relaxation, you can expect group differences to be confounded by an explicit “expectation to improve” in the interest group.

In this regard Weng et al present an almost perfect example of everything a training study should be. Both interventions were delivered via professionally made audio CDs (you can download them yourselves here!), with participants’ daily practice experiences being recorded online. The training materials were remarkably well matched for the tests of interest and extra care was taken to ensure that the primary measures were not presented in a biased way. The only thing they could have done further would be a single-blind design (making sure the experimenters didn’t know the group identity of each participant), but given the high level of difficulty in blinding these kinds of studies I don’t blame them for not undertaking such a manipulation. All in all, the study is extremely well controlled for research in this area, and I recommend it as a guideline for best practices in training research.

Specifically, Weng et al tested the impact of loving-kindness compassion meditation or emotion reappraisal training on an emotion regulation fMRI task and a behavioral economic game measuring altruistic behavior. For the fMRI task, participants viewed emotional pictures (IAPS) depicting suffering or neutral scenarios and either practiced a compassion meditation or reappraisal strategy to regulate their emotional response, before and after training. After the follow-up scan, good old-fashioned experimental deception was used to administer an economic dictator game that was ostensibly not part of the primary study and ostensibly involved real live players (both deceptions).

For those not familiar with the dictator game, the concept is essentially that a participant watches a “dictator” endowed with $100 give “unfair” offers to a “victim” without any money. Weng et al took great care in contextualizing the test purely in economic terms, limiting demand confounds:

Participants were told that they were playing the game with live players over the Internet. Effects of demand characteristics on behavior were minimized by presenting the game as a unique study, describing it in purely economic terms, never instructing participants to use the training they received, removing the physical presence of players and experimenters during game play, and enforcing real monetary consequences for participants’ behavior.

This is particularly important, as without these simple manipulations it would be easy for stodgy reviewers like myself to worry about subtle biases influencing behavior on the task. Equally important is the content of the two training programs. If, for example, Weng et al had used memory or attention training as their active-control condition, it would be difficult not to worry that behavioral differences were due to one group expecting a more emotional consequence of the study, and hence acting more altruistically. In the supplementary information, Weng et al describe the two training protocols in great detail:

Compassion

… Participants practiced compassion for targets by 1) contemplating and envisioning their suffering and then 2) wishing them freedom from that suffering. They first practiced compassion for a Loved One, such as a friend or family member. They imagined a time their loved one had suffered (e.g., illness, injury, relationship problem), and were instructed to pay attention to the emotions and sensations this evoked. They practiced wishing that the suffering were relieved and repeated the phrases, “May you be free from this suffering. May you have joy and happiness.” They also envisioned a golden light that extended from their heart to the loved one, which helped to ease his/her suffering. They were also instructed to pay attention to bodily sensations, particularly around the heart. They repeated this procedure for the Self, a Stranger, and a Difficult Person. The Stranger was someone encountered in daily life but not well known (e.g., a bus driver or someone on the street), and the Difficult Person was someone with whom there was conflict (e.g., coworker, significant other). Participants envisioned hypothetical situations of suffering for the stranger and difficult person (if needed) such as having an illness or experiencing a failure. At the end of the meditation, compassion was extended towards all beings. For each new meditation session, participants could choose to use either the same or different people for each target category (e.g., for the loved one category, use sister one day and use father the next day).

Reappraisal

… Participants were asked to recall a stressful experience from the past 2 years that remained upsetting to them, such as arguing with a significant other or receiving a lower-than- expected grade. They were instructed to vividly recall details of the experience (location, images, sounds). They wrote a brief description of the event, and chose one word to best describe the feeling experienced during the event (e.g., sad, angry, anxious). They rated the intensity of the feeling during the event, and the intensity of the current feeling on a scale (0 = No feeling at all, 100 = Most intense feeling in your life). They wrote down the thoughts they had during the event in detail. Then they were asked to reappraise the event (to think about it in a different, less upsetting way) using 3 different strategies, and to write down the new thoughts. The strategies included 1) thinking about the situation from another person’s perspective (e.g., friend, parent), 2) viewing it in a way where they would respond with very little emotion, and 3) imagining how they would view the situation if a year had passed, and they were doing very well. After practicing each strategy, they rated how reasonable each interpretation was (0 = Not at all reasonable, 100 = Completely reasonable), and how badly they felt after considering this view (0 = Not bad at all, 100 = Most intense ever). Day to day, participants were allowed to practice reappraisal with the same stressful event, or choose a different event. Participants logged the amount of minutes practiced after the session.

In my view the active control is extremely well designed for the fMRI and economic tasks, with both training methods explicitly focusing on the participant altering an emotional response to other individuals.  In tests of self-rated efficacy, both groups showed significant decreases in negative emotion, further confirming the active control. Interestingly when Weng et al compared self-ratings over time, only the compassion group showed significant reduction from the first half of training sessions to the last. I’m not sure if this constitutes a limitation, as Weng et al further report that on each individual training day the reappraisal group reported significant reductions, but that the reductions themselves did not differ significantly over time. They explain this as being likely due to the fact that the reappraisal group frequently changed emotional targets, whereas the compassion group had the same 3 targets throughout the training. Either way the important point is that both groups self-reported similar overall reductions in negative emotion during the course of the study, strongly supporting the active control.

Now what about the findings? As mentioned above, Weng et al tested participants before and after training on an fMRI emotion regulation task. After the training, all participants performed the “dictator game”, shown below. After rank-ordering the data, they found that the compassion group showed significantly greater redistribution:

The dictator task (left) and increased redistribution (right).

For the fMRI analysis, they analyzed BOLD responses to negative vs neutral images at both time points, subtracted the beta coefficients, and then entered these images into a second-level design matrix testing the group difference, with the rank-ordered redistribution scores as a covariate of interest. They then tested for areas showing group differences in the correlation of redistribution scores and changes of BOLD response to negative vs neutral images (pre vs post), across the whole brain and in several ROIs, while properly correcting for multiple comparisons. Essentially, this analysis asks: where in the brain do task-related changes in BOLD correlate more strongly with redistribution scores in one group than in the other? For the group x covariate interaction they found significant differences (increased BOLD-covariate correlation) in the right inferior parietal cortex (IPC), a region of the parietal attention network, shown on the left-hand panel:

Increased correlation between negative vs neutral imagery related BOLD and redistribution scores (left), connectivity with DLPFC (right).
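For readers curious what such a group x covariate interaction looks like concretely, here is a minimal sketch using nilearn in Python. To be clear, this is not the authors’ pipeline (they report an SPM-style analysis); the file names, group sizes, and redistribution values below are made-up placeholders, and the snippet only illustrates the kind of second-level design described above.

```python
import pandas as pd
from nilearn.glm.second_level import SecondLevelModel

# Hypothetical inputs: one image per subject coding the change in the
# negative-vs-neutral contrast from pre- to post-training, plus group
# labels and rank-ordered redistribution scores (all placeholders).
contrast_imgs = [f"sub-{i:02d}_negneu_change.nii.gz" for i in range(1, 41)]
group = [1] * 20 + [0] * 20              # 1 = compassion, 0 = reappraisal
redistribution = list(range(40))          # placeholder rank-ordered scores

# Design matrix with the group x covariate interaction: the term asking
# whether the BOLD-redistribution slope differs between the two groups.
design = pd.DataFrame({
    "intercept": 1,
    "group": group,
    "redistribution": redistribution,
    "group_x_redistribution": [g * r for g, r in zip(group, redistribution)],
})

model = SecondLevelModel(smoothing_fwhm=6.0).fit(
    contrast_imgs, design_matrix=design)
interaction_map = model.compute_contrast("group_x_redistribution",
                                         output_type="z_score")
```

Thresholding the resulting interaction map (with appropriate correction for multiple comparisons) is what would pick out clusters like the IPC region reported above.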

They further extracted signal from the IPC cluster and entered it into a conjunction analysis, testing for areas showing significant correlation with the IPC activity, and found a strong effect in right DLPFC (right panel). Finally they performed a psychophysiological interaction (PPI) analysis with the right DLPFC activity as the seed, to determine regions showing significant task-modulated connectivity with that DLPFC activity. They found increased emotion-modulated DLPFC connectivity to the nucleus accumbens, a region involved in encoding positive rewards (below, right).

PPI shows increased emotion-modulated connectivity of nucleus accumbens and DLPFC.
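For those less familiar with PPI, here is a rough sketch of how a psychophysiological interaction regressor is usually built. This is a simplified toy version rather than the authors’ implementation: a proper SPM-style PPI deconvolves the seed time course to the ‘neural’ level before forming the product, a step I skip here for brevity, and all variable names are placeholders.

```python
import numpy as np

def ppi_regressors(seed_ts, task_onoff, hrf):
    """Build simplified PPI regressors for a GLM.

    seed_ts    : seed time course, e.g. mean DLPFC signal per TR
    task_onoff : psychological variable, e.g. +1 negative / -1 neutral blocks
    hrf        : haemodynamic response function sampled at the TR

    Returns the physiological, psychological, and interaction columns;
    the interaction term is what identifies task-modulated connectivity.
    """
    seed = (seed_ts - seed_ts.mean()) / seed_ts.std()        # z-scored seed
    psych = np.convolve(task_onoff, hrf)[: len(task_onoff)]  # convolved task
    interaction = seed * psych                                # the PPI term
    return np.column_stack([seed, psych, interaction])
```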

Together these results implicate training-related BOLD activity increases to emotional stimuli in the parietal attention network and increased parietal connectivity with regions implicated in cognitive control and reward processing, in the observed altruistic behavior differences. The authors conclude that compassion training may alter emotional processing through a novel mechanism, where top-down central-executive circuits redirect emotional information to areas associated with positive reward, reflecting the role of compassion meditation in emphasizing increased positive emotion to the aversive states of others. A fitting and interesting conclusion, I think.

Overall, the study should receive high marks for its excellent design and appropriate statistical rigor. There is quite a bit of interesting material in the supplementary info, a strategy I dislike, but that is no fault of the authors considering the publishing journal (Psych Science). The question itself is also quite novel in terms of previous active-controlled studies. To date only one previous active-controlled study has investigated the role of compassion meditation in empathy-related neuroplasticity. However that study compared compassion meditation with a memory strategy course, which (in my opinion) exposes it to serious criticism regarding demand characteristics. The authors do reference that study, but only briefly to state that both studies support a role of compassion training in altering positive emotion- personally I would have appreciated a more thorough comparison, though I suppose I can go and do that myself if I feel so inclined :).

The study does have a few limitations worth mentioning. One thing that stood out to me was that the authors never report the results of the overall group mean contrast for negative vs neutral images. I would have liked to know if the regions showing increased correlation with redistribution actually showed higher overall mean activation increases during emotion regulation. However, as the authors clearly had quite specific hypotheses, restricting their alpha to 0.01 (due to testing 1 whole-brain contrast and 4 ROIs), I can see why they left this out. Given the strong results of the study, it would in retrospect perhaps have been more prudent to skip the ROI analysis (which didn’t seem to find much) and instead focus on testing the whole-brain results. I can’t blame them however, as it is surprising not to see anything going on in insula or amygdala for this kind of training. It is also a bit unclear to me why the DLPFC was used as the PPI seed as opposed to the primary IPC cluster, although I am somewhat unfamiliar with the conjunction-connectivity analysis used here. Finally, as the authors themselves point out, a major limitation of the study is that the redistribution measure was collected only at time two, preventing a comparison to baseline for this measure.

Given the methodological state of the topic (quite poor, generally speaking), I am willing to grant them these mostly minor caveats. Of course, without a baseline altruism measure it is difficult to make a strong conclusion about the causal impact of the meditation training on altruism behavior, but at least their neural data are shielded from this concern. So while we can’t exhaustively conclude that compassion can be trained, the results of this study certainly suggest it is possible and perhaps even likely, providing a great starting point for future research. One interesting thing for me was the difference in DLPFC. We also found task-related increases in dorsolateral prefrontal cortex following active-controlled meditation, although in the left hemisphere and for a very different kind of training and task. One other recent study of smoking cessation also reported alteration in DLPFC following mindfulness training, leading me to wonder if we’re seeing the emergence of empirical consensus for this region’s specific involvement in meditation training. Another interesting point for me was that affective regulation here seems to involve primarily top-down or attention-related neural correlates, suggesting that bottom-up processing (insula, amygdala) may be more resilient to brief training, something we also found in our study. I wonder if the group mean contrasts would have been revealing here (i.e. if there were differences in bottom-up processing that don’t correlate with redistribution). Altogether, a great study that raises the bar for training research in cognitive neuroscience!

Is the resting BOLD signal physiological noise? What about resting EEG?

Over the past 5 years, resting-state fMRI (rsfMRI) has exploded in popularity. Literally dozens of papers are published each day examining slow (< 0.1 Hz) or “low frequency” fluctuations in the BOLD signal. When I first moved to Europe I was caught up in the somewhat North American frenzy of resting state networks. I couldn’t understand why my Danish colleagues, who specialize in modelling physiological noise in fMRI, simply did not take the literature seriously. The problem is essentially that the low frequencies examined in these studies are the same as those that dominate physiological rhythms. Respiration and cardiac pulsation can make up a massive amount of variability in the BOLD signal. Before resting state fMRI came along, nearly every fMRI study discarded any data frequencies lower than one oscillation every 120 seconds (i.e. 1/120 Hz high-pass filtering). Simple things like breath holding and pulsatile motion in vasculature can cause huge effects in BOLD data, and it just so happens that these artifacts (which are non-neural in origin) tend to pool around some of our favorite “default” areas: medial prefrontal cortex, insula, and other large gyri near draining veins.

Naturally this leads us to ask if the “resting state networks” (RSNs) observed in such studies are actually neural in origin, or if they are simply the result of variations in breath pattern or the like. Obviously we can’t answer this question with fMRI alone. We can apply something like independent component analysis (ICA) and hope that it removes most of the noise- but we’ll never really be 100% sure we’ve gotten it all that way. We can measure the noise directly (e.g. “nuisance covariance regression”) and include it in our GLM- but much of the noise is likely to be highly correlated with the signal we want to observe. What we need are cross-modality validations that low-frequency oscillations do exist, that they drive observed BOLD fluctuations, and that these relationships hold even when controlling for non-neural signals. Some of this is already established- for example direct intracranial recordings do find slow oscillations in animal models. In MEG and EEG, it is well established that slow fluctuations exist and have a functional role.
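To make the ‘nuisance covariance regression’ idea concrete, here is a minimal sketch in Python using nilearn’s signal cleaning utilities. The data and confounds below are random placeholders; in a real analysis the confounds would be the measured respiration, cardiac, and motion traces, and the cutoff mirrors the classic 1/120 Hz high-pass filter mentioned above.

```python
import numpy as np
from nilearn.signal import clean

# Placeholder data: 240 volumes (TR = 2 s) x 1000 voxel/ROI time series,
# plus 8 hypothetical nuisance regressors (e.g. RVT, cardiac rate, motion).
bold = np.random.randn(240, 1000)
confounds = np.random.randn(240, 8)

# Regress out the measured noise and high-pass filter at 1/120 Hz; whatever
# variance tracks the confounds is removed before any connectivity analysis.
cleaned = clean(bold, confounds=confounds, detrend=True,
                standardize="zscore", high_pass=1.0 / 120, t_r=2.0)
```

The obvious caveat, as noted above, is that any neural signal genuinely correlated with the confounds gets removed along with the noise.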

So far so good. But what about in fMRI? Can we measure meaningful signal while controlling for these factors? This is currently a topic of intense research interest. Marcus Raichle, the ‘father’ of the default mode network, highlights fascinating multi-modal work from a Finnish group showing that slow fluctuations in behavior and EEG signal coincide (Raichle and Snyder 2007; Monto, Palva et al. 2008). However, we should still be cautious- I recently spoke to a post-doc from the Helsinki group about the original paper, and he stressed that slow EEG is just as contaminated by physiological artifacts as fMRI. Except that the problem is even worse, because in EEG the artifacts may be several orders of magnitude larger than the signal of interest[i].

Understandably I was interested to see a paper entitled “Correlated slow fluctuations in respiration, EEG, and BOLD fMRI” appear in Neuroimage today (Yuan, Zotev et al. 2013). The authors simultaneously collected EEG, respiration, pulse, and resting fMRI data in 9 subjects, and then performed cross-correlation and GLM analyses on the relationships among these variables, during both eyes-closed and eyes-open rest. They calculated respiration volume per time (RVT), a measure developed by Rasmus Birn, to assign a respiratory volume value to each TR (Birn, Diamond et al. 2006). One key finding is that global variations in alpha EEG power are strongly predicted by RVT during eyes-closed rest, with a maximum peak correlation coefficient of 0.40. Here are the two time series:

Time series of global alpha power (GFP) and respiration volume per time (RVT) during eyes-closed rest.
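As an aside, here is a minimal sketch of the kind of lagged cross-correlation used to ask whether one slow signal (say GFP) leads or lags another (RVT). The function and variable names are my own placeholders, not the authors’ code.

```python
import numpy as np

def lagged_xcorr(x, y, max_lag):
    """Pearson correlation between x and y at lags -max_lag..+max_lag.

    Both series are z-scored first. A positive lag pairs x at time t with
    y at time t + lag, so a peak at a positive lag suggests x leads y.
    """
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    lags = np.arange(-max_lag, max_lag + 1)
    r = []
    for lag in lags:
        if lag > 0:
            r.append(np.corrcoef(x[:-lag], y[lag:])[0, 1])
        elif lag < 0:
            r.append(np.corrcoef(x[-lag:], y[:lag])[0, 1])
        else:
            r.append(np.corrcoef(x, y)[0, 1])
    return lags, np.array(r)

# e.g. lags, r = lagged_xcorr(gfp, rvt, max_lag=10)
#      lags[np.argmax(np.abs(r))] gives the lag of the peak correlation.
```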

You can clearly see that there is a strong relationship between global alpha (GFP) and respiration (RVT). The authors state that “GFP appears to lead RVT” though I am not so sure. Regardless, there is a clear relationship between eyes closed ‘alpha’ and respiration. Interestingly they find that correlations between RVT and GFP with eyes open were not significantly different from chance, and that pulse did not correlate with GFP. They then conduct GLM analyses with RVT and GFP as BOLD regressors. Here is what their example subject looked like during eyes-closed rest:

Single-subject GLM maps of BOLD correlations with RVT and GFP during eyes-closed rest.

Notice any familiar “RSNs” in the RVT map? I see anti-correlated executive deactivation and default mode activation! Very canonical. Too bad they are breath-related. This is why noise regression experts tend to dislike rsfMRI, particularly when you don’t measure the noise. We also shouldn’t be too surprised that the GFP-BOLD and RVT-BOLD maps look similar, considering that GFP and RVT are highly correlated. After looking at these correlations separately, Yuan et al perform RETROICOR physiological noise correction and then reexamine the contrasts. Here are the group maps:

Group-level RVT-BOLD and GFP-BOLD correlation maps.

Things look a bit less default-mode-like in the group RVT map, but the RVT and GFP maps are still clearly quite similar. In panel D you can see that physiological noise correction has a large global impact on GFP-BOLD correlations, suggesting that quite a bit of this co-variance is driven by physiological noise. Put simply, respiration is explaining a large degree of alpha-BOLD correlation; any experiment not modelling this covariance is likely to produce strongly contaminated results. Yuan et al go on to examine eyes-open rest and show that, similar to their RVT-GFP cross-correlation analysis, not nearly as much seems to be happening in eyes open compared to closed:

Results during eyes-open rest.

The authors conclude that “In particular, this correlation between alpha EEG and respiration is much stronger in eyes-closed resting than in eyes-open resting” and that “[the] results also suggest that eyes-open resting may be a more favorable condition to conduct brain resting state fMRI and for functional connectivity analysis because of the suppressed correlation between low-frequency respiratory fluctuation and global alpha EEG power, therefore the low-frequency physiological noise predominantly of non-neuronal origin can be more safely removed.” Fair enough- one conclusion is certainly that eyes closed rest seems much more correlated with respiration than eyes open. This is a decent and useful result of the study. But then they go on to make this really strange statement, which appears in the abstract, introduction, and discussion:

“In addition, similar spatial patterns were observed between the correlation maps of BOLD with global alpha EEG power and respiration. Removal of respiration related physiological noise in the BOLD signal reduces the correlation between alpha EEG power and spontaneous BOLD signals measured at eyes-closed resting. These results suggest a mutual link of neuronal origin between the alpha EEG power, respiration, and BOLD signals” (emphasis added)

That’s one way to put it! The logic here seems to be that since alpha power reflects neural activity, and respiration correlates with alpha, the shared variance between alpha, respiration, and BOLD must therefore be neuronal in origin. I’m sorry guys, you did a decent experiment, but I’m afraid you’ve gotten this one wrong. There is absolutely nothing that implies alpha power cannot also be contaminated by respiration-related physiological noise. In fact it is exactly the opposite- in the low frequencies observed by Yuan et al the EEG data are particularly likely to be contaminated by physiological artifacts! And that is precisely what the paper shows – in the authors’ own words: “impressively strong correlations between global alpha and respiration”. This is further corroborated by the strong similarity between the RVT-BOLD and alpha-BOLD maps, and the fact that removing respiratory and pulse variance drastically alters the alpha-BOLD correlations!

So what should we take away from this study? It is of course inconclusive- there are several aspects of the methodology that are puzzling to me, and sadly the study is rather under-powered at n = 9. I found it quite curious that in each of the BOLD-alpha maps there seemed to be a significant artifact in the lateral and posterior ventricles, even after physiological noise correction (check out figure 2b, an almost perfect ventricle map). If their global alpha signal is specific to a neural origin, why does this artifact remain even after physiological noise correction? I can’t quite put my finger on it, but it seems likely to me that some source of noise remained even after correction- perhaps a reader with more experience in EEG-fMRI methods can comment. For one thing their EEG motion correction seems a bit suspect, as they simply drop outlier timepoints. One way or another, I believe we should take one clear message away from this study – low-frequency signals are not easily untangled from physiological noise, even in electrophysiology. This isn’t a damnation of all resting state research- rather it is a clear sign that we need to be measuring these signals to retain a degree of control over our data, particularly in resting designs, where we have the least experimental control of all.

References:

Birn, R. M., J. B. Diamond, et al. (2006). “Separating respiratory-variation-related fluctuations from neuronal-activity-related fluctuations in fMRI.” Neuroimage 31(4): 1536-1548.

Monto, S., S. Palva, et al. (2008). “Very slow EEG fluctuations predict the dynamics of stimulus detection and oscillation amplitudes in humans.” The Journal of Neuroscience 28(33): 8268-8272.

Raichle, M. E. and A. Z. Snyder (2007). “A default mode of brain function: a brief history of an evolving idea.” Neuroimage 37(4): 1083-1090.

Yuan, H., V. Zotev, et al. (2013). “Correlated Slow Fluctuations in Respiration, EEG, and BOLD fMRI.” NeuroImage, in press.

 


[i] Note that this is not meant to be in any way a comprehensive review. A quick literature search suggests that there are quite a few recent papers on resting BOLD-EEG. I recall a well-done paper by a group at the Max Planck Institute that did include noise regressors, and found unique slow BOLD-EEG relations. I cannot seem to find it at the moment however!

 

MOOC on non-linear approaches to social and cognitive sciences. Votes needed!

My colleagues at Aarhus University have put together a fascinating proposal for a Massive Online Open Course (MOOC) on “Analyzing Behavioral Dynamics: non-linear approaches to social and cognitive sciences”. I’ve worked with Riccardo and Kristian since my masters and I can promise you the course will be excellent. They’ve spent the past 5 years exhaustively pursuing methodology in non-linear dynamics, graph-theoretical, and semantic/semiotic analyses, and I think they will have a lot of interesting practical insights to offer. Best of all the course is free to all, as long as it gets enough votes on the MPF website. I’ve been a bit on the fence regarding my feelings about MOOCs, but in this case I think it’s really a great opportunity to give novel methodologies more exposure. Check it out- if you like it, give them a vote and consider joining the course!

https://moocfellowship.org/submissions/analyzing-behavioral-dynamics-non-linear-approaches-to-social-and-cognitive-sciences

Course Description

In the last decades, the social sciences have come to confront the temporal nature of human behavior and cognition: How do changes of heartbeat underlie emotions? How do we regulate our voices in a conversation? How do groups develop coordinative strategies to solve complex problems together?
This course enables you to tackle these sorts of questions: it addresses methods of analysis from nonlinear dynamics and complexity theory, which are designed to find and characterize patterns in this kind of complicated data. Traditionally developed in fields like physics and biology, non-linear methods are often neglected in the social and cognitive sciences.

The course consists of two parts:

  1. The dynamics of behavior and cognition
    In this part of the course you are introduced to some examples of human behavior that challenge the assumptions of linear statistics: reading time, voice dynamics in clinical populations, etc. You are then shown step-by-step how to characterize and quantify patterns and temporal dynamics in these behaviors using non-linear methods, such as recurrence quantification analysis.
  2. The dynamics of interpersonal coordination
    In this second part of the course we focus on interpersonal coordination: how do people manage to coordinate action, emotion and cognition? We consider several real-world cases: heart beat synchronization during firewalking rituals, voice adaptation during conversations, joint problem solving in creative tasks – such as building Lego models together. You are then shown ways to analyze how two or more behaviors are coordinated and how to characterize their coupling – or lack thereof.

This course provides a theoretical and practical introduction to non-linear techniques for the social and cognitive sciences. It presents concrete case studies from actual research projects on human behavior and cognition. It encourages you to put all this into practice via practical exercises and quizzes. By the end of this course you will be fully equipped to go out and do your own research projects applying non-linear methods to human behavior and coordination.

Learning objectives

  • Given a timeseries (e.g. a speech recording, or a sequence of reaction times), characterize its patterns: does it contain repetitions? How stable? How complex?
  • Given a timeseries (e.g. a speech recording, or a sequence of reaction times), characterize how it changes over time.
  • Given two timeseries (e.g. the movements of two dancers) characterize their coupling: how do they coordinate? Do they become more similar over time? Can you identify who is leading and who is following?
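To give a very rough flavor of what answering these questions with recurrence-based methods looks like, here is a toy sketch of a recurrence plot for a single time series. This is my own illustration rather than course material: it skips the delay embedding a real recurrence quantification analysis would use, and the radius is an arbitrary placeholder.

```python
import numpy as np

def recurrence_matrix(ts, radius):
    """Binary recurrence plot of a 1-D time series.

    A point (i, j) is 'recurrent' if the values at times i and j lie
    within `radius` of each other. RQA measures (recurrence rate,
    determinism, longest diagonal line, ...) summarize this matrix.
    """
    ts = np.asarray(ts, dtype=float)
    dist = np.abs(ts[:, None] - ts[None, :])   # all pairwise distances
    return (dist < radius).astype(int)

# The recurrence rate (the fraction of recurrent points) is one simple
# answer to "does the series contain repetitions?":
# rr = recurrence_matrix(reaction_times, radius=0.1).mean()
```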

MOOC relevance

Social and cognitive research is increasingly investigating phenomena that are temporally unfolding and non-linear. However, most educational institutions only offer courses in linear statistics for social scientists. Hence, there is a need for an easy-to-understand introduction to non-linear analytical tools in a way that is specifically aimed at the social and cognitive sciences. The combination of actual cases and concrete tools to analyze them will give the course a wide appeal.
Additionally, methods-oriented courses on MOOC platforms such as Coursera have generally proved very attractive for students.

Please spread the word about this interesting course!

Correcting your naughty insula: modelling respiration, pulse, and motion artifacts in fMRI

important update: Thanks to commenter “DS”, I discovered that my respiration-related data was strongly contaminated due to mechanical error. The belt we used is very susceptible to becoming uncalibrated, for example if the subject moves or breathes very deeply. When looking at the raw timecourse of respiration I could see that many subjects, including the one displayed here, show a great deal of “clipping” in the timeseries. For the final analysis I will not use the respiration regressors, but rather just the pulse and motion. Thanks DS!

As I’m working my way through my latest fMRI analysis, I thought it might be fun to share a little bit of that here. Right now I’m coding up a batch pipeline for data from my Varela-award project, in which we compared “adept” meditation practitioners with motivation, IQ, age, and gender-matched controls on a response-inhibition and error monitoring task. One thing that came up in the project proposal meeting was a worry that, since meditation practitioners spend so much time working with the breath, they might respirate differently either at rest or during the task. As I’ve written about before, respiration and other related physiological variables such as cardiac-pulsation-induced motion can seriously impact your fMRI results (when your heart beats, the veins in your brain pulsate, creating slight but consistent and troublesome MR artifacts). As you might expect, these artifacts tend to be worse around the main draining veins of the brain, several of which cluster around the frontoinsular and medial-prefrontal/anterior cingulate cortices. As these regions are important for response-inhibition and are frequently reported in the meditation literature (without physiological controls), we wanted to try to control for these variables in our study.

disclaimer: I’m still learning about noise modelling, so apologies if I mess up the theory/explanation of the techniques used! I’ve left things a bit vague for that reason. See the bottom of the article for references and further reading. To encourage myself to post more of these “open-lab notes” posts, I’ve kept the style here very informal, so apologies for typos or snafus. 😀

To measure these signals, we used the respiration belt and pulse monitor that come standard with most modern MRI machines. The belt is just a little elastic hose that you strap around the chest wall of the subject, where it can record expansions and contractions of the chest to give a time series corresponding to respiration, and the pulse monitor is a standard finger clip. Although I am not an expert on physiological noise modelling, I will do my best to explain the basic effects you want to model out of your data. These “non-white” noise signals include pulsation and respiration-induced motion (when you breathe, you tend to nod your head just slightly along the z-axis), typical motion artifacts, and variability of pulsation and respiration. To do this I fed my physiological parameters into an in-house function written by Torben Lund, which incorporates a RETROICOR transformation of the pulsation and respiration timeseries. We don’t just use the raw timeseries due to signal aliasing- the physio data needs to be shifted to make each physiological event correspond to a TR. The function also calculates the respiration volume per time (RVT), a measure developed by Rasmus Birn, to model the variability in physiological parameters [1]. Variability in respiration and pulse volume (if one group of subjects tends to inhale sharply for some conditions but not others, for example) is more likely to drive BOLD artifacts than absolute respiratory volume or frequency. Finally, as is standard, I included the realignment parameters to model subject motion-related artifacts. Here is a shot of my monster design matrix for one subject:

The design matrix for one subject, including task, RETROICOR, RVT, and motion regressors.
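The actual regressors above come from Torben Lund’s in-house code, so what follows is only a rough sketch of the standard RETROICOR idea (Glover et al., reference 3 below): a low-order Fourier expansion of the cardiac and respiratory phase assigned to each volume. I assume the phase extraction has already been done, and all names are placeholders.

```python
import numpy as np

def retroicor_regressors(phase, n_harmonics=2):
    """RETROICOR-style nuisance regressors from a physiological phase.

    phase : cardiac or respiratory phase in radians (0..2*pi) assigned
            to each volume from the raw pulse/belt trace.
    Returns 2 * n_harmonics columns (a sin/cos pair per harmonic) for
    the design matrix.
    """
    cols = []
    for m in range(1, n_harmonics + 1):
        cols.append(np.cos(m * phase))
        cols.append(np.sin(m * phase))
    return np.column_stack(cols)

# Sketch of the nuisance block of a design matrix like the one shown above:
# nuisance = np.hstack([retroicor_regressors(cardiac_phase),
#                       retroicor_regressors(resp_phase),
#                       rvt_columns,            # shifted RVT regressors
#                       realignment_params])    # 6 motion parameters
```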

You can see that the first 7 columns model my conditions (correct stops, unaware errors, aware errors, false alarms, and some self-report ratings), the next 20 model the RETROICOR-transformed pulse and respiration timeseries, 41 columns for RVT, 6 for realignment pars, and finally my session offsets and constant. It’s a big DM, but since we have over 1000 degrees of freedom, I’m not too worried about all the extra regressors in terms of loss of power. What would be worrisome is if, for example, stop activity correlated strongly with any of the nuisance variables – we can see from the orthogonality plot that, in this subject at least, that is not the case. Now let’s see if we actually have anything interesting left over after we remove all that noise:

Stop-related activity (FWE cluster-corrected, p = 0.05).

We can see that the Stop-related activity seems pretty reasonable, clustering around the motor and premotor cortex, bilateral insula, and DLPFC, all canonical motor inhibition regions (FWE cluster-corrected, p = 0.05). This is a good sign! Now what about all those physiological regressors? Are they doing anything of value, or just sucking up our power? Here is the F-contrast over the pulse regressors:

F-contrast over the pulse regressors.

Here we can see that the peak signal is wrapped right around the pons/upper brainstem. This makes a lot of sense- the area is full of the primary vasculature that ferries blood into and out of the brain. If I were particularly interested in getting signal from the brainstem in this project, I could use a respiration x pulse interaction regressor to better model this [2]. Penny et al find similar results to our cardiac F-test when comparing AR(1) with higher-order AR models [6]. But since we’re really only interested in higher cortical areas, the pulse regressor should be sufficient. We can also see quite a bit of variance explained around the bilateral insula and rostral anterior cingulate. Interestingly, our stop-related activity still contained plenty of significant insula response, so we can feel better that some but not all of the signal from that region is actually functionally relevant. What about respiration?

F-contrast over the respiration regressors.

Here we see a ton of variance explained around the occipital lobe. This makes good sense- we tend to just slightly nod our heads back and forth along the z-axis as we breathe. What we are seeing is the motion-induced artifact of that rotation, which is most severe along the back of the head and periphery of the brain. We see a similar result for the overall motion regressors, but flipped to the front:

Ignore the above, respiration regressor is not viable due to “clipping”, see note at top of post. Glad I warned everyone that this post was “in progress” 🙂 Respiration should be a bit more global, restricted to ventricles and blood vessels.

F-contrast over the motion regressors.

Wow, look at all the significant activity! Someone call up Nature and let them know, motion lights up the whole brain! As we would expect, the motion regressor explains a ton of uninteresting variance, particularly around the prefrontal cortex and periphery.

I still have a ways to go on this project- obviously this is just a single subject, and the results could vary wildly. But I do think even at this point we can start to see that it is quite easy and desirable to model these effects in your data (Note: we had some technical failure due to the respiration belt being a POS…). I should note that in SPM, these sources of “non-white” noise are typically modeled using an autoregressive (AR(1)) model, which is enabled in the default settings (we’ve turned it off here). However, as there is evidence that this model performs poorly at faster TRs (which are the norm now), and that a noise-modelling approach can greatly improve SNR while removing artifacts, we are likely to get better performance out of a nuisance regression technique as demonstrated here [4]. The next step will be to take these regressors to a second-level analysis, to examine if the meditation group has significantly more BOLD variance explained by physiological noise than do controls. Afterwards, I will re-run the analysis without any physio parameters, to compare the results of both.

References:


1. Birn RM, Diamond JB, Smith MA, Bandettini PA.
Separating respiratory-variation-related fluctuations from neuronal-activity-related fluctuations in fMRI.
Neuroimage. 2006 Jul 15;31(4):1536-48. Epub 2006 Apr 24.

2. Brooks J.C.W., Beckmann C.F., Miller K.L., Wise R.G., Porro C.A., Tracey I., Jenkinson M.
Physiological noise modelling for spinal functional magnetic resonance imaging studies.
NeuroImage, in press. doi:10.1016/j.neuroimage.2007.09.018

3. Glover GH, Li TQ, Ress D.
Image-based method for retrospective correction of physiological motion effects in fMRI: RETROICOR.
Magn Reson Med. 2000 Jul;44(1):162-7.

4. Lund TE, Madsen KH, Sidaros K, Luo WL, Nichols TE.
Non-white noise in fMRI: does modelling have an impact?
Neuroimage. 2006 Jan 1;29(1):54-66.

5. Wise RG, Ide K, Poulin MJ, Tracey I.
Resting fluctuations in arterial carbon dioxide induce significant low frequency variations in BOLD signal.
Neuroimage. 2004 Apr;21(4):1652-64.

6. Penny, W., Kiebel, S., & Friston, K. (2003). Variational Bayesian inference for fMRI time series. NeuroImage, 19(3), 727–741. doi:10.1016/S1053-8119(03)00071-5

Mindfulness and neuroplasticity – summary of my recent paper.

First, let me apologize for an overlong hiatus from blogging. I submitted my PhD thesis October 1st, and it turns out that writing two papers and a thesis in the space of about three months can seriously burn out the old muse. I’ve coaxed her back through gentle offerings of chocolate, caffeine, and a bit of videogame binging. As long as I promise not to bring her within a mile of a dissertation, I believe we’re good for at least a few posts per month.

With that taken care of, I am very happy to report the successful publication of my first fMRI paper, published last month in the Journal of Neuroscience. The paper was truly a labor of love, taking nearly 3 years and countless hours of head-scratching work to complete. In the end I am quite happy with the finished product, and I do believe my colleagues and I managed to produce a useful result for the field of mindfulness training and neuroplasticity.

note: this post ended up being quite long. if you are already familiar with mindfulness research, you may want to skip ahead!

Why mindfulness?

First, depending on what brought you here, you may already be wondering why mindfulness is an interesting subject, particularly for a cognitive neuroscientist. In light of the large gaps in our understanding of the neurobiological foundations of neuroimaging, is it really the right time to apply these complex tools to meditation? Can we really learn anything about something as potentially ambiguous as “mindfulness”? Although we have a long way to go, and these are certainly fair questions, I do believe that the study of meditation has a lot to contribute to our understanding of cognition and plasticity.

Generally speaking, when you want to investigate some cognitive phenomena, a firm understanding of your target is essential to successful neuroimaging. Areas with years of behavioral research and concrete theoretical models make for excellent imaging subjects, as in these cases a researcher can hope to fall back on a sort of ‘ground truth’ to guide them through the neural data, which are notoriously ambiguous and difficult to interpret. Of course well-travelled roads also have their disadvantages, sometimes providing a misleading sense of security, or at least being a bit dry. While mindfulness research still has a ways to go, our understanding of these practices is rapidly evolving.

At this point it helps to stop and ask, what is meditation (and by extension, mindfulness)? The first thing to clarify is that there is no such thing as “meditation”- rather, meditation is really a term describing a family resemblance of highly varied practices, spanning both spiritual and secular traditions. Meditation or “contemplative” practices have existed for more than a thousand years and are found in nearly every spiritual tradition. More recently, here in the west our unending fascination with the esoteric has led to a popular rise in Yoga, Tai Chi, and other physically oriented contemplative practices, all of which incorporate an element of meditation.

At the simplest level of description, [mindfulness] meditation is just a process of becoming aware, whether through actual sitting meditation, exercise, or daily rituals. Meditation (as a practice) was first popularized in the west during the rise of transcendental meditation (TM). As you can see in the figure below, interest in TM led to an early boom in research articles. This boom was not to last, as it was gradually realized that much of this initially promising research was actually the product of zealous insiders, conducted with poor controls and in some cases outright data fabrication. As TM became known as a cult, meditation research underwent a dark age where publishing on the topic could seriously damage a research career. We can see also that around the 1990’s, this trend started to reverse as a new generation of researchers began investigating “mindfulness” meditation.

PubMed publications on meditation research over time.
Sidenote: research everywhere is expanding. Shouldn’t we start controlling these highly popular “pubs over time” figures for total publishing volume? =)

It’s easy to see from the above why, when Jon Kabat-Zinn re-introduced meditation to the West, he relied heavily on the medical community to develop a totally secularized, intervention-oriented version of meditation strategically called “mindfulness-based stress reduction.” The arrival of MBSR was closely related to the development of mindfulness-based cognitive therapy (MBCT), a revision of cognitive-behavioral therapy utilizing mindful practices and instruction for a variety of clinical applications. Mindfulness practice is typically described as involving at least two practices: focused attention (FA) and open monitoring (OM). FA can be described as simply noticing when attention wanders from a target (the breath, the body, or a flower for example) and gently redirecting it back to that target. OM is typically (but not always) trained at a later stage, building on the attentional skills developed in FA practice to gradually develop a sense of “non-judgmental open awareness”. While a great deal of work remains to be done, initial cognitive-behavioral and clinical research on mindfulness training (MT) has shown that these practices can improve the allocation of attentional resources, reduce physiological stress, and improve emotional well-being. In the clinic MT appears to effectively improve symptoms across a variety of pathological syndromes including anxiety and depression, at least as well as standard CBT or pharmacological treatments.

Has the quality of research on meditation improved since the dark days of TM? When answering this question it is important to note two things about the state of current mindfulness research. First, while it is true that many who research MT are also practitioners, the primary scholars are researchers who started in classical areas (emotion, clinical psychiatry, cognitive neuroscience) and gradually became involved in MT research. Further, most funding today for MT research comes not from shady religious institutions, but from well-established funding bodies such as the National Institute of Health and European Research Council. It is of course important to be aware of the impact prior beliefs can have on conducting impartial research, but with respect to today’s meditation and mindfulness researchers, I believe that most if not all of the work being done is honest, quality research.

However, it is true that much of the early MT research is flawed on several levels. Indeed, several meta-analyses have concluded that, generally speaking, studies of MT have often utilized poor designs – in one major review only 8/22 studies met criteria for meta-analysis. The reason for this is quite simple- in the absence of pilot data, investigators had to begin somewhere. Typically it doesn’t bode well to jump into unexplored territory with an expensive, large-sample, fully randomized design. There just isn’t enough to go off of- how would you know which kind of process to even measure? Accordingly, the large majority of mindfulness research to date has utilized small-scale, often sub-optimal experimental designs, sacrificing experimental control in order to build a basic idea of the cognitive landscape. While this exploratory research provides a needed foundation for generating likely hypotheses, it is also difficult to make any strong conclusions so long as methodological issues remain.

Indeed, most of what we know about mindfulness and neuroplasticity comes from studies of either advanced practitioners (compared to controls) or “wait-list” control studies where controls receive no intervention. On the basis of the findings from these studies, we had some idea how to target our investigation, but there remained a nagging feeling of uncertainty. Just how much of the literature would actually replicate? Does mindfulness alter attention through mere expectation and motivation biases (i.e. placebo-like confounds), or can MT actually drive functionally relevant attentional and emotional neuroplasticity, even when controlling for these confounds?

The name of the game is active-control

Research to date links mindfulness practices to alterations in health and physiology, cognitive control, emotional regulation, responsiveness to pain, and a large array of positive clinical outcomes. However, the explicit nature of mindfulness training makes for some particularly difficult methodological issues. Group cross-sectional studies, where advanced practitioners are compared to age-matched controls, cannot provide causal evidence. Indeed, it is always possible that having a big fancy brain makes you more likely to spend many years meditating, and not that meditating gives you a big fancy brain. So training studies are essential to verifying the claim that mindfulness actually leads to interesting kinds of plasticity. However, unlike with a new drug study or computerized intervention, you cannot simply provide a sugar pill to the control group. Double-blind design is impossible; by definition subjects will know they are receiving mindfulness. To actually assess the impact of MT on neural activity and behavior, we need to compare to groups doing relatively equivalent things in similar experimental contexts. We need an active control.

There is already a well-established link between measurement outcome and experimental demands. What is perhaps less appreciated is that cognitive measures, particularly reaction time, are easily biased by phenomena like the Hawthorne effect, where the amount of attention participants receive directly contributes to experimental outcome. Wait-lists simply cannot overcome these difficulties. We know for example, that simply paying controls a moderate performance-based financial reward can erase attentional reaction-time differences. If you are repeatedly told you’re training attention, then come experiment time you are likely expect this to be true and try harder than someone who has received no such instruction. The same is true of emotional tasks; subjects told frequently they are training compassion are likely to spend more time fixating on emotional stimuli, leading to inflated self-reports and responses.

I’m sure you can quickly see how it is extremely important to control for these factors if we are to isolate and understand the mechanisms important for mindfulness training. One key solution is active-control, that is providing both groups (MT and control) with a “treatment” that is at least nominally as efficacious as the thing you are interested in. Active-control allows you exclude numerous factors from your outcome, potentially including the role of social support, expectation, and experimental demands. This is exactly what we set out to do in our study, where we recruited 60 meditation-naïve subjects, scanned them on an fMRI task, randomized them to either six weeks of MT or active-control, and then measured everything again. Further, to exclude confounds relating to social interaction, we came up with a particularly unique control activity- reading Emma together.

Jane Austen as Active Control – theory of mind vs interoception

To overcome these confounds, we constructed a specialized control intervention. As it was crucial that both groups believed in their training, we needed an instructor who could match the high level of enthusiasm and experience found in our meditation instructors. We were lucky to have the help of local scholar Mette Stineberg, who suggested a customized “shared reading” group to fit our purposes. Reading groups are a fun, attention demanding exercise, with purported benefits for stress and well-being. While these claims have not been explicitly tested, what mattered most was that Mette clearly believed in their efficacy- making for a perfect control instructor. Mette holds a PhD in literature, and we knew that her 10 years of experience participating in and leading these groups would help us to exclude instructor variables from our results.

With her help, we constructed a special condition where participants completed group readings of Jane Austen’s Emma. A sensible question to ask at this point is – “why Emma?” An essential element of active control is variable isolation, or balancing your groups in such a way that, with the exception of your hypothesized “active ingredient”, the two interventions are extremely similar. As MT is thought to depend on a particular kind of non-judgmental, interoceptive attention, Chris and Uta Frith suggested during an early meeting that Emma might be a perfect contrast. For those of you who haven’t read the novel, the plot is brimming over with judgment-heavy theory-of-mind-type exposition. Mette further helped to ensure a contrast with MT by emphasizing discussion sessions focused on character motives. In this way we were able to ensure that both groups met for the same amount of time each week, with equivalently talented and passionate instructors, and felt that they were working towards something worthwhile. Finally, we made sure to let every participant know at recruitment that they would receive one of two treatments intended to improve attention and well-being, and that any benefits would depend upon their commitment to the practice. To help them practice at home, we created 20-minute-long CDs for both groups, one with a guided meditation and the other with a chapter from Emma.

Unlike previous active-controlled studies that typically rely on relaxation training, reading groups depend upon a high level of social interaction. Reading together allowed us not only to exclude treatment context and expectation from our results, but also the more difficult effects of social support (the “making new friends” variable). To measure this, we built a small website for participants to make daily reports of their motivation and minutes practiced that day. As you can see in the figure below, when we averaged these reports we found that not only did the reading group practice significantly more than those in MT, but that they expressed equivalent levels of motivation to practice. Anecdotally we found that reading-group members expressed a high level of satisfaction with their class, with a sub-group of about 8 even continuing their meetings after our study concluded. The meditation group, by comparison, did not appear to form any lasting social relationships and did not continue meeting after the study. We were very happy with these results, which suggest that it is very unlikely our results could be explained by unbalanced motivation or expectation.

Impact of MT on attention and emotion

After we established that the active control was successful, the first thing to look at was some of our outside-the-scanner behavioral results. As we were interested in the effect of meditation on both attention and meta-cognition, we used an “error-awareness task” (EAT) to examine improvement in these areas. The EAT (shown below) is a typical go/no-go task where subjects spend most of their time pressing a button. The difficult part comes whenever a “stop trial” occurs and subjects must quickly halt their response. In the case where a subject fails to stop, they then have the opportunity to “fix” the error by pressing a second button on the trial following the error. If you’ve ever performed this kind of task, you know that it can be frustratingly difficult to stop your finger in time – the response becomes quite habitual. Using the EAT we examined the impact of MT both on controlling responses (a variable called “stop accuracy”) and on meta-cognitive self-monitoring (percent “error awareness”).

The error-awareness task

We started by looking for significant group-by-time interactions on stop accuracy and error awareness, which would indicate that the change over time on a measure was statistically greater in the treatment (MT) group than in the control group. In a repeated-measures design, this type of interaction is your first indication that the treatment may have had a greater effect than the control. When we looked at the data, it was immediately clear that while both groups improved over time (a ‘main effect’ of time), there was no interaction to be found:

Group x time analysis of SA and EA.
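For readers who like to see the shape of the model, here is a minimal sketch of such an interaction test (hypothetical column names; not necessarily the exact implementation used in the paper). A linear mixed model with a random intercept per subject tests the same group × time term:

```python
# Sketch of the group x time interaction test (hypothetical column names).
# A significant group:time term would indicate that change over time differed
# between the MT and reading groups.
import pandas as pd
import statsmodels.formula.api as smf

# one row per subject per session:
# subject, group ("MT"/"reading"), time ("pre"/"post"), stop_accuracy, error_awareness
df = pd.read_csv("eat_outcomes.csv")

model = smf.mixedlm(
    "stop_accuracy ~ group * time",   # fixed effects: group, time, and their interaction
    data=df,
    groups=df["subject"],             # random intercept per subject
).fit()
print(model.summary())                # inspect the group:time coefficient
```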

While it is likely that much of the increase over time can be explained by test-retest effects (i.e. simply taking the test twice), we wanted to see if any of this variance might be explained by something specific to meditation. To do this we entered stop accuracy and error awareness into a linear model testing whether the slope relating practice time to the EAT measures differed between groups. Here we saw that practice predicted stop-accuracy improvement only in the meditation group, and that this relationship was statistically greater than in the reading group:

Practice vs Stop accuracy (MT only shown). We did of course test our interaction, see paper for GLM goodness =)
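Sketched in the same hypothetical style as above, that slope comparison is simply a group × practice interaction in an ordinary regression on the change scores:

```python
# Sketch of the practice-by-group slope comparison (hypothetical column names).
# The group:practice_minutes term tests whether practice predicts stop-accuracy
# improvement more strongly in the MT group than in the reading group.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("practice_vs_eat.csv")  # subject, group, practice_minutes, stop_accuracy_change
fit = smf.ols("stop_accuracy_change ~ group * practice_minutes", data=df).fit()
print(fit.summary())
```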

These results led us to conclude that while we did not observe a treatment effect of MT on the error-awareness task, the presence of strong time effects and the MT-only correlation with practice suggested that the improvement in the MT group may relate to the “active ingredients” of MT, whereas the improvement in the reading group may reflect motivation-driven artifacts. Sadly we cannot conclude this firmly – we would have needed a third, passive control group for comparison. Thankfully this was pointed out to us by a kind reviewer, who noted that this argument is rather like having one’s cake and eating it too, so we’ll restrict ourselves to arguing that the EAT finding serves as a nice validation of the active control (both groups improved on something) and as a potential indicator of a stop-related treatment mechanism.

While the EAT served as a behavioral measure of basic cognitive processes, we also wanted to examine the neural correlates of attention and emotion, and how they might respond to mindfulness training in our intervention. For this we partnered with Karina Blair at the National Institute of Mental Health to bring the Affective Stroop task (shown below) to Denmark.

Affective Stroop Trial Scheme

The Affective Stroop Task (AST) builds on a basic “number-counting Stroop” to investigate the neural correlates of attention, emotion, and their interaction. The instruction is simply: count the number of numbers in the first display, count the number of numbers in the second display, and decide which display contained more numbers. As you can see in the trial example above, conflict in the task (trial type “C”) is driven by incongruence between the Arabic numeral (e.g. “4”) and the numerosity of the display (a display of five “4”s). Meanwhile, each trial includes negative or neutral emotional stimuli selected from the International Affective Picture System. Using the AST, we were able to examine the neural correlates of executive attention by contrasting task (B + C > A) and emotion (negative > neutral) trials.

Since we were especially interested in changes over time, we expanded on these contrasts to examine increases or decreases in neural response between the first and last scans of the study. To do this we relied on two levels of analysis (standard in imaging): at the “first” or “subject” level we examined differences between the two time points for each condition (task and emotion) within each subject. We then compared these time-related effects (contrast images) between the two groups using a two-sample t-test with total minutes of practice as a covariate. To assess the impact of meditation on performing the AST, we examined reaction times in a model with factors group, time, task, and emotion. In this way we were able to examine the impact of MT on neural activity and behavior while controlling for the kinds of artifacts discussed in the previous section.
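As a rough illustration of that second (group) level – not our actual pipeline, and with placeholder file names – here is how such a two-sample comparison with a practice covariate might look in nilearn:

```python
# Hedged sketch of a second-level analysis: a two-sample comparison of
# per-subject "time 2 > time 1" contrast images, with (mean-centred) practice
# minutes as a covariate. File names and design details are placeholders.
import pandas as pd
from nilearn.glm.second_level import SecondLevelModel

subjects = pd.read_csv("subjects.csv")  # contrast_path, group, practice_minutes

design = pd.DataFrame({
    "MT":       (subjects["group"] == "MT").astype(int),
    "reading":  (subjects["group"] == "reading").astype(int),
    "practice": subjects["practice_minutes"] - subjects["practice_minutes"].mean(),
})

model = SecondLevelModel(smoothing_fwhm=6.0).fit(
    list(subjects["contrast_path"]), design_matrix=design
)

# MT > reading difference in the time-related change, adjusting for practice
z_map = model.compute_contrast([1, -1, 0], output_type="z_score")
```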

Our analysis revealed three primary findings. First, the reaction-time analysis showed a significant effect of MT on Stroop conflict, i.e. the difference in reaction time between incongruent and congruent trials. Further, we did not observe any effect on emotion-related RTs – although both groups sped up significantly to negative versus neutral trials over time (a time effect), this increase was equivalent in both groups. Below you can see the Stroop-conflict-related RTs:

Stroop conflict result

This became particularly interesting when we examined the neural response to these conditions, and again observed a pattern of overall BOLD-signal increases in the dorsolateral prefrontal cortex to task performance (below):

DLPFC increase to task

Interestingly, we did not observe significant overall increases to emotional stimuli – just being in the MT group didn’t seem to be enough to change emotional processing. However, when we examined correlations between amount of practice and increased BOLD response to negative emotion across the whole brain, we found a striking pattern of fronto-insular BOLD increases to negative images, similar to patterns seen in previous studies of compassion and mindfulness practice:

Greater association of prefrontal-insular response to negative emotion and practice.

When we put all this together, a pattern began to emerge. Overall, it seemed that MT had a relatively clear impact on attention and cognitive control. Practice-correlated increases in EAT stop accuracy, reduced Affective Stroop conflict, and increases in dorsolateral prefrontal cortex responses to task all point towards plasticity at the level of executive function. In contrast, our emotion-related findings suggest that alterations in affective processing occurred only in the MT participants with the most practice. Given how little we know about the training trajectories of cognitive versus affective skills, we felt that this was a very interesting result.

Conclusion: the more you do, the more you get?

For us, the first conclusion from all this was that when you control for motivation and a host of other confounds, brief MT appears primarily to train attention-related processes. Secondly, alterations in affective processing seemed to require more practice to emerge. This is interesting both for understanding the neuroscience of training and for the effective application of MT in clinical settings. While a great deal of future research is needed, it is possible that the affective system is generally more resistant to intervention than attention. It may be that altering affective processes depends upon, and builds on, increasing control over executive function. Previous research suggests that attention is largely flexible and amenable to a variety of training regimens, of which MT is only one beneficial intervention. However, we are also becoming increasingly aware that training attention alone does not seem to translate directly into benefits in even closely related domains.

As we begin to realize that many societal and health problems cannot be solved through medication or attention training alone, it becomes clear that techniques to increase emotional function and well-being are crucial for future development. I am reminded of a quote overheard at the Mind & Life Summer Research Institute and attributed to the Dalai Lama. Supposedly, when asked about the goal of developing meditation programs in the West, HHDL replied that what was truly needed in the West was not “cognitive training, as (those in the West) are already too clever. What is needed rather is emotion training, to cultivate a sense of responsibility and compassion”. When we consider falling rates of empathy in medical practitioners and the link to health outcomes, I think we do need to explore the role of emotional and embodied skills in supporting a wide array of functions in cognition and well-being. While emotional development is likely to depend upon executive function, given all the recent failures to show transfer from training these domains to even closely related ones, I suspect we need to begin including affective processes in our understanding of optimal learning. If these differences hold, then it may be important to reassess our interventions (mindful and otherwise), developing training programs that are customized in terms of the intensity, duration, and content appropriate for any given context.

Of course, rather than end on such an inspiring note, I should point out that like any study, ours is not without flaws (you’ll have to read the paper to find out how many 😉 ) and is really just an initial step. We made significant progress in replicating common neural and behavioral effects of MT while controlling for important confounds, but in retrospect the study could have been strengthened by including measures that would better distinguish the precise mechanisms, for example a measure of body awareness or empathy. Another element that struck me was how much I wish we’d had a passive control group, which could have helped flesh out how much of our time effect was instrument reliability versus motivation. As far as I am concerned, the study was a success and I am happy to have done my part to push mindfulness research towards methodological clarity and rigor. In the future I know others will continue this trend and investigate exactly what sorts of practice are needed to alter brain and behavior, and just how these benefits are accomplished.

In the near-future, I plan to give mindfulness research a rest. Not that I don’t find it fascinating or worthwhile, but rather because during the course of my PhD I’ve become a bit obsessed with interoception and meta-cognition. At present, it looks like I’ll be spending my first post-doc applying predictive coding and dynamic causal modeling to these processes. With a little luck, I might be able to build a theoretical model that could one day provide novel targets for future intervention!

Link to paper:

Cognitive-Affective Neural Plasticity following Active-Controlled Mindfulness Intervention

Thanks to all the collaborators and colleagues who made this study possible.

Special thanks to Kate Mills (@le_feufollet) for proofing this post 🙂

Insula and Anterior Cingulate: the ‘everything’ network or systemic neurovascular confound?

It’s no secret in cognitive neuroscience that some brain regions garner more attention than others. Particularly in fMRI research, we’re all too familiar with certain regions that seem to pop up in study after study, regardless of experimental paradigm. When it comes to areas like the anterior cingulate cortex (ACC) and anterior insula (AIC), the trend is obvious. Generally, when I see the same brain region involved in a wide variety of tasks, I think there must be some very general function that encompasses these paradigms. Off the top of my head, the ACC and AIC are major players in cognitive control, pain, emotion, consciousness, salience, working memory, decision making, and interoception, to name a few. Maybe on a bad day I’ll look at a list like that and think: well, localization is just all wrong, and really what we have is a big fat prefrontal cortex doing everything in conjunction. A paper published yesterday in Cerebral Cortex took my breath away and led me to a third, more sinister option: a serious methodological confound in a large majority of published fMRI papers.

Neurovascular coupling and the BOLD signal: a match not made in heaven

An important line of research in neuroimaging focuses on noise in fMRI signals. The essential problem of fMRI is that, while it provides decent spatial resolution, the data are acquired slowly and indirectly via the blood-oxygenation-level-dependent (BOLD) signal. The BOLD signal is messy, slow, and extremely complex in its origins. Although we typically assume that increasing BOLD signal reflects greater neural activity, the details of just what kind of activity (e.g. excitatory vs inhibitory, post-synaptic vs local field) are murky at best. Advances in multi-modal and optogenetic imaging hold a great deal of promise regarding the signal’s true nature, but sadly we are currently at a “best guess” level of understanding. This weakness means that without careful experimental design, it can be difficult to rule out non-neural contributors to the fMRI signal. Setting aside the worry about what neural activity IS measured by the BOLD signal, there is still the very real threat of non-neural sources like respiration and cardiovascular function confounding the final result. This is a whole field of research in itself, and far too complex to summarize here in its entirety. The basic issue is quite simple though.

End-tidal CO2, respiration, and the BOLD signal

In a nutshell, the BOLD signal is thought to measure downstream changes in cerebral blood flow (CBF) in response to neural activity. This relationship between neural firing and blood flow is called neurovascular coupling and is extremely complex, involving astrocytes and multiple chemical pathways. Additionally, it’s quite slow: typically one observes a 3-5 second delay between stimulation and the BOLD response. This creates our first noise-related issue: the time between successive whole-brain volumes, or repetition time (TR), must be optimized to detect signals at this frequency. This means we sample from our participant’s brain slowly. Typically we sample every 3-5 seconds and construct our paradigms in ways that respect the natural time lag of the BOLD signal. Stimulate too fast, and the vasculature doesn’t have time to respond. Stimulation frequency also helps prevent our first simple confound: our pulse and respiration rates tend to oscillate at slightly slower frequencies (approximately every 10-15 seconds). This is a good thing, and it means that so long as your design is well controlled (i.e. your events are properly staggered and your baseline is well defined) you shouldn’t have to worry too much about confounds. But that is our first problematic assumption: consider, for example, paradigms that use long blocks of loosely defined tasks like “decide how much you identify with these stimuli”. If cognitive load differs between conditions, or your groups (for example, a PTSD group and a control group) react differently to the stimuli, respiration and pulse rates might easily begin to overlap with your sampling frequency, confounding the BOLD signal.

But, you say, my experiment is well controlled, and there’s no way my groups are breathing THAT differently! Fair enough, but this leads us to our next problem: end-tidal CO2. Without getting into the complex physiology, end-tidal CO2 is the concentration of CO2 measured at the end of each exhaled breath. When you hold your breath, blood CO2 levels rise dramatically. CO2 is a potent vasodilator, meaning it opens blood vessels and increases local blood flow. You’ve probably guessed where I’m going with this: hold your breath in the fMRI scanner and you get massive alterations in the BOLD signal. Your participants don’t even need to match the sampling frequency of the paradigm to confound the BOLD signal; they simply need to breathe at slightly different rates in each group or condition, and suddenly your results are full of CO2-driven false positives! This is a serious problem for any kind of unconstrained experimental design, especially those involving poorly conceptualized social tasks or long periods of free activity. Imagine now that certain regions of the brain might respond differently to levels of CO2.

This image is from Chang & Glover’s paper, “Relationship between respiration, end-tidal CO2, and BOLD signals in resting-state fMRI”. Here they measured both CO2 and respiration during a standard resting-state scan. The image displays the results of a group-level regression of these signals against BOLD. I’ve added blue circles around the areas that respond the strongest. Without consulting an atlas, we can clearly see that the bilateral anterior insula extending up into parietal cortex, the anterior cingulate, and medial prefrontal regions are hugely susceptible to respiration and CO2. This is pretty damning for resting-state fMRI, and makes sense given that resting-state fluctuations occur at roughly the same rate as respiration. But what about well-controlled event-related designs? Might variability in neurovascular coupling cause a similar pattern of response? Here is where Di et al.’s paper delivers a somewhat terrifying result:


Di et al. recently investigated the role of vascular confounds in fMRI by administering a common digit-symbol substitution task (DSST), a resting-state scan, and a breath-holding paradigm. Signals related to resting state and breath-holding were then extracted and entered into a multiple regression with the DSST-related activations. This allowed Di et al. to estimate which brain regions were most influenced by low-frequency fluctuations (ALFF, a common resting-state measure) and by purely vascular sources (breath-holding). In the figure above, regions marked with the blue arrow were the most suppressed, meaning the signal explained by the event-related model was significantly correlated with the covariates, while regions in red showed signal that was significantly improved by removal of the covariates. The authors conclude that “(results) indicated that the adjustment tended to suppress activation in regions that were near vessels such as midline cingulate gyrus, bilateral anterior insula, and posterior cerebellum.” It seems that indeed, our old friends the anterior insula and cingulate cortex are extremely susceptible to neurovascular confounds.
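The general remedy, which Di et al.’s approach echoes, is to include measured vascular and respiratory signals as nuisance covariates when estimating task effects. As a rough, hedged sketch (placeholder file names, not Di et al.’s actual pipeline), here is how measured physiological traces could be added to a first-level design in nilearn:

```python
# Hedged sketch (placeholder file names) of adding measured physiological traces
# as nuisance regressors to a first-level fMRI design, so that task contrasts are
# estimated after adjusting for them.
import numpy as np
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel, make_first_level_design_matrix

t_r, n_scans = 3.0, 200
frame_times = np.arange(n_scans) * t_r

events = pd.read_csv("task_events.csv")        # onset, duration, trial_type ("task"/"baseline")
physio = pd.read_csv("physio_per_volume.csv")  # respiration and etco2, one row per volume

design = make_first_level_design_matrix(
    frame_times,
    events=events,
    hrf_model="glover",
    add_regs=physio[["respiration", "etco2"]].to_numpy(),
    add_reg_names=["respiration", "etco2"],
)

glm = FirstLevelModel(t_r=t_r).fit("sub01_bold.nii.gz", design_matrices=design)
z_map = glm.compute_contrast("task - baseline", output_type="z_score")
```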

What does this mean for cognitive neuroscience? For one, it should be clear that even well-controlled fMRI designs can exhibit such confounds. This doesn’t mean we should throw the baby out with the bathwater, though; some designs are better than others. Thankfully, it’s pretty easy to measure respiration on most scanners, so at a minimum it’s probably a good idea to check whether one’s experimental conditions do indeed create differential respiration patterns (a rough sketch of such a check appears after the figure below). Further, we need to be especially cautious in cases like meditation or clinical fMRI, where special participant groups may have different baseline respiration rates or stronger parasympathetic responses to stimuli. Sadly, I’m afraid that, looking back, these findings greatly limit our conclusions in any design that did not control for these issues. Remember that the insula and ACC are currently cognitive neuroscience’s hottest regions. I’m not even going to get into resting state, where these problems are magnified ten-fold. I’ll leave you with this image from Neuroskeptic, estimating the year’s most popular brain regions:

Are those spikes publication fads, every-task regions, or neurovascular artifacts? You be the judge.
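As promised above, here is a rough sketch (hypothetical file formats and condition names) of checking whether two conditions produce different respiration rates, by estimating breaths per minute within each block and comparing conditions:

```python
# Rough sketch (hypothetical file formats and condition names) of checking whether
# two task conditions produce different respiration rates.
import numpy as np
import pandas as pd
from scipy.signal import find_peaks
from scipy.stats import ttest_ind

fs = 50.0                                    # respiration-belt sampling rate in Hz (assumed)
resp = np.loadtxt("sub01_respiration.txt")   # raw respiration trace, one value per sample
blocks = pd.read_csv("sub01_blocks.csv")     # columns: onset_s, duration_s, condition

def breaths_per_minute(trace, fs):
    """Crude rate estimate: count peaks spaced at least 2 s apart."""
    peaks, _ = find_peaks(trace, distance=int(2 * fs))
    return len(peaks) / (len(trace) / fs) * 60.0

rates = {}
for cond, rows in blocks.groupby("condition"):
    rates[cond] = np.array([
        breaths_per_minute(resp[int(r.onset_s * fs):int((r.onset_s + r.duration_s) * fs)], fs)
        for r in rows.itertuples()
    ])

# compare block-wise respiration rates between the two conditions
print(ttest_ind(rates["task"], rates["baseline"]))
```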

 
Edit: As many of you had questions or comments regarding the best way to deal with respiration-related issues, I spoke briefly with our resident noise expert Torben Lund at yesterday’s lab meeting. Removal of respiratory noise is fairly simple, but the real problem is end-tidal CO2. According to Torben, most noise experts agree that regression techniques only partially remove the artifact, and that an unknown amount is left behind even after signal regression. This may be due to slow vascular saturation effects that build up and remain irrespective of sheer breath frequency. A very tricky problem indeed, and certainly worth researching.
 
 
Note: credit goes to my methods teacher and fMRI noise expert Torben Lund, and to CFIN neurobiologist Rasmus Aamand, for introducing and explaining the basis of the CO2/respiration issue to me. Rasmus in particular, whose sharp comments led to my including respiration and pulse measures in my last meditation project.