Will multivariate decoding spell the end of simulation theory?

Decoding techniques such as multivariate pattern analysis (MVPA) are hot stuff in cognitive neuroscience, largely because they offer a tentative promise of actually reading out the underlying computations in a region rather than merely describing data features (e.g. mean activation profiles). While I am quite new to MVPA and similar machine learning techniques (so please excuse any errors in what follows), the basic process has been explained to me as a reversal of the X and Y variables in a typical general linear model. Instead of specifying a design matrix of explanatory (X) variables and testing how well those predict a single dependent (Y) variable (e.g. the BOLD timeseries in each voxel), you try to estimate an explanatory variable (essentially decoding the ‘design matrix’ that produced the observed data) from many Y variables, for example one Y variable per voxel (hence the multivariate part). The decoded explanatory variable then describes (BOLD) responses in a way that can vary in space, rather than reflecting an overall data feature across a set of voxels such as mean or slope. Typically decoding analyses proceed in two steps, one in which you train the classifier on some set of voxels and another where you see how well that trained model can classify patterns of activity in another scan or task. It is precisely this ability to detect patterns in subtle spatial variations that makes MVPA an attractive technique- the GLM simply doesn’t account for such variation.
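To make the train-then-test logic concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the data are simulated rather than real BOLD responses, a simple nearest-centroid rule stands in for whatever classifier an actual MVPA study would use, and the names `simulate_run`, `train_centroids` and `predict` are made up for this example.

```python
import random

def simulate_run(n_trials, pattern_a, pattern_b, noise_sd, rng):
    """Return (patterns, labels): one noisy multi-voxel pattern per trial."""
    patterns, labels = [], []
    for trial in range(n_trials):
        label = trial % 2  # alternate between condition 0 and condition 1
        template = pattern_a if label == 0 else pattern_b
        patterns.append([v + rng.gauss(0.0, noise_sd) for v in template])
        labels.append(label)
    return patterns, labels

def train_centroids(patterns, labels):
    """Training step: average the patterns of each condition (one centroid per label)."""
    centroids = {}
    for label in set(labels):
        rows = [p for p, l in zip(patterns, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, pattern):
    """Decoding step: pick the condition whose centroid is closest to the pattern."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], pattern))

rng = random.Random(0)
n_voxels = 20
# Hypothetical condition-specific spatial patterns across voxels (an assumption).
pattern_a = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
pattern_b = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

# Step 1: train on one simulated scan. Step 2: test on an independent one.
train_x, train_y = simulate_run(40, pattern_a, pattern_b, noise_sd=1.0, rng=rng)
test_x, test_y = simulate_run(40, pattern_a, pattern_b, noise_sd=1.0, rng=rng)

centroids = train_centroids(train_x, train_y)
predicted = [predict(centroids, p) for p in test_x]
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)
print(f"held-out decoding accuracy: {accuracy:.2f}")
```

With real data the two simulated runs would be replaced by per-trial voxel patterns from independent scans. Keeping the test scan out of the training step is what lets the accuracy say something about generalisation rather than about fitted noise.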

The implicit assumption here is that by modeling subtle spatial variations across a set of voxels, you can actually pick up the neural correlates of the underlying computation or representation (Weil and Rees, 2010; Poldrack, 2011). To illustrate the difference between an MVPA and GLM analysis, imagine a classical fMRI experiment where we have some set of voxels defining a region with a significant mean response to our experimental manipulation. All the GLM can tell us is that in each voxel the mean response is significantly different from zero. Each voxel within the significant region is likely to vary slightly in its actual response- you might imagine all sorts of subtle intensity variations within a significant region- but the GLM essentially ignores this variation. The exciting assumption driving interest in decoding is that this variability might actually reflect the activity of sub-populations of neurons and by extension, actual neural representations. MVPA and similar techniques are designed to pick out when these reflect a coherent pattern; once identified this pattern can be used to “predict” when the subject was seeing one or another particular stimulus. While it isn’t entirely straightforward to interpret the patterns MVPA picks out as actual ‘neural representations’, there is some evidence that the decoded models reflect a finer granularity of neural sub-populations than represented in overall mean activation profiles (Todd et al., 2013; Thompson et al., 2011).
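A toy simulation can show what it means for a region's mean response to hide a coherent spatial pattern. The sketch below is an assumption-laden illustration, not an analysis of real data: two hypothetical conditions `A` and `B` produce exactly the same average signal across 30 simulated voxels, but `B` nudges half of the voxels up and the other half down. The same nearest-centroid rule is then fed either the region average or the full voxel pattern.

```python
import random

rng = random.Random(1)
n_voxels, n_trials = 30, 200

# Assumed templates: identical region mean, different spatial pattern.
base = [1.0] * n_voxels
shift = [0.7 if v % 2 == 0 else -0.7 for v in range(n_voxels)]  # sums to zero
templates = {"A": base, "B": [b + s for b, s in zip(base, shift)]}

def make_set():
    """Simulate one data set: alternating A/B trials with Gaussian voxel noise."""
    labels = ["A", "B"] * (n_trials // 2)
    patterns = [[m + rng.gauss(0.0, 1.0) for m in templates[c]] for c in labels]
    return patterns, labels

def centroids(features, labels):
    """Average feature vector per condition."""
    out = {}
    for c in sorted(set(labels)):
        rows = [f for f, l in zip(features, labels) if l == c]
        out[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return out

def accuracy(train_f, train_l, test_f, test_l):
    """Train nearest-centroid on one set, report proportion correct on another."""
    cents = centroids(train_f, train_l)
    def nearest(f):
        return min(cents, key=lambda c: sum((x - y) ** 2 for x, y in zip(f, cents[c])))
    return sum(nearest(f) == l for f, l in zip(test_f, test_l)) / len(test_l)

train_x, train_y = make_set()
test_x, test_y = make_set()

# Region-average feature: one number per trial, spatial pattern thrown away.
train_mean = [[sum(p) / n_voxels] for p in train_x]
test_mean = [[sum(p) / n_voxels] for p in test_x]

mean_acc = accuracy(train_mean, train_y, test_mean, test_y)
pattern_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"region-average accuracy: {mean_acc:.2f}")
print(f"voxel-pattern accuracy:  {pattern_acc:.2f}")
```

Under these assumptions the region average should hover around chance while the voxel pattern separates the conditions well. Whether a pattern found this way in real data reflects neural sub-populations or something less interesting is a separate question that the simulation cannot answer.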

Professor Xavier applies his innate talent for MVPA.

As you might imagine this is terribly exciting, as it presents the possibility to actually ‘read-out’ the online function of some brain area rather than merely describing its overall activity. Since the inception of brain scanning this has been exactly the (largely failed) promise of imaging- reverse inference from neural data to actual cognitive/perceptual contents. It is understandable then that decoding papers are the ones most likely to appear in high impact journals- just recently we’ve seen MVPA applied to dream states, reconstruction of visual experience, and pain experience all in top journals (Kay et al., 2008; Horikawa et al., 2013; Wager et al., 2013). I’d like to focus on that last one for the remainder of this post, as I think we might draw some wide-reaching conclusions for theoretical neuroscience as a whole from Wager et al’s findings.

Francesca and I were discussing the paper this morning- she’s working on a commentary for a theoretical paper concerning the role of the “pain matrix” in empathy-for-pain research. For those of you not familiar with this area, the idea is a basic simulation-theory argument-from-isomorphism. Simulation theory (ST) is just the (in)famous idea that we use our own motor system (e.g. mirror neurons) to understand the gestures of others. In a now infamous experiment Rizzolatti et al showed that motor neurons in the macaque monkey responded equally to their own gestures or the gestures of an observed other (Rizzolatti and Craighero, 2004). They argued that this structural isomorphism might represent a general neural mechanism such that social-cognitive functions can be accomplished by simply applying our own neural apparatus to work out what was going on for the external entity. With respect to phenomena such as empathy for pain and ‘social pain’ (e.g. viewing a picture of someone you broke up with recently), this idea has been extended to suggest that, since a network of regions known as “the pain matrix” activates similarly when we are in pain or experience ‘social pain’, we “really feel” pain during these states (Kross et al., 2011) [1].

In her upcoming commentary, Francesca points out an interesting finding in the paper by Wager and colleagues that I had overlooked. Wager et al apply a decoding technique in subjects undergoing painful and non-painful stimulation. Quite impressively they are then able to show that the decoded model predicts pain intensity in different scanners and various experimental manipulations. However they note that the model does not accurately predict subjects’ ‘social pain’ intensity, even though the subjects did activate a similar network of regions in both the physical and social pain tasks (see image below). One conclusion from these findings is that it is surely premature to conclude that because a group of subjects may activate the same regions during two related tasks, those isomorphic activations actually represent identical neural computations [2]. In other words, arguments from structural isomorphism like ST don’t provide any actual evidence for the mechanisms they presuppose.

Figure from Wager et al demonstrating specificity of classifier for pain vs warmth and pain vs rejection. Note poor receiver operating characteristic (ROC) curve for ‘social pain’ (rejecter vs friend), although that contrast picks out similar regions of the ‘pain matrix’.
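The kind of dissociation shown in the figure can be mimicked with simulated data. The sketch below is a toy model and not Wager et al's method or data: it assumes that the pain and rejection contrasts raise the average signal of one region by the same amount, but through interleaved voxel sub-populations (even voxels for pain, odd voxels for rejection). A linear pattern trained on pain vs warmth is then scored on both contrasts, and the area under the ROC curve (AUC) is computed by brute-force pairwise comparison. All variable and function names are hypothetical.

```python
import random

rng = random.Random(2)
n_voxels, n_trials = 40, 100

# Toy assumption: equal region-average effect, but carried by different voxels.
pain_effect = [1.0 if v % 2 == 0 else 0.0 for v in range(n_voxels)]
social_effect = [0.0 if v % 2 == 0 else 1.0 for v in range(n_voxels)]

def simulate(effect):
    """Return trials for a 'high' condition (effect added) and a 'low' condition."""
    high = [[e + rng.gauss(0.0, 1.0) for e in effect] for _ in range(n_trials)]
    low = [[rng.gauss(0.0, 1.0) for _ in effect] for _ in range(n_trials)]
    return high, low

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def auc(pos_scores, neg_scores):
    """Area under the ROC curve: probability that a positive outscores a negative."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Train a linear pattern on pain vs warmth: weights = difference of class means.
pain_train, warm_train = simulate(pain_effect)
weights = [h - l for h, l in zip(column_means(pain_train), column_means(warm_train))]

def score(pattern):
    """Pattern expression: dot product of the trained weights with a trial pattern."""
    return sum(w * x for w, x in zip(weights, pattern))

# Test on new pain/warmth trials and on the rejecter/friend trials.
pain_test, warm_test = simulate(pain_effect)
rejecter, friend = simulate(social_effect)

auc_pain = auc([score(p) for p in pain_test], [score(p) for p in warm_test])
auc_social = auc([score(p) for p in rejecter], [score(p) for p in friend])

print(f"region-average effect, pain:      {sum(pain_effect) / n_voxels:.2f}")
print(f"region-average effect, rejection: {sum(social_effect) / n_voxels:.2f}")
print(f"AUC pain vs warmth:    {auc_pain:.2f}")
print(f"AUC rejecter vs friend: {auc_social:.2f}")
```

In this toy setting the two contrasts look identical at the level of the region average, yet the pain-trained pattern should discriminate pain from warmth well and sit near chance (AUC around 0.5) for rejecter vs friend. That is the logical possibility the argument from isomorphism overlooks; it does not show that real pain and social pain are organised this way.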

To me this is exactly the right conclusion to take from Wager et al and similar decoding papers. To the extent that the assumption that MVPA identifies patterns corresponding to actual neural representations holds, we are rapidly coming to realize that a mere mean activation profile tells us relatively little about the underlying neural computations [3]. It certainly does not tell us enough to conclude much of anything on the basis that a group of subjects activate “the same brain region” for two different tasks. It is possible and even likely that just because I activate my motor cortex when viewing you move, I’m doing something quite different with those neurons than when I actually move about. And perhaps this was always the problem with simulation theory- it tries to make the leap from description (“similar brain regions activate for X and Y”) to mechanism, without actually describing a mechanism at all. I guess you could argue that this is really just a much fancier argument against reverse inference and that we don’t need MVPA to do away with simulation theory. I’m not so sure however- ST remains a strong force in a variety of domains. If decoding can actually do away with ST and arguments from isomorphism or better still, provide a reasonable mechanism for simulation, it’ll be a great day in neuroscience. One thing is clear- model-based approaches will continue to improve cognitive neuroscience as we go beyond describing what brain regions activate during a task to actually explaining how those regions work together to produce behavior.

I’ve curated some enlightening responses to this post in a follow-up – worth checking for important clarifications and extensions! See also the comments on this post for a detailed explanation of MVPA techniques. 

References

Horikawa T, Tamaki M, Miyawaki Y, Kamitani Y (2013) Neural Decoding of Visual Imagery During Sleep. Science.

Kay KN, Naselaris T, Prenger RJ, Gallant JL (2008) Identifying natural images from human brain activity. Nature 452:352-355.

Kross E, Berman MG, Mischel W, Smith EE, Wager TD (2011) Social rejection shares somatosensory representations with physical pain. Proceedings of the National Academy of Sciences 108:6270-6275.

Poldrack RA (2011) Inferring mental states from neuroimaging data: from reverse inference to large-scale decoding. Neuron 72:692-697.

Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27:169-192.

Thompson R, Correia M, Cusack R (2011) Vascular contributions to pattern analysis: Comparing gradient and spin echo fMRI at 3T. Neuroimage 56:643-650.

Todd MT, Nystrom LE, Cohen JD (2013) Confounds in Multivariate Pattern Analysis: Theory and Rule Representation Case Study. NeuroImage.

Wager TD, Atlas LY, Lindquist MA, Roy M, Woo C-W, Kross E (2013) An fMRI-Based Neurologic Signature of Physical Pain. New England Journal of Medicine 368:1388-1397.

Weil RS, Rees G (2010) Decoding the neural correlates of consciousness. Current Opinion in Neurology 23:649-655.


[1] Interestingly this paper comes from the same group (Wager et al) showing that pain matrix activations do NOT predict ‘social’ pain. It will be interesting to see how they integrate this difference.

[2] Never mind the fact that the ‘pain matrix’ is not specific for pain.

[3] With all appropriate caveats regarding the ability of decoding techniques to resolve actual representations rather than confounding individual differences (Todd et al., 2013) or complex neurovascular couplings (Thompson et al., 2011).

10 thoughts on “Will multivariate decoding spell the end of simulation theory?”

  1. Not trying to be obnoxious (it’s just the way I am :P) but I have to leave my two cents (pennies) on this, mainly because this is a pet peeve of mine. There are some inaccuracies in your description of what “decoding” analyses actually do. That’s not surprising because there is a lot of confusion floating around the neuroimaging community about MVPA, GLM, univariate and multivariate methods, decoding, encoding, and lots of other things I am too lazy to remember right now.

    1. In a way, your description that you are essentially inverting the process of model fitting is right, so that you are using a classifier to make a prediction of what the design matrix was. But this is only correct up to a degree, because in truth you *are* using a design matrix to train your classifier in the first place and only afterwards you use an independent data set in your cross-validation procedure to test whether the trained classification rule generalises to novel data. The reason to do this is partly due to how classification problems are set up but primarily this is essential because you need to ensure that you are not overfitting (that is, reliably decoding noise).

    Now what it boils down to really is that you are fitting a model to the (training) data to explain how to distinguish experimental conditions (i.e. columns in your design matrix) and then testing that this model performs reasonably well when you feed it new data. There is nothing to distinguish it from the GLM. You can apply cross-validation to the GLM (or any model fitting) in the same way by first estimating the model parameters and then applying the fitted model to new data to see if it can predict the experimental observations reliably. In fact, there are situations where this might be a very useful thing to do.

    2. This brings me to another point, which is that multivariate pattern analysis does not typically measure (or model) “subtle variations in the response of each voxel”, certainly not any more so than more conventional GLM-based analyses (as a matter of fact, unlike in the early days of MVPA in fMRI most studies nowadays use the standard GLM to get voxel responses to feed into MVPA).

    In its most typical form, MVPA uses information contained across a pattern of voxels, that is, it tests whether there are some reliable relationships in the response profiles of a set of multiple voxels (usually 10s or 100s). This is why the acronym is also frequently translated as multi-voxel pattern analysis rather than multivariate. As Karl Friston quite rightly pointed out once, any pattern is inherently multivariate, so multivariate pattern analysis is really a redundant term (so it makes sense to use multivoxel to retain the acronym).

    So what distinguishes standard GLM analyses and MVPA is not that it picks up subtle variations in the signal of a voxel but the fact that standard analyses typically average a large number of voxels together. In whole-brain analyses you either apply constraints on the minimum cluster size of an activation or you may use ROI-based analysis to extract the mean signal from a region. Either way, you throw away any information there might be in the spatial pattern of voxels in a region.

    Of course, it is not inconceivable to do what you describe, that is, to use classification techniques to decode an experimental variable from subtle variations in the signal of a single voxel. Put another way, you could use time points as features rather than voxels, as is typically done. You could also combine both. However, this doesn’t change the fact that you are still only using a different method to do a similar thing. You *could* use the GLM as well to do something very similar by building a parametric model (and this is frequently done).
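    As a rough illustration of the point that cross-validation applies to the GLM just as it does to a classifier, the following Python sketch fits a one-regressor model to one simulated run of a single voxel and checks how well the fitted model predicts a second run. Everything here is an assumption made for the example: the boxcar design is not convolved with a haemodynamic response, the noise is white, and `boxcar`, `simulate_voxel`, `fit_glm` and `r_squared` are hypothetical helper names.

```python
import random

rng = random.Random(3)

def boxcar(n_scans, block_len):
    """Design-matrix column: alternating off/on blocks (no HRF convolution, for brevity)."""
    return [float((t // block_len) % 2) for t in range(n_scans)]

def simulate_voxel(design, beta, baseline, noise_sd):
    """Simulated voxel time series: baseline + beta * design + white noise."""
    return [baseline + beta * x + rng.gauss(0.0, noise_sd) for x in design]

def fit_glm(design, signal):
    """Ordinary least squares for y = b0 + b1 * x (one regressor plus intercept)."""
    n = len(design)
    mx, my = sum(design) / n, sum(signal) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(design, signal))
    sxx = sum((x - mx) ** 2 for x in design)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def r_squared(observed, predicted):
    """Proportion of variance in the observed data explained by the prediction."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

design = boxcar(120, 10)
run1 = simulate_voxel(design, beta=2.0, baseline=100.0, noise_sd=1.0)
run2 = simulate_voxel(design, beta=2.0, baseline=100.0, noise_sd=1.0)

# Estimate the parameters on run 1 only, then predict the unseen run 2.
b0, b1 = fit_glm(design, run1)
predicted_run2 = [b0 + b1 * x for x in design]
r2 = r_squared(run2, predicted_run2)
print(f"estimated beta from run 1: {b1:.2f}")
print(f"variance explained in run 2: {r2:.2f}")
```

    The structure is the same as in a decoding analysis: parameters are estimated on one data set and judged on another. Only the direction of prediction differs, from design to data rather than from data to design.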

    It is not my intention to diss MVPA methods. They can be very useful tools and I have used them too in the past (but perhaps not always in the smartest way…). I have not had much to do with them for a few years but I’m actually just now doing a project that will involve a bit of MVPA (in a much smarter way than I did in the past, I think 😉). But they are just a tool like any other method at our disposal. They are good to address certain questions while more traditional analyses are better for other purposes.

    The important thing though to keep in mind is that *they are not fundamentally different* from any other method out there. In the end you are still only looking at the activity measured in a voxel/neuron (or rather a set of voxels/neurons or in the same voxel/neuron across time etc) and inferring something meaningful from that. That is essentially the same damn thing.

    Therefore any claims, or even any discussion, whether MVPA is going to do away with one thing or another, or whether it is “mind-reading” and the like, in my opinion completely miss the point. Neuroimaging as a whole is a way to read the mind (or more precisely, the brain). It’s in the name. You are measuring responses from actual neuronal populations inside the brain (in a more or less indirect way, depending on the technique) and you can do this while a person is exposed to different experimental conditions or is in different mental states.

    In that sense, any methods, be it MVPA or the GLM or anything else, is “decoding brain states”.

    • Thanks a ton for this reply Sam. If it wasn’t getting so late I’d make further updates to the original post to better reflect this information. I suppose where I got a bit off track was in describing the role of time and space in MVPA. It makes sense that in the time dimension the GLM and classifier approaches use exactly the same data. I guess the argument for MVPA techniques is precisely as you say, that the particular spatial pattern across voxels reflects an underlying neural representation or at least neural sub-population.

      It’s definitely true that GLM and MVPA are both in a sense decoding techniques and I think this subtlety is often lost on readers, myself included. But like you said MVPA does retain certain spatial information that is lost in the GLM analysis- so it is a matter of having a bit more information in one analysis than the other. I guess the real question is what exactly does that extra information reflect- underlying neural reps as is hoped, or just plain old more information (including nasty stuff like confounds).

      But to the extent that the extra information is causally relevant, we shouldn’t be too surprised that MVPA can show a greater specificity/predictive power than GLM in certain cases. For the simulation debate in general, I think that the spatial information might be relevant just in showing that what looks like a homogeneous activation between conditions can actually be further broken down into condition specific sub-populations. Really it’s just saying that arguing on the basis of two spatially overlapping blobs is a poor choice.

      Also I wish I’d said more about your point regarding model-based fMRI. Like you say, parametric fMRI analyses certainly go beyond merely describing data features to actually fitting meaningful parameters to those activations. In the same vein model-based approaches are likely to shed light on these old school debates from isomorphism.

      • I think the thoughts I am trying to convey are really two-fold:

        1. While people frequently talk about the added sensitivity of MVPA compared to the GLM, in the strictest sense the GLM is a completely different thing. It’s just a way to get the data out for each individual voxel. You could do it in many other ways but it’s one way and you can then feed that result into MVPA or do lots of other things, including the standard mass aggregate when you analyse the response from a cluster or region-of-interest. It’s true though that in common parlance GLM and the standard approach for mass-univariate statistical parametric mapping are one and the same thing – they are not and this is partly what annoys me😉

        2. The multivariate aspect obviously adds sensitivity by looking at pattern information, or generally any information of more than one variable (e.g. voxels in a region). As such it is more sensitive to the information content in a region than just looking at the average response from that region. Such an approach can reveal that region A contains some diagnostic information about an experimental variable while region B does not, even though they both show the same mean activation. This is certainly useful knowledge that can help us advance our understanding of the brain – but in the end it is still only one small piece in the puzzle. And as both Tal and James pointed out (in their own ways) and as you discussed as well, you can’t really tell what the diagnostic information actually represents.
        Conversely, you can’t be sure that just because MVPA does not pick up diagnostic information from a region that it therefore doesn’t contain any information about the variable of interest. MVPA can only work as long as there is a pattern of information within the features you used.

        This last point is most relevant to James’ comment. Say you are using voxels as features to decode some experimental variable. If all the neurons with different tuning characteristics in an area are completely intermingled (like orientation-preference in mouse visual cortex for instance) you should not really see any decoding – even if the neurons in that area are demonstrably selective to the experimental variable.

        Similarly, (and this is Tal’s point) if there are neuronal populations in a region that respond to a particular stimulus, say because you think it is beautiful, then you will pick that up with MVPA (and, incidentally, you even might do with “standard univariate” methods if you’re particularly unlucky). It may be completely unrelated to the experimental manipulation you were interested in.

        So, in my book decoding methods are not really capable of dismantling any theories. Just like any other technique, they demand careful thought about what your results might mean, what could confound them, and how to test the underlying processes further.

  2. Thanks for a really interesting post.

    I guess the question is ultimately quantitative rather than qualitative. There must be some neural difference between (say) personal pain and observed pain, otherwise they would be indistinguishable, which I don’t think anyone is arguing.

    What we want to know is whether there’s “enough” similarity between the neural representations to count as evidence for the simulation theory… but it’s not clear where to draw the line.

    • Thanks NS!

      It’s a good point that in essence, ST just argues that we use “our own mental apparatus” to work out the contents of the other’s mind in X situation. It’s unclear what exactly should count as evidence for this. I saw one commenter describe the original Rizzolatti macaque study as definitive, in that it recorded from single neurons. But we can push the argument further and ask if mean or summary activation in a single neuron accurately describes whatever that neuron is representing. E.g. two neurons might show similar mean activation profiles but different firing frequencies. Or we could go the other way and ask if my “action representation” needs to be connected into the same network to count as evidence for ST. Ultimately that is the real problem with ST- going from a description at one level (two blobs activate in similar space) to a mechanism at another, without showing evidence for that jump. I’m not sure MVPA can help resolve that debate but it can at least be another card in the deck against this kind of isomorphic reverse inference.

  3. “Just like any other technique, they demand careful thought about what your results might mean, what could confound them, and how to test the underlying processes further.”
    Which is another way of saying: The results are multi-interpretable, introduce a vast room for error and can actually best serve as a confirmation bias generator.

    Seek and ye shall find. The idea that with such granularity one is able to deduce high order processes is ludicrous.

    • I don’t think that sets it apart from any scientific investigation. To assume anything other than “results are multi-interpretable” is to delude oneself about how science works. The only thing wrong with this is if one assumes that there is one killer experiment to reveal the definite answer – and this is of course unrealistic. Every finding is a piece in a puzzle, a lot of findings will get reinterpreted when more knowledge has been gained, and many will turn out to be incorrect. That doesn’t mean the entire process is flawed.

      • True, but unfortunately this is not how the majority of the writers of papers based on fMRI studies see it. Of the many I’ve seen so far, to them fMRI is a black box that produces accurate outcomes. They seem to assume that the image they see is actually the real representation of the brain, rather than a wild guess.
        I can’t make out if that is because the basic principles of fMRI elude them or I’ve just by sheer bad luck only read papers done by mediocre scientists under publication pressure.

        But if I read papers where the writer, based on an fMRI, concludes that he has demonstrated that belief in general and religious belief are the action of the same area of the brain, or a female researcher who self stimulates to orgasm during an fMRI concludes that orgasm opens the mind to a wider consciousness due to the fMRI showing the whole brain going red, I just want to bang my head on my desk in desperation.

        And that it gets published makes me want to write angry letters. Which I don’t.

        fMRI is a very crude tool, so low in resolution and proper definition of its capacities it should be excluded from research except by the very few who actually understand its limits. Saves me a headache from banging my head on the desk and many a researcher looking like an idiot.

        • It is true that fMRI is used very poorly by a very large number of researchers. This is largely due to the nature of cognitive neuroscience, which has really only come into its own in the past two decades. Most early teams were just that, teams of professionals from different disciplines collaborating to get the most out of a scanner. Now you have many fMRI institutes that are capable of putting out research fairly regularly, but a poor academic infrastructure for actually training cognitive neuroscientists. So you get a lot of social psychologists or worse asking questions that fMRI is not really able to answer. I always wonder why it seems so trendy to damn an entire field on the basis of its worst practices. Even with the above there are plenty of us who spend a whole lot of time studying the actual discipline of cognitive neuroscience- biology, information theory, advanced statistics, etc. We are well aware of the limitations. With that in mind:

          “Seek and ye shall find. The idea that with such granularity one is able to deduce high order processes is ludicrous.”

          Is just a ridiculous hasty generalization. With properly designed experiments you can in fact deduce useful information about higher order processes. Working memory is a great example- there is tons of fantastic research on the impact of cognitive load and item rehearsal that is greatly advanced by fMRI research. Even Karl Friston states that while fMRI may be slow and messy, this likely makes it ideal for detecting certain cognitive and sensory processes, obviously at the expense of others. To quote leading neurobiologist Logothetis, with respect to assessing cognition, “it doesn’t make much sense to read a newspaper with a microscope.”

          I agree that studies like that female orgasm study were poorly done. Any good cognitive neuroscientist should understand the role that motion and respiration play in fMRI investigations, and should take proper care in their experimental design to mitigate and control for these factors. What annoys me is lay persons and social scientists demonizing our entire field on the basis of the bad apples. Perhaps when Cognitive Neuroscience is recognized as the technically challenging and unique field that it is, and more universities have dedicated programs which are not merely cobbled together collaborations between existing faculties, this problem will lessen. Until then, we don’t pay much attention to those papers (aside from criticizing them on twitter and forums such as this one) and neither should you.
