The goal of the following experiment is to use multivariate pattern analysis of functional MRI data to investigate multimodal processing in the human brain. This is achieved by first creating a specific kind of sensory stimulus, which is presented to the subjects through a single sensory modality, for instance the visual, but implies strong associations in a different modality, for instance the auditory. While the subjects perceive such stimuli, neural activity in the parts of the brain pertaining to the modality of the association, the auditory cortices in this example, is recorded using fMRI. The recorded activity is then analyzed using multivariate pattern analysis with the goal of predicting which of several stimuli the subject perceived in the original modality.
The results show that unimodal stimuli containing information pertinent to more than one modality can induce content-specific neural activity in the early sensory cortices of modalities other than the one through which they are presented. The technique we introduce today, which we refer to as crossmodal multivariate pattern analysis, represents a natural extension of conventional multivariate pattern analysis, with the difference that sensory stimuli are classified across rather than within sensory modalities. We first had the idea for this method when trying to produce empirical evidence for a neuroarchitectural framework introduced by Damasio more than two decades ago.
In this video, we'll introduce you not only to the technical aspects of multivariate pattern analysis, or MVPA for short, but also to this theoretical framework. We will first point out some key differences between MVPA and conventional univariate fMRI analysis. Consider the following example: if a subject is presented with two different visual stimuli, like an apple and an orange, then both stimuli, averaged across a number of trials, induce a specific pattern of neural activity in the primary visual cortex, symbolized here by the activation levels of six hypothetical voxels. In conventional fMRI analysis, there are essentially two methods to analyze these patterns.
The first is to compare the average level of activity the stimuli induced in the entire region of interest. The difference between the averages, however, may not be significant. The second method is to establish a subtraction contrast for each voxel.
The activation level during the apple condition is subtracted from the activation level during the orange condition, and the resulting difference for each voxel can be visualized on a whole-brain contrast image. Again, however, these differences may be small and may reach the required statistical criterion for only a few voxels. Unlike these univariate analysis methods, MVPA is able to detect differences distributed across voxels by considering the activation levels of all the voxels simultaneously.
While only a few of the activation differences may be significant in isolation, the two patterns, when considered in their entirety, may indeed be statistically different. Another major difference between conventional fMRI analysis and MVPA is that the latter method uses what is called reverse inference. In conventional fMRI analysis, the researcher typically asks a question of the type: will two different visual stimuli, for instance the picture of a face and the picture of a house, lead to different activity levels in a specific region of interest, such as the fusiform face area? By contrast, MVPA is typically expressed in terms of reverse inference, or decoding, and asks: based on the pattern of neural activity in a specific brain region, will we be able to predict which of two visual stimuli a subject perceived? It is important to note, however, that from a statistical point of view, it is equivalent to say that two stimuli lead to distinct activity patterns in a given brain region and to say that the activity pattern in that brain region permits prediction of the inducing stimulus.
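The gain from considering all voxels jointly can be illustrated with a small simulation on synthetic data. This is purely illustrative; the six-voxel layout follows the example above, but the numbers and the nearest-centroid classifier are our own assumptions, not part of the experiment:

```python
import math
import random

random.seed(0)

N_VOXELS = 6          # six hypothetical voxels, as in the example
PER_VOXEL_DIFF = 0.5  # weak per-voxel mean difference between conditions
NOISE_SD = 1.0        # trial-to-trial noise dwarfs each single voxel's signal
N_TRAIN = 100         # training trials per condition
N_TEST = 100          # testing trials per condition

def make_trial(condition):
    # Each voxel carries a small condition-dependent shift buried in noise.
    shift = PER_VOXEL_DIFF if condition == "orange" else 0.0
    return [shift + random.gauss(0.0, NOISE_SD) for _ in range(N_VOXELS)]

train = {c: [make_trial(c) for _ in range(N_TRAIN)] for c in ("apple", "orange")}
test = [(c, make_trial(c)) for c in ("apple", "orange") for _ in range(N_TEST)]

# "Univariate" view: the per-voxel mean differences are small relative to the noise.
centroids = {
    c: [sum(t[v] for t in trials) / N_TRAIN for v in range(N_VOXELS)]
    for c, trials in train.items()
}
diffs = [centroids["orange"][v] - centroids["apple"][v] for v in range(N_VOXELS)]
print("per-voxel mean differences:", [f"{d:.2f}" for d in diffs])

# "Multivariate" view: classify each test trial by its nearest class centroid,
# so all six voxels contribute to the decision simultaneously.
correct = sum(
    1 for label, trial in test
    if min(centroids, key=lambda c: math.dist(trial, centroids[c])) == label
)
accuracy = correct / len(test)
print(f"multivoxel classification accuracy: {accuracy:.2f}")
```

Even though no single voxel separates the two conditions reliably, the classifier that pools all six voxels performs well above the 50% chance level.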
In other words, the sensitivity of MVPA is superior to that of univariate analyses because it considers several voxels simultaneously, not because it proceeds in an inverse direction. To return to the earlier example, let's consider a typical MVPA paradigm that assesses whether seeing an apple induces a different pattern of neural activity in primary visual cortex than seeing an orange. In a first step, fMRI data are recorded while a subject sees a large number of the stimuli to be discriminated.
The acquired data are then divided into a training data set and a testing data set. The data from the training set are entered into a pattern classifier, which attempts to detect features in the neural patterns that distinguish the two types of trials from one another. Next, the classifier is presented with unlabeled data from the testing set and, based on the patterns it detected in the training data set, attributes the most likely label to each of the testing trials. For each trial, the classifier's guess is then compared to the correct stimulus label, and classifier performance is calculated as the percentage of correct guesses. As we've seen, the classifier in the example produced the correct stimulus labels in nine out of 12 cases, or 75%, whereas chance performance in such a two-way discrimination would be 50%. This suggests that there are indeed consistent differences between the neural patterns induced in V1 by the orange and the apple stimuli.
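Whether 9 correct guesses out of 12 genuinely exceed the 50% chance level can be checked with an exact binomial test. A minimal sketch using only the standard library (the transcript does not specify which statistical test was actually used):

```python
from math import comb

def binomial_p_one_sided(k, n, chance=0.5):
    """P(X >= k) for X ~ Binomial(n, chance): the probability of observing
    at least k correct guesses if the classifier were merely guessing."""
    return sum(comb(n, i) * chance**i * (1 - chance)**(n - i)
               for i in range(k, n + 1))

p = binomial_p_one_sided(9, 12)
print(f"accuracy 9/12 = {9/12:.2f}, one-sided p = {p:.3f}")  # p = 299/4096, about 0.073
```

Note that with only 12 test trials, a 75% hit rate yields a one-sided p-value of about 0.073, above the conventional 0.05 criterion, which illustrates why the significance of such a result has to be established carefully.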
Of course, the significance of this result would have to be proven statistically. One important issue to keep in mind in such a classification experiment is that the training and the testing data sets must be completely independent from one another, because only if that is the case can any conclusions be drawn as to the generalizability of the patterns learned during training. For this reason, MVPA paradigms often use a so-called cross-validation procedure.
This procedure serves to maximize the number of training and testing trials that can be gained from a given data set, while at the same time ensuring that the training and testing sets do not overlap during the individual classification steps. Consider the following MVPA experiment with eight functional runs. In a first cross-validation step, the classifier is trained on the data from runs one through seven and tested on the data from run eight.
In the second step, the classifier is trained on runs one through six as well as on run eight, and subsequently tested on run seven. Following this schema, eight cross-validation steps are carried out, with each run serving as the test run exactly once. Classifier performance is obtained for each cross-validation step, and averaging these results yields overall performance.
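The leave-one-run-out scheme just described can be sketched in a few lines. This is an illustrative skeleton on simulated data; the run structure, signal strengths, and the nearest-centroid classifier are placeholders, not the study's actual pipeline:

```python
import math
import random

random.seed(1)

N_RUNS, TRIALS_PER_RUN, N_VOXELS = 8, 12, 6

def simulate_trial(label):
    # Hypothetical voxel patterns: one condition shifts all voxels slightly.
    shift = 0.6 if label == "orange" else 0.0
    return [shift + random.gauss(0.0, 1.0) for _ in range(N_VOXELS)]

# Each functional run contains labeled trials of both conditions.
runs = [
    [(lab, simulate_trial(lab))
     for lab in ("apple", "orange") for _ in range(TRIALS_PER_RUN // 2)]
    for _ in range(N_RUNS)
]

def classify(train_trials, pattern):
    # Nearest-centroid classifier fit only on the training runs.
    cents = {}
    for lab in ("apple", "orange"):
        pats = [p for l, p in train_trials if l == lab]
        cents[lab] = [sum(p[v] for p in pats) / len(pats) for v in range(N_VOXELS)]
    return min(cents, key=lambda l: math.dist(pattern, cents[l]))

# Cross-validation: each run serves as the test run exactly once.
step_scores = []
for test_idx in range(N_RUNS):
    train_trials = [t for i, r in enumerate(runs) if i != test_idx for t in r]
    test_trials = runs[test_idx]
    hits = sum(1 for lab, pat in test_trials if classify(train_trials, pat) == lab)
    step_scores.append(hits / len(test_trials))

overall = sum(step_scores) / len(step_scores)
print("per-step accuracies:", [f"{s:.2f}" for s in step_scores])
print(f"overall accuracy: {overall:.2f}")
```

Because the held-out run never contributes to the centroids, each step's accuracy reflects generalization to genuinely unseen data, which is exactly the independence requirement discussed above.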
There are freely available software packages on the internet to perform MVPA, such as PyMVPA and the toolbox offered by the Princeton Neuroscience Institute. Experimental paradigms such as the one just described have been used successfully to predict perceptual stimuli from neural activity in the corresponding parts of the cerebral cortex, for example to predict visual stimuli based on the activity in visual cortices, or auditory stimuli based on the activity in auditory cortices. We would now like to introduce an extension of this basic idea, in which perceptual stimuli are predicted not only within but across modalities.
Our idea draws on the fact that perception is intricately linked to the recall of memories. For example, a visual stimulus that has a strong auditory implication, such as the sight of a glass vase shattering on the ground, will automatically trigger in our mind's ear an auditory image that shares similarities with the auditory images we experienced on previous encounters with breaking glass.
According to a framework introduced by Damasio in the late 1980s, the memory association between the sight of the shattering vase and the corresponding sound is stored in so-called convergence-divergence zones, or CDZs for short. Convergence-divergence zones are conceptualized as ensembles of neurons located at the various hierarchical levels of the sensory systems. As their name implies, CDZs at each level receive convergent bottom-up projections from lower-order cortices; in turn, they send back divergent top-down projections to those same lower-order cortices. Due to the convergent bottom-up projections, CDZs can be activated by perceptual representations in multiple modalities, for instance both by the sight and the sound of a shattering vase. Due to the divergent top-down projections, they can then promote the reconstruction of associated images by signaling back to the early sensory cortices of additional modalities. Consider the sequence of activation that a purely visual but sound-implying stimulus will induce according to this framework. The stimulus first induces a specific pattern of neural activity in the early visual cortices.
The early visual cortices then activate, via convergent bottom-up projections, the first level of CDZs, the CDZ1s. Depending on the exact pattern of activity in the corresponding early cortical sector, a CDZ1 may become activated or it may remain inactive. CDZ1s project upward to CDZ2s. Just as the CDZ1s detected activity patterns in the early visual cortices, the CDZ2s detect patterns of activity among the CDZ1s. Several CDZ2s may become activated, but for simplicity only one is depicted here. Via top-down projections, the CDZ1s may at the same time complete the activity pattern in the early visual cortices. Via several additional levels of CDZs, the CDZ2s project forward to the CDZns in multimodal association cortices. Again, multiple CDZns may become activated, but only one is depicted for reasons of simplicity. The CDZ2s also signal backward to the CDZ1s, which in turn may further modify the pattern originally induced in the early visual cortices. The CDZns then signal back to the CDZ2s of all modalities.
In the auditory cortices, a neural pattern will be constructed which permits the subject to experience, in the mind's ear, an auditory image associated with the visually presented stimulus. The depicted top-down signaling to the somatosensory modality reflects the fact that almost any visual stimulus also implies some tactile associations. Thus, the framework predicts that a sound-implying visual stimulus will lead to a content-specific pattern of neural activity in the early auditory cortices. By analyzing this neural pattern using MVPA, it should therefore be possible to predict which of several sound-implying visual stimuli a subject has seen. In the first experiment, neural activity was recorded from the early auditory cortices while subjects watched nine different video clips of objects and events that implied sound.
The region of interest in this experiment was a restricted area on the supratemporal plane, which comprised primary auditory cortex and very early auditory association cortices. In a second experiment, neural activity was recorded from primary somatosensory cortex while subjects viewed five different video clips that implied touch. In this experiment, the region of interest comprised the primary somatosensory cortex, located in the postcentral gyrus.
Both studies involved eight subjects. In the auditory study, an MVPA classifier performed above the chance level of 50% for all possible two-way discriminations between pairs of stimuli. In 26 out of the 36 discriminations, classifier performance was significantly different from the chance level of 0.5.
Likewise, in the somatosensory study, the classifier performed above chance in all two-way discriminations, and eight out of the 10 discriminations reached statistical significance. As you can see, using crossmodal multivariate pattern analysis, we've been able to demonstrate that the perception of visual stimuli that imply sound or touch leads to content-specific neural representations in early auditory and somatosensory cortices, in accordance with the theoretical framework introduced earlier. Clearly, the experimental paradigm we have presented does not have to remain restricted to the modalities involved in our own experiments, but can be extended to other sensory modalities as well.
We hope, therefore, that other groups will join us in extending this kind of research in an attempt to broaden our knowledge about how the brain processes multimodal stimuli from the environment.