
A Novel Approach of Dynamic Vision Reconstruction from fMRI Profiles Using Siamese Conditional Generative Adversarial Network

Abstract

This paper aims to improve the quality of visual stimuli reconstructed from functional Magnetic Resonance Imaging (fMRI) profiles and to reduce the computational complexity of the reconstruction process. Preceding work that envisions the non-cognitive contents of brain activity fails to integrate visual data of diverse hierarchical levels. Existing approaches such as the Deep Canonically Correlated Auto Encoder face the significant challenges of reconstructing visual stimuli from brain activity: fMRI noise, the large dimensionality of a limited number of data instances, and the complex structure of visual stimuli. In this work, we also analyze the scope for utilizing spatiotemporal data to resolve the neural correlates of visual stimulus representations and reconstruct the resembling visual stimuli. This work is also intended to benefit those suffering from developmental disabilities. A novel Siamese conditional Generative Adversarial Network (SCGAN) approach is proposed to resolve these significant issues. The key features of SCGAN are as follows: 1. The Siamese Neural Network (SNN) is a dimensionality reduction approach that takes visual stimulus information as input and aims to discover its critical components effectively. 2. In the conditional Generative Adversarial Network, the labels provide an expansion to the latent variable to better generate and discriminate visual stimuli. Experiments on four fMRI datasets prove that our technique can reconstruct visual stimuli precisely. The performance is evaluated by Mean Squared Error (MSE), Accuracy, Pearson Correlation Coefficient (PCC), Losses, Structural Similarity Index (SSIM), Computational Time, and other metrics. The results show that the proposed method yields better outcomes in terms of accuracy.

Keywords:
Vision reconstruction; fMRI; visual cortex; encoding and decoding; CGAN; SNN

HIGHLIGHTS

• We combined a Siamese Neural Network with a CGAN to create the SCGAN framework.

• Visual perception is reconstructed by a Conditional Generative Adversarial Network (CGAN).

• By reducing the image loss, the linear model learns to predict the latent space from the Blood Oxygen Level Dependent (BOLD) signal.

• Reconstructions on the video fMRI dataset were objectively identifiable.

INTRODUCTION

Scientists and intellectuals have been trying to understand and decode the cognitive process that enables humans to perceive and explore the visual world. We now explore how the human brain communicates active visual information with the outside environment and whether or not brain activity may be used to reconstruct what someone is seeing. Early questions about brain decoding and encoding [1] were mostly addressed using artificial visual stimuli [2-3]. Such methods expose the underlying visual computations in a very narrow way. A different approach that can navigate the complexities of natural vision and locate and interpret the visual representation of the associated brain activity is desired.

However, even though we can decode human brain activity estimated via fMRI into deep neural network features across more than one layer of the network [4], the enormous length of the visual stimulus feature vectors and the lack of normalization in the regression approach contribute to low decoding accuracy. Consequently, the reconstructed visual stimulus is only slightly similar to the original one. Some models improve the reconstructed visual stimuli to be much like real ones without using the categorical data of the visual stimuli. Other work specializes in reconstructing a specific form of visual stimulus [5-6], which yields enhanced detail but lacks generality.

This article proposes SCGAN, a new model that reconstructs meaningful naturalistic visual stimuli from fMRI. Our approach comprises a Siamese and conditional generative adversarial network that allows voxel-wise visual stimulus generation. In our work, a neural encoder based on the Siamese Neural Network [7] extracts the visual features of an input visual stimulus. The fMRI decoder determines the mapping from the fMRI information to the extracted visual features. In the SCGAN, this decoding yields a coarse visual stimulus from the decoded visual features, and the CGAN then creates a meaningful naturalistic visual stimulus from the coarse one.

Through quantitative analysis of the visual stimulus reconstructions produced by our approach, we have determined an improvement in reconstructed visual stimulus quality compared with other approaches.

Related work

There are only a few experiments describing perceived visual stimulus reconstruction in the published literature on human brain decoding. Miyawaki and coauthors [8] combined multi-scale local stimulus bases with predetermined shapes to reconstruct lower-order data, namely binary contrast patterns. Deep belief networks [9-10] and a simple linear Gaussian method [11] were used to reconstruct handwritten digits and characters. Researchers have also built a reconstruction method in which visual stimulus bases can be naturally estimated using Bayesian canonical correlation analysis (BCCA) [12]. Furthermore, work has been done to reconstruct natural movie data [13-15]. Stacking deconvolution and convolution layers at a deep level improves the generative adversarial network (GAN) design [16].

The BigBiGAN method allows the reconstruction of naturalistic visual stimuli. Because of BigBiGAN's latent space, this method generates conspicuous semantic data and extracts visually appealing stimulus data from fMRI signals [17-19]. By combining GAN with Bayesian learning, the GAN-based Bayesian Visual Reconstruction Model (GAN-BVRM) aims to improve reconstructed visual stimulus quality from a finite data collection [18]. The Deep Generative Multi-view Model (DGMM) demonstrated better reconstruction accuracy of visual images [20]. The Siamese Reconstruction Network (SRN) method makes full use of the limited number of training examples available; the number of samples in the training data can be increased from n to 2n pairs using this method [21]. The DVAE/GAN (Dual-Variational Auto Encoder/GAN) design successfully reduced visual stimulus clutter and noise and overcame the modality gap [22].

Table 1
Characteristics table of the algorithms. (KL - KL divergence, MSE - Mean Squared Error, Adv - Adversarial Loss, MAE - Mean Absolute Error, E2E - End-to-End Training)

Motivation and justification

  • Preceding work that envisions the non-cognitive contents of brain activity fails to integrate visual data of diverse hierarchical levels.

  • Recent research initiatives have revealed the possibility of establishing the neurological correlates of voxel information with their associated visual image.

  • In this activity, we will also analyze the scope for utilizing the spatiotemporal data to resolve the neural correlates of visual stimulus representations and reconstruct the resembling visual stimuli via the Deep Learning approach.

Contributions to the work

  • The voxel-wise encoding approaches exhibited various single-voxel portrayals and revealed their category representations.

  • The prior work shows that the reconstruction tasks are evaluated only by accuracy and losses. This paper evaluates the existing datasets of reconstruction tasks using ten metrics and numerous parameters.

  • The current work shows various approaches to reconstructing natural visual stimuli such as images, movies, etc. This work uses a unique algorithm to reconstruct multiple data sets.

Outline of the work

Figure 1
Framework Diagram for Siamese conditional Generative Adversarial Network

Organization of the paper

The remaining paper is organized as follows: The approach to the visual stimulus reconstruction challenge is briefly discussed in Section 2. The experimental setup of the suggested approach for natural movie reconstruction using fMRI profiles is presented in Section 3. Section 4 explains the suggested approach's performance analysis utilizing fMRI activity, and Section 5 draws a conclusion and identifies areas for further research.

MATERIAL AND METHODS

Siamese Neural Network (SNN)

The SNN [Figure 2] is a category of neural network framework in which two or more sub-networks are identical. “Identical” in this context means a similar configuration, comprising equal weights and parameters. Parameter updates are mirrored across the sub-networks, and the network is used to compare visual stimulus feature vectors to discover similarities between two visual stimuli. The Siamese Neural Network encodes specific features of a visual stimulus. The key benefits of Siamese networks are greater robustness to unbalanced classes, the ability to pair with the best classifier, and the use of semantic similarity as a source of information. The disadvantages are that it takes longer to train than traditional networks and cannot generate probabilities. The mean squared error loss and the binary cross-entropy loss are incorporated in the training phase. As a result, the objective function of the Siamese network model is,

$$L_s(x_i^a, x_i^t) = y(x_i^a, x_i^t)\log p(x_i^a, x_i^t) + \bigl(1 - y(x_i^a, x_i^t)\bigr)\log\bigl(1 - p(x_i^a, x_i^t)\bigr) \quad (1)$$

where $x_i^a$ stands for the feature vector and $x_i^t$ stands for the voxel vector; the Siamese loss function of the reconstruction process is computed from the training samples.
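As a concrete illustration of the pairwise objective in Eq. (1), the following sketch (not the authors' implementation; the layer sizes, embedding dimension, and similarity head are assumptions) shows a weight-shared Siamese pair scorer trained with binary cross-entropy in PyTorch.

```python
# Minimal Siamese pair scorer, assuming PyTorch; sizes are illustrative only.
import torch
import torch.nn as nn

class SiameseBranch(nn.Module):
    """One of the two weight-shared sub-networks that embeds an input vector."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class SiamesePairScorer(nn.Module):
    """Shares one branch across both inputs and predicts p(similar) for a pair."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.branch = SiameseBranch(in_dim)   # identical weights for both inputs
        self.head = nn.Linear(128, 1)         # similarity logit from |e_a - e_t|

    def forward(self, x_a, x_t):
        e_a, e_t = self.branch(x_a), self.branch(x_t)
        return torch.sigmoid(self.head(torch.abs(e_a - e_t))).squeeze(-1)

# Eq. (1): binary cross-entropy over pair labels y (1 = matching pair, 0 = non-matching).
model = SiamesePairScorer(in_dim=64)
x_a, x_t = torch.randn(8, 64), torch.randn(8, 64)   # dummy feature / voxel vectors
y = torch.randint(0, 2, (8,)).float()
p = model(x_a, x_t)
loss = nn.functional.binary_cross_entropy(p, y)     # -(y log p + (1-y) log(1-p))
loss.backward()
```

The absolute difference of the two embeddings is only one common way to compare the branches; the paper does not specify the comparison operator, so this choice is illustrative.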

Figure 2
Siamese Neural Network Architecture

Generative Adversarial Network (GAN)

Supervised learning is built on enormous volumes of data; however, we occasionally run into data shortages. In this case, a generative model is utilized to convert supervised learning to semi-supervised learning. The GAN network was proposed by Goodfellow and coauthors [22]. A Generative Adversarial Network concurrently trains two networks. The Discriminator (D) net accepts an input visual stimulus from a training set and returns a scalar indicating the likelihood that the input is in the training set. The Generator (G) generates visual images from a vector drawn at random from a simple Gaussian distribution. The generator net learns to create visual stimuli that deceive the discriminator, while the discriminator learns to differentiate between genuine and generated data. For this reason, the Generative Adversarial Network [Figure 3] is “adversarial”. In other words, the discriminator and generator play a min-max game over the value function V(G, D) as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim P_Z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr] \quad (2)$$

$P_{data}(x)$ denotes the actual data distribution, and $P_Z(z)$ denotes the latent-space distribution of the random vector z.
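A minimal sketch of how the two sides of Eq. (2) are typically computed in practice is shown below; the toy generator, discriminator, and dimensions are assumptions, not the networks used in this work.

```python
# Loss terms of the GAN min-max game in Eq. (2), assuming PyTorch; toy sizes only.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))            # z -> fake sample
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

x_real = torch.randn(8, 32)   # stand-in for samples from P_data(x)
z = torch.randn(8, 16)        # latent vectors from P_Z(z)

# Discriminator objective (to be maximized; written here as a loss to minimize).
d_loss = -(torch.log(D(x_real)).mean() + torch.log(1 - D(G(z).detach())).mean())

# Generator objective: minimize log(1 - D(G(z))); in practice log D(G(z)) is often maximized instead.
g_loss = torch.log(1 - D(G(z))).mean()
```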

Figure 3
GAN Architecture

Conditional Generative Adversarial Network (CGAN)

GAN can produce good samples of data points, but it cannot create data points with a target label, and the generated dataset lacks diversity. Mirza and Osindero [23] introduced the CGAN framework [Figure 4], which changes the architecture by adding additional label information y to the generator's input and attempts to create the corresponding data point. It also adds this additional information to the discriminator input to make it easier to distinguish between original and generated data.

Figure 4
Architecture of CGAN

The joint hidden representation in this framework combines the random input noise Z with the label y, and the GAN training phase provides plenty of flexibility in how it accepts input. The discriminator receives the data points X and y and the generative output G(z), the same as in the vanilla GAN network. The conditional generative adversarial network loss function is comparable to the GAN:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}\bigl[\log D(x \mid y)\bigr] + \mathbb{E}_{z \sim P_Z(z)}\bigl[\log\bigl(1 - D(G(z \mid y))\bigr)\bigr] \quad (3)$$
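The label conditioning of Eq. (3) can be illustrated with the following sketch, in which the label y is embedded and concatenated with z in the generator and with the sample x in the discriminator; the embedding layer and all dimensions are illustrative assumptions, not the architecture used in this paper.

```python
# CGAN-style conditioning, assuming PyTorch; dimensions are illustrative.
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, z_dim=16, n_classes=10, out_dim=32):
        super().__init__()
        self.embed = nn.Embedding(n_classes, 8)
        self.net = nn.Sequential(nn.Linear(z_dim + 8, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))   # G(z | y)

class CondDiscriminator(nn.Module):
    def __init__(self, x_dim=32, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(n_classes, 8)
        self.net = nn.Sequential(nn.Linear(x_dim + 8, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, x, y):
        return self.net(torch.cat([x, self.embed(y)], dim=1))   # D(x | y)

G, D = CondGenerator(), CondDiscriminator()
z = torch.randn(8, 16)
y = torch.randint(0, 10, (8,))
score = D(G(z, y), y)   # probability that the conditioned sample is real
```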

Proposed Method - Siamese Conditional Generative Adversarial Network (SCGAN)

This section defines the novel method for reconstructing visual images from brain activity profiles: the Siamese conditional generative adversarial network model.

The training set comprises visual images and fMRI activity patterns, which are denoted as I and F, respectively. SCGAN [Figure 1] is an approach for mapping between the data space F and the latent space Z. In this algorithm, the encoder network is a Siamese Neural Network (SNN), and the decoder network is a Conditional Generative Adversarial Network (CGAN). The shared latent variables are treated as having the following Gaussian prior distribution,

$$p(Z) = \prod_{i=1}^{N} \mathcal{N}(z_i \mid 0, I) \quad (4)$$

The SNN is made up of two symmetric sub-networks with equal weights. In our work, each identical sub-network consists of one convolutional layer followed by three fully connected layers. All layers were activated using the rectified linear unit (ReLU) nonlinearity, and the adaptive moment estimation (Adam) optimizer was utilized to adapt the learning rate. The feature vectors of the original data are denoted as $fv_i$, and the sub-network output is indicated as $o$. Concatenating them produces a new input ($I_n$) in an (N+1)-dimensional space.

$$I_n = (fv_i^T, o^T) \quad (5)$$

Normalization of each feature can be expressed as follows,

$$N = (fv_i - m_i) / 2\sigma_i \quad (6)$$

where $m_i$ and $\sigma_i$ denote the mean and standard deviation of the i-th feature.

This normalized feature information is fed into the decoding network of the conditional generative adversarial network. The encoder function is E: I → Z.
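A minimal sketch of this encoder stage is given below, assuming PyTorch; the channel and layer sizes are illustrative, and the sub-network output $o$ is represented by a simple placeholder because the paper does not specify how it is obtained.

```python
# Encoder sub-network (one conv layer + three FC layers with ReLU) plus the
# concatenation of Eq. (5) and normalization of Eq. (6); sizes are assumptions.
import torch
import torch.nn as nn

class EncoderBranch(nn.Module):
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1),
                                  nn.ReLU(), nn.AdaptiveAvgPool2d(8))
        self.fc = nn.Sequential(nn.Linear(16 * 8 * 8, 512), nn.ReLU(),
                                nn.Linear(512, 256), nn.ReLU(),
                                nn.Linear(256, out_dim), nn.ReLU())

    def forward(self, img):
        h = self.conv(img).flatten(1)
        return self.fc(h)                       # feature vector fv_i

encoder = EncoderBranch()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)   # Adam, as in the text

img = torch.randn(4, 3, 64, 64)                 # dummy stimulus frames
fv = encoder(img)                                # fv_i
o = fv.mean(dim=1, keepdim=True)                 # placeholder for the sub-network output 'o'
I_n = torch.cat([fv, o], dim=1)                  # Eq. (5): (fv^T, o^T), (N+1)-dimensional

# Eq. (6): per-feature normalization N = (fv_i - m_i) / (2 * sigma_i)
m, sigma = I_n.mean(dim=0), I_n.std(dim=0) + 1e-8
N = (I_n - m) / (2 * sigma)
```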

In the decoder network, the generator and the discriminator of the generative model are conditioned on some additional information (a). To generate the corresponding data, the additional parameter a is added to the generator in the CGAN. The conditioning is performed by feeding the additional data a through an extra input layer into both the generator (G) and the discriminator (D).

The prior noise $P_z(z)$ of the input data and the additional data a are mixed in a combined hidden representation in the generator. The adversarial training paradigm provides a great deal of flexibility in the composition of this hidden representation. The generator network is trained alongside two discriminator networks, named $D_N$ and $D_z$, which learn to distinguish between original and reconstructed data and between original and reconstructed latent points, respectively. The decoder function is D: Z → I.

The SCGAN objective function is mathematically expressed as,

$$\min_G \max_D V(D, G) = \mathbb{E}_{I \sim P_{data}(I)}\bigl[\log D(I \mid a)\bigr] + \mathbb{E}_{z \sim P_Z(z)}\bigl[\log\bigl(1 - D(G(z \mid a))\bigr)\bigr] \quad (7)$$

From the CGAN network, we next find the best discriminator among all discriminators that are possible for this particular optimization. Multiple discriminators may satisfy this optimization, so the question is how to find the best discriminator D. For a fixed generator G, the optimal discriminator D is obtained as,

$$D_G^*(x) = \frac{P_{data}(x)}{P_{data}(x) + P_g(x)} \quad (8)$$

The training criterion for the discriminator D, given any generator G, is to maximize the value function. Hence, the optimal discriminator for a given G is denoted as,

$$D_G^* = \arg\max_D V(D, G) \quad (9)$$

Thus, $D_G^*$ is both the optimal and the maximizing discriminator.

The role of the generator is the reverse of that of D, so the optimal G is the one that minimizes the value function when $D = D_G^*$. Thus, the optimal solution $G^*$ is,

$$G^* = \arg\min_G V(D_G^*, G) \quad (10)$$

At this point, the optimization problem stated in (7) has a unique solution $G^*$, and this solution satisfies $P_g = P_{data}$; hence, the optimal $D_G^*$ is,

$$D_G^*(x) = \frac{P_{data}(x)}{P_{data}(x) + P_g(x)} \quad (11)$$
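The claim in Eqs. (8) and (11), that this ratio maximizes the value function for a fixed G, follows from the standard GAN argument, sketched below for completeness; this is the generic derivation rather than additional material from this paper.

```latex
% Standard derivation of the optimal discriminator for a fixed generator G.
\[
V(D,G) = \int_x P_{data}(x)\,\log D(x)\,dx + \int_x P_g(x)\,\log\bigl(1 - D(x)\bigr)\,dx .
\]
For each $x$, the integrand has the form $f(y) = a\log y + b\log(1-y)$ with
$a = P_{data}(x)$, $b = P_g(x)$, and $y = D(x)$. Setting
$f'(y) = a/y - b/(1-y) = 0$ gives the maximizer
\[
y^{*} = \frac{a}{a+b}
\quad\Longrightarrow\quad
D_G^{*}(x) = \frac{P_{data}(x)}{P_{data}(x) + P_g(x)} ,
\]
and when $P_g = P_{data}$ this reduces to $D_G^{*}(x) = \tfrac{1}{2}$.
```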

Natural Image reconstruction procedure

Algorithm for SCGAN

Require: Reconstruction of Visual Stimuli (I).

Input: Visual Stimuli I, fMRI Pattern F.

Output: actual samples $P_{data}(I)$, label a, latent space variable Z, reconstructed sample I.

Initialize:

  • 1) Set the visual image training set to I and the fMRI activity to F. Set the latent variables, indicated as Z, and the coefficient value of the image.

  • 2) Select the voxels that respond to the visual stimuli. Such voxel activity is assessed using the coefficient of determination, denoted R².

  • 3) Compute the R² value through 5-fold cross-validation.

  • 4) Extract the features of the stimuli, denoted $P_{data}(I)$, via the SNN.

  • 5) For each training iteration (a runnable sketch of these steps is given after the algorithm):
    1. Discriminator training:
      1. Select a stochastic mini-batch of actual samples $P_{data}(I)$ and their corresponding labels (a), indicated as (I, a).

      2. Compute D(I, a) over the mini-batch and back-propagate the loss to update θ(D) so as to reduce the loss function.

      3. Choose a mini-batch of latent space variables (z) and additional information (a), indicated as (z, a), and produce a counterfeit sample of the visual stimulus: G(z, a) = I*|a.

      4. Evaluate D(I*|a, a) over the mini-batch and back-propagate the loss to update θ(D) using loss function minimization.

    2. Generator training:
      1. Select a mini-batch of random noise vectors (z) and additional data (a), indicated as (z, a), and create a counterfeit instance of the visual stimulus: G(z, a) = I*|a.

      2. Compute D(I*|a, a) over the mini-batch and back-propagate the loss to update θ(G) so as to minimize the loss function.

End for

End
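The training recurrence in steps 5.1-5.2 can be sketched as the following alternating update, assuming PyTorch; G, D, the optimizers, and all dimensions are placeholders rather than the authors' implementation, and only the data discriminator is shown (the latent discriminator $D_z$ mentioned in the text would be trained analogously).

```python
# One discriminator step followed by one generator step, as in the algorithm above.
import torch
import torch.nn as nn

bce = nn.BCELoss()   # assumes D outputs probabilities, e.g. via a final sigmoid

def train_step(G, D, opt_G, opt_D, I_real, a, z_dim=100):
    """One training iteration on a mini-batch of real stimuli I_real with labels a."""
    batch = I_real.size(0)
    real_lbl = torch.ones(batch, 1)
    fake_lbl = torch.zeros(batch, 1)

    # Discriminator training: real pairs (I, a) and counterfeit pairs (G(z, a), a)
    z = torch.randn(batch, z_dim)
    I_fake = G(z, a).detach()
    d_loss = bce(D(I_real, a), real_lbl) + bce(D(I_fake, a), fake_lbl)
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator training: update theta(G) so freshly generated samples fool D
    z = torch.randn(batch, z_dim)
    g_loss = bce(D(G(z, a), a), real_lbl)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```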

EXPERIMENTAL RESULTS ANALYSIS & DISCUSSIONS

Dataset Description

This section provides an overview of the publicly accessible datasets utilized in deep learning-based visual stimulus reconstruction from fMRI activity profiles. Many datasets are available for reconstructing visual stimuli, such as natural movies. Table 1 shows several distinguishing aspects of the datasets.

Natural Movies [14]: This dataset carries 374 video clips drawn from random video blocks and YouTube videos at a pixel resolution of 800x600, and the number of voxels in this dataset is 10214. The fMRI activity patterns are taken from the regions of interest V1, V2, V3, V4, LOC, PPA, FFA, TPJ, LIP, FEF, and PEF.

Table 2.
The dataset description of Natural movies dataset.

Voxel Preferences

Because most voxels do not respond to visual images, voxel selection is crucial for brain decoding techniques. One popular strategy is to select the voxels correlated with the visual stimulus during the training stage. We select the voxels for which the encoding approach gives improved predictability.

This validates our idea that the voxels best predicted from the visual stimulus should be involved in the decoding model. The R² (coefficient of determination), which reflects the percentage of variance explained by the model, quantifies the fit between the model predictions and the measured voxel activity. In this task, we first estimated each voxel's R² on the training data using five-fold cross-validation and chose the voxels with a positive coefficient of determination for further study.
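A minimal sketch of this voxel-selection step is shown below, assuming scikit-learn and a plain ridge regression as a stand-in for the encoding model; the array shapes are illustrative.

```python
# Cross-validated R^2 per voxel; keep only voxels with a positive coefficient of determination.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
stim_features = rng.standard_normal((200, 50))   # stimulus features, one row per volume
bold = rng.standard_normal((200, 500))           # BOLD responses (the full dataset has 10214 voxels)

r2 = np.empty(bold.shape[1])
for v in range(bold.shape[1]):
    # 5-fold cross-validated R^2 of predicting this voxel from the stimulus features
    r2[v] = cross_val_score(Ridge(alpha=1.0), stim_features, bold[:, v],
                            cv=5, scoring="r2").mean()

selected_voxels = np.where(r2 > 0)[0]            # voxels retained for decoding
```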

Performance Metrics

The performance of visual stimulus reconstruction on the Natural Movies data is calculated using the following performance metrics. The description and formula of each metric are shown in Table 3.

Table 3
Performance Metrics Description
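As an illustration of how three of these metrics can be computed between a presented frame and its reconstruction, the following sketch assumes SciPy and scikit-image; it is not the evaluation code used in this work.

```python
# PCC, MSE, and SSIM between a real frame and its reconstruction.
import numpy as np
from scipy.stats import pearsonr
from skimage.metrics import structural_similarity as ssim

def frame_metrics(real: np.ndarray, recon: np.ndarray):
    """real, recon: greyscale frames of identical shape, values in [0, 1]."""
    pcc, _ = pearsonr(real.ravel(), recon.ravel())     # Pearson correlation coefficient
    mse = np.mean((real - recon) ** 2)                 # mean squared error
    ss = ssim(real, recon, data_range=1.0)             # structural similarity index
    return pcc, mse, ss

real = np.random.rand(64, 64)
recon = np.clip(real + 0.1 * np.random.randn(64, 64), 0, 1)   # dummy reconstruction
print(frame_metrics(real, recon))
```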

Results and Discussion

The evaluation of visual stimulus reconstruction tasks based on performance metrics is presented below:

Performance analysis of Reconstruction task based on Performance Metrics

Table 4 to Table 13 show the performance evaluation of the natural movie reconstruction task based on the performance metrics described in Table 3.

Table 4
Performance Evaluation based on PCC (mean±std)

Table 5
Performance Evaluation based on MSE (mean±std)

Table 6
Performance Evaluation based on SSIM (mean±std)

Table 7
Performance Evaluation based on PSM (standardized difference between real and reconstructed samples)

Table 8
Performance Evaluation based on MMD (difference between the feature means of the samples)

Table 9
Performance Evaluation based on IS (mean ± std)

Table 10
Performance Evaluation based on FID (mean ± std)

Table 11
Performance Evaluation based on Accuracy (%)

Table 12
Performance Evaluation based on Losses up to 5000 epochs

Table 13
Performance Evaluation based on Computational Complexity

Experimental Result

To test the SCGAN model, we used the Natural Movies dataset, a high-quality dataset containing fMRI recordings of three subjects while they were presented with the stimuli. The dataset consists of the BOLD activity from 10214 voxels in visual cortex areas of the brain, such as V1, V2, V3, V4, LOC, PPA, FFA, TPJ, LIP, FEF, and PEF. The actual images shown in this work use the data from subject 1.

The natural movies reconstructed by the proposed SCGAN algorithm are displayed in Figure 5. The first row shows the actual visual stimulus, and the second row shows the reconstructed visual stimulus. The accuracy rate of the reconstructed images is 44%.

Figure 5
Visual Stimulus reconstruction for Natural Movies dataset.

DISCUSSION

This research contributes to a growing body of knowledge on using deep learning approaches to model and understand how human brain activity represents visual stimuli. It builds on previous findings with static and dynamic visual stimuli by extending the suggested method (SCGAN) to characterize and interpret fMRI activity for both dynamic and static visual images. These findings support the hypothesis that the feed-forward process directed at visual stimulus identification has a significant impact on the brain responses underlying dynamic images, not just for the ventral stream but also, to a lesser extent, for the dorsal stream. This provides information about the visual representation of the dorsal stream.

Despite the lack of recurrent or neural feedback links, the SCGAN allows the construction of a fully quantifiable approach for predicting brain responses to any visual input. The voxel-by-voxel encoding method for depicting single-voxel representations demonstrates the distinct activities of different brain areas during vision.

It also enables high-throughput synthesis of brain responses to visual stimuli, allowing brain mapping of category representation and selection without additional fMRI trials. Furthermore, this study enables decoding of brain fMRI responses in semantic and visual spaces, allowing for real-time visual stimulus reconstruction.

CONCLUSION

To solve the challenge of visual stimulus reconstruction, we introduced a novel approach using a Siamese conditional generative adversarial network model. Through latent variables, we can find various relationships between the voxels of fMRI activity patterns and the pixels of visual stimuli. We also created a predictive distribution that successfully recreated visual stimuli from fMRI activity patterns. Although we focused on the difficulty of visual stimulus reconstruction in this paper, our method can also be used to solve encoding problems. The superiority of the proposed approach has been demonstrated by extensive experimental testing.

In the future, two demanding and promising directions remain. First, we can study the reconstruction of active vision using several GAN models. Second, we may investigate multi-subject decoding by using each subject's fMRI signals as a single entity.

REFERENCES

  • 1
    Naselaris T, Kay KN, Nishimoto S, Gallant JL. Encoding and decoding in fMRI. NeuroImage. 2011;56(2):400-10.
  • 2
    Kamitani Y, Tong F. Decoding the visual and subjective contents of the human brain. Nat Neurosci. 2005;8:679-85.
  • 3
    Haynes JD, Rees G. Decoding mental states from brain activity in humans. Nat Rev Neurosci. 2006;7:523-34.
  • 4
    Shen G, Horikawa T, Majima K, Kamitani Y. Deep image reconstruction from human brain activity. PLoS Comput Biol. 2019;15(1):e1006633. Available from: https://doi.org/10.1371/journal.pcbi.1006633
  • 5
    VanRullen R, Reddy L. Reconstructing faces from fMRI patterns using deep generative neural networks. Commun Biol. 2019;2:193. Available from: https://doi.org/10.1038/s42003-019-0438-y
  • 6
    Isola P, Zhu JY, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017 July 21-26; Honolulu, HI, USA. IEEE; 2017. p. 5967-5976.
  • 7
    Jiang L, Qiao K, Wang L, Zhang C, Chen J, Zeng L, et al. Siamese reconstruction network: accurate image reconstruction from human brain activity by learning to compare. Appl Sci. 2019;9:4749.
  • 8
    Miyawaki Y, Uchida H, Yamashita O, Sato M, Morito Y, Tanabe HC, et al. Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron. 2008;60(5):915-29.
  • 9
    van Gerven MAJ, Cseke B, de Lange FP, Heskes T. Efficient Bayesian multivariate fMRI analysis using a sparsifying spatio-temporal prior. NeuroImage. 2010;50(1):150-61.
  • 10
    van Gerven MAJ, de Lange FP, Heskes T. Neural decoding with hierarchical generative models. Neural Comput. 2010;22(12):3127-42.
  • 11
    Schoenmakers S, Barth M, Heskes T, van Gerven M. Linear reconstruction of perceived images from human brain activity. NeuroImage. 2013;83:951-61.
  • 12
    Fujiwara Y, Miyawaki Y, Kamitani Y. Modular encoding and decoding models derived from Bayesian canonical correlation analysis. Neural Comput. 2013;25(4):979-1005.
  • 13
    Nishimoto S, Naselaris T, Benjamini Y, Vu AT, Yu B, Gallant JL. Reconstructing visual experiences from brain activity evoked by natural movies. Curr Biol. 2011;21(19):1641-1646.
  • 14
    Wen H, Shi J, Zhang Y, Lu KH, Cao J, Liu Z. Neural encoding and decoding with deep learning for dynamic natural vision. Cereb Cortex. 2018;28(12):4136-4160. Available from: https://doi.org/10.1093/cercor/bhx268
  • 15
    Seeliger K, Guclu U, Ambrogioni L, Gucluturk Y, van Gerven MAJ. Generative adversarial networks for reconstructing natural images from brain activity. NeuroImage. 2018;181:775-785. Available from: https://doi.org/10.1016/j.neuroimage.2018.07.043
  • 16
    Mozafari M, Reddy L, VanRullen R. Reconstructing natural scenes from fMRI patterns using BigBiGAN. 2020 International Joint Conference on Neural Networks (IJCNN); 2020 July 19-24; Glasgow, United Kingdom. IEEE; 2020. p. 1-8.
  • 17
    Qiao K, Chen J, Wang L, Zhang C, Tong L, Yan B. BigGAN-based Bayesian reconstruction of natural images from human brain activity. Neuroscience. 2020;444:92-105. Available from: https://doi.org/10.1016/j.neuroscience.2020.07.040
  • 18
    Donahue J, Simonyan K. Large scale adversarial representation learning. Advances in Neural Information Processing Systems (NeurIPS); 2019; Vancouver, Canada. Vol. 32. Available from: https://doi.org/10.48550/arXiv.1907.02544
  • 19
    Du C, Du C, He H. Sharing deep generative representation for perceived image reconstruction from human brain activity. 2017 International Joint Conference on Neural Networks (IJCNN); 2017 May 14-19; Anchorage, AK, USA. IEEE; 2017. p. 1049-1056. Available from: https://doi.org/10.1109/IJCNN.2017.7965968
  • 20
    Jiang L, Qiao K, Wang L, Zhang C, Chen J, Zeng L, Bu H, Yan B. Siamese reconstruction network: accurate image reconstruction from human brain activity by learning to compare. Appl Sci. 2019;9:4749.
  • 21
    Ren Z, Li J, Xue X, Li X, Yang F, Jiao Z, Gao X. Reconstructing seen image from brain activity by visually-guided cognitive representation and adversarial learning. NeuroImage. 2021;228:117602. Available from: https://doi.org/10.1016/j.neuroimage.2020.117602
  • 22
    Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial networks. NeurIPS. 2014;2672-2680.
  • 23
    Mirza M, Osindero S. Conditional generative adversarial nets. arXiv. 2014. Available from: https://doi.org/10.48550/arXiv.1411.1784
  • Funding:

    This research received no external funding.

Edited by

Editor-in-Chief:

Alexandre Rasi Aoki

Associate Editor:

Alexandre Rasi Aoki

Publication Dates

  • Publication in this collection
    03 July 2023
  • Date of issue
    2023

History

  • Received
    04 May 2022
  • Accepted
    24 Mar 2023