• Using Bayesian Functional Principal Component Analysis and multinomial scalar-on-function regression with application in gene data
  • Mohammad Fayaz,1 Alireza Abadi,2,* Soheila Khodakarim,3 Abolfazl Movafagh,4
    1. PhD Student of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
    2. Professor of Biostatistics, Department of Community Medicine, Faculty of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
    3. Associate Professor, School of Allied Medical Sciences, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
    4. Department of Medical Genetics, Shahid Beheshti University of Medical Sciences


  • Introduction: Functional data analysis is a branch of statistics that treats observed data as realizations of underlying curves and relies on dimension reduction methods. High-dimensional datasets arise in many disciplines, including bioinformatics. In this research, we introduce Bayesian functional principal component analysis and multinomial scalar-on-function regression and apply them to an extensive simulation and a sample dataset.
  • Methods: We use functional principal component analysis with B-spline basis functions and select the amount of smoothing by generalized cross-validation. We also place prior distributions on the eigenfunctions, estimate their posterior distributions, and compare the results obtained with WinBUGS and R. We further extend Bayesian scalar-on-function regression to multinomial responses and use it to predict the tumor type from high-dimensional gene data. The sample dataset contains 63 subjects with 2308 gene expression measurements from tissue samples of four groups of small round blue cell tumors; the methods are also assessed in an extensive simulation.
  • Results: We estimated the minimum number of basis functions needed to reach the highest prediction accuracy. The overall accuracy, specificity, and sensitivity are 100%. We estimate the coefficient function for each tumor type.
  • Conclusion: A large number of basis functions can model the underlying curves very precisely, but it takes more computation time and increases model complexity. We reach the same accuracy with a lower number of basis functions and smoothing parameters. Functional principal components are an efficient method for dimension reduction. We obtain the results by using the extracted eigenfunctions and subject-specific scores in the regression.
  • Keywords: Biostatistics, Functional Data Analysis, Bayesian Data Analysis, Cancer
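
The pipeline described in the Methods (reduce each curve to a few principal component scores, then regress a multinomial response on those scores) can be illustrated with a small sketch. It is not the authors' implementation: it is a plain, non-Bayesian version that skips B-spline smoothing and generalized cross-validation, extracts eigenvectors by power iteration on a grid, and fits the multinomial model by gradient descent rather than posterior sampling. The simulated class mean curves, noise level, grid size, and all function names are assumptions chosen only for illustration.

```python
import math
import random

rng = random.Random(0)
m, n_per_class = 20, 20                      # grid points per curve, curves per class
grid = [j / (m - 1) for j in range(m)]
class_means = [                              # hypothetical mean curves, one per class
    lambda t: math.sin(2 * math.pi * t),
    lambda t: -math.sin(2 * math.pi * t),
    lambda t: math.cos(2 * math.pi * t),
]
curves, labels = [], []
for k, mu in enumerate(class_means):
    for _ in range(n_per_class):
        curves.append([mu(t) + rng.gauss(0, 0.1) for t in grid])
        labels.append(k)
n = len(curves)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Center the curves and form the m x m sample covariance on the grid.
mean_curve = [sum(c[j] for c in curves) / n for j in range(m)]
centered = [[c[j] - mean_curve[j] for j in range(m)] for c in curves]
cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(m)]
       for a in range(m)]

def leading_eigenpair(mat, iters=300):
    """Power iteration: leading eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [rng.gauss(0, 1) for _ in range(len(mat))]
    for _ in range(iters):
        w = [dot(row, v) for row in mat]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, [dot(row, v) for row in mat]), v

# Extract two discretized eigenfunctions, deflating the covariance after each.
eigvecs = []
for _ in range(2):
    lam, v = leading_eigenpair(cov)
    eigvecs.append(v)
    cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]

# Subject-specific scores: projections of centered curves on the eigenvectors.
scores = [[dot(x, v) for v in eigvecs] for x in centered]

def softmax(z):
    top = max(z)
    e = [math.exp(a - top) for a in z]
    total = sum(e)
    return [a / total for a in e]

# Multinomial logistic regression on the scores (intercept + 2 scores per class).
n_classes, n_feat = len(class_means), 3
W = [[0.0] * n_feat for _ in range(n_classes)]
for _ in range(500):
    grad = [[0.0] * n_feat for _ in range(n_classes)]
    for s, y in zip(scores, labels):
        x = [1.0] + s
        p = softmax([dot(w, x) for w in W])
        for k in range(n_classes):
            err = p[k] - (1.0 if k == y else 0.0)
            for j in range(n_feat):
                grad[k][j] += err * x[j] / n
    W = [[W[k][j] - 0.1 * grad[k][j] for j in range(n_feat)]
         for k in range(n_classes)]

def predict(s):
    z = [dot(w, [1.0] + s) for w in W]
    return z.index(max(z))

accuracy = sum(predict(s) == y for s, y in zip(scores, labels)) / n
print(f"training accuracy on simulated curves: {accuracy:.2f}")
```

In this sketch the number of retained components plays the role that the number of basis functions plays in the abstract: a few scores per subject replace the full curve in the regression. A Bayesian version would instead place priors on the eigenfunctions and regression coefficients and summarize their posterior distributions.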