• Integrated TCGA and GSE132956 Analysis Utilizing the R-STRING-Cytoscape-GEPHI Pipeline: Identification of Cut-point Proteins and Biomarkers for the Early Detection of Pancreatic Ductal Adenocarcinoma
  • Neda Ghanbari,1 Rezvan Hosseiny,2 Navid Abedpoor,3,*
    1. Department of Genetics, Fal.C., Islamic Azad University, Isfahan, Iran
    2. Department of Biotechnology, Fal.C., Islamic Azad University, Isfahan, Iran
    3. Department of Sports Physiology, Isf.C., Islamic Azad University, Isfahan, Iran


  • Introduction: Pancreatic ductal adenocarcinoma (PDAC) remains a highly lethal malignancy, with 5-year survival rates under 10%, primarily because it is diagnosed at a late stage. The current biomarker, CA19-9, has limited sensitivity (79-81%) for early detection. Large-scale transcriptome datasets from TCGA and GEO make systematic biomarker identification possible through integrated bioinformatic methods. This study applies an integrated R-STRING-Cytoscape-GEPHI pipeline to the TCGA RNA-seq and GSE132956 pancreatic cancer datasets to identify key cut-point proteins and to develop robust biomarker panels for early PDAC detection with improved diagnostic performance.
  • Methods: We used R statistical software to perform differential expression analysis on RNA sequencing data from TCGA (n=178 PDAC samples) and GSE132956 (n=11 PDAC samples), with thresholds of |log₂FC| ≥ 1 and P < 0.05. We built protein-protein interaction (PPI) networks from the STRING database (confidence score > 0.4) and imported them into Cytoscape 3.9 for network topology analysis. Hub genes were identified using degree, betweenness, and closeness centrality. We used GEPHI with modularity-based clustering to visualize the network and detect communities. Candidate biomarkers were validated by ROC analysis and survival analysis using separate datasets.
  • Results: Differential expression analysis identified 4,129 genes that were significantly dysregulated between PDAC and normal tissues. In the STRING-based PPI network, the largest connected component had 3,847 nodes and 28,156 edges (network density = 0.004, clustering coefficient = 0.398). Cytoscape centrality analysis identified 24 hub genes, with TP53 (degree = 301, betweenness = 1.0, closeness = 0.63), EGFR (degree = 239), and VEGFA (degree = 253) as the main cut-point proteins. AKT1, MYC, PIK3CA, KRAS, CCND1, and CDH1 were other important hub genes, with degrees between 155 and 250. GEPHI community detection identified three main functional clusters that together contained all of the hub genes. Survival analysis showed that the hub genes were significantly associated with survival (log-rank P < 0.001). The multi-biomarker panel combining the hub genes achieved an AUC of 0.94, a sensitivity of 89%, and a specificity of 92% for early-stage PDAC identification, compared with an AUC of 0.79 for CA19-9 alone.
  • Conclusion: The combined R-STRING-Cytoscape-GEPHI workflow identified important cut-point proteins and biomarker candidates in the TCGA and GSE132956 datasets. The 24-gene hub signature shows strong early-detection performance and provides a basis for clinical translation to improve PDAC screening.
  • Keywords: Pancreatic ductal adenocarcinoma, hub genes, biomarkers, early detection
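
The quantitative steps named in Methods and Results (the differential-expression thresholds, network density, degree and closeness centrality, and AUC) can be illustrated with a minimal Python sketch. This is not the authors' R/Cytoscape/GEPHI code: the function names, the four-node toy graph, and the toy scores are hypothetical and exist only to show the definitions. The only study values used are the reported node and edge counts (3,847 and 28,156), and the density formula assumes an undirected simple graph.

```python
from collections import deque


def passes_deg_threshold(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Filter stated in Methods: |log2FC| >= 1 and P < 0.05."""
    return abs(log2_fc) >= fc_cutoff and p_value < p_cutoff


def network_density(n_nodes, n_edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def degree_and_closeness(adjacency):
    """Return {node: (degree, closeness)} for a connected undirected graph."""
    result = {}
    for source in adjacency:
        # Breadth-first search yields shortest-path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        # Closeness = (number of other nodes) / (sum of distances to them).
        closeness = (len(dist) - 1) / total if total else 0.0
        result[source] = (len(adjacency[source]), closeness)
    return result


def rank_auc(case_scores, control_scores):
    """AUC as the share of case/control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Density recomputed from the node and edge counts reported in Results.
density = network_density(3847, 28156)

# Hypothetical toy graph (not study data): one hub linked to three nodes.
toy_graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
}
centrality = degree_and_closeness(toy_graph)

# Hypothetical panel scores (not study data) for cases and controls.
toy_auc = rank_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

print(f"density = {density:.4f}")
print(f"hub (degree, closeness) = {centrality['hub']}")
print(f"toy AUC = {toy_auc:.3f}")
```

Under the undirected-graph assumption, the recomputed density rounds to the reported 0.004. In practice, a library such as NetworkX or scikit-learn would typically replace the hand-written centrality and AUC functions.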