Monday, 20 January 2014

Integrating Gene Expression and Clinical Data in Sjögren's Syndrome

Integration of gene expression data with functional interaction and annotation data reveals patterns of connection between pSS-associated genes and the cellular processes in which they are involved


Katherine James 1,2, Jessica R Tarn 1, Shereen Al-Ali 1, Jennifer Hallinan 2, David A Young 1, Wan-Fai Ng 1

1 Musculoskeletal Research Group, Institute of Cellular Medicine, Newcastle University

2 School of Computing Science, Newcastle University


Objectives: There is considerable discordance between the results of different gene expression studies of primary Sjögren's Syndrome (pSS). Combining these data with other types of information, such as functional interactions and annotation data, can provide a more complete view of the cell and help to identify the key genes and biological pathways involved in the disease process of pSS.


Methods: In this study, a list of genes found to be differentially expressed between pSS patients and controls in four large-scale microarray studies was compiled from the literature. Enrichment of Gene Ontology (GO) biological process annotations within this list was calculated in order to identify processes that may be involved in pSS pathogenesis.
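The abstract does not state which statistical test was used for the enrichment calculation. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for GO enrichment: it asks how likely it is to see at least the observed number of annotated genes in the list if the list were drawn at random from the background. The function name, parameter names, and toy numbers are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     number of genes in the list being tested
    overlap:    genes in the list that are annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of every outcome at least as extreme as the one seen.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 background genes, 5 annotated to a term,
# a list of 4 genes of which 2 carry the annotation.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, overlap=2)
print(f"p = {p_value:.4f}")
```

In practice one such p-value would be computed per GO term and then corrected for multiple testing; that step is omitted here.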


BioGRID is a comprehensive and highly curated resource for functional association data generated by multiple experimental techniques. Using BioGRID data, a functional interaction network was generated in which nodes represented genes or gene products and edges represented any type of BioGRID interaction between them. The network was visualised using the Cytoscape visualisation platform and further annotated using the Gene Ontology enrichment results. Finally, the network was filtered to produce sub-networks of pSS-associated genes.
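The network construction and filtering described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes interactions are available as simple gene pairs, and it assumes that filtering means keeping only edges whose two endpoints are both in the gene list (the abstract does not specify the filtering rule). The gene labels and function names are hypothetical placeholders.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected network as a dict mapping node -> set of neighbours."""
    adjacency = defaultdict(set)
    for gene_a, gene_b in interactions:
        if gene_a == gene_b:
            continue  # ignore self-interactions in this sketch
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)
    return dict(adjacency)


def filter_to_genes(adjacency, genes_of_interest):
    """Keep only edges whose two endpoints are both genes of interest."""
    keep = set(genes_of_interest)
    sub_network = {}
    for node, neighbours in adjacency.items():
        if node in keep:
            kept_neighbours = neighbours & keep
            if kept_neighbours:  # drop genes left without any interaction
                sub_network[node] = kept_neighbours
    return sub_network


# Toy interaction pairs with placeholder gene labels (hypothetical data).
toy_interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("B", "B")]
network = build_network(toy_interactions)
sub_network = filter_to_genes(network, {"A", "B", "C", "E"})
print(sub_network)
```

Here gene E is in the list of interest but is dropped from the sub-network because its only partner, F, is not.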


Results: Following filtering, 99 of the pSS-associated genes were involved in 111 interactions in the sub-network, and most of these genes (88) were connected in a single component. All four gene expression datasets were represented within this connected component. Several tight clusters of genes annotated to the processes "innate immune response", "multi-organism process", "response to virus" and "response to stress" were observed in the integrated network. The sub-network also revealed patterns of interaction between these clusters and the pSS-associated genes. In addition, a large number of the pSS-associated genes were themselves annotated to these GO biological processes.
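Summary figures of the kind reported above (number of genes, number of interactions, size of the largest connected component) can be obtained from any undirected network with a breadth-first search. The sketch below is illustrative only; the toy network and its labels are hypothetical and do not reproduce the study's data.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components of an undirected network, largest first."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        # Breadth-first search collects every node reachable from `start`.
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy network with placeholder labels (hypothetical data).
toy_network = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "X": {"Y"}, "Y": {"X"},
}
components = connected_components(toy_network)
largest = components[0]
# Each undirected edge appears twice in the adjacency sets, so halve the total.
edge_count = sum(len(neighbours) for neighbours in toy_network.values()) // 2
print(len(toy_network), "genes,", edge_count, "interactions,",
      "largest component of", len(largest))
```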


Conclusion: Gene enrichment and network analyses of the pSS-associated genes suggest that the innate immune response, multi-organism processes, and responses to virus and to stress are likely to be involved in pSS pathogenesis. Integrating multiple types of data in this manner can aid the interpretation of results, since combining diverse data sources reveals global properties that are not evident from any single source. Future studies may benefit from incorporating detailed clinical data into the analysis of expression data in order to elucidate the relationship between gene expression and clinical phenotype.