Unravelling evolution of Nanog, the key transcription factor involved in self-renewal of undifferentiated embryonic stem cells, by pattern recognition in nucleotide and tandem repeats characteristics
Nanog, an important transcription factor in embryonic stem cells (ESC), is the key factor in maintaining pluripotency to establish ESC identity and has the ability to induce embryonic germ layers. Nanog is responsible for self-renewal and pluripotency of stem cells and is also involved in cancer invasiveness, tumor cell proliferation, motility, and drug resistance. Understanding the mechanisms underlying Nanog evolution and regulation can lead to future advances in the treatment of cancers. The recent integration of machine learning models with genetics has provided a powerful tool for knowledge discovery and for uncovering evolutionary pathways. Herein, sequences of 47 Nanog genes from various species were retrieved, and two feature datasets were computed from these sequences. In the first dataset, 76 nucleotide attributes were calculated for each Nanog sequence. The second dataset was prepared from 10,480 repeated nucleotide sequences (5 to 50 bp in length). Various data mining algorithms, such as decision tree models, were then applied to these datasets to trace the evolutionary pathways of Nanog divergence. Attribute weighting models highlighted features such as the frequencies of AA and GC as the most important genomic features in Nanog gene classification and differentiation. Similar findings were obtained by tree induction algorithms. Results from the second dataset showed that some short sequence strings, such as ACTACT, TCCTGA, CCTGA, GAAGAC, and TATCCC, can be effectively used to identify Nanog genes in various species. The outcomes of this study, for the first time, unravel the importance of particular genomic features in Nanog gene evolution, paving the way toward a better understanding of stem cell development and targeted therapy of human disorders.
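To make the two kinds of features concrete, the following is a minimal Python sketch, not the pipeline used in the study. It assumes that "frequencies of AA and GC" refers to overlapping dinucleotide frequencies, that repeated sequences are substrings of 5 to 50 bp occurring at least twice (overlaps counted), and that short strings such as ACTACT are scored as present or absent. The function names `dinucleotide_frequencies`, `repeated_kmers`, and `motif_features` are hypothetical, and the toy sequences in the usage note are not real Nanog data.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in seq (assumed definition)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    if total == 0:
        return {}
    return {pair: n / total for pair, n in Counter(pairs).items()}


def repeated_kmers(seq, k_min=5, k_max=50):
    """Substrings of length k_min..k_max that occur at least twice, with overlapping counts."""
    seq = seq.upper()
    repeats = {}
    for k in range(k_min, min(k_max, len(seq)) + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        repeats.update({kmer: n for kmer, n in counts.items() if n > 1})
    return repeats


def motif_features(seq, motifs):
    """Binary presence/absence feature for each motif (hypothetical encoding)."""
    seq = seq.upper()
    return {m: int(m in seq) for m in motifs}
```

For example, `dinucleotide_frequencies("AAGC")` assigns one third each to AA, AG, and GC, and `motif_features(seq, ["ACTACT", "TCCTGA", "CCTGA", "GAAGAC", "TATCCC"])` yields one row of a presence/absence table that a decision tree learner could consume. Counting overlapping occurrences is a design choice made here for simplicity; a tandem repeat finder would apply stricter criteria.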