How do transcription factors function in the expression of genes

In these previous methods, whole promoter regions were used as the transcriptional regulatory regions that contain TFBSs. Promoter regions are much longer than TFBSs, so TFBS prediction should improve if the potential transcription factor binding region can be narrowed down. Early work found that gene transcription is related to the sensitivity of chromatin to DNase I (deoxyribonuclease I) [12]. Chromatin that contains actively transcribed genes is more sensitive to DNase I than chromatin that does not [13].

In , Sheffield et al. The chromatin structure around DNase I hypersensitive (DHS) sites is looser, so gene regulatory proteins can bind these regions preferentially to exert their biological functions [15-18]. Within DHS sites, some regions are not digested easily because they are protected by specific proteins, which are probably gene regulatory proteins such as transcription factors.

In that study, the DHS sites were combined with gene expression data to deduce their target genes. Approximately 71 percent of DHS sites were associated with at least one gene, some were associated with up to 44 genes, and protein-coding genes outnumbered RNA genes among the targets. In our previous study, we developed a model-based procedure to predict functional TFBSs. The model used known position weight matrices (PWMs) to identify potential TFBSs in gene promoter regions and built a quantitative relationship between the TFBSs and gene expression levels.

The transcriptional regulatory region was arbitrarily defined as the region upstream of the transcription start site. In this study, we propose a modified method that combines DNase I hypersensitive sites with promoter regions to improve the accuracy of TFBS identification and to identify the regulatory function of transcription factors. The cervical cancer HeLaS3 cell line, a clonal derivative of the parent HeLa cell line, has been very useful in the clonal analysis of mammalian cell populations with respect to chromosomal variation, cell nutrition, and plaque-forming ability.

In recent years, HeLaS3 has served as a tier 2 cell type of the ENCODE project, and many genome-wide studies have used next-generation sequencing to investigate gene expression, transcription factor binding sites, histone modifications, and DNase I hypersensitive sites in this cell line. In this study, we combined a genome-wide gene expression profile with DNase I hypersensitivity data to develop a new method for predicting the most important transcription factors in interferon alpha-treated HeLaS3 cells.

We used quantile normalization [21] to eliminate differences among the parallel experiments and then used scaling normalization [22] to eliminate differences between the two cell types. Genes that were not reliably detected in at least one of the two cell types were removed, and only protein-coding genes were retained.
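The quantile normalization step can be illustrated with a minimal sketch. It assumes each replicate is a list of expression values for the same genes in the same order; the function name `quantile_normalize` and the sample values are hypothetical, and the procedure cited as [21] may handle ties and missing values differently.

```python
def quantile_normalize(samples):
    """Give every sample the same value distribution (the mean of the sorted samples)."""
    n = len(samples[0])
    # Sort each sample, then average the values that share a rank across samples.
    sorted_samples = [sorted(sample) for sample in samples]
    rank_means = [sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)]

    normalized = []
    for sample in samples:
        # Indices of the sample ordered by value; ties are broken by position.
        order = sorted(range(n), key=lambda i: sample[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace each value by the mean for its rank
        normalized.append(out)
    return normalized


# Hypothetical expression values: three replicates, four genes each.
replicates = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(replicates)
```

After normalization every replicate contains the same set of values, and the ranking of genes within each replicate is unchanged.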

After removing probe sets that were not reliably detected or that lacked annotation, the differentially expressed genes [23] were left for analysis; 60 of them were downregulated and the rest were upregulated. To describe the correlation between gene expression levels and the binding affinity of transcription factors in DHS sites, a simplified quantitative relationship was established using a linear model.

Because the gene expression levels used in this study were log2 RMA expression values, g_k was calculated from those values. The linear model describes the quantitative relationship between the expression level and the PWMs for a single differentially expressed gene; the model can therefore be rewritten in matrix form. D is the score matrix that holds the maximum score of each motif candidate in each DHS site. The model error for a given selection of TFs is defined as the sum of squared differences between observed and predicted mRNA expression levels.

This error can also be written in matrix form. The model error was calculated for each set of PWMs.
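The idea of a model error for a given selection of TFs can be sketched as follows. This is an illustration under stated assumptions, not the authors' formulation: it assumes one binding score per gene and TF (already aggregated over that gene's DHS sites), fits the coefficients by ordinary least squares, and uses hypothetical names (`solve`, `fit_and_error`) and made-up data.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_and_error(scores, expression, selected):
    """Least-squares fit on the selected TF columns; return (coefficients, SSE)."""
    cols = [[row[j] for row in scores] for j in selected]
    # Normal equations: (X^T X) beta = X^T g
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xtg = [sum(p * g for p, g in zip(ci, expression)) for ci in cols]
    beta = solve(xtx, xtg)
    predicted = [sum(b * row[j] for b, j in zip(beta, selected)) for row in scores]
    sse = sum((g - p) ** 2 for g, p in zip(expression, predicted))
    return beta, sse


# Hypothetical data: 4 genes x 3 TFs; expression is exactly 2*TF0 - 1*TF1.
scores = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.2],
    [1.0, 1.0, 0.9],
    [2.0, 1.0, 0.1],
]
expression = [2.0, -1.0, 1.0, 3.0]

beta_full, sse_full = fit_and_error(scores, expression, [0, 1])
beta_reduced, sse_reduced = fit_and_error(scores, expression, [0])
```

In this toy example the error rises from about 0 to 1.5 when the second TF is left out of the selection, which shows how the error distinguishes one set of TFs from another.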

The TFCV of each transcription factor was then calculated. The procedure for predicting functional transcription factors can be summarized as follows. For each PWM, the threshold value ts is set to the score at a fixed rank among that PWM's highest scores. The matrix C is constructed by comparing the genomic coordinates of each DHS site with those of each gene's regulatory region.

Therefore, transcription factors tend to bind to DHS sites, and DHS sites can be used to improve the accuracy of transcription factor binding site prediction.
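The comparison of DHS site coordinates with gene regulatory regions, as used to build the matrix C, can be sketched as an interval-overlap test. The 0/1 encoding, the half-open (chrom, start, end) coordinate convention, and the names `overlaps` and `build_c` are assumptions made for this illustration; the coordinates are made up.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_c(regions, dhs_sites):
    """Rows are gene regulatory regions, columns are DHS sites; 1 marks an overlap."""
    return [[1 if overlaps(region, site) else 0 for site in dhs_sites]
            for region in regions]


# Hypothetical coordinates.
regions = [("chr1", 1000, 2000), ("chr1", 5000, 6000), ("chr2", 1000, 2000)]
dhs_sites = [("chr1", 1900, 2100), ("chr1", 3000, 3200), ("chr2", 1500, 1600)]

C = build_c(regions, dhs_sites)
```

A nested loop is enough for a sketch; for genome-scale inputs a sorted sweep or an interval tree would avoid comparing every region with every site.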

Overlap between transcription factor binding regions and DHS sites. The blue and red bars represent the percentages of transcription factors that do and do not overlap with the DNase I hypersensitive sites, respectively.

Potential PWMs, each corresponding to the binding sequence of a specific transcription factor, were selected based on binding affinity within the DHS sites in the gene promoter region, as detailed in the methods. To predict the transcription factor binding sites, we calculated the score matrix D, which stores the maximum scores as the binding affinity between the transcription factors and the DHS sites.
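A score matrix of maximum PWM scores can be sketched as below. The sketch assumes log2-odds weights with a uniform background and a pseudocount, scans the forward strand only, and puts DHS sites in rows and motifs in columns; none of these choices is confirmed by the text, and the motif counts and sequences are made up.

```python
import math

BASES = "ACGT"


def pwm_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def max_score(pwm, sequence):
    """Best score of the PWM over all windows of the sequence (forward strand only).

    Characters other than A, C, G, T are not handled and raise KeyError.
    """
    width = len(pwm)
    scores = [sum(pwm[i][sequence[start + i]] for i in range(width))
              for start in range(len(sequence) - width + 1)]
    return max(scores) if scores else float("-inf")


# Hypothetical motif that strongly prefers "TGA".
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
pwms = [pwm_log_odds(counts)]
dhs_sequences = ["CCTGACC", "AAAAAAA"]

# D[s][m]: maximum score of motif m within DHS site s.
D = [[max_score(pwm, seq) for pwm in pwms] for seq in dhs_sequences]
```

The first sequence contains an exact "TGA" match, so it scores higher than the second, which has no good window.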

Not all of these PWM candidates correspond to real functional transcription factor binding sites. According to the methods, the higher the TFCV score of a PWM, the more significant its contribution to the change in gene expression. ISGF-3 is also a transcriptional activator induced by interferon alpha.

To verify the accuracy of our model, we ran it repeatedly, changing the number of top-scoring TFBSs retained for each PWM. The Pearson correlation coefficient between the TFCV scores of each pair of predicted results was calculated. A heatmap of the Pearson correlation coefficients is shown in Figure 4. Obviously, the correlation between the prediction of top and top is the lowest 0. Most of the top 10 PWMs are the same among these five prediction results, and most of them belong to the interferon regulatory factor family.
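The pairwise comparison of runs can be sketched as a matrix of Pearson correlation coefficients, which is the kind of data a heatmap would display. The run labels and TFCV values below are hypothetical and serve only to show the calculation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical TFCV scores for the same five PWMs under three threshold settings.
runs = {
    "run_a": [0.90, 0.75, 0.40, 0.10, 0.05],
    "run_b": [0.85, 0.80, 0.35, 0.15, 0.02],
    "run_c": [0.60, 0.70, 0.50, 0.20, 0.10],
}
names = sorted(runs)

# Symmetric matrix of pairwise correlations between runs.
correlation = [[pearson(runs[a], runs[b]) for b in names] for a in names]
```

A high correlation between two runs means that the ranking of PWMs by TFCV is largely insensitive to the change in threshold.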

In this study, we modified the previous procedure, ModifModeler, to identify functional transcription factors. In the previous procedure, the promoter regions were used as the transcription factor binding regions [24]. To improve the accuracy of transcription factor binding site identification, we reduced the search space for transcription factor binding regions. Because transcription factors tend to bind to DNase I hypersensitive sites, we combined these sites with promoter regions to construct a new model.

In our model, the DHS sites within the transcriptional regulatory region of each differentially expressed gene replace the whole promoter region. This shortens the candidate binding regions of transcription factors and improves the accuracy of transcription factor binding site prediction.

The 10 predicted transcription factors with the largest TFCVs contributed significantly to the change in gene expression after interferon treatment. Interferon-stimulated gene factor 3 (ISGF-3) also contributed significantly to the change in gene expression. This indicates that our modified model can identify the transcription factors that induce changes in gene expression. The identification of transcription factor binding sites remains a challenging and meaningful area.

In the future, the identification of transcription factor binding sites will be very important for understanding gene regulation mechanisms [28]. Gene expression is regulated by many different elements acting together. To predict different regulatory elements and understand their function, our model also needs to be adapted to other gene regulatory elements, such as microRNAs and RNA-binding proteins.

In summary, integrating DNase I hypersensitive sites allows high accuracy in our prediction procedure. The identification of transcription factor binding sites can be used in the clinic to find changes in regulatory elements in damaged or diseased cells and thus help treat disease at the level of gene expression [29].

We believe that our optimized method will contribute to the existing analytical network of gene expression.

Transcription factors comprise a wide range of proteins, excluding RNA polymerase, that initiate and regulate the transcription of genes. A distinctive feature of transcription factors is their DNA-binding domains, which allow them to bind to specific DNA sequences called enhancer or promoter sequences.

Some transcription factors bind to a DNA promoter sequence near the transcription start site and help form the transcription initiation complex. Other transcription factors bind to regulatory sequences, such as enhancer sequences, and can either stimulate or repress transcription of the related gene. The regulation of gene expression may explain why organisms that contain mostly the same DNA exhibit different cell types and functions [5]. Gene expression is an intricate process that involves the coordination of multiple dynamic events, which are subject to multi-level regulation [6].

Those regulatory levels include the transcriptional, post-transcriptional, translational and post-translational levels. Regulating gene expression is crucial in living organisms [7]. Gene regulation is essential for cellular differentiation in multicellular organisms, since it contributes to the function and structure of a specific cell, and it is an integral part of organism development [4].

Together, these observations indicate that, apart from inherited genetic information, cell function and structure are influenced by information that is not encoded in the DNA sequence. This information has been termed epigenetic information [5].

Epigenetics is defined as both heritable alterations in gene activity and expression and also stable, long-term alterations in the transcriptional potential of a cell that may not be heritable [8].

Epigenetics comprises a number of mechanisms, including DNA methylation, histone modification, post-translational modifications, chromatin remodeling and various forms of regulatory RNA molecules. These mechanisms appear to influence gene expression [5].

Gene transcriptional regulation is a fundamental part of both tissue-specific gene expression and gene activity in response to stimuli [9].

The main regulators of gene transcription are transcription factors (TFs). TFs are defined as proteins that can bind specific DNA sequences to control transcription. Each cellular life form follows a different strategy for the initiation and regulation of transcription. Bacteria have two distinct mechanisms for the initiation of transcription: the promoter-centric mechanism, in which specific TFs interact with the promoter to alter its ability to bind RNA polymerase, and the RNA polymerase-centric mechanism, in which TFs interact with RNA polymerase to alter its promoter preference. In eukaryotes, a number of TFs interact with their cognate DNA motifs and recruit transcriptional cofactors to alter the chromatin environment.

Lastly, the archaeal transcriptional mechanism can be summarized as a simplified version of the eukaryotic transcriptional mechanism. Archaea feature a transcriptional apparatus that includes additional RNA polymerase subunits and basal TFs that direct transcription initiation and elongation. These observations underline the importance of TFs in both the initiation and the regulation of gene transcription.

The activation of TFs is quite complex and may involve multiple intracellular transduction pathways or direct activation by specific binding molecules, known as ligands. TFs mostly regulate gene activity by binding to specific short DNA base pair patterns, termed motifs or cis-regulatory elements (CREs), in upstream, intron, or downstream regions of target genes.

They can also act by interacting with other genomic locations that may be distant from the primary DNA sequence. These are defined as gene regulatory regions. The interaction between DNA and TFs goes beyond the structural and sequence level, since several other factors participate in the process, such as the influence of cofactors, epigenetic modifications and the cooperative binding of other TFs. Thus, gene regulation involves a large number of molecular mechanisms.

Therefore, an in-depth examination of the evolution of TFs, which takes into account the interaction with all the molecular factors mentioned above and the manner in which TFs influence the evolution of other molecular mediators, is essential to the understanding of organism evolution. TF function involves two basic features: (i) the ability to recognize and bind short, specific sequences of DNA within regulatory regions; and (ii) the ability to recruit or bind proteins that participate in transcriptional regulation. Consequently, the evolution of TFs mainly depends on alterations in binding sites, binding partners and expression patterns. Moreover, as an integral part of gene expression, TFs are closely related to the evolution of epigenetic mechanisms [5].

The current literature on TF evolution provides a broad range of information. Firstly, gene duplication and gene loss, which are crucial drivers of evolution [21, 22], are consequently important drivers of TF evolution. They occur in all domains of life, regardless of organism complexity.

Duplication and deletion can influence transcriptional regulatory networks by increasing or reducing the number of TFs with specific binding preferences [23]. Following the duplication of a TF gene, the two resulting gene copies are likely to be the same. Since they share the same sequence, including the DNA-binding domain (DBD) sequence, they bind to the same target genes.

Ensuing mutations in the DNA-binding domain sequence can lead one of the TF copies to switch to regulating different target genes. On a more lineage-specific level, TFs display several differences. Although the basal transcription machinery has long been considered universally conserved, it is currently accepted that it too diversifies during evolution.

The size and subunit composition of the basal transcription machinery increase markedly during evolution, from roughly 6 subunits in bacteria and up to 15 in the archaea to a large number in eukaryotes, which have at least 3 different RNA polymerases. Significant differences are apparent between prokaryotes and eukaryotes. Firstly, some DBDs are specific to evolutionary lineages. Moreover, eukaryotic TFs are longer than eukaryotic proteins with other functions, whereas this relationship is reversed in prokaryotes.

This phenomenon may be due to the fact that eukaryotic TFs have a number of long intrinsically disordered segments that are needed to support the formation of a multi-protein transcription complex. Another characteristic specific to eukaryotes is the repetition of the same DBD family within one polypeptide chain. This characteristic may result from a mechanism that eukaryotes use to increase the length and diversity of DNA-binding recognition sequences with a limited number of DNA-binding domain families. Several factors seem to affect the evolution, emergence, disappearance and function of CREs.

These factors include insertion and deletion mutational mechanisms, slippage processes, the large-scale rearrangement of promoter regions, co-operation among TFs and the existence of initial sequence distributions that are biased towards the mutational neighborhood of strongly binding sequences [30]. Insertion and deletion mutational mechanisms can lead to the slow emergence of binding sites out of a random sequence, while factors that accelerate these processes may include the availability of sufficient genomic sequence from which sites can evolve and possible co-operativity between adjacent TFs. Furthermore, since the interaction of TFs with TF binding sites is integral to gene regulation, a mutation in either the TF or the binding site hinders that interaction and may lead to dysfunctional gene expression.

Therefore, in order to maintain proper gene expression levels, TF evolution and CRE evolution are closely intertwined. Specifically, they bear a co-evolutionary association: to sustain proper interaction, a mutation in one interacting partner can be compensated by a corresponding mutation in its interacting partner during the course of evolution. Although individual prokaryotic TFs can recognize long DNA motifs that are alone capable of defining the genes they regulate, organisms with larger genomes are characterized by TFs that recognize sequences too short to define unique genomic positions.

Moreover, the development of multicellular organisms requires molecular systems that are complex and able to execute combinatorial processes. To overcome these obstacles, organisms have evolved co-operative recognition of DNA by multiple TFs. TFs can collaborate through a variety of mechanisms, with each co-operative mechanism determining the specifics of the regulatory interaction.

Some of the mechanisms through which TFs co-operate include protein-protein interaction and indirect co-operation. A prime example of protein-protein interaction among TFs is the formation of functional dimers. A number of eukaryotic TF proteins are not able to bind DNA sequences as monomers and require physical interaction with an identical molecule, or one within the same family, to form functional dimers that can bind the targeted DNA sequences.

It has been suggested that TFs first functioned as monomers, which is supported by the fact that TFs in less complex organisms can bind target sequences sufficiently as monomers. Several promoters that include symmetrical palindromic repeats of the DNA-recognition motif could have brought two or more copies of the same TF protein into proximity. If, by chance, an interaction domain with only one interaction sequence appeared, this would help establish the formation of a TF complex on DNA, because this specific complex would recognize a larger DNA motif. These events would relax the evolutionary constraints on the TF DBD within a redundant duplicate gene and would allow the emergence of a DNA-binding domain that binds with lower affinity but is still functional.

Once such evolutionary steps are taken, the TF must function as an obligate dimer.


