C5orf52 | |
---|---|
Identifiers | |
Aliases | C5orf52, chromosome 5 open reading frame 52 |
External IDs | MGI: 1914680; HomoloGene: 12129; GeneCards: C5orf52; OMA:C5orf52 - orthologs |
Chromosome 5 open reading frame 52 (C5orf52) is a gene of unknown function. It encodes the protein A6NGY3, which is strongly predicted to localize to the cytoplasm.
Gene
The gene is located on the positive strand of chromosome 5 at 5q33.3 and spans 9,218 nucleotides. It contains 3 exons and 2 introns. A total of 537 base pairs of the gene are antisense to the spliced SOX30 gene, which raises the possibility of regulated alternate expression.
Gene expression
Multiple sources indicate that C5orf52 expression is tissue-specific in normal tissues. Expression has been detected in the appendix, brain, colon, duodenum, endometrium, gall bladder, kidney, lung, lymph node, prostate, small intestine, spleen, and urinary bladder, with the highest levels in the testis.
RNA
Only one isoform is known (C5orf52 isoform X1). Its transcript is 1023 base pairs long and comprises 3 exons. The coding sequence starts at base pair 385 and ends at base pair 864. The transcript contains both a 5' UTR (387 nucleotides) and a 3' UTR (162 nucleotides).
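As a rough consistency check, the coding coordinates can be converted into region lengths. The sketch below assumes that the stated positions are 1-based and inclusive and that the final codon is a stop codon; the function name `cds_summary` is illustrative only.

```python
# Coordinates as stated in the text; treating them as 1-based and
# inclusive is an assumption.
TRANSCRIPT_LENGTH = 1023
CDS_START = 385
CDS_END = 864


def cds_summary(transcript_length: int, cds_start: int, cds_end: int) -> dict:
    """Derive region lengths from 1-based, inclusive coding coordinates."""
    cds_length = cds_end - cds_start + 1
    if cds_length % 3 != 0:
        raise ValueError("coding region is not a whole number of codons")
    codons = cds_length // 3
    return {
        "five_prime_utr": cds_start - 1,
        "cds_length": cds_length,
        "codons": codons,
        "amino_acids": codons - 1,  # assumes the last codon is a stop codon
        "three_prime_utr": transcript_length - cds_end,
    }


summary = cds_summary(TRANSCRIPT_LENGTH, CDS_START, CDS_END)
print(summary)
```

Under these assumptions the coding region is 480 nucleotides, or 160 codons, which is consistent with a 159-amino-acid protein followed by a stop codon. The UTR lengths implied by the same coordinates (384 and 159 nucleotides) each differ by three from the lengths quoted above, which may reflect a different coordinate convention.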
Protein
The protein A6NGY3 is encoded by the C5orf52 gene and is classified under DUF5528 (a domain of unknown function). It is 159 amino acids long, with a molecular mass of 17.9 kDa and an isoelectric point of 10.8.
Function
Although the exact function of C5orf52 in humans is unknown, its expression pattern suggests an association with spermatogenesis: expression is very high in the testis and lower in the brain, colon, duodenum, and small intestine. The protein does not have any transmembrane domains or signal sequences.
Structure
The protein is slightly serine-rich, with the serines concentrated towards the beginning of the sequence, and is slightly deficient in aspartic acid overall. Positively and negatively charged amino acids are evenly distributed, so the protein has no large charge clusters. The tertiary structure of the human protein was predicted with multiple bioinformatic tools, all of which predicted a long run of alpha helices near the C-terminus and extended strands near the N-terminus.
Protein level regulation
Several types of site were identified in the protein: an N-myristoylation site, an amidation site, an N-glycosylation site, a cAMP- and cGMP-dependent protein kinase phosphorylation site, a casein kinase II phosphorylation site, and a protein kinase C phosphorylation site. Additional phosphorylation sites were predicted at serine, threonine, and tyrosine residues. Only two serines and one threonine were strongly conserved in close orthologs.
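Sites of this kind are typically found by scanning the sequence for short consensus patterns. The sketch below is an illustration of that idea, not the tool used for the results above: the patterns are the commonly cited PROSITE consensus patterns rewritten as regular expressions, and `TOY_SEQUENCE` is a made-up peptide, not the A6NGY3 sequence.

```python
import re

# PROSITE-style consensus patterns expressed as regular expressions.
# Which patterns the original analysis used is an assumption.
MOTIFS = {
    "N-glycosylation": r"N[^P][ST][^P]",
    "cAMP/cGMP-dependent kinase phosphorylation": r"[RK]{2}.[ST]",
    "Protein kinase C phosphorylation": r"[ST].[RK]",
    "Casein kinase II phosphorylation": r"[ST].{2}[DE]",
    "N-myristoylation": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
    "Amidation": r".G[RK][RK]",
}


def scan_motifs(sequence: str) -> dict:
    """Return {motif name: [(1-based start, matched text), ...]}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead makes overlapping occurrences visible as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
    return hits


# Hypothetical peptide used only to exercise the patterns.
TOY_SEQUENCE = "MGSNASAKRKTSLEGKR"

hits = scan_motifs(TOY_SEQUENCE)
for name, found in hits.items():
    print(name, found)
```

Because the patterns are only three to six residues long, they match by chance in many proteins, so hits are best treated as candidate sites. Conservation of the matched residue across orthologs is one way to prioritize them.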
Homology and evolution
Paralogs
There are no predicted paralogs for C5orf52 in Homo sapiens.
Orthologs
Orthologs were found by comparing the human C5orf52 sequence against the sequences of other species in NCBI's databases. Twenty organisms from a variety of orders were selected for comparison and further investigation. These species included mammals, reptiles, amphibians, birds, and an invertebrate. The data in the table were sorted by percent sequence identity to the human protein and then by date of divergence.
Orthologs of Homo sapiens C5orf52. Data are sorted by sequence identity and then by date of divergence. Shading indicates the grouping of organisms.
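The two-key ordering described for the table can be sketched as follows. The identity calculation, the sort direction (highest identity first, then most recent divergence first), and every value in `records` are assumptions for illustration; the species names are placeholders and the numbers are not data from the table.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns with identical residues.

    Alignment tools differ on whether gap columns count towards the
    denominator; excluding them is one common convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Placeholder records: identity in percent, divergence in millions of years.
records = [
    {"species": "species A", "identity": 45.0, "divergence_mya": 90},
    {"species": "species B", "identity": 72.5, "divergence_mya": 90},
    {"species": "species C", "identity": 72.5, "divergence_mya": 29},
]

# Primary key: identity, descending. Secondary key: divergence date, ascending.
ordered = sorted(records, key=lambda r: (-r["identity"], r["divergence_mya"]))
toy_identity = percent_identity("MKT-AS", "MRTQAS")

print([r["species"] for r in ordered])
print(toy_identity)
```

Negating the identity inside the key tuple lets a single `sorted` call apply opposite directions to the two keys.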
Phylogeny
The most distant ortholog of human C5orf52 was found in Paralvinella palmiformis, an invertebrate that diverged from the human lineage around 686 million years ago. Branch lengths in the tree are proportional to the date of divergence from humans, so orthologs more closely related to humans have shorter branches. The upper cluster consists entirely of mammals, which fits the trend that higher sequence identity corresponds to a smaller distance from the human gene.
Phylogenetic tree of species with orthologs of the human gene C5orf52. The tree is in radial format, meaning that the distance of a line from the main branch reflects species divergence. Source: Phylogeny.fr.
Protein divergence
When the corrected percent divergence (m) of C5orf52 from its orthologs was compared with that of human cytochrome c and fibrinogen alpha chain, the C5orf52 trendline was very similar to that of the fibrinogen alpha chain. The fibrinogen alpha chain sequence changes rapidly over time, which indicates that human C5orf52 does as well.
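The text does not state how m was calculated. A minimal sketch, assuming the widely used Poisson correction m = -ln(1 - n/100) × 100, where n is the observed percent difference between two aligned sequences, is:

```python
import math


def corrected_divergence(percent_identity: float) -> float:
    """Poisson-corrected percent divergence m from observed percent identity.

    Assumes m = -ln(1 - n/100) * 100, with n the observed percent
    difference; the correction allows for repeated changes at one site.
    """
    if not 0.0 < percent_identity <= 100.0:
        raise ValueError("percent identity must be in (0, 100]")
    observed_difference = 1.0 - percent_identity / 100.0  # n / 100
    return -math.log(1.0 - observed_difference) * 100.0


# Hypothetical identities, chosen only to show the shape of the correction.
for identity in (90.0, 50.0, 25.0):
    print(identity, round(corrected_divergence(identity), 1))
```

Plotting m against the date of divergence for each ortholog gives an approximately linear trend under this model, and the slope of that line is what allows the rate of change of one protein to be compared with that of another.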
Conserved regions
Multiple sequence alignments with close orthologs showed conserved amino acid residues throughout C5orf52. The most highly conserved region lay in the middle of the protein, around amino acid 90. Towards the C-terminus, conserved residues occurred in clusters, but conservation there was weaker overall.
Interactions
C5orf52 is not predicted to have any binary interactions with other proteins. The reason for this is not known. One possible explanation is the lack of any transmembrane domains. Another is the limited information available on C5orf52: it may play a role in specialized pathways and conditions that are not yet covered by interaction databases. SOX30, a neighboring gene located upstream on the negative strand, was found to have 63 binary interactions on PSICQUIC.