Sinha S

About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature

Although Escherichia coli (E. coli) is the most studied prokaryotic organism in the history of life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims to quantify how well the scientific literature illuminates the E. coli gene function space and how close we are to the goal of a complete list of E. coli gene functions.


Conserved sequence motifs in human TMTC1, TMTC2, TMTC3, and TMTC4, new O-mannosyltransferases from the GT-C/PMT clan, are rationalized as ligand binding sites

The human proteins TMTC1, TMTC2, TMTC3 and TMTC4 have been experimentally shown to be components of a new O-mannosylation pathway. Their own mannosyltransferase activity has been suspected, but their actual enzymatic potential has not yet been demonstrated. So far, sequence analysis of TMTCs has been compromised by evolutionary sequence divergence within their membrane-embedded N-terminal region, sequence inaccuracies in the protein databases, and the difficulty of interpreting the large functional variety of known homologous proteins (mostly sugar transferases, some with known 3D structure).


Structural modelling of the lumenal domain of human GPAA1, the metallo-peptide synthetase subunit of the transamidase complex, reveals zinc-binding mode and two flaps surrounding the active site

The transamidase complex is a molecular machine in the endoplasmic reticulum of eukaryotes that attaches a glycosylphosphatidylinositol (GPI) lipid anchor to substrate proteins after cleaving a C-terminal propeptide with a defined sequence signal. Its five subunits are very hydrophobic; thus, solubility, heterologous expression and complex reconstruction are difficult.


Hypocrisy Around Medical Patient Data: Issues of Access for Biomedical Research, Data Quality, Usefulness for the Purpose and Omics Data as Game Changer

Whether due to simplicity or hypocrisy, the question of access to patient data for biomedical research is widely seen in the public discourse only from the angle of patient privacy. At the same time, the desire to live and to live without disability is of much higher value to the patients. This goal can only be achieved by extracting research insight from patient data in addition to working on model organisms, something that is well understood by many patients.


Genomics-driven discovery of a biosynthetic gene cluster required for the synthesis of BII-Rafflesfungin from the fungus Phoma sp. F3723

BACKGROUND: Phomafungin is a recently reported broad spectrum antifungal compound but its biosynthetic pathway is unknown. We combed publicly available Phoma genomes but failed to find any putative biosynthetic gene cluster that could account for its biosynthesis.


Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000

Mentions of gene names in the body of the scientific literature from 1901 to 2017, counted fractionally, were used as a proxy to assess the level of biological function discovery. We define a literature score of one as a full publication equivalent (FPE), the amount of literature necessary to achieve one publication solely dedicated to a gene. We find that fewer than 5000 human genes each have at least 100 FPEs in the available literature corpus.
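The fractional counting behind the FPE score can be illustrated with a minimal Python sketch. The abstract does not state the exact weighting scheme, so the sketch assumes that each publication contributes a total score of one, split equally among the distinct genes it mentions. The function names `fpe_scores` and `genes_at_least` and the input layout (one list of gene names per publication) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fpe_scores(papers):
    """Return a gene -> literature score mapping in full publication equivalents.

    `papers` is an iterable of gene-name lists, one list per publication.
    Assumption (not stated in the abstract): every publication carries a
    total weight of one, shared equally by the distinct genes it mentions.
    """
    scores = defaultdict(float)
    for genes in papers:
        unique_genes = set(genes)  # count each gene once per publication
        if not unique_genes:
            continue  # a publication mentioning no gene contributes nothing
        share = 1.0 / len(unique_genes)
        for gene in unique_genes:
            scores[gene] += share
    return dict(scores)


def genes_at_least(scores, threshold):
    """Return the sorted names of genes whose score reaches `threshold`."""
    return sorted(gene for gene, score in scores.items() if score >= threshold)
```

Under this assumption, a gene mentioned alone in one paper, together with one other gene in a second paper, and together with three other genes in a third paper would accumulate 1 + 1/2 + 1/4 = 1.75 FPEs. Calling `genes_at_least(scores, 100)` would then list the genes that reach the 100-FPE mark discussed above.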


Function of a membrane-embedded domain evolutionarily multiplied in the GPI lipid anchor pathway proteins PIG-B, PIG-M, PIG-U, PIG-W, PIG-V, and PIG-Z

Distant homology relationships among proteins with many transmembrane regions (TMs) are difficult to detect as they are clouded by the TMs' hydrophobic compositional bias and mutational divergence in connecting loops. In the case of several GPI lipid anchor biosynthesis pathway components, the hidden evolutionary signal can be revealed with dissectHMMER, a sequence similarity search tool focusing on fold-critical, high complexity sequence segments.
