Tantoso E

About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature

Although Escherichia coli (E. coli) is the most studied prokaryotic organism in the history of life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims to quantify how well the scientific literature illuminates the E. coli gene function space and how close we are to the goal of a complete list of E. coli gene functions.


To kill or to be killed: pangenome analysis of Escherichia coli strains reveals a tailocin specific for pandemic ST131

Escherichia coli is one of the best-known commensal Gram-negative bacteria and is commonly associated with the gut microbiome. Since it was first identified in 1885, it has been widely studied as a model organism in the laboratory. However, recent findings have shown not only the versatility of E. coli living in different ecological niches but also the diversity of its genotypes, including strains that are pathogenic to animals and humans [1, 2].


Translational Informatics Management System (TIMS): Towards OMICS based clinical data management for long term curation of clinical studies

With the maturation of sequencing technology over the past decade, the cost associated with an OMICS-based clinical study is no longer a limiting factor even for large cohorts, e.g., the UK’s 100K genomes project (Samuel & Farsides, 2017). However, the real cost of such a study goes beyond sequencing or data generation in general (Muir et al., 2016); the raw sequencing data per sample can be sizable and quickly accumulates into a large collection even for a modest cohort, in contrast to the array-based technology that sequencing has displaced.


Hypocrisy Around Medical Patient Data: Issues of Access for Biomedical Research, Data Quality, Usefulness for the Purpose and Omics Data as Game Changer

Whether due to simplicity or hypocrisy, the question of access to patient data for biomedical research is widely seen in the public discourse only from the angle of patient privacy. At the same time, the desire to live and to live without disability is of much higher value to the patients. This goal can only be achieved by extracting research insight from patient data in addition to working on model organisms, something that is well understood by many patients.


Finite-size effects in transcript sequencing count distribution: its power-law correction necessarily precedes downstream normalization and comparative analysis

Background: Though earlier works on modelling transcript abundance from vertebrates to lower eukaryotes have specifically singled out Zipf’s law, the observed distributions often deviate from a single power-law slope. In hindsight, while power laws of critical phenomena are derived asymptotically under the condition of infinite observations, real-world observations are finite, so finite-size effects set in and force a power-law distribution into an exponential decay, which manifests as curvature (i.e., varying exponent values) in a log-log plot.
