Analytics of Biological Sequence Data

Translational Informatics Management System (TIMS): Towards OMICS based clinical data management for long term curation of clinical studies

With the maturation of sequencing technology over the past decade, the cost of an OMICS-based clinical study is no longer a limiting factor, even for large cohorts such as the UK's 100K genomes project (Samuel & Farsides, 2017). However, the real cost of such a study goes beyond sequencing, or data generation in general (Muir et al., 2016): the raw sequencing data per sample can be sizable and can quickly accumulate into a large collection even for a modest cohort, in contrast to the array-based technology that sequencing has inevitably displaced.

Hypocrisy Around Medical Patient Data: Issues of Access for Biomedical Research, Data Quality, Usefulness for the Purpose and Omics Data as Game Changer

Whether due to simplicity or hypocrisy, the question of access to patient data for biomedical research is widely seen in the public discourse only from the angle of patient privacy. At the same time, the desire to live and to live without disability is of much higher value to the patients. This goal can only be achieved by extracting research insight from patient data in addition to working on model organisms, something that is well understood by many patients.

Molecular mechanism of the Escherichia coli AhpC in the function of a chaperone under heat-shock conditions

Peroxiredoxins (Prxs) are ubiquitous antioxidants that use a reactive cysteine for peroxide reduction and act as molecular chaperones under various stress conditions. Among other stimulating factors, oxidative and heat stress trigger their ATP-independent chaperone function.

Function of a membrane-embedded domain evolutionarily multiplied in the GPI lipid anchor pathway proteins PIG-B, PIG-M, PIG-U, PIG-W, PIG-V, and PIG-Z

Distant homology relationships among proteins with many transmembrane regions (TMs) are difficult to detect as they are clouded by the TMs' hydrophobic compositional bias and mutational divergence in connecting loops. In the case of several GPI lipid anchor biosynthesis pathway components, the hidden evolutionary signal can be revealed with dissectHMMER, a sequence similarity search tool focusing on fold-critical, high complexity sequence segments.
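
The abstract does not describe how dissectHMMER selects its high-complexity segments, so the following is only a hypothetical illustration of the general idea, not the tool's method: a sliding-window Shannon entropy over residue composition, where compositionally biased stretches (for example, hydrophobic runs dominated by a few residue types) score low. The window length of 12 and the 2.2-bit threshold are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def high_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2) -> list[bool]:
    """Mark residues covered by at least one window whose entropy >= threshold.

    Hypothetical sketch: window and threshold are illustrative defaults,
    not parameters taken from dissectHMMER.
    """
    mask = [False] * len(seq)
    # Slide a fixed-length window along the sequence (no windows if seq is shorter).
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) >= threshold:
            # Every residue in a high-entropy window counts as high complexity.
            for i in range(start, start + window):
                mask[i] = True
    return mask


if __name__ == "__main__":
    print(high_complexity_mask("L" * 12 + "ACDEFGHIKLMNPQRSTVWY"))
```

A poly-leucine stretch has an entropy of 0 bits and stays unmasked, while twelve distinct residues reach log2(12) ≈ 3.58 bits; a real search tool would combine such a filter with profile scoring rather than use it alone.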

Finite-size effects in transcript sequencing count distribution: its power-law correction necessarily precedes downstream normalization and comparative analysis

Background: Although earlier work on modelling transcript abundance, from vertebrates to lower eukaryotes, has specifically singled out Zipf's law, the observed distributions often deviate from a single power-law slope. In hindsight, power laws of critical phenomena are derived asymptotically under the assumption of infinite observations, whereas real-world observations are finite. Finite-size effects therefore set in and force a power-law distribution into exponential decay, which manifests as curvature (i.e., a varying exponent) in a log-log plot.
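
As a minimal sketch, assume (this functional form is an assumption, not taken from the paper) that finite-size effects act as an exponential cutoff on the power law, P(k) ∝ k^(-α) · exp(-k/k_c), with k the transcript count, α the power-law exponent and k_c a cutoff scale set by the finite sample. Taking logs gives ln P = -α ln k - k/k_c, so the log-log slope d ln P / d ln k = -α - k/k_c is no longer constant: it stays near -α for k ≪ k_c and steepens as k approaches k_c, which is the curvature described above. The code checks this numerically; function names and parameter values are hypothetical.

```python
import math


def cutoff_power_law(k: float, alpha: float, k_c: float) -> float:
    """Unnormalized P(k) = k**(-alpha) * exp(-k / k_c) (assumed finite-size form)."""
    return k ** (-alpha) * math.exp(-k / k_c)


def local_exponent(k: float, alpha: float, k_c: float, h: float = 1e-4) -> float:
    """Numerical log-log slope d ln P / d ln k via a central difference in ln k."""
    k_hi, k_lo = k * math.exp(h), k * math.exp(-h)
    ln_hi = math.log(cutoff_power_law(k_hi, alpha, k_c))
    ln_lo = math.log(cutoff_power_law(k_lo, alpha, k_c))
    return (ln_hi - ln_lo) / (2 * h)


def analytic_exponent(k: float, alpha: float, k_c: float) -> float:
    """Closed form of the same slope: -alpha - k / k_c."""
    return -alpha - k / k_c


if __name__ == "__main__":
    alpha, k_c = 1.0, 1000.0  # hypothetical parameter values
    for k in (1, 10, 100, 1000):
        # A pure power law would print the same slope (-alpha) on every line.
        print(k, round(local_exponent(k, alpha, k_c), 3), analytic_exponent(k, alpha, k_c))
```

With α = 1 and k_c = 1000, the local exponent drifts from about -1.001 at k = 1 to -2.0 at k = 1000, whereas letting k_c grow without bound recovers a single slope, the pure power-law case.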
