The HIV polyprotein Gag is increasingly recognized as a contributor to protease inhibitor resistance. Despite its roles in viral maturation and in the development of drug resistance, gaps remain in our knowledge of certain Gag subunits (e.g. p6) and of the contribution of non-cleavage mutations to drug resistance. Because p6 is flexible, it is difficult to characterize in structural experiments and is therefore often omitted from experimental structural studies of Gag.
Ligand-binding pockets in proteins contain water molecules, which play important roles in modulating protein-ligand interactions. Available crystallographic data for the 5′ mRNA cap-binding pocket of the translation initiation factor eIF4E show several structurally conserved waters, which also persist in molecular dynamics simulations. These waters participate in an intricate hydrogen-bond network between the cap and the protein.
Aggregation is an irreversible form of protein complexation and is often toxic to cells. The process entails partial or major unfolding that is largely driven by hydration. We model the role of hydration in aggregation using ‘dehydrons’: unsatisfied backbone hydrogen bonds in proteins that seek shielding from water molecules by associating with ligands or other proteins. We find that the residues at aggregation interfaces have hydrated backbones and, in contrast to residues involved in other forms of protein-protein interaction, are under less evolutionary pressure to be conserved.
De novo clustering is a popular technique to perform taxonomic profiling of a microbial community by grouping 16S rRNA amplicon reads into operational taxonomic units (OTUs). In this work, we introduce a new dendrogram-based OTU clustering pipeline called CRiSPy. The key idea used in CRiSPy to improve clustering accuracy is the application of an anomaly detection technique to obtain a dynamic distance cutoff instead of using the de facto value of 97 percent sequence similarity as in most existing OTU
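The idea of replacing a fixed cutoff with a data-driven one can be illustrated with a minimal sketch. It is not CRiSPy's implementation: the abstract does not name the anomaly detection technique, so the sketch assumes a generic median-absolute-deviation (MAD) outlier rule applied to the merge heights of an average-linkage dendrogram. The function names, the toy distance matrix, the multiplier `k` and the 0.03 fallback (the conventional 97 percent similarity) are all illustrative assumptions.

```python
from statistics import median


def agglomerate(dist, cutoff=float("inf")):
    """Average-linkage clustering that stops once the next merge exceeds `cutoff`.

    Returns the merge heights (dendrogram levels) and the resulting clusters.
    """
    clusters = [[i] for i in range(len(dist))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)  # average-linkage distance
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cutoff:
            break
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return heights, clusters


def dynamic_cutoff(heights, k=3.0, fallback=0.03):
    """Pick a cutoff just below the first anomalously large merge height.

    Assumed rule (not taken from the source): a height is anomalous if it
    exceeds median + k * 1.4826 * MAD of all merge heights.
    """
    med = median(heights)
    mad = median([abs(h - med) for h in heights])
    limit = med + k * 1.4826 * mad
    for prev, cur in zip(heights, heights[1:]):
        if cur > limit:
            return (prev + cur) / 2  # midpoint between normal and anomalous
    return fallback  # no anomaly found: fall back to the fixed 97% threshold


# Toy pairwise distances (1 - sequence similarity) for six hypothetical reads.
D = [
    [0.00, 0.01, 0.02, 0.20, 0.20, 0.20],
    [0.01, 0.00, 0.02, 0.20, 0.20, 0.20],
    [0.02, 0.02, 0.00, 0.20, 0.20, 0.20],
    [0.20, 0.20, 0.20, 0.00, 0.01, 0.02],
    [0.20, 0.20, 0.20, 0.01, 0.00, 0.02],
    [0.20, 0.20, 0.20, 0.02, 0.02, 0.00],
]

all_heights, _ = agglomerate(D)        # full dendrogram
cutoff = dynamic_cutoff(all_heights)   # data-driven distance cutoff
_, otus = agglomerate(D, cutoff)       # cut the dendrogram at that level
print(otus)  # [[0, 1, 2], [3, 4, 5]]
```

In this toy case the chosen cutoff falls between the within-group distances (at most 0.02) and the between-group distance (0.20), so the six reads separate into two OTUs. The cubic-time pure-Python loop is for illustration only and would not scale to real amplicon datasets.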
An R-loop is a structure formed co-transcriptionally between the nascent RNA transcript and the DNA template, leaving the non-transcribed DNA strand unpaired. This structure can be involved in hypermutation and dsDNA breaks in mammalian immunoglobulin (Ig) genes, oncogenes and neurodegenerative disease-related genes. R-loops have not yet been studied at the genome scale. To identify R-loops, we developed a computational algorithm and mapped R-loop forming sequences (RLFS) onto 66 803 sequences defined by UCSC as ‘known’ genes.