We present a new Matched Interface and Boundary (MIB) regularization method for treating charge singularity in solvated biomolecules whose electrostatics are described by the Poisson–Boltzmann (PB) equation. A regularization method decomposes the potential function into two or three components, so that the singular component can be represented analytically by the Green’s function while the other components possess higher regularity. Our new regularization combines the efficiency of two-component schemes with the accuracy of three-component schemes. Based on this regularization, a new MIB finite difference algorithm is developed for solving both linear and nonlinear PB equations, where the nonlinearity is handled by an inexact Newton method. Compared with the existing MIB PB solver based on a three-component regularization, the present algorithm is simpler to implement because it avoids solving a boundary value Poisson equation inside the molecular interface and computing the related interface jump conditions numerically. Moreover, the new MIB algorithm is computationally less expensive while maintaining the same second-order accuracy. This is verified numerically by calculating the electrostatic potential and solvation energy for the Kirkwood sphere, for which analytical solutions are available, and for a series of proteins of various sizes.
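For orientation, the simplest Kirkwood-sphere benchmark is a single point charge at the centre of a spherical cavity with no mobile ions, for which the solvation energy reduces to the Born formula. The sketch below evaluates that closed form only; it is not the MIB solver. The function name, the default dielectric constants, and the conversion constant are illustrative assumptions and do not come from the paper.

```python
# Assumed conversion constant: e^2 / (4*pi*eps0) in kcal*Angstrom/(mol*e^2).
COULOMB_KCAL = 332.0637


def born_solvation_energy(charge, radius, eps_in=1.0, eps_out=80.0):
    """Solvation energy (kcal/mol) of a charge centred in a sphere.

    charge  : charge in units of the elementary charge
    radius  : sphere radius in Angstrom
    eps_in  : dielectric constant inside the sphere (assumed default)
    eps_out : dielectric constant of the solvent (assumed default)
    Valid only for a centred charge and zero ionic strength.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    return 0.5 * COULOMB_KCAL * charge ** 2 / radius * (1.0 / eps_out - 1.0 / eps_in)


if __name__ == "__main__":
    # Unit charge in a sphere of radius 2 Angstrom.
    print(born_solvation_energy(1.0, 2.0))
```

A numerical PB solver can be checked against this value on a sequence of refined grids; for a second-order scheme, the error would be expected to fall by roughly a factor of four each time the grid spacing is halved.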
Shenggao Zhou (Soochow University); R. G. Weiss (ETH Zurich); Li-Tien Cheng (University of California, San Diego); Joachim Dzubiella (University of Freiburg); J. Andrew McCammon (University of California, San Diego); Bo Li (University of California, San Diego)
Numerical Analysis and Scientific Computing; Data Analysis, Bio-Statistics, Bio-Mathematics; mathscidoc:2005.25001
Proceedings of the National Academy of Sciences of the United States of America, 116, (30), 14989–14994, 2019.7
Ligand-receptor binding and unbinding are fundamental biomolecular processes and particularly essential to drug efficacy. Environmental water fluctuations, however, impact the corresponding thermodynamics and kinetics and thereby challenge theoretical descriptions. Here, we devise a holistic, implicit-solvent, multi-method approach to predict the (un)binding kinetics for a generic ligand-pocket model. We use the variational implicit-solvent model (VISM) to calculate the solute-solvent interfacial structures and the corresponding free energies, and combine the VISM with the string method to obtain the minimum energy paths and transition states between the various metastable (“dry” and “wet”) hydration states. The resulting dry-wet transition rates are then used in spatially dependent, multi-state continuous-time Markov chain Brownian dynamics simulations of the ligand's stochastic motion, together with the related Fokker–Planck equation calculations, which provide the mean first-passage times for binding and unbinding. We find that the hydration transitions significantly slow down the binding process, in semi-quantitative agreement with existing explicit-water simulations, but significantly accelerate the unbinding process. Moreover, our methods allow the characterization of non-equilibrium hydration states of pocket and ligand during the ligand movement, for which we find substantial memory and hysteresis effects for binding versus unbinding. Our study thus provides a significant step towards efficient, physics-based interpretation and prediction of the complex kinetics in realistic ligand-receptor systems.
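The notion of a mean first-passage time can be illustrated on a small discrete-state continuous-time Markov chain. The sketch below is a strong simplification of the approach in the abstract, which couples hydration-state switching to Brownian motion in continuous space. The three-state chain, its rates, and the function name are hypothetical. For a generator matrix Q restricted to the non-target states, the mean first-passage times τ satisfy Q_TT τ = −1.

```python
import numpy as np


def mean_first_passage_times(Q, targets):
    """Mean first-passage times to the target states of a CTMC.

    Q       : (n, n) generator matrix, off-diagonal entries are rates,
              each row sums to zero
    targets : iterable of target (absorbing) state indices
    Returns an array of length n; target states have time 0.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    target_set = set(targets)
    transient = [i for i in range(n) if i not in target_set]
    # Restrict the generator to the transient states and solve Q_TT tau = -1.
    Q_tt = Q[np.ix_(transient, transient)]
    tau = np.linalg.solve(Q_tt, -np.ones(len(transient)))
    times = np.zeros(n)
    times[transient] = tau
    return times


if __name__ == "__main__":
    # Hypothetical chain: 0 <-> 1 -> 2, with state 2 as the "bound" target.
    Q = [[-2.0, 2.0, 0.0],
         [1.0, -1.5, 0.5],
         [0.0, 0.0, 0.0]]
    print(mean_first_passage_times(Q, targets=[2]))
```

For this hypothetical chain the mean first-passage times to state 2 are 3.5 time units from state 0 and 3.0 from state 1, which can be confirmed by hand from the two linear equations.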
Many cellular processes are governed by stochastic reaction events. These events do not necessarily occur in single steps of individual molecules, and, conversely, each birth or death of a macromolecule (e.g., protein) could involve several small reaction steps, creating a memory between individual events and thus leading to non-Markovian reaction kinetics. Characterizing these kinetics is challenging. Here, we develop a systematic approach for a general reaction network with arbitrary intrinsic waiting-time distributions, which includes the stationary generalized chemical-master equation (sgCME), the stationary generalized Fokker–Planck equation, and the generalized linear-noise approximation. The first formulation converts a non-Markovian problem into a Markovian one by introducing effective transition rates (that explicitly decode the effect of molecular memory) for the reactions in an equivalent reaction network with the same substrates but without molecular memory. Non-Markovian features of the reaction kinetics can be revealed by solving the sgCME. The latter two formulations can be used for the fast evaluation of fluctuations. These formulations can have broad applications, and, in particular, they may help us discover new biological knowledge underlying memory effects. When they are applied to generalized stochastic models of gene-expression regulation, we find that molecular memory is in effect equivalent to a feedback and can induce bimodality, fine-tune the expression noise, and induce switching.
Yujie Ye (Department of Biochemistry and Cellular and Molecular Biology, The University of Tennessee, Knoxville, Tennessee, United States of America); Xin Kang (Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China); Jordan Bailey (Department of Biochemistry and Cellular and Molecular Biology, The University of Tennessee, Knoxville, Tennessee, United States of America); Chunhe Li (Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China); Tian Hong (Department of Biochemistry and Cellular and Molecular Biology, The University of Tennessee, Knoxville, Tennessee, United States of America)
Multistep cell fate transitions with stepwise changes of transcriptional profiles are common to many developmental, regenerative and pathological processes. The multiple intermediate cell lineage states can serve as differentiation checkpoints or as branching points for channeling cells to more than one lineage. However, the mechanisms underlying these transitions remain elusive. Here, we explored gene regulatory circuits that can generate multiple intermediate cellular states with stepwise modulations of transcription factors. With unbiased searching in the network topology space, we found a motif family containing a large set of networks that can give rise to four attractors with stepwise regulation of transcription factors, which limits the reversibility of three consecutive steps of the lineage transition. We found that these motifs are enriched in a transcriptional network controlling early T cell development, and a mathematical model based on this network recapitulates multistep transitions in early T cell lineage commitment. By calculating the energy landscape and minimum action paths for the T cell model, we quantified the stochastic dynamics of the critical factors in response to the differentiation signal with fluctuations. These results are in good agreement with experimental observations, and they suggest that the intermediate states in T cell differentiation are stable. These dynamical features may help to direct the cells to the correct lineages during development. Our findings provide general design principles for multistep cell lineage transitions and new insights into early T cell development. The network motifs, which comprise a large family of topologies, can be useful for analyzing diverse biological systems with multistep transitions.
HIV-1, the most common and pathogenic strain of human immunodeficiency virus, consists of many subtypes. To study the differences among HIV-1 subtypes in infection, diagnosis and drug design, it is important to identify HIV-1 subtypes from clinical HIV-1 samples. In this work, we propose an effective numeric representation called Subsequence Natural Vector (SNV) to encode HIV-1 sequences. Using this representation, we introduce an improved linear discriminant analysis method to classify HIV-1 viruses correctly. SNV is based on the distribution of nucleotides in HIV-1 viral sequences. It not only counts the nucleotides, but also describes their position and variance in viruses. To validate our alignment-free method, 6902 complete genomes and 11,668 pol gene sequences of HIV-1 subtypes were collected from the up-to-date Los Alamos HIV database. SNV outperforms three popular methods, Kameris, Comet and REGA, with almost 100% sensitivity and specificity, and requires much less time. Our subtyping algorithm works especially well for circulating recombinant forms (CRFs) consisting of a few sequences. Our approach also separates unique recombinant forms (URFs) from other subtypes with 100% sensitivity and specificity. Moreover, phylogenetic trees based on the SNV representation are constructed using full-length HIV-1 genomes and pol genes respectively, where viruses from the same subtype are clustered together correctly.
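The three ingredients named above (count, position and variance of each nucleotide) can be illustrated with the classical 12-dimensional natural vector. The sketch below is an assumption about the building block only: it does not reproduce how SNV splits a sequence into subsequences, nor the improved discriminant analysis. The function name and the normalisation of the second moment by n_k · n follow the commonly used natural-vector definition rather than anything stated in the abstract.

```python
def natural_vector(seq, alphabet="ACGT"):
    """Return counts, mean positions and scaled second central moments.

    For each nucleotide k with n_k occurrences at 1-based positions p_i in a
    sequence of length n:
        mu_k = sum(p_i) / n_k
        D2_k = sum((p_i - mu_k)**2) / (n_k * n)
    Nucleotides that never occur contribute zeros.
    Output order: four counts, four means, four second moments.
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in alphabet:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(positions) / n_k
        d2 = sum((p - mu) ** 2 for p in positions) / (n_k * n)
        counts.append(n_k)
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


if __name__ == "__main__":
    print(natural_vector("AACA"))
```

Because every sequence maps to a vector of the same length regardless of its own length, sequences can be compared or classified without alignment.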
Ting-Li Chen (Institute of Statistical Science, Academia Sinica); Dai-Ni Hsieh (Institute of Statistical Science, Academia Sinica); Hung Hung (Institute of Epidemiology and Preventive Medicine); I-Ping Tu (Institute of Statistical Science, Academia Sinica); Pei-Shien Wu (Dept. of Biostatistics, Duke University); Yi-Ming Wu (Institute of Chemistry, Academia Sinica); Wei-Hau Chang (Institute of Chemistry, Academia Sinica); Su-Yun Huang (Institute of Statistical Science, Academia Sinica)
Statistics Theory and Methods; Data Analysis, Bio-Statistics, Bio-Mathematics; mathscidoc:2004.33002
The Annals of Applied Statistics, 8, (1), 259-285, 2014
Cryo-electron microscopy (cryo-EM) has recently emerged as a powerful tool for obtaining three-dimensional (3D) structures of biological macromolecules in native states. A minimum cryo-EM image data set for deriving a meaningful reconstruction comprises thousands of randomly oriented projections of identical particles photographed with a small number of electrons. The computation of 3D structure from 2D projections requires clustering, which aims to enhance the signal-to-noise ratio in each view by grouping similarly oriented images. Nevertheless, the prevailing clustering techniques are often compromised by three characteristics of cryo-EM data: high noise content, high dimensionality and a large number of clusters. Moreover, since clustering requires registering images of similar orientation into the same pixel coordinates by 2D alignment, it is desirable that the clustering algorithm can label misaligned images as outliers. Herein, we introduce a clustering algorithm, γ-SUP, which models the data with a q-Gaussian mixture, adopts the minimum γ-divergence for estimation, and uses a self-updating procedure to obtain the numerical solution. We apply γ-SUP to the cryo-EM images of two benchmark macromolecules, RNA polymerase II and ribosome. In the former case, simulated images were chosen to decouple clustering from alignment to demonstrate that γ-SUP is more robust to misalignment outliers than the existing clustering methods used in the cryo-EM community. In the latter case, the clustering of real cryo-EM data by our γ-SUP method eliminates noise in many views to reveal true structure features of ribosome at the projection
Classification of DNA sequences is an important issue in bioinformatics, yet most existing methods for phylogenetic analysis, including Multiple Sequence Alignment (MSA), are time-consuming and computationally expensive. Alignment-free methods are popular nowadays, but the manual intervention they require usually decreases accuracy. In addition, most methods neglect the interactions among nucleotides. Here we propose a new Accumulated Natural Vector (ANV) method, which represents each DNA sequence by a point in R^18. By calculating the Accumulated Indicator Functions of the nucleotides, we obtain an Accumulated Natural Vector for each sequence. This vector not only captures the distribution of each nucleotide, but also provides the covariance among nucleotides. Thus global comparison of DNA sequences or genomes can be done easily in R^18. Tests of ANV on datasets of different sizes and types demonstrate the accuracy and time-efficiency of the proposed method.
Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods have been impractical to use because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one correspondence between a DNA sequence and its complete central moment vector of the cumulative Fourier power and phase spectra. In addition, the covariance between the four nucleotides in the power and phase spectra is included. We use the cumulative Fourier power and phase spectra to define a 28-dimensional vector for each DNA sequence. Euclidean distances between the vectors measure the dissimilarity between DNA sequences. We test the method on datasets of different sizes and types, including simulated DNA sequences, exon-intron sequences and complete genomes. The results show that our method is more accurate and efficient for hierarchical clustering than other alignment-free methods and MSA methods.
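The first ingredient of such a representation, the cumulative Fourier power spectrum of each nucleotide's indicator sequence, can be sketched as follows. This is an illustration under assumptions, not the authors' 28-dimensional vector: the phase spectra, the central moments and the covariance terms are not reproduced, the function name is hypothetical, and including the zero-frequency term in the cumulative sum is a choice made here for simplicity.

```python
import numpy as np


def cumulative_power_spectra(seq, alphabet="ACGT"):
    """Cumulative Fourier power spectrum of each nucleotide indicator sequence.

    For nucleotide k, the indicator u_k[i] is 1 where seq[i] == k, else 0.
    The power spectrum is |DFT(u_k)|**2, and the cumulative spectrum is its
    running sum over frequencies (zero-frequency term included; assumed).
    Returns a dict mapping each nucleotide to a 1-D NumPy array.
    """
    seq = seq.upper()
    spectra = {}
    for base in alphabet:
        indicator = np.array([1.0 if ch == base else 0.0 for ch in seq])
        power = np.abs(np.fft.fft(indicator)) ** 2
        spectra[base] = np.cumsum(power)
    return spectra


if __name__ == "__main__":
    for base, spectrum in cumulative_power_spectra("ACGTAC").items():
        print(base, spectrum)
```

By Parseval's theorem the last entry of each cumulative spectrum equals the sequence length times the nucleotide count, which gives a quick sanity check. Sequences of different lengths yield spectra of different lengths, which is why a fixed-length summary such as a moment vector is needed before Euclidean distances can be taken.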
Next-generation sequencing technology enables the routine detection of bacterial pathogens for clinical diagnostics and genetic research. Whole-genome sequencing has been important in the epidemiologic analysis of bacterial pathogens. However, few whole-genome sequencing-based genotyping pipelines are available for practical applications. Here, we present a whole-genome sequencing-based single nucleotide polymorphism (SNP) genotyping method and apply it to the evolutionary analysis of methicillin-resistant Staphylococcus aureus. The SNP genotyping method calls genome variants using next-generation sequencing reads of whole genomes and calculates the pairwise Jaccard distances of the genome variants. The method may reveal the high-resolution whole-genome SNP profiles and the structural variants of different isolates of methicillin-resistant S. aureus (MRSA) and methicillin-susceptible S. aureus (MSSA) strains. The phylogenetic analysis of whole genomes and particular regions may monitor and track the evolution and the transmission dynamics of bacterial pathogens. The computer programs of the whole-genome sequencing-based SNP genotyping method are available to the public at https://github. com/