Rui Dong (Yau Mathematical Sciences Center, Tsinghua University, Beijing, China; Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, China); Taojun Hu (Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China); Yunjun Zhang (Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China); Yang Li (Chongqing School, University of Chinese Academy of Sciences, Chongqing 400020, China); Xiao-Hua Zhou (Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China; Beijing International Center for Mathematical Research, Peking University, Beijing 100191, China)
Data Analysis, Bio-Statistics, Bio-Mathematics mathscidoc:2204.42002
Omicron, the latest SARS-CoV-2 Variant of Concern (VOC), first appeared in Africa in November 2021. Whether a new VOC will out-compete the currently predominant variant is an important question for governments seeking to determine whether current surveillance strategies and responses are appropriate and reasonable. Based on both virus genomes and daily confirmed cases, we compare the additive differences in growth rates and reproductive numbers (R_0) between VOCs and their predominant variants through a Bayesian framework and phylodynamic analysis. We also evaluate the effects of current policies and vaccinations against VOCs and predominant variants. The model further predicts the date on which a VOC may become dominant, based on simulation and real data from the early stage. The results suggest that the overall additive difference in growth rates between B.1.617.2 and the predominant variants was 0.44 (95% confidence interval [CI]: −0.38, 1.25) in February 2021, and that the VOC had a relatively high R_0. The additive difference in the growth rate of BA.1 in the United Kingdom was 6.82 times the difference between Delta and Alpha, and the model successfully predicted the rise to dominance of Alpha, Delta and Omicron. Current vaccination strategies remain similarly effective against Delta as against the previous variants. We propose a reliable Bayesian framework that predicts the spread trends of VOCs from early-stage data and evaluates the effects of public health policies, which may help us better prepare for the Omicron variant, which is now spreading at an unprecedented speed.
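The link between an additive growth-rate difference and the date of dominance can be illustrated with a deterministic sketch. It assumes that both variants grow exponentially, so the logit of the VOC share rises linearly with slope equal to the growth-rate difference. This is a textbook simplification and not the Bayesian framework of the paper. The function names and the numbers in the usage note are hypothetical.

```python
import math


def voc_share(p0, delta, t):
    """VOC share at time t, given initial share p0 and growth-rate advantage delta.

    Assumes exponential growth of both variants, so that
    logit(p(t)) = logit(p0) + delta * t.
    """
    logit0 = math.log(p0 / (1.0 - p0))
    return 1.0 / (1.0 + math.exp(-(logit0 + delta * t)))


def time_to_dominance(p0, delta, threshold=0.5):
    """Time until the VOC share reaches `threshold` (same time unit as 1/delta)."""
    if not 0.0 < p0 < 1.0 or not 0.0 < threshold < 1.0:
        raise ValueError("p0 and threshold must lie strictly between 0 and 1")
    if p0 >= threshold:
        return 0.0  # already dominant
    if delta <= 0.0:
        return math.inf  # no growth advantage: the threshold is never reached
    logit = lambda p: math.log(p / (1.0 - p))
    return (logit(threshold) - logit(p0)) / delta
```

For example, with a hypothetical initial share of 1% and a hypothetical advantage of 0.1 per day, `time_to_dominance(0.01, 0.1)` gives ln(99)/0.1, roughly 46 days.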
Forest above-ground biomass (AGB) can be estimated based on light detection and ranging (LiDAR) point clouds. This paper introduces an accurate and detailed quantitative structure model (AdQSM), which can estimate the AGB of large tropical trees. AdQSM is based on the reconstruction of 3D tree models from terrestrial laser scanning (TLS) point clouds. It represents a tree as a set of closed and complete convex polyhedra. We use AdQSM to model 29 trees of 18 species scanned by TLS at three study sites (the dense tropical forests of Peru, Indonesia, and Guyana). Geometry measurements of the destructively sampled trees are used as reference values to evaluate the accuracy of diameter at breast height (DBH), tree height, tree volume, branch volume, and AGB estimated from AdQSM. After AdQSM reconstructs the structure and volume of each tree, AGB is derived by combining the volume with the species-specific wood density obtained from destructive sampling. The AGB estimates from AdQSM and the post-harvest reference measurements show satisfactory agreement: the coefficient of variation of root mean square error (CV-RMSE) and the concordance correlation coefficient (CCC) are 20.37% and 0.97, respectively. AdQSM provides accurate tree volume estimation, regardless of the characteristics of the tree structure, without major systematic deviations. We compared the accuracy of AdQSM and TreeQSM in modeling the volume of the 29 trees. For the tree volume from AdQSM against the reference values, the determination coefficient (R²), relative bias (rBias), and CV-RMSE are 0.96, 6.98%, and 22.62%, respectively. For the tree volume from TreeQSM, the R², rBias, and CV-RMSE are 0.94, −9.69%, and 23.20%, respectively. The CCCs between the reference values and the volume estimates from AdQSM and TreeQSM are 0.97 and 0.96, respectively. AdQSM also models the branches in detail. The branch volume from AdQSM is compared with the destructive measurement reference data: the R², rBias, and CV-RMSE of the branch volume are 0.97, 12.38%, and 36.86%, respectively. The DBH and height of the harvested trees were used as reference values to test the accuracy of AdQSM's estimation of DBH and tree height. The R², rBias, and CV-RMSE of DBH are 0.94, −5.01%, and 9.06%, respectively. The R², rBias, and CV-RMSE of tree height are 0.95, 1.88%, and 5.79%, respectively. This paper provides not only a new QSM method for estimating AGB based on TLS point clouds but also the potential for further development and testing of allometric equations.
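The agreement metrics quoted above can be sketched as follows. The abstract does not spell out its formulas, so the definitions here are assumptions: rBias and CV-RMSE are normalized by the mean reference value, and CCC is Lin's concordance correlation coefficient with population variances. The function names and the numbers in the usage note are illustrative and are not data from the study.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rbias_percent(est, ref):
    """Relative bias: mean(est - ref) as a percentage of mean(ref)."""
    return 100.0 * _mean([e - r for e, r in zip(est, ref)]) / _mean(ref)


def cv_rmse_percent(est, ref):
    """Root mean square error as a percentage of mean(ref)."""
    rmse = math.sqrt(_mean([(e - r) ** 2 for e, r in zip(est, ref)]))
    return 100.0 * rmse / _mean(ref)


def ccc(est, ref):
    """Lin's concordance correlation coefficient (population variances)."""
    m_e, m_r = _mean(est), _mean(ref)
    var_e = _mean([(e - m_e) ** 2 for e in est])
    var_r = _mean([(r - m_r) ** 2 for r in ref])
    cov = _mean([(e - m_e) * (r - m_r) for e, r in zip(est, ref)])
    return 2.0 * cov / (var_e + var_r + (m_e - m_r) ** 2)
```

With made-up reference values `[1, 2, 3, 4]` and estimates `[2, 3, 4, 5]`, the constant offset of 1 gives an rBias and a CV-RMSE of 40% each. The CCC is 2.5/3.5, about 0.71, even though the Pearson correlation would be exactly 1. This is why CCC is reported alongside correlation-type measures.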
We propose to combine cepstrum and nonlinear time–frequency (TF) analysis to study multiple-component oscillatory signals with time-varying frequency and amplitude and with time-varying non-sinusoidal oscillatory patterns. The concept of cepstrum is applied to eliminate the influence of the wave-shape function on the TF analysis, and we propose a new algorithm, named the de-shape synchrosqueezing transform (de-shape SST). A mathematical model, the adaptive non-harmonic model, is introduced, and the de-shape SST algorithm is theoretically analyzed. In addition to simulated signals, several different physiological, musical and biological signals are analyzed to illustrate the proposed algorithm.
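The cepstrum ingredient can be illustrated on its own with a small sketch. It computes the real cepstrum (inverse DFT of the log-magnitude spectrum) of a non-sinusoidal periodic signal and reads the period off the quefrency axis. This is not the de-shape SST algorithm, which works with short-time versions and synchrosqueezing. The naive O(n²) DFT, the regularizer `eps`, and the function names are assumptions made to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for short signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def real_cepstrum(x, eps=1e-3):
    """Inverse DFT of log(|DFT(x)| + eps); eps avoids log(0) at empty bins."""
    log_mag = [math.log(abs(c) + eps) for c in dft(x)]
    return [c.real for c in dft(log_mag, inverse=True)]


def dominant_period(x, min_q, max_q):
    """Quefrency (in samples) of the largest cepstral peak within [min_q, max_q]."""
    ceps = real_cepstrum(x)
    return max(range(min_q, max_q + 1), key=lambda q: ceps[q])


# A pulse train with period 16 samples: strongly non-sinusoidal, so its
# spectrum contains the fundamental and all of its harmonics.
signal = [1.0 if t % 16 == 0 else 0.0 for t in range(128)]
period = dominant_period(signal, min_q=4, max_q=24)
```

Here `period` comes out as 16. The harmonics that clutter the ordinary spectrum collapse into a single cepstral peak at the period, and that is the property the wave-shape removal relies on. The search window is restricted because the cepstrum also peaks at integer multiples of the period.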
Xiaojie Qiu (Whitehead Institute for Biomedical Research, Cambridge, MA, USA); Yan Zhang (Department of Computational and System Biology, University of Pittsburgh, Pittsburgh, PA, USA); Jorge D. Martin-Rufino (Broad Institute of MIT and Harvard, Cambridge, MA, USA); Chen Weng (Whitehead Institute for Biomedical Research, Cambridge, MA, USA); Shayan Hosseinzadeh (Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA); Jianhua Xing (Department of Computational and System Biology, University of Pittsburgh, Pittsburgh, PA, USA); Jonathan Weissman (Whitehead Institute for Biomedical Research, Cambridge, MA, USA)
Data Analysis, Bio-Statistics, Bio-Mathematics mathscidoc:2202.42001
Single-cell (sc)RNA-seq, together with RNA velocity and metabolic labeling, reveals cellular states and transitions at unprecedented resolution. Fully exploiting these data, however, requires kinetic models capable of unveiling governing regulatory functions. Here, we introduce an analytical framework dynamo (https://github.com/aristoteleo/dynamo-release), which infers absolute RNA velocity, reconstructs continuous vector fields that predict cell fates, employs differential geometry to extract underlying regulations, and ultimately predicts optimal reprogramming paths and perturbation outcomes. We highlight dynamo’s power to overcome fundamental limitations of conventional splicing-based RNA velocity analyses to enable accurate velocity estimations on a metabolically labeled human hematopoiesis scRNA-seq dataset. Furthermore, differential geometry analyses reveal mechanisms driving early megakaryocyte appearance and elucidate asymmetrical regulation within the PU.1-GATA1 circuit. Leveraging the least-action-path method, dynamo accurately predicts drivers of numerous hematopoietic transitions. Finally, in silico perturbations predict cell-fate diversions induced by gene perturbations. Dynamo, thus, represents an important step in advancing quantitative and predictive theories of cell-state transitions.
Data-based detection and quantification of causation in complex, nonlinear dynamical systems is of paramount importance to science, engineering, and beyond. Inspired by cross-map-based techniques, a methodology widely used in recent years, we develop a general framework to advance towards a comprehensive understanding of dynamical causal mechanisms, one that is consistent with the natural interpretation of causality. In particular, instead of measuring the smoothness of the cross-map as conventionally implemented, we define causation by directly measuring the scaling law for the continuity of the investigated dynamical system. The uncovered scaling law enables accurate, reliable, and efficient detection of causation and assessment of its strength in general complex dynamical systems, outperforming existing representative methods. The continuity scaling-based framework is rigorously established and demonstrated using datasets from model complex systems and the real world.
Persistent homology is constrained to purely topological persistence, while multiscale graphs account only for geometric information. This work introduces persistent spectral theory to create a unified low-dimensional multiscale paradigm for revealing topological persistence and extracting geometric shapes from high-dimensional datasets. For a point-cloud dataset, a filtration procedure is used to generate a sequence of chain complexes and associated families of simplicial complexes and chains, from which we construct persistent combinatorial Laplacian matrices. We show that a full set of topological persistence can be completely recovered from the harmonic persistent spectra, that is, the spectra that have zero eigenvalues, of the persistent combinatorial Laplacian matrices. However, non-harmonic spectra of the Laplacian matrices induced by the filtration offer another powerful tool for data analysis, modeling, and prediction. In this work, fullerene stability is predicted by using both harmonic spectra and non-harmonic persistent spectra, while the latter spectra are successfully devised to analyze the structure of fullerenes and model protein flexibility, which cannot be straightforwardly extracted from the current persistent homology. The proposed method is found to provide excellent predictions of the protein B-factors for which current popular biophysical models break down.
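The statement that harmonic spectra recover topological information can be checked on a toy complex. The sketch below builds ordinary combinatorial Laplacians for a triangle at two filtration stages (hollow, then filled) and counts zero eigenvalues as the kernel dimension, which equals the Betti number. It uses the plain Laplacian at each stage and not the persistent Laplacian between stages, and it does not compute non-harmonic eigenvalues. The helper names and the example complex are assumptions chosen for illustration.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def nullity(m):
    """Kernel dimension (multiplicity of eigenvalue 0), by exact elimination."""
    rows = [[Fraction(x) for x in row] for row in m]
    n_cols = len(rows[0])
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return n_cols - rank


# Boundary matrices of a triangle on vertices 0, 1, 2.
# Columns of B1 are the oriented edges [0,1], [0,2], [1,2]; rows are vertices.
B1 = [[-1, -1, 0],
      [1, 0, -1],
      [0, 1, 1]]
# Single column of B2 is the face [0,1,2] = [1,2] - [0,2] + [0,1]; rows are edges.
B2 = [[1], [-1], [1]]

L0 = matmul(B1, transpose(B1))                           # vertex (graph) Laplacian
L1_hollow = matmul(transpose(B1), B1)                    # edges only, no face yet
L1_filled = matadd(L1_hollow, matmul(B2, transpose(B2))) # after the face is added

betti0 = nullity(L0)                # one connected component
betti1_hollow = nullity(L1_hollow)  # one loop
betti1_filled = nullity(L1_filled)  # the loop is filled in
```

The loop is present at the first stage and disappears once the face enters, which is the kind of birth and death information that persistent homology tracks. The non-zero eigenvalues, not computed here, carry the additional geometric information that the abstract attributes to non-harmonic spectra.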
Rui Dong (Tsinghua University); Hui Zheng (The University of Illinois at Chicago); Kun Tian (Tsinghua University); Shek-Chung Yau (The Hong Kong University of Science and Technology); Weiguang Mao (Tsinghua University); Wenping Yu (Nankai University); Changchuan Yin (The University of Illinois at Chicago); Chenglong Yu (South Australian Health and Medical Research Institute); Rong Lucy He (Chicago State University); Jie Yang (The University of Illinois at Chicago); Stephen S.-T. Yau (Tsinghua University)
Data Analysis, Bio-Statistics, Bio-Mathematics mathscidoc:1903.42004
We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single/multiple-segmented viral reference sequences downloaded from the National Center for Biotechnology Information. The online inquiry system computes natural vectors and their distances for submitted genomes, provides an online interface for accessing and using the database for viral classification and prediction, and runs back-end processes for automatic and manual updating of database content to synchronize with GenBank. Submitted genome data in FASTA format are processed, and the prediction results, with the 5 closest neighbors and their classifications, are returned by email. Given the one-to-one correspondence between sequence and natural vector, its time efficiency, and its high accuracy, the natural vector is a significant advance over alignment methods, which makes VirusDB a useful database for further research.
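The natural-vector lookup described above can be sketched in a few lines. The abstract does not state which variant VirusDB uses, so this assumes the commonly cited 12-dimensional form: for each nucleotide, its count, its mean position, and its normalized second central moment of position, compared by Euclidean distance. The function names and the toy sequences in the usage note are hypothetical and do not come from the database.

```python
import math

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """12-dimensional natural vector: counts, mean positions, second moments."""
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in NUCLEOTIDES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]  # 1-based
        n_k = len(positions)
        mu_k = sum(positions) / n_k if n_k else 0.0
        # Second central moment, normalized by n_k * n (assumed convention).
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n) if n_k else 0.0
        counts.append(n_k)
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def distance(u, v):
    """Euclidean distance between two natural vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest(query, references, k=5):
    """Names of the k reference sequences nearest to the query sequence."""
    q = natural_vector(query)
    ranked = sorted(references, key=lambda name: distance(q, natural_vector(references[name])))
    return ranked[:k]
```

For instance, `closest("ACGT", {"same": "ACGT", "far": "TTTTTTTT"}, k=1)` returns `["same"]`. Each sequence is reduced to a fixed-length vector once, so a query costs one vector computation plus distance evaluations, with no pairwise alignment.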