We propose combining the cepstrum with nonlinear time–frequency (TF) analysis
to study multi-component oscillatory signals with time-varying frequency and
amplitude and with a time-varying non-sinusoidal oscillatory pattern. The concept of
cepstrum is applied to eliminate the influence of the wave-shape function on the TF analysis,
yielding a new algorithm named the de-shape synchrosqueezing transform (de-shape
SST). A mathematical model, the adaptive non-harmonic model, is introduced,
and the de-shape SST algorithm is analyzed theoretically. In addition to simulated
signals, several physiological, musical and biological signals are analyzed to
illustrate the proposed algorithm.
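To make the role of the cepstrum concrete, here is a minimal sketch of the classical real cepstrum (the inverse Fourier transform of the log-magnitude spectrum) applied to a non-sinusoidal periodic signal. It is not the de-shape SST algorithm itself, which the abstract does not specify. The function name `real_cepstrum`, the `floor` parameter, and the test signal are assumptions chosen for illustration.

```python
import numpy as np


def real_cepstrum(x, floor=1e-6):
    """Classical real cepstrum: IFFT of the log-magnitude spectrum.

    `floor` is an illustrative regularizer that keeps log() finite
    when a spectral bin is (numerically) zero.
    """
    spectrum = np.fft.fft(np.asarray(x, dtype=float))
    log_magnitude = np.log(np.abs(spectrum) + floor)
    return np.fft.ifft(log_magnitude).real


# Hypothetical test signal: a 50 Hz oscillation with a non-sinusoidal
# wave shape, built from 8 harmonics with 1/k amplitudes.
fs = 1000                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs         # one second of samples
f0 = 50.0                      # fundamental frequency in Hz (assumed)
signal = sum(np.cos(2 * np.pi * k * f0 * t) / k for k in range(1, 9))

cep = real_cepstrum(signal)

# Search a quefrency window (in samples) for the dominant peak.
lo, hi = 5, 30
peak = lo + int(np.argmax(cep[lo:hi + 1]))
period_seconds = peak / fs     # quefrency of the peak, in seconds
```

In this sketch the harmonics of the wave shape collapse into a single cepstral peak at the fundamental period, which is the kind of information a cepstrum-based method can use to separate the wave shape from the underlying frequency.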
Rui Dong (Tsinghua University), Hui Zheng (The University of Illinois at Chicago), Kun Tian (Tsinghua University), Shek-Chung Yau (The Hong Kong University of Science and Technology), Weiguang Mao (Tsinghua University), Wenping Yu (Nankai University), Changchuan Yin (The University of Illinois at Chicago), Chenglong Yu (South Australian Health and Medical Research Institute), Rong Lucy He (Chicago State University), Jie Yang (The University of Illinois at Chicago), Stephen S.-T. Yau (Tsinghua University)
Data Analysis, Bio-Statistics, Bio-Mathematics
mathscidoc:1903.42004
We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single- and multiple-segmented viral reference sequences downloaded from the National Center for Biotechnology Information. The online inquiry system computes natural vectors and their distances for submitted genomes, provides an online interface for accessing and using the database for viral classification and prediction, and runs back-end processes for automatic and manual updating of the database content to keep it synchronized with GenBank. Genome data submitted in FASTA format are processed, and the prediction results, consisting of the 5 closest neighbors and their classifications, are returned by email. Given the one-to-one correspondence between a sequence and its natural vector, together with its time efficiency and high accuracy, the natural vector method is a significant advance over alignment methods, which makes VirusDB a useful database for further research.
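The abstract does not define the natural vector, so the following sketch assumes the commonly used 12-dimensional form for DNA: for each nucleotide, its count, its mean position, and a normalized second central moment of its positions. The function names, the component ordering, the Euclidean distance, and the toy reference set are all assumptions for illustration, not the VirusDB implementation.

```python
import math


def natural_vector(seq, alphabet="ACGT"):
    """Assumed 12-dimensional natural vector of a DNA sequence.

    For each letter k: count n_k, mean position mu_k (1-based), and
    D2_k = sum((pos - mu_k)^2) / (n_k * n), where n is the sequence length.
    Letters absent from the sequence contribute zeros.
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for letter in alphabet:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == letter]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def nearest_neighbors(query_seq, references, k=5):
    """Return the k reference names closest to the query.

    `references` maps a (hypothetical) reference name to its sequence;
    closeness is the Euclidean distance between natural vectors.
    """
    query_vec = natural_vector(query_seq)
    scored = [
        (math.dist(query_vec, natural_vector(ref_seq)), name)
        for name, ref_seq in references.items()
    ]
    scored.sort()
    return [(name, dist) for dist, name in scored[:k]]


# Toy reference set (made-up sequences, for illustration only).
toy_refs = {"ref1": "ACGT", "ref2": "AAAA", "ref3": "ACGA"}
hits = nearest_neighbors("ACGT", toy_refs, k=2)
```

Because each sequence is reduced to a fixed-length vector, comparing a query against many references costs one distance computation per reference and requires no alignment.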
Comparing groups of DNA and protein sequences plays an important role in research on biological evolutionary relationships. Although many methods are available for sequence comparison, only a few can be used for group comparison. In this study, we propose a novel approach using convex hulls. We use statistical information contained within the sequences to represent each sequence as a point in a high-dimensional space. We find that the points belonging to one biological group are located in a different region of space than the points belonging to other biological groups. More precisely, the convex hull of the points from one group is disjoint from the convex hulls of the points from other groups. This finding allows us to perform phylogenetic analysis of groups efficiently. Five different theorems are presented for checking whether two convex hulls intersect or are disjoint. Test results for datasets related to HRV, HPV, Ebolavirus, PKC and protein phosphatase domains demonstrate that our method performs well and provides a new tool for studying group phylogeny. More significantly, the convex analysis presents a new way to search for sequences belonging to a biological group by examining points within the group's convex hull.
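The five theorems mentioned above are not stated in the abstract. As a stand-in, the sketch below checks whether two convex hulls intersect with a standard linear-programming feasibility test: the hulls share a point exactly when some convex combination of one point set equals some convex combination of the other. The function name `hulls_intersect` and the toy point sets are assumptions; the solver call is `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def hulls_intersect(points_a, points_b):
    """LP feasibility test for intersection of two convex hulls.

    Looks for weights lam >= 0 (sum 1) and mu >= 0 (sum 1) such that
    sum_i lam_i * a_i == sum_j mu_j * b_j. Feasible means the hulls
    share at least one point; infeasible means they are disjoint.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    m, d = a.shape
    n = b.shape[0]

    # Equality constraints: d rows for the shared point, 2 rows for the
    # two sets of weights each summing to one.
    a_eq = np.zeros((d + 2, m + n))
    a_eq[:d, :m] = a.T
    a_eq[:d, m:] = -b.T
    a_eq[d, :m] = 1.0
    a_eq[d + 1, m:] = 1.0
    b_eq = np.zeros(d + 2)
    b_eq[d] = 1.0
    b_eq[d + 1] = 1.0

    # Zero objective: only feasibility matters.
    result = linprog(
        np.zeros(m + n), A_eq=a_eq, b_eq=b_eq,
        bounds=(0, None), method="highs",
    )
    return result.status == 0  # 0 = solved (feasible), 2 = infeasible


# Toy 2-D groups (made-up points, for illustration only).
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
far_triangle = [(2, 2), (3, 2), (2, 3)]
overlapping_triangle = [(0.5, 0.5), (2, 2), (2, 0)]

disjoint_case = hulls_intersect(square, far_triangle)
overlap_case = hulls_intersect(square, overlapping_triangle)
```

The same test works unchanged in higher dimensions, since the number of unknowns depends only on the number of points, not on the dimension of the space. Hulls that merely touch are a borderline case for any floating-point solver and may need a tolerance.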