MathSciDoc: An Archive for Mathematician

3 papers uploaded by RuiDong.

Index	Title	Upload Time	Modified Time
1. Virus Database and Online Inquiry System Based on Natural Vectors
Authors: Rui Dong (Tsinghua University), Hui Zheng (The University of Illinois at Chicago), Kun Tian (Tsinghua University), Shek-Chung Yau (The Hong Kong University of Science and Technology), Weiguang Mao (Tsinghua University), Wenping Yu (Nankai University), Changchuan Yin (The University of Illinois at Chicago), Chenglong Yu (South Australian Health and Medical Research Institute), Rong Lucy He (Chicago State University), Jie Yang (The University of Illinois at Chicago), Stephen S.-T. Yau (Tsinghua University)
Subject: Data Analysis, Bio-Statistics, Bio-Mathematics
ID: mathscidoc:1903.42004
Journal: Evolutionary Bioinformatics, 13, 1-7, 2017.10
Upload Time: 2019-03-18 10:27:40 (uploaded by RuiDong)
Modified Time: 2019-03-18 10:34:55
Downloads: 1917
Abstract: We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single/multiple-segmented viral reference sequences downloaded from the National Center for Biotechnology Information. The online inquiry system computes natural vectors and their distances based on submitted genomes, provides an online interface for accessing and using the database for viral classification and prediction, and runs back-end processes for automatic and manual updating of database content to synchronize with GenBank. Submitted genome data in FASTA format will be processed, and the prediction results, with the 5 closest neighbors and their classifications, will be returned by email. Considering the one-to-one correspondence between sequence and natural vector, time efficiency, and high accuracy, the natural vector is a significant advance compared with alignment methods, which makes VirusDB a useful database for further research.
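The natural-vector lookup described in the VirusDB abstract can be illustrated with a minimal sketch. It assumes the commonly cited 12-dimensional natural vector (per-nucleotide count, mean position, and normalized second central moment) and Euclidean distance; neither definition is given on this page, and the function names `natural_vector` and `closest_neighbors` are hypothetical, not part of VirusDB.

```python
import math

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """12-dimensional natural vector: counts, mean positions, and
    normalized second central moments for A, C, G, T.
    (Assumed definition; not taken from the VirusDB page.)"""
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in NUCLEOTIDES:
        # 1-based positions at which this nucleotide occurs
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            # Absent nucleotide: all three components are set to zero
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def closest_neighbors(query, references, k=5):
    """Return the k reference names closest to the query sequence,
    ranked by Euclidean distance between natural vectors."""
    q = natural_vector(query)
    scored = sorted(
        (math.dist(q, natural_vector(seq)), name)
        for name, seq in references.items()
    )
    return [(name, dist) for dist, name in scored[:k]]
```

Here `references` is a plain dictionary mapping a label to a sequence string; the default `k=5` mirrors the "5 closest neighbors" mentioned in the abstract. A real service would precompute and store the reference vectors rather than recomputing them per query.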
2. A new method to cluster genomes based on cumulative Fourier power spectrum
Authors: Rui Dong (Tsinghua University), Ziyue Zhu (Tsinghua University), Changchuan Yin (University of Illinois at Chicago), Rong L. He (Chicago State University), Stephen S.-T. Yau (Tsinghua University)
Subject: Data Analysis, Bio-Statistics, Bio-Mathematics
ID: mathscidoc:1903.42005
Journal: Gene, 673, 2018.6
Upload Time: 2019-03-18 10:32:15 (uploaded by RuiDong)
Modified Time: -
Downloads: 1876
Abstract: Analyzing phylogenetic relationships using mathematical methods has always been of importance in bioinformatics. Quantitative research may interpret the raw biological data in a precise way. Multiple Sequence Alignment (MSA) is used frequently to analyze biological evolution, but is very time-consuming. When the scale of data is large, alignment methods cannot finish the calculation in a reasonable time. Therefore, we present a new method using moments of the cumulative Fourier power spectrum to cluster DNA sequences. Each sequence is translated into a vector in Euclidean space. Distances between the vectors can reflect the relationships between sequences. The mapping between the spectra and the moment vector is one-to-one, which means that no information in the power spectra is lost during the calculation. We cluster and classify several datasets, including Influenza A, primates, and human rhinovirus (HRV) datasets, to build the phylogenetic trees. Results show that the newly proposed cumulative Fourier power spectrum method is much faster and more accurate than MSA and another alignment-free method known as k-mer. The research provides new insights into the study of phylogeny, evolution, and efficient DNA comparison algorithms for large genomes. The computer programs of the cumulative Fourier power spectrum are available at GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum).
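The idea of mapping a sequence to a fixed-length vector through moments of a cumulative Fourier power spectrum can be sketched roughly as follows. This is an illustration under stated assumptions, not the published algorithm: the binary indicator encoding, the inclusion of the zero-frequency term, the normalization by total power, and the function names are all choices made here for clarity. The authors' own programs are in the GitHub repository cited in the abstract.

```python
import cmath


def power_spectrum(signal):
    """Naive O(n^2) DFT power spectrum |X_k|^2 for k = 0..n-1."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def cumulative_spectrum_moments(seq, orders=(1, 2, 3)):
    """Map a DNA sequence to 4 * len(orders) features.

    For each nucleotide, build a 0/1 indicator sequence, take its
    power spectrum, form the running (cumulative) sum as a fraction
    of total power, and average the j-th powers of those fractions.
    The normalization is an assumption made for this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    features = []
    for base in "ACGT":
        indicator = [1.0 if ch == base else 0.0 for ch in seq]
        # Parseval: total power of a 0/1 sequence is n * (number of ones)
        total = n * sum(indicator)
        if total == 0:
            features.extend(0.0 for _ in orders)
            continue
        running, cumulative = 0.0, []
        for p in power_spectrum(indicator):
            running += p
            cumulative.append(running / total)
        for j in orders:
            features.append(sum(c ** j for c in cumulative) / n)
    return features
```

Because every sequence yields a vector of the same length regardless of genome size, the vectors can be compared directly with a Euclidean distance and passed to any standard clustering or tree-building routine. For real genomes, the naive transform above would be replaced by a fast Fourier transform.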
3. A novel alignment-free method for HIV-1 subtype classification
Authors: Lily He (Tsinghua University), Rui Dong (Tsinghua University), Rong Lucy He (Chicago State University), Stephen S.-T. Yau (Tsinghua University)
Subject: Data Analysis, Bio-Statistics, Bio-Mathematics
ID: mathscidoc:2004.42001
Journal: Infection, Genetics and Evolution, 77, 104080, 2020.1
Upload Time: 2020-04-23 10:51:36 (uploaded by RuiDong)
Modified Time: -
Downloads: 1541
Abstract: HIV-1, the most common and pathogenic strain of human immunodeficiency virus, consists of many subtypes. To study the differences among HIV-1 subtypes in infection, diagnosis and drug design, it is important to identify HIV-1 subtypes from clinical HIV-1 samples. In this work, we propose an effective numeric representation called Subsequence Natural Vector (SNV) to encode HIV-1 sequences. Using this representation, we introduce an improved linear discriminant analysis method to classify HIV-1 viruses correctly. SNV is based on the distribution of nucleotides in HIV-1 viral sequences. It not only counts the nucleotides, but also describes the position and variance of nucleotides in viruses. To validate our alignment-free method, 6902 complete genomes and 11,668 pol gene sequences of HIV-1 subtypes were collected from the up-to-date Los Alamos HIV database. SNV outperforms the three popular methods, Kameris, Comet and REGA, with almost 100% Sensitivity and Specificity, and in much less time. Our subtyping algorithm works especially well for circulating recombinant forms (CRFs) consisting of a few sequences. Our approach can also separate unique recombinant forms (URFs) from other subtypes with 100% Sensitivity and Specificity. Moreover, phylogenetic trees based on the SNV representation are constructed using full-length HIV-1 genomes and pol genes respectively, where viruses from the same subtype are clustered together correctly.
