A novel fast vector method for genetic sequence comparison

Yongkun Li¹, Lily He¹, Rong Lucy He², Stephen S.-T. Yau¹ (corresponding author)

¹ Department of Mathematical Sciences, Tsinghua University
² Department of Biological Sciences, Chicago State University

Subjects: Data Analysis, Bio-Statistics, Bio-Mathematics (mathscidoc:1903.42002)

Scientific Reports, vol. 1, article 12226, September 2017.
With the sharp increase in the number of biological sequences, traditional sequence alignment methods have become unsuitable and often infeasible. This has motivated a surge of fast alignment-free techniques for sequence analysis. Among these, many kinds of feature vector methods have been established and applied to reconstructing species phylogenies. Such vectors consist of typical numerical features tailored to particular biological problems; the features may come from the primary sequences or from the secondary or three-dimensional structures of macromolecules. In this study, we propose a novel numerical vector, based only on the primary sequences of organisms, for building their phylogeny. Three chemical and physical properties of primary sequences (purine, pyrimidine and keto) are incorporated into the vector. Using each property, we convert the nucleotide sequence into a new sequence consisting of only two kinds of letters, so three sequences are constructed from the three properties. For each letter of each sequence, we calculate the number of occurrences of the letter, the average position of the letter, and the variation of the positions at which the letter appears. Tested on several datasets of mammals, viruses and bacteria, the new tool is fast and accurately infers the phylogeny of organisms.
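The feature construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three two-letter partitions (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) are the standard IUPAC classes and are assumed here, since the abstract only names purine, pyrimidine and keto; "variation of position" is taken to be the plain population variance of the 1-based positions (the paper may normalize it differently); and the Euclidean distance between vectors is an assumed comparison metric. All names (PARTITIONS, letter_stats, feature_vector, distance) are hypothetical.

```python
from typing import Dict, List

# ASSUMPTION: standard IUPAC two-class partitions of the nucleotides.
# Each maps A/C/G/T onto a two-letter alphabet.
PARTITIONS: List[Dict[str, str]] = [
    {"A": "R", "G": "R", "C": "Y", "T": "Y"},  # purine (R) vs pyrimidine (Y)
    {"A": "M", "C": "M", "G": "K", "T": "K"},  # amino (M) vs keto (K)
    {"A": "W", "T": "W", "C": "S", "G": "S"},  # weak (W) vs strong (S) H-bonds
]


def letter_stats(positions: List[int]) -> List[float]:
    """Count, mean position, and population variance of 1-based positions."""
    n = len(positions)
    if n == 0:
        return [0.0, 0.0, 0.0]
    mean = sum(positions) / n
    # ASSUMPTION: plain population variance as the "variation of position".
    var = sum((p - mean) ** 2 for p in positions) / n
    return [float(n), mean, var]


def feature_vector(seq: str) -> List[float]:
    """Hypothetical 18-dim vector: 3 partitions x 2 letters x 3 statistics."""
    seq = seq.upper()
    vec: List[float] = []
    for mapping in PARTITIONS:
        # Fixed (alphabetical) letter order so vectors are comparable.
        letters = sorted(set(mapping.values()))
        # Symbols outside A/C/G/T become None and are ignored.
        converted = [mapping.get(ch) for ch in seq]
        for letter in letters:
            positions = [i + 1 for i, c in enumerate(converted) if c == letter]
            vec.extend(letter_stats(positions))
    return vec


def distance(a: str, b: str) -> float:
    """Euclidean distance between feature vectors (an assumed metric)."""
    va, vb = feature_vector(a), feature_vector(b)
    return sum((x - y) ** 2 for x, y in zip(va, vb)) ** 0.5


if __name__ == "__main__":
    # For "AGCT", the purine/pyrimidine sequence is R R Y Y, so the
    # R statistics are count 2, mean position 1.5, variance 0.25.
    print(feature_vector("AGCT"))
    print(distance("AGCTAGCT", "AGCTTGCA"))
```

Because each sequence is reduced to a fixed-length vector in a single linear pass, pairwise distances for many sequences can be computed without any alignment, which is the source of the speed advantage described above; a distance matrix built this way could then be passed to a standard tree-building method such as UPGMA or neighbor joining.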
@article{yongkun2017a,
  title={A novel fast vector method for genetic sequence comparison},
  author={Yongkun Li and Lily He and Rong Lucy He and Stephen S.-T. Yau},
  url={http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20190315145739799223200},
  journal={Scientific Reports},
  volume={1},
  number={12226},
  year={2017},
}
Yongkun Li, Lily He, Rong Lucy He, and Stephen S.-T. Yau. A novel fast vector method for genetic sequence comparison. Scientific Reports, vol. 1, no. 12226, 2017. http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20190315145739799223200.