A novel alignment-free vector method to cluster protein sequences

Lily He (Department of Mathematical Sciences, Tsinghua University); Yongkun Li (Department of Mathematical Sciences, Tsinghua University); Rong Lucy He (Department of Biological Sciences, Chicago State University); Stephen S.-T. Yau (corresponding author; Department of Mathematical Sciences, Tsinghua University)

Data Analysis, Bio-Statistics, Bio-Mathematics mathscidoc:1903.42001

Journal of Theoretical Biology, 427, 41-52, 2017.6
Classification of proteins is a crucial topic in biology. The number of protein sequences stored in databases has increased sharply in the past decade. Traditionally, comparison of protein sequences is carried out through multiple sequence alignment methods. However, these methods may be unsuitable for clustering protein sequences when gene rearrangements occur, as in viral genomes. The computation is also very time-consuming for large datasets with long genomes. In this paper, based on three important biochemical properties of amino acids (the hydropathy index, the polar requirement, and the chemical composition of the side chain), we propose a 24-dimensional feature vector describing the composition of amino acids in protein sequences. Our method not only utilizes the chemical properties of amino acids but also takes into account their numbers and positions. The results on beta-globin, mammal, and three virus datasets show that this new tool is fast and accurate for classifying proteins and inferring the phylogeny of organisms.
phylogeny, biochemical properties, vector, alignment-free
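The abstract describes a vector built from property-based groups of amino acids together with their numbers and positions. The Python sketch below illustrates that general idea only; it is not the authors' 24-dimensional construction, which the abstract does not specify. The partition `EXAMPLE_GROUPS`, the three per-group statistics (count, mean position, length-scaled position variance), and the Euclidean distance are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical partition for illustration only (a coarse hydrophobic /
# polar / charged split). The groupings used in the paper are not given
# in the abstract and may differ.
EXAMPLE_GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGPH"),
    "charged": set("DEKR"),
}


def group_features(sequence, groups=EXAMPLE_GROUPS):
    """Return [count, mean position, scaled variance] for each group.

    Positions are 1-based. Groups are visited in sorted name order so
    that vectors from different sequences are comparable.
    """
    seq = sequence.upper()
    n = len(seq)
    features = []
    for name in sorted(groups):
        positions = [i for i, aa in enumerate(seq, start=1) if aa in groups[name]]
        count = len(positions)
        if count == 0:
            # Group absent from this sequence: all three statistics are zero.
            features.extend([0.0, 0.0, 0.0])
            continue
        mu = mean(positions)
        # Assumed normalization: divide by count and by sequence length.
        var = sum((p - mu) ** 2 for p in positions) / (count * n)
        features.extend([float(count), float(mu), var])
    return features


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


if __name__ == "__main__":
    v1 = group_features("MVLSPADKTNVKAAW")
    v2 = group_features("MVHLTPEEKSAVTAL")
    print(len(v1), euclidean(v1, v2))
```

With three groups this yields a 9-dimensional vector; the pairwise distances could then be passed to any standard clustering or tree-building routine. No alignment is needed, so the cost per sequence is linear in its length.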
@article{lily2017a,
  title={A novel alignment-free vector method to cluster protein sequences},
  author={Lily He and Yongkun Li and Rong Lucy He and Stephen S.-T. Yau},
  url={http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20190315145001074562199},
  journal={Journal of Theoretical Biology},
  volume={427},
  pages={41--52},
  year={2017},
}
Lily He, Yongkun Li, Rong Lucy He, and Stephen S.-T. Yau (corresponding author). A novel alignment-free vector method to cluster protein sequences. Journal of Theoretical Biology, Vol. 427, pp. 41-52, 2017. http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20190315145001074562199.