References
  1. Stephen F Altschul, Warren Gish, Webb Miller, Eugene W Myers, and David J Lipman. Basic local alignment search tool. Journal of molecular biology, 215(3):403–410, 1990.
  2. Stephen F Altschul, Warren Gish, Webb Miller, Eugene W Myers, and David J Lipman. Basic local alignment search tool. Journal of molecular biology, 215(3):403–410, 1990.
  3. Christian B Anfinsen. Principles that govern the folding of protein chains. Science, 181(4096):223–230, 1973.
  4. Christian B Anfinsen. Principles that govern the folding of protein chains. Science, 181(4096):223–230, 1973.
  5. Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James P Drake, Jane M Landolin, and Adam M Phillippy. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nature biotechnology, 33(6):623, 2015.
  6. Guillaume Bernard, Cheong Xin Chan, Yao-ban Chan, Xin-Yi Chua, Yingnan Cong, James M Hogan, Stefan R Maetschke, and Mark A Ragan. Alignment-free inference of hierarchical and reticulate phylogenomic relationships. Briefings in bioinformatics, 20(2):426–435, 2017.
  7. B Edwin Blaisdell. A measure of the similarity of sets of sequences not requiring sequence alignment. Proceedings of the National Academy of Sciences, 83(14):5155–5159, 1986.
  8. Oliver Bonham-Carter, Joe Steele, and Dhundy Bastola. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis. Briefings in bioinformatics, 15(6):890–905, 2013.
  9. Vibha Bafna Bora, Ashwin G Kothari, and Avinash G Keskar. Robust automatic pectoral muscle segmentation from mammograms using texture gradient and Euclidean distance regression. Journal of digital imaging, 29(1):115–125, 2016.
  10. Tolga Can and Y-F Wang. CTSS: a robust and efficient method for protein structure alignment based on local geometrical and biological features. In Computational systems bioinformatics. CSB2003. Proceedings of the 2003 IEEE bioinformatics conference. CSB2003, pages 169–179. IEEE, 2003.
  11. Tolga Can and Y-F Wang. CTSS: a robust and efficient method for protein structure alignment based on local geometrical and biological features. In Computational systems bioinformatics. CSB2003. Proceedings of the 2003 IEEE bioinformatics conference. CSB2003, pages 169–179. IEEE, 2003.
  12. Wah Chiu, Matthew L Baker, Wen Jiang, and Z Hong Zhou. Deriving folds of macromolecular complexes through electron cryomicroscopy and bioinformatics approaches. Current opinion in structural biology, 12(2):263–269, 2002.
  13. Wah Chiu, Matthew L Baker, Wen Jiang, and Z Hong Zhou. Deriving folds of macromolecular complexes through electron cryomicroscopy and bioinformatics approaches. Current opinion in structural biology, 12(2):263–269, 2002.
  14. Kuo-Chen Chou and Hong-Bin Shen. Recent progress in protein subcellular location prediction. Analytical biochemistry, 370(1):1, 2007.
  15. Kuo-Chen Chou. Prediction of human immunodeficiency virus protease cleavage sites in proteins. Analytical biochemistry, 233(1):1–14, 1996.
  16. Qi Dai, Yan Li, Xiaoqing Liu, Yuhua Yao, Yunjie Cao, and Pingan He. Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position. BMC bioinformatics, 14(1):152, 2013.
  17. Jayanta Kumar Das, Provas Das, Korak Kumar Ray, Pabitra Pal Choudhury, and Siddhartha Sankar Jana. Mathematical characterization of protein sequences using patterns as chemical group combinations of amino acids. PloS one, 11(12):e0167651, 2016.
  18. Burk A Dehority. Rumen microbiology, volume 372. Nottingham University Press Nottingham, 2003.
  19. Shuyan Ding, Shengli Zhang, Yang Li, and Tianming Wang. A novel protein structural classes prediction method based on predicted secondary structure. Biochimie, 94(5):1166–1171, 2012.
  20. Ali El-Lakkani and Seham El-Sherif. Similarity analysis of protein sequences based on 2D and 3D amino acid adjacency matrices. Chemical Physics Letters, 590:192–195, 2013.
  21. Moheb I Abo el Maaty, Mervat M Abo-Elkhier, and Marwa A Abd Elwahaab. 3D graphical representation of protein sequences and their statistical characterization. Physica A: Statistical Mechanics and Its Applications, 389(21):4668–4676, 2010.
  22. Joseph Felsenstein. PHYLIP (phylogeny inference package) version 3.6. Distributed by the author. http://evolution.gs.washington.edu/phylip.html, 2004.
  23. Antara Ghosh and Soma Barman. Application of Euclidean distance measurement and principal component analysis for gene identification. Gene, 583(2):112–120, 2016.
  24. Charles Miller Grinstead and James Laurie Snell. Introduction to probability. American Math. Soc., 2012.
  25. MK Gupta, R Niyogi, and M Misra. An alignment-free method to find similarity among protein sequences via the general form of Chou's pseudo amino acid composition. SAR and QSAR in Environmental Research, 24(7):597–609, 2013.
  26. Eugene Hamori and John Ruskin. H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences. Journal of Biological Chemistry, 258(2):1318–1327, 1983.
  27. Bernhard Haubold. Alignment-free phylogenetics and population genetics. Briefings in bioinformatics, 15(3):407–418, 2013.
  28. Ping-an He, Jinzhou Wei, Yuhua Yao, and Zhixin Tie. A novel graphical representation of proteins and its application. Physica A: Statistical Mechanics and its Applications, 391(1-2):93–99, 2012.
  29. Ping-an He, Jinzhou Wei, Yuhua Yao, and Zhixin Tie. A novel graphical representation of proteins and its application. Physica A: Statistical Mechanics and its Applications, 391(1-2):93–99, 2012.
  30. Zhisong He, Jian Zhang, Xiao-He Shi, Le-Le Hu, Xiangyin Kong, Yu-Dong Cai, and Kuo-Chen Chou. Predicting drug-target interaction networks based on functional groups and biological features. PloS one, 5(3):e9603, 2010.
  31. Tao Huang, Shen Niu, Zhongping Xu, Yun Huang, Xiangyin Kong, Yu-Dong Cai, and Kuo-Chen Chou. Predicting transcriptional activity of multiple site p53 mutants based on hybrid properties. PLoS One, 6(8):e22940, 2011.
  32. Xiaoqiu Huang and Jinhui Zhang. Methods for comparing a DNA sequence with a protein sequence. Bioinformatics, 12(6):497–506, 1996.
  33. Le-Le Hu, Tao Huang, Yu-Dong Cai, and Kuo-Chen Chou. Prediction of body fluids where proteins are secreted into based on protein interaction network. PLoS One, 6(7):e22989, 2011.
  34. Chris A Kaiser, Monty Krieger, Harvey Lodish, and Arnold Berk. Molecular cell biology. WH Freeman, 2007.
  35. Katsuko Komatsu, Shu Zhu, Hirotoshi Fushimi, Tran Kim Qui, Shaoqing Cai, and Shigetoshi Kadota. Phylogenetic analysis based on 18S rRNA gene and matK gene sequences of Panax vietnamensis and five related species. Planta medica, 67(05):461–465, 2001.
  36. Liang Kong, Lichao Zhang, and Jinfeng Lv. Accurate prediction of protein structural classes by incorporating predicted secondary structure information into the general form of Chou's pseudo amino acid composition. Journal of Theoretical Biology, 344:12–18, 2014.
  37. Sudhir Kumar, Glen Stecher, and Koichiro Tamura. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Molecular biology and evolution, 33(7):1870–1874, 2016.
  38. Tian Liu and Cangzhi Jia. A high-accuracy protein structural class prediction algorithm using predicted secondary structural information. Journal of theoretical biology, 267(3):272–275, 2010.
  39. Bi-Qing Li, Tao Huang, Lei Liu, Yu-Dong Cai, and Kuo-Chen Chou. Identification of colorectal cancer related genes with mRMR and shortest path in protein-protein interaction network. PloS one, 7(4):e33393, 2012.
  40. Bi-Qing Li, Le-Le Hu, Shen Niu, Yu-Dong Cai, and Kuo-Chen Chou. Predict and analyze S-nitrosylation modification sites with the mRMR and IFS approaches. Journal of Proteomics, 75(5):1654–1665, 2012.
  41. Chun Li, Lili Xing, Xin Wang, et al. 2-D graphical representation of protein sequences and its application to coronavirus phylogeny. BMB Rep, 41(3):217–222, 2008.
  42. Chun Li, Lili Xing, Xin Wang, et al. 2-D graphical representation of protein sequences and its application to coronavirus phylogeny. BMB Rep, 41(3):217–222, 2008.
  43. Yushuang Li, Tian Song, Jiasheng Yang, Yi Zhang, and Jialiang Yang. An alignment-free algorithm in comparing the similarity of protein sequences based on pseudo-Markov transition probabilities among amino acids. PloS one, 11(12):e0167430, 2016.
  44. Zengchao Mu, Jing Wu, and Yusen Zhang. A novel method for similarity/dissimilarity analysis of protein sequences. Physica A: Statistical Mechanics and its Applications, 392(24):6361–6366, 2013.
  45. Hasan H Otu and Khalid Sayood. A new sequence distance measure for phylogenetic tree construction. Bioinformatics, 19(16):2122–2130, 2003.
  46. William R Pearson. Rapid and sensitive sequence comparison with FASTP and FASTA. 1990.
  47. Michal J Pietal, Janusz M Bujnicki, and Lukasz P Kozlowski. GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function. Bioinformatics, 31(21):3499–3505, 2015.
  48. Luca Pinello, Giosue Lo Bosco, and Guo-Cheng Yuan. Applications of alignment-free methods in epigenomics. Briefings in Bioinformatics, 15(3):419–430, 2013.
  49. Dan Ralescu and Gregory Adams. The fuzzy integral. Journal of Mathematical Analysis and Applications, 75(2):562–570, 1980.
  50. Jie Ren, Kai Song, Minghua Deng, Gesine Reinert, Charles H Cannon, and Fengzhu Sun. Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics. Bioinformatics, 32(7):993–1000, 2015.
  51. Ranjeet Kumar Rout, Pabitra Pal Choudhury, Santi Prasad Maity, BS Daya Sagar, and Sk Sarif Hassan. Fractal and mathematical morphology in intricate comparison between tertiary protein structures. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 6(2):192–203, 2018.
  52. Ajay Kumar Saw, Binod Chandra Tripathy, and Soumyadeep Nandi. Alignment-free similarity analysis for protein sequences based on fuzzy integral. Scientific reports, 9(1):2775, 2019.
  53. Ariya Shajii, Deniz Yorukoglu, Yun William Yu, and Bonnie Berger. Fast genotyping of known SNPs through approximate k-mer matching. Bioinformatics, 32(17):i538–i544, 2016.
  54. Ping Wang, Lele Hu, Guiyou Liu, Nan Jiang, Xiaoyun Chen, Jianyong Xu, Wen Zheng, Li Li, Ming Tan, Zugen Chen, et al. Prediction of antimicrobial peptides based on sequence alignment and feature selection methods. PloS one, 6(4):e18476, 2011.
  55. Leyi Wei, Minghong Liao, Xing Gao, and Quan Zou. An improved protein structural classes prediction method by incorporating both sequence and structure information. IEEE transactions on nanobioscience, 14(4):339–349, 2014.
  56. Leyi Wei, Minghong Liao, Xing Gao, and Quan Zou. Enhanced protein fold prediction method through a novel feature extraction technique. IEEE transactions on nanobioscience, 14(6):649–659, 2015.
  57. Zhi-Cheng Wu, Xuan Xiao, and Kuo-Chen Chou. 2D-MH: A web-server for generating graphic representation of protein sequences based on the physicochemical properties of their constituent amino acids. Journal of theoretical biology, 267(1):29–34, 2010.
  58. Jian-Yi Yang, Zhen-Ling Peng, and Xin Chen. Prediction of protein structural classes for low-homology sequences based on predicted secondary structure. BMC bioinformatics, 11(1):S9, 2010.
  59. Yu-Hua Yao, Qi Dai, Chun Li, Ping-An He, Xu-Ying Nan, and Yao-Zhou Zhang. Analysis of similarity/dissimilarity of protein sequences. Proteins: Structure, Function, and Bioinformatics, 73(4):864–871, 2008.
  60. Chenglong Yu, Rong L He, and Stephen S-T Yau. Protein sequence comparison based on k-string dictionary. Gene, 529(2):250–256, 2013.
  61. Hong-Jie Yu and De-Shuang Huang. Novel 20-D descriptors of protein sequences and it's applications in similarity analysis. Chemical Physics Letters, 531:261–266, 2012.
  62. Hong-Jie Yu and De-Shuang Huang. Novel 20-D descriptors of protein sequences and it's applications in similarity analysis. Chemical Physics Letters, 531:261–266, 2012.
  63. Lichao Zhang, Xiqiang Zhao, and Liang Kong. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition. Journal of theoretical biology, 355:105–110, 2014.
  64. Shengli Zhang, Shuyan Ding, and Tianming Wang. High-accuracy prediction of protein structural class for low-similarity sequences based on predicted secondary structure. Biochimie, 93(4):710–714, 2011.
  65. Shengli Zhang, Yunyun Liang, and Xiguo Yuan. Improving the prediction accuracy of protein structural class: Approached with alternating word frequency and normalized Lempel–Ziv complexity. Journal of theoretical biology, 341:71–77, 2014.
  66. Shengli Zhang, Feng Ye, and Xiguo Yuan. Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM. Journal of Biomolecular Structure and Dynamics, 29(6):1138–1146, 2012.
  67. Zhaojun Zhang and Wei Wang. RNA-Skim: a rapid method for RNA-Seq quantification at transcript level. Bioinformatics, 30(12):i283–i292, 2014.
  68. Andrzej Zielezinski, Susana Vinga, Jonas Almeida, and Wojciech M Karlowski. Alignment-free sequence comparison: benefits, applications, and tools. Genome biology, 18(1):186, 2017.
Figure Legends
Figure 1. Phylogenetic tree of different species of the ND5 family generated by the proposed method (cluster dendrogram using the UPGMA distance method).
Figure 2. Phylogenetic tree of eight different species of the ND6 family generated by the proposed method (cluster dendrogram using the UPGMA distance method).
Figure 3. Phylogenetic tree of 10 different species of the G10 family generated by the proposed method (cluster dendrogram using the UPGMA distance method).
Figure 4. Phylogenetic tree of 10 different species of the F11 family generated by the proposed method (cluster dendrogram using the UPGMA distance method).
Figure 5. Correlation coefficients for nine ND5 proteins, with ClustalW as the reference, for the proposed method (cluster dendrogram using the UPGMA distance method) and for the methods in [29, 62, 42, 4, 11].
Figure 6. Correlation coefficients for nine ND5 proteins, with ClustalW as the reference, for the proposed method (cluster dendrogram using the UPGMA distance method) and for the methods in Ref [13] and Ref [34].