References
- Stephen F Altschul, Warren Gish, Webb Miller, Eugene W Myers, and
David J Lipman. Basic local alignment search tool. Journal of
Molecular Biology, 215(3):403–410, 1990.
- Stephen F Altschul, Warren Gish, Webb Miller, Eugene W Myers, and
David J Lipman. Basic local alignment search tool. Journal of
Molecular Biology, 215(3):403–410, 1990.
- Christian B Anfinsen. Principles that govern the folding of protein
chains. Science, 181(4096):223–230, 1973.
- Christian B Anfinsen. Principles that govern the folding of protein
chains. Science, 181(4096):223–230, 1973.
- Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James P Drake, Jane M
Landolin, and Adam M Phillippy. Assembling large genomes with
single-molecule sequencing and locality-sensitive hashing. Nature
Biotechnology, 33(6):623, 2015.
- Guillaume Bernard, Cheong Xin Chan, Yao-ban Chan, Xin-Yi Chua, Yingnan
Cong, James M Hogan, Stefan R Maetschke, and Mark A Ragan.
Alignment-free inference of hierarchical and reticulate phylogenomic
relationships. Briefings in Bioinformatics, 20(2):426–435, 2017.
- B Edwin Blaisdell. A measure of the similarity of sets of sequences
not requiring sequence alignment. Proceedings of the National
Academy of Sciences, 83(14):5155–5159, 1986.
- Oliver Bonham-Carter, Joe Steele, and Dhundy Bastola. Alignment-free
genetic sequence comparisons: a review of recent approaches by word
analysis. Briefings in Bioinformatics, 15(6):890–905, 2013.
- Vibha Bafna Bora, Ashwin G Kothari, and Avinash G Keskar. Robust
automatic pectoral muscle segmentation from mammograms using texture
gradient and Euclidean distance regression. Journal of Digital
Imaging, 29(1):115–125, 2016.
- Tolga Can and Y-F Wang. CTSS: a robust and efficient method for
protein structure alignment based on local geometrical and biological
features. In Computational Systems Bioinformatics. CSB2003.
Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003,
pages 169–179. IEEE, 2003.
- Tolga Can and Y-F Wang. CTSS: a robust and efficient method for
protein structure alignment based on local geometrical and biological
features. In Computational Systems Bioinformatics. CSB2003.
Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003,
pages 169–179. IEEE, 2003.
- Wah Chiu, Matthew L Baker, Wen Jiang, and Z Hong Zhou. Deriving folds
of macromolecular complexes through electron cryomicroscopy and
bioinformatics approaches. Current Opinion in Structural
Biology, 12(2):263–269, 2002.
- Wah Chiu, Matthew L Baker, Wen Jiang, and Z Hong Zhou. Deriving folds
of macromolecular complexes through electron cryomicroscopy and
bioinformatics approaches. Current Opinion in Structural
Biology, 12(2):263–269, 2002.
- Kuo-Chen Chou and Hong-Bin Shen. Recent progress in protein
subcellular location prediction. Analytical Biochemistry,
370(1):1, 2007.
- Kuo-Chen Chou. Prediction of human immunodeficiency virus protease
cleavage sites in proteins. Analytical Biochemistry,
233(1):1–14, 1996.
- Qi Dai, Yan Li, Xiaoqing Liu, Yuhua Yao, Yunjie Cao, and Pingan He.
Comparison study on statistical features of predicted secondary
structures for protein structural class prediction: From content to
position. BMC Bioinformatics, 14(1):152, 2013.
- Jayanta Kumar Das, Provas Das, Korak Kumar Ray, Pabitra Pal Choudhury,
and Siddhartha Sankar Jana. Mathematical characterization of protein
sequences using patterns as chemical group combinations of amino
acids. PLoS ONE, 11(12):e0167651, 2016.
- Burk A Dehority. Rumen Microbiology, volume 372. Nottingham
University Press, Nottingham, 2003.
- Shuyan Ding, Shengli Zhang, Yang Li, and Tianming Wang. A novel
protein structural classes prediction method based on predicted
secondary structure. Biochimie, 94(5):1166–1171, 2012.
- Ali El-Lakkani and Seham El-Sherif. Similarity analysis of protein
sequences based on 2D and 3D amino acid adjacency matrices. Chemical
Physics Letters, 590:192–195, 2013.
- Moheb I Abo el Maaty, Mervat M Abo-Elkhier, and Marwa A Abd Elwahaab.
3D graphical representation of protein sequences and their statistical
characterization. Physica A: Statistical Mechanics and its
Applications, 389(21):4668–4676, 2010.
- Joseph Felsenstein. PHYLIP (Phylogeny Inference Package) version 3.6.
Distributed by the author.
http://www.evolution.gs.washington.edu/phylip.html, 2004.
- Antara Ghosh and Soma Barman. Application of Euclidean distance
measurement and principal component analysis for gene identification.
Gene, 583(2):112–120, 2016.
- Charles Miller Grinstead and James Laurie Snell. Introduction to
Probability. American Mathematical Society, 2012.
- MK Gupta, R Niyogi, and M Misra. An alignment-free method to find
similarity among protein sequences via the general form of
Chou's pseudo amino acid composition. SAR and QSAR in
Environmental Research, 24(7):597–609, 2013.
- Eugene Hamori and John Ruskin. H curves, a novel method of
representation of nucleotide series especially suited for long DNA
sequences. Journal of Biological Chemistry, 258(2):1318–1327,
1983.
- Bernhard Haubold. Alignment-free phylogenetics and population
genetics. Briefings in Bioinformatics, 15(3):407–418, 2013.
- Ping-an He, Jinzhou Wei, Yuhua Yao, and Zhixin Tie. A novel graphical
representation of proteins and its application. Physica A:
Statistical Mechanics and its Applications, 391(1-2):93–99, 2012.
- Ping-an He, Jinzhou Wei, Yuhua Yao, and Zhixin Tie. A novel graphical
representation of proteins and its application. Physica A:
Statistical Mechanics and its Applications, 391(1-2):93–99, 2012.
- Zhisong He, Jian Zhang, Xiao-He Shi, Le-Le Hu, Xiangyin Kong, Yu-Dong
Cai, and Kuo-Chen Chou. Predicting drug-target interaction networks
based on functional groups and biological features. PLoS ONE,
5(3):e9603, 2010.
- Tao Huang, Shen Niu, Zhongping Xu, Yun Huang, Xiangyin Kong, Yu-Dong
Cai, and Kuo-Chen Chou. Predicting transcriptional activity of
multiple site p53 mutants based on hybrid properties. PLoS ONE,
6(8):e22940, 2011.
- Xiaoqiu Huang and Jinhui Zhang. Methods for comparing a DNA sequence
with a protein sequence. Bioinformatics, 12(6):497–506, 1996.
- Le-Le Hu, Tao Huang, Yu-Dong Cai, and Kuo-Chen Chou. Prediction of
body fluids where proteins are secreted into based on protein
interaction network. PLoS ONE, 6(7):e22989, 2011.
- Chris A Kaiser, Monty Krieger, Harvey Lodish, and Arnold Berk.
Molecular Cell Biology. WH Freeman, 2007.
- Katsuko Komatsu, Shu Zhu, Hirotoshi Fushimi, Tran Kim Qui, Shaoqing
Cai, and Shigetoshi Kadota. Phylogenetic analysis based on 18S rRNA
gene and matK gene sequences of Panax vietnamensis and five related
species. Planta Medica, 67(05):461–465, 2001.
- Liang Kong, Lichao Zhang, and Jinfeng Lv. Accurate prediction of
protein structural classes by incorporating predicted secondary
structure information into the general form of Chou's pseudo amino
acid composition. Journal of Theoretical Biology, 344:12–18,
2014.
- Sudhir Kumar, Glen Stecher, and Koichiro Tamura. MEGA7: Molecular
Evolutionary Genetics Analysis version 7.0 for bigger datasets.
Molecular Biology and Evolution, 33(7):1870–1874, 2016.
- Tian Liu and Cangzhi Jia. A high-accuracy protein structural class
prediction algorithm using predicted secondary structural information.
Journal of Theoretical Biology, 267(3):272–275, 2010.
- Bi-Qing Li, Tao Huang, Lei Liu, Yu-Dong Cai, and Kuo-Chen Chou.
Identification of colorectal cancer related genes with mRMR and
shortest path in protein-protein interaction network. PLoS ONE,
7(4):e33393, 2012.
- Bi-Qing Li, Le-Le Hu, Shen Niu, Yu-Dong Cai, and Kuo-Chen Chou.
Predict and analyze S-nitrosylation modification sites with the mRMR
and IFS approaches. Journal of Proteomics, 75(5):1654–1665,
2012.
- Chun Li, Lili Xing, Xin Wang, et al. 2-D graphical representation of
protein sequences and its application to coronavirus phylogeny.
BMB Reports, 41(3):217–222, 2008.
- Chun Li, Lili Xing, Xin Wang, et al. 2-D graphical representation of
protein sequences and its application to coronavirus phylogeny.
BMB Reports, 41(3):217–222, 2008.
- Yushuang Li, Tian Song, Jiasheng Yang, Yi Zhang, and Jialiang Yang. An
alignment-free algorithm in comparing the similarity of protein
sequences based on pseudo-Markov transition probabilities among amino
acids. PLoS ONE, 11(12):e0167430, 2016.
- Zengchao Mu, Jing Wu, and Yusen Zhang. A novel method for
similarity/dissimilarity analysis of protein sequences. Physica
A: Statistical Mechanics and its Applications, 392(24):6361–6366,
2013.
- Hasan H Otu and Khalid Sayood. A new sequence distance measure for
phylogenetic tree construction. Bioinformatics,
19(16):2122–2130, 2003.
- William R Pearson. [5] Rapid and sensitive sequence comparison
with FASTP and FASTA. 1990.
- Michal J Pietal, Janusz M Bujnicki, and Lukasz P Kozlowski. GDFuzz3D:
a method for protein 3D structure reconstruction from contact maps,
based on a non-Euclidean distance function. Bioinformatics,
31(21):3499–3505, 2015.
- Luca Pinello, Giosue Lo Bosco, and Guo-Cheng Yuan. Applications of
alignment-free methods in epigenomics. Briefings in
Bioinformatics, 15(3):419–430, 2013.
- Dan Ralescu and Gregory Adams. The fuzzy integral. Journal of
Mathematical Analysis and Applications, 75(2):562–570, 1980.
- Jie Ren, Kai Song, Minghua Deng, Gesine Reinert, Charles H Cannon, and
Fengzhu Sun. Inference of Markovian properties of molecular sequences
from NGS data and applications to comparative genomics.
Bioinformatics, 32(7):993–1000, 2015.
- Ranjeet Kumar Rout, Pabitra Pal Choudhury, Santi Prasad Maity, BS Daya
Sagar, and Sk Sarif Hassan. Fractal and mathematical morphology in
intricate comparison between tertiary protein structures.
Computer Methods in Biomechanics and Biomedical Engineering:
Imaging & Visualization, 6(2):192–203, 2018.
- Ajay Kumar Saw, Binod Chandra Tripathy, and Soumyadeep Nandi.
Alignment-free similarity analysis for protein sequences based on
fuzzy integral. Scientific Reports, 9(1):2775, 2019.
- Ariya Shajii, Deniz Yorukoglu, Yun William Yu, and Bonnie Berger. Fast
genotyping of known SNPs through approximate k-mer matching.
Bioinformatics, 32(17):i538–i544, 2016.
- Ping Wang, Lele Hu, Guiyou Liu, Nan Jiang, Xiaoyun Chen, Jianyong Xu,
Wen Zheng, Li Li, Ming Tan, Zugen Chen, et al. Prediction of
antimicrobial peptides based on sequence alignment and feature
selection methods. PLoS ONE, 6(4):e18476, 2011.
- Leyi Wei, Minghong Liao, Xing Gao, and Quan Zou. An improved protein
structural classes prediction method by incorporating both sequence
and structure information. IEEE Transactions on NanoBioscience,
14(4):339–349, 2014.
- Leyi Wei, Minghong Liao, Xing Gao, and Quan Zou. Enhanced protein fold
prediction method through a novel feature extraction technique.
IEEE Transactions on NanoBioscience, 14(6):649–659, 2015.
- Zhi-Cheng Wu, Xuan Xiao, and Kuo-Chen Chou. 2D-MH: A web-server for
generating graphic representation of protein sequences based on the
physicochemical properties of their constituent amino acids.
Journal of Theoretical Biology, 267(1):29–34, 2010.
- Jian-Yi Yang, Zhen-Ling Peng, and Xin Chen. Prediction of protein
structural classes for low-homology sequences based on predicted
secondary structure. BMC Bioinformatics, 11(1):S9, 2010.
- Yu-Hua Yao, Qi Dai, Chun Li, Ping-An He, Xu-Ying Nan, and Yao-Zhou
Zhang. Analysis of similarity/dissimilarity of protein sequences.
Proteins: Structure, Function, and Bioinformatics,
73(4):864–871, 2008.
- Chenglong Yu, Rong L He, and Stephen S-T Yau. Protein sequence
comparison based on K-string dictionary. Gene, 529(2):250–256,
2013.
- Hong-Jie Yu and De-Shuang Huang. Novel 20-D descriptors of protein
sequences and it's applications in similarity analysis. Chemical
Physics Letters, 531:261–266, 2012.
- Hong-Jie Yu and De-Shuang Huang. Novel 20-D descriptors of protein
sequences and it's applications in similarity analysis. Chemical
Physics Letters, 531:261–266, 2012.
- Lichao Zhang, Xiqiang Zhao, and Liang Kong. Predict protein structural
class for low-similarity sequences by evolutionary difference
information into the general form of Chou's pseudo amino acid
composition. Journal of Theoretical Biology, 355:105–110,
2014.
- Shengli Zhang, Shuyan Ding, and Tianming Wang. High-accuracy
prediction of protein structural class for low-similarity sequences
based on predicted secondary structure. Biochimie,
93(4):710–714, 2011.
- Shengli Zhang, Yunyun Liang, and Xiguo Yuan. Improving the prediction
accuracy of protein structural class: Approached with alternating word
frequency and normalized Lempel–Ziv complexity. Journal of
Theoretical Biology, 341:71–77, 2014.
- Shengli Zhang, Feng Ye, and Xiguo Yuan. Using principal component
analysis and support vector machine to predict protein structural
class for low-similarity sequences via PSSM. Journal of
Biomolecular Structure and Dynamics, 29(6):1138–1146, 2012.
- Zhaojun Zhang and Wei Wang. RNA-Skim: a rapid method for RNA-Seq
quantification at transcript level. Bioinformatics,
30(12):i283–i292, 2014.
- Andrzej Zielezinski, Susana Vinga, Jonas Almeida, and Wojciech M
Karlowski. Alignment-free sequence comparison: benefits, applications,
and tools. Genome Biology, 18(1):186, 2017.
Figure Legends
Figure 1. Phylogenetic tree of different species of the ND5 family
generated by the proposed method (cluster dendrogram using the UPGMA
distance method).
Figure 2. Phylogenetic tree of eight different species of the ND6 family
generated by the proposed method (cluster dendrogram using the UPGMA
distance method).
Figure 3. Phylogenetic tree of 10 different species of the G10 family
generated by the proposed method (cluster dendrogram using the UPGMA
distance method).
Figure 4. Phylogenetic tree of 10 different species of the F11 family
constructed by the proposed method (cluster dendrogram using the UPGMA
distance method).
Figure 5. Correlation coefficients, with respect to ClustalW, for nine
ND5 proteins obtained by our method (cluster dendrogram using the UPGMA
distance method) and by the methods in [29, 62, 42, 4, 11].
Figure 6. Correlation coefficients, with respect to ClustalW, for nine
ND5 proteins obtained by our method (cluster dendrogram using the UPGMA
distance method) and by the methods in Refs. [13] and [34].
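Figures 1–6 rest on two standard computations: building a UPGMA dendrogram from a pairwise distance matrix, and correlating one method's pairwise distances with those obtained from ClustalW. The following is a minimal Python sketch of both under stated assumptions, not the authors' implementation. The proposed similarity measure itself is not reproduced; the distance matrices are hypothetical placeholders, and we assume the correlation coefficient is the Pearson coefficient between corresponding upper-triangle entries of the two matrices. The function names `upgma`, `pearson`, and `matrix_correlation` are our own.

```python
from itertools import combinations
from math import sqrt


def upgma(labels, matrix):
    """Cluster taxa with UPGMA and return the tree as a Newick string.

    labels: list of taxon names; matrix: symmetric list-of-lists of distances.
    """
    # Active clusters: id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    # Pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = d_ab / 2.0  # ultrametric height of the new internal node
        merged = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        # Distance to the new cluster: leaf-count-weighted average (UPGMA rule)
        new_dist = {
            k: (size_a * dist[tuple(sorted((a, k)))]
                + size_b * dist[tuple(sorted((b, k)))]) / (size_a + size_b)
            for k in clusters
        }
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        for k, v in new_dist.items():
            dist[(k, next_id)] = v  # k < next_id, so keys stay ordered
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def matrix_correlation(m1, m2):
    """Correlate the upper-triangle entries of two n x n distance matrices."""
    pairs = list(combinations(range(len(m1)), 2))
    return pearson([m1[i][j] for i, j in pairs], [m2[i][j] for i, j in pairs])


if __name__ == "__main__":
    # Hypothetical toy distances for three taxa (not data from the paper)
    names = ["A", "B", "C"]
    ours = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
    reference = [[0, 3, 7], [3, 0, 8], [7, 8, 0]]  # stand-in for ClustalW
    print(upgma(names, ours))  # (C:3,(A:1,B:1):2);
    print(round(matrix_correlation(ours, reference), 4))  # 0.982
```

UPGMA places each merged node at half the merging distance, which yields an ultrametric tree; this is appropriate only if rates of change are roughly constant across lineages, so trees such as those in Figures 1–4 should be read with that assumption in mind.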