List of Stephen Yau's Publications in Bioinformatics
February 27th, 2026
I. Books
1.
Mathematical Principles in Bioinformatics (with Xin Zhao,
Kun Tian and Hongyu Yu), Interdisciplinary Applied Mathematics book series
(IAM, Vol. 58), Springer, January 2024.
II. Research Papers
1.
Exploring potential transcription factors and their regulatory
relationships based on asymmetric covariance natural vector encoding method and
machine learning algorithms (with Guoqing Hu, Mengmeng Sang, Hao Wang, Jia
Ge, and Lin Xu), to appear, Briefings in Bioinformatics,
Vol. 27, No. 1 (2026), bbag044, 1-14.
2.
Asymmetric Natural Vector Method for Predicting Ambiguous Nonstandard Base Codes
(with Guoqing Hu and Hao Wang), to appear, Communications in Information and
Systems, 2026.
3.
Multi-perspective Natural Vector: A Novel Method for Viral
Sequence Feature Extraction (with Xiang Shi, Jiayi Kang, Nan Sun, and Xin
Zhao), Journal of Computational Biology, Vol. 33, No. 2 (2026), 255-266.
4.
scMFF: A Machine Learning Framework with Multiple Feature Fusion
Strategies for Cell Type Identification (with Nan Sun, Yu Wang, Xiang Shi,
Dengcheng Yang, and Rongling Wu), BMC Bioinformatics, Vol. 26, No. 277 (2025),
1-17.
5.
Energy entropy vector: a novel approach for efficient microbial
genomic sequence analysis and classification (with Hao Wang and Guoqing
Hu), Briefings in Bioinformatics, Vol. 26, No. 5 (2025), bbaf459, 1-10.
6.
Reconstruction of masked sequences via inverse mapping of
incomplete information natural vectors (with Patrick Ding, Guoqing Hu, and
Hongyu Yu), PeerJ, Vol. 13 (2025), e20126, DOI 10.7717/peerj.20126, 1-14.
7.
A new alignment‑free method: K‑mer Subsequence
Natural Vector (K‑mer SNV) for classification of fungi (with Lily
He*, Mochao Huang*, Gulinisha Yiming*, Yi Zhu*, Ruowei Liu, and Jinghan Chen),
BMC Bioinformatics, Vol. 26:170 (2025).
8.
Energy Entropy Vector: A Novel Approach for Efficient Microbial
Genomic Sequence Analysis and Classification (with Hao Wang and Guoqing
Hu), Briefings in Bioinformatics, Vol. 26, No. 5 (2025), bbaf459.
9.
The grand biological universe: A comprehensive geometric
construction of genome space (with Hongyu Yu*, Nan Sun*, Ruohan Ren*, Tao
Zhou*, Mengcen Guan* and Leqi Zhao*), The Innovation, Vol. 6, No. 8 (2025),
100937, 1-6.
10. Novel Natural
Vector with Asymmetric Covariance for Classifying Biological Sequences
(with Guoqing Hu*, Tao Zhou*, and Piyu Zhou), Gene, Vol. 962 (2025): 149532,
1-10.
11. Automated
recognition of chromosome fusion using an alignment-free natural vector method
(with Hongyu Yu), Frontiers in Genetics, section Computational Genomics, Vol.
15 (2024), 1364951, 1-10.
12. CAPE: a deep
learning framework with Chaos-Attention net for Promoter Evolution (with
Ruohan Ren*, Hongyu Yu*, Jiahao Teng, Sihui Mao, Zixuan Bian and Yangtianze
Tao), Briefings in Bioinformatics, Vol. 25, No. 5 (2024), 1-12.
13. Exploring
geometry of genome space via Grassmann manifolds (with Xiaoguang Li*, Tao
Zhou*, Xingdong Feng, and Shing-Tung Yau), The Innovation, Vol. 5, No. 5
(2024), 100677, 1-8.
14. New Virus Variant
Detection Based on the Optimal Natural Metric (with Hongyu Yu), Genes, Vol.
15:891 (2024), 1-12.
15.
A Novel Natural Graph for Efficient Clustering of Virus Genome
Sequences (with Harris Song*, Nan Sun*, and Wenping Yu), Current
Bioinformatics, Vol. 19 (2024), 687-703.
16.
The optimal metric for viral genome space (with Hongyu Yu),
Computational and Structural Biotechnology Journal, Vol. 23 (2024), 2083-2096.
17.
Geometric Analysis of SARS-CoV-2 Variants (with Mengcen Guan
and Nan Sun), Gene, Vol. 909 (2024), 148291, 1-11.
18. Quantitative
proteomics profiling reveals the inhibition of trastuzumab antitumor efficacy
by phosphorylated RPS6 in gastric carcinoma (with Chun-Ting Hu*, Shao-Jun
Pei*, Jing-Long Wang, Li-Dong Zu, Wei-Wei Shen, Lin Yuan, Feng Gao, Li-Ren
Jiang and Guo-Hui Fu), Cancer Chemotherapy and Pharmacology, Vol. 92 (2023),
341-355.
19. Utilizing the
codon adaptation index to evaluate the susceptibility to HIV-1 and SARS-CoV-2
related coronaviruses in possible target cells in humans (with Haoyu Zhou*
and Ruohan Ren*), Frontiers in Cellular and Infection Microbiology, section
Virus and Host, Vol. 12:1085397 (2023), 1-19.
20. Generating
Minimal Models of H1N1 NS1 Gene Sequences Using Alignment-based and
Alignment-free Algorithms (with Meng Fang*, Jiawei Xu* and Nan Sun), Genes,
Vol. 14:186 (2023), 1-11.
21. Classification of
Protein Sequences by a Novel Alignment-Free Method on Bacterial and Virus
Families (with Mengcen Guan and Leqi Zhao), Genes, Vol. 13:1744 (2022),
1-12.
22. In-depth
Investigation of the Point Mutation Pattern of HIV-1 (with Nan Sun),
Frontiers in Cellular and Infection Microbiology, section Extra-intestinal Microbiome,
Vol. 12:1033481 (2022), 1-11.
23. An efficient
numerical representation of genome sequence: natural vector with covariance
component (with Nan Sun and Xin Zhao), PeerJ, Vol. 10:e13544
(2022), 1-23.
24. Biomolecular
Topology: Modelling and Analysis (with Jian Liu, Ke-Lin Xia, Jie Wu and
Guo-Wei Wei), Acta Mathematica Sinica, Vol. 38, No. 10 (2022), 1901-1938.
25. Nucleotide Amino
Acid K-mer Vector: An alignment-free method for comparing genomic sequences
(with Xiaona Bao, Lily He and Jingan Cui), Communications in Information and
Systems, Vol. 22, No. 3 (2022), 317-337.
26. kmer2vec: a novel
method for comparing DNA sequences by word2vec embedding (with Ruohan Ren
and Changchuan Yin), Journal of Computational Biology, Vol. 29, No. 9 (2022),
1001-1021.
27. New Genome
Sequence Detection via Natural Vector Convex Hull Method (with Ruzhang Zhao
and Shaojun Pei), IEEE/ACM Transactions on Computational Biology and
Bioinformatics, Vol. 19, No. 3 (2022), 1782-1793.
28. Full chromosomal
relationships between populations and the origin of humans (with Rui Dong,
Shaojun Pei, Mengcen Guan, Shek-Chung Yau, Changchuan Yin and Rong L He),
Frontiers in Genetics, Vol. 12 (2022), 1-10.
29. Identification of
HIV rapid mutations using differences in nucleotide distribution over time
(with Nan Sun and Jie Yang), Genes, Vol. 13, No. 170 (2022), 1-15.
30. Determination of
the nucleotide or amino acid composition of genome or protein sequences by
using natural vector method and convex hull principle (with Xiaopei Jiao*,
Shaojun Pei*, Zeju Sun and Jiayi Kang), Fundamental Research, Vol. 1 (2021),
559-564.
31. Inverted Repeats
in Coronavirus SARS-CoV-2 Genome and Implications in Evolution (with
Changchuan Yin), Communications in Information and Systems, Vol. 21, No. 1
(2021), 125-145.
32.
Geometric construction of viral genome space and its
applications (with Nan Sun*, Shaojun Pei*, Lily He, Changchuan Yin, Rong
Lucy He), Computational and Structural Biotechnology Journal, Vol. 19 (2021),
4226-4234.
33.
Analysis of the Genomic Distance between Bat Coronavirus RaTG13
and SARS-CoV-2 Reveals Multiple Origins of COVID-19 (with Shaojun Pei),
Acta Mathematica Scientia, Vol. 41, No. 3 (2021), 1017-1022.
34. Amino acid
torsion angles enable prediction of protein fold classification (with Kun
Tian* and Xin Zhao*), Scientific Reports, Vol. 10:21733 (2020), 1-8.
35. Classification of
genomic components and prediction of genes of Begomovirus based on subsequence
natural vector and support vector machine (with Shaojun Pei, Rui Dong,
Yiming Bao and Rong He), PeerJ, DOI 10.7717/peerj.9625, 2020, 1-15.
36. A new method
based on coding sequence density to cluster bacteria (with Nan Sun*, Rui
Dong*, Shaojun Pei and Changchuan Yin), Journal of Computational Biology, Vol.
27, No. 12 (2020), 1688-1698.
37. A novel numerical
representation for proteins: three-dimensional Chaos Game Representation and
its Extended Natural Vector (with Zeju Sun*, Shaojun Pei* and Rong Lucy
He), Computational and Structural Biotechnology Journal, Vol. 18 (2020),
1904-1913.
38. Positional
Correlation Natural Vector: A Novel Method for Genome Comparison (with Lily
He, Rui Dong and Rong Lucy He), International Journal of Molecular Sciences,
Vol. 21:3859 (2020), 1-19.
39. Analysis of the
hosts and transmission paths of SARS-CoV-2 in the COVID-19 outbreak (with
Rui Dong*, Shaojun Pei*, Changchuan Yin and Rong Lucy He), Genes, Vol. 11:637
(2020), 1-16.
40. A novel
alignment-free method for HIV-1 subtype classification (with Lily He*, Rui
Dong* and Rong Lucy He), Infection, Genetics and Evolution, Vol. 77 (2020),
1-11.
41. Splice sites
detection using chaos game representation and neural network (with Tung
Hoang and Changchuan Yin), Genomics, Vol. 112, No. 2 (2020), 1847-1852.
42. Fast and accurate
genome comparison using genome images: the Extended Natural Vector (with
Shaojun Pei*, Wenhui Dong*, Xiuqiong Chen and Rong Lucy He), Molecular Phylogenetics
and Evolution, Vol. 141 (2019), 1-7.
43. A Novel Approach to Clustering Genome Sequences Using
Inter-Nucleotide Covariance (with
Rui Dong, Lily He and Rong Lucy He), Frontiers in Genetics, section
Bioinformatics and Computational Biology, Vol. 10 (2019), 1-12.
44. Convex hull principle for classification and phylogeny of
eukaryotic proteins (with
Xin Zhao*, Kun Tian* and Rong L. He), Genomics, Vol. 111, No. 6 (2019),
1777-1784.
45. Phylogenetic analysis of protein sequences based on a novel
k-mer natural vector method (with
Yuyan Zhang and Jia Wen), Genomics, Vol. 111, No. 6 (2019), 1298-1305.
46. Comparing protein structures and inferring functions with a
novel three-dimensional Yau-Hausdorff method (with Kun Tian*, Xin Zhao* and
Yuning Zhang), Journal of Biomolecular Structure and Dynamics, Vol. 37, No. 16
(2019), 4115-4160.
47.
Large-scale genome comparison based on cumulative Fourier power
and phase spectra: central moment and covariance vector (with Shaojun Pei,
Rui Dong and Rong Lucy He), Computational and Structural Biotechnology Journal,
Vol. 17 (2019), 982-994.
48.
Protein Sequence Classification Using Natural Vector and Convex
Hull Method (with Yi Wang and Kun Tian), Journal of Computational Biology,
Vol. 26, No. 4 (2019), 315-321.
49. Virus classification based on Q-vectors (with Hui Zheng, Jie Yang and Rong
L. He), Communications in Information and Systems, Vol. 19, No. 1 (2019),
81-94.
50. Whole genome single nucleotide polymorphism genotyping of
Staphylococcus aureus (with
Changchuan Yin), Communications in Information and Systems, Vol. 19, No. 1
(2019), 57-80.
51. Assessment of kmer degeneration method for complicated genomes
(with Shuai Liu*, Shaojun
Pei* and Qi Wu), Communications in Information and Systems, Vol. 19, No. 1
(2019), 17-35.
52. A new efficient method for analyzing fungi species using
correlations between nucleotides (with Xin Zhao and Kun
Tian), BMC Evolutionary Biology, Vol. 18:200 (2018), 1-13.
53. Convex hull analysis
of evolutionary and phylogenetic relationships between biological groups (with Kun Tian* and Xin Zhao*),
Journal of Theoretical Biology, Vol. 456, No. 7 (2018), 34-40.
54. A new method
to cluster genomes based on cumulative Fourier power spectrum (with Rui Dong*, Ziyue Zhu*,
Changchuan Yin, and Rong L. He), Gene, Vol. 673 (2018), 239-250.
55. A novel fast
vector method for genetic sequence comparison (with Yongkun Li, Lily He and Rong L. He), Scientific
Reports, Vol. 7 (2017), 1-11.
56. Virus Database
and Online Inquiry System Based on Natural Vectors (with Rui Dong, Hui Zheng, Kun
Tian, Shek-Chung Yau, Weiguang Mao, Wenping Yu, Changchuan Yin, Chenglong Yu,
Rong L. He and Jie Yang), Evolutionary Bioinformatics, Vol. 13 (2017), 1-7.
57. Establishing
the phylogeny of Prochlorococcus with a new alignment-free method (with Xin Zhao*, Kun Tian* and
Rong L. He), Ecology and Evolution, Vol. 7 (2017), 11057-11065.
58. A novel
alignment-free vector method to cluster protein sequences (with Lily He*, Yongkun Li* and
Rong L. He), Journal of Theoretical Biology, Vol. 427 (2017), 41-52.
59. A coevolution
analysis for identifying protein-protein interactions by Fourier transform (with Changchuan Yin), PLoS ONE,
Vol. 12, No. 4 (2017), 1-19.
60. An
information-based network approach for protein classification (with Xiaogeng Wan and Xin Zhao),
PLoS ONE, Vol. 12, No. 3 (2017), 1-21.
61. Zika and
Flaviviruses Phylogeny Based on the Alignment-Free Natural Vector Method (with Yongkun Li, Lily He and Rong
L. He), DNA and Cell Biology, Vol. 36, No. 2 (2017), 109-116.
62. Numerical
encoding of DNA sequences by chaos game representation with application in
similarity comparison
(with Tung Hoang and Changchuan Yin), Genomics, Vol. 108 (2016), 134-142.
63. Virus
classification in 60-dimensional protein space (with Yongkun Li, Kun Tian,
Changchuan Yin and Rong L. He), Molecular Phylogenetics and Evolution, Vol. 99
(2016), 53-62.
64. A New Method
for Studying the Evolutionary Origin of the SAR11 Clade Marine Bacteria (with Xin Zhao, Xiaogeng Wan and
Rong L. He), Molecular Phylogenetics and Evolution, Vol. 98 (2016), 271-279.
65. Two
Dimensional Yau-Hausdorff Distance with Applications on Comparison of DNA and
Protein Sequences
(with Kun Tian, Xiaoqian Yang, Qin Kong, Changchuan Yin, and Rong L. He), PLoS
ONE, DOI:10.1371/journal.pone.0136577 (2015), 1-19.
66. Ebolaviruses
Classification Based on Natural Vectors (with Hui Zheng, Changchuan Yin, Tung Hoang, Rong
Lucy He and Jie Yang), DNA and Cell Biology, Vol. 34, No. 6 (2015), 1-11.
67. An improved
model for whole genome phylogenetic analysis by Fourier transform (with Changchuan Yin), Journal of
Theoretical Biology, Vol. 382 (2015), 99-110.
68. A new method
to cluster DNA sequences using Fourier power spectrum (with Tung Hoang, Changchuan Yin,
Hui Zheng, Chenglong Yu and Rong Lucy He), Journal of Theoretical Biology, Vol.
372 (2015), 135-145.
69. Distinguishing
Proteins From Arbitrary Amino Acid Sequences (with Wei-Guang Mao*, Max Benson,
and Rong Lucy He), Scientific Reports, Vol. 5 (2015), 1-8.
70. Global
comparison of multiple-segmented viruses in 12-dimensional Genome space (with Hsin-Hsiung Huang, Chenglong
Yu, Hui Zheng, Troy Hernandez, Shek-Chung Yau, Rong Lucy He, and Jie Yang),
Molecular Phylogenetics and Evolution, Vol. 81 (2014), 29-36.
71. K-mer sparse
matrix model for genetic sequence and its applications in sequence comparison (with Jia Wen and Yuyan Zhang),
Journal of Theoretical Biology, Vol. 363 (2014), 145-150.
72. A measure of
DNA sequence similarity by Fourier Transform with applications on hierarchical
clustering (with
Changchuan Yin, and Ying Chen), Journal of Theoretical Biology, Vol. 359
(2014), 18-28.
73. DFA7, A New
Method to Distinguish between Intron-containing and Intronless Genes (with Chenglong Yu, Mo Deng, Lu
Zheng, Rong Lucy He, and Jie Yang), PLoS ONE, Vol. 9,
No. 7 (2014), 1-10.
74. K-mer natural
vector and its application to the phylogenetic analysis of genetic sequences (with Jia Wen, Raymond H.F. Chan,
Shek-Chung Yau, Rong L. He), Gene, Vol. 546, No. 1 (2014), 25-34.
75. Viral genome
phylogeny based on Lempel-Ziv complexity and Hausdorff distance (with Chenglong Yu, Rong Lucy He),
Journal of Theoretical Biology, Vol. 348 (2014), 12-20.
76. Protein
sequence comparison based on K-string dictionary (with Chenglong Yu and Rong L.
He), Gene, Vol. 529 (2013), 250-256.
77. Denoising the
3-Base Periodicity Walks of DNA Sequences in Gene Finding (with Changchuan Yin and Dongchul
Yoo), Journal of Medical and Bioengineering, Vol. 2, No. 2 (2013), 80-83.
78. Real Time
Classification of Viruses in 12 Dimensions (with Chenglong Yu, Troy Hernandez, Hui Zheng,
Shek-Chung Yau, Hsin-Hsiung Huang, Rong Lucy He and Jie Yang), PLoS ONE, Vol.
8, No. 5 (2013), e64528, 1-10.
79. Protein space:
A natural method for realizing the nature of protein universe (with Chenglong Yu, Mo Deng,
Shiu-Yuen Cheng, Shek-Chung Yau, Rong L. He), Journal of Theoretical Biology,
Vol. 318 (2013), 197-204.
80. Protein map:
An alignment-free sequence comparison method based on various properties of
amino acids (with
Chenglong Yu, Shiu-Yuen Cheng, Rong L. He), Gene, Vol. 486 (2011), 110-118.
81. A novel
clustering method via nucleotide-based Fourier power spectrum analysis (with Bo Zhao and Victor Duan),
Journal of Theoretical Biology, Vol. 279 (2011), 83-89.
82. A new
distribution vector and its application in genome clustering (with Bo Zhao and Rong L. He),
Molecular Phylogenetics and Evolution, Vol. 59 (2011), 438-443.
83. A Novel Method
of Characterizing Genetic Sequences: genome space with biological distance and
applications (with
Mo Deng, Chenglong Yu, Qian Liang, Rong L. He), PLoS ONE, Vol. 6, Issue 3,
e17293, March 2011, 1-9.
84. DNA sequence
comparison by a novel probabilistic method (with Chenglong Yu and Mo Deng), Information Sciences,
Vol. 181 (2011), 1484-1492.
85. A novel
construction of genome space with biological geometry (with Chenglong Yu, Qian Liang,
Changchuan Yin, Rong He), DNA Research, Vol. 17 (2010), 155-168.
86. A rapid method
for characterization of protein relatedness using feature vectors (with Kareem Carr, Eleanor Murray,
Ebenezer Armah, Rong He), PLoS ONE, Vol. 5, Issue 3 (2010), e9950, 1-10.
87. Coding region
prediction based on a universal DNA sequence representative method (with Xianyang Jiang and Dominique
Lavenier), Journal of Computational Biology, Vol. 15, No. 10 (2008), 1237-1256.
88. A protein map
and its application
(with Chenglong Yu and Rong He), DNA and Cell Biology, Vol. 27, No. 5 (2008),
241-250.
89. Prediction of
Protein Coding Regions By the 3-Base Periodicity Analysis of a DNA Sequence (with Changchuan Yin), Journal of
Theoretical Biology, Vol. 247 (2007), 687-694.
90. Survey on index based homology search algorithms (with Xianyang Jiang, Peiheng
Zhang and Xinchun Liu), Journal of Supercomputing, Vol. 40, No. 2 (2007),
185-212.
91. Prediction of
Primate Splice Site Using Inhomogeneous Markov Chain and Neural Network (with Libin Liu and Yee-Kin Ho),
DNA and Cell Biology, Vol. 26, No. 7 (2007), 477-483.
92. Clustering DNA
sequences by feature vectors
(with Libin Liu and Yee-Kin Ho), Molecular Phylogenetics and Evolution, Vol. 41
(2006), 64-69.
93. A Fourier
characteristic of coding sequences: Origins and a non-Fourier approximation (with Changchuan Yin), Journal of
Computational Biology, Vol. 12, No. 9 (2005), 1153-1165.
94. DNA sequence
representation without degeneracy
(with Jiasong Wang, Amir Niknejad, Chaoxiao Lu, Ning Jin and Yee-Kin Ho),
Nucleic Acids Research, Vol. 31, No. 12 (2003), 3078-3080.
III. Part Lectures
1.
Yau's slides at the conference "First Symposium of Geometry and Statistics"
in Beijing, China, July 29-31, 2023.
2.
Yau's slides at the University of Notre Dame, USA, October 2022.