Kejue Jia

Position
  • Postdoc Research Associate
Kejue earned his Ph.D. in Bioinformatics and Computational Biology from Iowa State University, with minors in Statistics and Applied Mathematics. He is primarily a computational biologist, with a focus on protein sequence matching, protein evolution, and RNA structure prediction.

Contact

4014 Molecular Bio
2437 Pammel Dr.
Ames, IA 50011-1079

Education

  • B.S., Computer Science, Beijing University of Technology
  • M.S., Computer Science, San Diego State University
  • Ph.D., Bioinformatics and Computational Biology, Iowa State University

Research

Sequence Matching

Sequence matching sits at the very start of many computational analysis pipelines, so its accuracy determines the quality of every subsequent analysis. In his Ph.D. research, Kejue incorporated structural information into the derivation of amino acid substitution matrices and significantly improved the accuracy of sequence matching. For "twilight zone" sequences in particular, the new substitution matrix achieves major gains in the agreement between sequence matches and structure alignments (see below).

Sequence matching Figure

Protein Evolution and Higher-Order Sequence Correlations

Correlations in protein sequences reflect the evolutionary dependencies among residue sites. Kejue aims to push dependency detection methods beyond paired correlations to higher-order correlations, which are more natural for a densely packed molecule such as a protein. This project is motivated by the immediate need, and the long-standing challenge, to reveal and understand the complex dependencies within protein structures (see below).

Higher-order correlation figure
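
As a rough illustration of what a paired correlation between residue sites can look like, the sketch below computes the mutual information between two columns of a toy alignment. Mutual information is a commonly used pairwise measure and is shown here only as an assumption for illustration; it is not necessarily the measure used in Kejue's work, and the four-sequence alignment and the function name `mutual_information` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: sites 0 and 2 always change together,
# while site 3 varies independently of site 0.
alignment = ["ACDE", "ACDF", "GCHE", "GCHF"]
columns = list(zip(*alignment))

mi_coupled = mutual_information(columns[0], columns[2])
mi_independent = mutual_information(columns[0], columns[3])
print(mi_coupled, mi_independent)
```

A pairwise score like this only relates two sites at a time; higher-order correlations consider three or more sites jointly, which this sketch does not attempt.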

RNA Structure Prediction

In his postdoctoral work, Kejue has extended his studies to include RNA. He is working to establish the connection between an RNA's sequence, its dynamics, and its alternative conformations (see below). This project is expected to yield improved predictions of protein and RNA structures and their co-structures, as well as a deeper understanding of RNA evolution.

The strongly correlated nucleotide pairs in the sequence correlation matrix and in the dynamics correlation matrix show highly similar patterns.

Machine Learning

Kejue also embraces newly developed artificial intelligence approaches. Together with lab member Mesih Kilinc, he developed a fast and accurate protein homolog detection tool based on embeddings from a protein sequence language model. The tool found homologs with known functions for 800 uncharacterized human proteins, and these matches were confirmed to be similar using structures predicted by AlphaFold.

Software

During his studies, Kejue has built highly parallelizable pipeline management software that he uses in his research. The package allows him to run intensive computational tasks on huge datasets (usually hundreds of GB at a time). The software is open source and available for download at https://github.com/jkjium/contactGroups.