
CANKO SOCIETY FOR AI AND SOCIAL VALUE

A research network connecting artificial intelligence and social value

AI-Driven Protein–Ligand Interaction Analysis for Novel Drug Candidate Screening

Keywords: Protein–Ligand Interaction, Binding Affinity Prediction, Sequence-Based Machine Learning, XGBoost Regression, Drug Repositioning, Virtual Screening

Submission Type: Abstract

Status: Accepted | Submitted at: 2025-05-23 23:33:28

Abstract

This study proposes a lightweight, sequence-based framework for predicting protein–ligand binding affinity that addresses the limitations of structure-dependent approaches by working without 3D structural information. Ligands were collected from DrugBank (n = 12,309) and encoded as 2048-dimensional Morgan fingerprints using RDKit. Protein targets were represented by 20-dimensional amino acid composition (AAC) vectors derived from their primary sequences. An XGBoost regression model was trained to predict binding affinity (pKi). A two-stage prediction pipeline was then constructed by training a secondary correction model on the residual between the predicted (s_pKi) and experimental (pKi) values. The corrected predictions (calibrated_s_pKi) achieved RMSE = 0.90, R² = 0.74, binary classification accuracy = 88%, and AUC = 0.948. SHAP (SHapley Additive exPlanations) analysis showed that specific amino acid features, such as AAC_W (tryptophan) and AAC_H (histidine), and certain Morgan substructures strongly influenced model output, suggesting that amino acid composition and ligand substructure patterns contribute significantly to binding affinity. Because the framework relies on sequence-based representations alone, it enables accurate and efficient prediction of protein–ligand interactions, can be readily applied to large-scale drug–target datasets, and holds promise as a general-purpose tool for virtual screening and drug repositioning, complementing structure-based approaches such as graph neural networks (GNNs) and AlphaFold.
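As an illustration of the protein representation described in the abstract, the following is a minimal sketch of how a 20-dimensional AAC vector could be computed from a primary sequence. It is not the authors' code. The function name `aac_vector`, the alphabetical residue ordering, the choice to ignore non-standard residues, and the toy sequence are all assumptions made for this example; only the `AAC_W`-style feature naming follows the abstract.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_vector(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard symbols (e.g. 'X', 'U') are ignored; this is an assumption,
    since the abstract does not say how such residues were handled.
    """
    seq = sequence.upper()
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Feature names follow the AAC_<letter> pattern used in the abstract.
    return {f"AAC_{aa}": counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical toy sequence, used only to exercise the function.
features = aac_vector("MKWHHW")
```

In a pipeline like the one described, a vector of this kind would presumably be concatenated with the ligand's 2048-bit Morgan fingerprint to form a single feature row for the regression model, although the abstract does not state how the two representations were combined.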

Authors

  • Rackjune Baek (First Author), CKU Professor – rj100@hanmail.net
