Cells respond to radiation in complex and dynamic ways. Decoding these responses can provide insights into why some tumors resist treatment or recur, and ultimately lead to more effective cancer therapies. To investigate these mechanisms, I apply machine learning (ML) to large-scale biological data—particularly protein data from resources like the UniProt database.

    My work focuses on predicting protein subcellular localization, a key factor in determining how proteins function and interact. Mislocalized proteins are often associated with disease, including cancer, and understanding their spatial distribution within cells can reveal patterns of dysfunction or adaptation in response to therapy.

    To build predictive models, I perform thorough data processing and exploratory data analysis (EDA), apply dimensionality reduction techniques such as principal component analysis (PCA), and implement a range of machine learning algorithms, including decision trees, support vector machines (SVMs), naïve Bayes classifiers, regression models, and ensemble methods. I also explore unsupervised learning through clustering to detect underlying structure in the data.
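A minimal sketch of this kind of workflow, assuming scikit-learn is available, might look like the following. The synthetic feature matrix stands in for real protein features, and the number of PCA components, the three hypothetical localization classes, and all model settings are illustrative assumptions rather than the configuration used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a protein feature matrix (assumption: 300 proteins,
# 40 numeric features, 3 hypothetical localization classes).
X, y = make_classification(
    n_samples=300,
    n_features=40,
    n_informative=10,
    n_classes=3,
    random_state=0,
)

# One representative from each model family named in the text.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(kernel="rbf"),
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in models.items():
    # Scale the features, project onto 10 principal components, then classify.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=10), clf)
    # Mean accuracy over 5-fold cross-validation.
    scores[name] = cross_val_score(pipe, X, y, cv=5).mean()

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20s}: {score:.3f}")
```

Placing the scaler and PCA inside the pipeline means they are refitted on each training fold, so the cross-validated scores are not inflated by information from the held-out fold.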

    In addition, I am incorporating transformer-based architectures and other deep learning techniques to improve protein localization prediction. While many existing models rely solely on primary amino acid sequences, I aim to integrate richer features (physicochemical properties, disorder predictions, and interaction networks) to enhance prediction accuracy.
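To illustrate what sequence-derived physicochemical features can look like, here is a small sketch in plain Python. The function name `sequence_features` and the choice of features (amino acid composition, mean Kyte-Doolittle hydropathy, and a crude net charge count) are assumptions made for illustration; disorder predictions and interaction-network features would need external tools and data and are not covered here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (higher values are more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Return simple physicochemical features for one protein sequence.

    Hypothetical helper: non-standard residue codes (e.g. X, B, Z) are skipped.
    """
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")

    n = len(residues)
    counts = Counter(residues)

    # Fraction of each of the 20 standard amino acids.
    features = {f"frac_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    features["length"] = n
    # Average hydropathy over the whole sequence.
    features["mean_hydropathy"] = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    # Crude charge estimate: K and R count as +1, D and E as -1;
    # histidine and the termini are ignored.
    features["approx_net_charge"] = (
        counts["K"] + counts["R"] - counts["D"] - counts["E"]
    )
    return features


example = sequence_features("MKKLLDEAR")
print(example["length"], example["approx_net_charge"])
```

Feature dictionaries like this can be stacked into a numeric matrix and combined with sequence embeddings before being passed to a classifier.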

    I discuss some of these limitations and propose improvements in a recent short publication, available here:
https://www.academia.edu/126831346/Protein_Localization_Prediction