My name is Michael Shiferaw, and I am a PhD student specializing in cancer biology. I hold Bachelor’s and Master’s degrees in Biochemistry and Cell and Molecular Biology. I have a strong background in biological research, and I enjoy combining my expertise in cell and molecular biology with my largely self-taught computing skills. My research centers on understanding how cells adapt and survive following radiation exposure, a critical issue because radiation therapy is one of the most common treatments for cancer.
Cells respond to radiation in complex and dynamic ways. Decoding these responses can provide insights into why some tumors resist treatment or recur, and ultimately lead to more effective cancer therapies. To investigate these mechanisms, I apply machine learning (ML) to large-scale biological data—particularly protein data from resources like the UniProt database.
My work focuses on predicting protein subcellular localization, a key factor in determining how proteins function and interact. Mislocalized proteins are often associated with disease, including cancer, and understanding their spatial distribution within cells can reveal patterns of dysfunction or adaptation in response to therapy.
To build predictive models, I perform thorough data preprocessing and exploratory data analysis (EDA), apply dimensionality reduction techniques such as principal component analysis (PCA), and implement a range of machine learning algorithms, including decision trees, support vector machines (SVMs), naïve Bayes classifiers, regression models, and ensemble methods. I also use unsupervised clustering to detect underlying structure in the data.
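As a rough illustration of one of the classifiers listed above, the sketch below implements a small Gaussian naïve Bayes model using only the Python standard library. It is a teaching example, not the project's code: the class name `GaussianNaiveBayes`, the two toy features (a hydropathy score and a charged-residue fraction), and the labels are all hypothetical, and real work would more likely use an established library such as scikit-learn.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes classifier (illustrative sketch)."""

    def fit(self, X, y):
        # Group training rows by class label.
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)

        self.priors = {}
        self.stats = {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # Population variance per feature, with a small floor so that
            # a constant feature does not cause division by zero.
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                for col, m in zip(columns, means)
            ]
            self.stats[label] = list(zip(means, variances))
        return self

    def _log_posterior(self, row, label):
        # log P(label) + sum of per-feature Gaussian log-likelihoods.
        total = math.log(self.priors[label])
        for value, (mean, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [
            max(self.priors, key=lambda label: self._log_posterior(row, label))
            for row in X
        ]


if __name__ == "__main__":
    # Made-up toy data: [hydropathy score, fraction of charged residues].
    X_train = [[1.5, 0.10], [1.8, 0.12], [1.2, 0.08],
               [-0.9, 0.30], [-1.1, 0.35], [-0.7, 0.28]]
    y_train = ["membrane"] * 3 + ["nucleus"] * 3

    model = GaussianNaiveBayes().fit(X_train, y_train)
    print(model.predict([[1.4, 0.10], [-1.0, 0.30]]))
```

The naïve independence assumption keeps the model to one mean and one variance per feature and class, which makes it a convenient baseline before moving to SVMs or ensemble methods.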
In addition, I am incorporating transformer-based architectures and other deep learning techniques to improve protein localization prediction. While many existing models rely solely on primary amino acid sequences, I aim to integrate richer features, such as physicochemical properties, disorder predictions, and interaction networks, to enhance prediction accuracy.
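To make the idea of physicochemical features concrete, here is a minimal sketch that turns an amino acid sequence into a few numeric descriptors: residue composition, mean hydropathy (GRAVY) on the Kyte-Doolittle scale, and the fraction of charged residues. The function name `sequence_features` and this particular feature set are illustrative assumptions, not the project's actual pipeline. Disorder predictions and interaction networks would need external tools or databases and are not covered here.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def sequence_features(sequence):
    """Return a dict of simple physicochemical features for one sequence.

    Hypothetical helper for illustration; non-standard residue codes
    (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")

    n = len(residues)
    counts = Counter(residues)

    # Fractional composition: one feature per standard amino acid.
    features = {f"frac_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    features["length"] = float(n)
    # GRAVY: mean Kyte-Doolittle hydropathy over the sequence.
    features["gravy"] = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    # Fraction of residues that are charged at neutral pH (D, E, K, R).
    features["frac_charged"] = sum(counts[aa] for aa in "DEKR") / n
    return features


if __name__ == "__main__":
    # Arbitrary short peptide used only to exercise the function.
    feats = sequence_features("MKTLLVAGDE")
    print(feats["length"], round(feats["gravy"], 3), feats["frac_charged"])
```

Feature vectors like these can be concatenated with sequence embeddings or used on their own as input to the classical classifiers described earlier.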
I’ve addressed some of these limitations and proposed improvements in my recent short publication, which is available here:
https://www.academia.edu/126831346/Protein_Localization_Prediction
By combining machine learning with cancer biology, I hope to uncover the molecular patterns that underlie treatment resistance and cellular adaptation. My long-term goal is to contribute to the development of more personalized, data-driven cancer therapies that improve patient outcomes.
All code developed for this project is located under the ML projects tab and is open-source under the MIT License. It is free to use, modify, and distribute for educational, research, and non-commercial purposes. I encourage others to build upon this work to advance the intersection of machine learning and cancer biology.