Open to Research Collaboration

Erfan Zohrabi


MSc Bioinformatics student at the University of Bologna, decoding the language of life through machine learning, LLMs, and computational biology. Bridging the gap between biology and AI — one sequence at a time.

5+ Repos
3 Papers
96.3% Best ML Accuracy
01 — Research Interests

What I Explore

🤖
LLMs in Genomics

Applying large language models to DNA sequence annotation and functional prediction, and to uncovering latent patterns in large-scale biological datasets.

🧬
Protein Language Models

Exploring ESM and ProtTrans, and their integration with AlphaFold, for protein structure prediction and for understanding protein–ligand interactions.

🔬
Deep Learning for Genomics

Building CNN and Transformer models for DNA/RNA sequence classification, promoter analysis, and cancer genomics.

🕸️
Graph Neural Networks

Applying GNNs to protein–protein interaction networks, molecular property prediction, and other biological graphs, building on Stanford's CS224W.

🧫
Multi-Omics Integration

Fusing genomics, transcriptomics, and proteomics data, including single-cell and spatial omics, to model biological systems as a whole.

⚗️
AI + CRISPR & Drug Discovery

Using AI to identify optimal gene-editing targets, improve CRISPR precision, and mine biomedical text for novel drug candidates.

02 — Featured Work

Recent Projects

// 001
DNA Sequence Classification for Breast Cancer Prediction

Classified promoter DNA sequences using SVM, neural network, KNN, AdaBoost, and Naive Bayes classifiers. The best result, 96.3% accuracy, came from an RBF-kernel SVM further optimized with Particle Swarm Optimization (PSO).

Python · SVM · Neural Networks · PSO · Jupyter · Cancer Genomics
View on GitHub
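How the promoter sequences were encoded is not stated above. A common choice, assumed here, is to represent each sequence as a vector of k-mer frequencies and to compare vectors with the RBF kernel K(x, y) = exp(-γ‖x − y‖²) that an RBF-kernel SVM relies on. The sketch below computes both in plain Python; `kmer_frequencies`, `rbf_kernel`, and the two toy sequences are hypothetical, and a real pipeline would more likely hand the feature vectors to scikit-learn's `SVC(kernel="rbf")`.

```python
from itertools import product
import math


def kmer_frequencies(seq: str, k: int = 2) -> list[float]:
    """Hypothetical encoder: normalised counts of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing N or other ambiguity codes
            counts[index[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def rbf_kernel(x: list[float], y: list[float], gamma: float = 1.0) -> float:
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Toy promoter fragments (illustrative only, not project data).
a = kmer_frequencies("TATAAAAGGCGCGC")
b = kmer_frequencies("TATAAATGGCGCGA")
print(rbf_kernel(a, b, gamma=10.0))  # similarity in (0, 1]; 1.0 only for identical profiles
```

γ, together with the SVM's regularization constant C, is the kind of hyperparameter a search method such as PSO can tune; whether the project tuned exactly these is not stated.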
// 002
ML Model for KCNB1 Gene Variant Pathogenicity

A Random Forest model trained on ClinVar data to predict the pathogenicity of KCNB1 gene variants, benchmarked against the in-silico tools PolyPhen and SIFT using leave-one-out cross-validation (LOOCV) and a detailed performance analysis.

Python · Random Forest · ClinVar · PolyPhen · SIFT · LOOCV
View on GitHub
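Leave-one-out cross-validation trains the model n times, each time holding out a single variant for testing, which makes the most of the data when only a limited number of labelled variants exist for one gene. A minimal, dependency-free sketch follows. To keep it self-contained, a hypothetical 1-nearest-neighbour classifier stands in for the Random Forest, and the feature vectors are toy values, not ClinVar data; with scikit-learn the same scheme is available by passing `LeaveOneOut()` to `cross_val_score` with a `RandomForestClassifier`.

```python
from typing import Callable, Sequence


def loocv_accuracy(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    fit_predict: Callable[[list, list, Sequence[float]], int],
) -> float:
    """Leave-one-out CV: hold out each sample once, train on the other n-1, score the held-out one."""
    correct = 0
    for i in range(len(X)):
        X_train = [x for j, x in enumerate(X) if j != i]
        y_train = [label for j, label in enumerate(y) if j != i]
        if fit_predict(X_train, y_train, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def nearest_neighbour(X_train, y_train, x):
    """Hypothetical stand-in for a Random Forest: predict the label of the closest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]


# Toy two-feature vectors per variant (hypothetical); 1 = pathogenic, 0 = benign.
X = [[0.9, 2.1], [0.8, 1.9], [0.95, 2.4], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
y = [1, 1, 1, 0, 0, 0]
print(loocv_accuracy(X, y, nearest_neighbour))
```

One known pitfall shows up with a majority-class baseline: holding out a sample shifts the training majority toward the other class, so on a perfectly balanced dataset LOOCV scores that baseline at 0%. LOOCV results are therefore best read alongside the class balance.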
// 003
Signal Peptide Prediction in Protein Sequences

A machine-learning pipeline that identifies signal peptides from protein sequence features. Signal peptides are critical for understanding protein secretion and subcellular localization.

Python · Protein ML · Signal Peptides · Sequence Analysis · Biopython
View on GitHub
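Signal peptides typically combine a short, positively charged n-region with a hydrophobic h-region, so N-terminal charge and hydropathy are natural classifier inputs. The features the project actually used are not listed. The sketch below assumes three simple ones computed with the Kyte-Doolittle hydropathy scale; `n_terminal_features`, its region length, and its window size are hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def n_terminal_features(seq: str, region: int = 30, window: int = 7) -> dict:
    """Hypothetical feature set computed on the first `region` residues of a protein."""
    nterm = seq.upper()[:region]
    scores = [KD.get(aa, 0.0) for aa in nterm]  # unknown residues scored as 0
    # Mean hydropathy of every sliding window; the maximum approximates the h-region.
    windows = [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
    return {
        "max_window_hydropathy": max(windows) if windows else 0.0,
        # Net charge of the first five residues (K/R positive, D/E negative).
        "n_region_net_charge": sum((aa in "KR") - (aa in "DE") for aa in nterm[:5]),
        "mean_hydropathy": sum(scores) / len(scores) if scores else 0.0,
    }


# Toy signal-peptide-like N-terminus (illustrative only).
print(n_terminal_features("MKRLLLLLLLLAAASA"))
```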
// 004
DNA Methylation Analysis — Illumina Infinium Arrays

Statistical analysis in R of fluorescence intensity data and methylation status from Illumina Infinium arrays, covering beta values, M-values, probe characteristics, and visualization of differential methylation.

R · Epigenomics · Illumina · Beta Values · Statistical Modeling
View on GitHub
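Beta values and M-values summarize the same methylated (I_meth) and unmethylated (I_unmeth) probe intensities on two scales. The beta value I_meth / (I_meth + I_unmeth + offset) lies between 0 and 1 and reads as a methylation fraction. The M-value log2((I_meth + α) / (I_unmeth + α)) is unbounded and is often preferred for statistical testing. Ignoring the offsets, the two are linked by M = log2(β / (1 − β)). The project itself used R; this Python sketch only illustrates the formulas and takes the conventional offsets (100 for beta, α = 1 for M-values) as assumptions.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta = I_meth / (I_meth + I_unmeth + offset); the offset stabilizes low-intensity probes."""
    return meth / (meth + unmeth + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """M-value = log2((I_meth + alpha) / (I_unmeth + alpha))."""
    return math.log2((meth + alpha) / (unmeth + alpha))


def beta_to_m(beta: float) -> float:
    """Base-2 logit mapping a beta value to the M-value scale (offsets ignored)."""
    return math.log2(beta / (1.0 - beta))


print(beta_value(900, 0), m_value(3, 1), beta_to_m(0.5))
```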
// 005
Profile HMM — Kunitz-type Protease Inhibitor Domain

Built a profile hidden Markov model (pHMM) of the Kunitz-type protease inhibitor domain from a multiple sequence alignment using HMMER, as part of a structural bioinformatics pipeline.

Python · HMMER · MSA · pHMM · Domain Annotation
View on GitHub
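A profile HMM turns each well-conserved alignment column into a match state with its own residue distribution. The sketch below shows only that step, under simplifying assumptions: columns with fewer than 50% gaps become match states, and emissions use add-one pseudocounts. HMMER's `hmmbuild` performs this step itself with sequence weighting and Dirichlet-mixture priors, and it also estimates transition probabilities, which are omitted here. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_columns(msa: list[str], max_gap_fraction: float = 0.5) -> list[int]:
    """Assumed rule: columns whose gap fraction is below the threshold become match states."""
    n_seqs = len(msa)
    return [
        col for col in range(len(msa[0]))
        if sum(seq[col] == "-" for seq in msa) / n_seqs < max_gap_fraction
    ]


def match_emissions(msa: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-match-state emission probabilities with Laplace (add-one) pseudocounts."""
    emissions = []
    for col in match_columns(msa):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        emissions.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return emissions


# Toy alignment (not Kunitz sequences): column 2 is half gaps, so it becomes an insert column.
msa = ["KCRG", "KC-G", "RCQG", "K--G"]
print(match_columns(msa))
```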
More Projects

Exploring LLMs in genomics, deep reinforcement learning for sequence alignment, and generative models for protein design.

View All on GitHub
03 — Technical Skills

Tools & Expertise

AI / Deep Learning
Deep Learning (PyTorch / TF) 90%
Transformer Architectures 82%
LLMs / Generative Models 80%
Graph Neural Networks 75%
Bioinformatics
Sequence Analysis / Biopython 92%
Multi-Omics Integration 83%
Variant Analysis / ClinVar 78%
Structural Bioinformatics / HMM 74%
Programming
Python 95%
R / RStudio 82%
JavaScript / Web 70%
🤖 AI & Machine Learning
PyTorch · TensorFlow · Scikit-learn · 🤗 Transformers · Keras · XGBoost · NumPy · Pandas · Matplotlib · Seaborn
🧬 Bioinformatics Tools
Biopython · HMMER · BLAST · MEGA11 · Scanpy · AlphaFold · ESM / ProtTrans · DNABERT · Illumina Arrays · ClinVar · PolyPhen / SIFT
💻 Dev & Research Stack
Python · R · SQL · JavaScript · PHP · Jupyter · PyCharm · RStudio · Git · LaTeX · EndNote
📐 Methods & Algorithms
Transformers / Attention · CNN / RNN / LSTM · Random Forest · SVM (RBF/Linear) · Particle Swarm Optimization (PSO) · Bayesian Modeling · LOOCV · Graph Neural Nets · VAE / GAN · Sequence Alignment (RL)
04 — Academic Background

Education & Certifications

2023 — Present
MSc in Bioinformatics
University of Bologna, Italy
Focused on the intersection of AI and computational biology.
  • Machine Learning for Bioinformatics
  • Deep Learning in Genomics
  • High-Throughput Sequencing Data Analysis
  • Structural Bioinformatics
  • Multi-Omics Data Integration
  • Statistical Models for Biology
Graduated
BSc in Cellular & Molecular Biology
University of Damghan | GPA: 17.87 / 20
Strong foundation in molecular biology, genetics, and biostatistics with a focus on computational approaches.
  • Bioinformatics & Sequence Alignment
  • Molecular Biology & Gene Expression
  • Biostatistics & Hypothesis Testing
  • Programming for Bioinformatics (Python)
Online Certificates
🤖
DeepLearning.AI × AWS — Apr 2024
Generative AI with Large Language Models
🧠
Neuromatch Academy
Deep Learning Course
🕸️
Stanford University
CS224W: Machine Learning with Graphs
🎲
Stanford University
CS236: Deep Generative Models
🐍
Harvard × edX — Oct 2024
Using Python for Research
🧬
Johns Hopkins (Coursera)
Python for Genomic Data Science
05 — Experience

Research & Work

Research & Teaching
University of Damghan
Applications of Python in Bioinformatics
Research Author — Published in Journal of Ghin
Led research on using Python and Biopython for complex bioinformatics tasks. The work, presented at the University of Damghan conference, focused on sequence alignment and genomic analysis workflows.
Published — Journal of Ghin
Cancer Cell Cycle Research
Research Author
Led research on cell cycle mechanisms in breast and testicular cancer, examining cell cycle regulation and its role in cancer biology.
Damghan University
Teaching Assistant — EB101 Bioinformatics
Elements of Bioinformatics
  • Taught introductory bioinformatics to biology department students
  • Ran Python programming workshops
  • Trained students in sequence alignment with Biopython
  • Mentored students in genomic data analysis
Industry Experience
Tehran, Iran
Web Developer
OmicsCo — Bioinformatic Tech Company
Developed the company website with integrated bioinformatics tools. Implemented digital marketing strategies and enhanced online visibility for bioinformatics services.
Tehran, Iran
Webmaster & Marketing CEO
Karaphile — Biotechnology Startup
Managed multiple e-commerce platforms, led the marketing team and brand development, and negotiated strategic partnerships with suppliers for this biotech startup.
Tehran, Iran
SEO Manager
SBP Company (Safa Bazar Pars)
Led SEO strategies resulting in significant web traffic growth. Modernized digital marketing processes and optimized website structure and user experience.
Freelance
Developer & SEO Specialist
Independent Contractor
Developed custom software in Python and R, and provided SEO services to strengthen clients' online presence.
06 — Publications

Academic Publications

2022
Applications of Python Programming in Bioinformatics (Use of Biopython)
Journal of Ghin — University of Damghan Conference
2021
Cancer Cell Cycle in Breast Cancer and Testicular Cancer
Journal of Ghin
2020
Targeted Drug Delivery for Cancer Treatment
Journal of Ghin
Let's work together

Get in Touch

Open to research collaborations, PhD opportunities, and conversations about AI in genomics, LLMs in biology, or any exciting project at the intersection of computation and life.

Erfanzohrabi.ez@gmail.com