Intro to Bioinformatics & Computational Biology (Weeks 1–2)
● What is bioinformatics? Where does data science/programming fit?
● Types of biological data (sequences, structures, clinical, omics, etc.)
● Examples of real-world bioinformatics applications (genomics, personalized medicine,
etc.)
Setting Up the Development Environment (External resources)
● Install Python/R
● Anaconda / Jupyter Notebook / RStudio setup
● Basics of Git & GitHub
● Data types and useful elements (video… lists, dictionaries, functions, etc.)
● Troubleshooting: searching Google and asking an LLM
Introduction to the Command Line
● Why bioinformaticians use the command line
● Navigating files & running Python/R scripts
● Example: Downloading datasets using wget or curl
● Task: Write a simple bash script to list and count FASTA sequences in a folder
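The counting logic behind the task above can be sketched in Python before it is written in bash. This is a minimal sketch, not a reference solution: the accepted file extensions and the function names are assumptions made for illustration, and a record is taken to be any line that starts with ">".

```python
from pathlib import Path

# Assumed set of FASTA extensions; adjust to match the files actually in use.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def count_fasta_records(path):
    """Count records in one FASTA file (each record starts with a '>' header line)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_folder(folder):
    """Return {file name: record count} for FASTA files directly inside `folder`."""
    counts = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES:
            counts[path.name] = count_fasta_records(path)
    return counts
```

Calling summarize_folder on a data directory returns one count per FASTA file. In bash, the same per-file count is commonly obtained with grep -c '^>' on each file.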
Phase 2: Core Learning & Practice (Weeks 3–5)
Goal: Hands-on Python/R/Bash bioinformatics tasks
Working with Biological Data in Python/R and EDA
● Libraries: pandas, numpy, tidyverse
● How to load and clean datasets (CSV, TSV, FASTA, VCF, BAM, etc.)
● Example: Cleaning a dataset
● Summary statistics (describe(), groupby())
● Visualizing distributions (boxplots, histograms; e.g., with ggplot2)
● Task: Explore and analyze a sample genome dataset
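The load, clean, and summarize steps listed above might look like the following in pandas. This is a sketch under stated assumptions: the table is a made-up toy TSV, and the column names (sample_id, group, gene, expression) are hypothetical placeholders, not taken from any real dataset.

```python
import io

import pandas as pd

# Made-up toy data standing in for a real TSV file (hypothetical columns and values).
# It contains one duplicated row (S3) and one missing expression value (S4).
RAW_TSV = (
    "sample_id\tgroup\tgene\texpression\n"
    "S1\tcontrol\tBRCA1\t5.1\n"
    "S2\tcontrol\tBRCA1\t4.9\n"
    "S3\tcase\tBRCA1\t7.8\n"
    "S4\tcase\tBRCA1\t\n"
    "S3\tcase\tBRCA1\t7.8\n"
    "S5\tcase\tBRCA1\t8.2\n"
)

# Load: a real file path would replace the StringIO wrapper.
df = pd.read_csv(io.StringIO(RAW_TSV), sep="\t")

# Clean: drop exact duplicate rows, then rows with no expression value.
clean = (
    df.drop_duplicates()
    .dropna(subset=["expression"])
    .reset_index(drop=True)
)

# Summarize: overall statistics, then count and mean per group.
overall = clean["expression"].describe()
by_group = clean.groupby("group")["expression"].agg(["count", "mean"])
```

With matplotlib installed, clean.boxplot(column="expression", by="group") would draw one box per group from the same cleaned table.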
Biostatistics & Hypothesis Testing
● t-tests, ANOVA, and chi-square tests in Python/R
● Task/Example: Compare variant frequency in diseased vs. healthy patients
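One way to approach the variant-frequency comparison above is a chi-square test on a 2x2 table of carriers versus non-carriers. The sketch below computes the Pearson statistic by hand using only the standard library, without a continuity correction. The function name and the counts are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                 carriers   non-carriers
    diseased        a            b
    healthy         c            d

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30 of 100 diseased patients and 10 of 100 healthy
# patients carry the variant.
stat, p = chi_square_2x2(30, 70, 10, 90)
```

For these made-up counts the statistic is 12.5. In practice scipy.stats.chi2_contingency performs the same test for tables of any size; note that it applies a continuity correction to 2x2 tables by default, so its result differs slightly from this hand computation.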
Introduction to Machine Learning in Bioinformatics
● Overview of classification & regression in biological contexts
● Example: Predicting disease status based on gene expression
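To make the classification idea above concrete, here is a deliberately simple nearest-centroid classifier in plain Python. It is a stand-in chosen for readability, not the method the course prescribes: the expression values, the two-gene feature vectors, and the function names are all made up for illustration.

```python
from statistics import mean


def fit_centroids(samples, labels):
    """Return {label: mean expression vector} computed from the training samples."""
    grouped = {}
    for vector, label in zip(samples, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((x - c) ** 2 for x, c in zip(vector, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Made-up expression levels for two hypothetical genes per patient.
train_X = [[2.0, 1.0], [2.2, 0.8], [5.0, 3.1], [5.4, 2.9]]
train_y = ["healthy", "healthy", "diseased", "diseased"]

centroids = fit_centroids(train_X, train_y)
new_patient_label = predict(centroids, [5.1, 3.0])
```

A real analysis would hold out test data and would more likely use a scikit-learn estimator such as LogisticRegression. The train-then-predict shape stays the same.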
External links to supplement the material, e.g.:
● Pandas Official Documentation
● Bioconductor for R
● scikit-learn for beginners
Final Project description: Disease prediction with genome data
Build a simple classification model that predicts disease risk from genomic variants. We’ll work
with a VCF file containing SNP data from a publicly available human genomic resource (e.g.,
1000 Genomes Project, ClinVar, or gnomAD) and use pathogenic variants as the basis for the
prediction.
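A possible first step for the project is turning VCF genotypes into a numeric feature table that a classifier can consume. The sketch below uses only the standard library and makes several assumptions: the VCF text is a tiny made-up example, each genotype is encoded as the number of non-reference alleles (0, 1, or 2), and missing calls are counted as 0 for simplicity. The function name genotype_matrix is hypothetical.

```python
import re


def genotype_matrix(vcf_lines):
    """Return (variant_ids, {sample: [alt-allele count per variant]})."""
    samples, variant_ids, matrix = [], [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            matrix = {sample: [] for sample in samples}
            continue
        chrom, pos, variant_id = fields[0], fields[1], fields[2]
        variant_ids.append(variant_id if variant_id != "." else f"{chrom}:{pos}")
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            alleles = re.split(r"[/|]", genotype)
            # Simplifying assumption: missing alleles ('.') count as reference.
            matrix[sample].append(sum(a not in ("0", ".") for a in alleles))
    return variant_ids, matrix


# Tiny made-up VCF (not real data) with two samples and two variants.
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t1000\ttoy1\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n"
    "1\t2000\t.\tC\tT\t.\tPASS\t.\tGT:DP\t0|0:12\t./.:0\n"
)

ids, matrix = genotype_matrix(VCF_TEXT.splitlines())
```

Each sample's list can then serve as one row of the model's input. The disease labels are not part of a VCF file and would have to come from a separate phenotype or annotation source.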
Phase 3: Project Completion & Showcase (Weeks 6–8)
Goal: Finish projects, polish GitHub repositories, and present findings
Project Peer Reviews & Feedback
● Interns review each other’s GitHub repos and give feedback
● Mentors give feedback on best practices and improvements, and show examples of good
project documentation
Final Presentation
● Interns present their projects in a short recorded video or a live session.

