## Biomedical Data Science

Since 2016 I have been developing and teaching a 10-credit (20-hour) course on the analysis of biomedical data using the R statistical software, as part of the MSc in Operational Research (with Data Science) and the MSc in Statistics.

The course covers the following topics over five lectures (10 hours in total):

### Introduction to biomedical data

- Typical research questions: association, causation, discovery and prediction
- Types of biomedical data: routine data (consented and unconsented), phenotypic biomarkers, genetic data, derived data
- Identifying problems in real-world data
- Data cleaning, alignment, imputation and exploration
- Mechanisms of missing data

### Discovering associations

- Covariance and correlation
- Statistical inference and linear regression
- Solving the least squares problem
- Linear algebra considerations and collinearity
- Hypothesis testing
- Power considerations
- Assessing the fit of the model
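
As a brief illustration of the least squares problem listed above (a standard textbook formulation, not taken from the course notes), write the linear model as $y = X\beta + \varepsilon$, with $X$ the $n \times p$ design matrix. The symbols are chosen here for illustration.

$$
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
\quad\Longrightarrow\quad
X^\top X \hat{\beta} = X^\top y
\quad\Longrightarrow\quad
\hat{\beta} = (X^\top X)^{-1} X^\top y
$$

The last step requires $X^\top X$ to be invertible. With exactly collinear predictors it is singular, and with nearly collinear predictors it is ill-conditioned, which inflates the variance of $\hat{\beta}$.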

### Logistic regression and predictive models

- Case-control studies
- Generalized linear models
- Logistic regression
- Odds ratio and interpretation of results
- Likelihood and model comparison
- Measures of discrimination and calibration performance
- Predictive models and cross-validation
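
For reference, the link between logistic regression coefficients and odds ratios can be sketched as follows, for a single predictor $x$ and outcome probability $p = P(Y = 1 \mid x)$ (notation assumed here, not taken from the course notes):

$$
\log\frac{p}{1 - p} = \beta_0 + \beta_1 x,
\qquad
\mathrm{OR} = \frac{\text{odds}(x + 1)}{\text{odds}(x)} = e^{\beta_1}
$$

So a one-unit increase in $x$ multiplies the odds of the outcome by $e^{\beta_1}$.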

### Biomarker discovery and high-dimensional datasets

- High-throughput data (proteomics, metabolomics, lipidomics, glycomics)
- Biomarkers and biomarker discovery
- Dimensionality reduction: clustering and PCA
- Multiple testing
- Subset selection approaches
- Penalised regression: LASSO, ridge regression, elastic net
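
The three penalised regression methods above can be written as one objective. The following is one common parameterisation, given as a sketch; software packages may scale the terms differently.

$$
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
+ \lambda \left( \alpha \lVert \beta \rVert_1 + (1 - \alpha) \lVert \beta \rVert_2^2 \right),
\qquad \lambda \ge 0, \; 0 \le \alpha \le 1
$$

Setting $\alpha = 1$ gives the LASSO, $\alpha = 0$ gives ridge regression, and intermediate values give the elastic net.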

### Prediction from genetic data

- Causality, confounding and stratification
- Introduction to genetic data
- Genetic variation
- Genome-wide association studies
- GWAS meta-analysis
- Approaches for genotypic prediction and genetic risk scores
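
As an illustration of the last item, a weighted genetic risk score is commonly defined as a sum of allele counts weighted by effect sizes. The notation below is assumed for this sketch: $g_{ij} \in \{0, 1, 2\}$ is the number of effect alleles carried by individual $i$ at variant $j$, and $w_j$ is a per-allele weight, typically an effect size estimated in a GWAS.

$$
\mathrm{GRS}_i = \sum_{j=1}^{m} w_j \, g_{ij}
$$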

The course is accompanied by self-guided material for learning and practising analyses in R (10 hours in total):

### Lab 1: Introduction to R

- Interactive terminal and workspaces
- Object types and data structures
- Basic functions and operators

### Lab 2: Data preparation and linear regression

- Merging and simple imputations
- Statistical summaries and plots
- Writing functions and loops
- Fitting linear regression models
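
To give a flavour of this kind of exercise, here is a minimal sketch in base R. It is not taken from the lab material: the data are simulated and the variable names (`age`, `biomarker`) are hypothetical.

```r
# Simulated data (hypothetical variables, for illustration only)
set.seed(1)
n <- 100
age <- rnorm(n, mean = 50, sd = 10)
biomarker <- 2 + 0.05 * age + rnorm(n, sd = 0.5)
biomarker[sample(n, 5)] <- NA  # introduce a few missing values

# Simple imputation: replace missing values with the observed mean
biomarker_imp <- ifelse(is.na(biomarker),
                        mean(biomarker, na.rm = TRUE),
                        biomarker)

# Fit a linear regression and inspect the coefficient table
fit <- lm(biomarker_imp ~ age)
summary(fit)$coefficients
```

Mean imputation is used only because it is the simplest option; it tends to bias estimates towards the null and understates uncertainty.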

### Lab 3: Logistic regression and predictive models

- Using R packages
- Fitting logistic regression models
- Making predictions on held-out data
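
A minimal sketch of this workflow in base R, again using simulated data and hypothetical names (`x`, `y`) rather than the actual lab material:

```r
# Simulated binary outcome depending on one predictor (illustration only)
set.seed(2)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
dat <- data.frame(x = x, y = y)

# Set aside 50 observations that the model never sees during fitting
test_idx <- sample(n, 50)
train <- dat[-test_idx, ]
test <- dat[test_idx, ]

# Fit a logistic regression on the training set
fit <- glm(y ~ x, data = train, family = binomial)
exp(coef(fit)["x"])  # estimated odds ratio per unit increase in x

# Predicted probabilities for the set-aside observations
pred <- predict(fit, newdata = test, type = "response")
head(pred)
```

Using `type = "response"` returns probabilities; the default for `predict()` on a `glm` object is the linear predictor on the log-odds scale.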

### Lab 4: High-dimensional datasets

- Correlation plots and PCA
- Subset selection in R
- Regularisation approaches

### Lab 5: Prediction from genetic data

- Performing genome-wide association studies
- Computing genetic risk scores
- Prediction from genetic scores
- Performing a GWAS meta-analysis