Supervised Learning and Visualization
If anything important comes up, contact me!
The on-location lectures will not be recorded.
If you feel that you are stuck, ask your classmates, ask me, ask the other lecturers. Ask a lot! Ask questions during/after the lectures and in the Q&A sessions!
You can find all materials at the following location:
All three have a PhD in statistics and a ton of experience in development, data analysis and visualization.
Week # | Focus | Practical | Materials | Prof |
---|---|---|---|---|
1 | Data wrangling with R and the grammar of graphics | tidyverse: filter(), select(), join(), pivot(), dbplyr, ggplot(): geoms, aesthetics, scales, themes | R4DS | DG |
2 | Exploratory data analysis | Histograms, density plots, boxplots, etc. | R4DS, FIMD Ch1 | MC |
3 | Statistical learning: regression | lm(), glm(), knn() | ISLR | DG |
4 | Statistical learning: classification | glm(), trees, lda() | ISLR | EJvK |
5 | Classification model evaluation | prop.table(), pROC(), etc. | ISLR | EJvK |
6 | Nonlinear models | R formulas advanced: I(), splines, e.g., bs() | ISLR | MC |
7 | Bagging, boosting, random forest and support vector machines | randomForest, xgboost | ISLR | MC |
8 | Benchmarking | mlr3verse | ISLR, mlr3-book | DG |
Each week we have the following:
Twice we have:
Once we have:
We will form groups on Wednesday Sept 11!
| | Exploratory | Confirmatory |
---|---|---|
Description | EDA; unsupervised learning | Correlation analysis |
Prediction | Supervised learning | Theoretical modeling |
Explanation | Visual mining | Causal inference |
Prescription | Personalised medicine | A/B testing |
Exploratory Data Analysis:
Describing interesting patterns: use graphs and summaries to understand subgroups, detect anomalies, and understand the data.
Examples: boxplot, five-number summary, histograms, missing data plots, …
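A few of these summaries can be sketched in base R. This is only an illustration: the built-in `airquality` dataset is an assumption chosen for convenience, not data from the course, and the per-column count of missing values stands in for a missing data plot.

```r
# Built-in dataset, used here only for illustration (not course data)
data(airquality)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(airquality$Temp)
print(five)

# Number of missing values per column, a numeric stand-in for a missing data plot
n_missing <- colSums(is.na(airquality))
print(n_missing)

# Boxplot of temperature per month, to compare subgroups and spot outliers
boxplot(Temp ~ Month, data = airquality,
        xlab = "Month", ylab = "Temperature (F)")

# Histogram of temperature, to inspect the shape of the distribution
hist(airquality$Temp, main = "Temperature", xlab = "Temperature (F)")
```

In a non-interactive session, the two plots are written to a file (by default `Rplots.pdf`) rather than shown on screen.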
Supervised learning:
Regression: predict continuous labels from other values.
Examples: linear regression, generalized additive model, regression trees, …
Classification: predict discrete labels from other values.
Examples: logistic regression, support vector machines, classification trees, …
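The difference between the two tasks can be sketched with one example of each: `lm()` for a continuous label and `glm()` with a binomial family for a discrete label. The built-in `mtcars` dataset, the chosen predictors, and the 0.5 cut-off are assumptions made for illustration only.

```r
# Built-in dataset, used here only for illustration (not course data)
data(mtcars)

# Regression: predict a continuous label (miles per gallon)
# from weight and horsepower with a linear model
reg_fit <- lm(mpg ~ wt + hp, data = mtcars)
reg_pred <- predict(reg_fit, newdata = data.frame(wt = 3, hp = 110))
print(reg_pred)

# Classification: predict a discrete label (am: 0 = automatic, 1 = manual)
# from weight with logistic regression
clf_fit <- glm(am ~ wt, data = mtcars, family = binomial)
clf_prob <- predict(clf_fit, newdata = data.frame(wt = 3), type = "response")

# Turn the predicted probability into a class label;
# the 0.5 threshold is an assumption, not a course recommendation
clf_label <- ifelse(clf_prob > 0.5, 1, 0)
print(clf_label)
```

The regression model returns a number on the scale of the outcome, whereas the classifier returns a probability that still has to be mapped to one of the discrete labels.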
How do you think that data analysis relates to:
People from different fields (such as statistics, computer science, information science, industry) have different goals and different standard approaches.
In this course we emphasize drawing insights that help us understand the data.