Exam PA Study Manual
15 Writing and Communication
This chapter has become obsolete due to the availability of language models.