Cleaning

Eliminating NAs

Many of the variables consist almost entirely of NAs. 67 variables each have 19,216 NAs out of 19,622 observations (about 98%) and were eliminated. The remaining 93 variables have no NAs and were retained.

Choosing predictors

As someone who has been training regularly for many years, I am familiar with exercise form and biomechanics.

The most important and unique predictor is the participant. People have different proportions and perform exercises slightly differently from one another, but the way an individual performs an exercise over time is unique to that individual.

Referring to the five classes of execution, it seems to me that stability is extremely important. The additional predictors I chose are therefore the pitch, roll, and yaw of the dumbbell, forearm, arm, and belt, twelve variables in total. Any movement (or lack of movement) detected in any of the three dimensions for these four regions should accurately predict form and be unique to an individual.

Fortunately, none of these predictors were eliminated by removing NAs, which led me to believe that they would give a reasonable model.

Appendix

Summary

dim(movement_data)

[1] 19622   160

print(unique(movement_data$user_name), max.levels = 0)

[1] carlitos pedro    adelmo   charles  eurico   jeremy

Cleaning

Eliminating NAs

na_count <- sapply(movement_data, function(x) sum(is.na(x)))
na_df <- data.frame(na_count)
to_retain <- subset(na_df, na_count == 0)
new_movement_data <- movement_data[, row.names(to_retain)]
table(na_df$na_count)


    0 19216 
   93    67
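
As a cross-check on the filter above, the same idea can be sketched with base R's colSums on a small made-up data frame. The data frame toy and its values are hypothetical and only stand in for movement_data.

```r
# Hypothetical toy data standing in for movement_data
toy <- data.frame(
  roll_belt     = c(1.4, 1.5, 1.6, 1.7),
  max_roll_belt = c(NA, NA, NA, 2.0),
  classe        = c("A", "A", "B", "B")
)

# Number of NAs in each column
na_count <- colSums(is.na(toy))

# Keep only the columns that have no NAs at all
toy_clean <- toy[, na_count == 0, drop = FALSE]

print(na_count)
print(names(toy_clean))
```

Here max_roll_belt is dropped because it is mostly NA, while roll_belt and classe are kept.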

Choosing predictors

pry_names <- names(new_movement_data)[grep("^pitch_|^roll_|^yaw_", names(new_movement_data))]
predictors <- c("user_name", pry_names, "classe")
new_movement_data <- new_movement_data[, predictors]
prediction_data <- prediction_data[, predictors[-14]]
predictors

 [1] "user_name"      "roll_belt"      "pitch_belt"     "yaw_belt"      
 [5] "roll_arm"       "pitch_arm"      "yaw_arm"        "roll_dumbbell" 
 [9] "pitch_dumbbell" "yaw_dumbbell"   "roll_forearm"   "pitch_forearm" 
[13] "yaw_forearm"    "classe"
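
The name matching can be illustrated on a short, made-up vector of column names. This is a minimal sketch: the vector cols is hypothetical, and dropping the outcome with setdiff is shown as a name-based alternative to the positional index, not as the method used above.

```r
# Hypothetical column names; only some start with pitch_, roll_, or yaw_
cols <- c("user_name", "roll_belt", "pitch_belt", "yaw_belt",
          "total_accel_belt", "gyros_belt_x", "classe")

# grepl marks the names that begin with one of the three prefixes
pry_names <- cols[grepl("^(pitch|roll|yaw)_", cols)]

predictors <- c("user_name", pry_names, "classe")

# Drop the outcome by name, so the result does not depend on its position
test_columns <- setdiff(predictors, "classe")

print(pry_names)
print(test_columns)
```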

Partitioning

library(caret)  # provides createDataPartition, trainControl, and train
set.seed(12321)
in_train <- createDataPartition(y = new_movement_data$classe, p = 0.9, list = FALSE)
training <- new_movement_data[in_train,]
testing <- new_movement_data[-in_train,]
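
For a categorical outcome, createDataPartition samples within each class, so both subsets keep roughly the same class proportions. The idea can be sketched in base R as follows. This illustrates stratified sampling under stated assumptions and is not caret's implementation; the data frame toy is made up.

```r
set.seed(1)

# Hypothetical outcome with unbalanced classes
toy <- data.frame(id = 1:100,
                  classe = rep(c("A", "B", "C"), times = c(50, 30, 20)))

# Sample 90% of the row indices separately within each class
rows_by_class <- split(seq_len(nrow(toy)), toy$classe)
in_train <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), size = round(0.9 * length(rows)))]
}))

training <- toy[in_train, ]
testing  <- toy[-in_train, ]

# Class counts in each subset
print(table(training$classe))
print(table(testing$classe))
```

Each class contributes 90% of its rows to training, so the hold-out set mirrors the class balance of the full data.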

Cross Validation

set.seed(32123)
train.control <- trainControl(method = "cv", number = 10)
model <- train(classe ~ ., data = training, method = "rf", trControl = train.control)

Accuracy

pred <- predict(model, testing)
confusion_matrix <- table(pred, testing$classe)
confusion_matrix

    
pred   A   B   C   D   E
   A 558   1   0   0   0
   B   0 377   1   0   0
   C   0   1 341   2   1
   D   0   0   0 319   1
   E   0   0   0   0 358

(sum(confusion_matrix) - sum(diag(confusion_matrix))) / sum(confusion_matrix)

[1] 0.003571429
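
The error rate above can be broken down by class. The sketch below rebuilds the confusion matrix from the printed output (rows are predictions, columns are actual classes) and derives accuracy, recall, and precision from it. The variable names are chosen for illustration.

```r
# Confusion matrix copied from the output above (rows: predicted, columns: actual)
classes <- c("A", "B", "C", "D", "E")
confusion_matrix <- matrix(
  c(558,   1,   0,   0,   0,
      0, 377,   1,   0,   0,
      0,   1, 341,   2,   1,
      0,   0,   0, 319,   1,
      0,   0,   0,   0, 358),
  nrow = 5, byrow = TRUE,
  dimnames = list(pred = classes, actual = classes)
)

# Overall accuracy and its complement, the hold-out error rate
accuracy   <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
error_rate <- 1 - accuracy

# Recall per class: correct predictions divided by the actual count of that class
recall <- diag(confusion_matrix) / colSums(confusion_matrix)

# Precision per class: correct predictions divided by the predicted count of that class
precision <- diag(confusion_matrix) / rowSums(confusion_matrix)

print(round(c(accuracy = accuracy, error_rate = error_rate), 4))
print(round(recall, 4))
print(round(precision, 4))
```

Seven of the 1,960 hold-out observations are misclassified, which matches the reported error rate of 7/1960.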

Predicted Exercise Execution

print(predict(model, prediction_data), max.levels = 0)

 [1] B A B A A E D B A A B C B A E E A B B B

Practical Machine Learning Course Projects

Fitness Quality, not Quantity

Rohan Lewis

December 23rd, 2019
