Multi-Target Prediction Model Evaluation and Analysis using RFR, GBRT, DTR, ABR

cspence001/multi-target_pred

multi-target_pred

Using a dataset of 10,000 mobile apps hosted on the Google Play Store, these notebooks plot and model the variables that determine application success. They also evaluate relative feature importance in multiple-target regression models, with the goal of improving how accurately the models predict application rating on a decimal 1-5 scale.

main analysis, model prediction

  • Scatterplot Distributions of Rating v Reviews, Size, Installs by Type
  • Prediction Analysis of App Rating using Random Forest Regressor (RFR), Gradient Boosting Regressor (GBR), Decision Tree Regressor (DTR), and AdaBoost Regressor (ABR) base models.
    • Evaluating Average Error, Accuracy, Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) of models
    • Plotting Model Predictions of Rating vs Actual Rating based on Reviews, Size, Installs
    • Plotting Prediction Accuracy of RFR, GBR base models and DTR, ABR base models to Test Set
  • Hyperparameter Tuning of RFR, GBR models using GridSearchCV, Feature Importance evaluation of best grid using SHAP Tree Explainer
  • Stacked Generalization ensemble (RFR, GBR, XGBR, RidgeCV estimation) comparison to RFR, GBR base model accuracy
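
The base-model comparison described above can be sketched roughly as follows. This is a minimal illustration rather than the notebooks' actual code: the data is synthetic, and the feature transforms, hyperparameters, and variable names are all assumptions. It covers MAE, MSE, and RMSE only; the repository's "Average Error" and "Accuracy" definitions are not stated here, so they are left out.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the app data (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
n = 400
reviews = rng.lognormal(mean=6.0, sigma=2.0, size=n)
size_mb = rng.uniform(1.0, 100.0, size=n)
installs = reviews * rng.uniform(10.0, 100.0, size=n)

# Log-transforming the skewed counts is an assumption made for this sketch.
X = np.column_stack([np.log1p(reviews), size_mb, np.log1p(installs)])
noise = rng.normal(0.0, 0.3, size=n)
rating = np.clip(3.0 + 0.15 * X[:, 0] - 0.005 * size_mb + noise, 1.0, 5.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, rating, test_size=0.25, random_state=0
)

# The four base models named in the README, with illustrative settings.
models = {
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "DTR": DecisionTreeRegressor(max_depth=5, random_state=0),
    "ABR": AdaBoostRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    mse = mean_squared_error(y_test, pred)
    # RMSE is the square root of MSE, so it is in the same units as Rating.
    results[name] = {"MAE": mae, "MSE": mse, "RMSE": float(np.sqrt(mse))}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Because RMSE penalizes large misses more heavily than MAE, a wide gap between the two for a given model suggests a few badly predicted apps rather than uniformly mediocre predictions.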

feature correlation

  • Heatmap Correlation of one-hot encoded Categorical Features
  • K-means Cluster Evaluation for binning Reviews, Size
  • Standardized Scaling v Normalized Scaling for Reviews, Size, Installs
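
To make the standardized versus normalized comparison concrete, here is a small sketch of the two transforms written directly in NumPy. The install counts are made up for illustration, and the helper names `standardize` and `normalize` are hypothetical. The formulas follow the same definitions used by scikit-learn's `StandardScaler` and `MinMaxScaler` with default settings.

```python
import numpy as np

# Hypothetical install counts with the heavy right skew typical of app-store data.
installs = np.array([100.0, 1_000.0, 10_000.0, 100_000.0, 1_000_000.0])


def standardize(x):
    """Z-score scaling: zero mean and unit (population) standard deviation."""
    return (x - x.mean()) / x.std()


def normalize(x):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    return (x - x.min()) / (x.max() - x.min())


z = standardize(installs)
m = normalize(installs)
print("standardized:", np.round(z, 3))
print("normalized:  ", np.round(m, 3))
```

Both transforms are linear, so neither removes skew: with values like these, one very large app still dominates the scaled range in either case.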

base linear models

  • Linear Regression Analysis of Reviews, Size, Installs on Rating for each App Category
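
A per-category linear fit of this kind might look like the sketch below, which uses ordinary least squares via `numpy.linalg.lstsq`. The category names, slopes, and generated data are hypothetical, and `fit_linear` is an illustrative helper rather than anything taken from the notebooks.

```python
import numpy as np

rng = np.random.default_rng(1)


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


# Made-up slope of Rating on log-reviews for each hypothetical category.
true_slopes = {"GAME": 0.20, "TOOLS": 0.05}

fits = {}
for name, slope in true_slopes.items():
    # Columns: log-reviews, size in MB, log-installs (all synthetic).
    X = np.column_stack([
        rng.uniform(0.0, 12.0, 200),
        rng.uniform(1.0, 100.0, 200),
        rng.uniform(0.0, 18.0, 200),
    ])
    y = 2.0 + slope * X[:, 0] + rng.normal(0.0, 0.1, 200)
    fits[name] = fit_linear(X, y)

for name, (intercept, coefs) in fits.items():
    print(name, round(float(intercept), 2), np.round(coefs, 3))
```

Fitting one model per category lets each category have its own slopes, at the cost of noisier estimates for categories with few apps.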

tier segmented linear models

  • Linear Regression Analysis of Tier Segmented Reviews, Size, Installs, Type, Content Rating on Rating

weight encoded categorical feature evaluation, pairwise plots

  • Weight of Evidence (WoE) encoding, Information Value (IV) of Categorical variables (Categories, Type, Content Rating)
  • RFR Model Evaluation of Rating Prediction based on Reviews, Size, Installs, WoE Categorical variables
    • Evaluating Feature, Permutation Importance determined by RFR Model using SHAP TreeExplainer
  • Pairwise Plots for Categorical Feature Evaluation using WoE-encoded Categorical variables
  • Target Enclosed Feature Modeling based on Example Rating
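
As a rough illustration of WoE encoding and Information Value, the sketch below computes both for one categorical feature in plain Python. WoE needs a binary target, so it assumes Rating is binarized at 4.0; that threshold, the additive smoothing, the sample rows, and the `woe_iv` helper are all assumptions for illustration. Sign conventions for WoE also vary between references; this one uses ln(share of good / share of bad).

```python
import math
from collections import Counter


def woe_iv(categories, is_good, smoothing=0.5):
    """Return ({level: WoE}, IV) for a categorical feature and a binary target."""
    good = Counter(c for c, g in zip(categories, is_good) if g)
    bad = Counter(c for c, g in zip(categories, is_good) if not g)
    levels = sorted(set(categories))
    # Additive smoothing keeps the log finite when a level has no good or no bad rows.
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)
    woe, iv = {}, 0.0
    for level in levels:
        p_good = (good[level] + smoothing) / total_good
        p_bad = (bad[level] + smoothing) / total_bad
        woe[level] = math.log(p_good / p_bad)
        # Each IV term is non-negative: the difference and the log share a sign.
        iv += (p_good - p_bad) * woe[level]
    return woe, iv


# Hypothetical rows: app Type and whether Rating >= 4.0 (threshold is an assumption).
app_type = ["Free"] * 6 + ["Paid"] * 4
ratings = [4.5, 4.1, 3.2, 4.0, 3.9, 4.3, 4.6, 4.4, 3.0, 4.2]
is_good = [r >= 4.0 for r in ratings]

woe, iv = woe_iv(app_type, is_good)
print(woe, round(iv, 4))
```

The resulting WoE values can replace the category labels as a single numeric column, which is how they could then feed a model such as the RFR mentioned above.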

Jupyter notebooks running Python, pandas, NumPy, Matplotlib, and scikit-learn.
