Week 7 – Model Evaluation

Model Optimization and Comparison: From Baseline to Best, Fine-Tuning Models for Maximum Accuracy

Introduction

In Week 7, you will enhance your previously developed classification models (from Week 6) by improving their accuracy and generalization ability through hyperparameter tuning, cross-validation, and feature importance analysis. This week is about optimization: turning your baseline models into high-performing, fine-tuned prediction systems for your E-commerce Recommendation System project.

Objectives

πŸ“ Load and Preprocess Data

First 10 rows of the raw dataset:

| Sr No. | Unique id | channel_name | category | Sub-category | Customer Remarks | Order_id | order_date_time | Issue_reported at | issue_responded | Survey_response_Date | Customer_City | Product_category | Item_price | connected_handling_time | Agent_name | Supervisor | Manager | Tenure Bucket | Agent Shift | CSAT Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 7e9ae164-6a8b-4521-a2d4-58f7c9fff13f | Outcall | Product Queries | Life Insurance | NaN | c27c9bb4-fa36-4140-9f1f-21009254ffdb | NaN | 01/08/2023 11:13 | 01/08/2023 11:47 | 01-Aug-23 | NaN | NaN | NaN | NaN | Richard Buchanan | Mason Gupta | Jennifer Nguyen | On Job Training | Morning | 5 |
| 2 | b07ec1b0-f376-43b6-86df-ec03da3b2e16 | Outcall | Product Queries | Product Specific Information | NaN | d406b0c7-ce17-4654-b9de-f08d421254bd | NaN | 01/08/2023 12:52 | 01/08/2023 12:54 | 01-Aug-23 | NaN | NaN | NaN | NaN | Vicki Collins | Dylan Kim | Michael Lee | >90 | Morning | 5 |
| 3 | 200814dd-27c7-4149-ba2b-bd3af3092880 | Inbound | Order Related | Installation/demo | NaN | c273368d-b961-44cb-beaf-62d6fd6c00d5 | NaN | 01/08/2023 20:16 | 01/08/2023 20:38 | 01-Aug-23 | NaN | NaN | NaN | NaN | Duane Norman | Jackson Park | William Kim | On Job Training | Evening | 5 |
| 4 | eb0d3e53-c1ca-42d3-8486-e42c8d622135 | Inbound | Returns | Reverse Pickup Enquiry | NaN | 5aed0059-55a4-4ec6-bb54-97942092020a | NaN | 01/08/2023 20:56 | 01/08/2023 21:16 | 01-Aug-23 | NaN | NaN | NaN | NaN | Patrick Flores | Olivia Wang | John Smith | >90 | Evening | 5 |
| 5 | ba903143-1e54-406c-b969-46c52f92e5df | Inbound | Cancellation | Not Needed | NaN | e8bed5a9-6933-4aff-9dc6-ccefd7dcde59 | NaN | 01/08/2023 10:30 | 01/08/2023 10:32 | 01-Aug-23 | NaN | NaN | NaN | NaN | Christopher Sanchez | Austin Johnson | Michael Lee | 0-30 | Morning | 5 |
| 6 | 1cfde5b9-6112-44fc-8f3b-892196137a62 | Email | Returns | Fraudulent User | NaN | a2938961-2833-45f1-83d6-678d9555c603 | NaN | 01/08/2023 15:13 | 01/08/2023 18:39 | 01-Aug-23 | NaN | NaN | NaN | NaN | Desiree Newton | Emma Park | John Smith | 0-30 | Morning | 5 |
| 7 | 11a3ffd8-1d6b-4806-b198-c60b5934c9bc | Outcall | Product Queries | Product Specific Information | NaN | bfcb562b-9a2f-4cca-aa79-fd4e2952f901 | NaN | 01/08/2023 15:31 | 01/08/2023 23:52 | 01-Aug-23 | NaN | NaN | NaN | NaN | Shannon Hicks | Aiden Patel | Olivia Tan | >90 | Morning | 5 |
| 8 | 372b51a5-fa19-4a31-a4b8-a21de117d75e | Inbound | Returns | Exchange / Replacement | Very good | 88537e0b-5ffa-43f9-bbe2-fe57a0f4e4ae | NaN | 01/08/2023 16:17 | 01/08/2023 16:23 | 01-Aug-23 | NaN | NaN | NaN | NaN | Laura Smith | Evelyn Kimura | Jennifer Nguyen | On Job Training | Evening | 5 |
| 9 | 6e4413db-4e16-42fc-ac92-2f402e3df03c | Inbound | Returns | Missing | Shopzilla app and it's all customer care serv... | e6be9713-13c3-493c-8a91-2137cbbfa7e6 | NaN | 01/08/2023 21:03 | 01/08/2023 21:07 | 01-Aug-23 | NaN | NaN | NaN | NaN | David Smith | Nathan Patel | John Smith | >90 | Split | 5 |
| 10 | b0a65350-64a5-4603-8b9a-a24a4a145d08 | Inbound | Shopzilla Related | General Enquiry | NaN | c7caa804-2525-499e-b202-4c781cb68974 | NaN | 01/08/2023 23:31 | 01/08/2023 23:36 | 01-Aug-23 | NaN | NaN | NaN | NaN | Tabitha Ayala | Amelia Tanaka | Michael Lee | 31-60 | Evening | 5 |

Key Steps in Week 7 Assignment

Use the same cleaned dataset from Week 6; you can reuse your preprocessing code from Week 6.

✅ Preprocessing Completed

| Detail | Value |
|---|---|
| Dataset Shape | (85907, 49) |
| Features | (85907, 2) |
| Target (CSAT_Category) | Binary (Satisfied = 1, Unsatisfied = 0) |
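As a sketch of how the binary target might be derived, assuming a satisfaction threshold of CSAT Score >= 4 (the actual Week 6 rule is not shown here, so this cutoff is an illustration only):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the Week 6 dataset; real code would read the project CSV instead
df = pd.DataFrame({"CSAT Score": [5, 1, 4, 2, 5, 3, 5, 4]})

# Assumed rule: scores of 4-5 count as Satisfied (1), 1-3 as Unsatisfied (0)
df["CSAT_Category"] = (df["CSAT Score"] >= 4).astype(int)

X = df.drop(columns=["CSAT_Category"])
y = df["CSAT_Category"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
```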

📊 Baseline Accuracy

| Model | Accuracy |
|---|---|
| Logistic Regression | 0.8265 |
| Random Forest | 0.8211 |

Interpretation: The baseline results show the initial performance of both models before any tuning. Random Forest typically performs better on mixed, non-linear datasets than Logistic Regression, which assumes a linear decision boundary; here, however, Logistic Regression starts slightly ahead (0.8265 vs. 0.8211).
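A baseline comparison like the table above can be produced by fitting both models with default settings; in this sketch, synthetic data stands in for the project features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic placeholder for the preprocessed Week 6 feature matrix
X, y = make_classification(n_samples=1000, n_features=12, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit each untuned model and report held-out accuracy
for name, model in [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Random Forest", RandomForestClassifier(random_state=42)),
]:
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.4f}")
```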

Understanding GridSearchCV Output

Fitting 5 folds for each of 6 candidates, totalling 30 fits
Fitting 3 folds for each of 8 candidates, totalling 24 fits

✅ These are status logs, not visual graphs. They show that GridSearchCV is running multiple model fits internally to test combinations of hyperparameters.

πŸ” Step-by-Step Explanation

When you see messages like the ones above, they come from GridSearchCV in Scikit-Learn, which automatically tests multiple parameter combinations to find the best-performing model.

1️⃣ Fitting 5 folds for each of 6 candidates

This refers to the Logistic Regression model.

Example combinations for Logistic Regression:

  • C=0.1, penalty='l1'
  • C=0.1, penalty='l2'
  • C=1, penalty='l1'
  • C=1, penalty='l2'
  • C=10, penalty='l1'
  • C=10, penalty='l2'

That's why you see 6 × 5 = 30 fits, meaning 30 training-and-validation cycles in total.
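The six candidates above can be run with a sketch like this; synthetic data stands in for the project features, and the liblinear solver is assumed because it supports both l1 and l2 penalties and appears in the best-params output later in this report:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic placeholder for the real X_train / y_train
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 3 values of C x 2 penalties = 6 candidates; cv=5 folds -> 6 x 5 = 30 fits
param_grid = {"C": [0.1, 1, 10], "penalty": ["l1", "l2"]}

grid = GridSearchCV(
    LogisticRegression(solver="liblinear"),  # liblinear handles both penalties
    param_grid,
    cv=5,
    scoring="accuracy",
    verbose=1,  # prints "Fitting 5 folds for each of 6 candidates, totalling 30 fits"
)
grid.fit(X, y)
print(grid.best_params_)
```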

2️⃣ Fitting 3 folds for each of 8 candidates

This message corresponds to the Random Forest model.

Total = 8 × 3 = 24 fits.
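The exact Random Forest grid is not listed in this section, so the one below is a hypothetical 2 × 2 × 2 grid over parameters that appear in the best-params output; any grid with 8 combinations and cv=3 produces the same "24 fits" log:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 2 x 2 x 2 = 8 candidates; cv=3 folds -> 8 x 3 = 24 fits
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [10, 20],
    "min_samples_leaf": [1, 2],
}
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    verbose=1,  # prints "Fitting 3 folds for each of 8 candidates, totalling 24 fits"
)
grid.fit(X, y)
print(grid.best_params_)
```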

πŸ† Best Hyperparameters Found

<li>Logistic Regression β†’ {'C': 0.1, 'penalty': 'l1', 'solver': 'liblinear'}</li>
<li>Random Forest β†’ {'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}</li>

After testing all parameter combinations, GridSearchCV identifies which settings produce the highest validation accuracy.

βš™οΈ Hyperparameter Explanation

Model What GridSearchCV Did Best Hyperparameters Why They Matter
Logistic Regression Tested 6 parameter combinations with 5-fold cross-validation (30 fits) C=0.1, penalty='l1', solver='liblinear' Simpler, regularized model that avoids overfitting
Random Forest Tested 8 parameter combinations with 3-fold cross-validation (24 fits) max_depth=10, min_samples_leaf=2, n_estimators=100 Deep yet controlled trees for better generalization

💡 Final Takeaway: The "fitting" messages simply mean GridSearchCV is testing multiple hyperparameter combinations using cross-validation. The result is a tuned, optimized model with the best configuration for your dataset.

πŸ† Best Hyperparameters Found

ModelBest Parameters
Logistic Regression{'C': 0.1, 'penalty': 'l1', 'solver': 'liblinear'}
Random Forest{'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}
Interpretation: Grid Search identifies the combination of parameters that yield the highest accuracy. For example, Logistic Regression may prefer L2 regularization, while Random Forest benefits from deeper trees or larger minimum samples per leaf.

📈 Tuned Logistic Regression Performance

Accuracy: 0.8264

| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 (Unsatisfied) | 0.404 | 0.008 | 0.015 | 2971 |
| 1 (Satisfied) | 0.828 | 0.998 | 0.905 | 14211 |
| Accuracy | | | 0.826 | 17182 |
| Macro avg | 0.616 | 0.503 | 0.460 | 17182 |
| Weighted avg | 0.754 | 0.826 | 0.751 | 17182 |
Interpretation of Confusion Matrix:
- The diagonal cells represent correctly classified samples: the top-left cell shows correctly predicted 'Unsatisfied' customers, the bottom-right correctly predicted 'Satisfied' ones.
- The fewer off-diagonal values, the better the model's predictive power.
- Here, recall for the 'Unsatisfied' class is only 0.008, so the tuned model predicts 'Satisfied' for almost every sample, a symptom of the class imbalance in the data (2971 vs. 14211 samples).
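A report like the one above comes from scikit-learn's classification_report; this sketch uses imbalanced synthetic data, so your tuned model and real test split would replace these stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data mimics the Satisfied-heavy CSAT distribution
X, y = make_classification(n_samples=2000, weights=[0.17, 0.83], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Tuned settings from the grid search above
model = LogisticRegression(C=0.1, penalty="l1", solver="liblinear").fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))          # rows = true labels, cols = predicted
print(classification_report(y_test, y_pred, digits=3))
```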

📈 Tuned Random Forest Performance

Accuracy: 0.8271

| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 (Unsatisfied) | 0.500 | 0.003 | 0.005 | 2971 |
| 1 (Satisfied) | 0.827 | 0.999 | 0.905 | 14211 |
| Accuracy | | | 0.827 | 17182 |
| Macro avg | 0.664 | 0.501 | 0.455 | 17182 |
| Weighted avg | 0.771 | 0.827 | 0.750 | 17182 |
Interpretation of Confusion Matrix:
- The diagonal cells represent correctly classified samples: the top-left cell shows correctly predicted 'Unsatisfied' customers, the bottom-right correctly predicted 'Satisfied' ones.
- The fewer off-diagonal values, the better the model's predictive power.
- As with Logistic Regression, recall for the 'Unsatisfied' class is near zero (0.003), so the high overall accuracy mostly reflects the majority 'Satisfied' class.

πŸ” Cross-Validation Accuracy

ModelMean CV Accuracy
Logistic Regression (Tuned)0.8239
Random Forest (Tuned)0.8245
Interpretation: Cross-validation helps confirm that the model's performance is stable across different data splits. Random Forest generally has a smaller variance and performs consistently well across folds.
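Mean CV accuracies like the ones above can be computed with cross_val_score; a sketch with the tuned Logistic Regression settings on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Tuned settings from the grid search; your real features replace X, y
model = LogisticRegression(C=0.1, penalty="l1", solver="liblinear")
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

# Mean shows overall performance; std shows stability across folds
print(f"Mean CV accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")
```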

📊 Interpretation of Feature Importance

The Random Forest model highlights the top contributing features for predicting customer satisfaction. If response time or support quality rank high, it indicates that improving these aspects could significantly increase customer satisfaction scores. This chart helps prioritize actionable business areas for improvement.
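The ranking described above comes from the fitted Random Forest's feature_importances_ attribute; this sketch uses hypothetical feature names and synthetic data in place of the project's encoded columns:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; in the project these are your encoded columns
X, y = make_classification(n_samples=500, n_features=6, random_state=42)
cols = [f"feature_{i}" for i in range(X.shape[1])]

# Tuned settings from the grid search above
rf = RandomForestClassifier(
    n_estimators=100, max_depth=10, min_samples_leaf=2, random_state=42
).fit(X, y)

# Importances sum to 1.0; sort to see the top contributors first
importances = pd.Series(rf.feature_importances_, index=cols).sort_values(ascending=False)
print(importances)
# importances.plot.barh() would draw the chart discussed above
```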

📘 Final Interpretation of Charts

🚀 Project Milestone for Week 7

Milestone: Optimize baseline models (Logistic Regression and Random Forest) through hyperparameter tuning and cross-validation.

Outcome: