Build a baseline regression model to predict a numeric target variable from your dataset. In your case:
The assignment focuses on supervised learning using regression, splitting the data into training and testing sets, building a Linear Regression model, evaluating it with MAE, MSE, RMSE, Explained Variance, and visualizing predictions.
Load the dataset (Customer_support_data.csv) using Pandas.

| Unique id | channel_name | category | Sub-category | Customer Remarks | Order_id | order_date_time | Issue_reported at | issue_responded | Survey_response_Date | Customer_City | Product_category | Item_price | connected_handling_time | Agent_name | Supervisor | Manager | Tenure Bucket | Agent Shift | CSAT Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7e9ae164-6a8b-4521-a2d4-58f7c9fff13f | Outcall | Product Queries | Life Insurance | - | c27c9bb4-fa36-4140-9f1f-21009254ffdb | - | 01/08/2023 11:13 | 01/08/2023 11:47 | 01-Aug-23 | - | - | - | - | Richard Buchanan | Mason Gupta | Jennifer Nguyen | On Job Training | Morning | 5 |
| b07ec1b0-f376-43b6-86df-ec03da3b2e16 | Outcall | Product Queries | Product Specific Information | - | d406b0c7-ce17-4654-b9de-f08d421254bd | - | 01/08/2023 12:52 | 01/08/2023 12:54 | 01-Aug-23 | - | - | - | - | Vicki Collins | Dylan Kim | Michael Lee | >90 | Morning | 5 |
| 200814dd-27c7-4149-ba2b-bd3af3092880 | Inbound | Order Related | Installation/demo | - | c273368d-b961-44cb-beaf-62d6fd6c00d5 | - | 01/08/2023 20:16 | 01/08/2023 20:38 | 01-Aug-23 | - | - | - | - | Duane Norman | Jackson Park | William Kim | On Job Training | Evening | 5 |
| eb0d3e53-c1ca-42d3-8486-e42c8d622135 | Inbound | Returns | Reverse Pickup Enquiry | - | 5aed0059-55a4-4ec6-bb54-97942092020a | - | 01/08/2023 20:56 | 01/08/2023 21:16 | 01-Aug-23 | - | - | - | - | Patrick Flores | Olivia Wang | John Smith | >90 | Evening | 5 |
| ba903143-1e54-406c-b969-46c52f92e5df | Inbound | Cancellation | Not Needed | - | e8bed5a9-6933-4aff-9dc6-ccefd7dcde59 | - | 01/08/2023 10:30 | 01/08/2023 10:32 | 01-Aug-23 | - | - | - | - | Christopher Sanchez | Austin Johnson | Michael Lee | 0-30 | Morning | 5 |
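
The loading step can be sketched as follows. This is a minimal illustration, not the original notebook cell: the in-memory CSV is a hypothetical two-row stand-in (with only a few of the real columns) so the snippet runs without the data file.

```python
import io
import pandas as pd

# Hypothetical stand-in for Customer_support_data.csv so the sketch runs anywhere.
# With the real file, use: df = pd.read_csv("Customer_support_data.csv")
sample_csv = io.StringIO(
    "Unique id,channel_name,Item_price,connected_handling_time,CSAT Score\n"
    "a1,Outcall,,,5\n"
    "b2,Inbound,979.0,,1\n"
)

df = pd.read_csv(sample_csv)

print(df.shape)         # (rows, columns)
print(df.head())        # first rows of the dataset
print(df.isna().sum())  # missing values per column
```

Checking `isna().sum()` right after loading is useful here because several columns in the preview above have no value in the first rows.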
Drop identifier columns (Customer_ID, Order_ID, Product_ID): these don't help the model and may cause memory issues.
Encode categorical variables (pd.get_dummies) and avoid creating too many dummy columns to prevent memory errors.
Preprocessing Done!
Shape after preprocessing: (85907, 3)
Explanation:
Shape = (Rows, Columns)
Rows = number of customer interactions
Columns = number of features (predictors) + target (CSAT Score)
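
One way to keep the dummy-column count under control is to encode only low-cardinality categorical columns. The sketch below is an assumption about how such preprocessing could look, not the code that produced the shape above: the helper `preprocess`, its `max_dummy_levels` cap, and filling missing numeric values with 0 are all hypothetical choices.

```python
import pandas as pd

def preprocess(df, target, id_cols, max_dummy_levels=10):
    """Drop ID columns, fill numeric gaps, one-hot encode small categoricals."""
    out = df.drop(columns=[c for c in id_cols if c in df.columns])

    num_cols = out.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in out.columns if c not in num_cols]

    # Keep only categoricals with few distinct values to limit dummy columns
    keep_cats = [c for c in cat_cols if out[c].nunique() <= max_dummy_levels]
    out = out[num_cols + keep_cats].copy()

    # Assumption: missing numeric features are filled with 0 (target untouched)
    feature_num = [c for c in num_cols if c != target]
    out[feature_num] = out[feature_num].fillna(0)

    if keep_cats:
        out = pd.get_dummies(out, columns=keep_cats, dtype=int)
    return out

# Tiny hypothetical frame to exercise the helper
raw = pd.DataFrame({
    "Order_id": ["o1", "o2", "o3"],
    "channel_name": ["Inbound", "Outcall", "Inbound"],
    "Agent_name": ["A", "B", "C"],
    "Item_price": [100.0, None, 250.0],
    "CSAT Score": [5, 1, 4],
})
clean = preprocess(raw, target="CSAT Score", id_cols=["Order_id"], max_dummy_levels=2)
print("Shape after preprocessing:", clean.shape)
```

In this toy run `Agent_name` is discarded because it has more distinct values than the cap, while `channel_name` becomes two dummy columns. The reported shape (85907, 3) suggests that in the actual run only two numeric predictors and the target were kept.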
Separate the data into features (X) and target (y).
Features Shape: (85907, 2) → Predictor variables used for regression
Target Shape: (85907,) → CSAT Score values to predict
Split the data using train_test_split() with a random state for reproducibility.
Train/Test Split Completed
Training Samples: 68,725 → Used to train the regression model
Testing Samples: 17,182 → Used to evaluate model predictions
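
The feature/target separation and the split can be sketched as below. The data is synthetic and hypothetical; `test_size=0.2` is inferred from the reported counts (17,182 of 85,907 is a 20% test share), and `random_state=42` is an assumed value, since the text only says that a random state was set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed data (hypothetical values)
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "Item_price": rng.uniform(0, 5000, n),
    "connected_handling_time": rng.uniform(0, 600, n),
    "CSAT Score": rng.integers(1, 6, n),  # integer scores 1..5
})

# Features (X) are every column except the target (y)
X = df.drop(columns=["CSAT Score"])
y = df["CSAT Score"]
print("Features Shape:", X.shape)
print("Target Shape:", y.shape)

# 80/20 split; fixing random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Training Samples:", len(X_train))
print("Testing Samples:", len(X_test))
```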
Use sklearn.linear_model.LinearRegression to train the model on the training set.
The fitted model has the form CSAT Score ≈ b0 + b1 × Item_price + b2 × connected_handling_time, where:
Item_price: -0.000021 → Effect on CSAT Score per unit change in Item_price
connected_handling_time: 0.000140 → Effect on CSAT Score per unit change in connected_handling_time
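
Fitting the model and reading its coefficients might look like the following sketch. The training data here is synthetic and noise-free (a hypothetical relationship chosen so the recovered coefficients can be checked), so the printed numbers will not match the ones reported above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data with a known, exact linear relationship
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "Item_price": rng.uniform(0, 5000, 200),
    "connected_handling_time": rng.uniform(0, 600, 200),
})
y_train = (
    4.0
    - 0.0005 * X_train["Item_price"]
    + 0.001 * X_train["connected_handling_time"]
)

model = LinearRegression()
model.fit(X_train, y_train)

# One coefficient per feature, in the same order as the columns of X_train
print(f"Intercept: {model.intercept_:.6f}")
for name, coef in zip(X_train.columns, model.coef_):
    print(f"{name}: {coef:.6f}")
```

Each coefficient is the change in the predicted score per one-unit change in that feature, holding the other fixed. With the reported value of -0.000021 for Item_price, a price increase of 1,000 units shifts the predicted CSAT Score by only about -0.021.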
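y_pred = model.predict(X_test)
residual = y_test - y_pred
Formulas used:
MAE = (1/n) Σ |y_i - ŷ_i|
MSE = (1/n) Σ (y_i - ŷ_i)^2
RMSE = sqrt(MSE)
Explained Variance = 1 - Var(y - ŷ) / Var(y)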
MAE: 1.04 → Average absolute prediction error
MSE: 1.87 → Average squared prediction error
RMSE: 1.37 → Square root of MSE, in the same units as CSAT Score
Explained Variance: 0.01 → Share of target variance explained by the model (about 1%)
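
The four metrics can be computed by hand with NumPy and cross-checked against scikit-learn, as in this sketch. The four actual and predicted values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
)

# Hypothetical actual and predicted scores
y_test = np.array([5.0, 5.0, 1.0, 4.0])
y_pred = np.array([4.5, 4.0, 3.0, 4.5])

residual = y_test - y_pred

# Manual definitions
mae = np.mean(np.abs(residual))                   # mean absolute error
mse = np.mean(residual ** 2)                      # mean squared error
rmse = np.sqrt(mse)                               # root mean squared error
ev = 1 - np.var(residual) / np.var(y_test)        # explained variance

print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"Explained Variance: {ev:.2f}")

# Same quantities from scikit-learn
sk_mae = mean_absolute_error(y_test, y_pred)
sk_mse = mean_squared_error(y_test, y_pred)
sk_ev = explained_variance_score(y_test, y_pred)
```

An explained variance near 0, as reported above, means the model does little better than predicting a constant value, which is a reasonable reference point for a baseline.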
| Index | Actual CSAT | Predicted CSAT | Residual |
|---|---|---|---|
| 67871 | 5 | 4.260457 | 0.739543 |
| 40187 | 5 | 4.278021 | 0.721979 |
| 60075 | 5 | 4.260457 | 0.739543 |
| 69560 | 5 | 4.260457 | 0.739543 |
| 2605 | 5 | 4.275076 | 0.724924 |
| 73327 | 5 | 4.260457 | 0.739543 |
| 4382 | 1 | 4.260457 | -3.260457 |
| 10405 | 5 | 4.260457 | 0.739543 |
| 24494 | 5 | 4.260457 | 0.739543 |
| 5473 | 1 | 4.260457 | -3.260457 |
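
A comparison table like the one above, plus a simple actual-versus-predicted plot, could be produced along these lines. The three rows are taken from the table; the plot styling and the output file name `actual_vs_predicted.png` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from the results table (index, actual, predicted)
results = pd.DataFrame(
    {
        "Actual CSAT": [5, 5, 1],
        "Predicted CSAT": [4.260457, 4.278021, 4.260457],
    },
    index=[67871, 40187, 4382],
)
results["Residual"] = results["Actual CSAT"] - results["Predicted CSAT"]
print(results)

# Scatter of actual vs predicted; points on the dashed line are perfect predictions
fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(results["Actual CSAT"], results["Predicted CSAT"], alpha=0.7)
ax.plot([1, 5], [1, 5], linestyle="--", color="gray")
ax.set_xlabel("Actual CSAT")
ax.set_ylabel("Predicted CSAT")
ax.set_title("Actual vs Predicted CSAT Score")
fig.savefig("actual_vs_predicted.png")
plt.close(fig)
```

In the table, most predictions sit at the same value (4.260457) regardless of the actual score, so a plot of this kind would show a nearly horizontal band rather than points along the diagonal.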
"Build your first baseline regression model."
This milestone demonstrates the first working predictive model for your semester-long project and sets the foundation for further analysis and model optimization.