Week 4 of the Data Analytics project marks the transition from basic data preparation toward understanding the statistical relationships among variables. The main goal of this phase was to explore how the features in the dataset relate to one another and, more importantly, how they are associated with the target variable, here the customer satisfaction score (CSAT Score).
This week's analytical task focused on correlation analysis, a fundamental step before predictive modeling. Computing the correlation coefficients between numerical features reveals patterns of association that help in selecting relevant predictors and removing redundant or non-informative variables. Understanding these relationships is essential for improving model accuracy and interpretability in later stages such as regression or classification.
The workflow began with loading and cleaning the dataset (Customer_support_data.csv), ensuring no missing or inconsistent values remained.
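A minimal sketch of the cleaning step, assuming pandas. The notes do not say how missing values were handled, so the strategy here (column median for numeric columns, the placeholder string "Unknown" for everything else) and the helper name `clean_dataframe` are assumptions rather than the project's actual code.

```python
import pandas as pd


def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with duplicate rows dropped and missing values filled.

    Assumed strategy (not confirmed by the project notes):
    numeric columns -> column median (0 if the column is entirely empty),
    all other columns -> the string "Unknown".
    """
    cleaned = df.drop_duplicates().copy()
    for col in cleaned.columns:
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            median = cleaned[col].median()
            fill_value = 0 if pd.isna(median) else median
            cleaned[col] = cleaned[col].fillna(fill_value)
        else:
            cleaned[col] = cleaned[col].fillna("Unknown")
    return cleaned


# Tiny in-memory stand-in for the real file; in the project this would be
# something like pd.read_csv("Customer_support_data.csv").
raw = pd.DataFrame({
    "Item_price": [100.0, None, 300.0],
    "Customer_City": ["Pune", None, "Delhi"],
})
data = clean_dataframe(raw)
print("Total missing values after cleaning:", int(data.isna().sum().sum()))
```

Median filling is only one option; dropping sparse columns or rows may be more appropriate when most of a column is empty.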
Categorical features were encoded using label encoding to make them suitable for numerical computations.
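The encoding step might look like the sketch below, which applies scikit-learn's `LabelEncoder` to every non-numeric column. The helper name `label_encode_categoricals` and the sample values are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def label_encode_categoricals(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Label-encode every non-numeric column.

    Returns the encoded copy and the list of columns that were encoded.
    """
    encoded = df.copy()
    categorical_cols = [
        col for col in encoded.columns
        if not pd.api.types.is_numeric_dtype(encoded[col])
    ]
    for col in categorical_cols:
        # Cast to str so mixed-type columns encode without errors.
        encoded[col] = LabelEncoder().fit_transform(encoded[col].astype(str))
    return encoded, categorical_cols


# Illustrative sample, not rows from the real dataset.
sample = pd.DataFrame({
    "channel_name": ["Outcall", "Inbound", "Inbound"],
    "Agent Shift": ["Morning", "Evening", "Morning"],
    "CSAT Score": [5, 4, 5],
})
encoded_sample, encoded_cols = label_encode_categoricals(sample)
print(f"Encoded {len(encoded_cols)} categorical columns using LabelEncoder.")
print(encoded_sample)
```

`LabelEncoder` assigns integer codes in sorted order of the class labels, so for nominal columns the resulting numbers carry no inherent ranking, and any Pearson correlation computed on them depends on that arbitrary ordering.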
Next, the correlation matrix was generated to visually and numerically compare inter-variable relationships.
Using Seaborn's heatmap visualization, the strongest and weakest correlations were identified with a color-coded representation for easier interpretation.
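A sketch of how the correlation matrix and its heatmap could be produced with pandas and Seaborn. The function name `plot_correlation_heatmap`, the color map, and the sample values are assumptions; the `Agg` backend line is only there so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_correlation_heatmap(df: pd.DataFrame):
    """Compute the Pearson correlation matrix and draw it as an annotated heatmap."""
    corr = df.corr(numeric_only=True)  # Pearson is the pandas default method
    fig, ax = plt.subplots(figsize=(8, 6))
    # Fix the color scale to [-1, 1] so colors are comparable across plots.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation matrix")
    fig.tight_layout()
    return corr, ax


# Illustrative, already-encoded sample (not the real dataset).
sample = pd.DataFrame({
    "category": [8, 5, 10, 1, 11],
    "Item_price": [100, 250, 80, 400, 120],
    "CSAT Score": [5, 4, 5, 1, 5],
})
corr, ax = plot_correlation_heatmap(sample)
print(corr.round(2))
```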
To deepen conceptual understanding, a manual correlation computation was performed for one of the most significant feature pairs. This step-by-step calculation demonstrated how correlation values are mathematically derived using mean-centered values and standard deviations, thereby reinforcing both statistical intuition and technical accuracy.
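For reference, the mean-centered form of the Pearson coefficient that such a manual computation follows is given below, where $X$ is the feature, $Y$ is the target, and $\bar{X}$, $\bar{Y}$ are their means over all $n$ rows. The index notation is an assumption added for illustration.

$$
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
$$

This is the covariance of $X$ and $Y$ divided by the product of their standard deviations; the $1/(n-1)$ factors cancel between numerator and denominator.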
The week concluded by highlighting the Top 3 predictive features most correlated with the target variable. These insights serve as a foundation for Week 5, where these features will be used to build predictive models that forecast customer satisfaction or business outcomes. In essence, Week 4 established a data-driven pathway for selecting the variables that matter, paving the way for robust model development and validation in subsequent stages.
In summary: Week 4 bridged exploratory data analysis and predictive modeling preparation, turning raw data into actionable insights through systematic correlation assessment and statistical reasoning.
Dataset: Customer_support_data.csv. Automatically identified target: CSAT Score
| Column Index | Column Name |
|---|---|
| 0 | Unique id |
| 1 | channel_name |
| 2 | category |
| 3 | Sub-category |
| 4 | Customer Remarks |
| 5 | Order_id |
| 6 | order_date_time |
| 7 | Issue_reported at |
| 8 | issue_responded |
| 9 | Survey_response_Date |
| 10 | Customer_City |
| 11 | Product_category |
| 12 | Item_price |
| 13 | connected_handling_time |
| 14 | Agent_name |
| 15 | Supervisor |
| 16 | Manager |
| 17 | Tenure Bucket |
| 18 | Agent Shift |
| 19 | CSAT Score |
| Unique id | channel_name | category | Sub-category | Customer Remarks | Order_id | order_date_time | Issue_reported at | issue_responded | Survey_response_Date | Customer_City | Product_category | Item_price | connected_handling_time | Agent_name | Supervisor | Manager | Tenure Bucket | Agent Shift | CSAT Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7e9ae164-6a8b-4521-a2d4-58f7c9fff13f | Outcall | Product Queries | Life Insurance | - | c27c9bb4-fa36-4140-9f1f-21009254ffdb | - | 01/08/2023 11:13 | 01/08/2023 11:47 | 01-Aug-23 | - | - | - | - | Richard Buchanan | Mason Gupta | Jennifer Nguyen | On Job Training | Morning | 5 |
| b07ec1b0-f376-43b6-86df-ec03da3b2e16 | Outcall | Product Queries | Product Specific Information | - | d406b0c7-ce17-4654-b9de-f08d421254bd | - | 01/08/2023 12:52 | 01/08/2023 12:54 | 01-Aug-23 | - | - | - | - | Vicki Collins | Dylan Kim | Michael Lee | >90 | Morning | 5 |
| 200814dd-27c7-4149-ba2b-bd3af3092880 | Inbound | Order Related | Installation/demo | - | c273368d-b961-44cb-beaf-62d6fd6c00d5 | - | 01/08/2023 20:16 | 01/08/2023 20:38 | 01-Aug-23 | - | - | - | - | Duane Norman | Jackson Park | William Kim | On Job Training | Evening | 5 |
| eb0d3e53-c1ca-42d3-8486-e42c8d622135 | Inbound | Returns | Reverse Pickup Enquiry | - | 5aed0059-55a4-4ec6-bb54-97942092020a | - | 01/08/2023 20:56 | 01/08/2023 21:16 | 01-Aug-23 | - | - | - | - | Patrick Flores | Olivia Wang | John Smith | >90 | Evening | 5 |
| ba903143-1e54-406c-b969-46c52f92e5df | Inbound | Cancellation | Not Needed | - | e8bed5a9-6933-4aff-9dc6-ccefd7dcde59 | - | 01/08/2023 10:30 | 01/08/2023 10:32 | 01-Aug-23 | - | - | - | - | Christopher Sanchez | Austin Johnson | Michael Lee | 0-30 | Morning | 5 |
Total missing values after cleaning: 0
Encoded 17 categorical columns using LabelEncoder.
| Index | Categorical Column |
|---|---|
| 0 | Unique id |
| 1 | channel_name |
| 2 | category |
| 3 | Sub-category |
| 4 | Customer Remarks |
| 5 | Order_id |
| 6 | order_date_time |
| 7 | Issue_reported at |
| 8 | issue_responded |
| 9 | Survey_response_Date |
| 10 | Customer_City |
| 11 | Product_category |
| 12 | Agent_name |
| 13 | Supervisor |
| 14 | Manager |
| 15 | Tenure Bucket |
| 16 | Agent Shift |
| | Unique id | channel_name | category | Sub-category | Customer Remarks | Order_id | order_date_time | Issue_reported at | issue_responded | Survey_response_Date | Customer_City | Product_category | Item_price | connected_handling_time | Agent_name | Supervisor | Manager | Tenure Bucket | Agent Shift | CSAT Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unique id | 1.00 | 0.00 | -0.01 | 0.00 | -0.01 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.00 | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | -0.01 | 0.00 |
| channel_name | 0.00 | 1.00 | 0.02 | 0.03 | 0.00 | -0.01 | -0.02 | 0.06 | 0.06 | 0.06 | -0.02 | -0.06 | -0.04 | 0.02 | 0.00 | -0.01 | 0.03 | 0.03 | -0.03 | 0.03 |
| category | -0.01 | 0.02 | 1.00 | 0.39 | -0.00 | -0.02 | -0.04 | -0.01 | -0.01 | -0.01 | -0.04 | -0.07 | -0.09 | -0.00 | -0.01 | 0.04 | -0.02 | -0.01 | 0.01 | 0.08 |
| Sub-category | 0.00 | 0.03 | 0.39 | 1.00 | 0.01 | -0.02 | -0.02 | 0.01 | 0.02 | 0.01 | -0.03 | -0.04 | -0.07 | 0.01 | 0.00 | 0.02 | -0.00 | -0.01 | -0.00 | 0.02 |
| Customer Remarks | -0.01 | 0.00 | -0.00 | 0.01 | 1.00 | 0.01 | 0.01 | -0.00 | -0.00 | -0.00 | -0.00 | 0.01 | 0.01 | -0.00 | 0.00 | -0.00 | 0.00 | -0.00 | 0.01 | -0.09 |
| Order_id | 0.00 | -0.01 | -0.02 | -0.02 | 0.01 | 1.00 | 0.04 | 0.11 | 0.11 | 0.11 | 0.05 | 0.10 | 0.05 | -0.00 | -0.00 | -0.00 | 0.01 | 0.03 | -0.01 | -0.01 |
| order_date_time | -0.00 | -0.02 | -0.04 | -0.02 | 0.01 | 0.04 | 1.00 | 0.01 | -0.01 | -0.01 | 0.08 | 0.20 | 0.08 | -0.00 | 0.00 | -0.00 | -0.00 | -0.01 | 0.00 | -0.04 |
| Issue_reported at | 0.00 | 0.06 | -0.01 | 0.01 | -0.00 | 0.11 | 0.01 | 1.00 | 0.98 | 0.98 | -0.02 | -0.03 | -0.04 | -0.00 | -0.00 | -0.04 | 0.10 | 0.18 | -0.01 | 0.03 |
| issue_responded | 0.00 | 0.06 | -0.01 | 0.02 | -0.00 | 0.11 | -0.01 | 0.98 | 1.00 | 1.00 | -0.02 | -0.04 | -0.04 | -0.00 | -0.00 | -0.04 | 0.10 | 0.18 | -0.00 | 0.03 |
| Survey_response_Date | 0.00 | 0.06 | -0.01 | 0.01 | -0.00 | 0.11 | -0.01 | 0.98 | 1.00 | 1.00 | -0.02 | -0.04 | -0.04 | -0.00 | -0.00 | -0.04 | 0.10 | 0.18 | 0.00 | 0.03 |
| Feature | Correlation Value |
|---|---|
| category | 0.077 |
| Issue_reported at | 0.033 |
| Survey_response_Date | 0.032 |
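These three features have the largest positive Pearson correlations with the CSAT Score target in this dataset. All three coefficients are below 0.1, so the linear association is weak; note also that Customer Remarks, at -0.09 in the correlation matrix, is larger in magnitude but negative. Treat these features as candidates to prioritize when building models, not as confirmed strong drivers.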
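A ranking like this can be sketched as follows, assuming the data is already numeric. The helper name `top_correlated_features`, its `use_abs` switch, and the toy columns are hypothetical; the toy values are synthetic and not drawn from the dataset.

```python
import pandas as pd


def top_correlated_features(df: pd.DataFrame, target: str, k: int = 3,
                            use_abs: bool = False) -> pd.Series:
    """Return the k features most correlated with `target` (Pearson).

    With use_abs=False only the largest positive coefficients are ranked;
    with use_abs=True features are ranked by magnitude, so strong negative
    correlations are kept as well.
    """
    corr = df.corr(numeric_only=True)[target].drop(labels=target)
    ranking = corr.abs() if use_abs else corr
    top_names = ranking.sort_values(ascending=False).index[:k]
    return corr.loc[top_names]


# Synthetic toy data for illustration only.
toy = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5],   # rises with the target
    "feature_b": [5, 4, 3, 2, 1],   # falls as the target rises
    "feature_c": [1, 3, 2, 5, 4],   # loosely rises with the target
    "target":    [1, 2, 3, 4, 5],
})
print(top_correlated_features(toy, "target", k=2))
print(top_correlated_features(toy, "target", k=2, use_abs=True))
```

Ranking by absolute value is often the safer default for feature screening, since a strongly negative correlation is as informative as a strongly positive one.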
Feature explained: category vs Target: CSAT Score
| category | CSAT Score | (X - X̄) | (Y - Ȳ) | (X - X̄)*(Y - Ȳ) | (X - X̄)² | (Y - Ȳ)² |
|---|---|---|---|---|---|---|
| 8 | 5 | 0.042255 | 0.757843 | 0.032023 | 0.001785 | 0.574326 |
| 8 | 5 | 0.042255 | 0.757843 | 0.032023 | 0.001785 | 0.574326 |
| 5 | 5 | -2.957745 | 0.757843 | -2.241506 | 8.748256 | 0.574326 |
| 10 | 5 | 2.042255 | 0.757843 | 1.547708 | 4.170805 | 0.574326 |
| 1 | 5 | -6.957745 | 0.757843 | -5.272877 | 48.410216 | 0.574326 |
| 10 | 5 | 2.042255 | 0.757843 | 1.547708 | 4.170805 | 0.574326 |
| 8 | 5 | 0.042255 | 0.757843 | 0.032023 | 0.001785 | 0.574326 |
| 10 | 5 | 2.042255 | 0.757843 | 1.547708 | 4.170805 | 0.574326 |
| 10 | 5 | 2.042255 | 0.757843 | 1.547708 | 4.170805 | 0.574326 |
| 11 | 5 | 3.042255 | 0.757843 | 2.305551 | 9.255315 | 0.574326 |
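The table shows only a sample of rows; the deviations use the means over the full dataset (X̄ ≈ 7.958 and Ȳ ≈ 4.242, as implied by the deviation columns). The manual computation confirms the Pearson correlation reported by pandas. This validates the calculation and helps build statistical intuition.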
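The check against pandas can be reproduced with a short sketch like the one below. The function name `manual_pearson` and the sample values are illustrative assumptions; on the real data, `x` would be the encoded `category` column and `y` the `CSAT Score` column.

```python
import math

import pandas as pd


def manual_pearson(x: pd.Series, y: pd.Series) -> float:
    """Pearson r from mean-centered values, mirroring the table columns."""
    dx = x - x.mean()                      # (X - X̄)
    dy = y - y.mean()                      # (Y - Ȳ)
    numerator = (dx * dy).sum()            # sum of (X - X̄)*(Y - Ȳ)
    denominator = math.sqrt((dx ** 2).sum()) * math.sqrt((dy ** 2).sum())
    return float(numerator / denominator)


# Illustrative values, not rows from the real dataset.
x = pd.Series([8, 5, 10, 1, 11], name="category")
y = pd.Series([5, 4, 5, 1, 5], name="CSAT Score")

r_manual = manual_pearson(x, y)
r_pandas = x.corr(y)  # pandas uses Pearson by default
print(f"manual: {r_manual:.6f}  pandas: {r_pandas:.6f}")
```

The function divides by zero if either series is constant, which is why a sample in which every target value is identical cannot be used on its own.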
This Week's Milestone (as per course outline):
"Identify key predictive variables."
Final Week 4 Project Milestone:
Key predictive features identified for the modeling phase.
(You now know which independent variables are most strongly correlated with your target variable.)