Week 11: Natural Language Processing (NLP)
1. Introduction
In Week 11, our focus is on building the **Customer Satisfaction Recommendation Model** using text data collected from customer surveys and order feedback.
This week emphasizes **data preprocessing**, **TF-IDF feature extraction**, and **preparation of the final features for the recommendation engine**.
We aim to transform raw customer remarks into structured data suitable for machine learning models.
2. Objectives
- Understand the dataset and explore relevant columns for Week 11.
- Preprocess textual data for machine learning, including cleaning, tokenization, and normalization.
- Apply TF-IDF vectorization to extract key features from customer remarks.
- Prepare the final feature set by combining TF-IDF features with categorical and numerical columns.
- Present sample outputs and key metrics that confirm the data is ready for the recommendation model.
3. Week 11 Tasks / Assignment
- Load and explore the dataset, focusing on the Customer Remarks, Product_category, and CSAT Score columns.
- Perform text preprocessing: lowercase conversion, punctuation removal, stopword removal, and stemming.
- Generate TF-IDF features from cleaned customer remarks.
- Prepare the final feature table for model input.
- Visualize sample outputs: dataset columns, processed text, TF-IDF features, and final features.
- Calculate key metrics such as the number of records processed, the number of unique terms in the TF-IDF vocabulary, and the feature matrix size.
4. Outputs
4.1 Dataset Columns
| Index | Column Name |
|---|---|
| 0 | Unique id |
| 1 | channel_name |
| 2 | category |
| 3 | Sub-category |
| 4 | Customer Remarks |
| 5 | Order_id |
| 6 | order_date_time |
| 7 | Issue_reported at |
| 8 | issue_responded |
| 9 | Survey_response_Date |
| 10 | Customer_City |
| 11 | Product_category |
| 12 | Item_price |
| 13 | connected_handling_time |
| 14 | Agent_name |
| 15 | Supervisor |
| 16 | Manager |
| 17 | Tenure Bucket |
| 18 | Agent Shift |
| 19 | CSAT Score |
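A minimal pandas sketch of the loading step, assuming the survey export is a CSV file that `pd.read_csv` can read (the report does not name the file or its format). `load_week11_frame` and `WEEK11_COLUMNS` are hypothetical names; the column names come from the table above, and dropping rows with an empty remark is an assumed cleaning choice.

```python
import io

import pandas as pd

# Columns the Week 11 tasks focus on; names copied from the column listing above.
WEEK11_COLUMNS = ["Unique id", "Customer Remarks", "Product_category", "CSAT Score"]


def load_week11_frame(source) -> pd.DataFrame:
    """Read a survey export and keep only the Week 11 columns.

    `source` is anything pd.read_csv accepts (a path or a file-like object).
    Rows with an empty remark are dropped, since there is no text to vectorize.
    """
    df = pd.read_csv(source)
    subset = df[WEEK11_COLUMNS].dropna(subset=["Customer Remarks"])
    return subset.reset_index(drop=True)


if __name__ == "__main__":
    # In-memory stand-in for the real export (hypothetical layout; the ids,
    # Agent_name and CSAT values are made up for illustration).
    demo = io.StringIO(
        "Unique id,Customer Remarks,Product_category,CSAT Score,Agent_name\n"
        "r1,Item delayed in delivery,Electronics,1,agent_a\n"
        "r2,Received wrong product,Home,2,agent_b\n"
        "r3,,Furniture,5,agent_c\n"
    )
    frame = load_week11_frame(demo)
    print(frame.shape)  # (2, 4): the row without a remark is dropped
    print(frame.columns.tolist())
```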
4.2 Sample Text Data
| Customer Remarks | Issue_reported at |
|---|---|
| Item delayed in delivery | 01/08/2023 11:13 |
| Received wrong product | 01/08/2023 12:52 |
| Packaging damaged | 01/08/2023 20:16 |
| Product not working | 01/08/2023 20:56 |
| Excellent service | 01/08/2023 10:30 |
4.3 Processed Text Sample
| Customer Remarks | Processed_Remarks |
|---|---|
| Item delayed in delivery | item delay deliveri |
| Received wrong product | receiv wrong product |
| Packaging damaged | packag damag |
| Product not working | product work |
| Excellent service | excel servic |
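The preprocessing steps can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline behind the table: `preprocess` and `STOPWORDS` are hypothetical names, the stopword list is a small stand-in, and the stemmer is passed in as a function rather than fixed.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical stopword list, kept small for illustration. The report does not
# say which list it used; like NLTK's English list, this one includes "not".
STOPWORDS = frozenset({
    "a", "an", "and", "the", "in", "on", "of", "to", "for", "with",
    "is", "was", "it", "this", "not",
})


def preprocess(text: str,
               stem: Optional[Callable[[str], str]] = None,
               stopwords: Iterable[str] = STOPWORDS) -> str:
    """Lowercase, tokenize, drop punctuation and stopwords, then optionally stem."""
    stop = set(stopwords)
    # After lowercasing, the regex keeps only runs of ASCII letters/digits,
    # which tokenizes the text and discards punctuation in one step.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [tok for tok in tokens if tok not in stop]
    if stem is not None:
        kept = [stem(tok) for tok in kept]
    return " ".join(kept)


if __name__ == "__main__":
    for remark in ["Item delayed in delivery", "Product not working!"]:
        print(preprocess(remark))
    # item delayed delivery
    # product working
```

To get stems like those in the table, pass a Porter stemmer, for example NLTK's: after `from nltk.stem import PorterStemmer`, the call `preprocess("Item delayed in delivery", stem=PorterStemmer().stem)` should return `item delay deliveri`. Because "not" is treated as a stopword, "Product not working" loses its negation and becomes `product work`; removing negation words from the stopword set is one way to keep that signal for a satisfaction model.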
4.4 TF-IDF Features (Sample: three records, subset of vocabulary terms)
| Unique id | item | delay | deliveri | receiv | wrong | product |
|---|---|---|---|---|---|---|
| 7e9ae164 | 1 | 1 | 1 | 0 | 0 | 0 |
| b07ec1b0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 200814dd | 0 | 0 | 0 | 0 | 0 | 0 |
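For reference, here is a dependency-free sketch of TF-IDF weighting. It applies the same scheme scikit-learn's `TfidfVectorizer` uses by default (raw term counts, smoothed idf, L2-normalized rows), but it is an illustration rather than the report's code: `tfidf` is a hypothetical helper, and tokenization is a plain whitespace split, which differs from the vectorizer's default token pattern (that pattern, for instance, ignores one-character tokens).

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} dict per document.

    weight = count(t, d) * idf(t), with idf(t) = ln((1 + n) / (1 + df(t))) + 1,
    then each row is scaled to unit L2 norm.
    """
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Empty documents stay empty instead of dividing by zero.
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows


if __name__ == "__main__":
    rows = tfidf(["item delay deliveri", "receiv wrong product", "packag damag"])
    print({t: round(w, 4) for t, w in rows[0].items()})
    # {'item': 0.5774, 'delay': 0.5774, 'deliveri': 0.5774}
```

Under this scheme a remark with k distinct, equally weighted terms gets 1/√k per term, so nonzero weights fall below 1 whenever a remark contains more than one term.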
4.5 Final Features for the Recommendation Model (Sample)
| Affiliates | Books & General merchandise | Electronics | Furniture | GiftCard | Home |
|---|---|---|---|---|---|
| 0 | 5 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 |
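One way to assemble a table like this is to concatenate the TF-IDF columns with one indicator column per `Product_category` value and the CSAT score. The sketch assumes both inputs are pandas DataFrames indexed by `Unique id`; `build_feature_table` is a hypothetical helper. The report does not say whether CSAT Score is a feature or the prediction target, so it is simply carried along here.

```python
import pandas as pd


def build_feature_table(tfidf_frame: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Combine TF-IDF columns with one-hot Product_category columns and CSAT Score.

    Assumes both frames share a "Unique id" index so rows line up; the report
    does not show how alignment was done.
    """
    # One 0/1 indicator column per product category (e.g. "Electronics", "Home").
    category_cols = pd.get_dummies(meta["Product_category"], dtype=int)
    # Inner join on the shared index keeps only records present in every input.
    return pd.concat(
        [tfidf_frame, category_cols, meta[["CSAT Score"]]],
        axis=1,
        join="inner",
    )


if __name__ == "__main__":
    # Toy inputs; the weights and CSAT values are made up for illustration.
    idx = pd.Index(["7e9ae164", "b07ec1b0"], name="Unique id")
    tfidf_part = pd.DataFrame({"item": [0.58, 0.0], "wrong": [0.0, 0.58]}, index=idx)
    meta = pd.DataFrame(
        {"Product_category": ["Electronics", "Home"], "CSAT Score": [1, 2]}, index=idx
    )
    print(build_feature_table(tfidf_part, meta))
```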
5. Key Metrics
- Total records processed: 50,000+
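- Number of unique terms in the TF-IDF vocabulary: 8,500+
- Feature matrix size for the recommendation model: 50,000 × 100 (rows × columns). With only 100 columns against a vocabulary of 8,500+ terms, only a subset of the TF-IDF terms can be included.
- Columns used for modeling: Customer Remarks (via its TF-IDF features), Product_category, and CSAT Score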
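The counts above can be computed directly from the processed remarks. In this minimal sketch, `key_metrics` is a hypothetical helper; the vocabulary is counted from whitespace tokens (a vectorizer's own tokenizer may differ slightly), and the final column count is passed in because it depends on choices the report does not spell out.

```python
from typing import Dict, List


def key_metrics(processed_remarks: List[str], n_feature_columns: int) -> Dict[str, object]:
    """Summarize a preprocessing run: records, vocabulary size, matrix shape."""
    # Distinct whitespace-separated terms across all processed remarks.
    vocabulary = {tok for remark in processed_remarks for tok in remark.split()}
    return {
        "records_processed": len(processed_remarks),
        "unique_terms": len(vocabulary),
        "feature_matrix_shape": (len(processed_remarks), n_feature_columns),
    }


if __name__ == "__main__":
    # The five processed remarks from section 4.3 ("product" appears twice).
    sample = ["item delay deliveri", "receiv wrong product", "packag damag",
              "product work", "excel servic"]
    print(key_metrics(sample, 100))
    # {'records_processed': 5, 'unique_terms': 11, 'feature_matrix_shape': (5, 100)}
```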
6. Week 11 Milestone
- Completed data exploration and preprocessing.
- Generated TF-IDF feature matrix from customer remarks.
- Prepared final feature table ready for input into recommendation model.
- Visualized sample outputs to verify correctness.
- Documented key metrics and insights for Week 11.