Week 11: Natural Language Processing (NLP)

1. Introduction

In Week 11, we work toward the **Customer Satisfaction Recommendation Model** using text data collected from customer surveys and order feedback. This week emphasizes **data preprocessing**, **feature extraction using TF-IDF**, and preparing the **final features for the recommendation engine**. The aim is to transform raw customer remarks into structured data suitable for machine learning models.

2. Objectives

Understand the dataset and explore relevant columns for Week 11.
Preprocess textual data for machine learning, including cleaning, tokenization, and normalization.
Apply TF-IDF vectorization to extract key features from customer remarks.
Prepare the final feature set by combining TF-IDF features with categorical and numerical columns.
Present sample outputs and key metrics showing that the feature set is ready for the recommendation model.

3. Week 11 Tasks / Assignment

Load and explore the dataset, focusing on columns such as Customer Remarks, Product_category, and CSAT Score.
Perform text preprocessing: convert to lowercase, remove punctuation and stopwords, and apply stemming.
Generate TF-IDF features from cleaned customer remarks.
Prepare the final feature table for model input.
Visualize sample outputs: dataset columns, processed text, TF-IDF features, and final features.
Calculate key metrics such as number of records processed, number of unique words in TF-IDF, and feature matrix size.

4. Outputs

4.1 Dataset Columns

Index	Column Name
0	Unique id
1	channel_name
2	category
3	Sub-category
4	Customer Remarks
5	Order_id
6	order_date_time
7	Issue_reported at
8	issue_responded
9	Survey_response_Date
10	Customer_City
11	Product_category
12	Item_price
13	connected_handling_time
14	Agent_name
15	Supervisor
16	Manager
17	Tenure Bucket
18	Agent Shift
19	CSAT Score
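
The column listing above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the in-memory CSV is a made-up stand-in for the real survey file (whose name and location are not given here), and it carries only four of the twenty columns.

```python
import csv
import io

# Assumption: a tiny made-up stand-in for the real survey export.
# Only the column names are taken from the table above; the row is illustrative.
sample_csv = io.StringIO(
    "Unique id,Customer Remarks,Product_category,CSAT Score\n"
    "7e9ae164,Item delayed in delivery,Electronics,5\n"
)

reader = csv.DictReader(sample_csv)
rows = list(reader)

# Print each column with its positional index, as in the table above.
for index, name in enumerate(reader.fieldnames):
    print(index, name)

print("records loaded:", len(rows))
```

With the real file, `io.StringIO(...)` would be replaced by an `open(...)` call on the dataset path.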

4.2 Sample Text Data

Customer Remarks	Issue_reported at
Item delayed in delivery	01/08/2023 11:13
Received wrong product	01/08/2023 12:52
Packaging damaged	01/08/2023 20:16
Product not working	01/08/2023 20:56
Excellent service	01/08/2023 10:30

4.3 Processed Text Sample

Customer Remarks	Processed_Remarks
Item delayed in delivery	item delay deliveri
Received wrong product	receiv wrong product
Packaging damaged	packag damag
Product not working	product work
Excellent service	excel servic
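
The preprocessing steps behind this table (lowercasing, punctuation removal, stopword removal, stemming) can be sketched as follows. The stopword list here is a small illustrative set, not the list actually used for the report, and the stemmer is passed in as a function so the sketch runs without extra libraries; by default it leaves tokens unstemmed.

```python
import string

# Assumption: a tiny illustrative stopword list. The report does not say
# which list was used; NLTK's English list is a common choice.
STOPWORDS = {"a", "an", "and", "in", "is", "not", "of", "the", "to", "was"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def preprocess(text, stem=lambda word: word):
    """Lowercase, strip punctuation, drop stopwords, then stem each token."""
    tokens = text.lower().translate(_STRIP_PUNCT).split()
    return " ".join(stem(t) for t in tokens if t not in STOPWORDS)

remarks = ["Item delayed in delivery", "Product not working", "Excellent service!"]
processed = [preprocess(r) for r in remarks]
print(processed)
# ['item delayed delivery', 'product working', 'excellent service']
```

If NLTK is installed, passing `nltk.stem.PorterStemmer().stem` as `stem` should yield Porter-style stems such as `deliveri` and `servic`, matching the table. Note that treating "not" as a stopword, as in the "Product not working" row, discards the negation, which may matter for satisfaction modeling.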

4.4 TF-IDF Features (Sample)

Unique id	item	delay	deliveri	receiv	wrong	product
7e9ae164	1	1	1	0	0	0
b07ec1b0	0	0	0	1	1	1
200814dd	0	0	0	0	0	0
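
For reference, the sketch below computes TF-IDF weights by hand using only the standard library. It assumes the smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2 row normalization, which are the defaults of scikit-learn's `TfidfVectorizer`; the report does not state which variant or library was used.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows) using smoothed IDF and L2-normalized rows."""
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({t for tokens in tokenised for t in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for tokens in tokenised for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = [counts[t] * idf[t] for t in vocab]
        # An empty document has norm 0; keep it as an all-zero row.
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocab, rows

docs = ["item delay deliveri", "receiv wrong product", "product work"]
vocab, matrix = tfidf(docs)
for doc, row in zip(docs, matrix):
    print(doc, [round(w, 3) for w in row])
```

Weights computed this way are fractional, and a term shared by several remarks (here `product`) is weighted lower than a term unique to one remark. A remark that contains none of the vocabulary terms produces an all-zero row.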

4.5 Final Features for Recommendation Model (Sample)

Affiliates	Books & General merchandise	Electronics	Furniture	GiftCard	Home
0	5	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
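
One way to assemble such a table is to encode the product category as indicator columns and append the CSAT score and the TF-IDF weights. The sketch below assumes plain one-hot encoding; how the values in the sample above were actually derived is not stated, and all input values here are made up for illustration.

```python
# Category names taken from the column headers of the sample table.
CATEGORIES = [
    "Affiliates", "Books & General merchandise", "Electronics",
    "Furniture", "GiftCard", "Home",
]

def one_hot(value, categories):
    """Return a 0/1 indicator list; unknown or missing values give all zeros."""
    return [1 if value == c else 0 for c in categories]

def build_features(product_categories, csat_scores, tfidf_rows):
    """Concatenate one-hot category, CSAT score, and TF-IDF weights per record."""
    return [
        one_hot(cat, CATEGORIES) + [score] + list(tf)
        for cat, score, tf in zip(product_categories, csat_scores, tfidf_rows)
    ]

# Made-up inputs: three records, three TF-IDF columns each.
product_categories = ["Electronics", "Home", None]
csat_scores = [5, 1, 4]
tfidf_rows = [[0.5, 0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

final = build_features(product_categories, csat_scores, tfidf_rows)
print(len(final), "x", len(final[0]))  # 3 x 10
```

A record with a missing product category yields zeros in every category column, which is one possible explanation for all-zero rows in the sample.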

5. Key Metrics

Total records processed: 50,000+
Number of unique words (TF-IDF): 8,500+
Feature matrix size for recommendation model: 50,000 x 100
Columns used for modeling: Customer Remarks (as TF-IDF features), Product_category, CSAT Score
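
Metrics of this kind can be computed directly from the processed remarks. In the sketch below, `max_features` is a hypothetical cap that keeps only the most frequent terms, similar in spirit to the `max_features` option of scikit-learn's `TfidfVectorizer`. A cap like this is one way a vocabulary of several thousand words can end up as a much narrower feature matrix, though the report does not say how its matrix width was chosen.

```python
from collections import Counter

def summarise(processed_remarks, max_features=None):
    """Count records, unique words, and the resulting TF-IDF matrix shape."""
    term_counts = Counter(t for doc in processed_remarks for t in doc.split())
    # most_common(None) returns every term; an integer keeps the top N.
    kept_terms = [term for term, _ in term_counts.most_common(max_features)]
    return {
        "records": len(processed_remarks),
        "unique_words": len(term_counts),
        "matrix_shape": (len(processed_remarks), len(kept_terms)),
    }

sample = ["item delay deliveri", "receiv wrong product", "product work"]
full = summarise(sample)
capped = summarise(sample, max_features=4)
print(full)
print(capped)
```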

6. Week 11 Milestone

Completed data exploration and preprocessing.
Generated TF-IDF feature matrix from customer remarks.
Prepared the final feature table, ready for input into the recommendation model.
Visualized sample outputs to verify correctness.
Documented key metrics and insights for Week 11.