Week 2: Data Collection & Cleaning

Transforming Raw Data into Reliable Insights"

Data Source Kaggel.com

1. Data Cleaning Process

In this assignment, we were required to remove duplicate records, handle missing values, and treat outliers in the dataset. These steps ensure that the data is accurate, consistent, and reliable for further analysis. After applying the cleaning techniques to our project dataset, we generated a “Before vs After Cleaning Report” which highlights the differences in data quality by showing the dataset’s shape, missing values, and sample rows before cleaning compared to the cleaned dataset.

Dataset Overview

The dataset has been successfully loaded for analysis. It contains 85,907 rows and 20 columns. The dataset comprises customer service interactions, including details about the agent, customer remarks, issue reported, response time, CSAT score, and other transactional information.

2. Initial Dataset Inspection

Shape Before Cleaning

3. Missing Values Before Cleaning

ColumnMissing Values
Unique id0
channel_name0
category0
Sub-category0
Customer Remarks57,165
Order_id18,232
order_date_time68,693
Issue_reported at0
issue_responded0
Survey_response_Date0
Customer_City68,828
Product_category68,711
Item_price68,701
connected_handling_time85,665
Agent_name0
Supervisor0
Manager0
Tenure Bucket0
Agent Shift0
CSAT Score0

Observation: Several columns contain significant missing values, especially Customer Remarks, order_date_time, Customer_City, and connected_handling_time.

4. Missing Value Handling Strategy

The missing values were handled using Mean / Median / Mode imputation based on the data type and distribution.

ColumnMeanMedianModeUsed Strategy
Unique idnannan0000978d-8599-4e17-8cf6-28e9faeba4c9Mode
channel_namenannanInboundMode
categorynannanReturnsMode
Sub-categorynannanReverse Pickup EnquiryMode
Customer RemarksnannanGoodMode
Order_idnannan0000a850-ca65-4eb1-ace1-9e34d5d1637aMode
order_date_timenannan09/08/2023 11:55Mode
Issue_reported atnannan13/08/2023 10:40Mode
issue_respondednannan28/08/2023 00:00Mode
Survey_response_Datenannan28-Aug-23Mode
Customer_CitynannanHYDERABADMode
Product_categorynannanElectronicsMode
Item_price5660.77979999.0Median
connected_handling_time462.4427282.0Median
Agent_namenannanWendy TaylorMode
SupervisornannanCarter ParkMode
ManagernannanJohn SmithMode
Tenure Bucketnannan>90Mode
Agent ShiftnannanMorningMode
CSAT Score4.2455Median

Explanation:

5. Duplicate Rows Check

Duplicate Rows Removed: 0

The dataset contained no duplicate entries.

6. Quartile & Outlier Explanation (Numeric Columns)

📊 Column: Item_price

Q1 (middle of lower half) = 979.00
Q2 (median / middle of dataset) = 979.00
Q3 (middle of upper half) = 979.00
IQR (Q3 - Q1) = 0.00
Lower Bound = 979.00
Upper Bound = 979.00
Note: Values between 979.00 and 979.00 are considered normal; only values < 979.00 or > 979.00 are outliers.
Outliers Found: 8600 below, 8599 above

📊 Column: connected_handling_time

Q1 (middle of lower half) = 427.00
Q2 (median / middle of dataset) = 427.00
Q3 (middle of upper half) = 427.00
IQR (Q3 - Q1) = 0.00
Lower Bound = 427.00
Upper Bound = 427.00
Note: Values between 427.00 and 427.00 are considered normal; only values < 427.00 or > 427.00 are outliers.
Outliers Found: 120 below, 120 above

📊 Column: CSAT Score

Q1 (middle of lower half) = 4.00
Q2 (median / middle of dataset) = 5.00
Q3 (middle of upper half) = 5.00
IQR (Q3 - Q1) = 1.00
Lower Bound = 2.50
Upper Bound = 6.50
Note: Values between 2.50 and 6.50 are considered normal; only values < 2.50 or > 6.50 are outliers.
Outliers Found: 12513 below, 0 above

7. Outlier Detection & Quartile Analysis

Numeric Columns Analyzed: Item_price, connected_handling_time, CSAT Score

ColumnQ1MedianQ3IQRLower BoundUpper BoundOutliers BelowOutliers Above
Item_price97997997909799798,6008,599
connected_handling_time4274274270427427120120
CSAT Score45512.56.512,5130

Observation:

8. Before Cleaning (First 20 Rows)

ID Call Type Category Subcategory Remarks Case ID Timestamp Start Time End Time Date City Department Amount Balance Agent 1 Agent 2 Agent 3 Training Shift Rating
7e9ae164-6a8b-4521-a2d4-58f7c9fff13fOutcallProduct QueriesLife InsuranceNaNc27c9bb4-fa36-4140-9f1f-21009254ffdbNaN01/08/2023 11:1301/08/2023 11:4701-Aug-23NaNNaNNaNNaNRichard BuchananMason GuptaJennifer NguyenOn Job TrainingMorning5
b07ec1b0-f376-43b6-86df-ec03da3b2e16OutcallProduct QueriesProduct Specific InformationNaNd406b0c7-ce17-4654-b9de-f08d421254bdNaN01/08/2023 12:5201/08/2023 12:5401-Aug-23NaNNaNNaNNaNVicki CollinsDylan KimMichael Lee>90Morning5
200814dd-27c7-4149-ba2b-bd3af3092880InboundOrder RelatedInstallation/demoNaNc273368d-b961-44cb-beaf-62d6fd6c00d5NaN01/08/2023 20:1601/08/2023 20:3801-Aug-23NaNNaNNaNNaNDuane NormanJackson ParkWilliam KimOn Job TrainingEvening5
eb0d3e53-c1ca-42d3-8486-e42c8d622135InboundReturnsReverse Pickup EnquiryNaN5aed0059-55a4-4ec6-bb54-97942092020aNaN01/08/2023 20:5601/08/2023 21:1601-Aug-23NaNNaNNaNNaNPatrick FloresOlivia WangJohn Smith>90Evening5
ba903143-1e54-406c-b969-46c52f92e5dfInboundCancellationNot NeededNaNe8bed5a9-6933-4aff-9dc6-ccefd7dcde59NaN01/08/2023 10:3001/08/2023 10:3201-Aug-23NaNNaNNaNNaNChristopher SanchezAustin JohnsonMichael Lee0-30Morning5
1cfde5b9-6112-44fc-8f3b-892196137a62EmailReturnsFraudulent UserNaNa2938961-2833-45f1-83d6-678d9555c603NaN01/08/2023 15:1301/08/2023 18:3901-Aug-23NaNNaNNaNNaNDesiree NewtonEmma ParkJohn Smith0-30Morning5
11a3ffd8-1d6b-4806-b198-c60b5934c9bcOutcallProduct QueriesProduct Specific InformationNaNbfcb562b-9a2f-4cca-aa79-fd4e2952f901NaN01/08/2023 15:3101/08/2023 23:5201-Aug-23NaNNaNNaNNaNShannon HicksAiden PatelOlivia Tan>90Morning5
372b51a5-fa19-4a31-a4b8-a21de117d75eInboundReturnsExchange / ReplacementVery good88537e0b-5ffa-43f9-bbe2-fe57a0f4e4aeNaN01/08/2023 16:1701/08/2023 16:2301-Aug-23NaNNaNNaNNaNLaura SmithEvelyn KimuraJennifer NguyenOn Job TrainingEvening5
6e4413db-4e16-42fc-ac92-2f402e3df03cInboundReturnsMissingShopzilla app and it's all coustomer care serv...e6be9713-13c3-493c-8a91-2137cbbfa7e6NaN01/08/2023 21:0301/08/2023 21:0701-Aug-23NaNNaNNaNNaNDavid SmithNathan PatelJohn Smith>90Split5
b0a65350-64a5-4603-8b9a-a24a4a145d08InboundShopzilla RelatedGeneral EnquiryNaNc7caa804-2525-499e-b202-4c781cb68974NaN01/08/2023 23:3101/08/2023 23:3601-Aug-23NaNNaNNaNNaNTabitha AyalaAmelia TanakaMichael Lee31-60Evening5
74a2cbbf-497e-4177-b30b-483f99effb13OutcallReturnsReturn requestNaN5862d1a1-2081-4e28-ad0a-cec43380ac7dNaN02/08/2023 18:1402/08/2023 18:1602-Aug-23NaNNaNNaNNaNCarla MorganNathan PatelEmily Chen0-30Evening4
4c28acf4-2ea4-4be8-b8f1-113e676fc8b7InboundOrder RelatedDelayedVery bad55bbace3-eb2e-4f67-aacb-4f8ee8512f7519/07/2023 23:5002/08/2023 10:4402/08/2023 11:1402-Aug-23NAGPURLifeStyle434.0NaNStanley HoganHarper WongEmily Chen>90Split1
e7ec72bb-65e9-4464-9c2b-8ce6b0e82a28InboundShopzilla RelatedGeneral EnquiryNaNc7d5eef8-d239-474c-8d24-542db2923b3cNaN01/08/2023 10:0901/08/2023 10:1201-Aug-23NaNNaNNaNNaNTimothy ScottZoe YamamotoWilliam KimOn Job TrainingMorning4
68c596f2-4130-42f9-a140-b336261b1629InboundShopzilla RelatedGeneral EnquiryNaNc1e10375-7426-4b2a-aca6-17dc6bc9c7a3NaN01/08/2023 10:1501/08/2023 11:2101-Aug-23NaNNaNNaNNaNShannon HicksAiden PatelOlivia Tan>90Morning4
6b966653-0900-4d58-8b38-c17cfbdaf000InboundReturnsReverse Pickup EnquiryNaN9e0d7d55-d012-42f7-8e21-634942419386NaN02/08/2023 11:2602/08/2023 11:4402-Aug-23NaNNaNNaNNaNMark WilsonScarlett ChenJohn Smith0-30Morning5
130dfefe-1d77-47f9-b759-546f24b6b909InboundReturnsMissingNaN8153c646-4940-406f-992c-952ed08e9af8NaN02/08/2023 19:5602/08/2023 20:0602-Aug-23NaNNaNNaNNaNMark BlackSophia SatoJohn Smith>90Afternoon5
4cc4df74-fcd3-413a-920d-ca3ad2dd0904InboundReturnsService Centres RelatedNaN4245d6df-49c6-491a-8724-125c4df82e7506/05/2023 21:4101/08/2023 09:0101/08/2023 09:0301-Aug-23RANCHIElectronics1299.0NaNAmy MendezSophia SatoJohn Smith0-30Morning5
c54459b3-ffb3-4ffa-a338-6fd179beb6b1InboundOrder RelatedInstallation/demoSomething0cdaa1f1-02f2-40ec-9134-7cd0f92a6b3eNaN01/08/2023 10:0001/08/2023 10:0401-Aug-23NaNNaNNaNNaNJennifer MayZoe YamamotoWilliam KimOn Job TrainingMorning3
deccfb1d-05d4-40b8-8371-15282c91bacbInboundCancellationNot NeededNaN3f16b782-27cf-4831-b542-cd8bd60c0385NaN01/08/2023 21:0501/08/2023 21:0701-Aug-23NaNNaNNaNNaNRyan ThompsonOlivia WangEmily Chen31-60Evening5
6e51734e-9627-4a12-8062-877c3638627dInboundOrder RelatedInstallation/demoAll good8a1ae638-4d8c-4a5e-bf1b-eb74f881fc3d18/07/2023 12:2002/08/2023 20:0302/08/2023 20:0502-Aug-23NAGPURElectronics15990.0NaNDavid ButlerOlivia WangEmily Chen31-60Evening5

9. Dataset After Cleaning

Shape After Cleaning

Missing Values Before Cleaning

ColumnMissing Values
All Columns0

10. After Cleaning (First 20 Rows)

ID Call Type Category Subcategory Remarks Case ID Timestamp Start Time End Time Date City Department Amount Balance Agent 1 Agent 2 Agent 3 Training Shift Rating
7e9ae164-6a8b-4521-a2d4-58f7c9fff13f Outcall Product Queries Life Insurance Good c27c9bb4-fa36-4140-9f1f-21009254ffdb 09/08/2023 11:55 01/08/2023 11:13 01/08/2023 11:47 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Richard Buchanan Mason Gupta Jennifer Nguyen On Job Training Morning 5.00
b07ec1b0-f376-43b6-86df-ec03da3b2e16 Outcall Product Queries Product Specific Information Good d406b0c7-ce17-4654-b9de-f08d421254bd 09/08/2023 11:55 01/08/2023 12:52 01/08/2023 12:54 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Vicki Collins Dylan Kim Michael Lee >90 Morning 5.00
200814dd-27c7-4149-ba2b-bd3af3092880 Inbound Order Related Installation/demo Good c273368d-b961-44cb-beaf-62d6fd6c00d5 09/08/2023 11:55 01/08/2023 20:16 01/08/2023 20:38 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Duane Norman Jackson Park William Kim On Job Training Evening 5.00
eb0d3e53-c1ca-42d3-8486-e42c8d622135 Inbound Returns Reverse Pickup Enquiry Good 5aed0059-55a4-4ec6-bb54-97942092020a 09/08/2023 11:55 01/08/2023 20:56 01/08/2023 21:16 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Patrick Flores Olivia Wang John Smith >90 Evening 5.00
ba903143-1e54-406c-b969-46c52f92e5df Inbound Cancellation Not Needed Good e8bed5a9-6933-4aff-9dc6-ccefd7dcde59 09/08/2023 11:55 01/08/2023 10:30 01/08/2023 10:32 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Christopher Sanchez Austin Johnson Michael Lee 0-30 Morning 5.00
1cfde5b9-6112-44fc-8f3b-892196137a62 Email Returns Fraudulent User Good a2938961-2833-45f1-83d6-678d9555c603 09/08/2023 11:55 01/08/2023 15:13 01/08/2023 18:39 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Desiree Newton Emma Park John Smith 0-30 Morning 5.00
11a3ffd8-1d6b-4806-b198-c60b5934c9bc Outcall Product Queries Product Specific Information Good bfcb562b-9a2f-4cca-aa79-fd4e2952f901 09/08/2023 11:55 01/08/2023 15:31 01/08/2023 23:52 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Shannon Hicks Aiden Patel Olivia Tan >90 Morning 5.00
372b51a5-fa19-4a31-a4b8-a21de117d75e Inbound Returns Exchange / Replacement Very good 88537e0b-5ffa-43f9-bbe2-fe57a0f4e4ae 09/08/2023 11:55 01/08/2023 16:17 01/08/2023 16:23 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Laura Smith Evelyn Kimura Jennifer Nguyen On Job Training Evening 5.00
6e4413db-4e16-42fc-ac92-2f402e3df03c Inbound Returns Missing Shopzilla app and it's all coustomer care services is very good service provided all time e6be9713-13c3-493c-8a91-2137cbbfa7e6 09/08/2023 11:55 01/08/2023 21:03 01/08/2023 21:07 01-Aug-23 HYDERABAD Electronics 979.00 427.00 David Smith Nathan Patel John Smith >90 Split 5.00
b0a65350-64a5-4603-8b9a-a24a4a145d08 Inbound Shopzilla Related General Enquiry Good c7caa804-2525-499e-b202-4c781cb68974 09/08/2023 11:55 01/08/2023 23:31 01/08/2023 23:36 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Tabitha Ayala Amelia Tanaka Michael Lee 31-60 Evening 5.00
74a2cbbf-497e-4177-b30b-483f99effb13 Outcall Returns Return request Good 5862d1a1-2081-4e28-ad0a-cec43380ac7d 09/08/2023 11:55 02/08/2023 18:14 02/08/2023 18:16 02-Aug-23 HYDERABAD Electronics 979.00 427.00 Carla Morgan Nathan Patel Emily Chen 0-30 Evening 4.00
4c28acf4-2ea4-4be8-b8f1-113e676fc8b7 Inbound Order Related Delayed Very bad 55bbace3-eb2e-4f67-aacb-4f8ee8512f75 09/08/2023 11:55 19/07/2023 23:50 02/08/2023 10:44 02/08/2023 11:14 NAGPUR LifeStyle 979.00 427.00 Stanley Hogan Harper Wong Emily Chen >90 Split 2.50
e7ec72bb-65e9-4464-9c2b-8ce6b0e82a28 Inbound Shopzilla Related General Enquiry Good c7d5eef8-d239-474c-8d24-542db2923b3c 09/08/2023 11:55 01/08/2023 10:09 01/08/2023 10:12 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Timothy Scott Zoe Yamamoto William Kim On Job Training Morning 4.00
68c596f2-4130-42f9-a140-b336261b1629 Inbound Shopzilla Related General Enquiry Good c1e10375-7426-4b2a-aca6-17dc6bc9c7a3 09/08/2023 11:55 01/08/2023 10:15 01/08/2023 11:21 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Shannon Hicks Aiden Patel Olivia Tan >90 Morning 4.00
6b966653-0900-4d58-8b38-c17cfbdaf000 Inbound Returns Reverse Pickup Enquiry Good 9e0d7d55-d012-42f7-8e21-634942419386 09/08/2023 11:55 02/08/2023 11:26 02/08/2023 11:44 02-Aug-23 HYDERABAD Electronics 979.00 427.00 Mark Wilson Scarlett Chen John Smith 0-30 Morning 5.00
130dfefe-1d77-47f9-b759-546f24b6b909 Inbound Returns Missing Good 8153c646-4940-406f-992c-952ed08e9af8 09/08/2023 11:55 02/08/2023 19:56 02/08/2023 20:06 02-Aug-23 HYDERABAD Electronics 979.00 427.00 Mark Black Sophia Sato John Smith >90 Afternoon 5.00
4cc4df74-fcd3-413a-920d-ca3ad2dd0904 Inbound Returns Service Centres Related Good 4245d6df-49c6-491a-8724-125c4df82e75 09/08/2023 11:55 06/05/2023 21:41 01/08/2023 09:01 01/08/2023 09:03 RANCHI Electronics 979.00 427.00 Amy Mendez Sophia Sato John Smith 0-30 Morning 5.00
c54459b3-ffb3-4ffa-a338-6fd179beb6b1 Inbound Order Related Installation/demo Something 0cdaa1f1-02f2-40ec-9134-7cd0f92a6b3e 09/08/2023 11:55 01/08/2023 10:00 01/08/2023 10:04 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Jennifer May Zoe Yamamoto William Kim On Job Training Morning 3..00
deccfb1d-05d4-40b8-8371-15282c91bacb Inbound Cancellation Not Needed Good 3f16b782-27cf-4831-b542-cd8bd60c0385 09/08/2023 11:55 01/08/2023 21:05 01/08/2023 21:07 01-Aug-23 HYDERABAD Electronics 979.00 427.00 Ryan Thompson Olivia Wang Emily Chen 31-60 Evening 5.00
6e51734e-9627-4a12-8062-877c3638627d Inbound Order Related Installation/demo All good 8a1ae638-4d8c-4a5e-bf1b-eb74f881fc3d 09/08/2023 11:55 18/07/2023 12:20 02/08/2023 20:03 02/08/2023 20:05 NAGPUR Electronics 979.00 427.00 David Butler Olivia Wang Emily Chen 31-60 Evening 5.00

11. Summary of Cleaning Process

  1. Dataset Loaded: 85,907 rows × 20 columns.
  2. Missing Values: Identified and imputed with Mode (categorical) or Median (numeric).
  3. Duplicate Rows: Checked and none removed.
  4. Outlier Analysis: Conducted using IQR method; numeric outliers highlighted.
  5. Dataset Ready: Cleaned dataset with no missing values, ready for EDA, visualization, or machine learning.