This report presents my practical learning experience and progress during the first eight weeks of the Data Science – AI (Practical, Project-Oriented Course). During this period I focused on understanding and implementing the fundamental stages of the data-science workflow: data collection, cleaning, preprocessing, visualization, and statistical analysis.
In the initial weeks I worked with real-world datasets to remove duplicates, handle missing values, and treat outliers using Pandas and NumPy. I created insightful visualizations with Matplotlib and Seaborn to interpret trends and applied statistical concepts (mean, median, mode, variance, correlation) to identify meaningful relationships.
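The cleaning and summary steps described above could be sketched roughly as follows. This is a minimal illustration on a toy frame, not the project dataset; column names such as `price` and `rating` are placeholders.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a real dataset; 'price' and 'rating' are placeholder columns.
df = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 12.0, 500.0, 11.0],
    "rating": [4, 5, 3, 5, 1, 4],
})

# Remove exact duplicate rows.
df = df.drop_duplicates()

# Fill missing numeric values with the column median.
df["price"] = df["price"].fillna(df["price"].median())

# Treat outliers with the 1.5*IQR rule by clipping to the fences.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df["price"] = df["price"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Basic statistics used to look for relationships.
print(df["price"].mean(), df["price"].median())
print(df[["price", "rating"]].corr().loc["price", "rating"])
```

Clipping (rather than dropping) outliers is one of several reasonable choices; the right treatment depends on the dataset.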
As the course advanced I implemented supervised learning techniques — including Linear Regression, Logistic Regression, and Random Forest — to build baseline predictive models and learned model training and evaluation using metrics such as MAE, RMSE, and accuracy.
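A baseline regression run with the metrics mentioned above can be sketched as follows, assuming synthetic data in place of the course datasets:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the real dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE = sqrt(MSE)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
```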
By Week 8 I had explored unsupervised learning concepts such as clustering and PCA (dimensionality reduction) to discover hidden patterns. These activities provided a solid foundation for the next course phase: deep learning, NLP, and model deployment.
During Week 1, I focused on the fundamentals of Data Science and set up the development environment for the course project. The goal was to prepare a reproducible workspace and choose the dataset for the semester-long project.
Perform detailed data cleaning and preprocessing: handle missing values, remove duplicates, treat outliers, and prepare a “before vs after cleaning” report along with basic feature summaries and visualizations.
During Week 2, I focused on collecting, cleaning, and preparing the dataset for analysis in the E-commerce Recommendation System project. The main goal was to ensure that the data was accurate, consistent, and ready for exploratory analysis and modeling.
All code and notebooks were committed to the GitHub repository DataScience-AI-Project.
The dataset is now fully collected, cleaned, and prepared for analysis. This milestone ensures that all subsequent exploratory data analysis and modeling steps can be performed efficiently and accurately.
In Week 3, I will perform detailed exploratory data analysis (EDA) and feature engineering, identify key variables for the recommendation system, and start preparing the data for building models.
During Week 3, I conducted an in-depth Exploratory Data Analysis (EDA) on the E-commerce Customer Service Satisfaction dataset to discover patterns, correlations, and insights. Visualizations and statistical summaries were created to guide feature engineering and model development.
The work was pushed to the GitHub repository DataScience-AI-Project for version control and collaboration.
A total of nine visualizations were produced during EDA. The five most informative are shown below; the full set is available in the project notebook in the GitHub repository.
Completed EDA and identified key features and patterns influencing customer satisfaction. These insights will drive feature engineering and model selection in Week 4.
Proceed to feature engineering and selection, apply encoding and normalization where needed, split the dataset, and begin baseline model experiments.
During Week 4, I focused on feature engineering, data preprocessing, and building baseline models. The main goal was to prepare the dataset for machine learning algorithms by transforming, encoding, and normalizing features, then train initial models to evaluate performance.
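The preprocessing steps described here (encoding categoricals, normalizing numerics, splitting the data) can be sketched as below. The frame and the column names `channel`, `response_time`, and `satisfied` are illustrative placeholders, not the project's actual schema:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Placeholder frame; real features would come from the project dataset.
df = pd.DataFrame({
    "channel": ["email", "chat", "phone", "chat", "email", "phone"],
    "response_time": [30.0, 5.0, 12.0, 7.0, 45.0, 9.0],
    "satisfied": [0, 1, 1, 1, 0, 1],
})

X, y = df.drop(columns="satisfied"), df["satisfied"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# One-hot encode the categorical column, standardize the numeric one.
prep = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
    ("num", StandardScaler(), ["response_time"]),
])
X_train_t = prep.fit_transform(X_train)  # fit only on training data
X_test_t = prep.transform(X_test)        # reuse training statistics
print(X_train_t.shape, X_test_t.shape)
```

Fitting the transformer on the training split only (and reusing it for the test split) avoids leaking test-set statistics into training.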
Completed feature engineering and established baseline models. The dataset is now ready for further tuning and advanced modeling in Week 5.
Two key correlation heatmaps were produced during feature engineering and baseline modeling. These visualizations highlight relationships between numerical and engineered features; the full set of graphs is available in the Week 4 project notebook in the GitHub repository.
Perform hyperparameter tuning on selected models, experiment with ensemble methods such as Gradient Boosting and XGBoost, analyze feature importance, and refine the feature set further. Prepare detailed visualizations of model metrics and predictions for reporting.
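Hyperparameter tuning of this kind is commonly done with a cross-validated grid search; one minimal sketch, using synthetic data and an illustrative (not project-specific) parameter grid:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification data standing in for the engineered dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Small grid over tree count and depth, scored by accuracy with 3-fold CV.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```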
Week 5 focused on implementing multiple machine learning algorithms to predict customer satisfaction using the engineered dataset. The main objective was to compare different supervised learning models, analyze their performance, and understand which algorithm best fits the data distribution. Both classification accuracy and interpretability were key considerations.
Successfully trained and compared multiple machine learning algorithms on the processed dataset. Random Forest achieved the highest accuracy with balanced recall and precision, marking an important step toward model optimization in Week 6.
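A model comparison along these lines can be sketched as follows (synthetic data, and only two of the candidate models, for brevity):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed project dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Fit each candidate model and score it on the same held-out split.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=1),
}
scores = {
    name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
    for name, m in models.items()
}
print(scores)
```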
Two correlation graphs were generated this week to visualize relationships between key features and the target variable. These visuals helped in selecting features with strong predictive power before training machine learning models; the full set is available on GitHub.
Proceed with hyperparameter tuning and cross-validation to optimize the best-performing model. Evaluate additional ensemble methods and validate results on unseen data for final deployment readiness.
During Week 6, I focused on supervised classification methods. The objective was to implement baseline classifiers, handle preprocessing for classification tasks, and compare initial model performances to select a baseline model for tuning.
Baseline classification models were implemented and evaluated. The chosen baseline model, Random Forest, is set for detailed evaluation and tuning in Week 7.
Confusion matrices for the key models and a comparison plot/table of accuracy, precision, recall, and F1 scores were produced. Full visuals are available in the Week 6 notebook in the GitHub repository.
Conduct detailed model evaluation (confusion matrix, ROC/AUC, precision-recall), select the primary evaluation metric, and start hyperparameter tuning and resampling strategies.
During Week 7, the focus was on evaluating the performance of the baseline classification model developed in Week 6. The goal was to understand which metrics are most suitable for the project and to analyze model strengths and weaknesses.
Evaluated baseline model performance, decided primary evaluation metrics, and identified the top features influencing model predictions.
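An evaluation pass of this kind (confusion matrix, ROC/AUC, feature importances) can be sketched as follows on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

clf = RandomForestClassifier(random_state=2).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Confusion matrix on the held-out split.
cm = confusion_matrix(y_te, pred)

# ROC AUC needs predicted probabilities for the positive class.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# Indices of the three most important features.
top = clf.feature_importances_.argsort()[::-1][:3]
print(cm, round(auc, 3), top)
```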
Two key visualizations are shown below: accuracy comparison and top 10 feature importance from Random Forest. Full visuals are available in the Week 7 notebook on GitHub.
Apply unsupervised learning techniques (K-Means, PCA) to explore hidden structures within the dataset and extract additional insights for feature enrichment.
In Week 8, the focus was on unsupervised learning techniques to discover hidden patterns in the dataset. Key activities included applying clustering algorithms and Principal Component Analysis (PCA) for segmentation and visualization of data without using target labels.
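The clustering-plus-PCA workflow can be sketched as follows, assuming synthetic blob data in place of the project features:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic data with 3 natural groups, standing in for the project features.
X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=3)

# Assign each point to one of 3 clusters (no target labels involved).
labels = KMeans(n_clusters=3, n_init=10, random_state=3).fit_predict(X)

# Project to 2 principal components for a scatter plot of the clusters.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape, sorted(set(labels)))
```

The cluster labels produced here are the kind of derived feature that can be fed back into a supervised model, as noted below.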
Completed unsupervised analysis and identified meaningful clusters, which can now be used as additional features in supervised modeling for improved predictions.
Two key visualizations are shown below: PCA scatter plot and cluster distribution. Full results are available in the Week 8 notebook in the GitHub repository.
In Week 9, the plan is to apply Artificial Neural Network (ANN) techniques to the enriched dataset (including cluster-derived features), compare performance with earlier models, and evaluate the suitability of deep learning for the project.
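One lightweight way to prototype such an ANN, before moving to a dedicated deep learning framework, is scikit-learn's `MLPClassifier`; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the enriched project dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

# Feature scaling matters for neural networks; fit on training data only.
scaler = StandardScaler().fit(X_tr)

# A small two-hidden-layer network; sizes here are arbitrary choices.
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=4)
mlp.fit(scaler.transform(X_tr), y_tr)

acc = mlp.score(scaler.transform(X_te), y_te)
print(round(acc, 3))
```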