E-Commerce Product Title Classification
Python, Scikit-learn, NLP, TF-IDF, SVC, Logistic Regression, Naive Bayes
This project explores scalable machine learning methods for classifying e-commerce product titles into 248 categories, using a dataset of more than 1.4 million entries. I built a preprocessing pipeline that normalizes the text, removes stopwords, vectorizes titles with TF-IDF, and selects the top 10,000 tokens as features. I then evaluated several classifiers (Linear SVC, Logistic Regression, and Multinomial Naive Bayes), using class weighting to counter the imbalance across categories. Linear SVC consistently outperformed the other models on this large-scale categorization task, demonstrating that machine learning text classification is a viable approach for e-commerce systems.
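The pipeline and model comparison described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: "normalization" is assumed to mean lowercasing and replacing non-alphanumeric characters with spaces, "top 10,000 tokens" is assumed to be `TfidfVectorizer`'s `max_features` (which ranks tokens by corpus frequency), and class weighting is assumed to use the `"balanced"` scheme. The helper names `normalize`, `make_pipeline`, `build_models` and `fit_all`, along with the toy titles, are hypothetical.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_sample_weight


def normalize(title: str) -> str:
    """Hypothetical normalization: lowercase, collapse non-alphanumerics to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def make_pipeline(clf) -> Pipeline:
    """TF-IDF features (assumed: top 10,000 tokens by frequency) followed by a classifier."""
    vectorizer = TfidfVectorizer(
        preprocessor=normalize,   # replaces the vectorizer's default lowercasing step
        stop_words="english",     # scikit-learn's built-in English stopword list
        max_features=10_000,      # keep only the 10,000 most frequent tokens
        sublinear_tf=True,        # dampen repeated terms within a title
    )
    return Pipeline([("tfidf", vectorizer), ("clf", clf)])


def build_models() -> dict:
    """Candidate classifiers; class_weight='balanced' weights classes inversely to frequency."""
    return {
        "linear_svc": make_pipeline(LinearSVC(class_weight="balanced")),
        "log_reg": make_pipeline(
            LogisticRegression(class_weight="balanced", max_iter=1000)
        ),
        # MultinomialNB has no class_weight parameter; weights are passed at fit time.
        "multinomial_nb": make_pipeline(MultinomialNB()),
    }


def fit_all(titles, labels) -> dict:
    """Fit every candidate model on the same titles and labels."""
    models = build_models()
    for name, model in models.items():
        if name == "multinomial_nb":
            # Balanced per-sample weights emulate class weighting for Naive Bayes.
            weights = compute_sample_weight("balanced", labels)
            model.fit(titles, labels, clf__sample_weight=weights)
        else:
            model.fit(titles, labels)
    return models


if __name__ == "__main__":
    # Tiny hypothetical dataset, only to show the API; not project data.
    titles = [
        "Apple iPhone 13 128GB Blue",
        "Samsung Galaxy S21 smartphone",
        "Nike running shoes men size 10",
        "Adidas sneakers women white",
        "Stainless steel frying pan 28cm",
        "Non-stick cooking pot set",
    ]
    labels = ["phones", "phones", "shoes", "shoes", "kitchen", "kitchen"]
    models = fit_all(titles, labels)
    for name, model in models.items():
        print(name, model.predict(["Samsung Galaxy smartphone 128GB"]))
```

`LinearSVC` and `LogisticRegression` accept `class_weight` directly, whereas `MultinomialNB` does not, so the sketch routes balanced per-sample weights to the Naive Bayes step through the pipeline's `clf__sample_weight` fit parameter. At the scale of 1.4 million titles, comparing the models would normally use a held-out split and a metric such as macro-F1; that evaluation step is omitted here.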