This project is a machine learning-based email spam detection system that classifies emails as spam or ham (not spam). It leverages Natural Language Processing (NLP) techniques and various machine learning algorithms to improve email security by filtering unwanted messages with high accuracy.
- Data preprocessing and cleaning using NLP techniques.
- Feature extraction using TF-IDF and CountVectorizer.
- Implementation of multiple machine learning models (Naïve Bayes, SVM, Random Forest, Logistic Regression).
- Model evaluation using accuracy, precision, recall, and F1-score.
- Visualization of results using Matplotlib and Seaborn.
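
As a rough illustration of the preprocessing and feature-extraction features listed above, here is a minimal sketch. It assumes a simple regex-based cleaner in place of the project's NLTK preprocessing so that it runs without downloading corpora. The `clean_text` helper and the four sample emails are hypothetical and are not taken from the project.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Tiny made-up corpus, for illustration only (not the project's dataset).
emails = [
    "WIN a FREE prize now!!! Click http://example.com",
    "Are we still meeting for lunch tomorrow?",
    "Free entry: claim your prize today",
    "Please review the attached report before Monday.",
]


def clean_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop URLs, keep letters, collapse spaces."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)          # remove digits and punctuation
    return re.sub(r"\s+", " ", text).strip()


cleaned = [clean_text(e) for e in emails]

# Raw term counts and TF-IDF weights, both with English stop words removed.
count_vec = CountVectorizer(stop_words="english")
tfidf_vec = TfidfVectorizer(stop_words="english")

X_counts = count_vec.fit_transform(cleaned)  # sparse matrix of term counts
X_tfidf = tfidf_vec.fit_transform(cleaned)   # sparse matrix of TF-IDF weights

print(cleaned[0])
print(X_counts.shape, X_tfidf.shape)
```

Both vectorizers produce one row per email and one column per vocabulary term. `CountVectorizer` stores raw term counts, while `TfidfVectorizer` down-weights terms that appear in many emails.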
The data comes from publicly available spam datasets. It contains emails labeled as spam or ham, which are used for training and testing.
- Programming Language: Python
- Libraries: Pandas, NumPy, Scikit-learn, NLTK, Matplotlib, Seaborn, WordCloud
- Machine Learning Models: Naïve Bayes, SVM, Random Forest, Logistic Regression
You can also run the notebook in VS Code by installing the official Jupyter extension.
- Open the Jupyter Notebook:
- Load and preprocess the dataset.
- Train different machine learning models and compare their performance.
- Evaluate each model using classification metrics (accuracy, precision, recall, F1-score).
- Visualize results with Matplotlib and Seaborn.
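
The training and evaluation steps above might look roughly like the sketch below. It is not the notebook's actual code: the toy messages, the `models` dictionary, the choice of `LinearSVC` for the SVM, and the split settings are all assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy data (not the project's dataset); 1 = spam, 0 = ham.
spam = [
    "win a free prize now", "claim your free cash reward",
    "cheap loans approved instantly", "free entry to win money",
    "urgent offer claim prize today", "you won a free vacation",
    "exclusive deal win cash now", "click to claim your reward",
]
ham = [
    "are we meeting for lunch tomorrow", "please review the attached report",
    "see you at the gym tonight", "the project deadline moved to friday",
    "can you send me the meeting notes", "happy birthday hope you have fun",
    "thanks for your help yesterday", "dinner at my place on sunday",
]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)

# Stratified split so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Candidate classifiers (assumed hyperparameters, mostly library defaults).
models = {
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, clf in models.items():
    # Fit TF-IDF on the training split only, then train the classifier.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary from being fitted on test data. Scores on a toy set this small say nothing about real performance.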
The implemented machine learning models achieve high accuracy in classifying emails as spam or ham. The best-performing model can be selected based on evaluation metrics.
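
One way to select the best-performing model is to pick the highest score on a chosen metric. The sketch below assumes the metrics are stored in a dictionary keyed by model name. The `select_best` helper is hypothetical, and the numbers are placeholders, not results from this project.

```python
def select_best(results: dict[str, dict[str, float]], metric: str = "f1") -> str:
    """Return the name of the model with the highest value for `metric`."""
    return max(results, key=lambda name: results[name][metric])


# Placeholder scores for illustration only; NOT measured results.
placeholder_results = {
    "model_a": {"accuracy": 0.90, "precision": 0.95, "recall": 0.70, "f1": 0.81},
    "model_b": {"accuracy": 0.88, "precision": 0.80, "recall": 0.90, "f1": 0.85},
}

print(select_best(placeholder_results))               # best by F1
print(select_best(placeholder_results, "precision"))  # best by precision
```

The choice of metric matters for spam filtering. Ranking by precision favors models that rarely flag legitimate mail as spam, while ranking by recall favors models that let less spam through.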
Contributions are welcome! If you find any issues or want to enhance the project, feel free to submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.