A simple linear regression project in Python for educational purposes. It predicts the price of a car from its mileage and visualizes the fitted line, the training cost, and a goodness-of-fit score (R²) that serves as the model's confidence.
- Reads car mileage and price data from CSV datasets.
- Learns optimal parameters (theta0, theta1) using gradient descent.
- Normalizes input data for effective training.
- Exports learned parameters to a CSV file.
- Predicts car prices for any given mileage.
- Calculates R² (coefficient of determination) to indicate model confidence.
- Visualizes the regression line, cost function evolution, and confidence score.
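The training step listed above can be sketched as follows. This is a minimal illustration, not the code in model.py: it assumes min-max scaling of the mileage, batch gradient descent on the mean squared error, and hypothetical names (`normalize`, `train`) and default hyperparameters.

```python
def normalize(values):
    """Min-max scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def train(km, price, learning_rate=0.1, epochs=1000):
    """Fit price = theta0 + theta1 * km_normalized by batch gradient descent.

    The default learning_rate and epochs are illustrative assumptions.
    Returns the thetas (in normalized-mileage space) and the cost per epoch.
    """
    x = normalize(km)
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    costs = []
    for _ in range(epochs):
        # Prediction error for every sample with the current parameters.
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, price)]
        # Half mean squared error, recorded so it can be plotted per epoch.
        costs.append(sum(e * e for e in errors) / (2 * m))
        # Partial derivatives of the cost with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
        # Update both parameters simultaneously.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1, costs
```

The returned `costs` list is what a cost vs epochs plot would draw; a steadily decreasing curve suggests the learning rate is small enough.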
ft_linear_regression/
│
├── datasets/
│   ├── data.csv # Default dataset
│   ├── big_data.csv
│   ├── negative_data.csv
│   ├── nonlinear_data.csv
│   ├── perfectpositive_data.csv
│   ├── small_data.csv
│   └── variance_data.csv
│
├── model.py # Main file: trains, saves, and visualizes the model
├── estimate.py # Script to estimate a price given a mileage
├── confidence.py # Calculates R² confidence score
└── thetas.csv # Saved learned parameters after running model.py
Simply run:
python model.py
- Trains the linear regression model on datasets/data.csv.
- Saves the learned parameters (thetas.csv).
- Displays a plot with the data, regression line, confidence score (R²), and cost vs epochs.
After training, estimate a car price by mileage:
python estimate.py
- Enter the mileage when prompted.
- Outputs the predicted price.
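The estimation step boils down to evaluating the linear hypothesis with the saved parameters. The sketch below assumes that thetas.csv holds one row under a `theta0,theta1` header and that the stored values apply to raw mileage; the actual file layout and the helper names `load_thetas` and `estimate_price` are assumptions, not taken from estimate.py.

```python
import csv


def load_thetas(path="thetas.csv"):
    """Read theta0 and theta1 from a one-row CSV.

    Assumed layout (hypothetical): a header line 'theta0,theta1'
    followed by a single row of values.
    """
    with open(path, newline="") as f:
        row = next(csv.DictReader(f))
    return float(row["theta0"]), float(row["theta1"])


def estimate_price(mileage, theta0, theta1):
    """Apply the linear hypothesis: price = theta0 + theta1 * mileage."""
    return theta0 + theta1 * mileage
```

A script built on this would call `load_thetas()`, read the mileage with `float(input(...))`, and print `estimate_price(mileage, theta0, theta1)`. If the thetas were stored in normalized space instead, the mileage would need the same scaling before the call.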
The R² (confidence) score is printed on the regression plot, indicating how well the model fits the data.
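For reference, R² compares the model's squared error with that of always predicting the mean price. The function below is a minimal sketch of the standard formula, R² = 1 - SS_res / SS_tot; the name `r_squared` is illustrative and may differ from what confidence.py uses.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Raises ZeroDivisionError if every actual value is identical,
    because the total variance is then zero.
    """
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

A value of 1 means the predictions match the data exactly, 0 means the model does no better than the mean, and negative values mean it does worse.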
You can swap out the dataset by changing the data_path variable in model.py or by providing your own CSV file in the datasets/ folder. The CSV must have columns: km,price.
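Before training on a custom file, it can help to check that both required columns are present. The following is a sketch using only the standard library; `load_dataset` is a hypothetical helper, and model.py itself may load the data differently (for example with pandas).

```python
import csv


def load_dataset(path):
    """Return (km, price) as lists of floats from a CSV with a km,price header."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Fail early with a clear message if a required column is absent.
        missing = {"km", "price"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path} is missing columns: {sorted(missing)}")
        km, price = [], []
        for row in reader:
            km.append(float(row["km"]))
            price.append(float(row["price"]))
    return km, price
```

Calling `load_dataset("datasets/data.csv")` would return two parallel lists ready for training, while a file without a `km` or `price` column raises a `ValueError`.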
- Python 3.x
- pandas
- numpy
- matplotlib
Install dependencies with:
pip install pandas numpy matplotlib
- The model uses simple linear regression (1 feature: km).
- Data is normalized so that gradient descent converges faster and more stably.
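One consequence of normalizing is that the learned parameters live in the scaled space. Assuming min-max scaling of km only, with price left in its original units (an assumption about this project, not something confirmed above), substituting x_norm = (km - km_min) / (km_max - km_min) into price = theta0 + theta1 * x_norm gives parameters that apply directly to raw mileage. The helper name below is hypothetical.

```python
def denormalize_thetas(theta0_norm, theta1_norm, km_min, km_max):
    """Convert thetas learned on min-max scaled km to raw-km thetas.

    Derivation:
        price = theta0_norm + theta1_norm * (km - km_min) / span
              = (theta0_norm - theta1_norm * km_min / span)
                + (theta1_norm / span) * km
    """
    span = km_max - km_min
    theta1 = theta1_norm / span
    theta0 = theta0_norm - theta1 * km_min
    return theta0, theta1
```

With thetas converted this way, a prediction needs only `theta0 + theta1 * km`, and the dataset's minimum and maximum do not have to be stored alongside the parameters.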