A complete machine learning pipeline with sample data generation, feature transformation, model training, and an inference API.
.
├── config.yaml # Pipeline configuration
├── requirements.txt # Python dependencies
├── src/
│ ├── data_ingestion.py # Data loading
│ ├── data_transformation.py # Feature engineering
│ ├── model_training.py # Model training & tracking
│ ├── inference_api.py # FastAPI endpoint
│ ├── pipeline.py # Main orchestrator
│ └── generate_sample_data.py # Sample data generator
├── docs/
│ └── pipeline_development.md # Development journey and issues
└── tests/ # Test files
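The layout above makes src/pipeline.py the orchestrator that calls the ingestion, transformation, and training modules in order. The real module contents are not shown here, so the following is only a minimal, self-contained sketch of that sequencing idea. The functions `ingest`, `transform`, `train`, and `run_pipeline` are hypothetical placeholders, not the project's actual API, and the toy "model" is a plain least-squares line fit standing in for whatever model_training.py really trains.

```python
"""Hypothetical sketch of the stage sequencing that src/pipeline.py might do.

All function names and the toy data are illustrative placeholders; the real
data_ingestion.py, data_transformation.py and model_training.py may differ.
"""
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def ingest() -> List[Row]:
    # Placeholder for data_ingestion.py: return a few in-memory records,
    # including one incomplete record.
    return [
        {"x": 1.0, "y": 2.0},
        {"x": 2.0, "y": 4.1},
        {"x": 3.0, "y": 5.9},
        {"x": 4.0, "y": None},
    ]


def transform(rows: List[Row]) -> List[Row]:
    # Placeholder for data_transformation.py: drop records with missing values.
    return [r for r in rows if all(v is not None for v in r.values())]


def train(rows: List[Row]) -> Tuple[float, float]:
    # Placeholder for model_training.py: least-squares fit of y = slope*x + intercept.
    n = len(rows)
    mean_x = sum(r["x"] for r in rows) / n
    mean_y = sum(r["y"] for r in rows) / n
    cov = sum((r["x"] - mean_x) * (r["y"] - mean_y) for r in rows)
    var = sum((r["x"] - mean_x) ** 2 for r in rows)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def run_pipeline() -> Tuple[float, float]:
    # The orchestrator only sequences stages; each stage owns its own logic.
    rows = ingest()
    clean = transform(rows)
    return train(clean)


if __name__ == "__main__":
    slope, intercept = run_pipeline()
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

Keeping the orchestrator limited to calling stages in order makes each stage testable on its own, which fits the separate tests/ directory in the layout.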
# Install system packages (Debian/Ubuntu)
sudo apt update
sudo apt install python3-full python3-dev build-essential python3-venv
# Create virtual environment
python3 -m venv venv --clear
source venv/bin/activate
# Install dependencies
pip install --no-cache-dir -r requirements.txt
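Installing into an activated virtual environment avoids the "externally managed environment" error (PEP 668) that system Python on recent Debian/Ubuntu releases raises, which is why a flag such as `--break-system-packages` should not be needed here. As a hedged convenience, a small check like the one below (a hypothetical helper, not part of the repository) can confirm the venv is active before running pip or the pipeline:

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base interpreter installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"OK: using virtual environment at {sys.prefix}")
    else:
        sys.exit("Not in a virtual environment; run `source venv/bin/activate` first.")
```

Run it with the same `python3` you plan to use for the pipeline; it exits with a non-zero status when no venv is active.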
- Generate sample data:
python3 src/generate_sample_data.py
- Run the pipeline:
python3 src/pipeline.py
- Start the inference API:
uvicorn src.inference_api:app --host 0.0.0.0 --port 8000
- Test the API:
python3 src/test_api.py
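The contents of src/test_api.py are not shown here, so the snippet below is only a minimal standard-library sketch of such a smoke test against the uvicorn server started above. The `/predict` route, the payload keys (`feature_1`, `feature_2`), and the helper names are assumptions for illustration; check inference_api.py for the actual route and request schema.

```python
import json
import urllib.request

# Hypothetical endpoint: the real route in inference_api.py may differ.
API_URL = "http://localhost:8000/predict"


def build_predict_request(features: dict, url: str = API_URL) -> urllib.request.Request:
    # Serialize the features as JSON and mark the request as a JSON POST.
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, timeout: float = 5.0) -> dict:
    # Send the request to the running server and decode the JSON reply.
    req = build_predict_request(features)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical feature names; replace with the model's real inputs.
    print(predict({"feature_1": 0.5, "feature_2": 1.2}))
```

Splitting request construction from sending lets the request shape be unit-tested without a running server. FastAPI also serves interactive documentation at `/docs` by default, which is a quick way to confirm the real route and schema.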
- Create a new branch for features/fixes
- Make changes and test locally
- Commit changes with descriptive messages
- Create pull request for review
- Merge after approval
See docs/pipeline_development.md for detailed documentation of issues encountered and their solutions.
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License; see the LICENSE file for details.