This project builds on the data and model from the Kaggle Store Item Demand Forecasting Challenge.
- Kaggle Account
- Python 3.12+
- Set up the Kaggle CLI
pip install kaggle
- Create an API token at
https://www.kaggle.com/settings
Optional
- Tailscale
To keep things tidy, the project structure is split by function:
root
- input (input data)
- model (compiled models)
- output (model output data)
- tests (any unit tests)
- src (source code for model deployment)
- analysis (any interactive Jupyter notebooks or other exploratory work)
- infra (any project-specific deployment code)
- docs (extra project documentation)
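The layout above can be checked with a short script. This is a minimal sketch and not part of the repository: the function name `missing_dirs` is hypothetical, and the directory names are simply copied from the structure listed above.

```python
from pathlib import Path

# Top-level directories named in the project structure above.
EXPECTED_DIRS = ("input", "model", "output", "tests", "src", "analysis", "infra", "docs")


def missing_dirs(root: Path) -> list[str]:
    """Return the expected top-level directories that do not exist under root."""
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(Path.cwd())
    if missing:
        print("Missing directories: " + ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

Run it from the repository root; an empty result means every directory in the list is present.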
kaggle competitions download -c demand-forecasting-kernels-only
unzip demand-forecasting-kernels-only.zip -d inputs
- Fixed the Facebook Prophet import in the Jupyter notebook.
- Created a multi-stage Docker container to minimize the container footprint. This helps container orchestrators scale up deployments faster and reduces exposure to vulnerabilities.
- Wrapped the latest FastAPI version around the model and created predict and status endpoints.
- Added isort to organize imports and ran it on the src and analysis directories.
Multi-stage Build
Used a multi-stage build with the uv package manager. The first stage installs all the Python packages and system libraries, and the runtime stage copies them from the build stage.
Source Transfer Notes
The repo assumes a standard convention for model and application files. As long as all future models follow the same structure, they can be deployed with the same Dockerfile. If maintaining the same repo structure across projects is not possible, the Dockerfile can be made more generic by using ARGs to set the model file name and paths.
k8s is a common container orchestrator with a healthy ecosystem.
For repeatability and demo purposes, I used minikube with Tailscale ingress to deploy the model and make it accessible from the public internet. Access can also be restricted to an intranet.
The infra folder contains a Makefile that creates a minikube cluster and sets up the Tailscale operator.
The operator can deploy an ingress to your tailnet, and there is a flag to tag it as a funnel for public access. This isn't a production-grade endpoint, but it is delivered over TLS.
You will need a Tailscale OAuth client ID and client secret with device and auth key write permissions, created at https://login.tailscale.com/admin/settings/oauth.
Install minikube and create cluster for this project
make minikube
Install the Tailscale operator
TS_CLIENT_ID=your_client_id TS_CLIENT_SECRET=your_client_secret make install-tailscale-operator
Used Terraform for IaC (Infrastructure as Code) to deploy the model inside the local minikube cluster. Since this is a public repo, the image doesn't need authentication to pull, but I added Terraform to show how pull secrets can be used for private registries.
Deploy API
Create a terraform.tfvars file with the following values populated if pulling from a private GitHub repository.
github_username = "xxxxxx"
github_token = "xxxxxxxxx"
make apply-terraform
A demonstration API facing the public internet is running on my local device on a minikube cluster. If the links return a 404, the cluster may need to be restarted.
https://sales-forecaster.tigris-vibes.ts.net/
API Docs
https://sales-forecaster.tigris-vibes.ts.net/docs
API Predict
https://sales-forecaster.tigris-vibes.ts.net/predict
API Status
https://sales-forecaster.tigris-vibes.ts.net/status
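The predict endpoint can also be called from a script. The sketch below uses only the Python standard library and rests on a few assumptions: that /predict accepts a JSON body via POST, that the body uses the `date`, `store`, and `item` fields from the example request in this README, and that the response is JSON. The helper names `build_predict_request` and `predict` are hypothetical, and the response schema is not documented here.

```python
import json
import urllib.request

# Base URL taken from the links above; the demo host may be offline.
BASE_URL = "https://sales-forecaster.tigris-vibes.ts.net"


def build_predict_request(date: str, store: int, item: int) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    payload = json.dumps({"date": date, "store": store, "item": item}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the endpoint expects POST with a JSON body
    )


def predict(date: str, store: int, item: int, timeout: float = 10.0):
    """Send the request and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses,
    for example when the API rejects an unknown store or item ID.
    """
    request = build_predict_request(date, store, item)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `predict("2017-01-01", store=1, item=1)` sends the request, which only works while the demo cluster is running.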
Input validation: Item ID and Store ID are validated against the values present in the training data, and the API returns an error message if the Store ID or Item ID has never been seen before.
Example:
Input
{
"date": "2017-01-01",
"store": 1000,
"item": 1000
}
Response
{
"detail": [
{
"type": "value_error",
"loc": [
"body",
"store"
],
"msg": "Value error, Invalid store ID: 1000",
"input": 1000,
"ctx": {
"error": {}
}
},
{
"type": "value_error",
"loc": [
"body",
"item"
],
"msg": "Value error, Invalid item ID: 1000",
"input": 1000,
"ctx": {
"error": {}
}
}
]
}
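The check behind this response amounts to a membership test against the IDs seen during training. A dependency-free sketch of that idea follows. The ID ranges (stores 1 to 10, items 1 to 50) and the function name `validate_ids` are assumptions for illustration; the actual service appears to report these errors through FastAPI's request validation, which is why the real response carries extra fields such as `type` and `ctx`.

```python
# Hypothetical ID sets; in the real service these would come from the training data.
KNOWN_STORES = frozenset(range(1, 11))
KNOWN_ITEMS = frozenset(range(1, 51))


def validate_ids(store: int, item: int) -> list[dict]:
    """Return one error entry per ID that is not in the known sets."""
    errors = []
    if store not in KNOWN_STORES:
        errors.append(
            {"loc": ["body", "store"], "msg": f"Invalid store ID: {store}", "input": store}
        )
    if item not in KNOWN_ITEMS:
        errors.append(
            {"loc": ["body", "item"], "msg": f"Invalid item ID: {item}", "input": item}
        )
    return errors


if __name__ == "__main__":
    # Both IDs from the example request are outside the assumed ranges.
    for error in validate_ids(store=1000, item=1000):
        print(error["loc"], error["msg"])
```

An empty list means both IDs were seen in training and the request can go on to the model.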
Restart the minikube cluster if it's not running and the stopped container hasn't been removed.
minikube start -p forecast-model-cluster
You can use k0s to set up a small single stack cluster for home use. The Tailscale operator will still need to be installed and set up. However, this can run on a headless machine or cloud VM and provides a more robust platform for giving others access, either privately through the tailnet or publicly via the funnel. That said, the funnel doesn't provide a robust front door, and it's highly suggested to move to something like a Cloudflare tunnel or equivalent to provide front-door security and firewall protections.
Commits follow the Conventional Commits specification: https://www.conventionalcommits.org/en/v1.0.0/