docs/deployments/apis.md (+8 −4)
```diff
@@ -10,9 +10,9 @@ Serve models at scale.
 model: <string>  # path to an exported model (e.g. s3://my-bucket/exported_model)
 model_format: <string>  # model format, must be "tensorflow" or "onnx" (default: "onnx" if model path ends with .onnx, "tensorflow" if model path ends with .zip or is a directory)
 request_handler: <string>  # path to the request handler implementation file, relative to the cortex root
-tf_signature_key: <string> # name of the signature def to use for prediction (required if your model has more than one signature def)
+tf_signature_key: <string>  # name of the signature def to use for prediction (required if your model has more than one signature def)
 tracker:
-  key: <string> #json key to track if the response payload is a dictionary
+  key: <string>  # key to track (required if the response payload is a JSON object)
   model_type: <string>  # model type, must be "classification" or "regression"
 compute:
   min_replicas: <int>  # minimum number of replicas (default: 1)
```
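To make the schema above concrete, here is a hypothetical configuration using the fields documented in this hunk (all values are illustrative, not defaults):

```yaml
# Hypothetical values for the fields documented above
model: s3://my-bucket/exported_model
model_format: tensorflow
request_handler: handlers/my_handler.py   # illustrative path
tracker:
  key: class
  model_type: classification
compute:
  min_replicas: 1
  max_replicas: 5
```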
```diff
@@ -43,6 +43,10 @@ Request handlers are used to decouple the interface of an API endpoint from its
 
 See [request handlers](request-handlers.md) for a detailed guide.
 
+## Prediction Monitoring
+
+`tracker` can be configured to collect API prediction metrics and display real-time stats in `cortex get <api_name>`. The tracker looks for scalar values in the response payload (after the execution of the `post_inference` request handler, if provided). If the response payload is a JSON object, `key` can be set to extract the desired scalar value. For regression models, the tracker should be configured with `model_type: regression` to collect float values and display regression stats such as min, max and average. For classification models, the tracker should be configured with `model_type: classification` to collect integer or string values and display the class distribution.
+
 ## Debugging
 
 You can log more information about each request by adding a `?debug=true` parameter to your requests. This will print:
```
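The extraction behavior described in the new Prediction Monitoring section can be sketched in a few lines. This is illustrative only (not Cortex's internal code), assuming the response payload has already been parsed from JSON:

```python
def extract_tracked_value(payload, key=None):
    # If the (parsed) response payload is a JSON object, `key` selects
    # the scalar to track; otherwise the payload itself must be a scalar.
    if isinstance(payload, dict):
        if key is None:
            raise ValueError("key is required when the payload is a JSON object")
        payload = payload[key]
    if not isinstance(payload, (int, float, str)):
        raise TypeError("tracked value must be a scalar")
    return payload
```

For example, with `key: class` configured, a payload of `{"class": "cat"}` yields the tracked value `"cat"`.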
```diff
@@ -52,10 +56,10 @@ You can log more information about each request by adding a `?debug=true` parame
 3. The value after running inference
 4. The value after running the `post_inference` function (if applicable)
 
-## Autoscaling replicas
+## Autoscaling Replicas
 
 Cortex adjusts the number of replicas that are serving predictions by monitoring the compute resource usage of each API. The number of replicas will be at least `min_replicas` and no more than `max_replicas`.
 
-## Autoscaling nodes
+## Autoscaling Nodes
 
 Cortex spins up and down nodes based on the aggregate resource requests of all APIs. The number of nodes will be at least `$CORTEX_NODES_MIN` and no more than `$CORTEX_NODES_MAX` (configured during installation and modifiable via the [AWS console](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-manual-scaling.html)).
```
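The bounding rule stated in both autoscaling sections (the count is kept between a configured minimum and maximum) amounts to a clamp. A minimal sketch of that rule, not Cortex's actual autoscaler:

```python
def bound_replicas(desired, min_replicas, max_replicas):
    # Clamp the desired replica count into [min_replicas, max_replicas],
    # as described for both replica and node autoscaling above.
    return max(min_replicas, min(desired, max_replicas))
```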