docs/deployments/apis.md (+8 −4)
```diff
@@ -10,9 +10,9 @@ Serve models at scale.
 model: <string>  # path to an exported model (e.g. s3://my-bucket/exported_model)
 model_format: <string>  # model format, must be "tensorflow" or "onnx" (default: "onnx" if model path ends with .onnx, "tensorflow" if model path ends with .zip or is a directory)
 request_handler: <string>  # path to the request handler implementation file, relative to the cortex root
-tf_signature_key: <string> # name of the signature def to use for prediction (required if your model has more than one signature def)
+tf_signature_key: <string>  # name of the signature def to use for prediction (required if your model has more than one signature def)
 tracker:
-  key: <string> #json key to track if the response payload is a dictionary
+  key: <string>  # key to track (required if the response payload is a JSON object)
   model_type: <string>  # model type, must be "classification" or "regression"
 compute:
   min_replicas: <int>  # minimum number of replicas (default: 1)
```
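To make the schema above concrete, here is a hypothetical configuration using the fields documented in this hunk (all values are illustrative, not defaults):

```yaml
# Hypothetical values for the fields documented above
model: s3://my-bucket/exported_model
model_format: tensorflow
request_handler: handlers/my_handler.py   # illustrative path
tracker:
  key: class
  model_type: classification
compute:
  min_replicas: 1
  max_replicas: 5
```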
```diff
@@ -43,6 +43,10 @@ Request handlers are used to decouple the interface of an API endpoint from its
 
 See [request handlers](request-handlers.md) for a detailed guide.
 
+## Prediction Monitoring
+
+`tracker` can be configured to collect API prediction metrics and display real-time stats in `cortex get <api_name>`. The tracker looks for scalar values in the response payload (after the execution of the `post_inference` request handler, if provided). If the response payload is a JSON object, `key` can be set to extract the desired scalar value. For regression models, the tracker should be configured with `model_type: regression` to collect float values and display regression stats such as min, max and average. For classification models, the tracker should be configured with `model_type: classification` to collect integer or string values and display the class distribution.
+
 ## Debugging
 
 You can log more information about each request by adding a `?debug=true` parameter to your requests. This will print:
```
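The extraction behavior described in the new Prediction Monitoring section can be sketched in a few lines. This is illustrative only (not Cortex's internal code), assuming the response payload has already been parsed from JSON:

```python
def extract_tracked_value(payload, key=None):
    # If the (parsed) response payload is a JSON object, `key` selects
    # the scalar to track; otherwise the payload itself must be a scalar.
    if isinstance(payload, dict):
        if key is None:
            raise ValueError("key is required when the payload is a JSON object")
        payload = payload[key]
    if not isinstance(payload, (int, float, str)):
        raise TypeError("tracked value must be a scalar")
    return payload
```

For example, with `key: class` configured, a payload of `{"class": "cat"}` yields the tracked value `"cat"`.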
```diff
@@ -52,10 +56,10 @@ You can log more information about each request by adding a `?debug=true` parame
 3. The value after running inference
 4. The value after running the `post_inference` function (if applicable)
 
-## Autoscaling replicas
+## Autoscaling Replicas
 
 Cortex adjusts the number of replicas that are serving predictions by monitoring the compute resource usage of each API. The number of replicas will be at least `min_replicas` and no more than `max_replicas`.
 
-## Autoscaling nodes
+## Autoscaling Nodes
 
 Cortex spins up and down nodes based on the aggregate resource requests of all APIs. The number of nodes will be at least `$CORTEX_NODES_MIN` and no more than `$CORTEX_NODES_MAX` (configured during installation and modifiable via the [AWS console](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-manual-scaling.html)).
```
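The bounding rule stated in both autoscaling sections (the count is kept between a configured minimum and maximum) amounts to a clamp. A minimal sketch of that rule, not Cortex's actual autoscaler:

```python
def bound_replicas(desired, min_replicas, max_replicas):
    # Clamp the desired replica count into [min_replicas, max_replicas],
    # as described for both replica and node autoscaling above.
    return max(min_replicas, min(desired, max_replicas))
```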