simple-data-free-model-server

This is an open source implementation of DirectAI's core service. We provide it both to give back to the open source community we have benefited greatly from and to allow our clients to continue to have service in the event that we can no longer host our API.

We host zero-shot image models so that clients can use computer vision at scale without having to collect and label training data or train their own models. However, zero-shot models don't necessarily work out of the box for every use case. We therefore introduce an algorithm for providing feedback to zero-shot models: we extend the standard linear decision boundary in the model's embedding space into a two-stage nearest neighbors algorithm, which gives much finer control over what the model considers to belong to a particular class, with minimal impact on runtime.

A hosted version of the Gradio frontend is available at sandbox.oss.directai.io, and a hosted version of the open source API is available at api.oss.directai.io, with auto-generated docs available at api.oss.directai.io/docs. WE MAKE NO GUARANTEES ABOUT UPTIME / AVAILABILITY OF THE HOSTED OPEN SOURCE IMPLEMENTATION. For a high uptime implementation, see our commercial offering at api.alpha.directai.io/docs.

Launching Production Service

  • Set your logging level preference in directai_fastapi/.env. See the available levels in Python's logging documentation. An empty string defaults to logging.INFO.
  • docker compose build && docker compose up

Launching Integration Tests

  • docker compose -f testing-docker-compose.yml build && docker compose -f testing-docker-compose.yml up

Hardware Requirements

This repository assumes access to an Nvidia GPU with the Ampere architecture, which is required by the flash attention integration in the object detector. However, the code could be modified to run on older Nvidia GPUs or on CPU. Feel free to submit a pull request or raise an issue if you need that support!

Running Offline Batch Classification

We've built infrastructure to make it easy to quickly run an arbitrary classifier against a dataset. If your images are organized like so:

/dataset_directory
│
├── image1.jpg
├── image2.jpg
├── image3.jpg
├── ...
└── imageN.jpg

and you have a JSON file at classifier_config.json defining the image classifier you'd like to run, you can dump classification labels to output.csv via:

  • docker compose build && docker compose run local_fastapi python classify_directory.py --root=dataset_directory --classifier_json_file=classifier_config.json --output_file=output.csv

Make sure that all the files are mounted within the Docker container. You can do that by either modifying the volumes specified in docker-compose.yml, or by placing them all within the .cache directory which is mounted by default.
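
The exact schema for classifier_config.json is defined by the repository's classifier configuration parsing, so the snippet below is only an illustrative sketch: it assumes each class is described by a name plus lists of positive text examples to include and negative text examples to exclude, mirroring the meta class structure described in the Method TLDR section.

import json

# Illustrative sketch only -- the field names here are assumptions, not the repository's
# authoritative schema. Each entry describes one meta class via positive examples to
# include and negative examples to exclude.
classifier_config = {
    "classifier_configs": [
        {
            "name": "cat",
            "examples_to_include": ["cat", "kitten"],
            "examples_to_exclude": ["dog", "stuffed animal"],
        },
        {
            "name": "dog",
            "examples_to_include": ["dog", "puppy"],
            "examples_to_exclude": ["cat"],
        },
    ]
}

with open("classifier_config.json", "w") as f:
    json.dump(classifier_config, f, indent=2)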

If your images have labels and are organized like so:

/dataset_directory
│
├── /label1
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
│
├── /label2
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
│
└── /labelN
    ├── image1.jpg
    ├── image2.jpg
    └── ...

You can run an evaluation against those labels with the following command:

  • docker compose build && docker compose run local_fastapi python classify_directory.py --root=dataset_directory --classifier_json_file=classifier_config.json --eval_only=True

If you want to run classifications on a custom dataset, you can either use our API or build a custom Ray Dataset and use the utilities defined in batch_processing.py.
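
As a rough sketch of the Ray route (the helper call in the final comment is a hypothetical placeholder; see batch_processing.py for the actual utilities):

import ray

# Build a Ray Dataset from an arbitrary collection of images (local paths or cloud storage).
ds = ray.data.read_images("/path/to/my_custom_dataset")

# The utilities defined in batch_processing.py can then be mapped over this dataset, e.g.
# results = run_classifier_over_dataset(ds, "classifier_config.json")  # hypothetical name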

A Quick Start for Self-Hosting on AWS

To launch a self-hosted version of this service on AWS, we'll spin up a fresh EC2 instance. Choose "Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.4 (Ubuntu 22.04)" as your AMI and g5.xlarge as your instance type. After that, you should be able to just run:

git clone https://github.com/DirectAI/simple-data-free-model-server
cd simple-data-free-model-server
docker compose build && docker compose up

Method TLDR

This repository presents the idea of semantic nearest neighbors for building custom decision boundaries with late-fusion zero-shot models. We use CLIP and OWL-ViT-ST as our base late-fusion zero-shot image classifier and object detector, respectively. See the code for implementation details.

Image Classifier

The standard approach for a late-fusion zero-shot image classifier is to take a set of n labels, embed them via the associated language model, and stack those embeddings to form a linear classification layer on top of the image embedding from the associated image model. This head can also be interpreted as a nearest neighbors layer, as the predicted class is simply the class whose text embedding is most similar to the image embedding.

Let $d(t_i, v) > 0$ be the relevancy function between text embedding $t_i$ and the image embedding $v$. Then our class prediction for $v$ (denoted $f(v)$) via the traditional approach is:

$f(v) = \text{argmax}_i d(t_i, v)$
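
As a minimal PyTorch sketch of this traditional prediction rule (using cosine similarity between L2-normalized embeddings as the relevancy function $d$; this is illustrative, not the repository's implementation):

import torch

# text_embeddings: (n, dim) tensor of label text embeddings, one row per class
# image_embedding: (dim,) embedding of the query image
# Both are assumed to be L2-normalized, so a dot product is the cosine relevancy d(t_i, v).
def traditional_zero_shot_predict(text_embeddings: torch.Tensor, image_embedding: torch.Tensor) -> int:
    relevancy = text_embeddings @ image_embedding  # d(t_i, v) for every label i
    return int(relevancy.argmax())                 # f(v) = argmax_i d(t_i, v)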

We define a meta class, which contains both positive and negative examples. A score is computed for each meta class based on a goodness-of-fit estimate for the image given the positive and negative examples in the class. Then, the meta class with the highest score is predicted to be the true class.

In the simplest case of a single positive example per meta class, this is the same as the traditional zero-shot approach. To extend the paradigm to the case where there are multiple positive examples per meta class, we can run the traditional zero-shot approach over the set of all positive examples and then let the predicted class be the meta class which includes the highest-scoring positive example. This can be viewed as an n-way nearest neighbors problem where the samples are semantically-relevant text embeddings, and the correct class is the one with the most semantically-relevant example.

Let $m(i)$ be the meta class associated with sample $i$. Then we have:

$f(v) = m(\text{argmax}_i d(t_i, v))$

We can reinterpret the above as a two-stage process. Instead of running a single n-way nearest neighbors problem, we can take, for each meta class, the max of the relevancy scores of its examples, and then take the argmax over these meta-class-level scores.

Let $S_j$ refer to the samples associated with the $j$-th meta class. Then:

$f(v) = \text{argmax}_j \text{max}_{t_i \in S_j} d(t_i, v)$
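
Continuing the sketch above, this two-stage version with multiple positive examples per meta class can be written with PyTorch's scatter_reduce as the per-class max (again illustrative only):

import torch

# text_embeddings: (n_examples, dim), one row per positive text example
# meta_class_ids:  (n_examples,) int64 tensor giving m(i), the meta class of example i
# image_embedding: (dim,) L2-normalized image embedding v
def multi_positive_predict(text_embeddings, meta_class_ids, image_embedding, n_meta_classes):
    relevancy = text_embeddings @ image_embedding  # d(t_i, v)
    # Stage 1: per-meta-class max over that class's examples, max_{t_i in S_j} d(t_i, v).
    class_scores = torch.full((n_meta_classes,), float("-inf"))
    class_scores = class_scores.scatter_reduce(0, meta_class_ids, relevancy, reduce="amax")
    # Stage 2: argmax over the meta-class-level scores.
    return int(class_scores.argmax())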

To extend this to allow for negative examples, we can replace the first-stage max with a two-class nearest neighbors boundary between the positive and negative examples. Then, we let each meta class's score be the score of its most relevant example if that example is positive, and the negation of that score if that example is negative. In other words, a meta class whose positive examples are more relevant will receive a higher score, and a meta class whose negative examples are more relevant will receive a lower score.

Let $P_j$ and $N_j$ refer to the positive and negative examples for meta class $j$, respectively. In other words, if $t_i \in P_j$, then the $i$-th text example is a positive example for meta class $j$. Then, let $t_j$ refer to the example in $P_j \cup N_j$ that is most relevant to $v$, and let $\hat{d}(j, v)$ be the result of the two-class nearest neighbor problem run for each meta class $j$. We have:

$t_j = \text{argmax}_{t_i \in P_j \cup N_j} d(t_i, v)$

$\hat{d}(j, v) = d(t_j, v)$ if $t_j \in P_j$ else $-d(t_j, v)$

$f(v) = \text{argmax}_j \hat{d}(j, v)$

Note that if there are no negative examples, this is the same as the previous case, and if there are no negative examples and exactly one positive example per meta class, this is equivalent to the traditional method. Also note that if, for every meta class, the most relevant example is a negative example, then this function chooses the 'least irrelevant' prediction.

This is our final prediction function: a two-stage nearest neighbors problem that incorporates positive and negative semantic evidence into its decision boundary. It can be run efficiently with an optimized scatter max function.
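
A sketch of this final rule, again using scatter_reduce as the scatter max (illustrative only; rather than locating the single most relevant example and negating its score, it takes separate maxima over positives and negatives, which is equivalent):

import torch

# text_embeddings: (n_examples, dim) positive and negative text examples across all meta classes
# meta_class_ids:  (n_examples,) int64 meta class index j of each example
# is_positive:     (n_examples,) bool, True if example i is in P_j, False if it is in N_j
# image_embedding: (dim,) L2-normalized image embedding v
def meta_class_predict(text_embeddings, meta_class_ids, is_positive, image_embedding, n_meta_classes):
    relevancy = text_embeddings @ image_embedding  # d(t_i, v)
    neg_inf = torch.full((n_meta_classes,), float("-inf"))
    # Per-meta-class max over the positive examples and over the negative examples separately.
    pos_max = neg_inf.scatter_reduce(0, meta_class_ids[is_positive], relevancy[is_positive], reduce="amax")
    neg_max = neg_inf.scatter_reduce(0, meta_class_ids[~is_positive], relevancy[~is_positive], reduce="amax")
    # \hat{d}(j, v): the signed score of the most relevant example in P_j ∪ N_j.
    d_hat = torch.where(pos_max >= neg_max, pos_max, -neg_max)
    # f(v): the meta class with the highest signed score.
    return int(d_hat.argmax())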

Object Detector

To extend late-fusion zero-shot object detectors to incorporate positive and negative examples, we first compute $\hat{d}(j, v)$ as defined above for each meta class and each proposed bounding box. However, this is not sufficient to incorporate negative samples into the prediction, as an object may have overlapping bounding boxes, some of which score highly on a class's negative examples while others score highly on its positive examples. To address this, we extend $\hat{d}$ to include NMS between neighboring boxes. Because this approach runs NMS many times over the same set of boxes, we pre-compute the IoU graph and reuse it on each NMS instance. We then run a final NMS between meta classes using, for each class, the boxes that were not suppressed by the NMS against its negative examples, and finally threshold on detection confidence as usual.

In other words, we first run a two-way object detection problem for each meta class between its positive and negative examples, and then run an $n$-way object detection problem between each meta class's survivors from the previous stage. This can be done efficiently by using an optimized scatter max function and by caching the IoU graph for reuse between NMS subproblems. In the absence of any negative examples, this is the same as assigning, as each box's score for a meta class, the max of the relevancy scores over that class's positive examples, and then running NMS. If there are no negative examples and exactly one positive example per class, this is the same as the traditional method.
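
Below is a simplified, per-meta-class sketch of the first of these two NMS stages, using torchvision's IoU utility (the repository's implementation additionally precomputes and reuses the IoU graph across classes and batches the scoring with a scatter max, so treat this as an illustration of the idea rather than the actual code):

import torch
from torchvision.ops import box_iou

def per_class_positive_negative_nms(boxes, box_scores, box_is_positive, iou_threshold=0.5):
    # boxes:           (n_boxes, 4) proposed boxes, scored against one meta class
    # box_scores:      (n_boxes,) relevancy of each box's most relevant example for that class
    # box_is_positive: (n_boxes,) True if that most relevant example is a positive example
    # Returns indices of positive boxes that survive suppression.
    iou = box_iou(boxes, boxes)
    order = box_scores.argsort(descending=True)
    suppressed = torch.zeros(len(boxes), dtype=torch.bool)
    keep = []
    for i in order.tolist():
        if suppressed[i]:
            continue
        if box_is_positive[i]:
            keep.append(i)
        # A surviving box suppresses its overlapping neighbors whether it is positive or negative,
        # so a highly relevant negative example vetoes nearby positive detections.
        suppressed |= iou[i] > iou_threshold
    return keep

The per-class survivors would then feed the final NMS between meta classes and the usual confidence thresholding described above.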

On Pre-Commit Hooks

  • Make sure you run pip install pre-commit followed by pre-commit install before attempting to commit to the repo.

Acknowledgements

Special thanks to Neo for funding DirectAI! Thank you to OpenCLIP and Huggingface for providing the model implementations that we use here.

Contact

If you have any questions or comments, raise an issue or reach out to Isaac Robinson at [email protected].

Contributing

If you're interested in contributing, raise an issue or email Isaac and we'll write a contributing guide!

Citing

If you find this useful for your work, please cite this repository!
