# KerasCV

![Tensorflow](https://img.shields.io/badge/tensorflow-v2.9.0+-success.svg)
[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/keras-team/keras-cv/issues)

# Mission

KerasCV is a library of modular computer vision oriented Keras components.
These components include models, layers, metrics, losses, callbacks, and utility
functions.

KerasCV's primary goal is to provide a coherent, elegant, and pleasant API to train state-of-the-art computer vision models.
Users should be able to train state-of-the-art models using only `Keras`, `KerasCV`, and TensorFlow core (i.e. `tf.data`) components.

KerasCV can be understood as a horizontal extension of the Keras API: the components are new first-party
Keras objects (layers, metrics, etc.) that are too specialized to be added to core Keras. They receive the same level of polish and backwards compatibility guarantees as the core Keras API, and they are maintained by the Keras team.

Our APIs assist in common computer vision tasks such as data augmentation, classification, object detection, image generation, and more.
Applied computer vision engineers can leverage KerasCV to quickly assemble production-grade, state-of-the-art training and inference pipelines for all of these common tasks.

In addition to API consistency, KerasCV components aim to be mixed-precision compatible, QAT compatible, XLA compilable, and TPU compatible.
We also aim to provide generic model optimization tools for deployment on devices such as onboard GPUs, mobile, and edge chips.
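For example, mixed-precision training can be enabled globally before building any KerasCV model using the standard Keras API. This is a minimal sketch; the DenseNet configuration mirrors the Quickstart below and is illustrative only:

```python
from tensorflow import keras
import keras_cv

# Enable mixed precision globally; compute happens in float16 while
# variables stay in float32.
keras.mixed_precision.set_global_policy("mixed_float16")

# Any KerasCV model built after this point uses the policy.
model = keras_cv.models.DenseNet121(
    include_rescaling=True,
    include_top=True,
    classes=3,
)
```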

To learn more about the future project direction, please check the [roadmap](.github/ROADMAP.md).

## Quick Links
- [Roadmap](.github/ROADMAP.md)
- [API Design Guidelines](.github/API_DESIGN.md)

## Quickstart

Create a preprocessing pipeline:

```python
import keras_cv
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds

augmenter = keras_cv.layers.Augmenter(
    layers=[
        keras_cv.layers.RandomFlip(),
        keras_cv.layers.RandAugment(value_range=(0, 255)),
        keras_cv.layers.CutMix(),
        keras_cv.layers.MixUp(),
    ]
)

def augment_data(images, labels):
    # CutMix and MixUp expect dense (one-hot) labels;
    # rock_paper_scissors has 3 classes.
    labels = tf.one_hot(labels, 3)
    inputs = {"images": images, "labels": labels}
    outputs = augmenter(inputs)
    return outputs['images'], outputs['labels']
```

Augment a `tf.data.Dataset`:

```python
dataset = tfds.load('rock_paper_scissors', as_supervised=True, split='train')
dataset = dataset.batch(64)
dataset = dataset.map(augment_data, num_parallel_calls=tf.data.AUTOTUNE)
```
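To sanity-check the augmentations, it can help to visualize a batch before training. The snippet below is a plain matplotlib sketch, not a KerasCV API; the 3x3 grid is an arbitrary choice:

```python
import matplotlib.pyplot as plt

# Take one augmented batch and display the first nine images.
images, labels = next(iter(dataset))
plt.figure(figsize=(9, 9))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.axis("off")
plt.show()
```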

Create a model:

```python
densenet = keras_cv.models.DenseNet121(
    include_rescaling=True,
    include_top=True,
    classes=3
)
densenet.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
```

Train your model:

```python
densenet.fit(dataset)
```
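Once trained, the model can be used for inference like any other Keras model. A minimal sketch, reusing a batch from the training dataset purely for illustration:

```python
import tensorflow as tf

# Predict class probabilities for one batch and take the argmax.
images, _ = next(iter(dataset))
probabilities = densenet.predict(images)
predicted_classes = tf.argmax(probabilities, axis=-1)
```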

## Contributors
If you'd like to contribute, please see our [contributing guide](.github/CONTRIBUTING.md).

We would like to leverage the Keras community not only for bug reporting, but also
for active development for feature delivery. To achieve this, here is the
process for how to contribute to this repository:

1) Contributors are always welcome to help us fix issues, add tests, and improve documentation.
2) If contributors would like to create a backbone, we usually require a pre-trained weight set
with the model for one dataset as the first PR, and a training script as a follow-up.
The training script should preferably help us reproduce the results claimed in the paper.
The backbone should be generic, but the training script can contain paper-specific parameters
such as learning rate schedules and weight decay.
The training script will be used to produce leaderboard results.
Exceptions apply to large transformer-based models which are difficult to train. If this is the case,
contributors should let us know so the team can help in training the model or providing GCP resources.
Thank you to all of our wonderful contributors!

<a href="https://github.com/keras-team/keras-cv/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=keras-team/keras-cv" />
</a>

## Pretrained Weights
Many models in KerasCV come with pre-trained weights.
With the exception of StableDiffusion and the standard Vision Transformer, all of these weights are trained using Keras and
KerasCV components and training scripts in this repository.
While some models are not trained with the same parameters or preprocessing pipeline
as defined in their original publications, the KerasCV team ensures strong numerical performance.
Performance metrics for the provided pre-trained weights can be found
in the training history for each documented task.
An example of this can be found in the ImageNet classification training
[history for backbone models](examples/training/classification/imagenet/training_history.json).
All results are reproducible using the training scripts in this repository.
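As an illustration, pre-trained weights are typically requested through the `weights` argument at construction time. This is a sketch, and it assumes an `"imagenet"` checkpoint is published for the architecture you pick:

```python
import keras_cv

# Build a backbone with pre-trained weights (assumes an "imagenet"
# checkpoint is available for this architecture).
backbone = keras_cv.models.ResNet50V2(
    include_rescaling=True,
    include_top=False,
    weights="imagenet",
)
```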

Historically, many models have been trained on image datasets normalized via manually
crafted schemes.
The most common variant is subtraction of the ImageNet mean pixel value followed by
division by the ImageNet pixel standard deviation.
This scheme is an artifact of the days of manual feature engineering, but it is no longer
required to achieve state-of-the-art results with modern deep learning architectures.
Because of this, KerasCV standardizes on images that have been rescaled using
a simple `1/255` rescaling layer.
This convention is used in all KerasCV training pipelines and code examples.
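For concreteness, the two conventions look like this in code; the mean/std constants below are the commonly cited ImageNet statistics and are shown only for contrast:

```python
import tensorflow as tf
from tensorflow import keras

# KerasCV convention: a simple 1/255 rescaling into [0, 1].
rescale = keras.layers.Rescaling(1.0 / 255)

# Legacy convention: per-channel ImageNet mean/std normalization.
IMAGENET_MEAN = tf.constant([0.485, 0.456, 0.406])
IMAGENET_STD = tf.constant([0.229, 0.224, 0.225])

def legacy_normalize(image):
    image = tf.cast(image, tf.float32) / 255.0
    return (image - IMAGENET_MEAN) / IMAGENET_STD
```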

## Custom Ops
Note that in some of the 3D Object Detection layers, custom TF ops are used.
If you'd like to use these custom ops, you can install from source using the
instructions below.

### Installing KerasCV with Custom Ops from Source

Installing custom ops from source requires the [Bazel](https://bazel.build/) build
system (version >= 5.4.0). Steps to install Bazel can be [found here](https://github.com/keras-team/keras/blob/v2.11.0/.devcontainer/Dockerfile#L21-L23).

```
git clone https://github.com/keras-team/keras-cv.git
```

## Disclaimer

KerasCV provides access to pre-trained models via the `keras_cv.models` API.
These pre-trained models are provided on an "as is" basis, without warranties
or conditions of any kind.
The following underlying models are provided by third parties, and are subject to separate
licenses:
StableDiffusion, Vision Transformer

## Citing KerasCV
