This repository was archived by the owner on Aug 25, 2024. It is now read-only.

WIP:Adding tensorflow regression model #226

Merged
merged 22 commits into from
Oct 25, 2019
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -37,6 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
there are more than 5 issues of high severity and confidence.
- dev service got the ability to run a single operation in a standalone fashion.
- About page to docs.
- Tensorflow DNNEstimator based regression model.
### Changed
- feature/codesec became its own branch, binsec
- BaseOrchestratorContext `run_operations` strict now defaults to true. With
@@ -68,6 +69,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
prediction as well. Models are now responsible for calling the predicted
method on the repo. This will ease the process of making predict feature
specific.
- Updated Tensorflow model README.md to include usage of regression model
### Fixed
- Docs get version from dffml.version.VERSION.
- FileSource zipfiles are wrapped with TextIOWrapper because CSVSource expects
3 changes: 1 addition & 2 deletions dffml/cli.py
@@ -408,8 +408,7 @@ class PredictAll(EvaluateAll, MLCMD):
     """Predicts for all sources"""
 
     async def predict(self, mctx, sctx, repos):
-        async for repo, value, confidence in mctx.predict(repos):
-            repo.predicted(value, confidence)
+        async for repo in mctx.predict(repos):
             yield repo
             if self.update:
                 await sctx.update(repo)
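
The change to `PredictAll.predict` moves the call to `repo.predicted()` out of the CLI, so the model context has to make that call itself before yielding the repo. The following is a minimal sketch of that contract, not DFFML's implementation: `Repo`, `ModelContext`, `source`, and the constant prediction are hypothetical stand-ins invented for illustration.

```python
import asyncio


class Repo:
    """Hypothetical stand-in for a DFFML repo; only what the sketch needs."""

    def __init__(self, src_url):
        self.src_url = src_url
        self.prediction = None

    def predicted(self, value, confidence):
        # Record the prediction on the repo itself
        self.prediction = {"value": value, "confidence": confidence}


class ModelContext:
    """Hypothetical model context following the new contract."""

    async def predict(self, repos):
        async for repo in repos:
            # The model, not the caller, calls predicted() before yielding.
            # The constant value here is a placeholder for a real prediction.
            repo.predicted(1.0, float("nan"))
            yield repo


async def source(count):
    # Hypothetical source yielding `count` empty repos
    for index in range(count):
        yield Repo(index)


async def collect():
    # The caller only receives repos and reads the prediction from each one
    return [repo async for repo in ModelContext().predict(source(2))]


results = asyncio.run(collect())
```

With this shape the caller never needs to know how many values a model produces per repo, which is presumably what makes feature-specific predictions easier to add later.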
209 changes: 168 additions & 41 deletions docs/plugins/dffml_model.rst
@@ -17,13 +17,22 @@ dffml_model_tensorflow
pip install dffml-model-tensorflow


.. note::

It's important to keep the hidden layer config and feature config the same
across invocations of train, predict, and accuracy methods.

Models are saved under the ``directory`` parameter in subdirectories named
after the hash of their feature names and hidden layer config. If any of
those parameters change between invocations, the model is told to look for
a different saved model, so a previously trained model will not be found.
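
A minimal sketch of why changing the configuration points at a different saved model. The key format, the choice of SHA-384, and the `model_subdir` helper are assumptions made for illustration; the plugin's actual hashing scheme may differ.

```python
import hashlib
import os


def model_subdir(directory, feature_names, hidden):
    # Assumption: the key combines the sorted feature names with the hidden
    # layer sizes. The real plugin may build its key differently.
    key = ",".join(sorted(feature_names)) + "|" + ",".join(map(str, hidden))
    digest = hashlib.sha384(key.encode("utf-8")).hexdigest()
    return os.path.join(directory, digest)


base = "/tmp/dffml-example"  # hypothetical directory
trained = model_subdir(base, ["Feature1", "Feature2"], [8, 16, 8])
same = model_subdir(base, ["Feature2", "Feature1"], [8, 16, 8])
changed = model_subdir(base, ["Feature1", "Feature2"], [8, 16, 4])

# Same config resolves to the same subdirectory; a changed hidden layer
# resolves to a different one, where no trained model exists yet.
print(trained == same, trained == changed)
```

Under these assumptions, passing a different ``-model-hidden`` value to ``accuracy`` or ``predict`` than was used for ``train`` would resolve to an empty subdirectory.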

tfdnnc
~~~~~~

*Core*

Implemented using Tensorflow's DNNClassifier. Models are saved under the
``directory`` in subdirectories named after the hash of their feature names.
Implemented using Tensorflow's DNNClassifier.

.. code-block:: console

@@ -33,49 +42,49 @@ Implemented using Tensorflow's DNNClassifier. Models are saved under the
$ sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' *.csv
$ head iris_training.csv
$ dffml train \
-model tfdnnc \
-model-epochs 3000 \
-model-steps 20000 \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_training.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-log debug
-model tfdnnc \
-model-epochs 3000 \
-model-steps 20000 \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_training.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-log debug
... lots of output ...
$ dffml accuracy \
-model tfdnnc \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_test.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-log critical
-model tfdnnc \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_test.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-log critical
0.99996233782
$ dffml predict all \
-model tfdnnc \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_test.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-caching \
-log critical \
-model tfdnnc \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_test.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-caching \
-log critical \
> results.json
$ head -n 33 results.json
[
@@ -147,6 +156,124 @@ Implemented using Tensorflow's DNNClassifier. Models are saved under the
- default: <class 'str'>
- Data type of classifications values (default: str)

tfdnnr
~~~~~~

*Core*

Implemented using Tensorflow's DNNEstimator.

Usage:

* predict: Name of the feature to predict. During training, this feature holds the known target value.

Generating train and test data

* This creates the files ``train.csv`` and ``test.csv``. The commands
  overwrite any existing files with those names, so first back up any such
  files in the directory the commands are run from.

.. code-block:: console

$ cat > train.csv << EOF
Feature1,Feature2,TARGET
0.93,0.68,3.89
0.24,0.42,1.75
0.36,0.68,2.75
0.53,0.31,2.00
0.29,0.25,1.32
0.29,0.52,2.14
EOF
$ cat > test.csv << EOF
Feature1,Feature2,TARGET
0.57,0.84,3.65
0.95,0.19,2.46
0.23,0.15,0.93
EOF
$ dffml train \
-model tfdnnr \
-model-epochs 300 \
-model-steps 2000 \
-model-predict TARGET \
-model-hidden 8 16 8 \
-sources s=csv \
-source-readonly \
-source-filename train.csv \
-features \
def:Feature1:float:1 \
def:Feature2:float:1 \
-log debug
Enabling debug log shows tensorflow losses...
$ dffml accuracy \
-model tfdnnr \
-model-predict TARGET \
-model-hidden 8 16 8 \
-sources s=csv \
-source-readonly \
-source-filename test.csv \
-features \
def:Feature1:float:1 \
def:Feature2:float:1 \
-log critical
0.9468210011
$ echo -e 'Feature1,Feature2,TARGET\n0.21,0.18,0.84\n' | \
dffml predict all \
-model tfdnnr \
-model-predict TARGET \
-model-hidden 8 16 8 \
-sources s=csv \
-source-readonly \
-source-filename /dev/stdin \
-features \
def:Feature1:float:1 \
def:Feature2:float:1 \
-log critical
[
{
"extra": {},
"features": {
"Feature1": 0.21,
"Feature2": 0.18,
"TARGET": 0.84
},
"last_updated": "2019-10-24T15:26:41Z",
"prediction": {
"confidence": NaN,
"value": 1.1983429193496704
},
"src_url": 0
}
]

The ``NaN`` in ``confidence`` is expected behaviour (see the TODO in
predict).
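
Because ``NaN`` is not a valid token in standard JSON, strict parsers may reject this output. The sketch below shows one way to consume it with Python's standard ``json`` module, which accepts ``NaN`` by default. The record is trimmed from the example output above; the variable names are illustrative only.

```python
import json
import math

# Trimmed copy of the prediction record shown above
output = '[{"prediction": {"confidence": NaN, "value": 1.1983429193496704}}]'

# Python's json module accepts the non-standard NaN token by default
records = json.loads(output)
confidence = records[0]["prediction"]["confidence"]
value = records[0]["prediction"]["value"]

# NaN never compares equal to itself, so test for it with math.isnan
has_confidence = not math.isnan(confidence)

# Serialising with allow_nan=False enforces standard JSON and raises
# ValueError when a NaN is present
try:
    json.dumps(records, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False
```

Downstream code that filters or sorts on confidence should check ``math.isnan`` first, since any comparison against ``NaN`` evaluates to false.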

**Args**

- directory: String

- default: /home/user/.cache/dffml/tensorflow
- Directory where state should be saved

- steps: Integer

- default: 3000
- Number of steps to train the model

- epochs: Integer

- default: 30
- Number of iterations to pass over all repos in a source

- hidden: List of integers

- default: [12, 40, 15]
- List length is the number of hidden layers in the network. Each entry in the list is the number of nodes in that hidden layer

- predict: String

- Feature name holding truth value
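
As a rough illustration of the ``hidden`` argument, the sketch below lists the layer shapes implied by ``-model-hidden 8 16 8`` from the example above and counts the trainable parameters. It assumes a plain fully connected network with bias terms, two input features, and one output. The helper names are hypothetical and are not part of DFFML or TensorFlow.

```python
def dense_layer_shapes(n_inputs, hidden, n_outputs=1):
    # Pair each layer's width with the next one: (fan_in, fan_out)
    sizes = [n_inputs, *hidden, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(shapes):
    # One weight per connection plus one bias per output node
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


# Feature1 and Feature2 in, hidden layers of 8, 16 and 8 nodes, TARGET out
shapes = dense_layer_shapes(2, [8, 16, 8])
total = parameter_count(shapes)
print(shapes, total)
```

Under these assumptions the example network has four weight matrices and a few hundred parameters.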

dffml_model_scratch
-------------------

76 changes: 12 additions & 64 deletions model/tensorflow/README.md
@@ -1,9 +1,5 @@
# DFFML Models for Tensorflow Library

## About

DFFML models backed by Tensorflow.

## Demo

![Demo](https://github.com/intel/dffml/raw/master/docs/images/iris_demo.gif)
@@ -12,69 +8,21 @@ DFFML models backed by Tensorflow.
> may vary as this video shows accuracy being assessed against the training
> data. You should try it for yourself and see!

## Install

```console
virtualenv -p python3.7 .venv
. .venv/bin/activate
python3.7 -m pip install --user -U dffml[tensorflow]
```
## Documentation

## Usage

```console
wget http://download.tensorflow.org/data/iris_training.csv
wget http://download.tensorflow.org/data/iris_test.csv
head iris_training.csv
sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' *.csv
head iris_training.csv
dffml train \
-model tfdnnc \
-model-epochs 3000 \
-model-steps 20000 \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_training.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-log debug
dffml accuracy \
-model tfdnnc \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_test.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-log critical
dffml predict all \
-model tfdnnc \
-model-classification classification \
-model-classifications 0 1 2 \
-model-clstype int \
-sources iris=csv \
-source-filename iris_test.csv \
-features \
def:SepalLength:float:1 \
def:SepalWidth:float:1 \
def:PetalLength:float:1 \
def:PetalWidth:float:1 \
-caching \
-log critical \
> results.json
head -n 33 results.json
```
Documentation is hosted at https://intel.github.io/dffml/plugins/dffml_model.html#dffml-model-tensorflow

## License

DFFML Tensorflow Models are distributed under the terms of the
[MIT License](LICENSE).

## Legal

> This software is subject to the U.S. Export Administration Regulations and
> other U.S. law, and may not be exported or re-exported to certain countries
> (Cuba, Iran, Crimea Region of Ukraine, North Korea, Sudan, and Syria) or to
> persons or entities prohibited from receiving U.S. exports (including
> Denied Parties, Specially Designated Nationals, and entities on the Bureau
> of Export Administration Entity List or involved with missile technology or
> nuclear, chemical or biological weapons).
11 changes: 11 additions & 0 deletions model/tensorflow/dffml_model_tensorflow/__init__.py
@@ -0,0 +1,11 @@
"""
.. note::

It's important to keep the hidden layer config and feature config the same
across invocations of train, predict, and accuracy methods.

Models are saved under the ``directory`` parameter in subdirectories named
after the hash of their feature names and hidden layer config. If any of
those parameters change between invocations, the model is told to look for
a different saved model, so a previously trained model will not be found.
"""