intel · johnandersen777 · Oct 25, 2019 · Oct 18, 2019
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -37,6 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   there are more than 5 issues of high severity and confidence.
 - dev service got the ability to run a single operation in a standalone fashion.
 - About page to docs.
+- Tensorflow DNNEstimator based regression model.
 ### Changed
 - feature/codesec became it's own branch, binsec
 - BaseOrchestratorContext `run_operations` strict is default to true. With
@@ -68,6 +69,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   prediction as well. Models are not responsible for calling the predicted
   method on the repo. This will ease the process of making predict feature
   specific.
+- Updated Tensorflow model README.md to include usage of regression model.
 ### Fixed
 - Docs get version from dffml.version.VERSION.
 - FileSource zipfiles are wrapped with TextIOWrapper because CSVSource expects

diff --git a/dffml/cli.py b/dffml/cli.py
@@ -408,8 +408,7 @@ class PredictAll(EvaluateAll, MLCMD):
     """Predicts for all sources"""
 
     async def predict(self, mctx, sctx, repos):
-        async for repo, value, confidence in mctx.predict(repos):
-            repo.predicted(value, confidence)
+        async for repo in mctx.predict(repos):
             yield repo
             if self.update:
                 await sctx.update(repo)
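
Review note: with this hunk, `mctx.predict` yields only the repo, which suggests the model context is now expected to call `repo.predicted(value, confidence)` itself before yielding. A minimal sketch of that contract follows. `Repo`, `ModelContext`, `source`, and `predict_all` are hypothetical stand-ins written for illustration, not DFFML's actual classes, and the value and confidence are placeholders.

```python
import asyncio
from typing import AsyncIterator, List


class Repo:
    """Hypothetical stand-in for a DFFML repo; only what the sketch needs."""

    def __init__(self, src_url: str) -> None:
        self.src_url = src_url
        self.prediction = None

    def predicted(self, value, confidence: float) -> None:
        # Record the prediction on the repo itself
        self.prediction = {"value": value, "confidence": confidence}


class ModelContext:
    """Hypothetical model context following the new predict contract."""

    async def predict(self, repos: AsyncIterator[Repo]) -> AsyncIterator[Repo]:
        async for repo in repos:
            # Assumption: the model, not the caller, stores the prediction
            repo.predicted(1.0, 0.5)  # placeholder value and confidence
            yield repo


async def source(urls: List[str]) -> AsyncIterator[Repo]:
    # Hypothetical source that yields one repo per URL
    for url in urls:
        yield Repo(url)


async def predict_all(mctx: ModelContext, urls: List[str]) -> List[Repo]:
    # Same loop shape as PredictAll.predict after this change
    return [repo async for repo in mctx.predict(source(urls))]


results = asyncio.run(predict_all(ModelContext(), ["a", "b"]))
```

Under this shape the caller never sees the raw value and confidence, so each model can decide how a prediction is stored on the repo.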

diff --git a/docs/plugins/dffml_model.rst b/docs/plugins/dffml_model.rst
@@ -17,13 +17,22 @@ dffml_model_tensorflow
     pip install dffml-model-tensorflow
 
 
+.. note::
+
+    It's important to keep the hidden layer config and feature config the same
+    across invocations of train, predict, and accuracy methods.
+
+    Models are saved under the ``directory`` parameter in subdirectories named
+    after the hash of their feature names and hidden layer config. If any of
+    those parameters change between invocations, the model will look for a
+    different saved model instead of the one trained earlier.
+
 tfdnnc
 ~~~~~~
 
 *Core*
 
-Implemented using Tensorflow's DNNClassifier. Models are saved under the
-``directory`` in subdirectories named after the hash of their feature names.
+Implemented using Tensorflow's DNNClassifier.
 
 .. code-block:: console
 
@@ -33,49 +42,49 @@ Implemented using Tensorflow's DNNClassifier. Models are saved under the
     $ sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' *.csv
     $ head iris_training.csv
     $ dffml train \
-      -model tfdnnc \
-      -model-epochs 3000 \
-      -model-steps 20000 \
-      -model-classification classification \
-      -model-classifications 0 1 2 \
-      -model-clstype int \
-      -sources iris=csv \
-      -source-filename iris_training.csv \
-      -features \
-        def:SepalLength:float:1 \
-        def:SepalWidth:float:1 \
-        def:PetalLength:float:1 \
-        def:PetalWidth:float:1 \
-      -log debug
+        -model tfdnnc \
+        -model-epochs 3000 \
+        -model-steps 20000 \
+        -model-classification classification \
+        -model-classifications 0 1 2 \
+        -model-clstype int \
+        -sources iris=csv \
+        -source-filename iris_training.csv \
+        -features \
+          def:SepalLength:float:1 \
+          def:SepalWidth:float:1 \
+          def:PetalLength:float:1 \
+          def:PetalWidth:float:1 \
+        -log debug
     ... lots of output ...
     $ dffml accuracy \
-      -model tfdnnc \
-      -model-classification classification \
-      -model-classifications 0 1 2 \
-      -model-clstype int \
-      -sources iris=csv \
-      -source-filename iris_test.csv \
-      -features \
-        def:SepalLength:float:1 \
-        def:SepalWidth:float:1 \
-        def:PetalLength:float:1 \
-        def:PetalWidth:float:1 \
-      -log critical
+        -model tfdnnc \
+        -model-classification classification \
+        -model-classifications 0 1 2 \
+        -model-clstype int \
+        -sources iris=csv \
+        -source-filename iris_test.csv \
+        -features \
+          def:SepalLength:float:1 \
+          def:SepalWidth:float:1 \
+          def:PetalLength:float:1 \
+          def:PetalWidth:float:1 \
+        -log critical
     0.99996233782
     $ dffml predict all \
-      -model tfdnnc \
-      -model-classification classification \
-      -model-classifications 0 1 2 \
-      -model-clstype int \
-      -sources iris=csv \
-      -source-filename iris_test.csv \
-      -features \
-        def:SepalLength:float:1 \
-        def:SepalWidth:float:1 \
-        def:PetalLength:float:1 \
-        def:PetalWidth:float:1 \
-      -caching \
-      -log critical \
+        -model tfdnnc \
+        -model-classification classification \
+        -model-classifications 0 1 2 \
+        -model-clstype int \
+        -sources iris=csv \
+        -source-filename iris_test.csv \
+        -features \
+          def:SepalLength:float:1 \
+          def:SepalWidth:float:1 \
+          def:PetalLength:float:1 \
+          def:PetalWidth:float:1 \
+        -caching \
+        -log critical \
       > results.json
     $ head -n 33 results.json
     [
@@ -147,6 +156,124 @@ Implemented using Tensorflow's DNNClassifier. Models are saved under the
   - default: <class 'str'>
   - Data type of classifications values (default: str)
 
+tfdnnr
+~~~~~~
+
+*Core*
+
+Implemented using Tensorflow's DNNEstimator.
+
+Usage:
+
+* predict: Name of the feature being predicted (the training target).
+
+Generating train and test data
+
+* The commands below create ``train.csv`` and ``test.csv`` in the current
+  directory and overwrite any existing files with those names, so back up
+  such files before running them.
+
+.. code-block:: console
+
+    $ cat > train.csv << EOF
+    Feature1,Feature2,TARGET
+    0.93,0.68,3.89
+    0.24,0.42,1.75
+    0.36,0.68,2.75
+    0.53,0.31,2.00
+    0.29,0.25,1.32
+    0.29,0.52,2.14
+    EOF
+    $ cat > test.csv << EOF
+    Feature1,Feature2,TARGET
+    0.57,0.84,3.65
+    0.95,0.19,2.46
+    0.23,0.15,0.93
+    EOF
+    $ dffml train \
+        -model tfdnnr \
+        -model-epochs 300 \
+        -model-steps 2000 \
+        -model-predict TARGET \
+        -model-hidden 8 16 8 \
+        -sources s=csv \
+        -source-readonly \
+        -source-filename train.csv \
+        -features \
+          def:Feature1:float:1 \
+          def:Feature2:float:1 \
+        -log debug
+    Enabling debug log shows tensorflow losses...
+    $ dffml accuracy \
+        -model tfdnnr \
+        -model-predict TARGET \
+        -model-hidden 8 16 8 \
+        -sources s=csv \
+        -source-readonly \
+        -source-filename test.csv \
+        -features \
+          def:Feature1:float:1 \
+          def:Feature2:float:1 \
+        -log critical
+    0.9468210011
+    $ echo -e 'Feature1,Feature2,TARGET\n0.21,0.18,0.84\n' | \
+      dffml predict all \
+        -model tfdnnr \
+        -model-predict TARGET \
+        -model-hidden 8 16 8 \
+        -sources s=csv \
+        -source-readonly \
+        -source-filename /dev/stdin \
+        -features \
+          def:Feature1:float:1 \
+          def:Feature2:float:1 \
+        -log critical
+    [
+        {
+            "extra": {},
+            "features": {
+                "Feature1": 0.21,
+                "Feature2": 0.18,
+                "TARGET": 0.84
+            },
+            "last_updated": "2019-10-24T15:26:41Z",
+            "prediction": {
+                "confidence": NaN,
+                "value": 1.1983429193496704
+            },
+            "src_url": 0
+        }
+    ]
+
+The ``NaN`` in ``confidence`` is expected behaviour (see the TODO in the
+model's predict method).
+
+**Args**
+
+- directory: String
+
+  - default: /home/user/.cache/dffml/tensorflow
+  - Directory where state should be saved
+
+- steps: Integer
+
+  - default: 3000
+  - Number of steps to train the model
+
+- epochs: Integer
+
+  - default: 30
+  - Number of iterations to pass over all repos in a source
+
+- hidden: List of integers
+
+  - default: [12, 40, 15]
+  - List length is the number of hidden layers in the network. Each entry in the list is the number of nodes in that hidden layer
+
+- predict: String
+
+  - Feature name holding truth value
+
 dffml_model_scratch
 -------------------
 

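Review note: the documented `predict all` output for `tfdnnr` contains a bare `NaN` for `confidence`, which strict JSON parsers reject. Python's standard `json` module accepts it by default, so a consumer could read the results roughly as sketched below. The `output` string is a trimmed copy of the example in the docs hunk above, and the `has_confidence` name is only illustrative.

```python
import json
import math

# Trimmed copy of the predict output shown in the docs hunk above
output = """
[
    {
        "features": {"Feature1": 0.21, "Feature2": 0.18, "TARGET": 0.84},
        "prediction": {"confidence": NaN, "value": 1.1983429193496704},
        "src_url": 0
    }
]
"""

# Python's json module parses NaN into float("nan") by default
repos = json.loads(output)

for repo in repos:
    prediction = repo["prediction"]
    # NaN never compares equal to itself, so test it with math.isnan
    has_confidence = not math.isnan(prediction["confidence"])
    print(repo["src_url"], prediction["value"], has_confidence)
```

Consumers written in other languages may need a lenient parser or a pre-processing step, since `NaN` is not part of the JSON grammar.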
diff --git a/model/tensorflow/README.md b/model/tensorflow/README.md
@@ -1,9 +1,5 @@
 # DFFML Models for Tensorflow Library
 
-## About
-
-DFFML models backed by Tensorflow.
-
 ## Demo
 
 ![Demo](https://github.com/intel/dffml/raw/master/docs/images/iris_demo.gif)
@@ -12,69 +8,21 @@ DFFML models backed by Tensorflow.
 > may vary as this video shows accuracy being assessed against the training
 > data. You should try it for yourself and see!
 
-## Install
-
-```console
-virtualenv -p python3.7 .venv
-. .venv/bin/activate
-python3.7 -m pip install --user -U dffml[tensorflow]
-```
+## Documentation
 
-## Usage
-
-```console
-wget http://download.tensorflow.org/data/iris_training.csv
-wget http://download.tensorflow.org/data/iris_test.csv
-head iris_training.csv
-sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' *.csv
-head iris_training.csv
-dffml train \
-  -model tfdnnc \
-  -model-epochs 3000 \
-  -model-steps 20000 \
-  -model-classification classification \
-  -model-classifications 0 1 2 \
-  -model-clstype int \
-  -sources iris=csv \
-  -source-filename iris_training.csv \
-  -features \
-    def:SepalLength:float:1 \
-    def:SepalWidth:float:1 \
-    def:PetalLength:float:1 \
-    def:PetalWidth:float:1 \
-  -log debug
-dffml accuracy \
-  -model tfdnnc \
-  -model-classification classification \
-  -model-classifications 0 1 2 \
-  -model-clstype int \
-  -sources iris=csv \
-  -source-filename iris_test.csv \
-  -features \
-    def:SepalLength:float:1 \
-    def:SepalWidth:float:1 \
-    def:PetalLength:float:1 \
-    def:PetalWidth:float:1 \
-  -log critical
-dffml predict all \
-  -model tfdnnc \
-  -model-classification classification \
-  -model-classifications 0 1 2 \
-  -model-clstype int \
-  -sources iris=csv \
-  -source-filename iris_test.csv \
-  -features \
-    def:SepalLength:float:1 \
-    def:SepalWidth:float:1 \
-    def:PetalLength:float:1 \
-    def:PetalWidth:float:1 \
-  -caching \
-  -log critical \
-  > results.json
-head -n 33 results.json
-```
+Documentation is hosted at https://intel.github.io/dffml/plugins/dffml_model.html#dffml-model-tensorflow
 
 ## License
 
 DFFML Tensorflow Models are distributed under the terms of the
 [MIT License](LICENSE).
+
+## Legal
+
+> This software is subject to the U.S. Export Administration Regulations and
+> other U.S. law, and may not be exported or re-exported to certain countries
+> (Cuba, Iran, Crimea Region of Ukraine, North Korea, Sudan, and Syria) or to
+> persons or entities prohibited from receiving U.S. exports (including
+> Denied Parties, Specially Designated Nationals, and entities on the Bureau
+> of Export Administration Entity List or involved with missile technology or
+> nuclear, chemical or biological weapons).
diff --git a/model/tensorflow/dffml_model_tensorflow/__init__.py b/model/tensorflow/dffml_model_tensorflow/__init__.py
@@ -0,0 +1,11 @@
+"""
+.. note::
+
+    It's important to keep the hidden layer config and feature config the same
+    across invocations of train, predict, and accuracy methods.
+
+    Models are saved under the ``directory`` parameter in subdirectories named
+    after the hash of their feature names and hidden layer config. If any of
+    those parameters change between invocations, the model will look for a
+    different saved model instead of the one trained earlier.
+"""
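
Review note: the docstring above explains why the feature and hidden layer config must stay the same across invocations. The sketch below illustrates the idea only. The `model_dir` helper, the key layout, and the choice of SHA-256 are assumptions made for illustration; the hashing scheme the package actually uses is not shown in this diff.

```python
import hashlib
import os
from typing import List


def model_dir(directory: str, features: List[str], hidden: List[int]) -> str:
    """Hypothetical helper: derive a save path from the model config."""
    # Assumption: sorted feature names plus hidden layer sizes form the key
    key = repr((sorted(features), list(hidden)))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return os.path.join(directory, digest)


base = os.path.join("cache", "dffml", "tensorflow")  # illustrative directory
features = ["Feature1", "Feature2"]

trained = model_dir(base, features, [8, 16, 8])
# Same config on a later invocation resolves to the same subdirectory
same = model_dir(base, features, [8, 16, 8])
# Omitting -model-hidden would fall back to the default [12, 40, 15]
other = model_dir(base, features, [12, 40, 15])

print(trained == same)   # True
print(trained == other)  # False
```

This is why the `accuracy` and `predict` examples in the docs repeat `-model-hidden 8 16 8`: without it, the default hidden config would point at a different, untrained model directory.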