| 1 | +# How we are going to validate the new API codebase |
| 2 | + |
| 3 | +* Date: 2021-03-10 |
| 4 | + |
| 5 | +Technical Story: |
| 6 | +- [securesystemslib schema checker issues](https://github.com/secure-systems-lab/securesystemslib/issues/183) |
| 7 | +- [new TUF validation guidelines](https://github.com/theupdateframework/tuf/issues/1130) |
| 8 | + |
| 9 | +## Context and Problem Statement |
| 10 | + |
| 11 | +1. Some schemas sound more specific than they are. |
| 12 | +2. Some schemas are an odd replacement for constants. |
| 13 | +3. Schema validation is generally **overused**. Besides user input, |
| 14 | +we also validate input programmatically generated by our own private functions. |
| 15 | +4. Some attributes are validated multiple times |
| 16 | +during a single API call. |
| 17 | +5. Schema checking sometimes makes execution branches unreachable. |
| 18 | +6. The error messages from checking schemas are often not helpful. |
| 19 | + |
| 20 | +## Decision Drivers and Requirements |
| 21 | +Some of the requirements we want to meet are: |
| 22 | +1. The ability to decide which functions are validated and which are not. |
| 23 | +2. Support for custom, deeper validation beyond type checking. |
| 24 | +3. As little performance overhead as possible. |
| 25 | +4. As few new dependencies as possible. |
| 26 | +5. Support for all Python versions we use. |
| 27 | + |
| 28 | +## Considered Options |
| 29 | +1. Usage of a `ValidationMixin`. |
| 30 | +2. Usage of a third-party library called `pydantic`. |
| 31 | + |
| 32 | +## Pros, Cons, and Considerations of the Options |
| 33 | + |
| 34 | +### Option 1: Usage of a ValidationMixin |
| 35 | + |
| 36 | +**Note:** All pros, cons, and considerations are documented with the assumption |
| 37 | +that we would implement the `ValidationMixin` the same way it is implemented in |
| 38 | +[in-toto](https://github.com/in-toto) up to version 1.0.1 (the latest |
| 39 | +version at the time of writing). |
| 40 | + |
| 41 | +* Good, because it's concise: a single call to `validate()` |
| 42 | +validates multiple fields. |
| 43 | + |
| 44 | +* Good, because it allows reuse of the validation code through |
| 45 | +`securesystemslib.schemas` or another schema of our choice. |
| 46 | + |
| 47 | +* Bad, because a method can have several code paths and return statements, so |
| 48 | +some code path might never call `validate()`. |
| 49 | + |
| 50 | +Example: |
| 51 | +```python |
| 52 | +from in_toto.models.common import ValidationMixin |
| 53 | +from securesystemslib.exceptions import FormatError |
| 54 | + |
| 55 | +class User(ValidationMixin): |
| 56 | +    def __init__(self, id: int, nickname: str) -> None: |
| 57 | +        self.id = id |
| 58 | +        self.nickname = nickname |
| 59 | +        self.pro_user = False |
| 60 | + |
| 61 | +        self.validate() |
| 62 | + |
| 63 | +    def _validate_id(self) -> None: |
| 64 | +        if not isinstance(self.id, int): |
| 65 | +            raise FormatError('id should be of type int') |
| 66 | +        if self.id < 0: |
| 67 | +            raise ValueError('id is expected to be a non-negative number') |
| 68 | + |
| 69 | +    def update_profile(self, new_id: int, new_nickname: str) -> None: |
| 70 | +        self.id = new_id |
| 71 | + |
| 72 | +        if not self.pro_user: |
| 73 | +            print('Standard users can only change their id! ' |
| 74 | +                  'If you want to change your nickname, become a pro user.') |
| 75 | +            return |
| 76 | + |
| 77 | +        self.nickname = new_nickname |
| 78 | +        # Be careful if you rely on _validate_id() to verify self.id! It is |
| 79 | +        # never called for standard users because of the early return above. |
| 80 | +        self.validate() |
| 81 | +``` |
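
The example above depends on `in-toto`'s mixin and `securesystemslib`'s `FormatError`. The following self-contained sketch reproduces the same pitfall with a simplified stand-in for the mixin. The `validate()` body is our own approximation of the in-toto behaviour (call every method whose name starts with `_validate_`), not its actual code, and `TypeError` replaces `FormatError` to avoid the dependency.

```python
import inspect


class ValidationMixin:
    """Simplified stand-in for in-toto's mixin: validate() calls every
    bound method whose name starts with "_validate_"."""

    def validate(self) -> None:
        for name, method in inspect.getmembers(self, predicate=inspect.ismethod):
            if name.startswith("_validate_"):
                method()


class User(ValidationMixin):
    def __init__(self, id: int, nickname: str) -> None:
        self.id = id
        self.nickname = nickname
        self.pro_user = False
        self.validate()

    def _validate_id(self) -> None:
        # TypeError stands in for securesystemslib's FormatError here.
        if not isinstance(self.id, int):
            raise TypeError("id should be of type int")
        if self.id < 0:
            raise ValueError("id is expected to be a non-negative number")

    def update_profile(self, new_id: int, new_nickname: str) -> None:
        self.id = new_id
        if not self.pro_user:
            return  # standard users exit here, before validate() is called
        self.nickname = new_nickname
        self.validate()


user = User(1, "alice")
user.update_profile(-5, "bob")
print(user.id)  # prints -5: the invalid id was never validated
```

The constructor does reject `User(-1, "alice")`, so the gap only shows up on the early-return path.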
| 82 | + |
| 83 | +* *Personal opinion*: bad, because from an OOP perspective it is not a clean |
| 84 | +solution to inherit from `ValidationMixin` in classes that have no "IS A" |
| 85 | +relationship with it. |
| 86 | + |
| 87 | +* Consideration: if we use this option, we are limited in what can be validated. |
| 88 | +With the `in-toto` implementation of the `ValidationMixin`, we can only validate |
| 89 | +class attributes, and only from within class methods. |
| 90 | +If we want to validate standalone functions or function arguments, we would |
| 91 | +have to extend this solution. |
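
One possible way to extend the approach to standalone functions is a decorator that runs a check per named argument. The sketch below is hypothetical: `check_args`, `non_negative_int` and `has_enough_signatures` are our own names and are not part of in-toto or securesystemslib.

```python
import functools
import inspect
from typing import Any, Callable


def check_args(**checks: Callable[[Any], None]):
    """Hypothetical decorator: run checks[name](value) for each named argument
    before calling the wrapped function."""

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, check in checks.items():
                check(bound.arguments[name])
            return func(*args, **kwargs)

        return wrapper

    return decorator


def non_negative_int(value: Any) -> None:
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool) or value < 0:
        raise ValueError(f"expected a non-negative int, got {value!r}")


@check_args(threshold=non_negative_int)
def has_enough_signatures(signatures: int, threshold: int) -> bool:
    return signatures >= threshold
```

Unlike the mixin, the checks run on every call before the function body executes, so no return path inside the body can skip them.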
| 92 | + |
| 93 | +* Consideration: if we use this option, we would be responsible for the code: |
| 94 | +all identified issues related to `securesystemslib.schemas` would have to be |
| 95 | +resolved by us, or the schema implementation replaced with something else. |
| 96 | + |
| 97 | +* Consideration: if we want to enforce assignment validation, this solution |
| 98 | +should be combined with custom "setter" properties. |
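
A minimal sketch of validation on assignment with a property setter, assuming the same `id` rules as the `User` example above. Because every assignment goes through the setter, including the one in `__init__`, no return path can skip the check. This is our own illustration; in-toto's mixin does not provide it.

```python
class User:
    def __init__(self, id: int, nickname: str) -> None:
        self.id = id  # goes through the setter below, so it is validated too
        self.nickname = nickname

    @property
    def id(self) -> int:
        return self._id

    @id.setter
    def id(self, value: int) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError("id should be of type int")
        if value < 0:
            raise ValueError("id is expected to be a non-negative number")
        self._id = value
```

The cost is one property per validated attribute, which grows quickly for classes with many fields.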
| 99 | + |
| 100 | +### Option 2: Usage of a third-party library called "pydantic" |
| 101 | + |
| 102 | +* Good, because it's flexible: |
| 103 | +1. There is a `@validate_arguments` decorator, which lets us decide which |
| 104 | +functions to validate, including functions outside classes. |
| 105 | +2. There is a `@validator` decorator, which allows deeper validation of our |
| 106 | +class attributes beyond type checking. |
| 107 | +3. We can use an embedded `Config` class inside our classes, which allows for |
| 108 | +even more customization (for example, enforcing validation on assignment). |
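
A minimal sketch of the three features above, written against the pydantic 1.x API that was current at the time of writing (pydantic 2 still accepts these names but marks them as deprecated). `User` and `has_enough_signatures` are illustrative names, not part of `tuf`.

```python
from pydantic import BaseModel, ValidationError, validate_arguments, validator


class User(BaseModel):
    id: int
    nickname: str

    class Config:
        # 3. Re-validate fields whenever they are assigned after creation.
        validate_assignment = True

    # 2. Custom check that runs after the type check for `id`.
    @validator("id")
    def id_must_be_non_negative(cls, value):
        if value < 0:
            raise ValueError("id is expected to be a non-negative number")
        return value


# 1. Opt-in argument validation for a function outside any class.
@validate_arguments
def has_enough_signatures(signatures: int, threshold: int) -> bool:
    return signatures >= threshold


user = User(id=1, nickname="alice")
try:
    user.id = -5  # rejected because of validate_assignment
except ValidationError as error:
    print(error)
```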
| 109 | + |
| 110 | +* Good, because (according to its own benchmarks) `pydantic` is faster than |
| 111 | +other validation libraries, including `marshmallow`, the other third-party |
| 112 | +library we considered. |
| 113 | +See: https://pydantic-docs.helpmanual.io/benchmarks/ |
| 114 | + |
| 115 | +* Good, because it relies on standard Python type hints (Python 3.6 onwards). |
| 116 | + |
| 117 | +* Bad, because this library **has not yet implemented** a `strict` mode and |
| 118 | +the default behaviour when validating an argument or field is to **try to |
| 119 | +cast the received value to the expected type** (e.g. `"1"` becomes `1` for an `int` field). |
| 120 | +To get strict behaviour, we would have to add it manually through |
| 121 | +`validators` that run before the cast (`pre=True`). |
| 122 | +See: https://github.com/samuelcolvin/pydantic/issues/1098 |
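
A minimal sketch of the difference, again assuming the pydantic 1.x API. `Lax` shows the default cast; `Strict` adds a `pre=True` validator that rejects anything that is not already an `int` before pydantic attempts the cast. Both class names are illustrative. The validator raises `ValueError`, which pydantic reports as a `ValidationError`.

```python
from pydantic import BaseModel, ValidationError, validator


class Lax(BaseModel):
    version: int  # default behaviour: "1" is cast to 1


class Strict(BaseModel):
    version: int

    # pre=True runs this check on the raw input, before pydantic's own cast.
    @validator("version", pre=True)
    def version_must_be_int(cls, value):
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError("version must be an int")
        return value


print(Lax(version="1").version)  # prints 1
try:
    Strict(version="1")
except ValidationError as error:
    print(error)
```

Such a validator would be needed for every field we want to keep strict.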
| 123 | + |
| 124 | +* Bad, because there is a learning curve when using `pydantic`. |
| 125 | +1. For example, when I had to handle the `_type` attribute in `Signed`, it took me |
| 126 | +a lot of reading to understand that standard attributes whose names begin with |
| 127 | +"_" are ignored. The `_type` attribute can only be declared as a `PrivateAttr` |
| 128 | +(defined in `pydantic`), even though we don't handle it as a typical private |
| 129 | +attribute. |
| 130 | +2. Also, I had difficulties using `pydantic` with inheritance: |
| 131 | +initializing and validating new objects was tricky. |
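
A minimal sketch of the `_type` surprise, assuming pydantic 1.7 or later (where `PrivateAttr` exists). The `Signed` model here is a hypothetical, heavily simplified stand-in, not the actual `tuf` class.

```python
from pydantic import BaseModel, PrivateAttr


class Signed(BaseModel):  # hypothetical, simplified model
    version: int
    # A plain `_type: str` would not become a field: pydantic skips names
    # starting with "_". PrivateAttr keeps it as a private, per-instance value.
    _type: str = PrivateAttr(default="root")


signed = Signed(version=1, _type="targets")
print(signed._type)  # prints "root": the `_type` argument was silently ignored
print(signed.dict())  # prints {'version': 1}
```

Because `_type` is not a field, it is neither validated from input nor included when the model is serialized.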
| 132 | + |
| 133 | +* Bad, because it adds two new dependencies: `pydantic` and `typing-extensions`. |
| 134 | +We determined this by performing the following steps: |
| 135 | +1. Created a fresh virtual environment with Python 3.8. |
| 136 | +2. Installed all dependencies from `requirements-dev.txt` in `tuf`. |
| 137 | +3. Installed `pydantic` with `pip install pydantic`. |
| 138 | + |
| 139 | +## Links |
| 140 | +* [in-toto ValidationMixin](https://github.com/in-toto/in-toto/blob/74da7a/in_toto/models/common.py#L27-L40) |
| 141 | +* [ValidationMixin usage](https://github.com/in-toto/in-toto/blob/74da7a/in_toto/models/layout.py#L420-L438) |
| 142 | +* [Pydantic documentation](https://pydantic-docs.helpmanual.io/) |
| 143 | + |
| 144 | +## Decision Outcome |
| 145 | + |
| 146 | +*TODO: Make and describe the decision* |