Skip to content

Oonirun v2 1 #962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 70 commits into
base: master
Choose a base branch
from
Open

Oonirun v2 1 #962

wants to merge 70 commits into from

Conversation

LDiazN
Copy link
Contributor

@LDiazN LDiazN commented May 23, 2025

Add DB model migrations, new view models and some tests. Related to #955

@LDiazN LDiazN requested a review from hellais May 23, 2025 08:05
@LDiazN LDiazN self-assigned this May 23, 2025
Copy link

codecov bot commented May 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.76%. Comparing base (d57a18f) to head (70240f3).
Report is 4 commits behind head on master.

❌ Your project check has failed because the head coverage (82.76%) is below the target coverage (95.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (d57a18f) and HEAD (70240f3). Click for more details.

HEAD has 4 uploads less than BASE
Flag BASE (d57a18f) HEAD (70240f3)
ooniauth 1 0
oonifindings 1 0
oonirun 1 0
ooniprobe 1 0
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #962      +/-   ##
==========================================
- Coverage   91.81%   82.76%   -9.05%     
==========================================
  Files          61       19      -42     
  Lines        5179     1938    -3241     
  Branches      339      208     -131     
==========================================
- Hits         4755     1604    -3151     
+ Misses        365      285      -80     
+ Partials       59       49      -10     
Flag Coverage Δ
ooniauth ?
oonifindings ?
oonimeasurements 82.76% <ø> (+0.13%) ⬆️
ooniprobe ?
oonirun ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@@ -151,9 +171,9 @@ class OONIRunLinkCreateEdit(OONIRunLinkBase):
)
def create_oonirun_link(
create_request: OONIRunLinkCreateEdit,
db: PostgresSession,
Copy link
Member

@hellais hellais May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting way to defined the dependencies I wasn't aware of: https://fastapi.tiangolo.com/tutorial/dependencies/#declare-the-dependency-in-the-dependant.

I wonder if at some point we should refactor the other dependencies to use it too (note: we should do this as part of a future PR, not here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it more clear this is a dependency injection maybe we want to change the type name to DependsPostgresSession

@@ -41,6 +41,7 @@ class OONIRunLinkNettest(BaseModel):
inputs: List[str] = Field(
default=[], title="list of input dictionaries for the nettest"
)
# TODO(luis): Options and backend_options not in the new spec. Should be removed?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should for sure drop backend_options. I am not sure if it's useful to have a top level options that applies to all the inputs inside of the nettest or if it's redundant as we can include it directly inside of the inputs_extra.

@DecFox what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The engine does support having top level options and overwriting them with input level options if they are conflicting in nature. I would however say we get rid of options, since it also allows us to simplify input processing in the engine and reduce redundancy here. We can directly include them in inputs_extra

@@ -51,6 +52,24 @@ class OONIRunLinkNettest(BaseModel):
default=False, title="if this test should be enabled by default for manual runs"
)

targets_name: Optional[str] = Field(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do some validation on create and edit, that if targets_name is provided, you can't also provide input and inputs_extra, since these are to be populated based on the value of targets_name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it an error to not provide any of them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an error, because you can have a test that has no input for it (eg. whatsapp)

@hellais
Copy link
Member

hellais commented May 26, 2025

This is a good start, I left a few comments for things to improve or change. Regarding handling of the targets_name, I would place logic for that inside of the get_nettests function and have in it a hardcoded list of supported targets_name that we know how to handle.

For the moment we only support the targets_name of websites_list_prioritized, which uses the pio code to generate a list following the same logic of check (see: https://github.com/ooni/backend/blob/master/api/ooniapi/probe_services.py#L234). You will need to pass into it all the metadata of the probe and the is_charging, is_manual_run flags.

date_created = oonirun_link.nettests[0].date_created
nettests = []
for nt in oonirun_link.nettests:
if revision and nt.revision != revision:
continue
date_created = nt.date_created
inputs, inputs_extra = nt.inputs, nt.inputs_extra
targets_name = nt.targets_name
if nt.targets_name is not None and meta is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that meta should also be an assertion rather than being in the if statement. The reason for that is that we want to fail hard on the caller of the function not having passed a required arguments, when we are dealing with a dynamic test list target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that this function is also called when no dynamic test list is required. If you don't pass the meta it just gives you whatever is in the inputs field in the DB (empty for nettests with dynamic test lists)

If we require the meta here we have to:

  1. Option 1: Always compute a dynamic list when this function is called and and making the meta required for endpoints that don't currently use it. Ex: /v2/oonirun/links that lists descriptors, /v2/oonirun/links/{oonirun_link_id}/full-descriptor/{revision_number} to get a descriptor

  2. Option 2: We make a new function to get the nettest from the DB without the dynamic test list

I would suggest to split the dynamic test list calculation and fetching from the DB into two different functions, since these are two different tasks, and only compute the dynamic test lists explicitly when needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't nt.targets_name is not None imply that it's getting called as a dynamic test list and hence meta is required?

If this function is called when it's not dynamic, shouldn't nt.targets_name be none and therefore the assertion will not be triggered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK there are endpoints where we want the nettest without computing the dynamic list, like /v2/oonirun/links to list descriptors

Or perhaps my understanding is incorrect and we do want the nettest in all cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing this on slack, I think now I have a better understanding of the issue. I like the solution you proposed in the last commit of splitting out the two functions.

For me that's a good way to to do it 👍

@router.get(
@staticmethod
def get_header_pattern() -> str:
return r'^([a-zA-Z0-9]+),([a-zA-Z0-9]+) \(([a-zA-Z0-9]+)\)$'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should simplify the format of this such that it's possible to parse it without the need of a regexp.

How about we just say the format is {probe_asn},{probe_cc},{network_type}? This way we can just do a split on , and get the values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, I will still keep the regex since it's used for validation but a simpler format would be better (we also have to update the spec to reflect this)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was also related to the use of regexp for parsing this field. If we can get rid of regular expressions it's probably better due to risks associated with regular expression DoS. This pattern should be safe, but if we can avoid regular expressions altogether it's going to be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of this kind of attack, thanks for sharing it!
I will change the parsing to use split by "," then

@LDiazN LDiazN marked this pull request as ready for review June 3, 2025 10:59
@@ -559,8 +708,11 @@ def list_oonirun_links(
Optional[bool],
Query(description="List also expired descriptors"),
] = None,
only_latest: Annotated[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop this parameter from this PR.

description="Expected format: <software_name>/<software_version> (<platform>) <engine_name>/<engine_version> (<engine_version_full>)"
),
] = None,
credentials : Annotated[Optional[bytes], Header(description="base64 encoded OONI anonymous credentials")] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call this header X-OONI-Credentials

class OonirunMeta(BaseModel):
run_type : str = Field(description="Run type", pattern="^(timed|manual)$")
is_charging : bool = Field(description="If the probe is charging")
probe_asn : str = Field(pattern=r"^([a-zA-Z0-9]+)$")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern seems a bit too lax. I would say it should be: ^(AS)?([0-9]{1,10})$

is_charging : bool = Field(description="If the probe is charging")
probe_asn : str = Field(pattern=r"^([a-zA-Z0-9]+)$")
probe_cc : str = Field(description="Country code. Ex: VE")
network_type : str = Field(description="Ex: wifi")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have an explicit list of valid network_type. @aanorbel @sdsantos can you provide us this list so we can add it here to the validation list?

Copy link
Member

@aanorbel aanorbel Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • vpn
  • wifi
  • mobile
  • wired_ethernet
  • no_internet
  • unknown

@@ -33,16 +37,26 @@
def utcnow_seconds():
return datetime.now(timezone.utc).replace(microsecond=0)

class OonirunMeta(BaseModel):
run_type : str = Field(description="Run type", pattern="^(timed|manual)$")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aanorbel @sdsantos @DecFox have other versions of OONI Probe used something other than timed and manual for run_type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far no version of OONI Probe has used any other type.


def downgrade() -> None:
op.drop_column("oonirun_nettest", "targets_name")
op.drop_column("oonirun_nettest", "inputs_extra")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add to the DB migration also the drop of the backend_options column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that break old probes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Older probes aren't using it and we have anyways dropped it from the JSON API format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, then I will drop the backend and backend_options columns and remove it from the API


@pytest.fixture(scope="module")
def fixtures_data_dir():
yield Path("tests/fixtures/data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest rooting this path into THIS_DIR to make sure the tests will run properly irrespective of where you invoke pytest from (see: https://github.com/ooni/backend/blob/master/ooniapi/services/oonimeasurements/tests/conftest.py#L15).

Fixtures should also go inside of the conftest.py file instead of the test file itself

query = "INSERT INTO url_priorities (sign, category_code, cc, domain, url, priority) VALUES"
insert_click(clickhouse_db, query, [values])
yield
clickhouse_db.execute("DELETE FROM url_priorities WHERE domain='www.ooni.com'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't domain here be ooni.org?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was a mistake, thanks for catching it!

# TODO(luis) Finish this test


# TODO(luis) finish this test for checking the parsing of user agent headers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to address these TODO in this PR or as following ones? If the later, then it would be a good idea to make an issue for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can mark it as complete, because I put that before doing integration tests and the simpler header format so I don't think there's something else to add in this tests

Copy link
Member

@hellais hellais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments for things to go over before merging.

Also, I noticed in several places there is some odd formatting (eg. there being an extra space around :, like "foo" : "bar"). Could you run black on all the files you edited to make the formatting consistent?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants