people-and-planet-ai.image-classification.e2e_test: test_predict failed #6463

Closed
flaky-bot bot opened this issue Jul 22, 2021 · 12 comments · Fixed by #6668
Assignees
Labels
api: dataflow Issues related to the Dataflow API. flakybot: flaky Tells the Flaky Bot not to close or comment on this issue. flakybot: issue An issue filed by the Flaky Bot. Should not be added manually. priority: p2 Moderately-important priority. Fix may not be included in next release. samples Issues that are directly related to samples. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@flaky-bot

flaky-bot bot commented Jul 22, 2021

Note: #6157 was also for this test, but it was closed more than 10 days ago. So, I didn't mark it flaky.


commit: 3b5a4bb
buildURL: Build Status, Sponge
status: failed

Test output
Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/e2e_test.py", line 112, in model_endpoint_id
    PROJECT, REGION, MODEL_PATH, MODEL_ENDPOINT, endpoint_id
  File "/workspace/people-and-planet-ai/image-classification/deploy_model.py", line 80, in deploy_model
    deployed_model = response.result()
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py", line 135, in result
    raise self._exception
google.api_core.exceptions.InternalServerError: 500 INTERNAL
@flaky-bot flaky-bot bot added flakybot: issue An issue filed by the Flaky Bot. Should not be added manually. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Jul 22, 2021
@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label Jul 22, 2021
@flaky-bot flaky-bot bot added the flakybot: flaky Tells the Flaky Bot not to close or comment on this issue. label Jul 22, 2021
@flaky-bot
Author

flaky-bot bot commented Jul 22, 2021

Looks like this issue is flaky. 😟

I'm going to leave this open and stop commenting.

A human should fix and close this.


When run at the same commit (3b5a4bb), this test passed in one build (Build Status, Sponge) and failed in another build (Build Status, Sponge).

@dandhlee
Collaborator

Oh no it's back D:

@dandhlee
Collaborator

dandhlee commented Jul 22, 2021

Closing this for now. This is the same issue we had the last time this test was flaky, and I'm hoping it's a one-time flake. If it happens again, I'll reach back out to the product team.

@flaky-bot
Author

flaky-bot bot commented Jul 30, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: 73fa296
buildURL: Build Status, Sponge
status: failed

Test output
Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.NOT_FOUND
	details = "Model `projects/python-docs-samples-tests/locations/us-central1/models/1590773423066316800` is not found."
	debug_error_string = "{"created":"@1627637689.682554879","description":"Error received from peer ipv4:74.125.142.95:443","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Model `projects/python-docs-samples-tests/locations/us-central1/models/1590773423066316800` is not found.","grpc_status":5}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/e2e_test.py", line 112, in model_endpoint_id
    PROJECT, REGION, MODEL_PATH, MODEL_ENDPOINT, endpoint_id
  File "/workspace/people-and-planet-ai/image-classification/deploy_model.py", line 77, in deploy_model
    traffic_split={"0": 100},
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/google/cloud/aiplatform_v1/services/endpoint_service/client.py", line 905, in deploy_model
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.NotFound: 404 Model projects/python-docs-samples-tests/locations/us-central1/models/1590773423066316800 is not found.

@flaky-bot flaky-bot bot reopened this Jul 30, 2021
@parthea
Collaborator

parthea commented Jul 30, 2021

This error (`google.api_core.exceptions.NotFound: 404 Model`) is different from the one in the original report, which was an internal 500 error. I'm going to close this issue and see if it comes back.

@parthea parthea closed this as completed Jul 30, 2021
@flaky-bot
Author

flaky-bot bot commented Jul 30, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: 73fa296
buildURL: Build Status, Sponge
status: failed

Test output
Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-8/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-8/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.NOT_FOUND
	details = "Model `projects/python-docs-samples-tests/locations/us-central1/models/1590773423066316800` is not found."
	debug_error_string = "{"created":"@1627645645.653339393","description":"Error received from peer ipv4:74.125.142.95:443","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Model `projects/python-docs-samples-tests/locations/us-central1/models/1590773423066316800` is not found.","grpc_status":5}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/e2e_test.py", line 111, in model_endpoint_id
    deployed_model_id = deploy_model.deploy_model(
  File "/workspace/people-and-planet-ai/image-classification/deploy_model.py", line 65, in deploy_model
    response = client.deploy_model(
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-8/lib/python3.8/site-packages/google/cloud/aiplatform_v1/services/endpoint_service/client.py", line 905, in deploy_model
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.NotFound: 404 Model projects/python-docs-samples-tests/locations/us-central1/models/1590773423066316800 is not found.

@flaky-bot flaky-bot bot reopened this Jul 30, 2021
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Aug 1, 2021
@busunkim96
Contributor

@leahecole I remember you mentioning a model had disappeared, is this the same issue?

@busunkim96
Contributor

#6503 looks like the fix.

@busunkim96 busunkim96 added the api: dataflow Issues related to the Dataflow API. label Aug 3, 2021
@busunkim96 busunkim96 assigned davidcavazos and unassigned engelke Aug 3, 2021
@flaky-bot flaky-bot bot reopened this Aug 15, 2021
@flaky-bot
Author

flaky-bot bot commented Aug 15, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: 7fa18f1
buildURL: Build Status, Sponge
status: failed

Test output
Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.6/http/client.py", line 1379, in getresponse
    response.begin()
  File "/usr/local/lib/python3.6/http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.6/http/client.py", line 272, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.6/ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/usr/local/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.6/http/client.py", line 1379, in getresponse
    response.begin()
  File "/usr/local/lib/python3.6/http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.6/http/client.py", line 272, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.6/ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/usr/local/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/e2e_test.py", line 177, in test_predict
    image_file="animals/0036/0072.jpg", # tapirus indicus
  File "/workspace/people-and-planet-ai/image-classification/predict.py", line 46, in run
    image_bytes = requests.get(f"{base_url}/{image_file}").content
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-6/lib/python3.6/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

@busunkim96
Contributor

Connection reset by peer is a transient error, closing.

@flaky-bot
Author

flaky-bot bot commented Aug 23, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: 2e6131a
buildURL: Build Status, Sponge
status: failed

Test output
Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1369, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 310, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 271, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1369, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 310, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 271, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/people-and-planet-ai/image-classification/e2e_test.py", line 177, in test_predict
    image_file="animals/0036/0072.jpg", # tapirus indicus
  File "/workspace/people-and-planet-ai/image-classification/predict.py", line 46, in run
    image_bytes = requests.get(f"{base_url}/{image_file}").content
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/workspace/people-and-planet-ai/image-classification/.nox/py-3-7/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

@flaky-bot flaky-bot bot reopened this Aug 23, 2021
@meredithslota meredithslota removed the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Aug 28, 2021
@meredithslota meredithslota added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed 🚨 This issue needs some love. labels Aug 28, 2021
@meredithslota
Contributor

Ok. This seems real enough now, but David has been OOO. I'm dropping this to P2 since it's not (currently) blocking anything, and will ask David to look at it when he's back.

gcf-merge-on-green bot pushed a commit that referenced this issue Sep 8, 2021
…6668)

## Description

Fixes #6463

Even though `Connection reset by peer` is a transient error and we can't do much to solve it on our side, we now retry the request with exponential backoff so that a later attempt can succeed. This should decrease the number of flaky test runs.
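
For readers unfamiliar with the technique, here is a minimal sketch of retrying with exponential backoff. It is not the code merged in the PR: the helper name `retry_with_backoff`, its parameters, and the default values are all assumptions made for illustration, and it uses only the standard library.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,  # hypothetical default, not taken from the PR
    base_delay: float = 1.0,  # seconds to wait before the first retry
    max_delay: float = 60.0,  # upper bound on any single wait
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call func(), retrying on the given errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at
            # max_delay, plus up to 10% random jitter so that parallel test
            # runs do not all retry at the same moment.
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical usage for the image download seen in the traceback
# (requires `import requests`):
# image_bytes = retry_with_backoff(
#     lambda: requests.get(f"{base_url}/{image_file}").content,
#     retry_on=(requests.exceptions.ConnectionError,),
# )
```

Note that `requests.exceptions.ConnectionError` does not inherit from Python's built-in `ConnectionError`, so it has to be passed explicitly through `retry_on` as in the commented usage.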

## Checklist
- [ ] I have followed [Sample Guidelines from AUTHORING_GUIDE.MD](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/AUTHORING_GUIDE.md)
- [ ] README is updated to include [all relevant information](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/AUTHORING_GUIDE.md#readme-file)
- [ ] **Tests** pass:   `nox -s py-3.6` (see [Test Environment Setup](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/AUTHORING_GUIDE.md#test-environment-setup))
- [ ] **Lint** pass:   `nox -s lint` (see [Test Environment Setup](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/AUTHORING_GUIDE.md#test-environment-setup))
- [ ] These samples need a new **API enabled** in testing projects to pass (let us know which ones)
- [ ] These samples need a new/updated **env vars** in testing projects set to pass (let us know which ones)
- [ ] Please **merge** this PR for me once it is approved.
- [ ] This sample adds a new sample directory, and I updated the [CODEOWNERS file](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/.github/CODEOWNERS) with the codeowners for this sample
8 participants