
Avoid performing ** operations on values greater than 1e5 #1610


Merged · 1 commit · Jun 21, 2022

Conversation

@jacobtylerwalls (Member)

Description

Avoid actually calculating ** operations on values greater than 1e5.

Type of Changes

βœ“ πŸ”¨ Refactoring

Related Issue

Closes pylint-dev/pylint#6745
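The change described in the title amounts to a small guard before constant-folding a `**` expression. A minimal sketch of the idea, assuming a hypothetical helper (`safe_pow` and `HUGE` are illustrative names, not astroid's actual API):

```python
# Illustrative sketch of the guard in this PR: before folding a ** node,
# bail out when an operand is large enough that computing the result
# could hang the linter. Names here are hypothetical, not astroid's API.

HUGE = 1e5  # threshold from the PR title: values greater than 1e5

def safe_pow(left, right):
    """Return left ** right, or None (treated as 'uninferable') when an
    operand is so large that the exponentiation could blow up."""
    for operand in (left, right):
        if isinstance(operand, (int, float)) and abs(operand) > HUGE:
            return None  # give up instead of building a gigantic integer
    return left ** right
```

In astroid itself the "give up" path would yield its `Uninferable` sentinel rather than `None`; the point is only that the check costs two comparisons, while the computation it skips can take arbitrarily long.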

@coveralls

Pull Request Test Coverage Report for Build 2468595687

  • 3 of 3 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.002%) to 92.221%

Totals Coverage Status
  • Change from base Build 2468320632: +0.002%
  • Covered Lines: 9342
  • Relevant Lines: 10130

πŸ’› - Coveralls

@Pierre-Sassoulas (Member) left a comment


Thank you for working on this! I thought about that issue a little, and I was wondering about a timeout with a really small threshold (0.01s?) and the possibility to relaunch with a flag if you really want the result. Knowing in advance that something will take a long time to calculate is an NP problem; seeing that it took longer than x is a P problem. How long does 9999 ** 9999 take, for example? How long does any given mathematical calculation actually take? I don't think we want to add a bunch of conditions like that for each possible formula. What do you think?

@jacobtylerwalls (Member, Author)

I don't think we want to add a bunch of conditions like that for each possible formula.

Oh I agree, I just didn't think we were likely to run into any additional cases. Exponents seemed like the only case that could blow up.

I was wondering about a time out with a really small threshold (0.01s?)

Interesting. I'm not aware of a way to interrupt a call like this; do you have a suggestion?

@Pierre-Sassoulas (Member)

Exponents seemed like the only case that could blow up.

A lot of things have a tendency to surprise us when pylint is run on a lot of unexpected code πŸ˜„ I don't have an example of a calculation that would take a long time, but I think it will always be possible to construct one. Maybe it's time to open a cops-and-robbers question on the Code Golf Stack Exchange, with one team trying to catch pathological cases and the other trying to make the inference slow πŸ˜„! But maybe it's enough to handle this case only.

I'm not aware of a means to interrupt a call like this, do you have a suggestion?

Actually, doing it cross-platform and multiprocess-compatible is not that easy. (I was thinking of signal for that, but I checked and it turns out it only works on Unix.) Do we need to handle multiprocessing for that? It's possible that we're always inferring on a single thread. We have a max_inference option where we just return Uninferable if we recurse more than 100 times (or 500, I don't remember). Having a time limit for inference would be somewhat similar and "solve" all pathological cases at once (we had some over the years, especially with pandas/numpy's genuinely large recursion).
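The signal-based approach mentioned above could look roughly like the following. This is a sketch only, and as noted it is Unix-only (`signal.alarm` does not exist on Windows, and SIGALRM handlers can only be installed in the main thread):

```python
# Unix-only sketch of a signal-based inference timeout. signal.alarm is
# unavailable on Windows, which is exactly the portability problem raised
# in this thread. Must run in the main thread.
import signal
from contextlib import contextmanager

@contextmanager
def time_limit(seconds):
    """Raise TimeoutError if the wrapped block runs longer than `seconds`."""
    def _raise(signum, frame):
        raise TimeoutError("inference took too long")

    old_handler = signal.signal(signal.SIGALRM, _raise)
    signal.alarm(seconds)  # whole seconds only; signal.setitimer allows fractions
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Usage would be something like `with time_limit(1): result = infer(node)`, falling back to an uninferable sentinel when TimeoutError is raised.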

@jacobtylerwalls (Member, Author)

Cool, I didn't know about signal, but yeah, signal.alarm is not available on Windows.

Problems like this are fun, but, in the interest of expediency, I guess I was thinking a one-off special case would be enough on a "best efforts" basis.

FWIW, 1e4 values seem to ** quickly; it's just 1e5 values that start chewing up resources. And ** on float operands short-circuits straight to OverflowError.
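The asymmetry described here is easy to demonstrate (assuming CPython): float exponentiation fails fast because the result exceeds the float range, while integer exponentiation always "succeeds" by building an arbitrary-precision result, which is what chews up time and memory for large operands.

```python
# Float ** overflows immediately instead of computing anything huge.
try:
    1e5 ** 1e5
    print("no overflow")
except OverflowError:
    print("float ** raises OverflowError right away")

# Int ** builds an exact big integer; 9999 ** 9999 is still quick, but the
# result is already over 100,000 bits wide, and it only gets worse from there.
# (bit_length avoids str(), which Python 3.11+ limits for huge ints.)
bits = (9999 ** 9999).bit_length()
print("9999 ** 9999 fits in", bits, "bits")
```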

@Pierre-Sassoulas (Member)

@cdce8p or @hippo91, do you have an opinion on a timeout option for expensive inference calls? To sum up the discussion: the idea would be to cut off inference that takes longer than a time threshold, just as we currently cut off inference that recurses above a threshold of calls.

@jacobtylerwalls (Member, Author)

the idea would be to cut inference taking too long after a threshold time

I love the idea, but are we aware of a way to accomplish this on Windows?

@Pierre-Sassoulas (Member)

It seems you need to use multiprocessing or threads, so it's harder than I thought it would be with signal, but not impossible. (I did not search thoroughly for a Windows solution, as this fix could be good enough; maybe we don't need the timeout, depending on what astroid's experts think πŸ˜„.)
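For reference, one cross-platform possibility along the lines sketched above: since a Python thread cannot be forcibly interrupted, the usual portable approach runs the computation in a worker process and terminates it on timeout. This is only a sketch (`pow_with_timeout` is a hypothetical helper, not anything in astroid), and it glosses over the real costs this thread worries about: process startup, pickling, and start-method differences across platforms.

```python
# Cross-platform sketch of a process-based timeout: run the computation in a
# child process, wait with a deadline, and kill the child if it overruns.
import multiprocessing as mp

def _worker(base, exp, conn):
    # Runs in the child process; sends the result back through a pipe.
    conn.send(base ** exp)
    conn.close()

def pow_with_timeout(base, exp, seconds):
    """Return base ** exp, or None if it does not finish within `seconds`."""
    parent_end, child_end = mp.Pipe(duplex=False)
    proc = mp.Process(target=_worker, args=(base, exp, child_end))
    proc.start()
    proc.join(seconds)
    if proc.is_alive():      # still computing: kill it and give up
        proc.terminate()
        proc.join()
        return None
    return parent_end.recv() if parent_end.poll() else None

if __name__ == "__main__":
    print(pow_with_timeout(2, 10, 1.0))  # 1024
```

Note that `terminate()` abandons work rather than cancelling it gracefully, and very large results can stall on the pipe, which hints at why the maintainers below are wary of the added complexity.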

@hippo91 (Contributor)

hippo91 commented Jun 10, 2022

@Pierre-Sassoulas I think it is a good idea to limit the time spent in the inference process, but I would like to hear @cdce8p's opinion on this...

@cdce8p (Member)

cdce8p commented Jun 21, 2022

@cdce8p or @hippo91, do you have an opinion on a timeout option for expensive inference calls? To sum up the discussion: the idea would be to cut off inference that takes longer than a time threshold, just as we currently cut off inference that recurses above a threshold of calls.

I'm skeptical that it's truly worth the effort. My understanding is that canceling recursive calls above a threshold is comparatively easy. Any timeout-based solution adds extra complexity that needs to be maintained, even if we can avoid threading/multiprocessing. Additionally, it will make debugging more difficult and might even lead to non-deterministic results, which in itself can be a nightmare. In the end we probably have to ask ourselves whether it's worth it, or whether there are better/easier alternatives.

For instance, I do like this PR. It will provide a speedup in some edge cases without adding much complexity.


Successfully merging this pull request may close these issues.

Slow performance with large numbers
6 participants