Skip to content

Move PHP default threshold to 28 #97

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 8, 2016
Merged

Conversation

pbrisbin
Copy link
Contributor

@pbrisbin pbrisbin commented Feb 8, 2016

When analyzing the WordPress project at the current default mass threshold of
10, it was observed that Duplications found only by Platform had masses of 27
or less, while Duplications found by both Platform and Classic had masses of 28
or greater. Note: there were no Duplications found only by Classic.

Therefore, a threshold of 28 should ensure an analysis on Platform emits the
same Duplication issues as on Classic.

/cc @codeclimate/review @noahd1

I think this could be shipped without a Changelog. It should cause a small
uptick in GPA. For example, WordPress went from 0.85 to 0.88.

When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by
Platform had masses of 27 or less, while Duplications found by both
Platform and Classic had masses of 28 or greater. Note: there were no
Duplications found only by Classic.

Therefore, a threshold of 28 should ensure an analysis on Platform emits
the same Duplication issues as on Classic.
@pbrisbin pbrisbin force-pushed the pb-php-default-threshold branch from 7ddb13c to 944ce64 Compare February 8, 2016 21:19
@gdiggs
Copy link
Contributor

gdiggs commented Feb 8, 2016

LGTM

@ABaldwinHunter
Copy link
Contributor

@pbrisbin sounds good for threshold change.

Since you're changing that, can you also change the points per from 100_000 to 50_000 to match Classic as well?
https://github.com/codeclimate/analyzer/blob/master/lib/quality/rubric.rb#L20

@pbrisbin
Copy link
Contributor Author

pbrisbin commented Feb 8, 2016

@ABaldwinHunter I'd rather make smaller individual changes. I haven't looked at tuning points yet, just working on the threshold.

@ABaldwinHunter
Copy link
Contributor

@pbrisbin okay suit yourself. I'm surprised to see specs pass for php

pbrisbin added a commit that referenced this pull request Feb 8, 2016
@pbrisbin pbrisbin merged commit 7132fc9 into master Feb 8, 2016
@pbrisbin pbrisbin deleted the pb-php-default-threshold branch February 8, 2016 22:59
@noahd1
Copy link
Contributor

noahd1 commented Feb 9, 2016

"When analyzing the WordPress project at the current default mass threshold of 10, it was observed that Duplications found only by Platform had masses of 27 or less, while Duplications found by both Platform and Classic had masses of 28 or greater. Note: there were no Duplications found only by Classic."

Confused by this explanation: is there anything surprising about this behavior? Seems like that's how threshold is supposed to work?

@pbrisbin
Copy link
Contributor Author

pbrisbin commented Feb 9, 2016

Did you expect surprising? It's just the justification for choosing 28 vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected] wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by Platform
had masses of 27 or less, while Duplications found by both Platform and
Classic had masses of 28 or greater. Note: there were no Duplications found
only by Classic."

Confused by this explanation: is there anything surprising about this
behavior? Seems like that's how threshold is supposed to work?


Reply to this email directly or view it on GitHub
#97 (comment)
.

@noahd1
Copy link
Contributor

noahd1 commented Feb 9, 2016

It seems like you're saying "I picked 28 because it's what we had on
classic" in a longer way. Or am I misunderstanding?
On Mon, Feb 8, 2016 at 8:20 PM pat brisbin [email protected] wrote:

Did you expect surprising? It's just the justification for choosing 28 vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected] wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by Platform
had masses of 27 or less, while Duplications found by both Platform and
Classic had masses of 28 or greater. Note: there were no Duplications
found
only by Classic."

Confused by this explanation: is there anything surprising about this
behavior? Seems like that's how threshold is supposed to work?


Reply to this email directly or view it on GitHub
<
#97 (comment)

.


Reply to this email directly or view it on GitHub
#97 (comment)
.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com

@pbrisbin
Copy link
Contributor Author

pbrisbin commented Feb 9, 2016

The masses are different between platform and classic, so it's not quite
that simple, but yes that was the intention.

I picked the mass that, in platform, caused the same instances of
duplication to be reported as in classic, for WordPress. Is that any
clearer / shorter?

On Mon, Feb 8, 2016, 21:09 Noah Davis [email protected] wrote:

It seems like you're saying "I picked 28 because it's what we had on
classic" in a longer way. Or am I misunderstanding?
On Mon, Feb 8, 2016 at 8:20 PM pat brisbin [email protected]
wrote:

Did you expect surprising? It's just the justification for choosing 28 vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected] wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by
Platform
had masses of 27 or less, while Duplications found by both Platform and
Classic had masses of 28 or greater. Note: there were no Duplications
found
only by Classic."

Confused by this explanation: is there anything surprising about this
behavior? Seems like that's how threshold is supposed to work?


Reply to this email directly or view it on GitHub
<

#97 (comment)

.


Reply to this email directly or view it on GitHub
<
#97 (comment)

.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com


Reply to this email directly or view it on GitHub
#97 (comment)
.

@noahd1
Copy link
Contributor

noahd1 commented Feb 9, 2016

Yeah sorry, not nit-picking English just trying to understand your thought
process.

Honestly I'm still confused, but if it's clearer in person don't want to
waste your time.

To me it seems like you're describing behavior that is the way it is by
definition. Aka, classic won't have issues with mass less than 27 by
definition (because that's it's configured mass threshold). And platform
(previous to your change) would also find these issues (and more ) because
it's threshold was 10.

I'm also confused that you state that the masses differ between classic and
platform but advocate for the same mass threshold. I would have expected
the one with a larger mass to have a larger default mass threshold. (Or
perhaps the same if they don't differ too much? Is that what you're saying)

Sorry if I'm uber confused/confusing things.

On Mon, Feb 8, 2016 at 9:21 PM pat brisbin [email protected] wrote:

The masses are different between platform and classic, so it's not quite
that simple, but yes that was the intention.

I picked the mass that, in platform, caused the same instances of
duplication to be reported as in classic, for WordPress. Is that any
clearer / shorter?

On Mon, Feb 8, 2016, 21:09 Noah Davis [email protected] wrote:

It seems like you're saying "I picked 28 because it's what we had on
classic" in a longer way. Or am I misunderstanding?
On Mon, Feb 8, 2016 at 8:20 PM pat brisbin [email protected]
wrote:

Did you expect surprising? It's just the justification for choosing 28
vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected]
wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by
Platform
had masses of 27 or less, while Duplications found by both Platform
and
Classic had masses of 28 or greater. Note: there were no Duplications
found
only by Classic."

Confused by this explanation: is there anything surprising about this
behavior? Seems like that's how threshold is supposed to work?


Reply to this email directly or view it on GitHub
<

#97 (comment)

.


Reply to this email directly or view it on GitHub
<

#97 (comment)

.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com


Reply to this email directly or view it on GitHub
<
#97 (comment)

.


Reply to this email directly or view it on GitHub
#97 (comment)
.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com

@pbrisbin
Copy link
Contributor Author

pbrisbin commented Feb 9, 2016

I'll try to clarify tomorrow when not on my phone. If we need to chat in
person, we can do that too.

On Mon, Feb 8, 2016, 21:55 Noah Davis [email protected] wrote:

Yeah sorry, not nit-picking English just trying to understand your thought
process.

Honestly I'm still confused, but if it's clearer in person don't want to
waste your time.

To me it seems like you're describing behavior that is the way it is by
definition. Aka, classic won't have issues with mass less than 27 by
definition (because that's it's configured mass threshold). And platform
(previous to your change) would also find these issues (and more ) because
it's threshold was 10.

I'm also confused that you state that the masses differ between classic and
platform but advocate for the same mass threshold. I would have expected
the one with a larger mass to have a larger default mass threshold. (Or
perhaps the same if they don't differ too much? Is that what you're saying)

Sorry if I'm uber confused/confusing things.

On Mon, Feb 8, 2016 at 9:21 PM pat brisbin [email protected]
wrote:

The masses are different between platform and classic, so it's not quite
that simple, but yes that was the intention.

I picked the mass that, in platform, caused the same instances of
duplication to be reported as in classic, for WordPress. Is that any
clearer / shorter?

On Mon, Feb 8, 2016, 21:09 Noah Davis [email protected] wrote:

It seems like you're saying "I picked 28 because it's what we had on
classic" in a longer way. Or am I misunderstanding?
On Mon, Feb 8, 2016 at 8:20 PM pat brisbin [email protected]
wrote:

Did you expect surprising? It's just the justification for choosing
28
vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected]
wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by
Platform
had masses of 27 or less, while Duplications found by both Platform
and
Classic had masses of 28 or greater. Note: there were no
Duplications
found
only by Classic."

Confused by this explanation: is there anything surprising about
this
behavior? Seems like that's how threshold is supposed to work?


Reply to this email directly or view it on GitHub
<

#97 (comment)

.


Reply to this email directly or view it on GitHub
<

#97 (comment)

.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com


Reply to this email directly or view it on GitHub
<

#97 (comment)

.


Reply to this email directly or view it on GitHub
<
#97 (comment)

.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com


Reply to this email directly or view it on GitHub
#97 (comment)
.

@pbrisbin
Copy link
Contributor Author

pbrisbin commented Feb 9, 2016

So the problem here is I was confused :)


Basically, I went through this process:

View duplications in a file under Platform analysis

Instance Mass
1 10
2 18
3 47
4 68

(Note: these numbers are made up)

Compare the same file under Classic analysis

Instance Mass (platform) Mass (classic)
1 10 not-visible
2 18 not-visible
3 47 58
4 68 102

This told me a few things:

  • Platform reports different masses than Classic (more on this later)
  • Platform should use some threshold between 19 and 47 to produce the same set of instances as classic

Then I moved to the console. There I listed out all the masses in Platform and Classic for this project (WordPress/WordPress). I found that instances found in both case were all (platform) mass 28 or more and all instances found only in platform had (platform) mass 27 or less.

This is how I arrived at 28.

Unfortunately, I never looked at the classic code base. I assumed that, since the masses are different, the threshold must be different. And since I'd just determined one analytically, I didn't think I needed to.

Turns out, classic does use 28! @noahd1 knew that, and assumed I knew that, and my attempts at explanation were that much more confusing.

So how can that be? If the masses are different, why does the same threshold work?

Well, originally I noticed the difference in mass between platform and classic in the UI. I never actually compared them beyond that (even in my console analysis, I was only looking at platform masses and which instances appeared in one or both sets).

Turns out, classic is actually showing an "effective" mass in that UI; one which takes number of occurrences and identical-vs-similar into account. In actuality, platform and classic produce the same "raw" mass, and so they can and should use the same threshold.

My analysis in the console did result in the right number (28) and the fact that it does match classic (now that we know that, and know why) further bolsters its appropriateness. And I apologize for not fully understanding what was going on and therefore being unable to explain things correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants