Move PHP default threshold to 28 #97

pbrisbin · 2016-02-08T20:36:56Z

When analyzing the WordPress project at the current default mass threshold of
10, it was observed that Duplications found only by Platform had masses of 27
or less, while Duplications found by both Platform and Classic had masses of 28
or greater. Note: there were no Duplications found only by Classic.

Therefore, a threshold of 28 should ensure an analysis on Platform emits the
same Duplication issues as on Classic.

/cc @codeclimate/review @noahd1

I think this could be shipped without a Changelog. It should cause a small
uptick in GPA. For example, WordPress went from 0.85 to 0.88.

When analyzing the WordPress project at the current default mass threshold of 10, it was observed that Duplications found only by Platform had masses of 27 or less, while Duplications found by both Platform and Classic had masses of 28 or greater. Note: there were no Duplications found only by Classic. Therefore, a threshold of 28 should ensure an analysis on Platform emits the same Duplication issues as on Classic.

gdiggs · 2016-02-08T21:30:08Z

LGTM

ABaldwinHunter · 2016-02-08T22:46:21Z

@pbrisbin sounds good for threshold change.

Since you're changing that, can you also change the points per from 100_000 to 50_000 to match Classic as well?
https://github.com/codeclimate/analyzer/blob/master/lib/quality/rubric.rb#L20

pbrisbin · 2016-02-08T22:51:06Z

@ABaldwinHunter I'd rather make smaller individual changes. I haven't looked at tuning points yet, just working on the threshold.

ABaldwinHunter · 2016-02-08T22:56:50Z

@pbrisbin okay suit yourself. I'm surprised to see specs pass for php

Move PHP default threshold to 28

noahd1 · 2016-02-09T00:18:08Z

"When analyzing the WordPress project at the current default mass threshold of 10, it was observed that Duplications found only by Platform had masses of 27 or less, while Duplications found by both Platform and Classic had masses of 28 or greater. Note: there were no Duplications found only by Classic."

Confused by this explanation: is there anything surprising about this behavior? Seems like that's how threshold is supposed to work?

pbrisbin · 2016-02-09T01:20:09Z

Did you expect surprising? It's just the justification for choosing 28 vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected] wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by Platform
had masses of 27 or less, while Duplications found by both Platform and
Classic had masses of 28 or greater. Note: there were no Duplications found
only by Classic."

Confused by this explanation: is there anything surprising about this
behavior? Seems like that's how threshold is supposed to work?

—
Reply to this email directly or view it on GitHub
#97 (comment)
.

noahd1 · 2016-02-09T02:09:04Z

It seems like you're saying "I picked 28 because it's what we had on
classic" in a longer way. Or am I misunderstanding?
On Mon, Feb 8, 2016 at 8:20 PM pat brisbin [email protected] wrote:

Did you expect surprising? It's just the justification for choosing 28 vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected] wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by Platform
had masses of 27 or less, while Duplications found by both Platform and
Classic had masses of 28 or greater. Note: there were no Duplications
found
only by Classic."

Confused by this explanation: is there anything surprising about this
behavior? Seems like that's how threshold is supposed to work?

—
Reply to this email directly or view it on GitHub
<
#97 (comment)

.

—
Reply to this email directly or view it on GitHub
#97 (comment)
.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com

pbrisbin · 2016-02-09T02:21:25Z

The masses are different between platform and classic, so it's not quite
that simple, but yes that was the intention.

I picked the mass that, in platform, caused the same instances of
duplication to be reported as in classic, for WordPress. Is that any
clearer / shorter?

On Mon, Feb 8, 2016, 21:09 Noah Davis [email protected] wrote:

It seems like you're saying "I picked 28 because it's what we had on
classic" in a longer way. Or am I misunderstanding?
On Mon, Feb 8, 2016 at 8:20 PM pat brisbin [email protected]
wrote:

Did you expect surprising? It's just the justification for choosing 28 vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected] wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by
Platform
had masses of 27 or less, while Duplications found by both Platform and
Classic had masses of 28 or greater. Note: there were no Duplications
found
only by Classic."

Confused by this explanation: is there anything surprising about this
behavior? Seems like that's how threshold is supposed to work?

—
Reply to this email directly or view it on GitHub
<

#97 (comment)

.

—
Reply to this email directly or view it on GitHub
<
#97 (comment)

.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com

—
Reply to this email directly or view it on GitHub
#97 (comment)
.

noahd1 · 2016-02-09T02:55:00Z

Yeah sorry, not nit-picking English just trying to understand your thought
process.

Honestly I'm still confused, but if it's clearer in person don't want to
waste your time.

To me it seems like you're describing behavior that is the way it is by
definition. Aka, classic won't have issues with mass less than 27 by
definition (because that's it's configured mass threshold). And platform
(previous to your change) would also find these issues (and more ) because
it's threshold was 10.

I'm also confused that you state that the masses differ between classic and
platform but advocate for the same mass threshold. I would have expected
the one with a larger mass to have a larger default mass threshold. (Or
perhaps the same if they don't differ too much? Is that what you're saying)

Sorry if I'm uber confused/confusing things.

On Mon, Feb 8, 2016 at 9:21 PM pat brisbin [email protected] wrote:

The masses are different between platform and classic, so it's not quite
that simple, but yes that was the intention.

I picked the mass that, in platform, caused the same instances of
duplication to be reported as in classic, for WordPress. Is that any
clearer / shorter?

On Mon, Feb 8, 2016, 21:09 Noah Davis [email protected] wrote:

It seems like you're saying "I picked 28 because it's what we had on
classic" in a longer way. Or am I misunderstanding?
On Mon, Feb 8, 2016 at 8:20 PM pat brisbin [email protected]
wrote:

Did you expect surprising? It's just the justification for choosing 28
vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected]
wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by
Platform
had masses of 27 or less, while Duplications found by both Platform
and
Classic had masses of 28 or greater. Note: there were no Duplications
found
only by Classic."

Confused by this explanation: is there anything surprising about this
behavior? Seems like that's how threshold is supposed to work?

—
Reply to this email directly or view it on GitHub
<

#97 (comment)

.

—
Reply to this email directly or view it on GitHub
<

#97 (comment)

.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com

—
Reply to this email directly or view it on GitHub
<
#97 (comment)

.

—
Reply to this email directly or view it on GitHub
#97 (comment)
.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com

pbrisbin · 2016-02-09T02:59:36Z

I'll try to clarify tomorrow when not on my phone. If we need to chat in
person, we can do that too.

On Mon, Feb 8, 2016, 21:55 Noah Davis [email protected] wrote:

Yeah sorry, not nit-picking English just trying to understand your thought
process.

Honestly I'm still confused, but if it's clearer in person don't want to
waste your time.

To me it seems like you're describing behavior that is the way it is by
definition. Aka, classic won't have issues with mass less than 27 by
definition (because that's it's configured mass threshold). And platform
(previous to your change) would also find these issues (and more ) because
it's threshold was 10.

I'm also confused that you state that the masses differ between classic and
platform but advocate for the same mass threshold. I would have expected
the one with a larger mass to have a larger default mass threshold. (Or
perhaps the same if they don't differ too much? Is that what you're saying)

Sorry if I'm uber confused/confusing things.

On Mon, Feb 8, 2016 at 9:21 PM pat brisbin [email protected]
wrote:

The masses are different between platform and classic, so it's not quite
that simple, but yes that was the intention.

I picked the mass that, in platform, caused the same instances of
duplication to be reported as in classic, for WordPress. Is that any
clearer / shorter?

On Mon, Feb 8, 2016, 21:09 Noah Davis [email protected] wrote:

It seems like you're saying "I picked 28 because it's what we had on
classic" in a longer way. Or am I misunderstanding?
On Mon, Feb 8, 2016 at 8:20 PM pat brisbin [email protected]
wrote:

Did you expect surprising? It's just the justification for choosing
28
vs
any other number.

On Mon, Feb 8, 2016, 19:18 Noah Davis [email protected]
wrote:

"When analyzing the WordPress project at the current default mass
threshold of 10, it was observed that Duplications found only by
Platform
had masses of 27 or less, while Duplications found by both Platform
and
Classic had masses of 28 or greater. Note: there were no
Duplications
found
only by Classic."

Confused by this explanation: is there anything surprising about
this
behavior? Seems like that's how threshold is supposed to work?

—
Reply to this email directly or view it on GitHub
<

#97 (comment)

.

—
Reply to this email directly or view it on GitHub
<

#97 (comment)

.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com

—
Reply to this email directly or view it on GitHub
<

#97 (comment)

.

—
Reply to this email directly or view it on GitHub
<
#97 (comment)

.

Noah Davis, VP of Operations / [email protected] / We're hiring! /
http://jobs.codeclimate.com

—
Reply to this email directly or view it on GitHub
#97 (comment)
.

pbrisbin · 2016-02-09T17:43:23Z

So the problem here is I was confused :)

Basically, I went through this process:

View duplications in a file under Platform analysis

Instance	Mass
1	10
2	18
3	47
4	68

(Note: these numbers are made up)

Compare the same file under Classic analysis

Instance	Mass (platform)	Mass (classic)
1	10	not-visible
2	18	not-visible
3	47	58
4	68	102

This told me a few things:

Platform reports different masses than Classic (more on this later)
Platform should use some threshold between 19 and 47 to produce the same set of instances as classic

Then I moved to the console. There I listed out all the masses in Platform and Classic for this project (WordPress/WordPress). I found that instances found in both case were all (platform) mass 28 or more and all instances found only in platform had (platform) mass 27 or less.

This is how I arrived at 28.

Unfortunately, I never looked at the classic code base. I assumed that, since the masses are different, the threshold must be different. And since I'd just determined one analytically, I didn't think I needed to.

Turns out, classic does use 28! @noahd1 knew that, and assumed I knew that, and my attempts at explanation were that much more confusing.

So how can that be? If the masses are different, why does the same threshold work?

Well, originally I noticed the difference in mass between platform and classic in the UI. I never actually compared them beyond that (even in my console analysis, I was only looking at platform masses and which instances appeared in one or both sets).

Turns out, classic is actually showing an "effective" mass in that UI; one which takes number of occurrences and identical-vs-similar into account. In actuality, platform and classic produce the same "raw" mass, and so they can and should use the same threshold.

My analysis in the console did result in the right number (28) and the fact that it does match classic (now that we know that, and know why) further bolsters its appropriateness. And I apologize for not fully understanding what was going on and therefore being unable to explain things correctly.

pbrisbin force-pushed the pb-php-default-threshold branch from 7ddb13c to 944ce64 Compare February 8, 2016 21:19

pbrisbin added a commit that referenced this pull request Feb 8, 2016

Merge pull request #97 from codeclimate/pb-php-default-threshold

7132fc9

Move PHP default threshold to 28

pbrisbin merged commit 7132fc9 into master Feb 8, 2016

pbrisbin deleted the pb-php-default-threshold branch February 8, 2016 22:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Move PHP default threshold to 28 #97

Move PHP default threshold to 28 #97

Uh oh!

pbrisbin commented Feb 8, 2016

Uh oh!

gdiggs commented Feb 8, 2016

Uh oh!

ABaldwinHunter commented Feb 8, 2016

Uh oh!

pbrisbin commented Feb 8, 2016

Uh oh!

ABaldwinHunter commented Feb 8, 2016

Uh oh!

noahd1 commented Feb 9, 2016

Uh oh!

pbrisbin commented Feb 9, 2016

Uh oh!

noahd1 commented Feb 9, 2016

Uh oh!

pbrisbin commented Feb 9, 2016

Uh oh!

noahd1 commented Feb 9, 2016

Uh oh!

pbrisbin commented Feb 9, 2016

Uh oh!

pbrisbin commented Feb 9, 2016

Uh oh!

Uh oh!

Move PHP default threshold to 28 #97

Move PHP default threshold to 28 #97

Uh oh!

Conversation

pbrisbin commented Feb 8, 2016

Uh oh!

gdiggs commented Feb 8, 2016

Uh oh!

ABaldwinHunter commented Feb 8, 2016

Uh oh!

pbrisbin commented Feb 8, 2016

Uh oh!

ABaldwinHunter commented Feb 8, 2016

Uh oh!

noahd1 commented Feb 9, 2016

Uh oh!

pbrisbin commented Feb 9, 2016

Uh oh!

noahd1 commented Feb 9, 2016

Uh oh!

pbrisbin commented Feb 9, 2016

Uh oh!

noahd1 commented Feb 9, 2016

Uh oh!

pbrisbin commented Feb 9, 2016

Uh oh!

pbrisbin commented Feb 9, 2016

Uh oh!

Uh oh!