-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
use supplied bandwidth if provided #2552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ok thanks for the report. I think there's a typo in your snippet above. I'm thinking you want to set |
@@ -42,13 +42,12 @@ module.exports = function calc(gd, trace) { | |||
|
|||
// sample standard deviation | |||
var ssd = Lib.stdev(vals, len - 1, cdi.mean); | |||
var bandwidthDflt = ruleOfThumbBandwidth(vals, ssd, cdi.q3 - cdi.q1); | |||
var bandwidth = cdi.bandwidth = trace.bandwidth || bandwidthDflt; | |||
var bandwidth = cdi.bandwidth = trace.bandwidth || ruleOfThumbBandwidth(vals, ssd, cdi.q3 - cdi.q1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for your data I suspect bandwidthDflt
equals 0. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bandwidthDflt = 3.958842142868651e-8
var span = cdi.span = calcSpan(trace, cdi, valAxis, bandwidth); | ||
|
||
// step that well covers the bandwidth and is multiple of span distance | ||
var dist = span[1] - span[0]; | ||
var n = Math.ceil(dist / (Math.min(bandwidthDflt, bandwidth) / 3)); | ||
var n = Math.ceil(dist / (bandwidth / 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for choosing just bandwidth
here instead of bandwidthDflt ? Math.min(bandwidthDflt, bandwidth) : bandwidth
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I take the min, then n=103656617
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. In this case, I'm thinking capping n
would be best. Probably n = Math.max(n, 1e5)
should suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably
n = Math.max(n, 1e5)
should suffice
You mean Math.min
, but agreed. As is (with or without this PR) you could crash the browser just by setting a small bandwidth. @etpinard was there a reason to use Math.min(bandwidthDflt, bandwidth)
before? Seems like 3 points per bandwidth will get the curve shape correct even if you set a large explicit bandwidth, won't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was there a reason to use Math.min(bandwidthDflt, bandwidth) before?
Good question. It comes from the very first violin commit. I must have taken that (possibly erroneously) from the seaborn implementation.
@joshua-gould are you interested in fixing this bug as per #2552 (comment)? If not, we should close this PR, and this report into an issue. |
Yes, except I think we should replace 1e5 with an optional parameter rather than hard coding it:
|
I'm not quite sure what would prompt someone to want to vary this, but OK. Then it needs to be an attribute that goes through the |
Hmm. I'm not a big fan of exposing this number actually. No other graphing library does this as far as I know. I think histograms have a similar hard-coded number from #1887 |
Seaborn exposes gridsize - Number of points in the discrete grid used
to compute the kernel density estimate.
…On Mon, Apr 16, 2018 at 3:56 PM, Étienne Tétreault-Pinard ***@***.***> wrote:
Hmm. I'm not a big fan of exposing this number actually. No other graphing
library does this as far as I know. I think histograms have a similar
hard-coded number from #1887
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Ok. Can you share a link to those docs? |
Well yeah, I like the idea of a var n = trace.gridsize || Math.min(bandwidth / 3, 1e5); |
I like Also, it's not spelled out in detail, but seaborn says "Number of points in the discrete grid used to compute the kernel density estimate." - the implication being that they actually do a 2-step process to calculate the KDE:
If you have a lot of points in your distribution, and you're generating a lot of KDE values (and at least by default, the more points you have the lower your bandwidth will be and the more KDE points you need to evaluate), that process could be substantially faster than evaluating the entire distribution for each KDE value. But if we do this, it's not totally clear to me that |
The fix in this PR is works fine for addressing the bug. The important fix is that the supplied bandwidth is used here: plotly.js/src/traces/violin/calc.js Line 50 in 9329a4d
I've been having this issue where the supplied bandwidth is sometimes not being used because of the min calculation, especially if |
@joshua-gould have you made any progress on this? If not, I should have some time to work on this this week. |
I have not. Thanks.
…On Mon, May 14, 2018 at 11:32 AM, Étienne Tétreault-Pinard < ***@***.***> wrote:
@joshua-gould <https://github.com/joshua-gould> have you made any
progress on this?
If not, I should have some time to work on this this week.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2552 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AE_pnwkVToOHQieWelExbQNmDyXYezD1ks5tyaN-gaJpZM4TSZrH>
.
|
Looking at the seaborn source, it seems that gridsize means exactly the length of the linear space on which the kde is plotted. |
Hmm. I'll have to dig deeper. Using the same data as in @joshua-gould 's example, Seaborn gives: without having to adjust the bandwidth. |
and by multiplying the Scott's rule estimate by the sample standard deviation, we have something equivalent to the Seaborn result of #2552 (comment) |
#2650 merged commit 9329a4d which makes sure the number of pts in kde estimate is determined by the custom bandwidth when provided. This PR also improved our default bandwidth algo adding a fallback when the Silverman estimate "blows up". We chose to not implement "gridsize" (improving the default bandwidth algo was enough to fix the issue described in the first #2552 (comment)) . If someone still wants "gridsize" or a similar attribute implemented, feel free to open another issue. |
The following code will crash your browser:
var traces = [
</script>{
'bandwidth':0.33643595519279684,
'box': {
'visible': false,
},
'boxpoints': false,
'meanline': {
'visible': true,
},
'name': 'test',
'points': false,
'showlegend': false,
'type': 'violin',
'y': [
2.4403208885500003e-7,
6.624100740240001e-8,
8.164963286729999e-8,
0.00000550607937631,
3.02033017347e-8,
3.7801716937800004e-7,
5.56297659998e-8,
5.406555702509999e-8,
3.31213740697e-8,
1.0835987313399999e-7,
3.19669534548e-7,
2.8017014902900004e-8,
2.4514782933900002e-8,
5.95554346643e-7,
4.70654584492e-8,
2.3361313822699997e-11,
3.06469187871e-7,
2.07771831308e-8,
5.04663715295e-8,
5.67592023418e-8,
6.719484075659999e-8,
2.3359219312100003e-9,
7.32680903665e-8,
1.56153538012e-7,
5.92551838794e-8,
6.48377885533e-8,
6.5098659424e-8,
1.92267340995e-7,
6.27042106128e-10,
3.79763712636e-8,
5.79353363587e-8,
2.01568499962e-7,
4.5710354583899996e-8,
8.68326386359e-8,
2.00314121449e-7,
4.16735959037e-11,
1.05264632654e-7,
1.3219377534799998e-8,
1.1323983991099999e-7,
7.9066491519e-9,
3.29907596399e-7,
3.58952178646e-7,
3.06223788373e-8,
1.34570962274e-7,
1.95650463375e-7,
8.88142553786e-8,
6.33259714546e-7,
7.89150230354e-8,
3.44378121049e-8,
3.56491902649e-8,
7.17020398538e-8,
0.00000117543109624,
1.31723026752e-7,
6.15562220021e-7,
1.0411983170000001e-8,
6.35306917979e-8,
1.4244199622e-7,
5.50507070561e-8,
4.11937918362e-7,
1.1436582759e-7,
9.00381861905e-8,
3.5761650960999997e-7,
5.406555702509999e-8,
1.91340425062e-8,
4.05244277276e-8,
1.67071698044e-8,
2.64702379432e-7,
4.54083719121e-8,
1.15367276299e-8,
7.01160189225e-8,
2.12020626256e-7,
7.55952714792e-8,
3.35252445825e-8,
6.410902643609999e-8,
7.822972433139998e-8,
3.00394443348e-8,
1.85872035552e-7,
3.21694096675e-8,
1.9166603712099998e-8,
1.4382528776500002e-7,
2.30600946296e-7,
1.1977210552999998e-7,
3.21694096675e-8,
3.1720115986e-7,
5.00989110573e-12,
2.14001834758e-9,
3.57324647064e-7,
9.836368168600001e-8,
1.3317117896700002e-8,
6.624100740240001e-8,
3.0650079256e-8,
3.23881793333e-8,
0.0000010477687171799999,
2.47766478953e-8,
4.00638166198e-7,
3.7385796368e-8,
0.00000110210670495,
5.16171069606e-9,
5.5306060294599996e-8,
7.43346458468e-8,
2.98025726842e-8,
4.50223643894e-8,
2.6869636474599998e-11,
2.23102995275e-7,
6.624100740240001e-8,
8.837339991869999e-8,
1.0959679528299999e-7,
7.70759415991e-8,
9.19332979062e-8,
5.72570744297e-9,
5.67957997273e-12,
7.200621520489999e-8,
8.75570719476e-8,
1.38879577383e-7,
5.8323354775900006e-8,
8.46519385822e-8,
0.0000010404658225700002,
2.7344550343299998e-8,
1.1561451197599999e-7,
3.37647982763e-7,
1.03142857372e-7,
3.0261981891400004e-8,
1.0770523872e-7,
5.56297659998e-8,
4.45345220689e-7,
5.190799176779999e-7,
1.1878461076600001e-7,
6.37944980645e-8,
1.028044903e-7,
2.35063847817e-8,
3.35252445825e-8,
1.75860650771e-7,
5.95322200005e-8,
2.3051930735100003e-7,
1.01713704525e-7,
6.36298925611e-8,
8.32962182234e-9,
1.73171410467e-7,
5.92070463445e-8,
1.35192178624e-7,
1.80208181383e-7,
2.32965100186e-7,
5.558425986229999e-7,
1.64668938007e-7,
6.01836809013e-8,
1.9915955477200002e-16,
9.96792751761e-9,
1.63404871286e-9,
2.64702379432e-7,
7.55162232084e-8,
5.49833977189e-8,
3.4953095347400004e-8,
2.21025674493e-8,
1.01029071417e-7,
2.30733613826e-7,
8.50266378712e-8,
3.492997660290001e-8,
1.72122175426e-7,
6.06902637292e-8,
2.0951701376400003e-8,
5.13108366946e-7,
2.80764979683e-7,
7.49786710527e-7,
1.28767098858e-7,
0.00000130482668731,
2.94524334014e-8,
1.0383202068400001e-7,
1.06529852732e-7,
3.06223788373e-8,
6.666061263539999e-8,
2.77669768835e-7,
1.9614212065399998e-7,
7.57552863852e-9,
1.3646260448700003e-8,
1.55911616248e-8,
4.2985875019e-8,
5.35662859053e-7,
1.31723026752e-7,
2.8589493252999997e-8,
6.15562220021e-7,
2.48257838284e-8,
5.54992004418e-8,
1.59590984401e-7,
1.05027747717e-7,
4.16404749061e-15,
3.21694096675e-8,
1.55235536247e-8,
7.22158660254e-8,
7.3548078204e-11,
5.12189467799e-8,
5.78423562453e-7,
7.34977455259e-8,
8.428837152339999e-8,
5.41683911418e-8,
2.0892157394300002e-7,
8.45875572871e-8,
3.19669534548e-7,
2.4721993929599997e-8,
1.97866536768e-7,
3.1078269037e-8,
4.7958877370199995e-8,
1.63190113114e-8,
1.55306103512e-7,
6.20692035531e-7,
3.82587877803e-7,
3.3292850458999995e-12,
6.62976422611e-11,
0.363112695543,
1.2438406655e-7,
1.57716890468e-7,
8.270934005960001e-8,
7.7356150975e-8,
4.17240969082e-7,
1.5340551258000003e-8,
7.2558862077e-7,
8.628154354439999e-8,
0.00000130967390314,
5.63969380635e-8],
},
]
Plotly.newPlot('chart', traces)
The same array works fine with seaborn.kde.