[ENH] KCD and Bregman tests #15
Conversation
We should have an example demonstrating how to leverage different kernels. Ideally, we can also show how different kernels are useful on a small simulated dataset.
Hmm, a bit weird, but I think there is a missing
@bloebp just checking in to see if you had the opportunity to look at this?
Overall, it looks good! The main thing is: Can we separate the KCI test changes into a separate PR or do you need to have these modifications for the discrepancy tests?
doc/api.rst (Outdated)
fisherz
kci

(Conditional) Discrepancy Testing
I am wondering if the term "discrepancy test" is really that common; I have actually never heard of it before and still find it confusing. Maybe something like "Conditional Equality Test" or "Conditional Distribution Equivalence Test" (or something along those lines) would make it much clearer. Or is "Discrepancy Testing" a well-established term?
Perhaps Conditional K-Sample Testing?
I use "discrepancy" because I saw it in one paper, but I think "(conditional) k-sample" is more correct. I guess in our case we are only doing conditional two-sample testing, since the question is whether P_1(y|x) = P_2(y|x).
I changed it
# compute the test statistic on the conditionally permuted
# dataset, where each group label is resampled for each sample
# according to its propensity score
null_dist = Parallel(n_jobs=n_jobs)(
Note that using the parallel jobs here would ignore the previously set random seed. This can be fixed by doing something like here:
https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py#L101
The idea is to generate random seeds based on the current (seeded) random generator and provide these seeds to the parallel processes. That way, the generated seeds are deterministic and, thus, so are the parallel processes.
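For reference, a minimal sketch of that seeding pattern with joblib; the function name and the placeholder statistic are illustrative, not the actual pywhy-stats code:

```python
import numpy as np
from joblib import Parallel, delayed

def _one_null_statistic(seed):
    # each worker builds its own generator from the seed it was handed,
    # so results do not depend on how work is split across processes
    rng = np.random.default_rng(seed)
    return rng.normal()  # placeholder for the permuted test statistic

rng = np.random.default_rng(12345)
# child seeds drawn from the seeded generator make the parallel run deterministic
random_seeds = rng.integers(0, np.iinfo(np.int32).max, size=100)
null_dist = Parallel(n_jobs=2)(delayed(_one_null_statistic)(int(s)) for s in random_seeds)
```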
I think this one is fine because it uses `rng.binomial`, which is not passed into the inner function. I added a unit test just for completeness though.
Ah I see. Just to confirm: the only random thing here is the `rng` call, and the function that is executed in parallel is deterministic?
seed = 12345
rng = np.random.default_rng(seed)
# number of samples to use in generating test dataset; the lower the faster
n_samples = 150
Generally, rather use the `flaky` decorator with a 'reasonable' corresponding assert instead of a fixed random seed in unit tests. It has happened too often that a bug was not captured because the test just happened to work with a concrete seed. If we just generate random data, we will catch a bug if the test fails consistently. I know the concern regarding "random fails" or runtime in stochastic tests, but this is rarely a practical issue if we set the retries to something like 3.
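As a sketch of what that could look like (the test body and tolerance are generic placeholders, not one of the tests in this PR):

```python
import numpy as np
from flaky import flaky

# retry up to 3 times: a rare unlucky draw is absorbed by the retry budget,
# while a real bug still fails consistently across retries
@flaky(max_runs=3, min_passes=1)
def test_sample_mean_is_near_zero():
    rng = np.random.default_rng()  # intentionally unseeded
    x = rng.standard_normal(200)
    assert abs(x.mean()) < 0.3  # a "reasonable" tolerance rather than an exact value
```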
seed = 12345
rng = np.random.default_rng(seed)
See the other comment regarding the seed
x = np.random.randn(1000, 1)
y = np.exp(np.random.rand(1000, 1))
n_samples = 200
x = rng.standard_normal((200, 1))
Any reason to change it to `rng` instead of using `numpy`? If it is to use the random generator, this is already covered by the flaky decorator of the test, i.e. no need to have a seed.
It's quite difficult to debug tests when they're not deterministic. The nice thing is that if we break something and this test fails, we can keep re-running it, get the same answer, and do some debugging. With the previous random seed, I spent a lot of time trying to figure out what was going on because the test didn't pass but the output kept changing, so it was hard to isolate.
The flaky approach will still work because the CIs inherently test on different distros/OSes/etc., so we keep that benefit, but locally you can reproduce failures to exact numerical precision.
> The flaky thing will still work because the CIs inherently test on different distros/OS/etc.

I don't think this is how the random generator here really works (maybe very old generation processes took these things into consideration). For a given seed, it typically doesn't matter which distro/OS/system you are using; it will produce the same results. Otherwise, providing the random seeds of experiments would be pointless. You can also easily verify this by executing the same line of code with the same seed on different systems. So in that sense, we would lose the "flaky benefit".
I see the point of the debugging advantage, but I have also experienced the "danger" of having a fixed seed. Peter suggested before to simply print/log the current random state as a string at the beginning of the test (e.g. `print(np.random.get_state())`), which can then be copied if the test fails to make it reproducible. I have used that approach before as well.
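For illustration, a variant of that idea using the newer Generator API instead of `np.random.get_state()`; the test name and assertion are made up:

```python
import numpy as np

def test_something_stochastic():
    # draw a fresh seed and log it; if the test fails, the printed value can be
    # pasted back in as the seed to reproduce the exact failing run
    seed = np.random.SeedSequence().entropy
    print(f"random seed for this run: {seed}")
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(200)
    assert np.isfinite(x).all()
```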
tests/discrepancy/test_cd.py (Outdated)
random_seed=seed,
)
assert res.pvalue > alpha, f"Fails with {res.pvalue} not greater than {alpha}"
elif env_type == "multi":
What about, instead of the if-elifs, just separating them into two tests: `test_cd_simulation_single` and `test_cd_simulation_multi`? Optimally, one would even have a separate test for each `assert` (but we don't need to over-engineer it).
Also, since you don't use the given-when-then pattern for the unit test names, add a brief description of what is tested here.
Refactored and also renamed tests.
I addressed most of your comments and will now refactor this into a 2-stage PR. I'll be able to do this sometime later on.
@@ -1,3 +1,5 @@
from . import fisherz, kci
from . import discrepancy, independence
I like "conditional k-sample test", since I think that conveys the idea quite well. We should then also rename the module accordingly; it is still called "discrepancy".
Okay, perhaps `conditional_ksample`?
Sounds good!
# XXX: determine if we can do this with Y being optional.
def condind(
Yeah, I was thinking of the "classical" two-sample tests. Since the module is now more generally called "(conditional) k-sample test", it makes sense to potentially have unconditional tests as well. Should we still add an unconditional function that simply raises an error? Or would this be rather confusing?
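If we went that route, a placeholder could be as simple as the following; the name `ind` and the signature are hypothetical, just to make the proposal concrete:

```python
from numpy.typing import ArrayLike

def ind(X: ArrayLike, Y: ArrayLike, group_ind: ArrayLike) -> None:
    """Unconditional k-sample test (intentionally not implemented)."""
    # hypothetical stub: the module only provides conditional tests
    raise NotImplementedError(
        "Unconditional k-sample testing is not provided here; "
        "use a standard two-sample test or the conditional variant instead."
    )
```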
from pywhy_stats.kernel_utils import _default_regularization

def _preprocess_propensity_data(
Since it does not really preprocess anything but rather validates that the parameters/inputs are correctly specified, what about calling it `_validate_propensity_data` instead?
.reshape(X.shape[0], X.shape[0])
.astype(np.float32)
)
X, Y = check_pairwise_arrays(X, Y)
Does `check_pairwise_arrays` also reshape the data to `(-1, 1)` in case `X` and `Y` were passed as one-dimensional arrays?
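Whether it reshapes is the open question above; if it turns out not to, a small guard like the following could be added before the call (purely illustrative, not part of the PR):

```python
import numpy as np

def _as_2d(arr):
    # hypothetical guard: promote a 1-D array of shape (n,) to a column (n, 1)
    arr = np.asarray(arr)
    return arr.reshape(-1, 1) if arr.ndim == 1 else arr

X = _as_2d(np.random.default_rng(0).standard_normal(10))
print(X.shape)  # (10, 1)
```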
if metric is None:
    metric, kernel_params = _get_default_kernel(X)
else:
    kernel_params = dict()

# compute the potentially pairwise kernel matrix
# If the number of arguments is just one, then we bypass the pairwise kernel
# optimized computation via sklearn and opt to use the metric function directly
if callable(metric) and len(inspect.getfullargspec(metric).args) == 1:
    kernel = metric(X)
else:
    kernel = pairwise_kernels(
        X, Y=Y, metric=metric, n_jobs=n_jobs, filter_params=False, **kernel_params
    )
Can we move the change of allowing strings as kernels (instead of only callables) into a separate PR? I would like to take a separate look at this, not mixed together with other changes. For this PR, maybe stick to passing a callable, e.g., a kernel from scikit-learn via `kernel_X = partial(pairwise_kernels, metric=metric, n_jobs=n_jobs, filter_params=False, other_params...)`, which should be equivalent.
Sure thing!
Note: the change here is meant to give the user maximum flexibility without us having to write our own pairwise kernel code for custom kernels.
tl;dr: if we adhere to the scikit-learn pattern, we have less code and get some helper functions for free, but we need to restrict what type of kernel functions are allowed.
The issue is that custom kernels are slow, but that is always the case. However, to keep maintenance and the code simple, we probably want to be consistent in how we define a kernel function. That is why I added the extra unused parameters to the delta kernel. Having the `def kernel_func(X, Y=None, **kwargs)` pattern allows us to re-use the `pairwise_kernels` function.
It also buys us the use of strings for free, since `pairwise_kernels` accepts those.
Alternatively, if we do not want to allow strings and instead use the `partial(func(...))` pattern, then we have to write our own parallelized pairwise-kernel function.
Limiting the scope of what we support is probably a good idea, so users can't just feed in arbitrary lambda functions that we can't easily error-check.
I don't know how to handle the `custom_kernel` you define that combines a delta and an RBF kernel, so I added this hack for now. However, this raises a question for me: is a custom kernel like that even valid?
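For concreteness, a rough sketch of that pattern: a kernel following the `kernel_func(X, Y=None, **kwargs)` convention, registered into sklearn's `PAIRWISE_KERNEL_FUNCTIONS` so it can be addressed by a string. The delta-kernel body here is a toy illustration, not the pywhy-stats implementation:

```python
import numpy as np
from sklearn.metrics.pairwise import PAIRWISE_KERNEL_FUNCTIONS, pairwise_kernels

def delta_kernel(X, Y=None, **kwargs):
    # follows the kernel_func(X, Y=None, **kwargs) convention:
    # 1.0 where two rows match exactly, 0.0 otherwise (toy version)
    Y = X if Y is None else Y
    return np.equal(X[:, None, :], Y[None, :, :]).all(axis=-1).astype(np.float64)

# registering the function lets it be addressed by a string, like the built-in kernels
PAIRWISE_KERNEL_FUNCTIONS["delta"] = delta_kernel

X = np.array([[0.0], [1.0], [1.0], [2.0]])
K = pairwise_kernels(X, metric="delta", filter_params=False)
print(K)  # 4x4 matrix with 1.0 wherever the group labels agree
```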
I think most users can simply use a kernel from scikit-learn. We can definitely think about allowing strings as well; I just want to have that review separate.

> However, this raises a question for me: is a custom kernel like that even valid?

It is: the product of valid kernels is also a valid kernel and, practically, you can achieve this by pointwise multiplication of the resulting kernel matrices.
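A tiny illustration of that construction (the variable names and toy data are made up):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_cont = rng.standard_normal((5, 2))        # continuous features -> RBF kernel
X_cat = rng.integers(0, 2, size=(5, 1))     # a categorical label -> delta kernel

K_rbf = rbf_kernel(X_cont)
K_delta = (X_cat == X_cat.T).astype(float)  # 1.0 where the labels agree
K_combined = K_rbf * K_delta                # pointwise product is again a valid kernel
print(K_combined.shape)                     # (5, 5)
```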
def von_neumann_divergence(A: ArrayLike, B: ArrayLike) -> float:
    """Compute Von Neumann divergence between two PSD matrices.
What does PSD stand for? Maybe write it out.
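PSD here stands for positive semi-definite. For context, one common definition of the von Neumann (Bregman) divergence between PSD matrices is sketched below; this is a general-purpose illustration and may not match the exact implementation in the PR:

```python
import numpy as np
from scipy.linalg import logm

def von_neumann_divergence(A, B):
    # D(A || B) = tr(A log A - A log B - A + B), one common definition for PSD matrices
    return float(np.real(np.trace(A @ logm(A) - A @ logm(B) - A + B)))

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + np.eye(3)  # positive definite example
B = np.eye(3)
print(von_neumann_divergence(A, B))
```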
(#19) Towards: #15
Changes proposed in this pull request:
- refactors code to set up for the KCD test
- allows any of the pairwise kernel strings from sklearn to be passed in (which is significantly faster than using `partial`, because sklearn optimizes its in-house kernels)
- requires kernel functions to follow a specific API, so they are easier to test, implement, and document

This should all make the implementation of the KCD test pretty straightforward.

Signed-off-by: Adam Li <[email protected]>
Co-authored-by: Patrick Bloebaum <[email protected]>
Closes: #15. Follow-up to #19.
Changes proposed in this pull request:
- Adds the `kcd` and `bregman` tests along with unit tests and a documentation update.
- As a result of #19, the code is entirely self-contained and leverages the kernel functions that are shared with the KCI test.

## Before submitting
- [ ] I've read and followed all steps in the [Making a pull request](https://github.com/py-why/pywhy-stats/blob/main/CONTRIBUTING.md#making-a-pull-request) section of the `CONTRIBUTING` docs.
- [ ] I've updated or added any relevant docstrings following the syntax described in the [Writing docstrings](https://github.com/py-why/pywhy-stats/blob/main/CONTRIBUTING.md#writing-docstrings) section of the `CONTRIBUTING` docs.
- [ ] If this PR fixes a bug, I've added a test that will fail without my fix.
- [ ] If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.
- [ ] I have added a changelog entry succinctly describing the change in this PR in the relevant [whats_new](https://github.com/py-why/pywhy-stats/blob/main/docs/whats_new/) version document.

## After submitting
- [ ] All GitHub Actions jobs for my pull request have passed.
Changes proposed in this pull request:
To enable this and to simultaneously improve the readability and maintainability of the code:
- Specifically, refactor the way the kernels are computed to be more in line with how sklearn does pairwise kernels. This leverages their `pairwise_kernels` function, which is inherently parallelized (probably better than we could do ourselves) and can leverage all the kernels they provide and more. Moreover, I added a pattern to demonstrate that we can augment the `PAIRWISE_KERNEL_FUNCTIONS` from sklearn whenever we need to add a kernel that is more efficient to compute using vectorized numpy operations. This allows us to keep the `delta_kernel` as is.

Now the entire unit-test suite takes 8 seconds to run.
Refactoring keeps functionality
None of the tests changed functionally. If anything, I lowered the number of samples used for most tests and they all still pass, so the refactoring does not introduce regressions.