
[ENH] KCD and Bregman tests #15


Closed
wants to merge 10 commits into from

Conversation

@adam2392 (Collaborator) commented Jun 22, 2023

Changes proposed in this pull request:

  • Add KCD test and BregmanCDTest to the discrepancy/ submodule

To enable this and to simultaneously improve the readability and maintainability of the code:

  • Refactor the code to enable code-sharing between kernel-based methods.
  • Refactored the delta kernel to be more readable and added a unit test. In addition, I brought it in line with the way pairwise kernel functions should be defined, so it is compatible with the sklearn API.
    - Specifically, refactored the way kernels are computed to be more in line with how sklearn does pairwise kernels. This leverages sklearn's pairwise_kernels function, which is likely parallelized better than we could manage ourselves and supports all of sklearn's built-in kernels and more. Moreover, I added a pattern demonstrating how to augment sklearn's PAIRWISE_KERNEL_FUNCTIONS whenever we need to add a kernel that is more efficient to compute with vectorized numpy operations. This lets us keep delta_kernel as is (a sketch of this pattern is shown below the list).
  • Speed up the kci unit tests: previously, just the kci unit tests took 10+ seconds on average; now they run in under 2 seconds with the same results, so regressions are still caught.

The entire unit-test suite now takes 8 seconds to run.
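As a rough illustration of the kernel pattern described above, here is a minimal sketch of a sklearn-compatible kernel function. The delta_kernel below is a simplified stand-in, not the PR's implementation, and registering it under a string name relies on sklearn's PAIRWISE_KERNEL_FUNCTIONS registry mentioned above:

```python
import numpy as np
from sklearn.metrics.pairwise import PAIRWISE_KERNEL_FUNCTIONS, pairwise_kernels


def delta_kernel(X, Y=None, **kwargs):
    """Simplified delta kernel: 1.0 where rows of X and Y match exactly, else 0.0."""
    Y = X if Y is None else Y
    return np.equal(X[:, None, :], Y[None, :, :]).all(axis=2).astype(np.float64)


X = np.random.default_rng(0).integers(0, 3, size=(5, 2)).astype(np.float64)

# Built-in sklearn kernels can be requested by their string name.
K_rbf = pairwise_kernels(X, metric="rbf", gamma=0.5)

# A custom kernel following the kernel_func(X, Y=None, **kwargs) pattern can be
# registered next to sklearn's own kernels and then used the same way.
PAIRWISE_KERNEL_FUNCTIONS["delta"] = delta_kernel
K_delta = pairwise_kernels(X, metric="delta")
```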

Refactoring preserves functionality

None of the tests changed functionally. If anything, I lowered the number of samples used for most tests and they all still pass, so the refactoring does not introduce regressions.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.
  • I have added a changelog entry succinctly describing the change in this PR in the relevant whats_new version document.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

@adam2392 marked this pull request as ready for review June 27, 2023 15:04
@adam2392 (Collaborator, Author) commented:

We should have an example demonstrating how to leverage different kernels. Ideally, we can also show how different kernels are useful on small simulated data.

Signed-off-by: Adam Li <[email protected]>
@adam2392 requested a review from bloebp June 27, 2023 15:12
@adam2392 (Collaborator, Author)

Signed-off-by: Adam Li <[email protected]>
@adam2392 (Collaborator, Author) commented Jul 7, 2023

@bloebp just checking in to see if you had the opportunity to look at this?

@bloebp (Member) left a comment:

Overall, it looks good! The main thing is: Can we separate the KCI test changes into a separate PR or do you need to have these modifications for the discrepancy tests?

doc/api.rst Outdated
fisherz
kci


(Conditional) Discrepancy Testing
Member:

I am wondering if the term "discrepancy test" is really that common; I have actually never heard of it before and still find it confusing. Maybe something like "Conditional Equality Test" or "Conditional Distribution Equivalence Test" (or something along those lines) would make it much clearer. Or is "Discrepancy Testing" a well-established term?

Collaborator Author:

Perhaps Conditional K-Sample Testing?

I use "discrepancy" because I saw it in one paper, but I think "(conditional) k-sample" is more correct. I guess in our case, we are only doing conditional two-sample testing, since it tests whether P_1(y|x) = P_2(y|x).

Collaborator Author:

I changed it

# compute the test statistic on the conditionally permuted
# dataset, where each group label is resampled for each sample
# according to its propensity score
null_dist = Parallel(n_jobs=n_jobs)(
Member:

Note that using parallel jobs here would ignore the previously set random seed. This can be fixed by doing something like here:
https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py#L101

The idea is to generate random seeds from the current (seeded) random generator and provide these seeds to the parallel processes. That way, the generated seeds are deterministic and, thus, so are the parallel processes.
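For reference, a minimal sketch of that seeding pattern (function and variable names here are illustrative, not taken from this PR):

```python
import numpy as np
from joblib import Parallel, delayed


def _compute_null_statistic(seed):
    # Each worker builds its own generator from the seed it was handed,
    # so the parallel computation is fully deterministic.
    rng = np.random.default_rng(seed)
    return rng.standard_normal()


rng = np.random.default_rng(12345)
# Draw one child seed per parallel job from the already-seeded generator.
child_seeds = rng.integers(np.iinfo(np.int32).max, size=100)
null_dist = Parallel(n_jobs=2)(
    delayed(_compute_null_statistic)(s) for s in child_seeds
)
```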

Collaborator Author:

I think this one is fine because the randomness comes from rng.binomial, which is not passed into the inner function. I added a unit test just for completeness though.

Member:

Ah, I see. Just to confirm: the only random part here is the rng call, and the function that is executed in parallel is deterministic?

Comment on lines +8 to +11
seed = 12345
rng = np.random.default_rng(seed)
# number of samples to use in generating test dataset; the lower the faster
n_samples = 150
Member:

Generally, rather use the flaky decorator with a 'reasonable' corresponding assert instead of a fixed random seed in unit tests. It has happened too often that a bug was not caught because the test just happened to pass with a concrete seed. If we generate random data instead, we would catch a bug that makes the test fail consistently. I know the concern about random failures and runtime in stochastic tests, but this is rarely a practical issue if we set the retries to something like 3.
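A minimal sketch of what that could look like, assuming the flaky pytest plugin is installed (the test body here is a toy stand-in, not one of the actual tests):

```python
import numpy as np
from flaky import flaky  # assumes the `flaky` pytest plugin is available


@flaky(max_runs=3, min_passes=1)
def test_statistic_under_null():
    # No fixed seed: fresh random data on every run, and up to 3 retries
    # absorb the occasional unlucky draw.
    rng = np.random.default_rng()
    x = rng.standard_normal(150)
    assert abs(x.mean()) < 0.5
```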

Comment on lines +11 to +12
seed = 12345
rng = np.random.default_rng(seed)
Member:

See the other comment regarding the seed

Comment on lines -15 to +18
x = np.random.randn(1000, 1)
y = np.exp(np.random.rand(1000, 1))
n_samples = 200
x = rng.standard_normal((200, 1))
Member:

Any reason to change it to rng instead of using np.random directly? If it is about the random generator, that is already covered by the flaky decorator on the test, i.e., there is no need for a seed.

Collaborator Author:

It's quite difficult to debug tests if they're not deterministic. The nice thing is that if we break something and this test fails, we can keep rerunning it, get the same answer, and do some debugging. With the previous random seeding, I spent a lot of time trying to figure out what was going on because the test didn't pass but the output kept changing, so it was hard to isolate.

The flaky approach will still work because the CIs inherently test on different distros/OSes/etc., so we keep the same benefits, but locally you can reproduce failures to exact numerical precision.

Member:

"The flaky thing will still work because the CIs inherently test on different distros/OS/etc."

I don't think this is how the random generator really works (maybe very old generation processes took these things into consideration). For a given seed, it typically doesn't matter which distro/OS/system you are using; it will produce the same results. Otherwise, providing the random seeds of experiments would be pointless. You can also easily verify this by executing the same line of code with the same seed on different systems. So in that sense, we would lose the "flaky benefit".

I see the point of the debugging advantage, but I have also experienced the "danger" of having a fixed seed. Peter suggested before to simply print/log the current random state as a string at the beginning of the test (e.g. print(np.random.get_state())), which can then be copied if the test fails to make it reproducible. I have used that approach before as well.
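A minimal sketch of that print-the-state idea (the test body is a toy stand-in, not one of the actual tests):

```python
import numpy as np


def test_something_stochastic():
    # Log the global random state up front; if the test fails, the printed
    # state can be restored via np.random.set_state(...) to replay the failure.
    print(np.random.get_state())
    x = np.random.randn(150, 1)
    assert abs(x.mean()) < 0.5
```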

random_seed=seed,
)
assert res.pvalue > alpha, f"Fails with {res.pvalue} not greater than {alpha}"
elif env_type == "multi":
Member:

What about, instead of the if-elifs, just separating them into two tests: test_cd_simulation_single and test_cd_simulation_multi?

Optimally, one would even have a separate test for each assert (but we don't need to over-engineer it).

Also, since you don't use the given-when-then pattern for the unit test names, add a brief description of what is tested here.

Collaborator Author:

Refactored and also renamed tests.

@adam2392 (Collaborator, Author) commented:

I addressed most of your comments and will now refactor this into a two-stage PR. I'll be able to do this sometime later.

@@ -1,3 +1,5 @@
from . import fisherz, kci
from . import discrepancy, independence
Member:

I like "conditional k-sample test", since this is I think quite a good understanding. We should then also rename the module accordingly, it is still "discrepancy".

Collaborator Author:

Okay, perhaps conditional_ksample?

Member:

Sounds good!




# XXX: determine if we can do this with Y being optional.
def condind(
Member:

Yeah, I was thinking of the "classical" two-sample tests. Since the module is now more generally called "(conditional) k-sample test", it makes sense to potentially have unconditional tests as well. Should we still add an unconditional function that simply raises an error, or would that be rather confusing?

from pywhy_stats.kernel_utils import _default_regularization


def _preprocess_propensity_data(
Member:

Since it does not really preprocess anything but rather validates that the parameters/inputs are correctly specified, what about calling it _validate_propensity_data instead?

.reshape(X.shape[0], X.shape[0])
.astype(np.float32)
)
X, Y = check_pairwise_arrays(X, Y)
Member:

Does check_pairwise_arrays also reshape the data to (-1, 1) in case X and Y were passed as one-dimensional arrays?


Comment on lines +129 to +142
if metric is None:
metric, kernel_params = _get_default_kernel(X)
else:
kernel_params = dict()

# compute the potentially pairwise kernel matrix
# If the number of arguments is just one, then we bypass the pairwise kernel
# optimized computation via sklearn and opt to use the metric function directly
if callable(metric) and len(inspect.getfullargspec(metric).args) == 1:
kernel = metric(X)
else:
kernel = pairwise_kernels(
X, Y=Y, metric=metric, n_jobs=n_jobs, filter_params=False, **kernel_params
)
Member:

Can we move the change of allowing strings as kernels, instead of only callables, into a separate PR? I would like to take a separate look at that, not mixed together with other changes. For this PR, maybe stick to passing a callable (e.g., a kernel from scikit-learn via kernel_X = partial(pairwise_kernels, metric=metric, n_jobs=n_jobs, filter_params=False, other_params...), which should be equivalent).
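For concreteness, a minimal sketch of the partial-based callable suggested here (the kernel choice and parameter values are illustrative):

```python
from functools import partial

import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

# A ready-made kernel callable, so the test only ever receives a callable.
kernel_X = partial(pairwise_kernels, metric="rbf", n_jobs=None, filter_params=False, gamma=0.5)

X = np.random.default_rng(0).standard_normal((20, 2))
K = kernel_X(X)  # equivalent to pairwise_kernels(X, metric="rbf", gamma=0.5)
```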

Collaborator Author:

Sure thing!

Note: the change here is meant to give the user maximum flexibility without us having to write our own pairwise kernel code for custom kernels.

tl;dr: if we adhere to the scikit-learn pattern, we have less code and get some helper functions for free, but we need to restrict what type of kernel functions are allowed.

The issue here is that custom kernels are slow, but that is always the case. However, to keep maintenance and the code simpler, we probably want to be consistent in how we define a kernel function. That is why I added the extra unused parameters to the delta kernel. Having the def kernel_func(X, Y=None, **kwargs) pattern allows us to re-use the pairwise_kernels function.

It also buys us the use of strings for free, since pairwise_kernels accepts those.

Alternatively, if we do not want to allow strings and instead use the partial(func, ...) pattern, then we have to write our own parallelized pairwise_kernels function.

Limiting the scope of what we support is probably a good idea, so users can't just feed in arbitrary lambda functions that we can't easily error-check.

I don't know how to handle the custom_kernel you define that combines a delta and an RBF kernel, so I added this hack for now. However, this raises a question for me: is a custom kernel like that even valid?

Member:

I think most users can simply use a kernel from scikit. We can definitely think about allowing strings as well, just want to have this review separate.

"However, this raises a question for me: is a custom kernel like that even valid?"

It is: a convolution of different valid kernels is also a valid kernel, and, practically, you can achieve this by pointwise multiplication of the resulting kernel matrices.
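A minimal sketch of that pointwise-product idea (names like combined_kernel are illustrative, not from this PR):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def combined_kernel(X_continuous, x_discrete):
    # Both factors are valid (positive semi-definite) kernel matrices, so their
    # elementwise product is again a valid kernel matrix.
    K_rbf = rbf_kernel(X_continuous)
    K_delta = (x_discrete[:, None] == x_discrete[None, :]).astype(np.float64)
    return K_rbf * K_delta


rng = np.random.default_rng(0)
X_continuous = rng.standard_normal((10, 2))
x_discrete = rng.integers(0, 2, size=10)
K = combined_kernel(X_continuous, x_discrete)
```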



def von_neumann_divergence(A: ArrayLike, B: ArrayLike) -> float:
"""Compute Von Neumann divergence between two PSD matrices.
Member:

What does PSD stand for? Maybe write it out.

adam2392 added a commit that referenced this pull request Aug 30, 2023
(#19)

Towards: #15 

Changes proposed in this pull request:
- refactors code to set up for the kcd test
- allows any of the pairwise kernel strings to be passed in from sklearn (which is significantly faster than using partial because sklearn optimizes its in-house kernels)
- also requires kernel functions to follow a specific API, so they are easier to test, implement and document

This should all make implementation of the kcd test pretty straightforward.

---------

Signed-off-by: Adam Li <[email protected]>
Co-authored-by: Patrick Bloebaum <[email protected]>
@bloebp closed this in #21 Sep 6, 2023
bloebp pushed a commit that referenced this pull request Sep 6, 2023
Closes: #15 
Follow-up to #19 

Changes proposed in this pull request:
- Adds `kcd` and `bregman` tests along with unit tests and a documentation update
- As a result of #19, the code is entirely self-contained and leverages the kernel functions that are shared with the kci test.


---------

Signed-off-by: Adam Li <[email protected]>
@adam2392 deleted the kcd branch September 11, 2023 14:56