Transformer should accept `y` argument in the `transform` method #1494
Comments
Yup, that would be a good thing to fix! Apologies: we can fix it quickly, but we won't be making a new release until perhaps the end of June due to some other features coming in. You can use the development branch once that fix is in. Otherwise, if you need it desperately right now, you can make a fork and use that.
Well, actually, thinking about it, the main problem is somewhere else as well, because auto-sklearn doesn't pass the `y` argument to the transformer in the first place. Are you interested in some state-of-the-art balancing methods? If so, when it's ready, I can do a pull request.
I guess the line you're looking for is this one? `auto-sklearn/autosklearn/pipeline/components/feature_preprocessing/__init__.py`, lines 129 to 130 in b2ac331
At some point, we definitely want to update the whole pipeline to 1) be more flexible, i.e. you could define your own pipelines, 2) be fully sklearn compliant, and 3) be more accessible from the outside. As you can imagine, this is a large task, so smaller baby steps like your issue are a step in the right direction :)

So we do try to stay fully in the sklearn realm, i.e. we don't add XGBoost; however, we do have some custom implementations of things, so it's not infeasible as long as it doesn't add requirements. Do you have any literature to link to? Adding choices to the configuration space is a large decision and we would have to benchmark this, of course. I imagine this is doing some balancing with respect to the distribution of the target classes.

@mfeurer when you're back, you might be interested in following up here.

Best,
Sure, I am trying to use oversampling and undersampling methods, making BOHB learn which of the two works better (or whether to use both). For oversampling, ProWRAS [1] looks like the state of the art, with extensive benchmarks against the previous state of the art. Another method (gamus [2]) is implemented in Python, but no fair benchmarks are available (they only tested against old methods; see [3] for extensive benchmarks up to 2019). All the implementations are provided by …

For undersampling, there is a recent paper proposing a boost-like method [4], but since I don't know how to introduce it into the auto-sklearn pipeline, and since they also show that classic clustering-based undersampling reaches comparable results [5], I'm using a cluster-based method from … (sketched below).

[1] Bej, Saptarshi, Kristian Schulz, Prashant Srivastava, Markus Wolfien, and Olaf Wolkenhauer. “A Multi-Schematic Classifier-Independent Oversampling Approach for Imbalanced Datasets.” IEEE Access 9 (2021): 123358–74. https://doi.org/10.1109/ACCESS.2021.3108450.

Implementations: …
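For concreteness, a minimal sketch of cluster-based undersampling using `ClusterCentroids` from imbalanced-learn; whether this matches the exact implementation referred to above is an assumption:

```python
# Minimal sketch: cluster-based undersampling with imbalanced-learn.
# ClusterCentroids replaces majority-class samples with the centroids
# of k-means clusters, shrinking the majority class.
import numpy as np
from imblearn.under_sampling import ClusterCentroids

rng = np.random.RandomState(0)
X = rng.rand(120, 4)
y = np.array([0] * 100 + [1] * 20)  # heavily imbalanced target

cc = ClusterCentroids(random_state=0)
X_res, y_res = cc.fit_resample(X, y)   # both X and y change shape
print(X_res.shape, np.bincount(y_res)) # classes are now balanced
```

Note that the resampling API returns both a new `X` and a new `y`, which is exactly the property that clashes with a `transform(X)`-only interface.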
There is indeed an issue here in that this interface is defined wrongly. However, the implementation correctly passes y-values and components such as the select percentile make use of this. TBH I'm somewhat confused about this and why it does not accept a y nor pass it to the underlying algorithm. I'd be very happy about a fix for this issue.
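To illustrate the split the comment describes, here is how `y` flows through sklearn's own `SelectPercentile`: it is consumed by `fit`, while `transform` itself never needs it (a small self-contained sketch, not auto-sklearn code):

```python
# SelectPercentile scores features against y during fit,
# but its transform only slices X -- y never enters transform.
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_classif

rng = np.random.RandomState(0)
X = rng.rand(30, 10)
y = rng.randint(0, 2, size=30)

sp = SelectPercentile(f_classif, percentile=40)
sp.fit(X, y)            # y is required here for the feature scores
X_t = sp.transform(X)   # no y needed; only columns are dropped
print(X_t.shape)        # (30, 4)
```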
Yes and no; we have a discussion on why we don't have any balancing methods in issue #1164.
Actually, the problem is also in the returned value: if we pass `y` to `transform()`, a balancing transformer has to return the resampled `y` as well, not just the new X.
Hi @00sapo, do you have any test code that illustrates this failing? It would make going through and fixing this a lot easier. Best,
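In lieu of the requested test code, a hypothetical minimal reproduction (the class name here is made up; only the `transform(self, X)` calling convention mirrors auto-sklearn's component interface):

```python
# Hypothetical sketch of the failure mode: a balancing step needs y
# in transform, but code written against transform(X) cannot supply it.
import numpy as np

class RandomUndersampler:
    """Made-up balancing step that must change X *and* y."""

    def fit(self, X, y):
        return self

    def transform(self, X, y):
        keep = np.arange(len(X)) % 2 == 0  # stand-in for real balancing
        return X[keep], y[keep]

X, y = np.ones((10, 3)), np.zeros(10)
step = RandomUndersampler().fit(X, y)
try:
    step.transform(X)  # how a pipeline built on transform(X) calls it
except TypeError as err:
    print(err)  # transform() missing 1 required positional argument: 'y'
```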
Hi, unfortunately, we cannot accommodate a transformer changing `y`.
Well, other objects throw an error if their methods are called in an incorrect order. In the code I have in mind, an error would likewise be raised if one called the methods out of order. As an example, take any class from `imblearn`.
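The same convention can be seen throughout sklearn itself (a small sketch; any estimator that checks its fitted state behaves this way):

```python
# sklearn estimators already refuse out-of-order calls:
# transform() before fit() raises NotFittedError.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.exceptions import NotFittedError

X = np.arange(6, dtype=float).reshape(3, 2)
try:
    StandardScaler().transform(X)  # fit() was never called
except NotFittedError as err:
    print(err)
```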
How do you then transform data at training time?
I am not familiar with imba-learn, could you please give some further details?
Sorry, I meant imbalanced-learn (imblearn).

In our balancing case, the training would call the resampler, which changes both X and y, while at prediction time the data would pass through unchanged. I will try to put an example here as soon as I have time.
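A sketch of that scheme (the `BalancingStep` class and its placement in the pipeline are hypothetical; `fit_resample` is imbalanced-learn's actual entry point for samplers):

```python
# Hypothetical wrapper: resample during training, pass data
# through untouched at prediction time (the imblearn convention).
class BalancingStep:
    def __init__(self, sampler):
        self.sampler = sampler  # any imbalanced-learn sampler

    def fit_transform(self, X, y):
        # Training path: both X and y are resampled.
        return self.sampler.fit_resample(X, y)

    def transform(self, X, y=None):
        # Test path: balancing unseen data makes no sense, so this
        # is the identity; y (if given) is returned unchanged.
        return X if y is None else (X, y)
```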
I'm not interested in this issue anymore for now, so I won't work on it soon. Btw, as I explained in my previous comment, most of those methods are actually old; see [1] for extensive benchmarking.

[1] Kovács, György. “An Empirical Comparison and Evaluation of Minority Oversampling Techniques on a Large Number of Imbalanced Datasets.” Applied Soft Computing 83 (October 1, 2019): 105662. https://doi.org/10.1016/j.asoc.2019.105662.
According to the sklearn API, a data or feature pre-processor should accept the `y` argument in `transform()`. For instance, I'm trying to add balancing algorithms, which do need the `y` argument because they add/remove samples and so have to change the target vector as well.

I think it requires just a simple edit in this line:

`auto-sklearn/autosklearn/pipeline/components/base.py`, line 253 in b2ac331
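A sketch of what that edit could look like; the method body shown here is assumed rather than copied from the repository:

```python
# Sketch of the proposal for autosklearn/pipeline/components/base.py:
# let the base preprocessor accept an optional y, defaulting to None
# so existing y-less callers keep working.
def transform(self, X, y=None):
    raise NotImplementedError()
```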