Inconsistent UnitTest Results on MacOS #514
Comments
Unfortunately, I don't know what's happening here. Since there is no fast, open CI system for MacOS, we cannot run the unit tests on that platform and therefore cannot support it. However, as long as only the performance comparisons are off by a bit, it should not be a big deal. Just out of curiosity: is this a system python or did you install it with Anaconda? |
It is python; the reproducibility of results is quite important for my use case, so I will see if I can figure it out. |
Also it seems that the slow startup times for MacOS Travis CI builds might have been solved. travis-ci/travis-ci#7304 |
As a follow-up, I've found that even on linux systems the above toy example seems to produce differing results. Is there any way to limit autosklearn by number of runs or iterations, rather than by time, to get deterministic results? @mfeurer |
Please excuse my initial, not very helpful answer. What you're seeing here is most likely some small variation due to time limits and random effects introduced by them. To get rid of such effects, you need to remove all time limits and run Auto-sklearn for a specific number of iterations instead. Please see #451 for an example. |
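To make the point about time limits concrete, here is a minimal, self-contained sketch. It is not auto-sklearn code: `objective`, `search_by_iterations`, and `search_by_time` are hypothetical names for a toy random search. It only illustrates why a fixed seed gives repeatable results under an iteration budget, while a wall-clock budget lets the number of evaluations (and so possibly the result) vary from run to run.

```python
import random
import time


def objective(x):
    """Toy objective to minimise; stands in for one model evaluation."""
    return (x - 0.3) ** 2


def search_by_iterations(seed, n_iterations):
    """Random search that stops after a fixed number of evaluations."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iterations):
        x = rng.random()
        if best is None or objective(x) < objective(best):
            best = x
    return best, n_iterations


def search_by_time(seed, time_limit_s):
    """Random search that stops when a wall-clock budget runs out."""
    rng = random.Random(seed)
    best = None
    n_evaluated = 0
    deadline = time.perf_counter() + time_limit_s
    while time.perf_counter() < deadline:
        x = rng.random()
        if best is None or objective(x) < objective(best):
            best = x
        n_evaluated += 1
    return best, n_evaluated


if __name__ == "__main__":
    # Same seed and same iteration budget: identical results on every run.
    print(search_by_iterations(seed=0, n_iterations=5))
    print(search_by_iterations(seed=0, n_iterations=5))
    # Same seed but a time budget: the evaluation count depends on machine
    # load, so the best candidate found can differ between runs.
    print(search_by_time(seed=0, time_limit_s=0.01))
    print(search_by_time(seed=0, time_limit_s=0.01))
```

In this toy, the seed fixes the sequence of candidates, but the time budget decides how far along that sequence the search gets before stopping.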
This looks like what I need, thank you! |
Hmm.... so I followed the instructions from the cited issue and it seems that I am still getting results that vary. To be absolutely certain that it wasn't something related to my testing setup (linux system), I pulled the git repo and ran the tests on master. All of the test cases passed.

import numpy as np
from sklearn.model_selection import train_test_split
from autosklearn.classification import AutoSklearnClassifier

seed = 0
np.random.seed(seed)
X = np.array([0] * 50 + [1] * 50).reshape((-1, 1))
y = np.array([0] * 50 + [1] * 50)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
est = AutoSklearnClassifier(time_left_for_this_task=40,
                            ensemble_size=0,
                            seed=seed,
                            include_preprocessors=['no_preprocessing'],
                            include_estimators=["liblinear_svc", ],
                            smac_scenario_args={'runcount_limit': 5})
est.fit(X_train, y_train)
est.fit_ensemble(y_train, ensemble_size=50)
print(est.predict_proba(X_test))
# print(est.show_models())

The outputs:
--and--
|
@mfeurer It seems this might be two separate issues: one with test cases failing on the mac and one with reproducibility on linux systems. Should I split these into two issues? Also, I'm not sure whether this helps with debugging, but it seems that even with a fixed number of runs and a fixed seed, numpy's "random function" is called a differing number of times between runs. I overrode numpy's randint using the following snippet, which I inserted into the code above. fit_ensemble seems to consistently call random 50 times, whereas the fit method itself calls it a variable number of times, ranging from 300 to 500 overall.

# snippet
from forbiddenfruit import curse
import random

i = 0

def randint(self, low, high=None, size=None, dtype='l'):
    global i
    # curframe = inspect.currentframe()
    # calframe = inspect.getouterframes(curframe, 2)
    # i+=calframe[1][3]+'\n'
    val = random.randint(low, high-1) if low is not None and high is not None else random.randint(0, low-1)
    i+=1 # '{}\n'.format(val)
    return val
    # val = low if high is not None else low-1
    # if size is not None:
    #     return np.full(size, val).astype(dtype)
    # else:
    #     return val

curse(np.random.RandomState, 'randint', randint) |
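The call-counting idea in the snippet above can also be sketched without monkeypatching. The following is a minimal illustration using only the standard library; `CountingRandom` and `simulated_fit` are hypothetical names, and the sketch does not touch numpy or auto-sklearn. It assumes the code under test accepts a generator object, which may not hold for a real library.

```python
import random


class CountingRandom(random.Random):
    """random.Random subclass that counts direct calls to randint."""

    def __init__(self, seed=None):
        super().__init__(seed)
        self.calls = 0

    def randint(self, a, b):
        self.calls += 1
        return super().randint(a, b)


def simulated_fit(rng, n_draws):
    """Hypothetical stand-in for a fit routine that draws random integers."""
    return [rng.randint(0, 9) for _ in range(n_draws)]


if __name__ == "__main__":
    rng_a = CountingRandom(0)
    rng_b = CountingRandom(0)
    draws_a = simulated_fit(rng_a, 5)
    draws_b = simulated_fit(rng_b, 7)
    # Same seed, so the first five draws agree; the differing call counts
    # show that the two "runs" consumed the generator differently.
    print(rng_a.calls, rng_b.calls)
    print(draws_a == draws_b[:5])
```

If two runs with the same seed report different call counts, the difference comes from control flow (for example, how many candidates were evaluated) rather than from the seed itself.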
Thanks for digging into that. I expected the following script to be deterministic, but it turns out it isn't:

import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification

def main():
    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=1000000000,
        per_run_time_limit=86400,
        ml_memory_limit=8000,
        tmp_folder='/tmp/autosklearn_holdout_example_tmp',
        output_folder='/tmp/autosklearn_holdout_example_out',
        disable_evaluator_output=False,
        smac_scenario_args={
            'runcount_limit': 5,
            'deterministic': 'true',
            'intensification_percentage': 0.000000001
        },
        delete_tmp_folder_after_terminate=False,
        ensemble_size=0,
        initial_configurations_via_metalearning=0
    )
    automl.fit(X_train, y_train, dataset_name='digits')
    automl.fit_ensemble(y_train, ensemble_size=1)
    # Print the final ensemble constructed by auto-sklearn.
    print(automl.show_models())
    predictions = automl.predict(X_test)
    # Print statistics about the auto-sklearn run such as number of
    # iterations, number of models failed with a time out.
    print(automl.sprint_statistics())
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))

if __name__ == '__main__':
    main()

I just had a brief look at the code and there is at least one issue in |
@mfeurer Based on the fixes for #517, I tested my snippet again and continued to get different results... yet when I ran your snippet, I began to get consistent results. I started to pare your snippet down to the essentials, and it seems that the program hangs if I allow the fitting process to construct the ensemble model automatically. I am unsure why (is it because of the SMAC args being passed?). Is it possible to have auto-sklearn build the ensemble and produce consistent results in one go? |
That is surprising and I don't know why this would/should happen.
I expected this to happen with the snippet. Does this issue happen with your specific dataset or a simple example dataset? |
Both datasets, though it may be a result of my misuse of the SMAC args, as it doesn't seem to be new behavior (0.4.2 produces the same freezing). The following hangs for me in both versions (I let it run for about 10 minutes each time, just to be certain). Note that I commented out the ensembling portions of the setup and execution code.

import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification

def main():
    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=1000000000,
        per_run_time_limit=86400,
        ml_memory_limit=8000,
        tmp_folder='/tmp/autosklearn_holdout_example_tmp',
        output_folder='/tmp/autosklearn_holdout_example_out',
        disable_evaluator_output=False,
        smac_scenario_args={
            'runcount_limit': 5,
            'deterministic': 'true',
            'intensification_percentage': 0.000000001
        },
        delete_tmp_folder_after_terminate=True,
        # ensemble_size=0,
        # initial_configurations_via_metalearning=0
    )
    automl.fit(X_train, y_train, dataset_name='digits')
    # automl.fit_ensemble(y_train, ensemble_size=1)
    # Print the final ensemble constructed by auto-sklearn.
    print(automl.show_models())
    predictions = automl.predict(X_test)
    # Print statistics about the auto-sklearn run such as number of
    # iterations, number of models failed with a time out.
    print(automl.sprint_statistics())
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))

if __name__ == '__main__':
    main() |
Thanks for sharing the script. Indeed, there is currently an issue because of the way |
Closing this as we |
I was running test cases on my mac, and some of the tests were failing because the results were not what was expected. I was led down this path while running a toy example with the random seed set, which produced different results between runs; I then found that the unit tests were also failing on the MacOS platform. For example:
Here is a toy example:
My output between runs would be variable. For example:
--and--
I would get a variable number of these errors, specifically in the regression and classification unit test sections. Do you have any idea what might be causing this?
Relevant versioning info:
MacOS 10.13.6
Python 3.6
sklearn 0.19.1
autosklearn 0.4.0