I have a dataset which has a special column. Every row has the same value in this column. I fit a model on this data and I create an Explainer instance.
When I try to create an Aspect with the explainer, I get an error:
When initialising the Aspect instance, utils.calculate_depend_matrix calls the pandas corr method on the data we provide. If there is a non-varying column, that column has NaN values in the resulting correlation matrix (related Pandas issue). When I change one value in the non-varying column, the problem goes away.
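To illustrate the cause in isolation, here is a minimal sketch that uses only pandas, not the library's own code. The toy DataFrame and its column names are made up for illustration; the constant column mirrors the one described above, where every row has the value 3.

```python
import pandas as pd

# Hypothetical toy data: "const" has the same value in every row.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 1.0, 4.0, 3.0],
        "const": [3, 3, 3, 3],
    }
)

# Pearson correlation divides by the standard deviation, which is 0 for a
# constant column, so every entry involving "const" comes out as NaN.
corr = df.corr()

# Columns of the correlation matrix that consist only of NaN values.
nan_columns = corr.columns[corr.isna().all()].tolist()
print(corr)
print(nan_columns)
```

Note that the diagonal entry for the constant column is NaN as well, which is why the diagonal needs separate handling in the fix.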
Solution
The utils.calculate_depend_matrix function can be updated to replace NaN values before returning depend_matrix:
def calculate_depend_matrix(
    data, depend_method, corr_method, agg_method
):
    depend_matrix = pd.DataFrame()
    if depend_method == "assoc":
        depend_matrix = calculate_assoc_matrix(data, corr_method)
    if depend_method == "pps":
        depend_matrix = calculate_pps_matrix(data, agg_method)
    if callable(depend_method):
        try:
            depend_matrix = depend_method(data)
        except:
            raise ValueError(
                "You have passed wrong callable in depend_method argument. "
                "'depend_method' is the callable to use for calculating dependency matrix."
            )

    # if there is a non-varying column in data, there will be NaN values in the 'depend_matrix'.
    # replace NaN values on the diagonal with 1 and others with 0.
    depend_matrix[depend_matrix.isnull()] = 0
    for i in range(depend_matrix.shape[0]):
        depend_matrix.iloc[i, i] = 1

    return depend_matrix
When the method is updated this way, I am able to create an Aspect instance and call the plot_dendrogram method. The following plot is generated:
Label 2 is the third column in my data, where all the rows have value 3.
Hello @CahidArda,
Thank you for your contribution!
It looks good, but could you please add a warning in #538 that informs the user of this replacement when it happens?
Something like this would work: warnings.warn("There were NaNs in `depend_matrix`. Replacing NaN values on the diagonal with 1 and others with 0.")
I have added a warning message, with an extra sentence explaining why this may happen. The message says:
There were NaNs in depend_matrix. This is possibly because there is a feature in the data with only one unique value. Replacing NaN values on the diagonal with 1 and others with 0.
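As a rough sketch of how the replacement and the warning could fit together, the snippet below wraps both in a standalone helper. The name `fill_nan_dependencies` is hypothetical and not part of the library; the sketch assumes a square, numeric dependency matrix and is not the code merged in the pull request.

```python
import warnings

import numpy as np
import pandas as pd


def fill_nan_dependencies(depend_matrix: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace NaNs in a square dependency matrix.

    NaNs on the diagonal become 1, all other NaNs become 0.
    A warning is emitted only when a replacement actually happens.
    """
    nan_mask = depend_matrix.isnull().to_numpy()
    if not nan_mask.any():
        return depend_matrix

    warnings.warn(
        "There were NaNs in `depend_matrix`. This is possibly because there is "
        "a feature in the data with only one unique value. Replacing NaN values "
        "on the diagonal with 1 and others with 0."
    )

    values = depend_matrix.to_numpy(dtype=float, copy=True)
    values[nan_mask] = 0.0
    # Restore 1 on diagonal cells that were NaN; leave other diagonal cells as is.
    diag = np.diag_indices(values.shape[0])
    values[diag] = np.where(nan_mask[diag], 1.0, values[diag])
    return pd.DataFrame(
        values, index=depend_matrix.index, columns=depend_matrix.columns
    )


# Usage with made-up data: the constant column produces NaNs in corr().
data = pd.DataFrame({"x": [1.0, 2.0, 4.0], "const": [3, 3, 3]})
filled = fill_nan_dependencies(data.corr())
print(filled)
```

Emitting the warning only when NaNs are present keeps the function quiet for ordinary data, so users see the message only in the constant-feature case.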
* [python] Replace NaN values in depend matrix (Fix #537)
* [python] Show warning when replacing NaN values in depend matrix (#537)
* [python] Fix depend matrix NaN replacement warning (#537)