Feature request: Assign coords for new axis in xr.concat #839

Closed
jolespin opened this issue May 1, 2016 · 3 comments

jolespin commented May 1, 2016

It would be awesome to be able to add coords while concatenating. Basically, combining these two statements into one call:

DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients"); DA_data.coords["Patients"] = list(D_patient_DA.keys())

For this made-up dataset, imagine 100 patients, 12 months, and 10000 attributes, which would be a typical 3D dataset. I end up with a bunch of 2D DataArrays (rows = months, columns = attributes). Each DataArray is a value in my dictionary, and the patient it came from is the key (i.e. patient_x : DataArray_X).

I'm trying to do DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients"), but it's not working, so I need to split it up like DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients"); DA_data.coords["Patients"] = list(D_patient_DA.keys())

Am I not writing the one-liner in the right format?
The docs say coords : {‘minimal’, ‘different’, ‘all’ or list of str}, so it seems like it should work.

Here is my code for generating fake data for this problem:

import xarray as xr
import numpy as np
from collections import OrderedDict

np.random.seed(1618033)
#Set dimensions
a,b,c = 100,12,10000 #100 patients, 12 months, 10000 attributes

#Create labels
patients = ["patient_%d" % i for i in range(a)]
months = [j for j in range(b)]
attributes = ["attr_%d" % k for k in range(c)]

#Dict of DataArrays
D_patient_DA = OrderedDict()

for i, patient in enumerate(patients):
    A_placeholder = np.zeros((b,c))
    for j, month in enumerate(months):
        #Random attribute values for this month
        V_attrExp = np.random.random(c)
        #Fill array with row
        A_placeholder[j,:] = V_attrExp
    #Assign a DataArray for every patient
    D_patient_DA[patient] = xr.DataArray(A_placeholder, coords = [months, attributes], dims = ["Months","Attributes"])

#I'd like to do this:
#DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients")

#Traceback (most recent call last):
#   File "Untitled.py", line 29, in <module>
#       DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients")
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 114, in concat
#       return f(objs, dim, data_vars, coords, compat, positions)
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 301, in _dataarray_concat
#       positions)
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 207, in _dataset_concat
#       concat_over = _calc_concat_over(datasets, dim, data_vars, coords)
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 186, in _calc_concat_over
#       concat_over.update(process_subset_opt(coords, 'coords'))
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 177, in process_subset_opt
#       % (subset, subset_long_name, invalid_vars))
#ValueError: some variables in coords are not coordinates on the first dataset: ['patient_0', 'patient_1', 'patient_2', 'patient_3', 'patient_4', 'patient_5', 'patient_6', 'patient_7', 'patient_8', 'patient_9', 'patient_10', 'patient_11', 'patient_12', 'patient_13', 'patient_14', 'patient_15', 'patient_16', 'patient_17', 'patient_18', 'patient_19', 'patient_20', 'patient_21', 'patient_22', 'patient_23', 'patient_24', 'patient_25', 'patient_26', 'patient_27', 'patient_28', 'patient_29', 'patient_30', 'patient_31', 'patient_32', 'patient_33', 'patient_34', 'patient_35', 'patient_36', 'patient_37', 'patient_38', 'patient_39', 'patient_40', 'patient_41', 'patient_42', 'patient_43', 'patient_44', 'patient_45', 'patient_46', 'patient_47', 'patient_48', 'patient_49', 'patient_50', 'patient_51', 'patient_52', 'patient_53', 'patient_54', 'patient_55', 'patient_56', 'patient_57', 'patient_58', 'patient_59', 'patient_60', 'patient_61', 'patient_62', 'patient_63', 'patient_64', 'patient_65', 'patient_66', 'patient_67', 'patient_68', 'patient_69', 'patient_70', 'patient_71', 'patient_72', 'patient_73', 'patient_74', 'patient_75', 'patient_76', 'patient_77', 'patient_78', 'patient_79', 'patient_80', 'patient_81', 'patient_82', 'patient_83', 'patient_84', 'patient_85', 'patient_86', 'patient_87', 'patient_88', 'patient_89', 'patient_90', 'patient_91', 'patient_92', 'patient_93', 'patient_94', 'patient_95', 'patient_96', 'patient_97', 'patient_98', 'patient_99']

#But I have to do this instead
DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients")
DA_data.coords["Patients"] = list(D_patient_DA.keys())
@MathieuSchopfer

I don't think the coords keyword argument is meant to specify new coordinates. That said, it would be really nice to be able to concatenate along a new dimension and provide the new coordinates in one line.

Would it be possible to make this thread a feature request?

jolespin changed the title from "Can't assign coords for new axis in xr.concat" to "Feature request: Assign coords for new axis in xr.concat" on Aug 17, 2016

shoyer commented Aug 17, 2016

Indeed, coords is a bad keyword argument name. On concat, it indicates which coordinates from the concatenated objects should be concatenated. It should probably be renamed to something like which_coords. It's not a way to set new coordinates.
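
A minimal sketch of that distinction, assuming a recent xarray release. The two arrays and the `site` coordinate are made up for illustration and do not come from the original post. Passing a list to coords only selects existing coordinates to stack along the new dimension:

```python
import numpy as np
import xarray as xr

# Two hypothetical per-patient arrays sharing a scalar "site" coordinate.
first = xr.DataArray(np.zeros(2), dims=["Months"],
                     coords={"Months": [0, 1], "site": "clinic_A"})
second = xr.DataArray(np.ones(2), dims=["Months"],
                      coords={"Months": [0, 1], "site": "clinic_A"})

# coords=["site"] names an existing coordinate to concatenate along "Patients";
# it does not supply labels for the new "Patients" dimension.
stacked = xr.concat([first, second], dim="Patients", coords=["site"])

print(stacked["site"].dims)
```

This is consistent with the ValueError in the original post: a list of patient names is looked up as coordinate names on the first object, and none of them exist there.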

So you can actually do this right now if you provide a DataArray or pandas.Index as the dim argument to concat.

Instead of:

DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients")
DA_data.coords["Patients"] = list(D_patient_DA.keys())

you could write:

import pandas

DA_data = xr.concat(list(D_patient_DA.values()),
                    dim=pandas.Index(list(D_patient_DA.keys()), name='Patients'))
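
For a self-contained version of the same idea, here is a small sketch on a scaled-down stand-in for the dictionary from the original post. The name `arrays_by_patient` and the 3 x 2 x 4 sizes are assumptions chosen to keep the example short:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in for D_patient_DA: 3 patients, 2 months, 4 attributes.
months = [0, 1]
attributes = ["attr_%d" % k for k in range(4)]
arrays_by_patient = {
    "patient_%d" % i: xr.DataArray(
        np.full((2, 4), float(i)),
        coords=[months, attributes],
        dims=["Months", "Attributes"],
    )
    for i in range(3)
}

# The Index name becomes the new dimension; its values become the labels.
patient_index = pd.Index(list(arrays_by_patient.keys()), name="Patients")
combined = xr.concat(list(arrays_by_patient.values()), dim=patient_index)

# Label-based selection works right away on the new axis.
one_patient = combined.sel(Patients="patient_2")
```

Building the index from the dictionary keys and the array list from its values keeps labels and data in the same order, as long as both come from the same dictionary.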

But in this particular case (converting a dictionary to a DataArray), you can actually just use the Dataset.to_array() method instead, e.g., xr.Dataset(D_patient_DA).to_array(dim='Patients')
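
The to_array() route can be sketched the same way. This again uses a hypothetical scaled-down dictionary (`arrays_by_patient`), and it assumes every array carries the same Months and Attributes labels:

```python
import numpy as np
import xarray as xr

# Small stand-in for D_patient_DA: 3 patients, 2 months, 4 attributes.
months = [0, 1]
attributes = ["attr_%d" % k for k in range(4)]
arrays_by_patient = {
    "patient_%d" % i: xr.DataArray(
        np.full((2, 4), float(i)),
        coords=[months, attributes],
        dims=["Months", "Attributes"],
    )
    for i in range(3)
}

# Dictionary keys become Dataset variable names; to_array then stacks the
# variables along a new dimension labelled by those names.
stacked = xr.Dataset(arrays_by_patient).to_array(dim="Patients")
patient_labels = list(stacked["Patients"].values)
```

Here the patient names travel with the data as variable names, so no separate list of labels has to be passed.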


stale bot commented Jan 26, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

stale bot added the stale label on Jan 26, 2019
stale bot closed this as completed on Feb 25, 2019