Feature request: Assign coords for new axis in xr.concat #839

jolespin · 2016-05-01T00:34:51Z

It would be awesome to add coords while concatenating. Basically, combining this into one line:

DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients"); DA_data.coords["Patients"] = list(D_patient_DA.keys())

For this made-up dataset, imagine 100 patients, 12 months, and 10000 attributes, which would be a typical 3D dataset. I end up with a collection of 2D DataArrays (rows = months, columns = attributes). Each DataArray is a value in my dictionary, and the patient it came from is the key (i.e. patient_x : DataArray_X).

I'm trying to do DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients"), but it's not working, so I need to split it up like DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients"); DA_data.coords["Patients"] = list(D_patient_DA.keys())

Am I not writing the one-liner in the right format?
The docs say coords : {‘minimal’, ‘different’, ‘all’ or list of str}, so it seems like it should work.

Here is my code for generating fake data for this problem:

import xarray as xr
import numpy as np
from collections import OrderedDict

np.random.seed(1618033)
#Set dimensions
a,b,c = 100,12,10000 #100 patients, 12 months, 10000 attributes

#Create labels
patients = ["patient_%d" % i for i in range(a)]
months = [j for j in range(b)]
attributes = ["attr_%d" % k for k in range(c)]

#Dict of DataArrays
D_patient_DA = OrderedDict()

for i, patient in enumerate(patients):
    A_placeholder = np.zeros((b,c))
    for j, month in enumerate(months):
        #Random values for every attribute in this month
        V_attrExp = np.random.random(c)
        #Fill array with row
        A_placeholder[j,:] = V_attrExp
    #Assign DataArray for every patient
    D_patient_DA[patient] = xr.DataArray(A_placeholder, coords = [months, attributes], dims = ["Months","Attributes"])

#I'd like to do this:
#DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients")

#Traceback (most recent call last):
#   File "Untitled.py", line 29, in <module>
#       DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients")
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 114, in concat
#       return f(objs, dim, data_vars, coords, compat, positions)
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 301, in _dataarray_concat
#       positions)
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 207, in _dataset_concat
#       concat_over = _calc_concat_over(datasets, dim, data_vars, coords)
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 186, in _calc_concat_over
#       concat_over.update(process_subset_opt(coords, 'coords'))
#   File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 177, in process_subset_opt
#       % (subset, subset_long_name, invalid_vars))
#ValueError: some variables in coords are not coordinates on the first dataset: ['patient_0', 'patient_1', 'patient_2', 'patient_3', 'patient_4', 'patient_5', 'patient_6', 'patient_7', 'patient_8', 'patient_9', 'patient_10', 'patient_11', 'patient_12', 'patient_13', 'patient_14', 'patient_15', 'patient_16', 'patient_17', 'patient_18', 'patient_19', 'patient_20', 'patient_21', 'patient_22', 'patient_23', 'patient_24', 'patient_25', 'patient_26', 'patient_27', 'patient_28', 'patient_29', 'patient_30', 'patient_31', 'patient_32', 'patient_33', 'patient_34', 'patient_35', 'patient_36', 'patient_37', 'patient_38', 'patient_39', 'patient_40', 'patient_41', 'patient_42', 'patient_43', 'patient_44', 'patient_45', 'patient_46', 'patient_47', 'patient_48', 'patient_49', 'patient_50', 'patient_51', 'patient_52', 'patient_53', 'patient_54', 'patient_55', 'patient_56', 'patient_57', 'patient_58', 'patient_59', 'patient_60', 'patient_61', 'patient_62', 'patient_63', 'patient_64', 'patient_65', 'patient_66', 'patient_67', 'patient_68', 'patient_69', 'patient_70', 'patient_71', 'patient_72', 'patient_73', 'patient_74', 'patient_75', 'patient_76', 'patient_77', 'patient_78', 'patient_79', 'patient_80', 'patient_81', 'patient_82', 'patient_83', 'patient_84', 'patient_85', 'patient_86', 'patient_87', 'patient_88', 'patient_89', 'patient_90', 'patient_91', 'patient_92', 'patient_93', 'patient_94', 'patient_95', 'patient_96', 'patient_97', 'patient_98', 'patient_99']

#But I have to do this instead
DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients")
DA_data.coords["Patients"] = list(D_patient_DA.keys())

MathieuSchopfer · 2016-08-17T07:07:25Z

I don't think the coords keyword argument is meant to specify new coordinates. Still, it would be really nice if it were possible to concatenate along a new dimension and easily provide the new coordinates in one line.

Would it be possible to make this thread a feature request?

shoyer · 2016-08-17T21:26:47Z

Indeed coords is a bad keyword argument name. On concat it indicates which coordinates from the concatenated objects should be concatenated. It probably should be renamed something like which_coords. It's not a way to set new coordinates.
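
To illustrate the distinction, here is a minimal sketch on toy data. The scalar coordinate name "label" and the array sizes are made up for this example and are not part of the original report. A coordinate that already exists on the inputs can be listed in coords and gets concatenated along the new dimension, while passing the intended new labels raises a ValueError as in the traceback above (the exact message may differ between xarray versions).

```python
import numpy as np
import xarray as xr

months = [0, 1, 2]
attributes = ["attr_0", "attr_1"]

# Two small per-patient arrays, each carrying a scalar coordinate "label".
# ("label" is a hypothetical name used only for this illustration.)
arrays = [
    xr.DataArray(
        np.zeros((3, 2)),
        coords={"Months": months, "Attributes": attributes, "label": name},
        dims=["Months", "Attributes"],
    )
    for name in ["patient_0", "patient_1"]
]

# coords=[...] names coordinates that already exist on the inputs and
# should be concatenated along the new dimension.
combined = xr.concat(arrays, dim="Patients", coords=["label"])
label_dims = combined.coords["label"].dims

# Passing the intended new labels instead fails, because "patient_0" and
# "patient_1" are not coordinates on the input arrays.
try:
    xr.concat(arrays, dim="Patients", coords=["patient_0", "patient_1"])
    raised = False
except ValueError:
    raised = True
```

Here label_dims shows that "label" now varies along "Patients", and raised records that the second call was rejected.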

So you can actually do this right now if you provide a DataArray or pandas.Index as the dim argument to concat.

Instead of:

DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients")
DA_data.coords["Patients"] = list(D_patient_DA.keys())

you could write:

DA_data = xr.concat(list(D_patient_DA.values()),
                    dim=pandas.Index(D_patient_DA.keys(), name='Patients'))

But in this particular case (converting a dictionary to a DataArray), you can actually just use the Dataset.to_array() method instead, e.g., xr.Dataset(D_patient_DA).to_array(dim='Patients')
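
For anyone who wants to try the three variants side by side, here is a small self-contained sketch. It uses a much smaller toy dictionary than the original report (4 patients, 3 months, 2 attributes), and the variable names arrays, two_step, one_step, and via_dataset are made up for the example. The keys are wrapped in list() before building the index, which is an extra precaution rather than something the comment above requires.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd
import xarray as xr

months = [0, 1, 2]
attributes = ["attr_0", "attr_1"]

# Toy stand-in for D_patient_DA: patient name -> 2D DataArray
arrays = OrderedDict()
for i in range(4):
    arrays["patient_%d" % i] = xr.DataArray(
        np.full((3, 2), float(i)),
        coords=[months, attributes],
        dims=["Months", "Attributes"],
    )

# Two-step version from the original report
two_step = xr.concat(list(arrays.values()), dim="Patients")
two_step.coords["Patients"] = list(arrays.keys())

# One-step version: a named pandas.Index supplies both the name of the
# new dimension and its coordinate labels
one_step = xr.concat(
    list(arrays.values()),
    dim=pd.Index(list(arrays.keys()), name="Patients"),
)

# Dictionary-specific shortcut: build a Dataset, then stack its variables
via_dataset = xr.Dataset(arrays).to_array(dim="Patients")
```

All three results should have dimensions ("Patients", "Months", "Attributes") with the patient names as labels along "Patients".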

stale · 2019-01-26T20:18:48Z

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

jolespin changed the title ~~Can't assign coords for new axis in xr.concat~~ Feature request: Assign coords for new axis in xr.concat Aug 17, 2016

ceridwen mentioned this issue Oct 23, 2017

Make passing a DataArray for the xarray.concat dim argument equivalent to passing a pandas Index #1646

Open

stale bot added the stale label Jan 26, 2019

stale bot closed this as completed Feb 25, 2019
