
[DONATION] of new datasets #93


Closed
PaulRabich opened this issue Oct 27, 2023 · 11 comments

@PaulRabich

Hi

The paper "Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency" (https://arxiv.org/abs/2206.08496) uses the following datasets:

  1. https://figshare.com/articles/dataset/TF-C_Pretrain_SleepEEG/19930178
  2. https://figshare.com/articles/dataset/TF-C_Pretrain_Epilepsy/19930199
  3. https://figshare.com/articles/dataset/TF-C_Pretrain_FD-A/19930205
  4. https://figshare.com/articles/dataset/TF-C_Pretrain_FD-B/19930226
  5. https://figshare.com/articles/dataset/TF-C_Pretrain_HAR/19930244
  6. https://figshare.com/articles/dataset/TF-C_Pretrain_Gesture/19930247
  7. https://figshare.com/articles/dataset/TF-C_Pretrain_ECG/19930253
  8. https://figshare.com/articles/dataset/TF-C_Pretrain_EMG/19930250

All of these datasets are published under the CC BY 4.0 licence (https://creativecommons.org/licenses/by/4.0/).

Would it be possible to add them? And if yes, what are the next steps for uploading them?

@TonyBagnall
Member

Hi, we would welcome these data. Are they all labelled classification problems? If so, the next stage is to get them into our format. If the series are all equal length, load the data into memory so that

X = np.ndarray shape (n_cases, n_channels, n_timepoints)
y = np.ndarray shape n_cases

If the series are unequal length, make X a list of ndarrays, each of shape (n_channels, n_timepoints_i), where n_timepoints_i is the length of the ith case.
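As a minimal sketch of the two layouts described above: the sizes are made up and random numbers stand in for a real dataset, so only the shapes are meaningful here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes, purely for illustration.
n_cases, n_channels, n_timepoints = 10, 3, 50

# Equal length: one 3D array, plus one label per case.
X = rng.normal(size=(n_cases, n_channels, n_timepoints))
y = rng.integers(0, 2, size=n_cases)

# Unequal length: a list of 2D arrays, one per case,
# each of shape (n_channels, n_timepoints_i).
lengths = rng.integers(20, 60, size=n_cases)
X_unequal = [rng.normal(size=(n_channels, int(n))) for n in lengths]

print(X.shape, y.shape, len(X_unequal), X_unequal[0].shape)
```

In both cases the channel count is the same for every case; only the number of time points varies in the list form.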

Then you should be able to write them to an aeon-compatible format:

    from aeon.datasets import write_to_tsfile
    write_to_tsfile(X, path="your_directory", y=y, problem_name="your_filename.ts")

If there is a provided train/test split, create trainX, trainy, testX, testy:

    from aeon.datasets import write_to_tsfile
    # Write each split to its own file, using the _TRAIN / _TEST suffixes.
    write_to_tsfile(trainX, path="your_directory", y=trainy, problem_name="your_filename_TRAIN.ts")
    write_to_tsfile(testX, path="your_directory", y=testy, problem_name="your_filename_TEST.ts")

You can check it works with this:

    from aeon.datasets import load_from_tsfile
    # A forward slash avoids the invalid "\y" escape and works on every platform.
    X, y, meta = load_from_tsfile(full_file_path_and_name="your_directory/your_filename.ts", return_meta_data=True)

If there is no provided train/test split, we create one, but you need to be careful. If there are repetitions from the same subject (e.g. one person repeats a HAR task many times), you need to be clear about whether or not train and test contain the same person. Any problems, let us know.
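One way to keep each person entirely in train or entirely in test is to split on subject IDs rather than on individual cases. The sketch below uses only the standard library; `split_by_subject`, its parameters and the example IDs are hypothetical names chosen for illustration, not part of aeon.

```python
import random


def split_by_subject(subject_ids, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so that no subject appears in both."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out whole subjects, at least one.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


# Hypothetical example: 4 subjects, each repeating the task 3 times.
subject_ids = ["s1"] * 3 + ["s2"] * 3 + ["s3"] * 3 + ["s4"] * 3
train_idx, test_idx = split_by_subject(subject_ids, test_fraction=0.25)

train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
print(sorted(train_subjects), sorted(test_subjects))
```

The returned indices can then be used to select cases from X and y before writing the _TRAIN and _TEST files.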

@PaulRabich
Author

Hello, the data comes with pre-split train, validation and test sets.

I have converted them all into .ts files, and I can load them with the load_from_tsfile function.

What is the next step?

@TonyBagnall
Member

Fantastic, the next stage is to get them to us. How big are they? You can email them to [email protected] or we can find another way. I will then list them on the site.

Is there a text description we could use? And preferably an image? I set up the pages something like this

https://timeseriesclassification.com/description.php?Dataset=AsphaltObstacles

Not sure how to handle the validation set. I would be tempted to merge it into train, since it's really part of the training.
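If the validation set is merged into train, a minimal sketch for the equal-length case could look like the following. The arrays are zero-filled placeholders, and `valX` / `valy` are assumed names for the validation split.

```python
import numpy as np

# Placeholder arrays standing in for the provided splits (hypothetical sizes).
trainX = np.zeros((8, 1, 30))
trainy = np.array(["a", "b"] * 4)
valX = np.zeros((2, 1, 30))
valy = np.array(["a", "b"])

# Stack along the case axis (axis 0); channels and series length must already match.
mergedX = np.concatenate([trainX, valX], axis=0)
mergedy = np.concatenate([trainy, valy])

print(mergedX.shape, mergedy.shape)
```

For unequal-length data stored as lists of arrays, plain list concatenation (`trainX + valX`) would play the same role.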

We will try out our standard suite of classifiers and they can go into the next batch release. Hoping to improve the website this year, will try to get an intern as my web skills are not the best :)

@TonyBagnall
Member

Got the data, thanks. Will process it all next week.

@PaulRabich
Author

If all goes OK and there is nothing more to do from my side, I would have another set of datasets.

@TonyBagnall
Member

Will post here as I do them; if you could check, that would be great. I've changed the names to conform to our standards, but hopefully the links make it clear.
https://timeseriesclassification.com/description.php?Dataset=Sleep
https://timeseriesclassification.com/description.php?Dataset=WalkingSittingStanding

@TonyBagnall
Member

and lastly
https://timeseriesclassification.com/description.php?Dataset=Epilepsy2

Not putting Gesture in as it's really UWave. Happy to put more in if you have them @PaulRabich
