Commit cdaeed5

Handle the following issue
automl#1373 (comment). This commit fixes the first 3 bullet points on the to-do list:
1. Rename the hyperparameter "ngram_range" --> "ngram_upper_bound"; this includes changing all *csv and *json files.
2. Create a new text preprocessing example, example_text_preprocessing.py; this new example features the 20Newsgroups dataset.
The import in example_text_preprocessing.py is too long, but I cannot come up with a good solution.
Include feedback from 02.24.
1 parent db62290 commit cdaeed5
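
The rename from "ngram_range" to "ngram_upper_bound" suggests that only the upper end of the n-gram range is tuned. The sketch below illustrates that idea in plain Python, assuming the lower end is fixed at 1 (the encoder code is not part of this commit, so this is an assumption). The helpers `word_ngrams` and `bag_of_words` are hypothetical names for illustration and are not auto-sklearn APIs.

```python
from collections import Counter


def word_ngrams(text, ngram_upper_bound):
    """Return all word n-grams of length 1..ngram_upper_bound (inclusive).

    Assumption: the lower bound is fixed at 1, so a single integer is
    enough to describe the whole range.
    """
    if ngram_upper_bound < 1:
        raise ValueError("ngram_upper_bound must be >= 1")
    tokens = text.lower().split()
    grams = []
    for n in range(1, ngram_upper_bound + 1):
        # Slide a window of n tokens across the document.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bag_of_words(documents, ngram_upper_bound=1):
    """Count n-grams per document; a toy stand-in for a bag-of-words encoder."""
    return [Counter(word_ngrams(doc, ngram_upper_bound)) for doc in documents]


docs = ["the quick brown fox", "the lazy dog"]
counts = bag_of_words(docs, ngram_upper_bound=2)
print(sorted(counts[1]))
```

With an upper bound of 2, each document contributes its unigrams and bigrams but no trigrams; raising the bound only ever adds longer n-grams to the vocabulary.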

2 files changed, +5 -7 lines changed

examples/40_advanced/example_text_preprocessing.py

Lines changed: 3 additions & 5 deletions
@@ -1,17 +1,15 @@
 # -*- encoding: utf-8 -*-
 """
-==============
+==================
 Text preprocessing
-==============
+==================
 
 The following example shows how to fit a simple NLP problem with
 *auto-sklearn*.
 
-For deeper insights into the field of text preprocessing you can follow these links:
+For an introduction to text preprocessing you can follow these links:
 1. https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
 2. https://machinelearningmastery.com/clean-text-machine-learning-python/
-
-
 """
 from pprint import pprint
 

test/test_pipeline/components/data_preprocessing/test_data_preprocessing_text.py

Lines changed: 2 additions & 2 deletions
@@ -2,10 +2,10 @@
 
 import numpy as np
 import pandas as pd
-from autosklearn.pipeline.components.data_preprocessing.text_encoding.bag_of_word_encoding import (
+from autosklearn.pipeline.components.data_preprocessing.text_encoding.bag_of_word_encoding import (  # noqa: E501
     BagOfWordEncoder as BOW,
 )
-from autosklearn.pipeline.components.data_preprocessing.text_encoding.bag_of_word_encoding_distinct import (
+from autosklearn.pipeline.components.data_preprocessing.text_encoding.bag_of_word_encoding_distinct import (  # noqa: E501
     BagOfWordEncoder as BOW_distinct,
 )
 
