This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Support pathlib in FileDataStream #269

@ianlini

Description

pathlib is a standard-library module that is widely used in Python. Almost all path-related parameters in the standard library, NumPy, and pandas accept path-like objects (os.PathLike). It would therefore be useful for FileDataStream to accept them as well.
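A path-like object is any object implementing the `os.PathLike` protocol, i.e. defining `__fspath__()`. Since PEP 519 (Python 3.6), built-in functions such as `open()` accept these objects directly, and `os.fspath()` converts them to a plain string. A minimal standard-library illustration (the file name `example.csv` is arbitrary):

```python
import csv
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"

    # Path implements os.PathLike, so os.fspath() yields the plain string path.
    assert isinstance(path, os.PathLike)
    assert os.fspath(path) == str(path)

    # open() accepts the Path object directly; no str() conversion is needed.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["a", "b"], ["1", "2"]])

    with open(path, newline="") as f:
        rows = list(csv.reader(f))

print(rows)  # [['a', 'b'], ['1', '2']]
```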

Current behavior:

In [1]: from nimbusml import FileDataStream

In [2]: from pathlib import Path

In [3]: test= FileDataStream.read_csv(Path('test.csv'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-41a5e889c3ff> in <module>
----> 1 test= FileDataStream.read_csv(Path('test.csv'))

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py in wrapper(*args, **kwargs)
    218                          '__qualname__',
    219                          func.__name__)))
--> 220             params = func(*args, **kwargs)
    221             if verbose > 0:
    222                 logger_trace.info(

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_stream.py in read_csv(filepath_or_buffer, tool, nrows, **kwargs)
    306         if tool == 'pandas':
    307             return FileDataStream.read_csv_pandas(
--> 308                 filepath_or_buffer, nrows=nrows, **kwargs)
    309         elif tool == 'internal':
    310             if 'schema' not in kwargs:

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py in wrapper(*args, **kwargs)
    218                          '__qualname__',
    219                          func.__name__)))
--> 220             params = func(*args, **kwargs)
    221             if verbose > 0:
    222                 logger_trace.info(

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_stream.py in read_csv_pandas(filepath_or_buffer, nrows, collapse, numeric_dtype, **kwargs)
    340         """
    341         schema = DataSchema.read_schema(filepath_or_buffer, collapse=collapse,
--> 342                                         numeric_dtype=numeric_dtype, **kwargs)
    343         return FileDataStream(filepath_or_buffer, schema)
    344

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_schema.py in read_schema(*data, **options)
    855                 raise TypeError(
    856                     "Unable to guess the schema for type '{0}'".format(
--> 857                         type(X)))
    858             final_schema = sch
    859

TypeError: Unable to guess the schema for type '<class 'pathlib.PosixPath'>'
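The traceback suggests that DataSchema.read_schema picks a schema strategy based on the argument's type and has no branch for pathlib.PosixPath. One possible fix is to normalize path-like objects with `os.fspath()` before any type checks. The sketch below is illustrative only: `normalize_path_arg` and `guess_source_kind` are hypothetical names, and the dispatch is a simplified stand-in, not nimbusml's actual implementation.

```python
import os
from pathlib import Path


def normalize_path_arg(obj):
    """Hypothetical helper: convert os.PathLike objects to plain paths.

    Strings, file buffers, DataFrames and any other inputs are returned
    unchanged, so existing type-based dispatch keeps working.
    """
    if isinstance(obj, os.PathLike):
        return os.fspath(obj)
    return obj


def guess_source_kind(obj):
    """Hypothetical stand-in for type-based schema dispatch (not nimbusml code)."""
    obj = normalize_path_arg(obj)
    if isinstance(obj, str):
        return "file"
    raise TypeError("Unable to guess the schema for type '{0}'".format(type(obj)))


# A Path now takes the same branch as a string path.
print(guess_source_kind(Path("test.csv")))  # file
```

Because read_csv_pandas passes the same filepath_or_buffer both to DataSchema.read_schema and to the FileDataStream(...) constructor (lines 341 to 343 in the traceback), normalizing once at the top of read_csv would likely cover both call sites.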

Expected behavior:
FileDataStream.read_csv(Path('test.csv')) should behave identically to FileDataStream.read_csv('test.csv').
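Until path-like objects are accepted, callers can convert the path themselves, e.g. FileDataStream.read_csv(str(path)) or FileDataStream.read_csv(os.fspath(path)). The hypothetical decorator below generalizes this workaround; it is demonstrated on a stand-in reader (`fake_read_csv`, also hypothetical) rather than on nimbusml itself.

```python
import functools
import os
from pathlib import Path


def accepts_pathlike(func):
    """Hypothetical decorator: convert a path-like first argument before calling func."""
    @functools.wraps(func)
    def wrapper(filepath_or_buffer, *args, **kwargs):
        if isinstance(filepath_or_buffer, os.PathLike):
            filepath_or_buffer = os.fspath(filepath_or_buffer)
        return func(filepath_or_buffer, *args, **kwargs)
    return wrapper


# Stand-in reader that, like the failing code path, only understands str paths.
@accepts_pathlike
def fake_read_csv(filepath_or_buffer, nrows=None):
    if not isinstance(filepath_or_buffer, str):
        raise TypeError("unsupported type: {0}".format(type(filepath_or_buffer)))
    return filepath_or_buffer, nrows


# Path('test.csv') and 'test.csv' now produce the same result.
print(fake_read_csv(Path("test.csv")) == fake_read_csv("test.csv"))  # True
```

Assuming read_csv is a static method, as its signature in the traceback suggests, the same decorator could wrap it directly: `read_csv = accepts_pathlike(FileDataStream.read_csv)`.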
