-
Notifications
You must be signed in to change notification settings - Fork 8
Add CSV dialect support for reading and writing operations #361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Introduced `merge_csv_options_with_dialect` function to handle merging CSV options with dialect settings. - Updated `read_to_*` functions to utilize the new merging logic, ensuring compatibility with user-defined CSV options. - Enhanced `write` functions to validate headers and merge options, providing warnings for conflicts between user options and dialect settings. - Added tests to verify correct handling of CSV dialects and option overrides in both reading and writing scenarios.
for more information, see https://pre-commit.ci
- Reformatted code in `merge_csv_options_with_dialect` and related functions for better readability by breaking long lines. - Ensured consistent formatting in test files for CSV data and dialect definitions. - Added missing newlines at the end of some files to adhere to coding standards.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Many thanks for working into this! I'll review and provide feedback within the next few days. |
- Corrected assertion from len(data) == 1 to len(data) == 2 - Added proper assertions for data structure validation - Test now correctly validates CSV dialect override behavior
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #361 +/- ##
=======================================
Coverage ? 94.20%
=======================================
Files ? 7
Lines ? 483
Branches ? 0
=======================================
Hits ? 455
Misses ? 28
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for putting this together. It looks really good. I have a few comments to polish the implementation, but the functionality is there.
- Add comprehensive docstring to merge_csv_options_with_dialect with parameter and return descriptions - Reduce nesting by early return in merge_csv_options_with_dialect function - Replace hardcoded dialect_mapping with direct iteration over CSVDialectValidator.model_fields - Move merge_csv_options_with_dialect to new csvy/utils.py module - Update readers.py and writers.py to import from utils.py - Remove duplicate function definitions and unused imports - Fix Pydantic deprecation warning by accessing model_fields from class instead of instance - All pre-commit hooks pass with proper formatting and linting
- Move merge_csv_options_with_dialect to new csvy/utils.py module - Add comprehensive docstring with args/returns description - Reduce nesting by using early return pattern - Remove dialect_mapping dict, iterate directly over validator fields - Update readers.py and writers.py to import from utils - Add tests/test_utils.py with full test coverage
for more information, see https://pre-commit.ci
I have tried to address all your comments from the code review!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for the changes. It looks much better. However, I've found a small issue that needs to be fixed before we can merge the PR.
- Add overrides parameter to merge_csv_options_with_dialect() - Support library-specific option names (pandas: sep, polars: separator) - Detect and resolve conflicts between user options and dialect settings - Update CSVY headers when conflicts occur with user warnings - Add comprehensive test coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good, and I just have a couple of minor comments.
# Determine overrides based on data type | ||
overrides = {} | ||
try: | ||
import pandas as pd | ||
|
||
if isinstance(data, pd.DataFrame): | ||
overrides = {"sep": "delimiter"} | ||
except ImportError: | ||
pass | ||
|
||
try: | ||
import polars as pl | ||
|
||
if isinstance(data, pl.DataFrame | pl.LazyFrame): | ||
overrides = {"separator": "delimiter"} | ||
except ImportError: | ||
pass | ||
|
||
# For numpy arrays and lists, use standard dialect names (no overrides needed) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you separate this into a separate get_overrides
function within utils.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there's something missing here? Probably it should be a test within test_utils.py
rather that its own file.
@all-contributors please add @Kaos599 for code, test |
I've put up a pull request to add @Kaos599! 🎉 |
@Kaos599 there're also some tests failing. |
merge_csv_options_with_dialect
function to handle merging CSV options with dialect settings.read_to_*
functions to utilize the new merging logic, ensuring compatibility with user-defined CSV options.write
functions to validate headers and merge options, providing warnings for conflicts between user options and dialect settings.Resolves #82