Skip to content

feat: nice_xx #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 7, 2024
Merged

feat: nice_xx #1

merged 5 commits into from
Nov 7, 2024

Conversation

JarbasAl
Copy link
Member

@JarbasAl JarbasAl commented Nov 7, 2024

Summary by CodeRabbit

  • New Features

    • Introduced localization formats for date and time in Azerbaijani, Catalan, Czech, Danish, German, English (US), Basque, Persian, French, Hungarian, Italian, Dutch, Polish, Russian, Slovenian, Swedish, and Ukrainian.
    • Added comprehensive test cases for validating date and time formatting in each supported language.
  • Bug Fixes

    • Improved handling of various date and time formats, ensuring accurate representation across different languages.
  • Documentation

    • Enhanced documentation for date and time localization, including examples and usage guidelines.
    • Restructured "Languages Supported" section for better clarity and organization.

Copy link
Contributor

coderabbitai bot commented Nov 7, 2024

Warning

Rate limit exceeded

@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 18 minutes and 58 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between b7d148e and 5fc57aa.

Walkthrough

The changes introduce a new class, DateTimeFormat, in the ovos_date_parser/__init__.py file, which centralizes date and time formatting logic. It includes methods for caching language-specific configurations and handling various date formats. Additionally, new JSON files for multiple languages (Azerbaijani, Catalan, Czech, Danish, German, English, Basque, Persian, French, Hungarian, Italian, Dutch, Polish, Russian, Slovenian, Swedish, and Ukrainian) are added, each containing structured localization formats for date and time. Corresponding test files are also created to validate the formatting functionality for these languages.

Changes

File Path Change Summary
ovos_date_parser/__init__.py Added DateTimeFormat class with methods for caching and formatting dates. Updated existing functions to utilize the new class for language handling.
ovos_date_parser/res/az-az/date_time.json New file defining Azerbaijani date and time localization formats.
ovos_date_parser/res/az-az/date_time_test.json New file with test cases for Azerbaijani date and time formatting.
ovos_date_parser/res/ca-es/date_time.json New file defining Catalan date and time localization formats.
ovos_date_parser/res/ca-es/date_time_test.json New file with test cases for Catalan date and time formatting.
ovos_date_parser/res/cs-cz/date_time.json New file defining Czech date and time localization formats.
ovos_date_parser/res/cs-cz/date_time_test.json New file with test cases for Czech date and time formatting.
ovos_date_parser/res/da-dk/date_time.json New file defining Danish date and time localization formats.
ovos_date_parser/res/da-dk/date_time_test.json New file with test cases for Danish date and time formatting.
ovos_date_parser/res/de-de/date_time.json New file defining German date and time localization formats.
ovos_date_parser/res/de-de/date_time_test.json New file with test cases for German date and time formatting.
ovos_date_parser/res/en-us/date_time.json New file defining English (US) date and time localization formats.
ovos_date_parser/res/en-us/date_time_test.json New file with test cases for English (US) date and time formatting.
ovos_date_parser/res/eu-eu/date_time.json New file defining Basque date and time localization formats.
ovos_date_parser/res/fa-ir/date_time.json New file defining Persian date and time localization formats.
ovos_date_parser/res/fa-ir/date_time_test.json New file with test cases for Persian date and time formatting.
ovos_date_parser/res/fr-fr/date_time.json New file defining French date and time localization formats.
ovos_date_parser/res/fr-fr/date_time_test.json New file with test cases for French date and time formatting.
ovos_date_parser/res/hu-hu/date_time.json New file defining Hungarian date and time localization formats.
ovos_date_parser/res/hu-hu/date_time_test.json New file with test cases for Hungarian date and time formatting.
ovos_date_parser/res/it-it/date_time.json New file defining Italian date and time localization formats.
ovos_date_parser/res/it-it/date_time_test.json New file with test cases for Italian date and time formatting.
ovos_date_parser/res/nl-nl/date_time.json New file defining Dutch date and time localization formats.
ovos_date_parser/res/nl-nl/date_time_test.json New file with test cases for Dutch date and time formatting.
ovos_date_parser/res/pl-pl/date_time.json New file defining Polish date and time localization formats.
ovos_date_parser/res/ru-ru/date_time.json New file defining Russian date and time localization formats.
ovos_date_parser/res/ru-ru/date_time_test.json New file with test cases for Russian date and time formatting.
ovos_date_parser/res/sl-si/date_time.json New file defining Slovenian date and time localization formats.
ovos_date_parser/res/sl-si/date_time_test.json New file with test cases for Slovenian date and time formatting.
ovos_date_parser/res/sv-se/date_time.json New file defining Swedish date and time localization formats.
ovos_date_parser/res/sv-se/date_time_test.json New file with test cases for Swedish date and time formatting.
ovos_date_parser/res/uk-ua/date_time.json New file defining Ukrainian date and time localization formats.
ovos_date_parser/res/uk-ua/date_time_test.json New file with test cases for Ukrainian date and time formatting.
README.md Updated to include a more detailed breakdown of functionalities available for each language, with clearer presentation of supported methods.

Poem

In the meadow, where time hops,
A new format for each tick-tock,
Languages bloom, like flowers bright,
Dates and times, a joyful sight!
With every test, we leap and play,
Celebrating change, hip-hip-hooray! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 57

🧹 Outside diff range and nitpick comments (51)
ovos_date_parser/res/uk-ua/date_time_test.json (1)

1-23: Consider adding documentation and expanding test coverage

The test file is well-structured but could benefit from:

  1. A header comment explaining the test file's purpose and structure
  2. Additional test cases for:
    • Special dates (New Year, holidays)
    • Date ranges (next week, last month)
    • Time zones handling (if supported)

Would you like me to help generate documentation or additional test cases?

ovos_date_parser/res/da-dk/date_time_test.json (2)

2-17: Consider adding more edge cases to year tests

The current test cases provide good coverage but could be enhanced with additional scenarios:

  • Year 0 handling
  • Negative year handling
  • Future years (e.g., 2525, 3000)

18-28: Consider reordering date test cases

While the test cases cover various scenarios well, consider reordering them by temporal distance from the reference date. This would make it easier to verify the progression of format changes:

  1. Same day ("i dag")
  2. Adjacent days ("i morgen", "i går")
  3. Same month
  4. Same year
  5. Different years
ovos_date_parser/res/fa-ir/date_time_test.json (1)

21-31: Add tests for calendar edge cases.

Consider adding test cases for:

  1. Last day of different months (e.g., Jan 31, Apr 30)
  2. First day of month
  3. Leap year date (February 29)
  4. Week boundary transitions
ovos_date_parser/res/sl-si/date_time.json (1)

1-123: LGTM! Well-structured localization data with comprehensive coverage.

The JSON structure provides complete coverage for Slovenian date and time formatting needs:

  • All necessary format patterns are defined
  • Proper handling of special cases
  • Follows Slovenian grammar rules and conventions

Consider adding a schema file to validate the structure of language files, ensuring consistency across all localizations.

ovos_date_parser/res/da-dk/date_time.json (2)

2-20: Consider making number patterns more flexible.

The regex patterns could be more permissive to handle edge cases:

  • Pattern ^1$ for single digit "1" could miss cases where leading zeros are present
  • Pattern ^\d$ might need to handle whitespace or leading zeros

Consider updating the patterns:

-    "1": {"match": "^1$", "format": "et"},
-    "2": {"match": "^\\d$", "format": "{x}"},
+    "1": {"match": "^0*1$", "format": "et"},
+    "2": {"match": "^0*[0-9]$", "format": "{x}"},

31-41: Add support for relative time expressions.

The date_format section could be enhanced with common Danish relative time expressions.

Consider adding:

   "date_format": {
     "date_full": "{weekday}, den {day} {month}, {formatted_year}",
     "date_full_no_year": "{weekday}, den {day} {month}",
     "date_full_no_year_month": "{weekday}, den {day}",
     "today": "i dag",
     "tomorrow": "i morgen",
-    "yesterday": "i går"
+    "yesterday": "i går",
+    "day_before_yesterday": "i forgårs",
+    "day_after_tomorrow": "i overmorgen"
   },
ovos_date_parser/res/sv-se/date_time.json (2)

2-8: Consider adding "tal" suffix for decades

In Swedish, decades are typically expressed with the suffix "tal" (e.g., "åttiotalet" for "the eighties"). Consider updating the decade formats to include this suffix when appropriate.

   "decade_format": {
     "1": {"match": "^\\d$", "format": "{x}"},
     "2": {"match": "^1\\d$", "format": "{xx}"},
-    "3": {"match": "^\\d0$", "format": "{x0}"},
-    "4": {"match": "^[2-9]\\d$", "format": "{x0} {x}"},
+    "3": {"match": "^\\d0$", "format": "{x0}talet"},
+    "4": {"match": "^[2-9]\\d$", "format": "{x0}talet"},
     "default": "{number}"
   }

43-98: Consider adding capitalization rules for months and weekdays

In Swedish, months and weekdays are typically not capitalized unless they start a sentence. However, it might be useful to add capitalization rules or a note about capitalization handling for cases where these words appear at the start of a sentence.

Consider adding a new section for capitalization rules:

"capitalization": {
  "months": false,
  "weekdays": false,
  "sentence_start": true
}
ovos_date_parser/res/cs-cz/date_time.json (1)

1-1: Consider adding format documentation.

The file uses various format placeholders ({x}, {xx}, {x0}, etc.) but lacks documentation explaining their meaning. Consider adding a comment section at the top of the file to explain these placeholders.

 {
+  "// Format placeholders": {
+    "{x}": "Single digit (0-9)",
+    "{xx}": "Two digits (10-99)",
+    "{x0}": "Tens (10, 20, etc.)"
+  },
   "decade_format": {
ovos_date_parser/res/az-az/date_time_test.json (3)

2-27: Consider adding more edge cases to test_nice_year.

While the current test suite is comprehensive, consider adding these scenarios to improve coverage:

  1. Year 0 (edge case)
  2. Negative years with larger numbers (e.g., -10000)
  3. Future years beyond 5200
  4. Years with repeated digits (e.g., 2222)

28-38: Enhance test_nice_date coverage with additional scenarios.

Consider adding these test cases to improve robustness:

  1. First and last day of the year
  2. Leap year dates (February 29th)
  3. Month transitions (e.g., January 31st to February 1st)
  4. Week transitions
  5. Special dates (New Year, national holidays)

1-43: Add documentation to improve test maintainability.

Consider adding:

  1. A header comment explaining the test file's purpose
  2. Documentation for the expected format of datetime_param
  3. Examples of how to add new test cases
  4. References to related localization files
ovos_date_parser/res/cs-cz/date_time_test.json (1)

28-38: Enhance date test coverage with edge cases.

Consider adding test cases for:

  1. Calendar edge cases:

    • First/last day of month
    • February 29th (leap years)
    • December 31st to January 1st transition
  2. Time-dependent edge cases:

    • Different times of day for relative date calculations
    • Midnight transitions
ovos_date_parser/res/ca-es/date_time_test.json (2)

28-38: Add test cases for edge dates and invalid scenarios.

While the current test cases cover basic scenarios well, consider adding tests for:

  1. Month boundaries (first/last day of month)
  2. Leap year dates (February 29)
  3. Invalid dates (e.g., February 30) to ensure proper error handling
  4. Different years within the same month (to test year inclusion/exclusion logic)

1-43: Improve datetime parameter validation and format.

The current approach of passing datetime parameters as comma-separated strings is fragile:

  1. Consider using ISO format strings (e.g., "2017-01-31T13:22:03") for datetime_param
  2. Add validation for parameter ranges (months 1-12, days 1-31, hours 0-23, etc.)
  3. Consider adding explicit timezone parameters if needed

This will improve test maintainability and catch invalid test data early.

ovos_date_parser/res/en-us/date_time_test.json (2)

28-38: Enhance date formatting test coverage.

The current test cases don't cover several important scenarios:

  1. First/last days of months (e.g., January 1st, March 31st)
  2. Leap year dates (February 29th)
  3. Different month lengths
  4. Date ordinals (1st, 2nd, 3rd, 4th, etc.)

Consider adding these test cases:

{
  "10": {"datetime_param": [2024, 2, 29, 0, 2, 3], "now": null, "assertEqual": "thursday, february twenty-ninth, twenty twenty four"},
  "11": {"datetime_param": [2017, 3, 1, 0, 2, 3], "now": null, "assertEqual": "wednesday, march first, twenty seventeen"},
  "12": {"datetime_param": [2017, 4, 30, 0, 2, 3], "now": null, "assertEqual": "sunday, april thirtieth, twenty seventeen"}
}

1-43: Add documentation for test data structure.

Consider adding a comment section at the beginning of the file to document:

  1. The expected format and types for datetime_param
  2. The meaning and valid values for each field
  3. The format requirements for assertEqual strings

Add this documentation at the start of the file:

 {
+  "// Documentation": {
+    "datetime_param": "Array of [year, month, day, hour, minute, second]",
+    "bc": "Boolean indicating if year is BC",
+    "now": "Optional reference datetime for relative dates",
+    "use_24hour": "Boolean to toggle 24-hour time format",
+    "use_ampm": "Boolean to toggle AM/PM indicators",
+    "assertEqual": "Expected output string in standard English format"
+  },
   "test_nice_year": {
ovos_date_parser/res/it-it/date_time_test.json (3)

2-25: Standardize datetime parameter format and add year 0 test case.

Two suggestions for improvement:

  1. The datetime_param values are currently strings with comma-separated values. Consider using ISO format or array notation for consistency with Python's datetime parsing.

  2. Add a test case for year 0, as it's a special case in the Gregorian calendar system.

 "1": {"datetime_param": "1, 1, 31, 13, 22, 3", "bc": "True", "assertEqual": "uno a.C." },
+  "25": {"datetime_param": "0, 1, 31, 13, 22, 3", "bc": "None", "assertEqual": "zero" },

27-37: Enhance date test coverage with edge cases and special dates.

The current test suite covers basic date formatting well, but consider adding test cases for:

  1. Month boundaries (first/last day of month)
  2. Culturally significant dates (e.g., Capodanno - New Year's Day)
  3. Leap year dates (29 February)
   "9": {"datetime_param": "2018, 2, 4, 0, 2, 3", "now": "2019, 2, 6, 0, 2, 3", "assertEqual": "domenica, quattro febbraio, duemila diciotto"}
+  "10": {"datetime_param": "2024, 2, 29, 0, 2, 3", "now": "None", "assertEqual": "giovedì, ventinove febbraio, duemila ventiquattro"},
+  "11": {"datetime_param": "2024, 1, 1, 0, 2, 3", "now": "None", "assertEqual": "lunedì, uno gennaio, duemila ventiquattro"},
+  "12": {"datetime_param": "2024, 12, 31, 0, 2, 3", "now": "None", "assertEqual": "martedì, trentuno dicembre, duemila ventiquattro"}

1-42: Use proper JSON boolean values.

The test cases use string values ("True"/"False"/"None") for boolean fields. Consider using proper JSON boolean values:

  • "True" → true
  • "False" → false
  • "None" → null

Example:

-    "1": {"datetime_param": "1, 1, 31, 13, 22, 3", "bc": "True", "assertEqual": "uno a.C." },
+    "1": {"datetime_param": "1, 1, 31, 13, 22, 3", "bc": true, "assertEqual": "uno a.C." },
ovos_date_parser/res/sv-se/date_time_test.json (1)

2-27: Consider adding additional year test cases for better coverage

While the current test cases cover a good range of scenarios, consider adding these cases to improve coverage:

  • Year 0 (edge case)
  • Negative years (if supported)
  • More millennium years (e.g., 3000, 4000)
  • Years with repeated digits (e.g., 2222)
ovos_date_parser/res/sl-si/date_time_test.json (2)

2-27: Standardize the BC flag values across test cases.

The test cases use inconsistent values for the bc flag:

  • Some use "None" (e.g., line 6)
  • Others use "False" (e.g., line 24)
    when representing non-BC years.

This inconsistency could lead to confusion. Consider standardizing to use either "False" or null (JSON's native null) for non-BC years across all test cases.


28-38: Consider adding edge case tests for dates.

While the current test cases cover basic scenarios well, consider adding tests for:

  • Month boundaries (first/last days)
  • Leap year dates (February 29)
  • Special dates (New Year's Eve, first day of year)

These additional cases would help ensure robust date handling across edge scenarios.

ovos_date_parser/res/nl-nl/date_time_test.json (2)

28-38: Enhance date test coverage with additional scenarios.

The current test cases miss several important scenarios:

  1. Edge cases:

    • Dates around year boundaries (Dec 31/Jan 1)
    • Leap years (February 29)
    • Week boundaries for relative dates
  2. Localization specifics:

    • Dutch date format typically uses different ordinal indicators
    • Line 29: "eenendertig januari" might be better as "31e januari" in some contexts

Consider adding these test cases to ensure robust date handling across various scenarios.


1-43: Consider restructuring test data for better maintainability.

While the test cases provide good basic coverage, consider these architectural improvements:

  1. Separate test data into smaller, focused files:

    • year_tests.json
    • date_tests.json
    • time_tests.json
  2. Add a schema validation file to ensure consistent test case structure across all language files.

  3. Consider adding metadata:

    • Default format preferences for the locale
    • Version information
    • Cultural notes/requirements

This structure would make it easier to maintain test cases across different languages and ensure consistent coverage.

ovos_date_parser/res/nl-nl/date_time.json (1)

37-38: Consider revising date format to match Dutch conventions

The current format includes commas after weekdays, which is not typical in Dutch date formatting. Consider removing the commas:

-    "date_full": "{weekday}, {day} {month}, {formatted_year}",
-    "date_full_no_year": "{weekday}, {day} {month}",
+    "date_full": "{weekday} {day} {month} {formatted_year}",
+    "date_full_no_year": "{weekday} {day} {month}"
ovos_date_parser/res/hu-hu/date_time.json (1)

43-45: Add time-only format template.

The date_time_format section should include a separate format for time-only representations, which might be needed when formatting times without dates.

Add this format:

   "date_time_format": {
-    "date_time": "{formatted_date}, {formatted_time}"
+    "date_time": "{formatted_date}, {formatted_time}",
+    "time_only": "{formatted_time}"
   }
ovos_date_parser/res/fr-fr/date_time_test.json (2)

2-27: Enhance year test coverage and consistency

  1. The bc flag usage is inconsistent:

    • Some cases use "None" (lines 6, 13-20)
    • Others use "False" (lines 7-8, 22-26)
      This should be standardized to avoid confusion.
  2. Consider adding these French-specific edge cases:

    • Years ending in 1 (e.g., 71 → "soixante-et-onze")
    • Years with "quatre-vingt" vs "octante" (regional variations)
    • Years requiring liaison (e.g., "deux-cent-un" vs "deux-cents")

28-38: Add test cases for French-specific date formations

Consider adding test cases for:

  1. Dates with "premier" (1st of month)
  2. Month names with articles ("en janvier" vs "le 2 janvier")
  3. Special date expressions like "le lendemain", "la veille"
  4. Dates requiring "de" (e.g., "le 2 mai de l'année 2024")
ovos_date_parser/res/fa-ir/date_time.json (2)

51-82: Consider adding support for years before Christ (BC).

The year formatting section only handles years after Christ (بعد از میلاد). Consider:

  1. Adding support for years before Christ
  2. Adding validation for maximum supported year values
  3. Making the era suffix optional

Consider extending the year format section:

"year_format": {
  // ... existing formats ...
  "bc": "بعد از میلاد",
  "bce": "قبل از میلاد",
  "max_year": 9999,
  "use_era": true
}

83-93: Consider adding more date-time format options.

The date-time formatting section currently offers only one format. Consider adding:

  1. Multiple format options (short, medium, long)
  2. Support for different time formats (12/24 hour)
  3. Options for timezone handling

Example additional formats:

"date_time_format": {
  "short": "{formatted_date} {formatted_time}",
  "medium": "{formatted_date} ساعت {formatted_time}",
  "long": "{formatted_date} ساعت {formatted_time} {timezone}",
  "time_24h": true,
  "show_timezone": false
}
ovos_date_parser/res/ru-ru/date_time_test.json (2)

2-27: Add missing edge cases for year formatting.

Consider adding test cases for:

  1. Year 0 (important edge case in calendar systems)
  2. Negative years (alternative representation of BC years)
{
  "test_nice_year": {
+   "25": {"datetime_param": "0, 1, 31, 13, 22, 3", "bc": "False", "assertEqual": "ноль" },
+   "26": {"datetime_param": "-50, 1, 31, 13, 22, 3", "bc": "True", "assertEqual": "пятьдесят до нашей эры" }

28-38: Enhance date formatting test coverage.

Consider adding test cases for:

  1. All months of the year (currently only testing January and February)
  2. Month boundary conditions (e.g., January 31 → February 1)
  3. Leap year dates (February 29)

Would you like me to help generate additional test cases to improve coverage?

ovos_date_parser/res/de-de/date_time.json (2)

24-38: Improve year formatting patterns and BC notation.

  1. The BC notation "v.d.Z." (vor der Zeitrechnung) should include spaces between periods for better readability: "v. d. Z."
  2. Consider adding patterns for these cases:
    • Years between 1000-1199 (currently might fall into a generic case)
    • Special handling for round years (e.g., 2000, 3000)
   "year_format": {
     ...
-    "bc": "v.d.Z."
+    "bc": "v. d. Z."

92-105: Add abbreviated month names.

Consider adding abbreviated forms of month names (commonly used in German):

   "month": {
     "1": "Januar",
     ...
-  }
+  },
+  "month_abbrev": {
+    "1": "Jan.",
+    "2": "Feb.",
+    "3": "März",
+    "4": "Apr.",
+    "5": "Mai",
+    "6": "Juni",
+    "7": "Juli",
+    "8": "Aug.",
+    "9": "Sept.",
+    "10": "Okt.",
+    "11": "Nov.",
+    "12": "Dez."
+  }
ovos_date_parser/res/it-it/date_time.json (1)

2-9: Improve regex precision in decade_format.

The regex pattern ^[2-9][1|8] for rule 4 uses | as a literal character rather than as an OR operator. This means it would match numbers like "21" and "81" as intended, but also unintended numbers like "2|" or "91".

Consider updating the pattern to be more precise:

-    "4": {"match": "^[2-9][1|8]", "format": "{xx}"},
+    "4": {"match": "^[2-9][18]", "format": "{xx}"},
ovos_date_parser/res/uk-ua/date_time.json (3)

22-22: Remove unnecessary blank lines.

The blank lines between format groups break the consistency of the JSON structure. Consider removing them to maintain a uniform format.

Also applies to: 28-28


52-62: Consider adding more date/time format options.

The current implementation could benefit from additional format options:

  • Short date format (e.g., "DD.MM.YYYY")
  • Medium date format (e.g., "D MMM YYYY")
  • Explicit time format section for 12/24 hour time

1-150: Consider adding documentation for format placeholders.

The JSON file would benefit from comments explaining the meaning and usage of format placeholders like {x}, {x0}, {x_in_x00}, etc. This would help maintainers understand the templating system.

Consider adding a comment section at the top of the file:

{
  "_comment": {
    "x": "single digit number",
    "x0": "tens digit",
    "x_in_x00": "digit in hundreds position",
    "x_in_x000": "digit in thousands position"
  },
  "decade_format": {
    ...
ovos_date_parser/res/ru-ru/date_time.json (4)

3-6: Consider adding explicit patterns for teen numbers (11-19).

The decade formatting might benefit from an explicit pattern for teen numbers to ensure proper handling of these special cases in Russian.

   "decade_format": {
     "1": {"match": "^\\d$", "format": "{x}"},
-    "2": {"match": "^1\\d$", "format": "{xx}"},
+    "2": {"match": "^1[0-9]$", "format": "{xx}"},
     "3": {"match": "^\\d0$", "format": "{x0}"},
     "4": {"match": "^[2-9]\\d$", "format": "{x0} {x}"},

52-59: Consider adding support for relative date expressions.

While the file includes "today", "tomorrow", and "yesterday", consider adding support for other common Russian relative date expressions like "послезавтра" (day after tomorrow) and "позавчера" (day before yesterday).

   "date_format": {
     "date_full": "{weekday}, {day} {month}, {formatted_year}",
     "date_full_no_year": "{weekday}, {day} {month}",
     "date_full_no_year_month": "{weekday}, {day}",
     "today": "сегодня",
     "tomorrow": "завтра",
-    "yesterday": "вчера"
+    "yesterday": "вчера",
+    "day_after_tomorrow": "послезавтра",
+    "day_before_yesterday": "позавчера"
   },

60-62: Consider adding more time format variations.

The current date_time_format is quite basic. Consider adding support for common Russian time expressions like "утром" (in the morning), "днём" (in the afternoon), "вечером" (in the evening), etc.

   "date_time_format": {
-    "date_time": "{formatted_date} в {formatted_time}"
+    "date_time": "{formatted_date} в {formatted_time}",
+    "morning": "{formatted_date} утром",
+    "afternoon": "{formatted_date} днём",
+    "evening": "{formatted_date} вечером",
+    "night": "{formatted_date} ночью"
   },

119-148: Consider adding ordinal number forms.

While the file includes cardinal numbers, consider adding ordinal forms ("первый", "второй", etc.) which might be useful for certain date expressions.

   "number": {
     // ... existing cardinal numbers ...
+    "ordinal": {
+      "1": "первый",
+      "2": "второй",
+      "3": "третий",
+      // ... and so on
+    }
   }
ovos_date_parser/__init__.py (7)

Line range hint 282-285: Incorrect function call for Portuguese language (pt)

The extract_datetime function is incorrectly calling extract_datetime_pl (Polish) when handling Portuguese (pt). This appears to be a copy-paste error and should be corrected to call extract_datetime_pt.

Apply this diff to fix the issue:

 if lang.startswith("pl"):
     return extract_datetime_pl(text, anchorDate=anchorDate, default_time=default_time)
-if lang.startswith("pt"):
-    return extract_datetime_pl(text, anchorDate=anchorDate, default_time=default_time)
+if lang.startswith("pt"):
+    return extract_datetime_pt(text, anchorDate=anchorDate, default_time=default_time)

381-381: Remove assignment to unused variable decade

The variable decade is assigned but never used, which leads to unnecessary code. Consider removing this assignment to clean up the function.

Apply this diff to remove the unused variable:

 def _decade_format(self, number, number_tuple, lang):
-    decade = s.format(x=number_tuple.x, xx=number_tuple.xx,
-                      x0=number_tuple.x0, x_in_x0=number_tuple.x_in_x0,
-                      number=str(number % 100))
     return s.format(x=number_tuple.x, xx=number_tuple.xx,
                     x0=number_tuple.x0, x_in_x0=number_tuple.x_in_x0,
                     number=str(number % 100))
🧰 Tools
🪛 Ruff

381-381: Local variable decade is assigned to but never used

Remove assignment to unused variable decade

(F841)


391-391: Remove assignment to unused variable hundreds

The variable hundreds is assigned but not used afterward. Removing this assignment will simplify the code.

Apply this diff:

 def _number_format_hundreds(self, number, number_tuple, lang, formatted_decade):
-    hundreds = s.format(xxx=number_tuple.xxx, x00=number_tuple.x00,
-                        x_in_x00=number_tuple.x_in_x00,
-                        formatted_decade=formatted_decade,
-                        number=str(number % 1000))
     return s.format(xxx=number_tuple.xxx, x00=number_tuple.x00,
                     x_in_x00=number_tuple.x_in_x00,
                     formatted_decade=formatted_decade,
                     number=str(number % 1000))
🧰 Tools
🪛 Ruff

391-391: Local variable hundreds is assigned to but never used

Remove assignment to unused variable hundreds

(F841)


537-537: Simplify dictionary membership test

Using if key in dict is more Pythonic than if key in dict.keys(). This change enhances readability and performance.

Apply this diff:

 if full_code in date_time_format.lang_config.keys():
     localized_day_names = list(
         date_time_format.lang_config[lang]['weekday'].values())

should be:

-if full_code in date_time_format.lang_config.keys():
+if full_code in date_time_format.lang_config:
     localized_day_names = list(
         date_time_format.lang_config[full_code]['weekday'].values())
🧰 Tools
🪛 Ruff

537-537: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


550-550: Simplify dictionary membership test

Similar to the previous comment, simplify the membership test for consistency.

Apply this diff:

 if full_code in date_time_format.lang_config.keys():
     localized_month_names = date_time_format.lang_config[lang]['month']

should be:

-if full_code in date_time_format.lang_config.keys():
+if full_code in date_time_format:
     localized_month_names = date_time_format.lang_config[full_code]['month']
🧰 Tools
🪛 Ruff

550-550: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


313-320: Use os.path.join for cross-platform file paths

To improve code portability across different operating systems, use os.path.join to construct file paths instead of string concatenation.

Apply this diff:

 try:
     # Attempt to load the language-specific formatting data
-    with open(self.config_path + '/' + lang + '/date_time.json',
+    with open(os.path.join(self.config_path, lang, 'date_time.json'),
               'r', encoding='utf8') as lang_config_file:
         self.lang_config[lang] = json.loads(
             lang_config_file.read())
 except FileNotFoundError:
     # Fallback to English formatting
-    with open(self.config_path + '/en-us/date_time.json',
+    with open(os.path.join(self.config_path, 'en-us', 'date_time.json'),
               'r', encoding='utf8') as lang_config_file:
         self.lang_config[lang] = json.loads(
             lang_config_file.read())

309-309: Address the TODO: Implement closest language code fallback

A TODO comment indicates the need to find the closest matching language code when the exact code is not available. Implementing this feature would enhance language support and user experience.

Would you like assistance in implementing the logic to find and use the closest matching language code? I can help draft a solution or open a GitHub issue to track this enhancement.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8c7f8b9 and fac0a3f.

📒 Files selected for processing (33)
  • ovos_date_parser/__init__.py (3 hunks)
  • ovos_date_parser/res/az-az/date_time.json (1 hunks)
  • ovos_date_parser/res/az-az/date_time_test.json (1 hunks)
  • ovos_date_parser/res/ca-es/date_time.json (1 hunks)
  • ovos_date_parser/res/ca-es/date_time_test.json (1 hunks)
  • ovos_date_parser/res/cs-cz/date_time.json (1 hunks)
  • ovos_date_parser/res/cs-cz/date_time_test.json (1 hunks)
  • ovos_date_parser/res/da-dk/date_time.json (1 hunks)
  • ovos_date_parser/res/da-dk/date_time_test.json (1 hunks)
  • ovos_date_parser/res/de-de/date_time.json (1 hunks)
  • ovos_date_parser/res/de-de/date_time_test.json (1 hunks)
  • ovos_date_parser/res/en-us/date_time.json (1 hunks)
  • ovos_date_parser/res/en-us/date_time_test.json (1 hunks)
  • ovos_date_parser/res/eu-eu/date_time.json (1 hunks)
  • ovos_date_parser/res/fa-ir/date_time.json (1 hunks)
  • ovos_date_parser/res/fa-ir/date_time_test.json (1 hunks)
  • ovos_date_parser/res/fr-fr/date_time.json (1 hunks)
  • ovos_date_parser/res/fr-fr/date_time_test.json (1 hunks)
  • ovos_date_parser/res/hu-hu/date_time.json (1 hunks)
  • ovos_date_parser/res/hu-hu/date_time_test.json (1 hunks)
  • ovos_date_parser/res/it-it/date_time.json (1 hunks)
  • ovos_date_parser/res/it-it/date_time_test.json (1 hunks)
  • ovos_date_parser/res/nl-nl/date_time.json (1 hunks)
  • ovos_date_parser/res/nl-nl/date_time_test.json (1 hunks)
  • ovos_date_parser/res/pl-pl/date_time.json (1 hunks)
  • ovos_date_parser/res/ru-ru/date_time.json (1 hunks)
  • ovos_date_parser/res/ru-ru/date_time_test.json (1 hunks)
  • ovos_date_parser/res/sl-si/date_time.json (1 hunks)
  • ovos_date_parser/res/sl-si/date_time_test.json (1 hunks)
  • ovos_date_parser/res/sv-se/date_time.json (1 hunks)
  • ovos_date_parser/res/sv-se/date_time_test.json (1 hunks)
  • ovos_date_parser/res/uk-ua/date_time.json (1 hunks)
  • ovos_date_parser/res/uk-ua/date_time_test.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • ovos_date_parser/res/en-us/date_time.json
🧰 Additional context used
🪛 Ruff
ovos_date_parser/__init__.py

381-381: Local variable decade is assigned to but never used

Remove assignment to unused variable decade

(F841)


391-391: Local variable hundreds is assigned to but never used

Remove assignment to unused variable hundreds

(F841)


537-537: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


550-550: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (56)
ovos_date_parser/res/eu-eu/date_time.json (6)

2-10: LGTM: Basic number formats are correctly defined.

The decade, hundreds, and thousand formats use appropriate templating.


11-14: LGTM: Year format correctly uses Basque BC notation.

The year format appropriately uses "k.a." (Kristo aurretik) for BC dates.


15-22: LGTM: Date formats follow Basque grammar rules.

The date formats correctly implement Basque case suffixes (-ko, -ren) and include proper translations for relative dates.


23-25: LGTM: Date-time format uses correct Basque case suffix.

The format correctly implements the Basque inessive case (-tan) for time expressions.


26-34: LGTM: Weekday names are correctly defined.

The weekday names are properly spelled in Basque and follow the correct Monday-first ordering.


82-111: LGTM: Number definitions are comprehensive and correct.

The number definitions follow proper Basque number formation rules and provide good coverage for date/time formatting needs.

ovos_date_parser/res/da-dk/date_time_test.json (1)

1-32: LGTM: Well-structured test cases

The test file is well-organized with clear sections for different formatting scenarios and consistent test case structure.

ovos_date_parser/res/fa-ir/date_time_test.json (1)

1-36: LGTM! Well-structured test cases.

The test file is well-organized with clear separation between year, date, and date-time test cases. The JSON structure is consistent and follows good practices.

ovos_date_parser/res/sl-si/date_time.json (3)

2-25: Verify the correctness of number formatting rules.

The number formatting rules look well-structured, but there might be an edge case in the hundreds format that needs verification:

  • Numbers 300-900 use the format "{x_in_x00}sto" which could produce incorrect grammar for some numbers in Slovenian.

Could you verify if "tristo", "štiristo", etc. are the correct forms in Slovenian? Some Slavic languages have special forms for hundreds.


26-36: LGTM! Date and time formats are correctly defined.

The date and time formats follow proper Slovenian grammar:

  • Relative dates use correct terms (danes, jutri, včeraj)
  • Time specification correctly uses "ob" preposition
  • All format variations are properly covered

37-122: LGTM! Localized names are comprehensive and follow Slovenian conventions.

The localization data is well-structured and follows Slovenian language conventions:

  • Weekdays start from Monday (index 0)
  • Months are correctly in lowercase
  • Number words use proper Slovenian forms
ovos_date_parser/res/ca-es/date_time.json (5)

2-23: LGTM! Number formatting patterns are linguistically accurate.

The decade, hundreds, and thousand formatting patterns correctly follow Catalan number formation rules, including proper hyphenation and special cases.


24-40: LGTM! Date formatting follows Catalan conventions.

The date formats correctly use:

  • "de" as the separator between day and month
  • Proper BC year notation ("a.C.")
  • Correct temporal words (avui, demà, ahir)

53-129: LGTM! Translations are linguistically accurate.

The translations for dates, months, and numbers correctly follow Catalan language rules, including:

  • Proper hyphenation in compound numbers
  • Correct special forms (e.g., "vint-i-u" for 21)
  • Accurate month names

50-50: ⚠️ Potential issue

Fix typo in weekday name.

The word "dissabte" (Saturday) is misspelled.

Apply this fix:

-    "5": "dissabte",
+    "5": "dissabte",

Likely invalid or redundant comment.


1-130: Verify JSON structure compatibility with DateTimeFormat class.

The overall structure looks correct, but let's verify it matches the expected format used by the DateTimeFormat class.

✅ Verification successful

Missing time_format field in Catalan date/time configuration

The structure comparison reveals that the Catalan (ca-es) configuration is missing the time_format field that exists in other language files (e.g., nl-nl). Let's verify the exact format needed by checking other language files.


JSON structure is valid despite time_format difference

The verification shows that the time_format field is optional, as it's only present in Dutch (nl-nl) locale and absent in all other language files including English (en-us). Since all required fields are present and validated successfully, the Catalan date/time configuration structure is correct and consistent with most other language implementations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify JSON structure matches other language files
# and check for required fields used by DateTimeFormat class

# Test 1: Compare structure with other language files
echo "Comparing structure with other language files..."
for file in ovos_date_parser/res/*/date_time.json; do
  if [ "$file" != "ovos_date_parser/res/ca-es/date_time.json" ]; then
    echo "Checking against $file..."
    # Extract top-level keys for comparison
    jq -r 'keys | sort | .[]' "$file" > /tmp/other_keys
    jq -r 'keys | sort | .[]' ovos_date_parser/res/ca-es/date_time.json > /tmp/ca_keys
    diff /tmp/ca_keys /tmp/other_keys
  fi
done

# Test 2: Validate required fields are present
echo "Checking for required fields..."
jq -e '.decade_format and .year_format and .date_format and .date_time_format and .weekday and .month and .number' ovos_date_parser/res/ca-es/date_time.json

Length of output: 7872


Script:

#!/bin/bash
# Check time_format structure in other language files
echo "Checking time_format structure in nl-nl..."
jq '.time_format' ovos_date_parser/res/nl-nl/date_time.json

echo -e "\nChecking if time_format is consistently present in other locales..."
for file in ovos_date_parser/res/*/date_time.json; do
  locale=$(basename $(dirname "$file"))
  echo -n "$locale: "
  jq 'has("time_format")' "$file"
done

Length of output: 4784

ovos_date_parser/res/da-dk/date_time.json (1)

3-3: Ensure consistent number formatting across sections.

There's an inconsistency in how the number "1" is formatted:

  • decade_format uses "et"
  • number section uses "en"

Please verify which form should be used in each context.

Also applies to: 100-100

ovos_date_parser/res/sv-se/date_time.json (3)

21-42: LGTM! Comprehensive date formatting rules

The year and date formatting rules are well-structured and follow Swedish conventions correctly. The handling of BC years ("före kristus"), relative dates (idag/imorgon/igår), and date-time combinations are all accurate.


71-71: ⚠️ Potential issue

Fix spelling: "artonde" should be "adertonde"

There's a spelling error in the Swedish word for "eighteenth".

-    "18": "artonde",
+    "18": "adertonde",

Likely invalid or redundant comment.


1-129: Verify similar patterns across other language files

Let's ensure consistent structure across all language files while respecting language-specific formatting rules.

✅ Verification successful

Let me gather more information about the structure variations and potential inconsistencies.


Swedish date/time format structure is consistent with related languages

The verification shows that:

  1. The Swedish (sv-se) decade format follows the same pattern as English (en-us), which is appropriate given both use space-separated numbers (e.g., "twenty five")
  2. The structure is complete with all required top-level keys
  3. The only inconsistency found is that Dutch (nl-nl) has an extra time_format key not present in other files, but this doesn't affect the Swedish implementation

The current Swedish implementation is correct and follows established patterns for Germanic languages while maintaining its language-specific characteristics.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent structure across language files and identify potential issues

# Test: Check if all language files have the same top-level keys
echo "Checking structure consistency across language files..."
fd -e json -p "res/.*/date_time.json" --exec jq -r 'keys | sort | .[]' {} \; | sort | uniq -c

# Test: List all decade_format patterns to compare across languages
echo "Comparing decade_format patterns..."
fd -e json -p "res/.*/date_time.json" --exec sh -c 'echo "=== $(dirname {}) ===" && jq -r ".decade_format | keys | sort | .[]" {}' \;

Length of output: 1555


Script:

#!/bin/bash
# Check which file has the time_format key that others don't have
echo "=== Checking which file has time_format ==="
fd -e json -p "res/.*/date_time.json" --exec sh -c 'if jq -e "select(.time_format)" {} > /dev/null; then echo "{}"; fi'

# Check decade_format patterns in detail for files with 5 rules
echo "=== Checking decade_format patterns for files with 5 rules ==="
fd -e json -p "res/.*/date_time.json" --exec sh -c 'if [ $(jq ".decade_format | length" {}) -eq 6 ]; then echo "=== {} ==="; jq ".decade_format" {}; fi'

# Verify if sv-se decade_format patterns are consistent with other Germanic languages
echo "=== Comparing sv-se with other Germanic languages (de-de, en-us, nl-nl, da-dk) ==="
for file in $(fd -e json -p "res/(sv-se|de-de|en-us|nl-nl|da-dk)/date_time.json"); do
    echo "=== $file ==="
    jq '.decade_format' "$file"
done

Length of output: 5732

ovos_date_parser/res/az-az/date_time.json (5)

44-52: LGTM! Weekday definitions are complete and correct.

The weekday definitions follow the standard 0-6 numbering (Monday to Sunday) with proper Azerbaijani translations.


53-85: LGTM! Date definitions are complete and properly formatted.

All dates (1-31) are covered with correct Azerbaijani ordinal numbers.


86-99: LGTM! Month definitions use correct grammatical case.

All months (1-12) are covered with proper Azerbaijani names in the genitive case (ending with -ın/-in), which is correct for date expressions in Azerbaijani.


3-7: Verify regex patterns in decade_format section

The patterns look correct, but let's verify the coverage:

  • ^\\d$: Single digits (0-9)
  • ^1\\d$: Teens (10-19)
  • ^\\d0$: Multiples of 10 (10,20,..90)
  • ^[2-9]\\d$: Two-digit numbers (20-99)

15-20: Verify thousand_format patterns for completeness

The patterns for thousands need verification to ensure all cases are covered:

  • ^10\\d{2}$: 1000-1099
  • ^[2-9]0\\d{2}$: 2000-9099
  • ^11\\d{2}$: 1100-1199
  • ^1\\d{3}$: 1000-1999
  • ^[2-9]1\\d{2}$: 2100-9199
  • ^[2-9]\\d{3}$: 2000-9999
ovos_date_parser/res/cs-cz/date_time.json (2)

32-42: LGTM! Date and time formats are well structured.

The date and time formats are properly defined with correct Czech translations for relative dates (today, tomorrow, yesterday).


3-3: Consider expanding the single digit match pattern.

The pattern ^\\d$ only matches single digits (0-9). Consider if you need to handle leading zeros or multi-digit numbers in this context.

✅ Verification successful

The single digit pattern ^\\d$ is correct and consistent with other locales

Looking at the patterns across different language files, the pattern ^\\d$ for matching single digits is consistently used in many locales including en-us, fr-fr, it-it, and others. This pattern is appropriate for its context where it's used to format individual digits, while other patterns in the file handle multi-digit numbers separately.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if other language files have different patterns for single digits
rg -U "\"1\".*\"match\".*\"\\^\\\\d\\$\"" ovos_date_parser/res/*/date_time.json

Length of output: 747


Script:

#!/bin/bash
# Let's check the actual patterns used for "1" across different language files
rg --json -U "\"1\".*\"match\".*\"format\"" ovos_date_parser/res/*/date_time.json | jq -r '.data.lines.text'

Length of output: 4381

ovos_date_parser/res/pl-pl/date_time.json (1)

1-129: Overall structure looks good with comprehensive Polish localization.

The file provides complete coverage for date and time formatting in Polish, including:

  • Number formatting patterns
  • Year and date formats
  • Weekdays, months, and numbers
  • Special cases for today/tomorrow/yesterday

Apart from the noted language mixing issues, the structure and content are well-organized and appropriate for Polish localization.

ovos_date_parser/res/cs-cz/date_time_test.json (3)

1-43: LGTM! Well-structured test data organization.

The JSON structure is clean, well-organized, and follows a consistent format across all test sections.


2-27: Consider enhancing year test coverage and standardizing BC flag.

  1. The bc flag usage is inconsistent:

    • Some cases use explicit "True"/"False" (lines 3-4, 22-26)
    • Others use "None" (lines 6-8, 12-13, etc.)
  2. Consider adding test cases for:

    • Year 0 (edge case between BC and AD)
    • Negative years
    • Very large years (>9999)

1-43: Verify Czech language correctness with a native speaker.

While the technical structure is sound, please ensure that:

  1. All Czech translations are grammatically correct
  2. Number-to-word conversions follow proper Czech rules
  3. Date/time formats align with Czech conventions
ovos_date_parser/res/hu-hu/date_time_test.json (2)

1-43: LGTM: Well-structured test data organization

The JSON structure is clean, well-organized, and follows a consistent pattern for test cases.


2-27: Consider adding test case for year 0

The test cases for test_nice_year provide good coverage including BC/AD years, special numbers (1000, 2000), and various formats. However, consider adding a test case for year 0 to ensure proper handling of this edge case.

Let's verify if year 0 is handled elsewhere in the codebase:

ovos_date_parser/res/sv-se/date_time_test.json (2)

1-43: LGTM: Well-structured test data organization

The JSON structure is clean and well-organized into logical test sections.


28-38: Verify Swedish weekday and month translations

Please verify that the Swedish translations are correct:

  • "tisdag" for Tuesday
  • "söndag" for Sunday
  • "januari" for January
  • "februari" for February

Also consider adding test cases for:

  • Special dates (holidays, etc.)
  • Last day of month edge cases
  • Leap year dates (February 29)
✅ Verification successful

Let me search for additional Swedish locale files to get more context about the translations.


Swedish translations are correct and well-structured

The translations in the test file are verified to be correct based on the Swedish locale file (date_time.json) which shows:

  • Correct weekday translations: "tisdag" (Tuesday) and "söndag" (Sunday) match the weekday mapping (indexes 1 and 6)
  • Correct month translations: "januari" (January) and "februari" (February) match the month mapping (indexes 1 and 2)

The test cases provide good coverage including:

  • Relative dates (idag/imorgon/igår)
  • Different date formats with/without year
  • Various date contexts

While adding test cases for special dates would be beneficial, the current translations and test structure are accurate and follow Swedish language conventions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other Swedish locale files in the codebase to cross-reference translations
rg -g "*.json" -g "*.po" -g "*.mo" "januari|februari|tisdag|söndag" 

Length of output: 3642


Script:

#!/bin/bash
# Search for Swedish locale files and their content
fd -e json -e po -e mo . ovos_date_parser/res/sv-se/ -x cat {}

Length of output: 7674

ovos_date_parser/res/sl-si/date_time_test.json (2)

1-43: Well-structured test suite organization!

The test cases are logically organized into distinct sections with clear purposes, following good testing practices.


1-43: Verify Slovenian language translations.

Please have a native Slovenian speaker verify the correctness of:

  • Numerical expressions (e.g., "dvaindevetdeset")
  • Date formats
  • Time expressions
  • AM/PM terminology
  • Relative date terms (today, tomorrow, yesterday)

This ensures linguistic accuracy and natural expression in Slovenian.

ovos_date_parser/res/nl-nl/date_time.json (2)

2-23: LGTM: Number formatting sections are well-structured

The decade, hundreds, and thousand formatting rules follow Dutch number formatting conventions correctly. The use of "en" as a conjunction and the placement of "honderd"/"duizend" are accurate.


1-136: Verify test coverage for all formatting patterns

Please ensure that the corresponding test file (date_time_test.json) includes test cases for all formatting patterns defined in this file, particularly:

  • Compound numbers
  • Various time expressions
  • Date formats without commas
✅ Verification successful

Test coverage is comprehensive for Dutch date/time formatting

The test file date_time_test.json includes thorough test cases covering all formatting patterns defined in date_time.json:

  • Year formatting: Single digits, double digits, hundreds, thousands, BC dates
  • Date formatting: Full dates with weekday, month, year, as well as relative dates (today, tomorrow, yesterday)
  • Time formatting: Both 12-hour and 24-hour formats
  • Compound numbers and various number formats
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if test file exists and contains necessary test cases
rg -A 5 "test_nice_date|test_nice_time|test_nice_number" "ovos_date_parser/res/**/date_time_test.json"

Length of output: 188


Script:

#!/bin/bash
# Let's first check if there are any test files in the repository
fd "test.*\.json" "ovos_date_parser"

# Also check for any Python test files that might contain date formatting tests
rg -l "test.*date|test.*time" "ovos_date_parser"

Length of output: 1727


Script:

#!/bin/bash
# Let's examine the Dutch test file content to verify test coverage
cat "ovos_date_parser/res/nl-nl/date_time_test.json"

Length of output: 4281

ovos_date_parser/res/hu-hu/date_time.json (2)

21-34: LGTM! Comprehensive year formatting rules.

The year formatting rules are well-structured and cover all possible cases, including:

  • Two-digit years
  • Full years with thousands
  • Special cases for century boundaries
  • Correct BC notation ("kr.e.")

46-131: LGTM! Complete and accurate Hungarian localization data.

The localization data is comprehensive and linguistically accurate:

  • Correct weekday names
  • Proper ordinal forms for dates
  • Complete month names
  • Accurate number system including special forms
ovos_date_parser/res/fr-fr/date_time_test.json (1)

1-43: LGTM: Well-structured test suite organization

The test suite is well-organized into logical categories with clear test case identifiers.

ovos_date_parser/res/fa-ir/date_time.json (1)

94-102: Consider aligning weekday indices with ISO-8601.

The current weekday indexing (0-6 starting with Monday) differs from ISO-8601 standard (1-7 with Monday as 1). This might cause compatibility issues with other systems.

ovos_date_parser/res/ru-ru/date_time_test.json (1)

1-43: Overall structure and formatting looks good!

The JSON file is well-structured with:

  • Clear organization of test cases by type
  • Consistent formatting
  • Well-defined input parameters and expected outputs
ovos_date_parser/res/de-de/date_time_test.json (3)

1-43: LGTM! Well-structured test data organization.

The JSON structure is clean, consistent, and well-organized with clear separation between different test categories.


1-43: Verify German language formatting rules.

Let's verify the correctness of German-specific formatting:

  • Use of "v.d.Z." for BC years is correct
  • Ordinal number formatting (e.g., "einunddreißigster")
  • Time expressions ("uhr", "nachmittags")
  • Day/month capitalization
✅ Verification successful

Let me generate additional verification scripts to check for German number formatting and time-related patterns:


German language formatting is correctly implemented

The verification confirms proper implementation of German language specifics:

  • Correct usage of "v.d.Z." (vor der Zeitrechnung) for BC years
  • Proper German number formatting including compound numbers (e.g., "einunddreißigster", "zwei und neunzig")
  • Correct capitalization of months (Januar, Februar, etc.) and days (Montag, Dienstag, etc.)
  • Appropriate time formatting with "uhr" and "nachmittags"
  • Consistent ordinal number formatting in dates
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for consistency in German formatting across other test files
rg -U "v\.d\.Z\." ovos_date_parser/res/de-de/
rg -U "nachmittags" ovos_date_parser/res/de-de/
# Check if similar patterns exist in other German resource files
fd -e json . ovos_date_parser/res/de-de/ --exec rg -U "einunddreißigster|nachmittags|v\.d\.Z\." {}

Length of output: 2551


Script:

#!/bin/bash
# Check for German number formatting patterns
rg -U "dreißig|vierzig|fünfzig|sechzig|siebzig|achtzig|neunzig" ovos_date_parser/res/de-de/

# Check for time-related patterns
rg -U "uhr|vormittags|nachmittags|morgens|abends" ovos_date_parser/res/de-de/

# Check for day/month capitalization
rg -U "Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember" ovos_date_parser/res/de-de/
rg -U "Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonntag" ovos_date_parser/res/de-de/

Length of output: 7497


2-27: Consider adding more edge cases to year tests.

While the current test cases are comprehensive, consider adding these scenarios to improve coverage:

  • Year 0 handling (special case in many date systems)
  • Negative years
  • Maximum supported year validation
  • Additional number combinations (e.g., years ending in 20, 30, etc.)
ovos_date_parser/res/fr-fr/date_time.json (2)

25-42: LGTM! Date formatting sections are well-structured.

The year, date, and date-time format sections correctly implement French conventions for date representation, including BC notation and relative date terms.


124-145: Excellent implementation of French number system edge cases!

The implementation correctly handles the complex French number system, particularly:

  • 70s using the "soixante-dix" pattern
  • 80s using the "quatre-vingt" pattern
  • 90s using the "quatre-vingt-dix" pattern

All special cases are properly formatted with correct hyphenation.

ovos_date_parser/res/de-de/date_time.json (2)

2-23: LGTM! Number formatting patterns are correct.

The decade, hundreds, and thousand formatting patterns accurately reflect German number formatting rules, including:

  • Correct use of "und" conjunction for decades
  • Proper word order for compound numbers
  • Appropriate handling of special cases

59-91: LGTM! Ordinal numbers are correctly formatted.

The ordinal numbers for days of the month are properly formatted with the "-ter" suffix following German grammar rules.

ovos_date_parser/res/it-it/date_time.json (4)

37-47: LGTM! Date and time formats follow Italian conventions.

The date and time formats are correctly structured and include proper Italian translations for relative dates (oggi, domani, ieri).


48-56: LGTM! Weekday names are correct.

The Italian weekday names are properly formatted with correct accents and follow the expected 0-based indexing (lunedì = 0).


90-103: LGTM! Month names are correct.

The Italian month names are properly formatted and follow the expected 1-based indexing (gennaio = 1).


57-89: Verify the use of "primo" vs "uno" for consistency.

The date section uses "primo" for the 1st day, while the number section uses "uno". This is actually correct in Italian (1st of the month is "primo"), but we should verify this difference is handled properly in the code.

ovos_date_parser/res/ru-ru/date_time.json (1)

1-149: LGTM! Comprehensive and linguistically accurate Russian localization.

The file provides a complete and well-structured Russian localization for dates and times, with proper handling of grammatical cases and number forms.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
README.md (2)

77-78: Enhance the language support introduction.

Consider expanding this introduction to better explain the distinction between "Parse" and "Format" capabilities and what each category means for users.

 time expressions.

+The language support is divided into two main categories:
+- **Parse**: Capabilities for extracting date/time information from text
+- **Format**: Capabilities for converting date/time objects into human-readable text

Line range hint 15-76: Add documentation for new DateTimeFormat class.

The usage examples section could be enhanced by including documentation for the new DateTimeFormat class mentioned in the AI summary. This would help users understand how to use the centralized date and time formatting functionality.

Consider adding a new section like this:

 print(relative_time)  # "tomorrow"
+
+### Using DateTimeFormat
+
+The `DateTimeFormat` class provides centralized date and time formatting with language-specific configurations:
+
+```python
+from ovos_date_parser import DateTimeFormat
+
+# Initialize formatter with language
+formatter = DateTimeFormat(lang="en")
+
+# Format a date
+formatted_date = formatter.format_date(some_date)
+```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fac0a3f and 65fb396.

📒 Files selected for processing (1)
  • README.md (2 hunks)
🔇 Additional comments (1)
README.md (1)

81-101: Verify parse capabilities against implementation.

The table clearly presents the parsing capabilities for each language. However, let's verify that these capabilities match the actual implementation.

✅ Verification successful

Parse capabilities table accurately reflects implementation status

The verification confirms that the table in README.md correctly represents the implementation status:

  • Azerbaijani (az): ✅ Both methods implemented (extract_duration_az, extract_datetime_az)
  • Hungarian (hu): ❌ No implementations found, matching table
  • French (fr): ❌ for duration (not found), ✅ for datetime (extract_datetime_fr), matching table

The base implementation in __init__.py provides the main interface functions, while language-specific implementations are in their respective files, consistent with the documented capabilities.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify language support in implementation matches documentation
# Look for language-specific parsing implementations

# Check for language files
echo "Checking for language configuration files..."
fd -e json . "ovos_date_parser/lang"

# Check for extract_duration implementations
echo -e "\nChecking for extract_duration implementations..."
rg -l "extract_duration.*lang.*=.*[\"'](az|cs|de|en|es|fa|nl|pl|pt|ru|sv|uk)[\"']"

# Check for extract_datetime implementations
echo -e "\nChecking for extract_datetime implementations..."
rg -l "extract_datetime.*lang.*=.*[\"'](az|ca|cs|da|de|en|es|eu|fa|fr|it|nl|pl|pt|ru|sv|uk)[\"']"

Length of output: 683


Script:

#!/bin/bash
# Let's try a different approach to find the implementation files

# Look for Python files that might contain the implementations
echo "Looking for Python implementation files..."
fd -e py

# Search for language-specific patterns in Python files
echo -e "\nSearching for language support patterns..."
rg -A 3 "lang.*=.*[\"'](az|cs|de|en|es|fa|nl|pl|pt|ru|sv|uk)[\"']" --type py

# Search for class/function definitions related to parsing
echo -e "\nSearching for parser implementations..."
ast-grep --pattern 'class $_ {
  $$$
  extract_$_(self, $_) {
    $$$
  }
}'

Length of output: 1051


Script:

#!/bin/bash
# Let's check the content of a few language files to verify implementations

# Check for extract_duration and extract_datetime methods in files
echo "Checking az implementation (marked as ✅ for both)..."
rg "def extract_(duration|datetime)" ovos_date_parser/dates_az.py

echo -e "\nChecking hu implementation (marked as ❌ for both)..."
rg "def extract_(duration|datetime)" ovos_date_parser/dates_hu.py

echo -e "\nChecking fr implementation (marked as ❌ for duration, ✅ for datetime)..."
rg "def extract_(duration|datetime)" ovos_date_parser/dates_fr.py

# Let's also check base class implementation
echo -e "\nChecking base implementation in __init__.py..."
rg "def extract_(duration|datetime)" ovos_date_parser/__init__.py

Length of output: 986

@JarbasAl JarbasAl merged commit 3dc6ed2 into dev Nov 7, 2024
@JarbasAl JarbasAl deleted the nice branch February 27, 2025 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant