Regression in Regex construction (and thereby DateFormat) performance on 1.7 #43610

@Sacha0

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.5 (2021-12-19)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> using Dates

julia> using BenchmarkTools

julia> letters = String(collect(keys(Dates.CONVERSION_SPECIFIERS)))
"MIHdEepmSUYyus"

julia> pattern = "(?<!\\\\)([\\Q$letters\\E])\\1*"
"(?<!\\\\)([\\QMIHdEepmSUYyus\\E])\\1*"

julia> @benchmark Regex($pattern)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  4.743 μs … 42.846 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.925 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.273 μs ±  1.469 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▅▃▃▁▄▆▅▂▁                                                ▂
  ███████████▇▆▆▆▅▄▅▆▆▄▅▄▃▃▄▃▄▃▄▅▃▃▅▃▃▄▅▄▄▃▄▁▄▄▄▄▄▃▄▅▅▅▅▅▆▇▆ █
  4.74 μs      Histogram: log(frequency) by time     11.4 μs <

 Memory estimate: 32 bytes, allocs estimate: 1.

while

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.0 (2021-11-30)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> using Dates

julia> using BenchmarkTools

julia> letters = String(collect(keys(Dates.CONVERSION_SPECIFIERS)))
"MIHdEepmSUYyus"

julia> pattern = "(?<!\\\\)([\\Q$letters\\E])\\1*"
"(?<!\\\\)([\\QMIHdEepmSUYyus\\E])\\1*"

julia> @benchmark Regex($pattern)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):   5.517 μs … 202.124 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.872 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.625 μs ±  13.905 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁   █▆▃▁▂   ▄▅
  █▇▃▅██████▂▄██▇▆▄▃▅▃▄▃▅▄▅▆▄▃▂▂▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁ ▃
  5.52 μs         Histogram: frequency by time         57.9 μs <

 Memory estimate: 32 bytes, allocs estimate: 1.

The particular Regex comes from the body of DateFormat(f::AbstractString, locale::DateLocale=ENGLISH)

letters = String(collect(keys(CONVERSION_SPECIFIERS)))
for m in eachmatch(Regex("(?<!\\\\)([\\Q$letters\\E])\\1*"), f)
through which the regression was observed.
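To make the role of this pattern concrete, here is a minimal sketch of what it matches when applied to a format string. The format string `"yyyy-mm-dd"` is an illustrative choice, not taken from the report, and `Dates.CONVERSION_SPECIFIERS` is an internal of the Dates stdlib, used here only because the snippet above uses it.

```julia
using Dates

# Same construction as in the DateFormat body quoted above.
letters = String(collect(keys(Dates.CONVERSION_SPECIFIERS)))
re = Regex("(?<!\\\\)([\\Q$letters\\E])\\1*")

# Each match is a run of one repeated specifier letter that is not
# preceded by a backslash; separators such as '-' are skipped.
tokens = [m.match for m in eachmatch(re, "yyyy-mm-dd")]
println(tokens)
```

Because the `Regex` is built inside the function body, every `DateFormat(f)` call that reaches this code pays the regex construction cost measured in the benchmarks above.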

Profiling a function that runs the Regex constructor call above many times in a loop reveals that increased time spent in sljit_malloc_exec accounts for the difference. The first profile below is on 1.6, and the second on 1.7. (Note that for the 1.7 profile, the constructor was called about a quarter as many times as for the 1.6 profile, yielding a similar number of samples.)
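A profiling loop of the kind described could look roughly like the sketch below. The helper name `construct_many` and the iteration counts are hypothetical; the report does not show the exact function that was profiled.

```julia
using Profile

# Hypothetical helper: build the same Regex n times so that the
# sampling profiler collects enough samples inside the constructor.
function construct_many(pattern::AbstractString, n::Integer)
    for _ in 1:n
        Regex(pattern)
    end
    return nothing
end

# The pattern from the benchmark, with the specifier letters inlined.
pattern = "(?<!\\\\)([\\QMIHdEepmSUYyus\\E])\\1*"

construct_many(pattern, 10)               # warm up so compilation is not profiled
Profile.clear()
@profile construct_many(pattern, 100_000) # iteration count is an arbitrary choice
Profile.print(C = true)                   # C = true also shows C frames (e.g. from PCRE)
```

Passing `C = true` matters here because the function named in the report lives in the C library underneath the regex engine, and such frames are hidden in the default output.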

Labels: dates (Dates, times, and the Dates stdlib module), regression (Regression in behavior compared to a previous version)