
Lift expensive Regex construction from DateFormat method body. #43647


Merged (1 commit) on Jan 6, 2022
stdlib/Dates/src/io.jl: 33 changes (31 additions, 2 deletions)
@@ -332,6 +332,23 @@ const CONVERSION_TRANSLATIONS = IdDict{Type, Any}(
     Time => (Hour, Minute, Second, Millisecond, Microsecond, Nanosecond, AMPM),
 )

+# The `DateFormat(format, locale)` method just below consumes the following Regex.
+# Constructing this Regex is fairly expensive; doing so in the method itself can
+# consume half or better of `DateFormat(format, locale)`'s runtime. So instead we
+# construct and cache it outside the method body. Note, however, that when
+# `keys(CONVERSION_SPECIFIERS)` changes, the cached Regex must be updated accordingly;
+# hence the mutability (Ref-ness) of the cache, the helper method with which to populate
+# the cache, the cache of the hash of `keys(CONVERSION_SPECIFIERS)` (to facilitate checking
+# for changes), and the lock (to maintain consistency of these objects across threads when
+# threads simultaneously modify `CONVERSION_SPECIFIERS` and construct `DateFormat`s).
+function compute_dateformat_regex(conversion_specifiers)
+    letters = String(collect(keys(conversion_specifiers)))
+    return Regex("(?<!\\\\)([\\Q$letters\\E])\\1*")
+end
+const DATEFORMAT_REGEX_LOCK = ReentrantLock()
+const DATEFORMAT_REGEX_HASH = Ref(hash(keys(CONVERSION_SPECIFIERS)))
+const DATEFORMAT_REGEX_CACHE = Ref(compute_dateformat_regex(CONVERSION_SPECIFIERS))
+
 """
     DateFormat(format::AbstractString, locale="english") -> DateFormat

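To see what the cached pattern matches, the sketch below copies `compute_dateformat_regex` from the hunk above and applies it to a small, hypothetical table `specifiers` holding only `y`, `m`, and `d` (the real `CONVERSION_SPECIFIERS` has many more keys). The pattern captures one specifier letter that is not preceded by a backslash, then any immediate repeats of that same letter through the backreference `\1`, so a run such as `yyyy` becomes one token. Because `letters` is interpolated at run time, the pattern has to go through `Regex(...)` and be compiled when it is built; an `r"..."` literal is compiled once at parse time but cannot interpolate a value, so caching the compiled object is the natural way to avoid paying that cost on every `DateFormat` call.

```julia
# Copied from the hunk above: the character class holds every specifier letter,
# quoted with \Q...\E so that no letter is interpreted as regex syntax.
function compute_dateformat_regex(conversion_specifiers)
    letters = String(collect(keys(conversion_specifiers)))
    return Regex("(?<!\\\\)([\\Q$letters\\E])\\1*")
end

# Hypothetical three-entry table; only its keys are used by the helper.
specifiers = Dict('y' => :Year, 'm' => :Month, 'd' => :Day)
re = compute_dateformat_regex(specifiers)

# Each run of one repeated letter becomes a single token.
tokens = [m.match for m in eachmatch(re, "yyyy-mm-dd")]   # ["yyyy", "mm", "dd"]

# A backslash-escaped letter is skipped by the (?<!\\) lookbehind.
escaped = [m.match for m in eachmatch(re, "\\d mm")]      # ["mm"]
```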
@@ -379,8 +396,20 @@ function DateFormat(f::AbstractString, locale::DateLocale=ENGLISH)
     prev = ()
     prev_offset = 1

-    letters = String(collect(keys(CONVERSION_SPECIFIERS)))
-    for m in eachmatch(Regex("(?<!\\\\)([\\Q$letters\\E])\\1*"), f)
+    # To understand this block, please see the comments attached to the definitions of
+    # DATEFORMAT_REGEX_LOCK, DATEFORMAT_REGEX_HASH, and DATEFORMAT_REGEX_CACHE.
+    lock(DATEFORMAT_REGEX_LOCK)
+    try
+        dateformat_regex_hash = hash(keys(CONVERSION_SPECIFIERS))
+        if dateformat_regex_hash != DATEFORMAT_REGEX_HASH[]
+            DATEFORMAT_REGEX_HASH[] = dateformat_regex_hash
+            DATEFORMAT_REGEX_CACHE[] = compute_dateformat_regex(CONVERSION_SPECIFIERS)
+        end
+    finally
+        unlock(DATEFORMAT_REGEX_LOCK)
+    end
+
+    for m in eachmatch(DATEFORMAT_REGEX_CACHE[], f)
         tran = replace(f[prev_offset:prevind(f, m.offset)], r"\\(.)" => s"\1")

         if !isempty(prev)
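The hash-checked invalidation in the second hunk can be reduced to a standalone sketch. All names here are hypothetical scaffolding: `SPECIFIERS` stands in for `CONVERSION_SPECIFIERS`, `build_regex` for `compute_dateformat_regex`, `current_regex` for the locked block inside `DateFormat`, and `REBUILDS` is a counter added only to make recompilation observable. One deliberate difference from the PR: the PR reads `DATEFORMAT_REGEX_CACHE[]` after releasing the lock, whereas `current_regex` returns the cached value while still holding it, so the pattern it returns is always the one matching the hash it just checked.

```julia
# Hypothetical mutable table standing in for CONVERSION_SPECIFIERS.
const SPECIFIERS = Dict{Char,Symbol}('y' => :Year, 'm' => :Month, 'd' => :Day)

# Same pattern as compute_dateformat_regex in the PR.
function build_regex(specs)
    letters = String(collect(keys(specs)))
    return Regex("(?<!\\\\)([\\Q$letters\\E])\\1*")
end

const REGEX_LOCK  = ReentrantLock()
const REGEX_HASH  = Ref(hash(keys(SPECIFIERS)))
const REGEX_CACHE = Ref(build_regex(SPECIFIERS))
const REBUILDS    = Ref(0)   # illustration only: counts recompilations

# Recompile only when the key set's hash differs from the cached hash.
function current_regex()
    lock(REGEX_LOCK)
    try
        h = hash(keys(SPECIFIERS))
        if h != REGEX_HASH[]
            REGEX_HASH[] = h
            REGEX_CACHE[] = build_regex(SPECIFIERS)
            REBUILDS[] += 1
        end
        return REGEX_CACHE[]
    finally
        unlock(REGEX_LOCK)
    end
end

r1 = current_regex()        # keys unchanged: served from the cache
SPECIFIERS['H'] = :Hour     # adding a key changes hash(keys(SPECIFIERS))
r2 = current_regex()        # recompiled once
r3 = current_regex()        # cached again: same object as r2
```

Comparing a hash rather than the full key set keeps the per-call check cheap relative to compiling a Regex, at the cost of a stale cache in the (very unlikely) event that two different key sets hash to the same value.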