-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Support prefix argument in mktemp() functions #22998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ensuring there are no conflicts and code is latest.
1. tempname(), mktemp(), mktempdir() can now have a keyword prefix. 2. Default value of the prefix is "jl_" for both windows and non-windows platforms
Making the platform code file separator independent.
The failure in distributed.jl does not seem to be related to the check-in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have a few more comments.
base/file.jl
Outdated
const temp_prefix = cwstring("jl_") | ||
function tempname(temppath::AbstractString,uunique::UInt32) | ||
|
||
tempname(uunique::UInt32=UInt32(0) ; prefix = temp_prefix) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below: no space before ;
, and no spaces around =
in signatures (for consistency).
base/file.jl
Outdated
@@ -302,9 +311,10 @@ end | |||
|
|||
else # !windows | |||
# Obtain a temporary filename. | |||
function tempname() | |||
|
|||
function tempname(;prefix = temp_prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after ;
. Same below.
base/file.jl
Outdated
tempname(;prefix="jl_") | ||
|
||
Generate a unique temporary file path. A `prefix` can be provided in addition, which | ||
otherwise defaults to `jl_`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to repeat the default. OTOH, you can mention the truncation directly in that sentence.
base/file.jl
Outdated
@@ -341,37 +353,47 @@ Obtain the path of a temporary directory (possibly shared with other processes). | |||
tempdir() | |||
|
|||
""" | |||
tempname() | |||
tempname(;prefix="jl_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can replace "jl_"
with $temp_prefix
to make sure it's in sync with the corresponding constant.
test/file.jl
Outdated
|
||
function test_22922() | ||
mktempdir() do tmpdir | ||
(path, dirname) = splitdir(tmpdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use basename
to get the directory name (but don't call it dirname
since that's the name of a function returning the equivalent of path
).
test/file.jl
Outdated
(path,filename) = splitdir(tmp) | ||
@test filename[1:3] == "012" | ||
if !Sys.iswindows() | ||
#Unix like OS have prefix size limit of 5 chars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this limit mentioned in POSIX?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/tempnam.html
Many applications prefer their temporary files to have certain initial letter sequences in their names. The pfx argument should be used for this. This argument may be a null pointer or point to a string of up to five bytes to be used as the beginning of the filename.
test/file.jl
Outdated
(path, dirname) = splitdir(tmpdir) | ||
@test dirname[1:3] == "jl_" | ||
end | ||
mktempdir(;prefix="ABC") do tmpdir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to test non-ASCII characters too.
base/file.jl
Outdated
|
||
tempname(uunique::UInt32=UInt32(0); prefix=temp_prefix) = | ||
tempname(tempdir(), uunique ; prefix=prefix) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One blank line is enough.
base/file.jl
Outdated
function mktemp(parent=tempdir()) | ||
filename = tempname(parent, UInt32(0)) | ||
function mktemp(parent=tempdir(); prefix=temp_prefix) | ||
filename = tempname(parent, UInt32(0); prefix = prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No spaces around =
for consistency. Same below.
base/file.jl
Outdated
|
||
Generate a unique temporary file path. | ||
*Note: The `prefix` may be truncated based on the limitations of the OS.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given what you noted about POSIX, and since Windows also truncates, it would make sense to say something like "will generally be truncated to three or five characters". Also better mention this in the sentence about prefix
as it's not an interesting note if you don't use it (as I said before, no need to repeat the default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tempname() is to be deprecated in future. It may be ideal to just drop support of prefix from tempname for now. You still would need to keep the prefix in Windows version or rename prefix based version to something else as that will be used internally by mktemp or mktempdir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT we won't be able to get rid of the 3-char limit on Windows even for mktemp
, so the issue is more general than tempname
. Anyway my remark applied to other docstrings too.
test/file.jl
Outdated
mktemp(; prefix=tst_prefix) do tmp,io | ||
file_prefix_test(tmp, tst_prefix) | ||
end | ||
#In unix the prefix is computed upto 5 bytes and not chars so will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"up to". Also add spaces after #
here and elsewhere.
test/file.jl
Outdated
@@ -1156,3 +1156,70 @@ if !Sys.iswindows() | |||
test_22566() | |||
end | |||
end # !Sys.iswindows | |||
|
|||
function file_prefix_test(path, prefix, pfxlen=length(prefix), fails=false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fails
isn't used, better remove it.
test/file.jl
Outdated
@@ -1156,3 +1156,70 @@ if !Sys.iswindows() | |||
test_22566() | |||
end | |||
end # !Sys.iswindows | |||
|
|||
function file_prefix_test(path, prefix, pfxlen=length(prefix), fails=false) | |||
if Sys.iswindows() && (pfxlen > 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not also set pfxlen = 5
on Unix? Then you wouldn't need that argument at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For mktemp or mktempdir on Unix prefix can be any length. There is no restriction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Then maybe just mention tempname
in the comment below about the 5-byte limit on Unix.
test/file.jl
Outdated
#fail on unicode test. But extracting first 5 bytes and creating an | ||
#invalid 5-byte character array and converting to a string will pass | ||
if !Sys.iswindows() | ||
bytepfx=transcode(UInt8, tst_prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test shouldn't have to be so complex: what we want to know is whether passing a non-ASCII prefix will give a reasonable result. Real users won't use transcode
. So these lines should be the same on Windows and Unix, passing the full prefix string. Only file_prefix_test
should contain OS-specific hacks if needed. But honestly, if checking that the full prefix is used is too hard, just check that the first characters are used (below 5 bytes). What matters is that no error is thrown and that some part of the prefix is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the whole question here should be whether you are looking at Julia as a platform or OS as the platform. Ideally, a cross platform behavior of APIs should not be dependent on the underlying API used in the platform. Julia platform can use a common minimum behavior but for any advanced user the system APIs are always there to implement additional behaviors.
Hence, my first check-in truncated the number of characters to 3 for character lengths higher than 3 and ignored the prefix for prefix length lesser than 3 characters.
One additional restriction that can be imposed is prefix can only be ASCII or in my opinion base85 charsets other than the characters that cannot be technically part of a file path like: "/|*$" etc. which is not supported either in any of the OS as a file path or can technically complicate shell script writing.
Non-printable characters should be avoided as those can be considered malicious by a discerning system admin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be consistent with whatever mkdir
does. AFAIK it's not our job to check the contents of the string, and especially not of the prefix, which is unlikely to be a user-provided string. The OS will raise an error if the character is not allowed.
Whether to automatically truncate the prefix to 3 characters is a tough question. On the one hand it would allow a more consistent behavior. On the other hand, if a OS turns out not to support prefixes, or prefixes of only one characters, we will have to break API to ensure consistency, which is certainly not a good idea. So I think I'd rather keep the current behavior where it's unspecified whether the prefix will actually be used. Anyway these names are unpredictable for the caller, so variations are not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I beg to defer here. What you claim here is accurate if the functions under discussion would have been system calls. Because you can do no better than a system offers within the purview of the system. I do not think the functions in discussion here are system calls. You can refer to system calls for Linux at:
http://syscalls.kernelgrok.com/
original Unix system calls are:
http://www.di.uevora.pt/~lmr/syscalls.html
mkdir is a system call not mktemp to best of my knowledge. So emulating behavior of mkdir is not an accurate expectation. For performance reasons syscalls do not do input sanity checks, does not mean a user realm function should not.
Hence, as a library you can always decide what your library should allow or not allow. Of course, your library cannot do better than what mkdir() offers. Now the question is do you consider Julia as a platform or underlying OS. In terms of design philosophy you can choose to offer whether consistency across OS is relevant or exposing the platforms internal API to the user by stating read a specific API documentation for details.
Coming to the specific use case here:
One can always argue why restrict mktemp() behavior but allow it to behave completely like mkstemp() while mkstemp() will support suffixes as well.
Hence, my take will be to decide on a platform behavior which will be consistent to best of Julia's currently running platforms and implement that till you reach a platform which has hypothetical prefix restrictions. The restrictions are only hypothetical, unless those limitations are implemented in the kernel with system calls for a platform. Glibc or libc behaviors can always be ignored or overwritten with your own methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the Python code as well. The general philosophy of the system has been to be POSIX compliant and emulate the POSIX behavior on other OSes. So Python implements mkstemp() exactly as a POSIX system but underlying implementation does not make any OS call other than for creating, opening and closing the file. Hence, even in MS Windows it should not have 3 char prefix or lack of suffix implementation limitations.
Essentially, the whole of user level implementation for mkstemp() is written in pure Python other than the system calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @vtjnash noted below, we should use the libuv implementation so that we don't have to worry about consistency across OSes. If you want, you can implement this in the present PR, but we could also do that later, and for now do not apply any checks manually.
test/file.jl
Outdated
end | ||
|
||
filename = basename(path) | ||
filepart = filename[1:pfxlen] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also use startswith
.
The build error is due to a timeout. Test file has passed already. |
fail on unicode test. But extracting first 5 bytes and creating an | ||
invalid 5-byte character array and converting to a string will pass | ||
=# | ||
if !Sys.iswindows() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving a new comment to remember that this should be addressed.
base/file.jl
Outdated
Apply the function `f` to the result of [`mktempdir(parent)`](@ref) and remove the | ||
temporary directory upon completion. | ||
Apply the function `f` to the result of [`mktempdir(parent; prefix)`](@ref) and remove the | ||
temporary directory upon completion. A `prefix` can be provided in addition which, otherwise defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in addition, which otherwise
base/file.jl
Outdated
function tempname(temppath::AbstractString,uunique::UInt32) | ||
|
||
tempname(uunique::UInt32=UInt32(0); prefix=temp_prefix) = | ||
tempname(tempdir(), uunique ; prefix=prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tempname(tempdir(), uunique; prefix=prefix)
(no space before ;
)
b = joinpath(parent, "tmpXXXXXX") | ||
function mktempdir(parent=tempdir(); prefix=temp_prefix) | ||
format = prefix*"XXXXXX" | ||
b = joinpath(parent, format) | ||
p = ccall(:mkdtemp, Cstring, (Cstring,), b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libuv now has a mkdtemp implementation, which could be used to reduce the number of code-paths we have to test for this function (http://docs.libuv.org/en/v1.x/fs.html#c.uv_fs_mkdtemp)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Though this should probably be done in another PR.
seed::UInt32 = rand(UInt32) | ||
while true | ||
if (seed & typemax(UInt16)) == 0 | ||
seed += 1 | ||
end | ||
filename = tempname(parent, seed) | ||
filename = tempname(parent, seed; prefix = prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spacing is still inconsistent here.
|
||
Returns `(path, io)`, where `path` is the path of a new temporary file in `parent` and `io` | ||
is an open file object for this path. | ||
is an open file object for this path. A `prefix` can be provided in addition, which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I've noted several times, please do not repeat the default, which is already shown in the signature. This applies to other docstrings as well.
While I wished I could spend more time on the PR (#22998), with my personal engagements elsewhere it may not be practical for me to commit further on it now. My apologies for the same. Whoever is interested can work on the master branch and take the PR to a closure. I will be happy to blanket approve any pull request on that branch in case that comes my way. I have already branched out my sandbox so there are no issues there. I have already summarized my observations from the discussions into the Issue (#22922) so that one can get a clearer picture on a possible API behavior rather than going through the long trail. |
Closing as a cleaner platform independent implementation is submitted as part of #23237 |
Fixes #22922.