Skip to content

Support prefix argument in mktemp() functions #22998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
Closed

Support prefix argument in mktemp() functions #22998

wants to merge 12 commits into from

Conversation

sambitdash
Copy link
Contributor

@sambitdash sambitdash commented Jul 28, 2017

  1. tempname(), mktemp(), mktempdir() can now have a keyword prefix.
  2. Default value of the prefix is "jl_" for both windows and non-windows platforms

Fixes #22922.

ensuring there are no conflicts and code is latest.
1. tempname(), mktemp(), mktempdir() can now have a keyword prefix.
2. Default value of the prefix is "jl_" for both windows and non-windows
   platforms
Making the platform code file separator independent.
@sambitdash
Copy link
Contributor Author

The failure in distributed.jl does not seem to be related to the check-in.

@nalimilan nalimilan changed the title Fixing test case error in #22991 due to "/" separator in windows Support prefix argument in mktemp() functions Jul 30, 2017
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have a few more comments.

base/file.jl Outdated
const temp_prefix = cwstring("jl_")
function tempname(temppath::AbstractString,uunique::UInt32)

tempname(uunique::UInt32=UInt32(0) ; prefix = temp_prefix) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below: no space before ;, and no spaces around = in signatures (for consistency).

base/file.jl Outdated
@@ -302,9 +311,10 @@ end

else # !windows
# Obtain a temporary filename.
function tempname()

function tempname(;prefix = temp_prefix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after ;. Same below.

base/file.jl Outdated
tempname(;prefix="jl_")

Generate a unique temporary file path. A `prefix` can be provided in addition, which
otherwise defaults to `jl_`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to repeat the default. OTOH, you can mention the truncation directly in that sentence.

base/file.jl Outdated
@@ -341,37 +353,47 @@ Obtain the path of a temporary directory (possibly shared with other processes).
tempdir()

"""
tempname()
tempname(;prefix="jl_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace "jl_" with $temp_prefix to make sure it's in sync with the corresponding constant.

test/file.jl Outdated

function test_22922()
mktempdir() do tmpdir
(path, dirname) = splitdir(tmpdir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use basename to get the directory name (but don't call it dirname since that's the name of a function returning the equivalent of path).

test/file.jl Outdated
(path,filename) = splitdir(tmp)
@test filename[1:3] == "012"
if !Sys.iswindows()
#Unix like OS have prefix size limit of 5 chars
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this limit mentioned in POSIX?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/tempnam.html

Many applications prefer their temporary files to have certain initial letter sequences in their names. The pfx argument should be used for this. This argument may be a null pointer or point to a string of up to five bytes to be used as the beginning of the filename.

test/file.jl Outdated
(path, dirname) = splitdir(tmpdir)
@test dirname[1:3] == "jl_"
end
mktempdir(;prefix="ABC") do tmpdir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to test non-ASCII characters too.

base/file.jl Outdated

tempname(uunique::UInt32=UInt32(0); prefix=temp_prefix) =
tempname(tempdir(), uunique ; prefix=prefix)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blank line is enough.

base/file.jl Outdated
function mktemp(parent=tempdir())
filename = tempname(parent, UInt32(0))
function mktemp(parent=tempdir(); prefix=temp_prefix)
filename = tempname(parent, UInt32(0); prefix = prefix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No spaces around = for consistency. Same below.

base/file.jl Outdated

Generate a unique temporary file path.
*Note: The `prefix` may be truncated based on the limitations of the OS.*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what you noted about POSIX, and since Windows also truncates, it would make sense to say something like "will generally be truncated to three or five characters". Also better mention this in the sentence about prefix as it's not an interesting note if you don't use it (as I said before, no need to repeat the default).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tempname() is to be deprecated in future. It may be ideal to just drop support of prefix from tempname for now. You still would need to keep the prefix in Windows version or rename prefix based version to something else as that will be used internally by mktemp or mktempdir.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT we won't be able to get rid of the 3-char limit on Windows even for mktemp, so the issue is more general than tempname. Anyway my remark applied to other docstrings too.

test/file.jl Outdated
mktemp(; prefix=tst_prefix) do tmp,io
file_prefix_test(tmp, tst_prefix)
end
#In unix the prefix is computed upto 5 bytes and not chars so will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"up to". Also add spaces after # here and elsewhere.

test/file.jl Outdated
@@ -1156,3 +1156,70 @@ if !Sys.iswindows()
test_22566()
end
end # !Sys.iswindows

function file_prefix_test(path, prefix, pfxlen=length(prefix), fails=false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fails isn't used, better remove it.

test/file.jl Outdated
@@ -1156,3 +1156,70 @@ if !Sys.iswindows()
test_22566()
end
end # !Sys.iswindows

function file_prefix_test(path, prefix, pfxlen=length(prefix), fails=false)
if Sys.iswindows() && (pfxlen > 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not also set pfxlen = 5 on Unix? Then you wouldn't need that argument at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For mktemp or mktempdir on Unix prefix can be any length. There is no restriction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then maybe just mention tempname in the comment below about the 5-byte limit on Unix.

test/file.jl Outdated
#fail on unicode test. But extracting first 5 bytes and creating an
#invalid 5-byte character array and converting to a string will pass
if !Sys.iswindows()
bytepfx=transcode(UInt8, tst_prefix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test shouldn't have to be so complex: what we want to know is whether passing a non-ASCII prefix will give a reasonable result. Real users won't use transcode. So these lines should be the same on Windows and Unix, passing the full prefix string. Only file_prefix_test should contain OS-specific hacks if needed. But honestly, if checking that the full prefix is used is too hard, just check that the first characters are used (below 5 bytes). What matters is that no error is thrown and that some part of the prefix is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the whole question here should be whether you are looking at Julia as a platform or OS as the platform. Ideally, a cross platform behavior of APIs should not be dependent on the underlying API used in the platform. Julia platform can use a common minimum behavior but for any advanced user the system APIs are always there to implement additional behaviors.

Hence, my first check-in truncated the number of characters to 3 for character lengths higher than 3 and ignored the prefix for prefix length lesser than 3 characters.

One additional restriction that can be imposed is prefix can only be ASCII or in my opinion base85 charsets other than the characters that cannot be technically part of a file path like: "/|*$" etc. which is not supported either in any of the OS as a file path or can technically complicate shell script writing.

Non-printable characters should be avoided as those can be considered malicious by a discerning system admin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be consistent with whatever mkdir does. AFAIK it's not our job to check the contents of the string, and especially not of the prefix, which is unlikely to be a user-provided string. The OS will raise an error if the character is not allowed.

Whether to automatically truncate the prefix to 3 characters is a tough question. On the one hand it would allow a more consistent behavior. On the other hand, if a OS turns out not to support prefixes, or prefixes of only one characters, we will have to break API to ensure consistency, which is certainly not a good idea. So I think I'd rather keep the current behavior where it's unspecified whether the prefix will actually be used. Anyway these names are unpredictable for the caller, so variations are not an issue.

Copy link
Contributor Author

@sambitdash sambitdash Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I beg to defer here. What you claim here is accurate if the functions under discussion would have been system calls. Because you can do no better than a system offers within the purview of the system. I do not think the functions in discussion here are system calls. You can refer to system calls for Linux at:

http://syscalls.kernelgrok.com/

original Unix system calls are:

http://www.di.uevora.pt/~lmr/syscalls.html

mkdir is a system call not mktemp to best of my knowledge. So emulating behavior of mkdir is not an accurate expectation. For performance reasons syscalls do not do input sanity checks, does not mean a user realm function should not.

Hence, as a library you can always decide what your library should allow or not allow. Of course, your library cannot do better than what mkdir() offers. Now the question is do you consider Julia as a platform or underlying OS. In terms of design philosophy you can choose to offer whether consistency across OS is relevant or exposing the platforms internal API to the user by stating read a specific API documentation for details.

Coming to the specific use case here:

One can always argue why restrict mktemp() behavior but allow it to behave completely like mkstemp() while mkstemp() will support suffixes as well.

Hence, my take will be to decide on a platform behavior which will be consistent to best of Julia's currently running platforms and implement that till you reach a platform which has hypothetical prefix restrictions. The restrictions are only hypothetical, unless those limitations are implemented in the kernel with system calls for a platform. Glibc or libc behaviors can always be ignored or overwritten with your own methods.

Copy link
Contributor Author

@sambitdash sambitdash Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the Python code as well. The general philosophy of the system has been to be POSIX compliant and emulate the POSIX behavior on other OSes. So Python implements mkstemp() exactly as a POSIX system but underlying implementation does not make any OS call other than for creating, opening and closing the file. Hence, even in MS Windows it should not have 3 char prefix or lack of suffix implementation limitations.

Essentially, the whole of user level implementation for mkstemp() is written in pure Python other than the system calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @vtjnash noted below, we should use the libuv implementation so that we don't have to worry about consistency across OSes. If you want, you can implement this in the present PR, but we could also do that later, and for now do not apply any checks manually.

test/file.jl Outdated
end

filename = basename(path)
filepart = filename[1:pfxlen]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use startswith.

@sambitdash
Copy link
Contributor Author

The build error is due to a timeout. Test file has passed already.

fail on unicode test. But extracting first 5 bytes and creating an
invalid 5-byte character array and converting to a string will pass
=#
if !Sys.iswindows()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving a new comment to remember that this should be addressed.

base/file.jl Outdated
Apply the function `f` to the result of [`mktempdir(parent)`](@ref) and remove the
temporary directory upon completion.
Apply the function `f` to the result of [`mktempdir(parent; prefix)`](@ref) and remove the
temporary directory upon completion. A `prefix` can be provided in addition which, otherwise defaults
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in addition, which otherwise

base/file.jl Outdated
function tempname(temppath::AbstractString,uunique::UInt32)

tempname(uunique::UInt32=UInt32(0); prefix=temp_prefix) =
tempname(tempdir(), uunique ; prefix=prefix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tempname(tempdir(), uunique; prefix=prefix)
(no space before ;)

b = joinpath(parent, "tmpXXXXXX")
function mktempdir(parent=tempdir(); prefix=temp_prefix)
format = prefix*"XXXXXX"
b = joinpath(parent, format)
p = ccall(:mkdtemp, Cstring, (Cstring,), b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libuv now has a mkdtemp implementation, which could be used to reduce the number of code-paths we have to test for this function (http://docs.libuv.org/en/v1.x/fs.html#c.uv_fs_mkdtemp)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Though this should probably be done in another PR.

seed::UInt32 = rand(UInt32)
while true
if (seed & typemax(UInt16)) == 0
seed += 1
end
filename = tempname(parent, seed)
filename = tempname(parent, seed; prefix = prefix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing is still inconsistent here.


Returns `(path, io)`, where `path` is the path of a new temporary file in `parent` and `io`
is an open file object for this path.
is an open file object for this path. A `prefix` can be provided in addition, which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I've noted several times, please do not repeat the default, which is already shown in the signature. This applies to other docstrings as well.

@sambitdash
Copy link
Contributor Author

While I wished I could spend more time on the PR (#22998), with my personal engagements elsewhere it may not be practical for me to commit further on it now. My apologies for the same. Whoever is interested can work on the master branch and take the PR to a closure. I will be happy to blanket approve any pull request on that branch in case that comes my way. I have already branched out my sandbox so there are no issues there.

I have already summarized my observations from the discussions into the Issue (#22922) so that one can get a clearer picture on a possible API behavior rather than going through the long trail.

@sambitdash
Copy link
Contributor Author

Closing as a cleaner platform independent implementation is submitted as part of #23237

@sambitdash sambitdash closed this Aug 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mktempdir() should provide an option of a prefix very similar to you can do with mkdtemp in posix
4 participants