Clarify ccall evaluation semantics #57931

Open
Keno opened this issue Mar 28, 2025 · 5 comments

Keno (Member) commented Mar 28, 2025

This aims to summarize a design discussion between @vtjnash, @JeffBezanson, @StefanKarpinski, @gbaraldi, @topolarity, @xal-0, @mlechu, @oscardssmith and myself around ccall. Except where otherwise annotated, I've tried to capture my understanding of the consensus view, but it is of course possible that I have misunderstood or failed to remember an objection. In such cases, the error is mine.

Discussion of the problems and current design

The current evaluation semantics of ccall are quite old and predate us having a particularly good understanding of what the evaluation semantics of the language should be (in particular, they predate any notion of world ages, partitioned bindings, effects, etc.). For this reason, it is somewhat hard to give a coherent description of the current design. However, I will try my best.

Syntax based disambiguation of (non-)library case

There are two basic cases for ccall. The first is a call with a plain symbol or pointer:

ccall(sym, ...)

The second is

ccall((sym, lib), ...)

These cases are distinguished both syntactically and semantically. In particular,

x = sym
ccall(x)

is always the same as ccall(sym), but the same is not true for x = (sym, lib).

Evaluation semantics of the first argument

We first consider the basic case without libraries. Here we have the familiar ccall syntax:

julia> ccall(:sin, Float64, (Float64,), 1.0)
0.8414709848078965

What happens if we use an expression instead?

julia> f1() = ccall((println("Hello"); :sin), Float64, (Float64,), 1.0)
f1 (generic function with 1 method)

julia> f1()
Hello
0.8414709848078965

julia> f1()
Hello
0.8414709848078965

So far so normal. However, this is where it starts to get weird. The best way I can describe it is that ccall tries to determine the symbol in the following order:

  1. Statically looking at the expression that is syntactically inside ccall and
    determining whether or not it can figure out the value of the first argument.

  2. Using inference's constant propagation

  3. Using codegen's constant propagation (but not LLVM's)

However, for the non-lib case, the first of these is generally a no-op.

This leads to the following observed behavior:

f2() = ccall((@noinline identity(:sin)), Float64, (Float64,), 1.0)
f3() = ccall(Base.compilerbarrier(:const, :sin), Float64, (Float64,), 1.0)
f4() = ccall(Base.compilerbarrier(:const, (@noinline identity(:sin))), Float64, (Float64,), 1.0)
f5() = ccall((@noinline identity(Base.compilerbarrier(:const, :sin))), Float64, (Float64,), 1.0)
julia> f2()
0.8414709848078965

julia> f3()
0.8414709848078965

julia> f4()
0.8414709848078965

julia> f5()
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Symbol
Stacktrace:
 [1] f5()
   @ Main ./REPL[19]:1
 [2] top-level scope
   @ REPL[24]:1

To me, this is entirely unintuitive, and I actually had to go read the source to figure out which cases would work and which wouldn't.

Additional complications from lib lowering

The situation becomes more complicated when using the lib syntax:

julia> ccall(((println("Hello"); :sin), :openlibm), Float64, (Float64,), 1.0)
Internal error: encountered unexpected error during compilation of top-level scope:
ErrorException("unsupported or misplaced expression \"block\" in function top-level scope")
ijl_errorf at /home/keno/julia/src/rtutils.c:77
emit_expr at /home/keno/julia/src/codegen.cpp:6694

julia> function g1()
               x = (println("Hello"); :sin)
               ccall((x, :openlibm), Float64, (Float64,), 1.0)
       end
ERROR: syntax: ccall function name and library expression cannot reference local variables
Stacktrace:
 [1] top-level scope
   @ REPL[1]:1

julia> function g2()
          x = ((println("Hello"); :sin), :libopenlibm)
          ccall(x, Float64, (Float64,), 1.0)
       end
g2 (generic function with 1 method)

julia> g2()
Hello
0.8414709848078965

julia> syms() = (println("Hello"); :sin)
syms (generic function with 1 method)

julia> ccall((syms(), :openlibm), Float64, (Float64,), 1.0)
Hello
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Tuple{Symbol, Symbol}
Stacktrace:
 [1] top-level scope
   @ ./REPL[3]:1

Additional complications from bindings partition

Post-bindings partition, there is an additional complication that both cases 1 and 3 depend on inferred world age bounds of the rest of the function, which can lead to completely non-intuitive behavior. This is #57749, although I will mention it here also:

julia> const sinsym = :sin
:sin

julia> g3() = ccall((sinsym, :libopenlibm), Float64, (Float64,), 1.0)
g3 (generic function with 1 method)

julia> g3()
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Tuple{Symbol, Symbol}
Stacktrace:
 [1] g3()
   @ Main ./REPL[15]:1
 [2] top-level scope
   @ REPL[16]:1

julia> const completely_unrelated = 2
2

julia> g4() = (completely_unrelated; ccall((sinsym, :libopenlibm), Float64, (Float64,), 1.0))
g4 (generic function with 1 method)

julia> g4()
0.8414709848078965

This is arguably a separate inference bug that should be addressed by properly modeling this in inference, but stems from the same underlying confusion around the evaluation semantics of the first argument of ccall.

Hidden generic function call inside :foreigncall

This one might be a bit academic, but as of #50074, there is a hidden call to Libdl.dlopen inside :foreigncall. This dynamic call edge is not modeled and is thus susceptible to #265-like issues, invisible to trimming, etc. Now, there is already non-standard caching here, but I wanted to list it for completeness.

Implicit caching of dlsym

This one isn't so much a problem as an aspect of the current design that needs to be preserved for performance. In particular, the codegen for :foreigncall currently looks something like this (in pseudo-C syntax):

static void *_cache;
// Expr(:foreigncall, (lib, sym), ...)
{
	if (!_cache) {
		_cache = jl_lazy_load_and_lookup(lib, sym); // potentially calls Libdl.dlopen internally
	}
	(*_cache)(...);
}

Plus some optimizations to fold away the lookup if it can be resolved statically by the JIT or to turn the lookup into a PLT-like structure.

Additional desirable features

An additional desirable feature that was discussed: in the context of --trim, we would like the ability to statically link executables without assuming the presence of a dynamic linker at runtime, ideally while preserving the namespace scoping behavior of the current design. This is a little tricky, because the underlying system linker generally does not have namespacing. How exactly to do this is outside the scope of this document, but the key consideration is that it constrains us to a solution where juliac is able to turn the dynamic references into static ones.

Proposed solutions

The first and most immediate question to answer is what the evaluation scope of the first argument of ccall is. I think there are roughly three reasonable answers:

  1. It gets evaluated in toplevel scope at definition time, i.e. the following would error:

function foo()
	ccall(sym, ...)
end # Error: UndefVarError(:sym)
const sym = ...

  2. It gets evaluated (with the usual evaluation semantics) at the same time as the cache logic, i.e. we'd have the following:

foo() = ccall((println("Hello"); :sin), ...)

julia> foo()
Hello
0.8414709848078965

julia> foo() # Does not print the second time because the lookup is already cached
0.8414709848078965

  3. The expression and the dlsym lookup get evaluated every time.

  4. The expression gets evaluated every time, but the dlsym lookup is cached the first time it gets evaluated (for a particular native code instance).

  5. The expression gets evaluated every time, but the dlsym lookup is cached the first time it gets evaluated (for a particular native code instance), plus gets re-cached when the value of the expression changes.

Discussion

Option 3 is the most straightforward behavior in that it is completely dynamic and does not rely on any compiler information. However, because of the lack of caching, it is also prohibitively slow. That said, everyone generally agreed that if it were fast, it would be a good semantics, which is a useful guiding principle for selecting among the options.

Option 4 is somewhat reminiscent of what we have right now, except that we have extra requirements (i.e. the symbol name needs to be a constant expression). These requirements make the behavior of ccall very confusing, though to be fair, they also by default prohibit some problematic cases.

In particular, there's a question about non-constant cases like

julia> globalsin = :sin
julia> f() = ccall(globalsin, Float64, (Float64,), 1.0)
julia> f()
julia> globalsin = :cos
julia> f()

What should this do? Currently this case is disallowed due to the (semantically strange) constantness detection on the first argument. We do not have constantness detection anywhere else in the language, and in general our semantics are entirely value and type based, so we do need to decide some behavior for this case.

The tradeoff here is essentially one of performance vs. surprise. Option 4 has decent performance, but a high surprise level: the answer would change whenever f gets re-codegen'ed, which is not something users are traditionally expected to have a mental model of. Option 2 has the same problem (but differs for ccall((println("Hello"); globalsin))). Option 1 goes in the direction of retaining the performance while solving the surprise problem by never re-evaluating (although Revise might do so explicitly). Option 5 goes in the opposite direction of sacrificing some performance (in the rare, unlikely, fully dynamic case), but reducing surprise by being closer to the naive Option 3.

As a general design principle in Julia, we do tend to choose the most dynamic behavior (as long as it is still possible to optimize the common case), which would weigh in favor of option 5.

Implementation Considerations

Given the above considerations, I think the general consensus was that option 5 is preferred. It is close enough to the current semantics (and identical in the cases that people actually use) that we should be able to do it without a deprecation cycle.

To implement this, I had suggested the following lowering:

@assume_effects :consistent_once_per_process :effect_free :consistent_termination dlsym(...)::Ptr{Cvoid} = ... 

function do_call()
	# ccall((bar, lib), ...)
	ptr = inline_cache(dlsym(lib, bar))
	$(Expr(:foreigncall, ptr, ...))
end

where consistent_once_per_process is a new non-IPO effect annotation specifying that the result of a call may be assumed :consistent within each process (i.e. may be cached for egal arguments), and inline_cache is a new builtin that is semantically a no-op, but tells the optimizer to perform the caching optimization (using :consistent-cy inference to determine the scope of the region to cache). @vtjnash objected to this scheme on complexity grounds, @JeffBezanson objected on IR size grounds. However, I think people generally agreed that the semantics were reasonable.

Thus, the ultimate proposal is to treat the above as the semantic representation of what :foreigncall should do, but keep both the dlsym generic call and the inline_cache inside :foreigncall where they are now. The various places that need to
analyze this (inference, trimming, etc.) should then model :foreigncall to include the generic call to dlsym, including providing edges for it.

StefanKarpinski (Member) commented Apr 2, 2025

Is option 5 effectively the same as 3 except with caching as an optimization? Or is there some scenario where dlsym could theoretically give different answers for the same inputs? The inline_cache implementation seems to me like it could be very simple as long as the cache size is one—it's just a matter of doing the equivalent of this:

value = if lib === old_lib && bar === old_bar
    cache_value
else
    cache_value = dlsym(lib, bar)
end

The tricky part that makes it not implementable in Julia is that you want cache_value to be static per call-site.

Keno (Member, Author) commented Apr 2, 2025

Is option 5 effectively the same as 3 except with caching as an optimization?

Yes, under suitable assumptions on the behavior of dlsym, which should be met by the libc one (but the semantics express no opinion on whether that needs to be the only way to look up symbols).

it's just a matter of doing the equivalent of this:

Correct. The objection is not to implementation difficulty, but to IR size or the explicit extra calls.

The tricky part that makes it not implementable in Julia is that you want cache_value to be static per call-site.

Yes

StefanKarpinski (Member) commented Apr 3, 2025

Would it be worthwhile to have a language feature for creating and accessing static per-call site boxes? Seems like a generally useful feature. Given that feature, ccall lowering could be fairly straightforward and unmagical.

xal-0 (Member) commented Apr 30, 2025

language feature for creating and accessing static per-call site boxes

We have one by accident, haha:

julia> using Libdl; libc = dlopen("libc")
Ptr{Nothing}(0x0000000332eda988)

julia> eval(:(function foo()
           $(Expr(:var"toplevel-butfirst", 
                  :(ccall(Main.fptr, Cvoid, (Cstring,), "hello")),
                  :(const fptr = dlsym(libc, :puts))))
       end))
foo (generic function with 1 method)

julia> foo()
hello

#58279 removes the global handling from resolve_definition_effects, and it could be removed entirely if we moved the evaluation of the return and argument types to the toplevel during lowering too.

StefanKarpinski (Member) commented

Yeah you can hack it with generated functions too, but I mean something less awful and official.
