Skip to content

Conversation

brettwooldridge
Copy link
Contributor

Fairly simple refactor removing synchronized block and replacing with a reentrant read-write lock, increasing concurrency in the primary (exists in map) path which allows multiple reader threads concurrent access.

@matthiasblaesing
Copy link
Member

How is the performance evaluated? This can only be better than the synchronized block when the code path is indeed often enough hit in a contended situation. My gut feeling would be, that the overhead of the lock trumps the benefit the potential(!) parallel invocation brings.

@brettwooldridge
Copy link
Contributor Author

brettwooldridge commented Sep 27, 2025

This one would be hard to measure empirically, but only because of the number of callsites to NativeMappedConverter.getInstance(). It's called from everywhere all the time. Ok, that's an exaggeration, but there are so many callsites, including Pointer.getValue(), that frequent concurrent access is practically guaranteed.

Any time a Structure is initialized, initializeFields() iterates every field, and for every "NativeMapped" field type (Boolean, Byte, Short, Character, Integer, NativeLong, Long, Float, Double, String, WString, Buffer) a call to NativeMappedConverter.getInstance() is made.

If constructing Structure objects from multiple threads is not what JNA applications are doing under load, I guess I don't know what people are using JNA for.

The Structure case is just one of many callers to NativeMappedConverter.getInstance(). Every native function (Function) that is called with Java parameters of types listed above ultimately has to invoke this method via convertArguments() for each such parameter. Again, if applications are not invoking native functions from simultaneous threads under load, I don't know what they're doing.

EDIT: I'll generate a JMH test harness.

@brettwooldridge
Copy link
Contributor Author

brettwooldridge commented Sep 28, 2025

JMH Results

Epyc 7402 Proxmox VM 16 vCore

This was an extremely long run, approximately 40 minutes, threads 2x vCore count.

java -Xmx16g -Xms16g -jar target/benchmarks.jar -t 32 -f 20 -i 4 -wi 2

5.18.0
Benchmark                Mode  Cnt       Score       Error  Units
MyBenchmark.testMethod  thrpt   80  770748.381 ± 72194.287  ops/s

5.19.0-SNAPSHOT
Benchmark                Mode  Cnt       Score       Error  Units
MyBenchmark.testMethod  thrpt   80  781416.413 ± 66219.522  ops/s

JMH Harness

package eu.doppelhelix.jna.jmh;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;
import com.sun.jna.Structure;

public class MyBenchmark {

    public static class MyStruct extends Structure {
        public byte boolValue;      // Java boolean as native byte (0=false, 1=true)
        public byte byteValue;      // Java byte
        public short shortValue;    // Java short
        public char charValue;      // Java char (may map to C wchar_t)
        public int intValue;        // Java int
        public long longValue;      // Java long
        public float floatValue;    // Java float
        public double doubleValue;  // Java double
        public String stringValue;  // JNA auto-converts to const char * or wchar_t *

        @Override
        protected java.util.List<String> getFieldOrder() {
            return java.util.Arrays.asList(
                    "boolValue",
                    "byteValue",
                    "shortValue",
                    "charValue",
                    "intValue",
                    "longValue",
                    "floatValue",
                    "doubleValue",
                    "stringValue"
            );
        }
    }

    @Benchmark
    public void testMethod(Blackhole blackhole) {
        MyStruct struct = new MyStruct();
        blackhole.consume(struct);
    }
}

@matthiasblaesing
Copy link
Member

Thank you. The performance difference is in the range I expected. The observed speedup is well within the error range.

I have a bad feeling integrating this given the recent fallouts from performance improvements.

The new code path introduces a race:

            return converters.computeIfAbsent(cls, (clz) -> {
                NativeMappedConverter nmc = new NativeMappedConverter(cls);
                return new SoftReference<>(nmc);
            }).get();

Rewritten this is:

            var converterRef = converters.computeIfAbsent(cls, (clz) -> {
                NativeMappedConverter nmc = new NativeMappedConverter(cls);
                return new SoftReference<>(nmc);
            });
           return converterRef.get();

What could happen now is, that GC collects the instance referenced by converterRef before the get() is executed. Before the change there is always a hardref to the NativeMappedConverter, so it can't ever be null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants