Running Qwen3 XNNPACK on Android fails while setting up pretokenizer #10867

Closed
@kkimmk

Description

I'm trying to run the Qwen3-0.6B model on Android using XNNPACK, following the instructions in the qwen3 example and step 4 of the llama example.

When running ./llama_main --model_path qwen3-0_6b.pte --tokenizer_path tokenizer.json --prompt "Hi" --seq_len 120 on a Galaxy S23, I get the following error while the pretokenizer is being set up.

I 00:00:00.003396 executorch:cpuinfo_utils.cpp:62] Reading file /sys/devices/soc0/image_version
I 00:00:00.003583 executorch:main.cpp:76] Resetting threadpool with num threads = 4
I 00:00:00.008963 executorch:runner.cpp:90] Creating LLaMa runner: model_path=qwen3-0_6b.pte, tokenizer_path=tokenizer.json
Setting up pretokenizer...
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1747192804.212017   24371 re2.cc:237] Error parsing '((?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s...': invalid perl operator: (?!
RE2 failed to compile pattern with lookahead: (?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+
Error: invalid perl operator: (?!
Compile with SUPPORT_REGEX_LOOKAHEAD=ON to enable support for lookahead patterns.
libc++abi: terminating due to uncaught exception of type std::runtime_error: Error: 4
Aborted

It seems the runner now accepts the .json format for the tokenizer. Is there anything I'm missing?
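For context, the construct the runner is choking on is the negative lookahead (?!\S) near the end of the pretokenizer pattern. RE2 deliberately omits lookahead support, while backtracking engines such as Python's re accept it. A minimal sketch of the failing branch (the full tokenizer pattern is abbreviated here; only the whitespace branches are shown):

```python
import re

# The pretokenizer pattern in tokenizer.json ends with "\s+(?!\S)|\s+":
# match runs of whitespace, but stop one character short of a following
# non-space. Python's re engine supports the "(?!" lookahead operator;
# RE2 (the runner's default regex engine) rejects it at compile time,
# which is exactly the "invalid perl operator: (?!" error in the log.
pattern = re.compile(r"\s+(?!\S)|\s+")

# Trailing spaces before "world" match the lookahead branch, so the
# whitespace run "   \n" is split into "   " and "\n".
print(pattern.findall("hello   \nworld  "))  # ['   ', '\n', '  ']
```

Building with SUPPORT_REGEX_LOOKAHEAD=ON switches the runner to a regex path that can handle such patterns instead of RE2 alone.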

Environment

  • executorch main branch (b173722)
  • Android NDK r28b
  • Galaxy S23
  • Ran install_executorch.sh --pybind xnnpack and examples/models/llama/install_requirements.sh.

Commands used (as given in the instructions)

  • model
    python -m examples.models.llama.export_llama \
      --model qwen3-0_6b \
      --params examples/models/qwen3/0_6b_config.json \
      -kv \
      --use_sdpa_with_kv_cache \
      -d fp32 \
      -X \
      --xnnpack-extended-ops \
      -qmode 8da4w \
      --metadata '{"get_bos_id": 151644, "get_eos_ids":[151645]}' \
      --output_name="qwen3-0_6b.pte" \
      --verbose
    
  • runner
    cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
        -DANDROID_ABI=arm64-v8a \
        -DANDROID_PLATFORM=android-23 \
        -DCMAKE_INSTALL_PREFIX=cmake-out-android \
        -DCMAKE_BUILD_TYPE=Release \
        -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
        -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
        -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
        -DEXECUTORCH_ENABLE_LOGGING=1 \
        -DPYTHON_EXECUTABLE=python \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
        -Bcmake-out-android .
    
    cmake --build cmake-out-android -j16 --target install --config Release
    
    cmake  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
        -DANDROID_ABI=arm64-v8a \
        -DANDROID_PLATFORM=android-23 \
        -DCMAKE_INSTALL_PREFIX=cmake-out-android \
        -DCMAKE_BUILD_TYPE=Release \
        -DPYTHON_EXECUTABLE=python \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
        -Bcmake-out-android/examples/models/llama \
        examples/models/llama
    
    cmake --build cmake-out-android/examples/models/llama -j16 --config Release
    

Afterwards, I pushed the .pte file, tokenizer.json, and llama_main to the device with adb and ran the command above.
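The push-and-run step looks roughly like the following sketch (the on-device directory /data/local/tmp/llama is an illustrative assumption, not a path stated in the issue):

```shell
# Push the exported model, tokenizer, and runner binary to the device.
adb push qwen3-0_6b.pte /data/local/tmp/llama/
adb push tokenizer.json /data/local/tmp/llama/
adb push cmake-out-android/examples/models/llama/llama_main /data/local/tmp/llama/

# Run the runner on the device with the same arguments as above.
adb shell "cd /data/local/tmp/llama && chmod +x llama_main && \
  ./llama_main --model_path qwen3-0_6b.pte --tokenizer_path tokenizer.json \
  --prompt \"Hi\" --seq_len 120"
```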

Activity

kirklandsign (Contributor) commented on May 14, 2025

Hi @kkimmk

Please add -DSUPPORT_REGEX_LOOKAHEAD=ON

Something like

cmake  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=python \
    -DSUPPORT_REGEX_LOOKAHEAD=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-out-android/examples/models/llama \
    examples/models/llama

kirklandsign added the triaged label on May 14, 2025
jackzhxng (Contributor) commented on May 14, 2025

Yup, let's do it

kkimmk (Author) commented on May 15, 2025

Adding -DSUPPORT_REGEX_LOOKAHEAD=ON solved the error. Thank you!


Running Qwen3 XNNPACK on Android fails while setting up pretokenizer · Issue #10867 · pytorch/executorch