ggml: allow casting between f32 and i32 #15783

ngxson · 2025-09-04T02:33:16Z

Motivation:

While working with kyutai-mimi, I realized that I cannot do further calculations on the output of ggml_argmax or ggml_top_k, because there is no op to convert i32 --> f32
It would allow the Llama 4 attn_scale to be calculated on the cgraph (just pointing this out; we probably don't need to change that code)
This discussion: model : add GroveMoE support #15510 (comment) (cc @CISC)
It may be useful for future models

Note: casting from f32 --> i32 will discard the fractional part

Planned to implement it on these backends:

CPU
Metal
CUDA
Vulkan

test-backend-ops:

  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  11841/11841 tests passed
  Backend Metal: OK

  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  11841/11841 tests passed
  Backend CUDA0: OK

  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  11841/11841 tests passed
  Backend Vulkan0: OK

slaren · 2025-09-04T16:38:41Z

> Note: casting from f32 --> i32 will be equivalent to floor()

For C/C++, the behavior of float to int cast is to discard the fractional part, truncating the value towards zero. For negative values, this is not the same as floor().
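
The difference is easy to check in a few lines of plain C++. The following is a minimal sketch that does not depend on ggml; the helper names `cast_trunc`, `cast_floor`, and `check_trunc_vs_floor` are made up for illustration and are not part of this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Plain C++ conversion: drops the fractional part (rounds toward zero).
int32_t cast_trunc(float x) {
    return static_cast<int32_t>(x);
}

// floor(): rounds toward negative infinity before converting.
int32_t cast_floor(float x) {
    return static_cast<int32_t>(std::floor(x));
}

// Illustrative self-check: the two agree for positive inputs
// and differ for negative inputs that have a fractional part.
void check_trunc_vs_floor() {
    assert(cast_trunc( 1.5f) ==  1 && cast_floor( 1.5f) ==  1);
    assert(cast_trunc(-0.5f) ==  0 && cast_floor(-0.5f) == -1);
    assert(cast_trunc(-1.5f) == -1 && cast_floor(-1.5f) == -2);
}
```

Only negative inputs with a non-zero fractional part separate the two behaviors, so a test that uses positive values only could not tell them apart.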

ngxson · 2025-09-04T16:50:58Z

@slaren thanks, I've updated the test and comment to reflect this. According to the test, the behavior is currently the same on all backends.

CISC

A test that actually verifies that the cast produces the intended values (i.e., 1.5 -> 1, 1 -> 1.0, etc.) would be nice I guess.

ngxson · 2025-09-05T13:18:29Z

Yes that would be nice, and also be useful for many other ops. It can also act as examples for how to use certain ops. However, we need to adapt the code of test-backend-ops to support this, which can be quite complicated.

CISC · 2025-09-05T13:24:08Z

> Yes that would be nice, and also be useful for many other ops. It can also act as examples for how to use certain ops. However, we need to adapt the code of test-backend-ops to support this, which can be quite complicated.

Yeah, figured as much, another thing on the collective consciousness might-TODO-list. :)

ggerganov · 2025-09-05T13:24:51Z

> Yes that would be nice, and also be useful for many other ops. It can also act as examples for how to use certain ops. However, we need to adapt the code of test-backend-ops to support this, which can be quite complicated.

I think we can implement this by setting suitable values in initialize_tensors. For example, setting values of -0.5 would verify that all backends truncate towards zero.

ngxson · 2025-09-05T13:41:50Z

> I think we can implement this by setting suitable values in initialize_tensors. For example, setting values of -0.5 would verify that all backends truncate towards zero.

Yes, that's kinda what I'm doing: I set the range to [-150.0, 150.0] (basically copying the same code from test_gelu). So we should randomly get negative values, which confirms consistent behavior on all backends.
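
As a rough illustration of why a symmetric random range is enough, the sketch below counts how many sampled inputs would actually distinguish truncation from floor(). It is standalone C++ under stated assumptions, not the test-backend-ops code: `random_values` and `count_discriminating` are hypothetical helpers, and the seed and sample count are arbitrary.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: fill a buffer with uniform random floats drawn
// from the range [lo, hi), similar in spirit to the range described above.
std::vector<float> random_values(size_t n, float lo, float hi, uint32_t seed) {
    assert(lo < hi);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(lo, hi);
    std::vector<float> out(n);
    for (float & v : out) {
        v = dist(rng);
    }
    return out;
}

// Count the inputs for which truncation and floor() give different results.
// Only these inputs can expose a backend that rounds the wrong way.
size_t count_discriminating(const std::vector<float> & values) {
    size_t n = 0;
    for (float v : values) {
        const int32_t t = static_cast<int32_t>(v);             // toward zero
        const int32_t f = static_cast<int32_t>(std::floor(v)); // toward -inf
        if (t != f) {
            n++;
        }
    }
    return n;
}
```

With values drawn from a range such as [-150, 150), roughly half of the samples are negative and almost all of those have a fractional part, so a backend that floors instead of truncating would produce a mismatch against the CPU reference.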

0cc4m

The Vulkan change looks good.

ggerganov · 2025-09-07T17:45:28Z

ggml/src/ggml-cpu/ops.cpp

+            } else if (dst->type == GGML_TYPE_I32) {
+                size_t id = 0;
+                int32_t * dst_ptr = (int32_t *) dst->data;
+
+                for (int i03 = 0; i03 < ne03; i03++) {
+                    for (int i02 = 0; i02 < ne02; i02++) {
+                        id += ne00 * ir0;
+                        for (int i01 = ir0; i01 < ir1; i01++) {
+                            for (int i00 = 0; i00 < ne00; i00++) {
+                                const float * src0_ptr = (float *) ((char *) src0->data + i00*nb00 + i01*nb01 + i02*nb02 + i03*nb03);
+
+                                dst_ptr[id] = *src0_ptr;
+                                id++;
+                            }
+                        }
+                        id += ne00 * (ne01 - ir1);
+                    }
+                }


Should we merge this into the F32 branch above?

ngxson · 2025-09-07

Yes, and I'd also like to move some of this code into a template function. WDYT?

ggerganov · 2025-09-08

Yes, refactoring this code is welcome.

ngxson · 2025-09-08

I'll merge this PR as-is and will open another PR to refactor this code.
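
To make the template idea concrete, here is a minimal sketch of a strided copy with an element-wise cast, reduced to two dimensions. It is a hypothetical illustration, not the code from this PR or from the follow-up refactor: `copy_cast_2d` and its parameters are invented names, with `nb0`/`nb1` as byte strides in the spirit of `nb00`/`nb01` in the diff above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: copy a 2D, possibly non-contiguous source into a
// contiguous destination, converting each element with static_cast.
// ne0/ne1 are element counts per dimension, nb0/nb1 are strides in bytes.
template <typename src_t, typename dst_t>
void copy_cast_2d(const void * src, dst_t * dst,
                  int64_t ne0, int64_t ne1,
                  size_t nb0, size_t nb1) {
    assert(src != nullptr && dst != nullptr);
    const char * base = static_cast<const char *>(src);
    size_t id = 0; // running index into the contiguous destination
    for (int64_t i1 = 0; i1 < ne1; i1++) {
        for (int64_t i0 = 0; i0 < ne0; i0++) {
            const src_t * p = reinterpret_cast<const src_t *>(base + i0*nb0 + i1*nb1);
            dst[id++] = static_cast<dst_t>(*p); // f32 -> i32 truncates toward zero
        }
    }
}
```

Instantiating it as `copy_cast_2d<float, int32_t>` or `copy_cast_2d<int32_t, float>` would cover both directions with one loop body, and swapping the two strides reads the source as a transposed view, which is the kind of non-contiguous case the permuted tests exercise.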

ggml: allow casting between f32 and i32

f3b489b

github-actions bot added the testing, Nvidia GPU, ggml, and Apple Metal labels Sep 4, 2025

ngxson added 4 commits September 4, 2025 12:59

fix cuda

d4f78de

add vulkan

4f57dda

fix CPU non-cont

fee65fa

add non-cont test case

01af2c2

github-actions bot added the Vulkan label Sep 4, 2025

add note

60e8f20

ngxson marked this pull request as ready for review September 4, 2025 16:29

ngxson requested a review from 0cc4m as a code owner September 4, 2025 16:29

ngxson requested review from CISC, ggerganov and slaren September 4, 2025 16:29

ngxson added 3 commits September 4, 2025 23:46

extend test number range

9f5d5cd

correct note

0a2b23c

add cont version for vulkan

65b19a0

CISC approved these changes Sep 4, 2025

0cc4m approved these changes Sep 7, 2025

ggerganov approved these changes Sep 7, 2025

ngxson merged commit 9fcb29f into ggml-org:master Sep 8, 2025
48 checks passed

CISC mentioned this pull request Sep 17, 2025

cuda : add missing F32<->I32 entries in ggml_cuda_cpy_fn #16060

Merged

ngxson mentioned this pull request Sep 18, 2025

ggml : refactor forward_dup for cpu backend #16062

Merged
