This repository was archived by the owner on Apr 28, 2025. It is now read-only.

Add floorf16 and floorf128 #437

Merged 3 commits on Jan 22, 2025

@tgross35 (Contributor) commented Jan 13, 2025

Add a generic version of floor. Additionally, make use of this version to implement floor and floorf.
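
For readers unfamiliar with the approach, here is a minimal sketch of a bit-manipulation `floor` for `f32`, loosely modeled on musl's `floorf`. It is not the code from this PR: the name `floor_f32_sketch` and the constants are assumptions made for illustration, it handles a single width rather than being generic, and it skips the floating-point exception handling a real libm routine performs.

```rust
/// Hypothetical sketch of `floor` for `f32` using only integer operations
/// on the bit pattern. Not the implementation merged in this PR.
fn floor_f32_sketch(x: f32) -> f32 {
    const SIG_BITS: u32 = 23; // explicit significand bits in an f32
    const EXP_BIAS: i32 = 127; // exponent bias of an f32
    const SIG_MASK: u32 = 0x007f_ffff; // mask covering the significand

    let mut bits = x.to_bits();
    // Unbiased exponent of the input.
    let e = ((bits >> SIG_BITS) & 0xff) as i32 - EXP_BIAS;

    if e >= SIG_BITS as i32 {
        // No fractional bits remain: already an integer, infinity, or NaN.
        return x;
    }

    if e >= 0 {
        // `m` covers exactly the fractional bits of the significand.
        let m = SIG_MASK >> e;
        if bits & m == 0 {
            return x; // already integral
        }
        if bits >> 31 != 0 {
            // Negative input: grow the magnitude so that truncation
            // rounds toward negative infinity.
            bits += m;
        }
        bits &= !m; // drop the fractional bits
        f32::from_bits(bits)
    } else if bits >> 31 == 0 {
        0.0 // 0 <= x < 1
    } else if bits << 1 != 0 {
        -1.0 // -1 < x < 0
    } else {
        x // negative zero stays negative zero
    }
}
```

For example, `floor_f32_sketch(-2.5)` yields `-3.0` and `floor_f32_sketch(0.3)` yields `0.0`. A generic version would presumably replace the hard-coded constants with per-type values.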

Similar to ceil, musl's ceilf routine seems to work better for all float widths than the ceil algorithm. Trying with the ceil (f64) algorithm produced the following regressions:

icount::icount_bench_floor_group::icount_bench_floor logspace:setup_floor()
Performance has regressed: Instructions (14064 > 13171) regressed by +6.78005% (>+5.00000)
  Baselines:                      softfloat|softfloat
  Instructions:                       14064|13171                (+6.78005%) [+1.06780x]
  L1 Hits:                            16821|15802                (+6.44855%) [+1.06449x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               8|9                    (-11.1111%) [-1.12500x]
  Total read+write:                   16829|15811                (+6.43856%) [+1.06439x]
  Estimated Cycles:                   17101|16117                (+6.10535%) [+1.06105x]
icount::icount_bench_floorf128_group::icount_bench_floorf128 logspace:setup_floorf128()
  Baselines:                      softfloat|softfloat
  Instructions:                      166868|N/A                  (*********)
  L1 Hits:                           221429|N/A                  (*********)
  L2 Hits:                                1|N/A                  (*********)
  RAM Hits:                              34|N/A                  (*********)
  Total read+write:                  221464|N/A                  (*********)
  Estimated Cycles:                  222624|N/A                  (*********)
icount::icount_bench_floorf16_group::icount_bench_floorf16 logspace:setup_floorf16()
  Baselines:                      softfloat|softfloat
  Instructions:                      143029|N/A                  (*********)
  L1 Hits:                           176517|N/A                  (*********)
  L2 Hits:                                1|N/A                  (*********)
  RAM Hits:                              13|N/A                  (*********)
  Total read+write:                  176531|N/A                  (*********)
  Estimated Cycles:                  176977|N/A                  (*********)
icount::icount_bench_floorf_group::icount_bench_floorf logspace:setup_floorf()
Performance has regressed: Instructions (14732 > 10441) regressed by +41.0976% (>+5.00000)
  Baselines:                      softfloat|softfloat
  Instructions:                       14732|10441                (+41.0976%) [+1.41098x]
  L1 Hits:                            17616|13027                (+35.2268%) [+1.35227x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               8|6                    (+33.3333%) [+1.33333x]
  Total read+write:                   17624|13033                (+35.2260%) [+1.35226x]
  Estimated Cycles:                   17896|13237                (+35.1968%) [+1.35197x]
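
As a reading aid, the percentage and bracketed factor columns above are consistent with the following arithmetic. This is a sketch inferred from the printed numbers, not taken from the benchmarking tool's source, and the function names are made up for illustration.

```rust
/// Relative change of `new` against `old`, in percent.
/// (Hypothetical helper; inferred from the output above.)
fn percent_change(new: u64, old: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// Signed ratio as it appears in the bracketed column: `new / old` when
/// the count grew, and `-(old / new)` when it shrank.
/// (Hypothetical helper; inferred from the output above.)
fn signed_factor(new: u64, old: u64) -> f64 {
    if new >= old {
        new as f64 / old as f64
    } else {
        -(old as f64 / new as f64)
    }
}
```

With the `floor` instruction counts, `percent_change(14064, 13171)` is about 6.78 and `signed_factor(14064, 13171)` is about 1.0678; for the RAM hits row, `signed_factor(8, 9)` is -1.125.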

Add floorf16 and floorf128

Use the generic algorithms to provide implementations for these routines.

@tgross35 tgross35 force-pushed the generic-floor branch 2 times, most recently from 07973f0 to a8069e5 Compare January 22, 2025 08:26
@tgross35
Contributor Author

icount shows improvements:

icount::icount_bench_floor_group::icount_bench_floor logspace:setup_floor()
  Baselines:                      softfloat|softfloat
  Instructions:                        9795|13171                (-25.6321%) [-1.34467x]
  L1 Hits:                            12288|15802                (-22.2377%) [-1.28597x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               7|9                    (-22.2222%) [-1.28571x]
  Total read+write:                   12295|15811                (-22.2377%) [-1.28597x]
  Estimated Cycles:                   12533|16117                (-22.2374%) [-1.28597x]
icount::icount_bench_floorf128_group::icount_bench_floorf128 logspace:setup_floorf128()
  Baselines:                      softfloat|softfloat
  Instructions:                       50273|N/A                  (*********)
  L1 Hits:                            68669|N/A                  (*********)
  L2 Hits:                                1|N/A                  (*********)
  RAM Hits:                              25|N/A                  (*********)
  Total read+write:                   68695|N/A                  (*********)
  Estimated Cycles:                   69549|N/A                  (*********)
icount::icount_bench_floorf16_group::icount_bench_floorf16 logspace:setup_floorf16()
  Baselines:                      softfloat|softfloat
  Instructions:                       37593|N/A                  (*********)
  L1 Hits:                            47888|N/A                  (*********)
  L2 Hits:                                1|N/A                  (*********)
  RAM Hits:                              12|N/A                  (*********)
  Total read+write:                   47901|N/A                  (*********)
  Estimated Cycles:                   48313|N/A                  (*********)
icount::icount_bench_floorf_group::icount_bench_floorf logspace:setup_floorf()
  Baselines:                      softfloat|softfloat
  Instructions:                       10395|10441                (-0.44057%) [-1.00443x]
  L1 Hits:                            12980|13027                (-0.36079%) [-1.00362x]
  L2 Hits:                                2|0                    (+++inf+++) [+++inf+++]
  RAM Hits:                               5|6                    (-16.6667%) [-1.20000x]
  Total read+write:                   12987|13033                (-0.35295%) [-1.00354x]
  Estimated Cycles:                   13165|13237                (-0.54393%) [-1.00547x]
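
Every "Total read+write" and "Estimated Cycles" row in this thread matches a simple weighted sum of the cache-hit counts. The sketch below reproduces that relationship; the weights of 1, 5, and 35 are inferred from the numbers shown here rather than taken from the tool's documentation, and the function names are hypothetical.

```rust
/// Total memory accesses: every access is counted once, whichever level
/// served it. (Hypothetical helper; inferred from the output above.)
fn total_read_write(l1_hits: u64, l2_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + l2_hits + ram_hits
}

/// Cycle estimate that weights each access by where it was served.
/// The weights are an assumption that fits the rows in this thread.
fn estimated_cycles(l1_hits: u64, l2_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + 5 * l2_hits + 35 * ram_hits
}
```

For the new `floorf` run, `estimated_cycles(12980, 2, 5)` gives 13165 and `total_read_write(12980, 2, 5)` gives 12987, matching the table. This also explains why the estimate barely moves even though L2 hits went from 0 to 2: two L2 hits cost far less than the one RAM hit that was saved.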

@tgross35 tgross35 enabled auto-merge January 22, 2025 08:55
@tgross35 tgross35 merged commit 5e9ded4 into rust-lang:master Jan 22, 2025
35 checks passed
@tgross35 tgross35 deleted the generic-floor branch January 22, 2025 11:02
tgross35 added a commit that referenced this pull request Apr 18, 2025
Add `floorf16` and `floorf128`