Description
Writing performance-critical code often leads to code duplication.
Let's say we want a method that applies an effect to an image: in our case a gray-scale conversion and an optional invert. The code could look like this:
public class Effect
{
    public static void Apply(Bitmap bmp, GreyscaleMethod grayscaleMethod, bool invert)
    {
        // check the format before locking, so a throw cannot leave the bitmap locked
        if (bmp.PixelFormat != PixelFormat.Format32bppArgb)
            throw new InvalidOperationException($"Unsupported pixel format: {bmp.PixelFormat}");
        // read bitmap data
        int w = bmp.Width, h = bmp.Height;
        var data = bmp.LockBits(new Rectangle(0, 0, w, h), ImageLockMode.ReadWrite, bmp.PixelFormat);
        var s = data.Stride;
        unsafe
        {
            var ptr = (byte*)data.Scan0;
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    // read RGB (not quite optimized, but that's not the point)
                    // Format32bppArgb stores 4 bytes per pixel, in B, G, R, A order
                    int offset = y * s + x * 4;
                    int b = ptr[offset];
                    int g = ptr[offset + 1];
                    int r = ptr[offset + 2];
                    // apply effects per pixel
                    if (grayscaleMethod == GreyscaleMethod.Average) {
                        r = g = b = (r + g + b) / 3;
                    } else if (grayscaleMethod == GreyscaleMethod.Luminance) {
                        r = g = b = (int)(r * 0.2126 + g * 0.7152 + b * 0.0722);
                    }
                    if (invert) {
                        r = 255 - r;
                        g = 255 - g;
                        b = 255 - b;
                    }
                    // write RGB (the alpha byte at offset + 3 is left untouched)
                    ptr[offset] = (byte)b;
                    ptr[offset + 1] = (byte)g;
                    ptr[offset + 2] = (byte)r;
                }
            }
        }
        bmp.UnlockBits(data);
    }
}
public enum GreyscaleMethod
{
    None,
    Average,
    Luminance,
}
However, if we expect the invert to be used only rarely, that code is slower than it could be because of the constant if (invert) check inside the performance-critical inner loop. We could of course create another method that gets called when invert is false, but that leads to code duplication, is harder to maintain, etc.
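To make the duplication concrete, here is a minimal sketch of that manual approach. It is only an illustration: it works on a tightly packed RGB byte array instead of a Bitmap so that it is self-contained, it always uses the average gray-scale, and the names DuplicatedEffect, ApplyPlain and ApplyInverted are hypothetical.

```csharp
using System;

public static class DuplicatedEffect
{
    // The runtime flag is tested once, outside the hot loop.
    public static void Apply(byte[] rgb, bool invert)
    {
        if (invert) ApplyInverted(rgb);
        else ApplyPlain(rgb);
    }

    // Copy 1: average gray-scale only, no branch inside the loop.
    private static void ApplyPlain(byte[] rgb)
    {
        for (int i = 0; i + 2 < rgb.Length; i += 3)
        {
            byte v = (byte)((rgb[i] + rgb[i + 1] + rgb[i + 2]) / 3);
            rgb[i] = rgb[i + 1] = rgb[i + 2] = v;
        }
    }

    // Copy 2: the same loop again, with the invert step baked in.
    private static void ApplyInverted(byte[] rgb)
    {
        for (int i = 0; i + 2 < rgb.Length; i += 3)
        {
            byte v = (byte)(255 - (rgb[i] + rgb[i + 1] + rgb[i + 2]) / 3);
            rgb[i] = rgb[i + 1] = rgb[i + 2] = v;
        }
    }
}
```

Both loops have to be kept in sync by hand, which is exactly the maintenance cost described above.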
What we need, to get both optimal performance and code reuse, is a way to have the compiler generate 2 methods at compile time, one for each value of invert. Without any new syntax the code might look like this:
public class Effect
{
    private static void Apply<invert>(Bitmap bmp, GreyscaleMethod grayscaleMethod)
        where invert : Bool
    {
        // [...] read bitmap data
        unsafe
        {
            var ptr = (byte*)data.Scan0;
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    // [...] read RGB
                    // apply effects per pixel
                    if (grayscaleMethod == GreyscaleMethod.Average) {
                        r = g = b = (r + g + b) / 3;
                    } else if (grayscaleMethod == GreyscaleMethod.Luminance) {
                        r = g = b = (int)(r * 0.2126 + g * 0.7152 + b * 0.0722);
                    }
                    if (typeof(invert) == typeof(True)) { // type check
                        r = 255 - r;
                        g = 255 - g;
                        b = 255 - b;
                    }
                    // [...] write RGB
                }
            }
        }
        bmp.UnlockBits(data);
    }
}
public class False : Bool { }
public class True : Bool { }
public class Bool { }
Now that check is a compile-time constant, so the compiler could remove both the type condition and its block when invert is False, and remove the type condition but keep its block when invert is True, leading to optimal code in both cases without code duplication.
However, does the compiler (or even the JIT) do that? According to this Stack Overflow answer, it currently does not.
This is a proposal to improve the compiler (or JIT) to do that sort of code inlining (through method duplication) for compile-time constant checks.
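For comparison, here is a hedged sketch of the same idea using struct markers instead of classes. It rests on an assumption the text above does not confirm: my understanding is that the JIT compiles a separate method body for each value-type generic argument and can treat typeof(T) == typeof(X) as a constant there, whereas instantiations over reference types share one body. The names IBool, True, False and TInvert are hypothetical, and the sketch works on a packed RGB byte array rather than a Bitmap to stay self-contained.

```csharp
using System;

// Hypothetical marker types; structs rather than classes (see the assumption above).
public interface IBool { }
public struct True : IBool { }
public struct False : IBool { }

public static class Effect
{
    // One source method; the invert step is selected by the type argument.
    public static void Apply<TInvert>(byte[] rgb) where TInvert : struct, IBool
    {
        for (int i = 0; i + 2 < rgb.Length; i += 3)
        {
            int r = rgb[i], g = rgb[i + 1], b = rgb[i + 2];
            r = g = b = (r + g + b) / 3; // average gray-scale
            if (typeof(TInvert) == typeof(True)) // fixed per instantiation
            {
                r = 255 - r;
                g = 255 - g;
                b = 255 - b;
            }
            rgb[i] = (byte)r;
            rgb[i + 1] = (byte)g;
            rgb[i + 2] = (byte)b;
        }
    }

    // Maps the runtime flag to an instantiation once, outside the loop.
    public static void Apply(byte[] rgb, bool invert)
    {
        if (invert) Apply<True>(rgb);
        else Apply<False>(rgb);
    }
}
```

Calling Effect.Apply(pixels, invert: true) picks the Apply<True> instantiation. Whether the branch really disappears from the generated code depends on the runtime and should be checked by inspecting the JIT output.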
If this were implemented, we could optimize the code even further by doing the same with the grayscaleMethod parameter:
public class Effect
{
    private static void Apply<invert, greyscaleMethod>(Bitmap bmp)
        where invert : Bool
        where greyscaleMethod : GreyscaleMethodEnum
    {
        // [...] read bitmap data
        unsafe
        {
            var ptr = (byte*)data.Scan0;
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    // [...] read RGB
                    // apply effects per pixel
                    if (typeof(greyscaleMethod) == typeof(GreyscaleMethod_Average)) {
                        r = g = b = (r + g + b) / 3;
                    } else if (typeof(greyscaleMethod) == typeof(GreyscaleMethod_Luminance)) {
                        r = g = b = (int)(r * 0.2126 + g * 0.7152 + b * 0.0722);
                    }
                    if (typeof(invert) == typeof(True)) {
                        r = 255 - r;
                        g = 255 - g;
                        b = 255 - b;
                    }
                    // [...] write RGB
                }
            }
        }
        bmp.UnlockBits(data);
    }
}
public class GreyscaleMethod_None : GreyscaleMethodEnum { }
public class GreyscaleMethod_Average : GreyscaleMethodEnum { }
public class GreyscaleMethod_Luminance : GreyscaleMethodEnum { }
public class GreyscaleMethodEnum { }
Doing the same optimization through code duplication would require 6 methods (3 gray-scale methods times 2 invert settings), and the number would increase exponentially with the number of parameters. The compiler, however, would know to generate only the methods that are actually used in the code.