I notice that people often use a construction like this:
var length = array.Length;
for (int i = 0; i < length; i++) {
    // do something
}
They think that reading Array.Length on each iteration makes the CLR take more time to execute the code. To avoid that, they store the length in a local variable.
Let’s find out (once and for all!) whether this is worthwhile or whether the temporary variable is a waste of time.
To start, let’s examine these C# methods:
public int WithoutVariable() {
    int sum = 0;
    for (int i = 0; i < array.Length; i++) {
        sum += array[i];
    }
    return sum;
}
public int WithVariable() {
    int sum = 0;
    int length = array.Length;
    for (int i = 0; i < length; i++) {
        sum += array[i];
    }
    return sum;
}
Here is how they look after being processed by the JIT compiler (for .NET Framework 4.7.2 under LegacyJIT-x86):
WithoutVariable()
;int sum = 0;
xor edi, edi
;int i = 0;
xor esi, esi
;int[] localRefToArray = this.array;
mov edx, dword ptr [ecx+4]
;int arrayLength = localRefToArray.Length;
mov ecx, dword ptr [edx+4]
;if (arrayLength == 0) return sum;
test ecx, ecx
jle exit
;int arrayLength2 = localRefToArray.Length;
mov eax, dword ptr [edx+4]
;if (i >= arrayLength2)
; throw new IndexOutOfRangeException();
loop:
cmp esi, eax
jae 056e2d31
;sum += localRefToArray[i];
add edi, dword ptr [edx+esi*4+8]
;i++;
inc esi
;if (i < arrayLength) goto loop
cmp ecx, esi
jg loop
;return sum;
exit:
mov eax, edi
WithVariable()
;int sum = 0;
xor esi, esi
;int[] localRefToArray = this.array;
mov edx, dword ptr [ecx+4]
;int arrayLength = localRefToArray.Length;
mov edi, dword ptr [edx+4]
;int i = 0;
xor eax, eax
;if (arrayLength == 0) return sum;
test edi, edi
jle exit
;int arrayLength2 = localRefToArray.Length;
mov ecx, dword ptr [edx+4]
;if (i >= arrayLength2)
; throw new IndexOutOfRangeException();
loop:
cmp eax, ecx
jae 05902d31
;sum += localRefToArray[i];
add esi, dword ptr [edx+eax*4+8]
;i++;
inc eax
;if (i < arrayLength) goto loop
cmp eax, edi
jl loop
;return sum;
exit:
mov eax, esi
[Figure: side-by-side comparison of the two listings in Meld]
It’s easy to see that they have exactly the same number of assembler instructions: 15. Even the logic of these instructions is almost the same. There’s a slight difference in the order in which variables are initialized and in the comparison that decides whether the loop should continue. Note that in both cases the array length is loaded into a register twice before the loop:
- To check for 0 (arrayLength)
- Into a temporary variable used for the bounds check inside the loop (arrayLength2).
It turns out that both methods compile to practically the same code. The first one is quicker to write, and the second one brings no benefit in execution time.
The assembler code above led me to some thoughts and I decided to check a couple more methods:
public int WithoutVariable() {
    int sum = 0;
    for (int i = 0; i < array.Length; i++) {
        sum += array[i] + array.Length;
    }
    return sum;
}
public int WithVariable() {
    int sum = 0;
    int length = array.Length;
    for (int i = 0; i < length; i++) {
        sum += array[i] + length;
    }
    return sum;
}
Now the current element and the array length are added together. In the first method the array length is requested on every iteration, and in the second it is saved once into a local variable. Let’s look at the assembler code of these methods:
WithoutVariable()
;int sum = 0;
xor edi, edi
;int i = 0;
xor esi, esi
;int[] localRefToArray = this.array;
mov edx, dword ptr [ecx+4]
;int arrayLength = localRefToArray.Length;
mov ebx, dword ptr [edx+4]
;if (arrayLength == 0) return sum;
test ebx, ebx
jle exit
;int arrayLength2 = localRefToArray.Length;
mov ecx, dword ptr [edx+4]
;if (i >= arrayLength2) throw new IndexOutOfRangeException();
loop:
cmp esi, ecx
jae 05562d39
;int t = array[i];
mov eax, dword ptr [edx+esi*4+8]
;t += sum;
add eax, edi
;t += arrayLength;
add eax, ebx
;sum = t;
mov edi, eax
;i++;
inc esi
;if (i < arrayLength) goto loop
cmp ebx, esi
jg loop
;return sum;
exit:
mov eax, edi
WithVariable()
;int sum = 0;
xor esi, esi
;int[] localRefToArray = this.array;
mov edx, dword ptr [ecx+4]
;int arrayLength = localRefToArray.Length;
mov ebx, dword ptr [edx+4]
;int i = 0;
xor ecx, ecx
;if (arrayLength == 0) return sum;
test ebx, ebx
jle exit
;int arrayLength2 = localRefToArray.Length;
mov edi, dword ptr [edx+4]
;if (i >= arrayLength2) throw new IndexOutOfRangeException();
loop:
cmp ecx, edi
jae 04b12d39
;int t = array[i];
mov eax, dword ptr [edx+ecx*4+8]
;t += sum;
add eax, esi
;t += arrayLength;
add eax, ebx
;sum = t;
mov esi, eax
;i++;
inc ecx
;if (i < arrayLength) goto loop
cmp ecx, ebx
jl loop
;return sum;
exit:
mov eax, esi
[Figure: side-by-side comparison of the two listings in Meld]
Once again, the number of instructions is the same, and so (almost) are the instructions themselves. The only differences are the order in which variables are initialized and the form of the loop-continuation check. Note that the calculation of sum uses only the first copy of the array length (arrayLength). It’s clear that this:
;int arrayLength2 = localRefToArray.Length;
mov edi, dword ptr [edx+4]
;if (i >= arrayLength2) throw new IndexOutOfRangeException();
cmp ecx, edi
jae 04b12d39
in all four methods is an inlined array bounds check: the length is loaded once before the loop, and the cmp/jae pair is executed for each element of the array.
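To make the shape of the generated code easier to follow, here is a rough C# transcription of the comments in the first listings. It is only a sketch: the class and method names (JitView, SumAsAnnotated) are hypothetical, the array is passed as a parameter rather than read from a field, and the C# indexer adds its own bounds check on top of the explicit one, so this is not what the JIT literally emits.

```csharp
using System;

public static class JitView
{
    // Mirrors the annotated assembler: two copies of the length,
    // an up-front emptiness check, and a per-element range check.
    public static int SumAsAnnotated(int[] source)
    {
        int sum = 0;
        int i = 0;
        int[] localRefToArray = source;
        int arrayLength = localRefToArray.Length;   // used for the loop condition
        if (arrayLength <= 0) return sum;           // test / jle exit
        int arrayLength2 = localRefToArray.Length;  // used for the range check
        do
        {
            // cmp / jae is an unsigned comparison, hence the uint casts.
            if ((uint)i >= (uint)arrayLength2) throw new IndexOutOfRangeException();
            sum += localRefToArray[i];
            i++;
        } while (i < arrayLength);                  // cmp / jg loop
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumAsAnnotated(new[] { 1, 2, 3, 4 }));
    }
}
```

Written this way, it is visible that both copies of the length are read before the loop starts, and that only the comparison against arrayLength2 is repeated per element.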
We can already draw the first conclusion: using an extra variable to try to speed up the loop is a waste of time, since the compiler will do it for you anyway. The only reason to store the array length in a variable is to make the code more readable.
foreach is another situation entirely. Consider the following three methods:
public int ForEachWithoutLength() {
    int sum = 0;
    foreach (int i in array) {
        sum += i;
    }
    return sum;
}
public int ForEachWithLengthWithoutLocalVariable() {
    int sum = 0;
    foreach (int i in array) {
        sum += i + array.Length;
    }
    return sum;
}
public int ForEachWithLengthWithLocalVariable() {
    int sum = 0;
    int length = array.Length;
    foreach (int i in array) {
        sum += i + length;
    }
    return sum;
}
And here’s the code after JIT:
ForEachWithoutLength()
;int sum = 0;
xor esi, esi
;int[] localRefToArray = this.array;
mov ecx, dword ptr [ecx+4]
;int i = 0;
xor edx, edx
;int arrayLength = localRefToArray.Length;
mov edi, dword ptr [ecx+4]
;if (arrayLength == 0) goto exit;
test edi, edi
jle exit
;int t = array[i];
loop:
mov eax, dword ptr [ecx+edx*4+8]
;sum+=i;
add esi, eax
;i++;
inc edx
;if (i < arrayLength) goto loop
cmp edi, edx
jg loop
;return sum;
exit:
mov eax, esi
ForEachWithLengthWithoutLocalVariable()
;int sum = 0;
xor esi, esi
;int[] localRefToArray = this.array;
mov ecx, dword ptr [ecx+4]
;int i = 0;
xor edx, edx
;int arrayLength = localRefToArray.Length;
mov edi, dword ptr [ecx+4]
;if (arrayLength == 0) goto exit
test edi, edi
jle exit
;int t = array[i];
loop:
mov eax, dword ptr [ecx+edx*4+8]
;sum+=i;
add esi, eax
;sum+=localRefToArray.Length;
add esi, dword ptr [ecx+4]
;i++;
inc edx
;if (i < arrayLength) goto loop
cmp edi, edx
jg loop
;return sum;
exit:
mov eax, esi
ForEachWithLengthWithLocalVariable()
;int sum = 0;
xor esi, esi
;int[] localRefToArray = this.array;
mov edx, dword ptr [ecx+4]
;int length = localRefToArray.Length;
mov ebx, dword ptr [edx+4]
;int i = 0;
xor ecx, ecx
;int arrayLength = localRefToArray.Length;
mov edi, dword ptr [edx+4]
;if (arrayLength == 0) goto exit;
test edi, edi
jle exit
;int t = array[i];
loop:
mov eax, dword ptr [edx+ecx*4+8]
;sum+=i;
add esi, eax
;sum+=length ;
add esi, ebx
;i++;
inc ecx
;if (i < arrayLength) goto loop
cmp edi, ecx
jg loop
;return sum;
exit:
mov eax, esi
The first thing that stands out is that foreach takes fewer assembler instructions than the for loop (for example, simple element summation took 12 instructions with foreach, but 15 with for).
[Figure: comparison]
Overall, here are the results of a for vs. foreach benchmark on arrays of 1 million elements:
sum += array[i];

| Method | ItemsCount | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|
| ForEach | 1000000 | 1.401 ms | 0.2691 ms | 0.7935 ms | 1.694 ms | 1.00 | 0.00 |
| For | 1000000 | 1.586 ms | 0.3204 ms | 0.9447 ms | 1.740 ms | 1.23 | 0.65 |

sum += array[i] + array.Length;

| Method | ItemsCount | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|
| ForEach | 1000000 | 1.703 ms | 0.3010 ms | 0.8874 ms | 1.726 ms | 1.00 | 0.00 |
| For | 1000000 | 1.715 ms | 0.2859 ms | 0.8430 ms | 1.956 ms | 1.13 | 0.56 |
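The benchmark source itself is not shown, so the following is only a rough way to repeat the comparison with nothing but System.Diagnostics.Stopwatch. It is an assumption-laden sketch, not the harness that produced the numbers above: the class name LoopTiming, the helper BestOfMs, the repeat count, and the test data are all hypothetical, and a plain Stopwatch loop is far less rigorous than a dedicated benchmarking library.

```csharp
using System;
using System.Diagnostics;

public class LoopTiming
{
    // An instance field, as in the article's methods.
    private int[] array;

    public LoopTiming(int[] data) { array = data; }

    public int For()
    {
        int sum = 0;
        for (int i = 0; i < array.Length; i++) { sum += array[i]; }
        return sum;
    }

    public int ForEach()
    {
        int sum = 0;
        foreach (int item in array) { sum += item; }
        return sum;
    }

    // Hypothetical helper: runs body once as a warm-up (so JIT compilation
    // is not timed), then returns the smallest elapsed time in milliseconds.
    public static double BestOfMs(Func<int> body, int repeats)
    {
        body();
        double best = double.MaxValue;
        for (int r = 0; r < repeats; r++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            body();
            watch.Stop();
            best = Math.Min(best, watch.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    public static void Main()
    {
        int[] data = new int[1000000];
        for (int i = 0; i < data.Length; i++) { data[i] = i % 7; }

        LoopTiming timing = new LoopTiming(data);
        Console.WriteLine("for:     {0:F3} ms", BestOfMs(timing.For, 20));
        Console.WriteLine("foreach: {0:F3} ms", BestOfMs(timing.ForEach, 20));
    }
}
```

Results will vary with the runtime, the JIT, and the machine, so the absolute numbers from such a run are not comparable with the tables above; only the relative order on one machine is of interest.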
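In these runs foreach walks through the array somewhat quicker than for, although the gap is small compared with the reported error and standard deviation. Why would foreach be quicker? To find out, we need to compare the code after JIT: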
[Figure: comparison of all three foreach options]
Let’s look at ForEachWithoutLength. The array length is read only once and there are no array bounds checks. That happens because a foreach loop, first, does not let the loop body change what is being iterated (the iteration variable is read-only, and the array reference is captured once before the loop), and second, never steps outside the collection. Thanks to that, the JIT can afford to remove the bounds checks.
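One way to picture this is to write out by hand roughly what the C# compiler does with foreach over an array: it copies the array reference into a hidden local and runs an index-based loop over that copy. The sketch below is an approximation for illustration; the names ForeachLowering, SumWithForeach, SumLoweredByHand, and snapshot are hypothetical, and the array is a parameter here rather than a field.

```csharp
using System;

public static class ForeachLowering
{
    // Source-level form, as in ForEachWithoutLength.
    public static int SumWithForeach(int[] array)
    {
        int sum = 0;
        foreach (int item in array)
        {
            sum += item;
        }
        return sum;
    }

    // Hand-written approximation of the loop the compiler generates.
    public static int SumLoweredByHand(int[] array)
    {
        int sum = 0;
        int[] snapshot = array;  // reference captured once, before the loop
        for (int index = 0; index < snapshot.Length; index++)
        {
            int item = snapshot[index];  // index is always within 0..Length-1
            sum += item;
        }
        return sum;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        Console.WriteLine(SumWithForeach(data) == SumLoweredByHand(data));
    }
}
```

Because the loop runs over a local copy of the reference and the index is bounded by that same array's length, the JIT has what it needs to prove that every access is in range.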
Now let’s look carefully at ForEachWithLengthWithoutLocalVariable. There’s only one strange part: sum += array.Length does not use the previously saved local variable arrayLength, but a value that the code reads from memory again on each iteration. That means there will be N+1 memory reads of the array length, where N is the length of the array.
And now we come to ForEachWithLengthWithLocalVariable. The code is exactly the same as in the previous example, except for the handling of the array length. The compiler once again generated a local variable arrayLength that is used to check whether the array is empty, but it also faithfully kept our declared local variable length, and that is what is used in the summation inside the loop. As a result, this method reads the array length from memory only twice. The difference is very hard to notice in the real world.
In all cases, the assembler code turned out so simple because the methods themselves are simple. If the methods had more parameters, the code would have to work with the stack, variables might be stored outside of registers, and there would be more checks, but the main logic would remain the same: introducing a local variable for the array length is only useful for making the code more readable. It also turned out that foreach often walks through the array faster than for.