Tasos Vogiatzoglou asked a question on the microsoft.public.dotnet.framework.clr newsgroup, wondering why tail. call was a slow MSIL sequence on the current .NET 2.0 CLR. My reply, including analysis, is below.
I assume that if the tail. command is supported by the jit (I do not think that is supported) it will be used in rather rare conditions of fully trusted code and perhaps code that does not access the execution stack (via stacktrace or sth) .
I think it is not optimized because no mainstream language currently uses it. It certainly is implemented in so far as the stack does not grow when you use tail. call to jump to the start of a method.
514 ms (with tailcall) / 77 ms (without tailcall). I cannot understand this ... Can anyone provide any helpful insight ?The fact is, tail. call is not optimized via JIT to a jump currently. And when you try to debug all this under VS 2005, it lies to you about the code!
I started with this:
using System; class App { static double ArithmeticSum(int number, double result) { if (number == 0) return result; return ArithmeticSum(number - 1, number + result); } static void Main() { double result = 0; for (int i = 0; i < 10000; ++i) result = ArithmeticSum(10000, 1); Console.WriteLine(result); } }I disassembled with ildasm, and rewrote ArithmeticSum to use tail. call:
// ... IL_000c: ldarg.1 IL_000d: stloc.0 IL_000e: br.s exit // ... IL_0015: ldarg.1 IL_0016: add tail. call float64 App::ArithmeticSum(int32, float64) ret exit: ldloc.0 ret } // end of method App::ArithmeticSumI reassembled with ilasm /debug=OPT and, like you, I found that the tail-call version was much slower. So, I started "devenv /debugexe Test.exe", and stepped into the code. This is what VS says about the JIT compiled code in the disassembly window:
IL_0000: nop 00000000 push ebp 00000001 mov ebp,esp 00000003 push edi 00000004 push esi 00000005 push ebx 00000006 push eax 00000007 fld qword ptr [ebp+8] IL_0001: ldarg.0 0000000a test ecx,ecx 0000000c setne al 0000000f movzx eax,al IL_0009: ldloc.1 00000012 test eax,eax 00000014 jne 00000018 IL_000c: ldarg.1 00000016 jmp 00000039 IL_0010: ldarg.0 00000018 mov dword ptr [ebp-10h],ecx 0000001b fild dword ptr [ebp-10h] 0000001e faddp st(1),st 00000020 sub esp,8 00000023 fstp qword ptr [esp] 00000026 dec ecx 00000027 mov eax,dword ptr ds:[00923028h] 0000002d push 2 0000002f push 2 00000031 push 1 00000033 push eax 00000034 call 791B69B0 // <------- NOTE 00000039 pop ecx exit: ldloc.0 0000003a pop ebx 0000003b pop esi 0000003c pop edi 0000003d pop ebp 0000003e ret 8The important bit to note is the call to 791B69B0. Even in VS 2005 mixed mode debugging, it won't let you step into this code. When you try to step into it, the instruction pointer jumps back to the start of the method - effectively the call is implementing the tail call, but VS is "helpfully" hiding the details. (Vance Morrison at MSFT shares my annoyance with this "feature".) So, its time to crack open SOS: In immediate window:
.load sos !u 791B69B0And I got this:
Unmanaged code 791B69B0 F8 clc 791B69B1 ?? db ffh 791B69B2 ?? db ffh 791B69B3 FF0400 inc dword ptr [eax+eax] 791B69B6 0000 add byte ptr [eax],al 791B69B8 0100 add dword ptr [eax],eax 791B69BA 0000 add byte ptr [eax],al 791B69BC 0000 add byte ptr [eax],al 791B69BE 0C02 or al,2 791B69C0 1000 adc byte ptr [eax],alThere's something fishy going on here: this code isn't meaningfully executable! So, I looked up my dear friend App.ArithmeticSum:
!name2ee Test.exe App.ArithmeticSum Module: 00922c14 (Test.exe) Token: 0x06000001 MethodDesc: 00922fd8 Name: App.ArithmeticSum(Int32, Double) JITTED Code Address: 00de0100 !u 00de0100 Normal JIT generated code App.ArithmeticSum(Int32, Double) Begin 00de0100, size 41 >>> 00DE0100 55 push ebp 00DE0101 8BEC mov ebp,esp 00DE0103 57 push edi 00DE0104 56 push esi 00DE0105 53 push ebx 00DE0106 50 push eax 00DE0107 DD4508 fld qword ptr [ebp+8] 00DE010A 85C9 test ecx,ecx 00DE010C 0F95C0 setne al 00DE010F 0FB6C0 movzx eax,al 00DE0112 85C0 test eax,eax 00DE0114 7502 jne 00DE0118 00DE0116 EB21 jmp 00DE0139 00DE0118 894DF0 mov dword ptr [ebp-10h],ecx 00DE011B DB45F0 fild dword ptr [ebp-10h] 00DE011E DEC1 faddp st(1),st 00DE0120 83EC08 sub esp,8 00DE0123 DD1C24 fstp qword ptr [esp] 00DE0126 49 dec ecx 00DE0127 8B0528309200 mov eax,dword ptr ds:[00923028h] 00DE012D 6A02 push 2 00DE012F 6A02 push 2 00DE0131 6A01 push 1 00DE0133 50 push eax 00DE0134 E877691B79 call 79F96AB0 (JitHelp: CORINFO_HELP_TAILCALL) 00DE0139 59 pop ecx 00DE013A 5B pop ebx 00DE013B 5E pop esi 00DE013C 5F pop edi 00DE013D 5D pop ebp 00DE013E C20800 ret 8Here, note the LIE THAT VISUAL STUDIO TOLD!
call 79F96AB0 (JitHelp: CORINFO_HELP_TAILCALL)This address, 79F96AB0, is different from the one in VS's most excellent diassembly view, 791B69B0. A peek inside this method (which, as we can see from timings, must be quite expensive and is thus probably pretty complex):
!u 79F96AB0 Unmanaged code 79F96AB0 FF155012387A call dword ptr ds:[7A381250h] 79F96AB6 50 push eax 79F96AB7 51 push ecx 79F96AB8 52 push edx 79F96AB9 F6058C44397AFF test byte ptr ds:[7A39448Ch],0FFh 79F96AC0 7409 je 79F96ACB 79F96AC2 F740045F000000 test dword ptr [eax+4],5Fh 79F96AC9 7422 je 79F96AED 79F96ACB 68DDDDDDDD push 0DDDDDDDDh 79F96AD0 68CCCCCCCC push 0CCCCCCCChWonder what this is doing? A quick grep through the SSCLI 2.0 sources gives this line:
./inc/jithelpers.h: JITHELPER(CORINFO_HELP_TAILCALL, JIT_TailCall)Another grep for this JIT_TailCall gives this:
./vm/i386/jithelp.asm:PUBLIC JIT_TailCallA peek inside this file gives this information (excerpted):
call _GetThread ; eax = Thread* push eax ; Thread* ; save ArgumentRegisters push ecx push edx ExtraSpace = 12 ; pThread, ecx, edx ; For GC stress, we always need to trip for GC test _g_TailCanSkipTripForGC, 0FFh jz TripForGC ; Trip for GC only if necessary test dword ptr [eax+Thread_m_State], TS_CatchAtSafePoint_ASM jz NoTripForGC TripForGC: ; Create a MachState struct on the stack ; return address is already on the stack, but is separated from stack ; arguments by the extra arguments of JIT_TailCall. So we cant use it directly push 0DDDDDDDDh ; Esp on unwind. Not needed as we it is deduced from the target method push 0CCCCCCCChThis looks like an exact match for the disassembled code above, especially given the magic numbers pushed. So, if one is looking for reasons why tail. call is slow, one must look inside clr/src/vm/i386/jithelp.asm to see the work it is doing. Here are some reasons:
-
The JIT doesn't attempt in any way to optimize for the tail. call case, so it doesn't generate machine code which would be compatible with a simple JMP.
The JIT_TailCall must work for all possible calls, in the presence of exception propagation, and be tolerent of possible GCs.
No comments:
Post a Comment