From: Ken Thomases
Subject: ntdll: Optimize NtCurrentTeb() for Mac 64-bit.
Message-Id:
Date: Fri, 5 Jun 2015 01:18:54 -0500

It's called a lot and is already slower than other platforms because it's not
inlined. No need for it to be slower than necessary.
---
 dlls/ntdll/signal_x86_64.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/dlls/ntdll/signal_x86_64.c b/dlls/ntdll/signal_x86_64.c
index af8c88d..9aaa7c5 100644
--- a/dlls/ntdll/signal_x86_64.c
+++ b/dlls/ntdll/signal_x86_64.c
@@ -3700,7 +3700,14 @@ __ASM_STDCALL_FUNC( DbgUserBreakPoint, 0, "int $3; ret")
 #ifdef __APPLE__
 TEB * WINAPI NtCurrentTeb(void)
 {
-    return pthread_getspecific( teb_key );
+    TEB *ret;
+    /* Because of the different calling conventions of this function vs.
+       pthread_getspecific(), the compiler would generate code to save off and
+       restore %rsi and %xmm6 through %xmm15 if we were to call it normally. We
+       happen to know that pthread_getspecific() only touches %rax. Write the
+       call ourselves to avoid the extra work. */
+    __asm__ ( "call " __ASM_NAME("pthread_getspecific") : "=a" (ret) : "D" (teb_key) );
+    return ret;
 }
 #else
 __ASM_STDCALL_FUNC( NtCurrentTeb, 0, ".byte 0x65\n\tmovq 0x30,%rax\n\tret" )
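
For readers who want the baseline in isolation: the removed line relies on a per-thread pointer stored under a pthread key. The sketch below shows that pattern in portable C and is not Wine's code. The names demo_teb, demo_teb_key, demo_set_teb() and demo_current_teb() are hypothetical stand-ins for TEB, teb_key and NtCurrentTeb(). It deliberately uses a plain call to pthread_getspecific() and does not reproduce the inline-assembly shortcut from the patch, which depends on platform-specific knowledge of which registers pthread_getspecific() touches.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for Wine's TEB; it exists only so the sketch compiles. */
typedef struct demo_teb
{
    unsigned long thread_id;
} demo_teb;

/* Plays the role of teb_key in the patch (assumed name, not Wine's). */
static pthread_key_t demo_teb_key;
static pthread_once_t demo_teb_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, no matter which thread gets here first. */
static void demo_teb_key_create(void)
{
    int err = pthread_key_create(&demo_teb_key, NULL);
    assert(err == 0);
    (void)err; /* silence unused-variable warnings when NDEBUG is defined */
}

/* Bind a TEB-like block to the calling thread. Returns 0 on success. */
int demo_set_teb(demo_teb *teb)
{
    pthread_once(&demo_teb_once, demo_teb_key_create);
    return pthread_setspecific(demo_teb_key, teb);
}

/* Counterpart of the pre-patch NtCurrentTeb(): an ordinary C call, so the
   compiler handles whatever the calling conventions require around it.
   Returns NULL if the calling thread has not bound a block yet. */
demo_teb *demo_current_teb(void)
{
    pthread_once(&demo_teb_once, demo_teb_key_create);
    return pthread_getspecific(demo_teb_key);
}
```

A thread would call demo_set_teb() once during setup and demo_current_teb() on every lookup. The lookup is the hot path the commit message describes as being called a lot, which is why the per-call overhead matters.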