diff doc/manual/callconvs/callconv_x64.tex @ 0:3e629dc19168

initial from svn dyncall-1745
author Daniel Adler
date Thu, 19 Mar 2015 22:24:28 +0100
parents
children 7ca46969e0ad
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/manual/callconvs/callconv_x64.tex	Thu Mar 19 22:24:28 2015 +0100
@@ -0,0 +1,239 @@
+%//////////////////////////////////////////////////////////////////////////////
+%
+% Copyright (c) 2007,2009 Daniel Adler <dadler@uni-goettingen.de>, 
+%                         Tassilo Philipp <tphilipp@potion-studios.com>
+%
+% Permission to use, copy, modify, and distribute this software for any
+% purpose with or without fee is hereby granted, provided that the above
+% copyright notice and this permission notice appear in all copies.
+%
+% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+%
+%//////////////////////////////////////////////////////////////////////////////
+
+% ==================================================
+% x64
+% ==================================================
+\subsection{x64 Calling Convention}
+
+
+\paragraph{Overview}
+
+The x64 (64bit) architecture designed by AMD is based on Intel's x86 (32bit)
+architecture, supporting it natively. It is sometimes referred to as x86-64,
+AMD64, or, cloned by Intel, EM64T or Intel64.\\
+On this processor, a word is defined to be 16 bits in size, a dword 32 bits
+and a qword 64 bits. Note that this is due to historical reasons (terminology
+didn't change with the introduction of 32 and 64 bit processors).\\
+The x64 calling convention for MS Windows \cite{x64Win} differs from the
+SystemV x64 calling convention \cite{x64SysV} used by Linux/*BSD/...
+Note that this is not the only difference between these operating systems. The
+64 bit programming model in use by 64 bit windows is LLP64, meaning that the C
+types int and long remain 32 bits in size, whereas long long becomes 64 bits.
+Under Linux/*BSD/... it's LP64.\\
+\\
+Compared to the x86 architecture, the 64 bit versions of the registers are
+called rax, rbx, etc.. Furthermore, there are eight new general purpose
+registers r8-r15.
+
+
+
+\paragraph{\product{dyncall} support}
+
+\product{dyncall} supports the MS Windows and System V calling convention.\\
+\\
+
+
+
+\subsubsection{MS Windows}
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular}{3 B}
+\hline
+Name                & Brief description\\
+\hline
+{\bf rax}           & scratch, return value\\
+{\bf rbx}           & permanent\\
+{\bf rcx}           & scratch, parameter 0 if integer or pointer\\
+{\bf rdx}           & scratch, parameter 1 if integer or pointer\\
+{\bf rdi}           & permanent\\
+{\bf rsi}           & permanent\\
+{\bf rbp}           & permanent, may be used ase frame pointer\\
+{\bf rsp}           & stack pointer\\
+{\bf r8-r9}         & scratch, parameter 2 and 3 if integer or pointer\\
+{\bf r10-r11}       & scratch, permanent if required by caller (used for syscall/sysret)\\
+{\bf r12-r15}       & permanent\\
+{\bf xmm0}          & scratch, floating point parameter 0, floating point return value\\
+{\bf xmm1-xmm3}     & scratch, floating point parameters 1-3\\
+{\bf xmm4-xmm5}     & scratch, permanent if required by caller\\
+{\bf xmm6-xmm15}    & permanent\\
+\hline
+\end{tabular}
+\caption{Register usage on x64 MS Windows platform}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: right-to-left
+\item caller cleans up the stack
+\item first 4 integer/pointer parameters are passed via rcx, rdx, r8, r9 (from left to right), others are pushed on stack (there is a
+preserve area for the first 4)
+\item float and double parameters are passed via xmm0l-xmm3l
+\item first 4 parameters are passed via the correct register depending on the parameter type - with mixed float and int parameters,
+some registers are left out (e.g. first parameter ends up in rcx or xmm0, second in rdx or xmm1, etc.)
+\item parameters in registers are right justified
+\item parameters \textless\ 64bits are not zero extended - zero the upper bits contiaining garbage if needed (but they are always
+passed as a qword)
+\item parameters \textgreater\ 64 bit are passed by reference
+\item if callee takes address of a parameter, first 4 parameters must be dumped (to the reserved space on the stack) - for
+floating point parameters, value must be stored in integer AND floating point register
+\item caller cleans up the stack, not the callee (like cdecl)
+\item stack is always 16byte aligned - since return address is 64 bits in size, stacks with an odd number of parameters are
+already aligned
+\item ellipsis calls take floating point values in int and float registers (single precision floats are promoted to double precision
+as defined for ellipsis calls)
+\item if size of parameters \textgreater\ 1 page of memory (usually between 4k and 64k), chkstk must be called
+\end{itemize}
+
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 64 bits) are returned via the rax register
+\item floating point types are returned via the xmm0 register
+\item for types \textgreater\ 64 bits, a secret first parameter with an address to the return value is passed
+\end{itemize}
+
+
+\paragraph{Stack layout}
+
+Stack frame is always 16-byte aligned. Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+\hhline{~-~~}
+                                  & \vdots                     &                                &                              \\
+\hhline{~=~~}
+local data                        &                            &                                & \mrrbrace{9}{caller's frame} \\
+\hhline{~-~~}
+\mrlbrace{7}{parameter area}      & \ldots                     & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots                     &                                &                              \\
+                                  & \ldots                     &                                &                              \\
+                                  & r9 or xmm3                 & \mrrbrace{4}{spill area}       &                              \\
+                                  & r8 or xmm2                 &                                &                              \\
+                                  & rdx or xmm1                &                                &                              \\
+                                  & rcx or xmm0                &                                &                              \\
+\hhline{~-~~}
+                                  & return address             &                                &                              \\
+\hhline{~=~~}
+local data                        &                            &                                & \mrrbrace{3}{current frame}  \\
+\hhline{~-~~}
+parameter area                    &                            &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots                     &                                &                              \\
+\hhline{~-~~}
+\end{tabular}
+\caption{Stack layout on x64 Microsoft platform}
+\end{figure}
+
+
+
+\newpage
+
+\subsubsection{System V (Linux / *BSD / MacOS X)}
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular}{3 B}
+\hline
+Name                & Brief description\\
+\hline
+{\bf rax}           & scratch, return value\\
+{\bf rbx}           & permanent\\
+{\bf rcx}           & scratch, parameter 3 if integer or pointer\\
+{\bf rdx}           & scratch, parameter 2 if integer or pointer, return value\\
+{\bf rdi}           & scratch, parameter 0 if integer or pointer\\
+{\bf rsi}           & scratch, parameter 1 if integer or pointer\\
+{\bf rbp}           & permanent, may be used ase frame pointer\\
+{\bf rsp}           & stack pointer\\
+{\bf r8-r9}         & scratch, parameter 4 and 5 if integer or pointer\\
+{\bf r10-r11}       & scratch\\
+{\bf r12-r15}       & permanent\\
+{\bf xmm0}          & scratch, floating point parameters 0, floating point return value\\
+{\bf xmm1-xmm7}     & scratch, floating point parameters 1-7\\
+{\bf xmm8-xmm15}    & scratch\\
+{\bf st0-st1}       & scratch, 16 byte floating point return value\\
+{\bf st2-st7}       & scratch\\
+\hline
+\end{tabular}
+\caption{Register usage on x64 System V (Linux/*BSD)}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: right-to-left
+\item caller cleans up the stack
+\item first 6 integer/pointer parameters are passed via rdi, rsi, rdx, rcx, r8, r9
+\item first 8 floating point parameters \textless=\ 64 bits are passed via xmm0l-xmm7l
+\item parameters in registers are right justified
+\item parameters that are not passed via registers are pushed onto the stack
+\item parameters \textless\ 64bits are not zero extended - zero the upper bits contiaining garbage if needed (but they are always
+passed as a qword)
+\item integer/pointer parameters \textgreater\ 64 bit are passed via 2 registers
+\item if callee takes address of a parameter, number of used xmm registers is passed silently in al (passed number mustn't be
+exact but an upper bound on the number of used xmm registers)
+\item stack is always 16byte aligned - since return address is 64 bits in size, stacks with an odd number of parameters are
+already aligned
+\end{itemize}
+
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 64 bits) are returned via the rax register
+\item floating point types are returned via the xmm0 register
+\item for types \textgreater\ 64 bits, a secret first parameter with an address to the return value is passed - the passed in address
+will be returned in rax
+\item floating point values \textgreater\ 64 bits are returned via st0 and st1
+\end{itemize}
+
+
+\paragraph{Stack layout}
+
+Stack frame is always 16-byte aligned. Note that there is no spill area.
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+\hhline{~-~~}
+                                  & \vdots                     &                                &                              \\
+\hhline{~=~~}
+local data                        &                            &                                & \mrrbrace{5}{caller's frame} \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & \ldots                     & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots                     &                                &                              \\
+                                  & \ldots                     &                                &                              \\
+\hhline{~-~~}
+                                  & return address             &                                &                              \\
+\hhline{~=~~}
+local data                        &                            &                                & \mrrbrace{3}{current frame}  \\
+\hhline{~-~~}
+parameter area                    &                            &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots                     &                                &                              \\
+\hhline{~-~~}
+\end{tabular}
+\caption{Stack layout on x64 System V (Linux/*BSD)}
+\end{figure}
+