% doc/manual/callconvs/callconv_x64.tex @ 0:3e629dc19168
% initial from svn dyncall-1745
% author: Daniel Adler
% date: Thu, 19 Mar 2015 22:24:28 +0100
% children: 7ca46969e0ad
%//////////////////////////////////////////////////////////////////////////////
%
% Copyright (c) 2007,2009 Daniel Adler <dadler@uni-goettingen.de>,
%                         Tassilo Philipp <tphilipp@potion-studios.com>
%
% Permission to use, copy, modify, and distribute this software for any
% purpose with or without fee is hereby granted, provided that the above
% copyright notice and this permission notice appear in all copies.
%
% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%
%//////////////////////////////////////////////////////////////////////////////

% ==================================================
% x64
% ==================================================
\subsection{x64 Calling Convention}


\paragraph{Overview}

The x64 (64-bit) architecture designed by AMD is based on Intel's x86 (32-bit)
architecture and supports it natively. It is sometimes referred to as x86-64,
AMD64, or, as cloned by Intel, EM64T or Intel64.\\
On this processor, a word is defined to be 16 bits in size, a dword 32 bits
and a qword 64 bits. This is for historical reasons (the terminology
didn't change with the introduction of 32- and 64-bit processors).\\
The x64 calling convention for MS Windows \cite{x64Win} differs from the
System V x64 calling convention \cite{x64SysV} used by Linux/*BSD/...
Note that this is not the only difference between these operating systems. The
64-bit programming model used by 64-bit Windows is LLP64, meaning that the C
types int and long remain 32 bits in size, whereas long long and pointers are
64 bits. Linux/*BSD/... use LP64, where long and pointers are 64 bits in size.\\
\\
Compared to the x86 architecture, the 64-bit versions of the registers are
called rax, rbx, etc. Furthermore, there are eight new general purpose
registers r8-r15.
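
The difference between the two programming models mentioned above can be
checked from C. The following is a minimal sketch, not part of
\product{dyncall}; the names \texttt{data\_model}, \texttt{classify\_data\_model}
and \texttt{host\_data\_model} are hypothetical and chosen for illustration only.
It classifies a model purely from the sizes of long and of a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum data_model { DM_OTHER, DM_LLP64, DM_LP64 };

/* Classify a 64-bit data model from the sizes (in bytes) of long and of a
   pointer. int is 32 bits in both LLP64 and LP64, so it is not needed here. */
enum data_model classify_data_model(size_t long_size, size_t ptr_size)
{
    if (ptr_size != 8)
        return DM_OTHER;   /* not a 64-bit model, e.g. 32-bit x86 */
    if (long_size == 4)
        return DM_LLP64;   /* 64-bit Windows */
    if (long_size == 8)
        return DM_LP64;    /* 64-bit Linux, *BSD, ... */
    return DM_OTHER;
}

/* Data model of the platform this code is compiled for. */
enum data_model host_data_model(void)
{
    assert(sizeof(long long) >= 8); /* long long has at least 64 bits in C99 */
    return classify_data_model(sizeof(long), sizeof(void *));
}
```

Code that must hold a 64-bit integer on both platforms should therefore not
rely on long, but use long long or a fixed-width type.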



\paragraph{\product{dyncall} support}

\product{dyncall} supports both the MS Windows and the System V calling conventions.\\
\\



\subsubsection{MS Windows}

\paragraph{Registers and register usage}

\begin{table}[h]
\begin{tabular}{3 B}
\hline
Name                & Brief description\\
\hline
{\bf rax}           & scratch, return value\\
{\bf rbx}           & permanent\\
{\bf rcx}           & scratch, parameter 0 if integer or pointer\\
{\bf rdx}           & scratch, parameter 1 if integer or pointer\\
{\bf rdi}           & permanent\\
{\bf rsi}           & permanent\\
{\bf rbp}           & permanent, may be used as frame pointer\\
{\bf rsp}           & stack pointer\\
{\bf r8-r9}         & scratch, parameters 2 and 3 if integer or pointer\\
{\bf r10-r11}       & scratch, permanent if required by caller (used for syscall/sysret)\\
{\bf r12-r15}       & permanent\\
{\bf xmm0}          & scratch, floating point parameter 0, floating point return value\\
{\bf xmm1-xmm3}     & scratch, floating point parameters 1-3\\
{\bf xmm4-xmm5}     & scratch, permanent if required by caller\\
{\bf xmm6-xmm15}    & permanent\\
\hline
\end{tabular}
\caption{Register usage on x64 MS Windows platform}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack, not the callee (like cdecl)
\item first 4 integer/pointer parameters are passed via rcx, rdx, r8, r9 (from left to right), others are pushed onto the stack (there is a
spill area reserved on the stack for the first 4)
\item float and double parameters are passed via xmm0l-xmm3l
\item first 4 parameters are passed via the register matching their position and type - with mixed float and int parameters,
some registers are left unused (e.g. the first parameter ends up in rcx or xmm0, the second in rdx or xmm1, etc.)
\item parameters in registers are right justified
\item parameters \textless\ 64 bits are not zero extended - zero the upper bits containing garbage if needed (but they are always
passed as a qword)
\item parameters \textgreater\ 64 bits are passed by reference
\item if the callee takes the address of a parameter, the first 4 parameters must be dumped (to the reserved space on the stack) - for
floating point parameters, the value must be stored in the integer AND the floating point register
\item stack is always 16-byte aligned - since the return address is 64 bits in size, stacks with an odd number of parameters are
already aligned
\item ellipsis calls take floating point values in int and float registers (single precision floats are promoted to double precision
as defined for ellipsis calls)
\item if the size of the parameters is \textgreater\ 1 page of memory (usually between 4k and 64k), chkstk must be called
\end{itemize}
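
The position-based register selection described above can be sketched in C.
This is an illustrative model only, not \product{dyncall} code: the function
\texttt{win64\_arg\_register}, the enum \texttt{win64\_reg} and the one-character
signature encoding (\texttt{i} for an integer or pointer parameter, \texttt{f}
for a float or double parameter) are assumptions made for this example. It also
assumes every parameter fits into 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature encoding used only in this sketch:
   'i' = integer or pointer parameter, 'f' = float or double parameter. */

enum win64_reg {
    W64_STACK,                              /* passed on the stack */
    W64_RCX, W64_RDX, W64_R8, W64_R9,       /* integer/pointer parameters 0-3 */
    W64_XMM0, W64_XMM1, W64_XMM2, W64_XMM3  /* floating point parameters 0-3 */
};

/* Register used for parameter number 'index' (0-based) of signature 'sig'.
   On Win64 only the position and the type of the parameter itself matter;
   the types of the other parameters have no influence. */
enum win64_reg win64_arg_register(const char *sig, size_t index)
{
    assert(index < strlen(sig));
    if (index >= 4)
        return W64_STACK;   /* only the first 4 parameters use registers */
    return (enum win64_reg)((sig[index] == 'f' ? W64_XMM0 : W64_RCX) + index);
}
```

For a function taking (int, double, int), written \texttt{ifi} in this
encoding, the sketch yields rcx, xmm1 and r8, so rdx and xmm0 are left unused.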


\paragraph{Return values}

\begin{itemize}
\item return values of pointer or integral type (\textless=\ 64 bits) are returned via the rax register
\item floating point types are returned via the xmm0 register
\item for types \textgreater\ 64 bits, a hidden first parameter with an address to the return value is passed
\end{itemize}


\paragraph{Stack layout}

Stack frame is always 16-byte aligned. Stack directly after function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
\hhline{~-~~}
                                  & \vdots         &                                      &                               \\
\hhline{~=~~}
local data                        &                &                                      & \mrrbrace{9}{caller's frame}  \\
\hhline{~-~~}
\mrlbrace{7}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters}       &                               \\
                                  & \ldots         &                                      &                               \\
                                  & \ldots         &                                      &                               \\
                                  & r9 or xmm3     & \mrrbrace{4}{spill area}             &                               \\
                                  & r8 or xmm2     &                                      &                               \\
                                  & rdx or xmm1    &                                      &                               \\
                                  & rcx or xmm0    &                                      &                               \\
\hhline{~-~~}
                                  & return address &                                      &                               \\
\hhline{~=~~}
local data                        &                &                                      & \mrrbrace{3}{current frame}   \\
\hhline{~-~~}
parameter area                    &                &                                      &                               \\
\hhline{~-~~}
                                  & \vdots         &                                      &                               \\
\hhline{~-~~}
\end{tabular}
\caption{Stack layout on x64 Microsoft platform}
\end{figure}
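
The size of the parameter area shown in the figure can be estimated with a
small C helper. This is a sketch under stated assumptions, not
\product{dyncall} code: the name \texttt{win64\_param\_area\_size} is
hypothetical, every parameter is assumed to occupy exactly one 8-byte slot
(parameters are always passed as a qword, larger ones by reference), and the
stack pointer is assumed to be 16-byte aligned before the area is reserved.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the caller reserves for the parameter area of a call with
   'num_params' parameters: the 4-slot spill area is always present, even if
   the callee takes fewer than 4 parameters, and the total is rounded up to
   a multiple of 16 to keep the stack aligned. */
size_t win64_param_area_size(size_t num_params)
{
    size_t slots = num_params < 4 ? 4 : num_params;
    size_t bytes = slots * 8;          /* one qword per parameter */
    assert(bytes / 8 == slots);        /* guard against overflow */
    return (bytes + 15) & ~(size_t)15; /* round up to 16 bytes */
}
```

With these assumptions, calls with 0 to 4 parameters reserve 32 bytes, and a
call with 5 parameters reserves 48 bytes (40 bytes of slots plus 8 bytes of
padding).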



\newpage

\subsubsection{System V (Linux / *BSD / Mac OS X)}

\paragraph{Registers and register usage}

\begin{table}[h]
\begin{tabular}{3 B}
\hline
Name                & Brief description\\
\hline
{\bf rax}           & scratch, return value\\
{\bf rbx}           & permanent\\
{\bf rcx}           & scratch, parameter 3 if integer or pointer\\
{\bf rdx}           & scratch, parameter 2 if integer or pointer, return value\\
{\bf rdi}           & scratch, parameter 0 if integer or pointer\\
{\bf rsi}           & scratch, parameter 1 if integer or pointer\\
{\bf rbp}           & permanent, may be used as frame pointer\\
{\bf rsp}           & stack pointer\\
{\bf r8-r9}         & scratch, parameters 4 and 5 if integer or pointer\\
{\bf r10-r11}       & scratch\\
{\bf r12-r15}       & permanent\\
{\bf xmm0}          & scratch, floating point parameter 0, floating point return value\\
{\bf xmm1-xmm7}     & scratch, floating point parameters 1-7\\
{\bf xmm8-xmm15}    & scratch\\
{\bf st0-st1}       & scratch, 16-byte floating point return value\\
{\bf st2-st7}       & scratch\\
\hline
\end{tabular}
\caption{Register usage on x64 System V (Linux/*BSD)}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first 6 integer/pointer parameters are passed via rdi, rsi, rdx, rcx, r8, r9
\item first 8 floating point parameters \textless=\ 64 bits are passed via xmm0l-xmm7l
\item parameters in registers are right justified
\item parameters that are not passed via registers are pushed onto the stack
\item parameters \textless\ 64 bits are not zero extended - zero the upper bits containing garbage if needed (but they are always
passed as a qword)
\item integer/pointer parameters \textgreater\ 64 bits are passed via 2 registers
\item for calls to functions taking a variable number of arguments (ellipsis calls), the number of used xmm registers is passed
silently in al (the passed number need not be exact, but must be an upper bound on the number of used xmm registers)
\item stack is always 16-byte aligned - since the return address is 64 bits in size, stacks with an odd number of parameters are
already aligned
\end{itemize}
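
In contrast to MS Windows, integer/pointer and floating point parameters are
counted independently here. The following C sketch illustrates this; it is not
\product{dyncall} code. The function \texttt{sysv\_arg\_register}, the enum
\texttt{sysv\_reg} and the one-character signature encoding (\texttt{i} for an
integer or pointer parameter, \texttt{f} for a float or double parameter) are
assumptions made for this example, and every parameter is assumed to fit into
64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature encoding used only in this sketch:
   'i' = integer or pointer parameter, 'f' = float or double parameter. */

enum sysv_reg {
    SV_STACK,                                        /* passed on the stack */
    SV_RDI, SV_RSI, SV_RDX, SV_RCX, SV_R8, SV_R9,    /* integer/pointer 0-5 */
    SV_XMM0, SV_XMM1, SV_XMM2, SV_XMM3,              /* floating point 0-3  */
    SV_XMM4, SV_XMM5, SV_XMM6, SV_XMM7               /* floating point 4-7  */
};

/* Register used for parameter number 'index' (0-based) of signature 'sig'.
   Each register class has its own counter, so only preceding parameters of
   the same class influence the result. */
enum sysv_reg sysv_arg_register(const char *sig, size_t index)
{
    size_t used = 0; /* preceding parameters of the same class */
    size_t i;

    assert(index < strlen(sig));
    for (i = 0; i < index; ++i)
        if (sig[i] == sig[index])
            ++used;

    if (sig[index] == 'f')
        return used < 8 ? (enum sysv_reg)(SV_XMM0 + used) : SV_STACK;
    return used < 6 ? (enum sysv_reg)(SV_RDI + used) : SV_STACK;
}
```

For a function taking (int, double, int), written \texttt{ifi} in this
encoding, the sketch yields rdi, xmm0 and rsi; no register is skipped.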


\paragraph{Return values}

\begin{itemize}
\item return values of pointer or integral type (\textless=\ 64 bits) are returned via the rax register
\item floating point types are returned via the xmm0 register
\item for types \textgreater\ 64 bits, a hidden first parameter with an address to the return value is passed - the passed-in address
will be returned in rax
\item floating point values \textgreater\ 64 bits are returned via st0 and st1
\end{itemize}


\paragraph{Stack layout}

Stack frame is always 16-byte aligned. Note that there is no spill area.
Stack directly after function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
\hhline{~-~~}
                                  & \vdots         &                                      &                               \\
\hhline{~=~~}
local data                        &                &                                      & \mrrbrace{5}{caller's frame}  \\
\hhline{~-~~}
\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters}       &                               \\
                                  & \ldots         &                                      &                               \\
                                  & \ldots         &                                      &                               \\
\hhline{~-~~}
                                  & return address &                                      &                               \\
\hhline{~=~~}
local data                        &                &                                      & \mrrbrace{3}{current frame}   \\
\hhline{~-~~}
parameter area                    &                &                                      &                               \\
\hhline{~-~~}
                                  & \vdots         &                                      &                               \\
\hhline{~-~~}
\end{tabular}
\caption{Stack layout on x64 System V (Linux/*BSD)}
\end{figure}
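
Because there is no spill area, only parameters that did not get a register
occupy stack space. A minimal C sketch of the resulting size of the stack
parameter area follows; it is not \product{dyncall} code. The name
\texttt{sysv\_stack\_param\_bytes} and the signature encoding (\texttt{i} for
an integer or pointer parameter, \texttt{f} for a float or double parameter)
are hypothetical, every parameter is assumed to take one 8-byte slot, and the
stack pointer is assumed to be 16-byte aligned before the area is reserved.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of stack needed for the parameters of signature 'sig' that are not
   passed in registers: integer/pointer parameters beyond the first 6 and
   floating point parameters beyond the first 8, one qword each, rounded up
   to a multiple of 16. */
size_t sysv_stack_param_bytes(const char *sig)
{
    size_t ints = 0, floats = 0, slots = 0;

    assert(sig != NULL);
    for (; *sig != '\0'; ++sig) {
        if (*sig == 'f') {
            if (floats++ >= 8)  /* xmm0-xmm7 already taken */
                ++slots;
        } else {
            if (ints++ >= 6)    /* rdi, rsi, rdx, rcx, r8, r9 already taken */
                ++slots;
        }
    }
    return (slots * 8 + 15) & ~(size_t)15; /* round up to 16 bytes */
}
```

A call with up to 6 integer and up to 8 floating point parameters therefore
needs no stack parameter area at all under these assumptions.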