% annotated source: doc/manual/callconvs/callconv_x64.tex @ 493:75cb8f79d725 (Tassilo Philipp, Mon, 21 Mar 2022 14:46:38 +0100)
%//////////////////////////////////////////////////////////////////////////////
%
% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>,
% Tassilo Philipp <tphilipp@potion-studios.com>
%
% Permission to use, copy, modify, and distribute this software for any
% purpose with or without fee is hereby granted, provided that the above
% copyright notice and this permission notice appear in all copies.
%
% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%
%//////////////////////////////////////////////////////////////////////////////

% ==================================================
% x64
% ==================================================
\subsection{x64 Calling Conventions}


\paragraph{Overview}

The x64 (64-bit) architecture designed by AMD is based on Intel's x86 (32-bit)
architecture, supporting it natively. It is sometimes referred to as x86-64,
AMD64, or, for Intel's implementation, EM64T or Intel64.\\
On this processor, a word is defined to be 16 bits in size, a dword 32 bits
and a qword 64 bits. Note that this is due to historical reasons (the terminology
didn't change with the introduction of 32 and 64 bit processors).\\
The x64 calling convention for MS Windows \cite{x64Win} differs from the
System V x64 calling convention \cite{x64SysV} used by Linux, *BSD, etc.
Note that this is not the only difference between these operating systems. The
64 bit programming model used by 64 bit Windows is LLP64, meaning that the C
types int and long remain 32 bits in size, whereas long long and pointers are 64 bits.
Under Linux, *BSD, etc. it is LP64, meaning that long and pointers are 64 bits (int remains 32 bits).\\
\\
Compared to the x86 architecture, the 64 bit versions of the registers are
called rax, rbx, etc. Furthermore, there are eight new general purpose
registers r8-r15.\\
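As an illustration of the two data models, the following minimal C sketch infers the model of the compilation target from the sizes of the basic C types. It is not part of \product{dyncall}; the function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: name the C data model of the compilation target,
   derived only from the sizes of the basic C types. */
static const char *data_model(void)
{
    if (sizeof(void *) == 8 && sizeof(long) == 8 && sizeof(int) == 4)
        return "LP64";   /* e.g. x64 Linux, *BSD, MacOS X */
    if (sizeof(void *) == 8 && sizeof(long) == 4 && sizeof(long long) == 8)
        return "LLP64";  /* e.g. 64 bit MS Windows */
    return "other";      /* e.g. a 32 bit target */
}

/* Print the type sizes that distinguish the two models. */
static void print_type_sizes(void)
{
    printf("int: %zu, long: %zu, long long: %zu, void*: %zu -> %s\n",
           sizeof(int), sizeof(long), sizeof(long long), sizeof(void *),
           data_model());
}
```

Built for x64 Linux this reports LP64 (long is 8 bytes), built for 64 bit Windows LLP64 (long is 4 bytes).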



\paragraph{\product{dyncall} support}

Currently, the MS Windows and System V calling conventions are supported.\\
\product{Dyncall} can also be used to issue syscalls on System V platforms by
using the syscall number as target parameter and selecting the correct mode.

\subsubsection{MS Windows}

\paragraph{Registers and register usage}

\begin{table}[h]
\begin{tabular*}{0.95\textwidth}{3 B}
Name & Brief description\\
\hline
{\bf rax} & scratch, return value\\
{\bf rbx} & permanent\\
{\bf rcx} & scratch, parameter 0 if integer or pointer\\
{\bf rdx} & scratch, parameter 1 if integer or pointer\\
{\bf rdi} & permanent\\
{\bf rsi} & permanent\\
{\bf rbp} & permanent, may be used as frame pointer\\
{\bf rsp} & stack pointer\\
{\bf r8-r9} & scratch, parameters 2 and 3 if integer or pointer\\
{\bf r10-r11} & scratch, permanent if required by caller (used for syscall/sysret)\\
{\bf r12-r15} & permanent\\
{\bf xmm0} & scratch, floating point parameter 0, floating point return value\\
{\bf xmm1-xmm3} & scratch, floating point parameters 1-3\\
{\bf xmm4-xmm5} & scratch, permanent if required by caller\\
{\bf xmm6-xmm15} & permanent\\
\end{tabular*}
\caption{Register usage on x64 MS Windows platform}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack, not the callee (like cdecl)
\item first 4 integer/pointer parameters are passed via rcx, rdx, r8, r9 (from left to right), others are pushed on stack (there is a
spill area for the first 4)
\item {\it non-trivial} C++ aggregates (as defined by the language) are passed indirectly via a pointer to a copy of the aggregate, no matter the size
\item aggregates (structs and unions) of 8, 16, 32 or 64 bits are passed like equal-sized integers
\item float and double parameters are passed via xmm0l-xmm3l
\item first 4 parameters are passed via the correct register depending on the parameter type - with mixed float and int parameters,
some registers are left out (e.g. first parameter ends up in rcx or xmm0, second in rdx or xmm1, etc.)
\item parameters in registers are right justified
\item parameters \textless\ 64 bits are not zero extended - zero the upper bits containing garbage if needed (but they are always
passed as a qword)
\item parameters \textgreater\ 64 bits, as well as aggregates of any size other than the ones listed above, are passed via a pointer to a copy (for aggregate types, that caller-allocated memory must be 16-byte aligned)
\item if the callee takes the address of a parameter, the first 4 parameters must be spilled (to the reserved spill area on the stack) - for
floating point parameters, the value must be stored in the integer AND the floating point register
\item stack is always 16-byte aligned - since the return address is 64 bits in size, stacks with an odd number of parameters are
already aligned
\item ellipsis calls take floating point values in int and float registers (single precision floats are promoted to double precision as
required by ellipsis calls)
\item if the size of parameters is \textgreater\ 1 page of memory (usually between 4k and 64k), chkstk must be called
\end{itemize}
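The positional register assignment described above can be sketched in C as follows. This is a simplified model, not \product{dyncall}'s implementation: all types and names are hypothetical, only integer, floating point and trivial aggregate arguments are modeled (non-trivial C++ aggregates are left out), and aggregates are assumed to be passed by value only if they are 1, 2, 4 or 8 bytes in size, as documented by Microsoft.

```c
#include <stddef.h>

/* Hypothetical, simplified argument description (not a dyncall type). */
typedef enum { ARG_INT, ARG_FLOAT, ARG_AGGREGATE } ArgKind;
typedef struct {
    ArgKind kind;
    size_t  size;          /* size in bytes */
} ArgDesc;

/* Where an argument ends up at the moment of the call instruction. */
typedef struct {
    const char *reg;       /* register name, or NULL if passed on the stack */
    int         by_ref;    /* 1 if a pointer to a caller-made copy is passed */
    size_t      rsp_off;   /* byte offset from rsp (stack arguments only) */
} ArgLoc;

static const char *const win_int_regs[4] = { "rcx", "rdx", "r8", "r9" };
static const char *const win_fp_regs[4]  = { "xmm0", "xmm1", "xmm2", "xmm3" };

/* Trivial aggregates of 1, 2, 4 or 8 bytes travel like integers of that size. */
static int win_agg_by_value(size_t size)
{
    return size == 1 || size == 2 || size == 4 || size == 8;
}

/* Location of the i-th (0-based) argument. Register slots are positional:
   argument i uses slot i of either the integer or the xmm sequence, so the
   other register of that slot stays unused. */
static ArgLoc win_arg_location(size_t i, ArgDesc a)
{
    ArgLoc loc = { NULL, 0, 0 };

    /* Odd-sized aggregates and values > 64 bits are passed as a pointer
       (an integer-class value) to a copy; aggregate copies must be
       16-byte aligned. */
    if ((a.kind == ARG_AGGREGATE && !win_agg_by_value(a.size)) ||
        (a.kind != ARG_AGGREGATE && a.size > 8))
        loc.by_ref = 1;

    if (i < 4)
        loc.reg = (a.kind == ARG_FLOAT && !loc.by_ref) ? win_fp_regs[i]
                                                       : win_int_regs[i];
    else
        loc.rsp_off = 8 * i;   /* arg 4 sits right above the 32-byte spill area */
    return loc;
}
```

For example, for \texttt{f(int, double, struct of 3 bytes)} this yields rcx, xmm1 and r8, the latter holding a pointer to the copy; rdx, xmm0 and xmm2 stay unused.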


\paragraph{Return values}

\begin{itemize}
\item return values of pointer or integral type (\textless=\ 64 bits), as well as aggregates (structs and unions) of 8, 16, 32 or 64 bits, are returned via the rax register
\item floating point types are returned via the xmm0 register
\item for all other types (or for {\it non-trivial} C++ aggregates of any size), a hidden first parameter holding the address of caller-provided space for the
return value is passed (for C++ thiscalls it is passed as {\bf second} parameter, after the this pointer); the callee returns that address in rax
\end{itemize}
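A small C sketch of these return value rules follows. The names are hypothetical, and it assumes, as Microsoft documents, that only aggregates of 1, 2, 4 or 8 bytes are returned in rax.

```c
#include <stddef.h>

/* Hypothetical, simplified classification of a return type.
   RET_INT covers integral and pointer types of up to 64 bits. */
typedef enum { RET_INT, RET_FLOAT, RET_AGGREGATE } RetKind;
typedef enum { IN_RAX, IN_XMM0, VIA_HIDDEN_PTR } RetLoc;

/* non_trivial: nonzero for a non-trivial C++ aggregate. */
static RetLoc win_return_location(RetKind kind, size_t size, int non_trivial)
{
    if (kind == RET_FLOAT)
        return IN_XMM0;
    if (kind == RET_INT)
        return IN_RAX;
    /* aggregates: only trivial ones of 1, 2, 4 or 8 bytes come back in rax */
    if (!non_trivial && (size == 1 || size == 2 || size == 4 || size == 8))
        return IN_RAX;
    return VIA_HIDDEN_PTR;
}

/* Register carrying the hidden return value pointer: the first parameter
   register, or the second one for C++ thiscalls (rcx then holds `this`). */
static const char *win_hidden_ptr_reg(int is_thiscall)
{
    return is_thiscall ? "rdx" : "rcx";
}
```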


\paragraph{Stack layout}

Stack frame is always 16-byte aligned.
% verified/amended: TP nov 2019 (@@@ no doc/disas_examples/x64.win.disas, yet...@@@)
Stack directly after function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
& \vdots & & \\
\hhline{~=~~}
register save area & \hspace{4cm} & & \mrrbrace{10}{caller's frame} \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
\mrlbrace{7}{parameter area} & arg n-1 & \mrrbrace{3}{stack parameters} & \\
& \ldots & & \\
& arg 4 & & \\
& r9 or xmm3 & \mrrbrace{4}{spill area} & \\
& r8 or xmm2 & & \\
& rdx or xmm1 & & \\
& rcx or xmm0 & & \\
\hhline{~-~~}
& return address & & \\
\hhline{~=~~}
register save area & & & \mrrbrace{4}{current frame} \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
parameter area & & & \\
\hhline{~-~~}
& \vdots & & \\
\end{tabular}
\caption{Stack layout on x64 Microsoft platform}
\end{figure}
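The offsets implied by the figure can be computed as below. This is a hypothetical C sketch: it assumes every argument occupies exactly one qword slot (larger values being passed by reference), that offsets are taken on callee entry (before the prolog), and that the rest of the caller's frame is a multiple of 16 bytes.

```c
#include <stddef.h>

/* Offset from rsp on callee entry (right after `call` pushed the return
   address, before the prolog) of the stack slot of argument i (0-based).
   Slots 0-3 form the caller-reserved spill area for rcx/rdx/r8/r9 or
   xmm0-xmm3; stack-passed arguments follow from slot 4 on. */
static size_t win_arg_slot_at_entry(size_t i)
{
    return 8 /* return address */ + 8 * i;
}

/* Size of the outgoing parameter area a caller reserves for a call with
   nargs arguments: always at least the 4 spill slots, one qword per
   argument, rounded up to 16 bytes to keep rsp 16-byte aligned at the call. */
static size_t win_param_area_size(size_t nargs)
{
    size_t slots = nargs < 4 ? 4 : nargs;
    return (slots * 8 + 15) & ~(size_t)15;
}
```

So even a call without arguments reserves 32 bytes, and a call with 5 arguments reserves 48 (40 rounded up).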


\clearpage

\subsubsection{System V (Linux / *BSD / MacOS X)}

\paragraph{Registers and register usage}

\begin{table}[h]
\begin{tabular*}{0.95\textwidth}{3 B}
Name & Brief description\\
\hline
{\bf rax} & scratch, return value, special use for varargs (in al, see below)\\
{\bf rbx} & permanent\\
{\bf rcx} & scratch, parameter 3 if integer or pointer\\
{\bf rdx} & scratch, parameter 2 if integer or pointer, return value\\
{\bf rdi} & scratch, parameter 0 if integer or pointer\\
{\bf rsi} & scratch, parameter 1 if integer or pointer\\
{\bf rbp} & permanent, may be used as frame pointer\\
{\bf rsp} & stack pointer\\
{\bf r8-r9} & scratch, parameters 4 and 5 if integer or pointer\\
{\bf r10-r11} & scratch\\
{\bf r12-r15} & permanent\\
{\bf xmm0-xmm1} & scratch, floating point parameters 0-1, floating point return value\\
{\bf xmm2-xmm7} & scratch, floating point parameters 2-7\\
{\bf xmm8-xmm15} & scratch\\
{\bf st0-st1} & scratch, 16 byte floating point return value\\
{\bf st2-st7} & scratch\\
\end{tabular*}
\caption{Register usage on x64 System V (Linux/*BSD)}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first 6 integer/pointer parameters are passed via rdi, rsi, rdx, rcx, r8, r9
\item first 8 floating point parameters \textless=\ 64 bits are passed via xmm0l-xmm7l
\item parameters in registers are right justified
\item parameters that are not passed via registers are pushed onto the stack (with their sizes rounded up to qwords)
\item parameters \textless\ 64 bits are not zero extended - zero the upper bits containing garbage if needed (but they are always
passed as a qword)
\item integer/pointer parameters \textgreater\ 64 bits are passed via 2 registers
\item for calls to functions with a variable number of arguments (ellipsis), the number of used xmm registers is passed silently in al (the passed number doesn't need to be
exact but must be an upper bound on the number of used xmm registers)
\item aggregates (structs, unions (and arrays within those)) follow a more complicated logic (the following {\bf only considers field types supported by dyncall}):
\begin{itemize}
\item {\it non-trivial} C++ aggregates (as defined by the language) are passed indirectly via a pointer to a copy of the aggregate, no matter the size
\item aggregates \textgreater\ 16 bytes are always passed entirely via the stack
\item all other aggregates are classified per qword, by looking at all fields occupying all or part of that qword, recursively
\begin{itemize}
\item if any field would be passed via the stack, the entire qword will be
\item otherwise, if any field would be passed like an integer/pointer value, the entire qword will be
\item otherwise the qword is passed like a floating point value
\end{itemize}
\item after qword classification, the logic is:
\begin{itemize}
\item if any qword is classified to be passed via the stack, the entire aggregate will be
\item if the size of the aggregate is \textgreater\ 2 qwords, it is passed via the stack (except for single floating point values \textgreater\ 128 bits)
\item all others are passed qword by qword according to their classification, like individual arguments
\item however, an aggregate is never split between registers and the stack; if it doesn't fit into the available registers it is entirely passed via the stack (freeing such registers for subsequent arguments)
\end{itemize}
\end{itemize}
\item stack is always 16-byte aligned - since the return address is 64 bits in size, stacks with an odd number of parameters are
already aligned
\item no spill area is used on the stack; iterating over varargs requires a specific va\_list implementation
\end{itemize}
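The following hypothetical C sketch models the register assignment for scalar arguments only (aggregates, long double and 128 bit integers are left out). It also computes the value a caller could place in al for a variadic call. The names are illustrative and not part of \product{dyncall}.

```c
#include <stddef.h>

/* Hypothetical: only scalar arguments, classified as integer/pointer or float. */
typedef enum { SV_INT, SV_SSE } SvClass;

typedef struct {
    const char *reg;      /* register name, or NULL if passed on the stack */
    size_t      rsp_off;  /* offset from rsp at the call (stack arguments only) */
} SvLoc;

static const char *const sv_int_regs[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const sv_sse_regs[8] = { "xmm0", "xmm1", "xmm2", "xmm3",
                                            "xmm4", "xmm5", "xmm6", "xmm7" };

/* Assign locations to n arguments. Unlike MS Windows, integer and xmm
   registers are consumed independently. Returns the number of xmm registers
   used, i.e. a valid value for al when calling a variadic function. */
static unsigned sysv_assign(const SvClass *cls, size_t n, SvLoc *out)
{
    size_t next_int = 0, next_sse = 0, stack_off = 0;

    for (size_t i = 0; i < n; ++i) {
        out[i].reg = NULL;
        out[i].rsp_off = 0;
        if (cls[i] == SV_INT && next_int < 6) {
            out[i].reg = sv_int_regs[next_int++];
        } else if (cls[i] == SV_SSE && next_sse < 8) {
            out[i].reg = sv_sse_regs[next_sse++];
        } else {
            out[i].rsp_off = stack_off;  /* leftmost stack arg at lowest address */
            stack_off += 8;              /* each slot rounded up to a qword */
        }
    }
    return (unsigned)next_sse;
}
```

Note the contrast to MS Windows: for \texttt{f(int, double, int, double)} this assigns rdi, xmm0, rsi and xmm1, without leaving registers out.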


\paragraph{Return values}

\begin{itemize}
\item return values of pointer or integral type are returned via the rax register (and rdx if needed)
\item floating point types are returned via the xmm0 register (and xmm1 if needed)
\item aggregates are first classified in the same way as when passing them by value, then:
\begin{itemize}
\item for aggregates that would be passed via the stack (or for {\it non-trivial} C++ aggregates of any size), a hidden pointer to a non-shared,
caller provided space is {\bf passed} as hidden, first argument; this pointer will be returned via rax
\item otherwise, the aggregate is returned qword by qword, using rax and rdx for integer/pointer qwords, and xmm0 and xmm1 for floating point ones
\end{itemize}
\item floating point values \textgreater\ 64 bits are returned via st0 and st1
\end{itemize}
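The qword classification and the resulting return registers can be sketched in C as below. This is a simplified model with hypothetical names: it assumes the aggregate has already been flattened into a list of scalar fields, and it ignores non-trivial C++ aggregates, x87 and 128 bit vector types.

```c
#include <stddef.h>

/* Reduced set of System V classes (hypothetical names). The enum order
   encodes precedence: MEMORY beats INTEGER, INTEGER beats SSE. */
typedef enum { CLS_NONE, CLS_SSE, CLS_INTEGER, CLS_MEMORY } Cls;
typedef enum { R_RAX, R_RDX, R_XMM0, R_XMM1 } RetReg;

/* One scalar field of a flattened, trivial aggregate. */
typedef struct { size_t off, size; Cls cls; } Field;

static Cls cls_merge(Cls a, Cls b)
{
    return a > b ? a : b;   /* relies on the enum order above */
}

/* Classify an aggregate of `size` bytes into q[0..1]. Returns the number of
   qwords handled in registers, or 0 if it goes via memory (hidden pointer). */
static size_t sysv_classify(const Field *f, size_t nf, size_t size, Cls q[2])
{
    q[0] = q[1] = CLS_NONE;
    if (size > 16)
        return 0;
    for (size_t i = 0; i < nf; ++i) {
        size_t first = f[i].off / 8;
        size_t last  = (f[i].off + f[i].size - 1) / 8;
        for (size_t k = first; k <= last && k < 2; ++k)
            q[k] = cls_merge(q[k], f[i].cls);
    }
    size_t nq = (size + 7) / 8;
    for (size_t k = 0; k < nq; ++k)
        if (q[k] == CLS_MEMORY)
            return 0;
    return nq;
}

/* Return registers per qword: INTEGER qwords take rax, then rdx;
   SSE qwords take xmm0, then xmm1. */
static void sysv_return_regs(const Cls q[2], size_t nq, RetReg regs[2])
{
    size_t ni = 0, ns = 0;
    for (size_t k = 0; k < nq; ++k)
        regs[k] = (q[k] == CLS_SSE) ? (ns++ ? R_XMM1 : R_XMM0)
                                    : (ni++ ? R_RDX : R_RAX);
}
```

For \texttt{struct \{ double d; int i; \}} this yields SSE and INTEGER, so such a value comes back in xmm0 and rax, while \texttt{struct \{ float x; int n; \}} fits a single INTEGER qword and comes back in rax alone.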


\paragraph{Stack layout}

Stack frame is always 16-byte aligned. The 128-byte area beyond the
location pointed to by the stack pointer is referred to as the ``red zone''; it is
considered to be reserved and must not be modified by signal or interrupt handlers
(useful for temporary data that need not be preserved across calls, and for
optimizations in leaf functions).
% verified/amended: TP nov 2019 (see also doc/disas_examples/x64.sysv.disas)
Stack directly after function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
& \vdots & & \\
\hhline{~=~~}
register save area & \hspace{4cm} & & \mrrbrace{6}{caller's frame} \\
\hhline{~-~~}
local data (with padding) & & & \\
\hhline{~-~~}
\mrlbrace{3}{parameter area} & arg n-1 & \mrrbrace{3}{stack parameters} & \\
& \ldots & & \\
& arg 6 & & \\
\hhline{~-~~}
& return address & & \\
\hhline{~=~~}
register save area & & & \mrrbrace{4}{current frame} \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
parameter area & & & \\
\hhline{~-~~}
& \vdots & & \\
\end{tabular}
\caption{Stack layout on x64 System V (Linux/*BSD)}
\end{figure}
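The stack argument offsets, the red zone boundary and the entry alignment can be expressed as a hypothetical C sketch. The names are illustrative only, and offsets are relative to rsp on callee entry, before the prolog.

```c
#include <stddef.h>
#include <stdint.h>

#define SYSV_RED_ZONE 128   /* bytes below rsp reserved for the current function */

/* Offset from rsp on callee entry (the return address is at [rsp]) of the
   k-th stack-passed qword; k = 0 is the leftmost argument that did not fit
   into registers ("arg 6" in the figure if all arguments are integers). */
static size_t sysv_stack_arg_at_entry(size_t k)
{
    return 8 + 8 * k;
}

/* Nonzero if the access [rsp+off, rsp+off+len) lies entirely within the
   red zone, i.e. below rsp but no more than 128 bytes below it. */
static int sysv_in_red_zone(long off, size_t len)
{
    return off >= -SYSV_RED_ZONE && off + (long)len <= 0;
}

/* rsp is 16-byte aligned at the call, so it is 8 mod 16 on callee entry. */
static int sysv_entry_rsp_ok(uintptr_t rsp)
{
    return rsp % 16 == 8;
}
```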


\clearpage

\subsubsection{System V syscalls}

\paragraph{Parameter passing}

\begin{itemize}
\item syscall is issued via the {\em syscall} instruction
\item kernel destroys registers rcx and r11
\item syscall number is set in rax
\item params are passed in the following registers in this order: rdi, rsi, rdx, r10, r8, r9 (r10 is used instead of rcx, as the {\em syscall} instruction overwrites rcx)
\item no stack is used, meaning syscalls are in theory limited to six arguments
\item register rax holds the return value (values between -4095 and -1 indicate errors)
\end{itemize}
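A minimal sketch of issuing a syscall following these rules, assuming x86-64 Linux and GCC/Clang extended inline assembly. The helper names are hypothetical; on other targets the sketch falls back to the C library's \texttt{syscall()} wrapper and maps its result back to the raw convention.

```c
#define _GNU_SOURCE          /* for the syscall() declaration in the fallback */
#include <errno.h>
#include <sys/syscall.h>     /* SYS_getpid, SYS_write */
#include <unistd.h>

/* Issue a syscall with up to 3 arguments and return the raw kernel result.
   A 4th argument would have to go into r10 (e.g. via a register variable),
   not into rcx. */
static long raw_syscall3(long nr, long a0, long a1, long a2)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                           /* rax: return value */
                      : "a"(nr), "D"(a0), "S"(a1), "d"(a2)  /* rax, rdi, rsi, rdx */
                      : "rcx", "r11", "memory");            /* destroyed by the kernel */
    return ret;
#else
    /* Other targets: libc wrapper, mapped back to the raw -errno convention. */
    long r = syscall(nr, a0, a1, a2);
    return r == -1 ? -(long)errno : r;
#endif
}

/* Raw results in [-4095, -1] are negated errno values. */
static int is_syscall_error(long r)
{
    return r >= -4095 && r <= -1;
}
```

Writing to the invalid file descriptor -1, for instance, yields the raw value -EBADF rather than -1 with errno set.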