comparison R/rdyncall/vignettes/FLI.Rnw @ 0:0cfcc391201f

initial from svn dyncall-1745
author Daniel Adler
date Thu, 19 Mar 2015 22:26:28 +0100
1 \documentclass[11pt]{article}
2 \usepackage[round]{natbib}
3 \usepackage{hyperref}
4 \usepackage{amsmath}
5 \usepackage{fancyvrb}
6 \usepackage{verbatim}
7 \usepackage{alltt,graphicx}
8 \usepackage{fullpage}
9 \bibliographystyle{abbrvnat}
10 \newcommand{\file}[1]{{`\normalfont\textsf{#1}'}}
11 \newcommand{\strong}[1]{\texorpdfstring%
12 {{\normalfont\fontseries{b}\selectfont #1}}%
13 {#1}}
14 \let\pkg=\strong
15 \newcommand\code{\bgroup\@codex}
16 \def\@codex#1{\texorpdfstring%
17 {{\normalfont\ttfamily\hyphenchar\font=-1 #1}}%
18 {#1}\egroup}
19 \newenvironment{smallverbatim}{\small\verbatim}{\endverbatim}
20 \newenvironment{example}{\begin{alltt}}{\end{alltt}}
21 \newenvironment{smallexample}{\begin{alltt}\small}{\end{alltt}}
22
23 \begin{document}
24
25
26 \title{Foreign Library Interface}
27 %\VignetteIndexEntry{Foreign Library Interface}
28 \author{by Daniel Adler}
29 \maketitle
30 \abstract{
31 We present an improved Foreign Function Interface (FFI) for R to
32 call arbitrary native functions without the need for C wrapper code.
33 Further we discuss a dynamic linkage
34 framework for binding standard C libraries to R across platforms using a
35 universal type information format.
36 The package \pkg{rdyncall} comprises the framework
37 and an initial repository of cross-platform bindings for standard libraries such as
38 (legacy and modern) \emph{OpenGL}, the family of \emph{SDL} libraries and \emph{Expat}.
39 The package enables system-level programming using the R language;
40 sample applications are given in the article.
41 We outline the underlying automation tool-chain that extracts
42 cross-platform bindings from C headers, making the
43 repository extendable and open for library developers.
44 }
45 \section{Introduction}
46
47 \begin{table*}
48 \centering
50 \begin{tabular}{l|l|c|c|c}
51 lib/dynport & description & functions & constants & aggregate types \\
52 \hline
53 \code{gl} & opengl & 337 & 3253 & - \\
54 \code{glu} & opengl utility & 59 & 154 & - \\
55 \code{r} & r library & 238 & 700 & 27 \\
56 \code{sdl} & audio/video/ui abstraction & 203 & 465 & 51 \\
57 \code{sdl\_image} & pixel format loaders & 29 & - & - \\
58 \code{sdl\_mixer} & music format loaders and playing & 63 & 12 & - \\
59 \code{sdl\_ttf} & font format loaders & 35 & 9 & - \\
60 \code{cuda} & gpu programming & 387 & 665 & 84 \\
61 \code{expat} & xml parsing framework & 65 & 70 & - \\
62 \code{glew} & gl extensions & 1465 & - & - \\
63 \code{gl3} & opengl 3 (strict) & 324 & 838 & 1 \\
64 \code{opencl} & gpu programming & 78 & 260 & 10 \\
65 \code{stdio} & standard i/o & 76 & 3 & - \\
66 \end{tabular}
67 \caption{\label{tab:libs} Overview of available dynports for portable C libraries}
68 \end{table*}
69
70 We present an improved Foreign Function Interface (FFI) for R that
71 significantly reduces the amount of C wrapper code needed to interface with C.
72 We also introduce a \emph{dynamic} linkage that binds the C
73 interface of a pre-compiled library (\emph{as a whole}) to an
74 interpreted programming environment \citep{Oust97a} such as R, hence the name
75 \emph{Foreign Library Interface}. Table 1 gives a list
76 of the C libraries currently supported across major R platforms.
77 For each library supported, abstract interface specifications are declared
78 in a compact platform-neutral text-based format stored in so-called
79 \emph{DynPort} files on a local repository.
80
81 %between high-level interpreted programming environments
82 %and native pre-compiled C libraries that uses a compact text-based
83 %interface and type information format that makes this method work across platforms.
84
85 R \citep{R:Ihaka+Gentleman:1996} was chosen as the first language
86 for a proof-of-concept implementation of this approach.
87 This article describes the \pkg{rdyncall} package which
88 implements a complete toolkit of low-level facilities that can be used as an
89 alternative FFI to interface with the C programming language.
90 Furthermore, it enables direct and quick access to
91 the common C libraries from R without compilation.
92
93 The project was motivated by the fact that
94 high-quality software solutions implemented in portable C
95 are often not available in interpreter-based languages such as R.
96 The pool of freely available C libraries is quite large and
97 represents an invaluable resource for software development.
98 For example, OpenGL \citep{Board05} is the most portable and standard interface to
99 accelerated graphics hardware for developing real-time graphics software.
100 The combination of OpenGL with the \emph{Simple DirectMedia Layer} (SDL) \citep{SDL}
101 core and extension libraries offers a foundation framework for
102 developing interactive multimedia applications that can run on a
103 multitude of platforms.
104 Other libraries such as the Expat XML Parser \citep{www:expat} provide a parser framework
105 for processing very large XML documents.
106 And even the C library of R contains high-quality statistical
107 functions that are useful in context of other languages as well.
108
109 To make use of these libraries within high-level languages, \emph{language bindings}
110 to the library must be written as an extension to the language, a task that
111 requires deep familiarity with the internals of both the library and the interpreter.
112 Depending on the complexity of the library, the amount of work needed to wrap
113 the interface can be very large (Table \ref{tab:libs} gives the counts of
114 functions, constants and types that need to be wrapped).
115 Rather than having to write a separate binding for each \emph{library and language}
116 combination, we research a dynamic binding approach that
117 is adaptable to interpreters and works cross-platform without additional
118 compilation of wrapper layers.
119 Once the binding specification for a library has been written, that
120 library becomes automatically accessible to all interpreters that
121 implement a framework such as the one outlined here.
122 Extension techniques offered by the language interpreter, such as a
123 \emph{Foreign Function Interface} (FFI), are the fundamental technology
124 for bridging the dynamic interpreter with statically pre-compiled code.
125
126 In the case of R the built-in FFI function \code{.C} provides a fairly
127 basic call gate to C code with strong limitations; additional wrapper code has
128 to be written to interface with standard C libraries.
129 \pkg{rdyncall} contributes an improved FFI for R that offers a \emph{flexible}
130 and \emph{type-safe} interface with support for almost all C types without
131 requiring additional C wrappers.
132
133 Based on this FFI, the package contains a proof-of-concept implementation of a \emph{Foreign Library Interface} that enables
134 \emph{direct} and \emph{dynamic} interoperability with foreign C Libraries
135 (including shared library code and the Application Programming Interface
136 specified in C headers) from within the R interpreter.
137 For each C library supported, abstract interface specifications are declared in a
138 compact platform-neutral text-based format stored in a so-called \emph{DynPort} file
139 located in a local repository within the package.
140 Table \ref{tab:libs} gives a sample list of available bindings that come with the package.
141
142 Users gain access to C libraries from R using the front-end function \code{dynport(}\emph{portname}\code{)},
143 which processes a \emph{DynPort} file to load the C library\footnote{Pre-compiled libraries need to be installed; OS-specific installation notes are given in the documentation of the package.},
144 and wrap the C interface as a newly attached R environment
145 \footnote{Note that \pkg{rdyncall} versions 0.7.4 and below use R name space objects \citep{RNameSpace} as dynport containers. This has changed starting with version 0.7.5 due to restrictions for packages hosted on CRAN not to use internal functions. Since there is no public interface for the creation of name space objects currently in R, \pkg{rdyncall} uses ordinary environment objects for now.
146 This disables the use of the double colon operator (\code{::}) to refer to dynport objects; unloading is done using \code{detach(dynport:<PORTNAME>)}.}
147 that uses the same symbolic names of the C API.
148 R code that uses C interfaces via \emph{DynPort}s might look very familiar to C user code.
149
150 This article motivates the topic with a comparison of the built-in and
151 contributed FFI by means of a simple use case. This leads to a detailed description of the improved FFI.
152 Then follows an overview of the package and a brief tour through the framework
153 with details on the handling of foreign C data types and wrapping R functions as callbacks.
154 Two sample applications are given using OpenGL, SDL and Expat.
155 The article ends with a brief description of the implementation based on C libraries from the \emph{DynCall} project \citep{dyncall}
156 and the tool-chain that was used to create the repository of \emph{DynPort} files.
157
158 \section{Foreign Function Interfaces}
159
160 FFIs provide the backbone of a language to interface with foreign code.
161 Depending on the design of this service,
162 it can largely unburden developers from writing additional wrapper code.
163 In this section, we compare the built-in FFI with the improved
164 FFI provided by \pkg{rdyncall} using a simple example that sketches
165 the different work flow paths for making an R binding to a function
166 from a foreign C library.
167
168 \subsection{FFI of base R}
169
170 Suppose that we wish to invoke the C function \code{sqrt} of the
171 C Standard Math library. The function is declared as follows in C:
172 \begin{verbatim}
173 double sqrt(double x);
174 \end{verbatim}
175
176 R offers a number of functions to call pre-compiled code from
177 within the R interpreter. While \code{.Call} and \code{.External}
178 are designed for interoperability with \emph{extension} code, \code{.C}
179 and \code{.Fortran} seem to offer the most low-level interoperability with
180 \emph{foreign} code.
181 However, \code{.C} also has very strict conversion rules and strong limitations
182 regarding argument and return-types:
183 \code{.C} passes R arguments as C pointers and
184 C return types are not supported, so only C \code{void} functions,
185 which are procedures, can be called.
186 Given these limitations, we are not able to invoke the foreign
187 \code{sqrt} function directly and need some intermediate wrapper code
188 written in C that obeys the rules of the \code{.C} interface:
189
190 \begin{smallverbatim}
191 #include <math.h>
192 void R_C_sqrt(double * ptr_to_x)
193 {
194 double x = ptr_to_x[0], ans;
195 ans = sqrt(x);
196 ptr_to_x[0] = ans;
197 }
198 \end{smallverbatim}
199
200
201 We assume that the wrapper code is deployed as a shared library
202 in a package named \emph{testsqrt} which links to the C math library.%
203 \footnote{We omit here details such as registering C functions, which are
204 described in detail in the R manual `\emph{Writing R Extensions}' \citep{RExt}.}
205 Then we load the \emph{testsqrt} package and call the C wrapper function directly
206 via \code{.C}.
207
208 \begin{example}
209 > library(testsqrt)
210 > .C("R_C_sqrt", 144, PACKAGE="testsqrt")
211 [[1]]
212 [1] 12
213 \end{example}
214
215 To make \code{sqrt} available as a public function, an additional
216 R wrapper layer is added that performs type-safety checks before
217 issuing the \code{.C} call.
218
219 \begin{smallverbatim}
220 sqrtViaC <- function(x)
221 {
222 x <- as.numeric(x) # type(x) should be C double.
223 # make sure length > 0:
224 length(x) <- max(1, length(x))
225 .C("R_C_sqrt", x, PACKAGE="testsqrt")[[1]] # .C returns a list
226 }
227 \end{smallverbatim}
228
229 As an alternative, R also provides high-level C extension interfaces
230 such as \code{.Call} and \code{.External}, that give access to R internals
231 at C level and make it possible to perform type-safety checks within C:
232
233 \begin{smallverbatim}
234 #include <R.h>
235 #include <Rinternals.h>
236 #include <math.h>
237 SEXP R_Call_sqrt(SEXP x)
238 {
239 SEXP ans = R_NilValue, tmp;
240 PROTECT( tmp = coerceVector(x, REALSXP) );
241 if (LENGTH(tmp) > 0) {
242 double y = REAL(tmp)[0], result;
243 result = sqrt(y);
244 ans = ScalarReal(result);
245 }
246 UNPROTECT(1);
247 return ans;
248 }
249 \end{smallverbatim}
250
251 Now the corresponding R wrapper shrinks into a simple delegate:
252
253 \begin{example}
254 > sqrtViaCall <- function(x)
255 + .Call("R_Call_sqrt", x, PACKAGE="testsqrt")
256 \end{example}
257
258 The third alternative, via \code{.External}, is omitted here;
259 it has a different argument passing scheme, but the C and R wrapper
260 implementations would look very similar.
261
262 We can conclude that - in realistic settings - the built-in FFI of R
263 almost always needs support by a wrapper layer written in C.
264 The ``foreign'' in FFI is in fact relegated to the C wrapper layer.
265
266 Moreover the R FFI can be viewed as an \emph{extension} interface for
267 calling pre-compiled code written in a \emph{foreign} language within
268 the context of the R implementation, rather than a direct invocation
269 interface for code from a \emph{foreign} context such as an
270 ordinary C library.
271
272 \subsection{FFI of rdyncall}
273
274 \begin{table*}
275 \begin{center}
276 \begin{tabular}{ll|ll}
277 \hline \hline
278 Type& Sign. & Type & Sign. \\
279 \hline
280 \verb@void@ & \verb@v@ & \verb@bool@ & \verb@B@ \\
281 \verb@char@ & \verb@c@ & \verb@unsigned char@ & \verb@C@ \\
282 \verb@short@ & \verb@s@ & \verb@unsigned short@ & \verb@S@ \\
283 \verb@int@ & \verb@i@ & \verb@unsigned int@ & \verb@I@ \\
284 \verb@long@ & \verb@j@ & \verb@unsigned long@ & \verb@J@ \\
285 \verb@long long@ & \verb@l@ & \verb@unsigned long long@ & \verb@L@ \\
286 \verb@float@ & \verb@f@ & \verb@double@ & \verb@d@ \\
287 \verb@void*@ & \verb@p@ & \verb@struct@ \emph{name} \verb@*@ & \verb@*<@\emph{name}\verb@>@ \\
288 \emph{type}\verb@*@ & \verb@*@... & \verb@const char*@ & \verb@Z@ \\
289 \hline \hline
290 \end{tabular}
291 \end{center}
292 \caption{\label{tab:signature} C/C++ Types and Signatures}
293 \end{table*}
294
295 \pkg{rdyncall} provides an improved FFI for R
296 that is accessible via the function \code{.dyncall}.
297 In contrast to the built-in R FFI which uses a C wrapper layer,
298 the \code{sqrt} function is invoked dynamically and directly
299 by the interpreter at run-time.
300 Whereas the C math library was loaded implicitly via the
301 example package, it now has to be loaded explicitly.
302
303 R offers functions to deal with shared libraries at run-time,
304 but the location has to be specified as an absolute pathname which
305 is platform-specific.
306 For now, let us assume that the example is done on
307 Mac OS X where the C math library is located
308 at \file{/usr/lib/libm.dylib}. A platform-portable solution
309 is discussed in the next section on \emph{Portable loading of shared libraries}.
310
311 \begin{example}
312 > libm <- dyn.load("/usr/lib/libm.dylib")
313 > sqrtAddr <- libm$sqrt$address
314 \end{example}
315
316 We first need to load the R package \pkg{rdyncall}:
317
318 \begin{example}
319 > library(rdyncall)
320 \end{example}
321
322 Finally, we invoke the foreign C function \code{sqrt} \emph{directly} via
323 \code{.dyncall}:
324
325 \begin{example}
326 > .dyncall(sqrtAddr, "d)d", 144)
327 [1] 12
328 \end{example}
329
330 Let us review the last call, as it pinpoints the core solution for a direct
331 invocation of foreign code within R:
332 The first argument specifies the address of the foreign code, given as an
333 external pointer.
334 The second argument is a \emph{call signature}
335 that specifies the argument- and return types of the target C function.
336 This string \verb@"d)d"@ specifies that the foreign function
337 expects a \code{double} scalar argument and returns a \code{double} scalar value
338 in correspondence to the C declaration of \code{sqrt}.
339 Arguments following the call signature are passed to the
340 foreign function using the call signature for type-safe conversion to C types.
341 In this case we pass \code{144} as a C \code{double} argument type as first
342 argument and receive a C \code{double} value converted to an R \code{numeric}.
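
The role of the call signature can be illustrated at the C level. The
following is a minimal sketch, not the implementation used by the package:
it treats a code address as an untyped function pointer and uses the
signature string to decide how the address may be called. The names
\verb@generic_fn@, \verb@square@ and \verb@call_d_d@ are hypothetical, and only
the single signature \verb@"d)d"@ is handled by a fixed cast, whereas a real
dynamic FFI has to construct calls for arbitrary signatures at run-time.

\begin{smallverbatim}
#include <stdio.h>
#include <string.h>

typedef void (*generic_fn)(void);   /* untyped code address */

/* Hypothetical target function with C type double(double). */
double square(double x) { return x * x; }

/* Call fn as described by sig; only "d)d" is supported in this sketch.
   Returns 0 on success and -1 if the signature is not supported. */
int call_d_d(generic_fn fn, const char *sig, double arg, double *result)
{
    if (strcmp(sig, "d)d") != 0)
        return -1;                  /* refuse the call instead of guessing */
    *result = ((double (*)(double))fn)(arg);
    return 0;
}

int main(void)
{
    double r = 0.0;
    if (call_d_d((generic_fn)square, "d)d", 12.0, &r) == 0)
        printf("%g\n", r);          /* prints 144 */
    return 0;
}
\end{smallverbatim}

Without the signature the address alone carries no information about
argument or return types, which is why the check precedes the call.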
343
344 \subsection{Call Signatures}
345
346 The introduction of a type descriptor for foreign functions is a key
347 component that makes the FFI flexible and type-safe.
348 The format of the call signature has the following pattern:
349
350 \begin{center}
351 \emph{argument-types} \verb@')'@ \emph{return-type}
352 \end{center}
353
354 The signature can be derived from the C function declaration:
355 Argument types are specified first, in a left-to-right order, and are
356 terminated by the \verb@')'@ symbol followed by a single return type signature.
357
358 Almost all fundamental C types are supported and there is no real
359 restriction regarding the number of arguments supported to issue
360 a call.
361 Table \ref{tab:signature} gives an overview of supported C types and
362 the corresponding text encoding; Table \ref{tab:signature_examples}
363 provides some examples of C functions and call signatures.
364
365 \begin{table*}
366 \centering
367 \begin{tabular}{l|l}
368 C function declaration & dyncall type signature \\
369 \hline
370 \verb@void rsort_with_index(double*,int*,int n)@ & \verb@*d*ii)v@ \\
371 \verb@SDL_Surface * SDL_SetVideoMode(int,int,int,Uint32)@ & \verb@iiiI)*<SDL_Surface>@ \\
372 \verb@void glClearColor(GLfloat,GLfloat,GLfloat,GLfloat)@ & \verb@ffff)v@ \\
373 \end{tabular}
374 \caption{\label{tab:signature_examples}
375 Some examples of C functions and corresponding signatures}
376 \end{table*}
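
To make the signature format concrete, the following C sketch checks a call
signature and counts its arguments. It is an illustration under stated
assumptions rather than the parser of \pkg{rdyncall}: the function names
\verb@skip_type@ and \verb@count_args@ are hypothetical, and only the type
codes of Table \ref{tab:signature} are accepted.

\begin{smallverbatim}
#include <stdio.h>
#include <string.h>

/* Skip one type code at s; return a pointer past it or NULL on error. */
const char *skip_type(const char *s)
{
    while (*s == '*') {             /* pointer prefix */
        s++;
        if (*s == '<') {            /* typed pointer: *<name> */
            const char *end = strchr(s, '>');
            return end ? end + 1 : NULL;
        }
    }
    if (*s == '\0' || *s == ')')
        return NULL;                /* a type code is missing */
    return strchr("vBcCsSiIjJlLfdpZ", *s) ? s + 1 : NULL;
}

/* Return the number of arguments in sig, or -1 if sig is malformed. */
int count_args(const char *sig)
{
    int n = 0;
    while (*sig != ')') {
        sig = skip_type(sig);
        if (sig == NULL)
            return -1;
        n++;
    }
    sig = skip_type(sig + 1);       /* exactly one return type follows ')' */
    return (sig != NULL && *sig == '\0') ? n : -1;
}

int main(void)
{
    printf("%d\n", count_args("d)d"));                  /* prints 1 */
    printf("%d\n", count_args("*d*ii)v"));              /* prints 3 */
    printf("%d\n", count_args("iiiI)*<SDL_Surface>"));  /* prints 4 */
    return 0;
}
\end{smallverbatim}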
377
378 Now, let us define a public and type-safe R wrapper function that
379 hides the details of the foreign function call by passing the formal
380 argument placeholder \code{...} as third argument to \code{.dyncall}:
381
382 \begin{example}
383 > sqrtViaDyncall <- function(...)
384 + .dyncall(sqrtAddress, "d)d", ...)
385 \end{example}
386
387 Although there is no further guard code, this interface is type-safe and
388 the user can do no harm by inadvertently using a wrong set and/or type
389 of arguments due to the built-in type-checks.
390 Compared to the R wrapper code using \code{.C}, no explicit cast of the
391 arguments via \code{as.numeric} is required, because
392 automatic coercion rules for fundamental types are implemented as dictated
393 by the call signature. For example, R \code{integer} values are
394 implicitly cast to \code{double}:
395
396 \begin{smallverbatim}
397 > sqrtViaDyncall(144L)
398 [1] 12
399 \end{smallverbatim}
400
401 A certain level of type-safety is achieved here as well:
402 All arguments to be passed to C are first checked against the call signature.
403 If any incompatibility is detected, such as a wrong number of arguments,
404 empty atomic vectors or incompatible type mappings, the invocation is aborted
405 and an error is reported before risking an application crash:
406
407 \begin{smallverbatim}
408 > sqrtViaDyncall(1,2)
409 Error in .dyncall(sqrtAddress, "d)d", ...) :
410 Too many arguments for signature 'd)d'.
411 > sqrtViaDyncall()
412 Error in .dyncall(sqrtAddress, "d)d", ...) :
413 Not enough arguments
414 for function-call signature 'd)d'.
415 > sqrtViaDyncall(NULL)
416 Error in .dyncall(sqrtAddress, "d)d", ...) :
417 Argument type mismatch at position 1:
418 expected double convertible value
419 > sqrtViaDyncall("144")
420 Error in .dyncall(sqrtAddress, "d)d", ...) :
421 Argument type mismatch at position 1:
422 expected double convertible value
423 \end{smallverbatim}
424
425 In contrast to the R FFI, where the argument conversion is
426 dictated solely by the R argument type at call-time in a one-way fashion,
427 the introduction of an additional specification with a call signature gives
428 several advantages.
429
430 \begin{itemize}
431 \item Almost all possible C functions can be invoked by a single interface;
432 no additional C wrapper is required.
433 \item The built-in type-safety checks of passed arguments enhance stability
434 and reduce assertion code in R wrappers significantly.
435 \item A single call signature can work across platforms,
436 given that the C function type remains constant across platforms.
437 \item Given that our FFI is implemented in multiple languages,
438 call signatures represent a portable type description for C libraries.
439 \end{itemize}
440
441 \section{Package Overview}
442
443 Besides dynamic calling of foreign code, the package provides essential
444 facilities for interoperability between the R and C programming languages.
445 A high-level overview of components that make up the
446 package is given in Figure \ref{fig:pkg_overview}.
447
448 \begin{figure}[h]
449 \centering
450 \includegraphics[scale=0.44]{img_overview.pdf}
451 \caption{\label{fig:pkg_overview}
452 Package Overview}
453 \end{figure}
454
455 We already described the \code{.dyncall} FFI. What follows is a
456 brief description of portable loading of
457 shared libraries using \code{dynfind}, installation of wrappers via \code{dynbind},
458 handling of foreign data types via \code{new.struct} and wrapping of R functions as C callbacks via \code{new.callback}.
459 Finally the high-level \code{dynport} interface for accessing \emph{whole} C libraries is briefly discussed.
460 The low-level technical details of some components are described briefly in the
461 section \emph{Architecture}.
462
463 \subsection{Portable loading of shared libraries}
464
465 The \emph{portable} loading of shared libraries across platforms is not
466 trivial because file paths differ between operating systems (OSs).
467 Referring back to the previous example, to load a particular library
468 in a portable fashion, one would have to check the platform to
469 locate the C library.\footnote{Possible C math library names are \file{libm.so}, \file{libm.so.6} and \file{MSVCRT.DLL}
470 in locations such as \file{/lib}, \file{/usr/lib}, \file{/lib64}, \file{/lib/sparcv9}, \file{/usr/lib64}, \file{C:\textbackslash WINDOWS\textbackslash SYSTEM32} etc.}
471
472 Although there is variation among the OSs, library file paths and
473 search patterns have common structures.
474 For example, among
475 all the different locations, prefixes and suffixes, there is a part within
476 a full library filename that can be taken as a \emph{short library name} or
477 label.
478
479 The function \code{dynfind} takes a list of short library names to
480 locate a library using common search heuristics.
481 For example, to load the Standard C Math library, one would either use
482 the Microsoft Visual C Run-Time library labeled \file{msvcrt} on Windows
483 or the C Math library labeled \file{m} or \file{m.so.6} otherwise.
484
485 \begin{example}
486 > mLib <- dynfind(c("msvcrt","m","m.so.6"))
487 \end{example}
488
489 \code{dynfind} also supports more exotic schemes, such as the Mac OS X Framework folders.
490 Depending on the library,
491 it is sometimes enough to have a single short filename - e.g. \code{"expat"} for
492 the \emph{Expat} library.
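
The idea of a short library name can be sketched in C as follows. The sketch
only builds candidate file names from a short name; the prefix and suffix
patterns are assumptions chosen for illustration and are not the actual
search rules of \code{dynfind}, and \verb@candidate_name@ is a hypothetical
helper. A complete implementation would hand each candidate to the dynamic
linker of the OS until one of them loads.

\begin{smallverbatim}
#include <stdio.h>
#include <string.h>

/* Assumed naming patterns: prefix + short name + suffix. */
struct pattern { const char *prefix; const char *suffix; };

static const struct pattern patterns[] = {
    { "lib", ".so"    },   /* ELF systems such as Linux      */
    { "lib", ".dylib" },   /* Mac OS X                       */
    { "lib", ""       },   /* short name has its own suffix  */
    { "",    ".dll"   }    /* Windows                        */
};
enum { NPATTERNS = sizeof patterns / sizeof patterns[0] };

/* Write candidate i for shortname into buf; return 0 on success. */
int candidate_name(const char *shortname, int i, char *buf, size_t size)
{
    int n;
    if (i < 0 || i >= NPATTERNS || strlen(shortname) == 0)
        return -1;
    n = snprintf(buf, size, "%s%s%s",
                 patterns[i].prefix, shortname, patterns[i].suffix);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

int main(void)
{
    const char *names[] = { "msvcrt", "m", "m.so.6" };
    char buf[64];
    size_t k;
    int i;
    for (k = 0; k < sizeof names / sizeof names[0]; k++)
        for (i = 0; i < NPATTERNS; i++)
            if (candidate_name(names[k], i, buf, sizeof buf) == 0)
                printf("%s\n", buf);
    return 0;
}
\end{smallverbatim}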
493
494 Internally, the dynamic linker interface of the OS is used via
495 \code{.dynload} and symbols get resolved via \code{.dynsym}:
496
497 \begin{example}
498 > sqrtAddr <- .dynsym(mLib, "sqrt")
499 \end{example}
500
501 Although R already contains support for loading shared libraries
502 and resolving of symbols, several issues have led to a reimplementation
503 of this part:
504
505 \begin{itemize}
506 \item System paths are not considered when loading libraries via
507 \code{dyn.load} of the package \pkg{base} but this is one part of the
508 search heuristics.
509 \item Automatic life-cycle management for loading and unloading of libraries
510 is a desired goal. Unloading of libraries should be done automatically
511 via finalizer code when no symbols are used anymore. External pointers
512 resolved via \code{.dynsym} hold a reference to the loaded library.
513 When all external pointers are garbage collected, the library handle is
514 not referenced anymore and the finalizer can unload the library.
515 \end{itemize}
516
517 \subsection{Wrapping C libraries}
518
519 Functional R interfaces to foreign code can be defined with small
520 R wrapper functions, which effectively delegate to \code{.dyncall}.
521 Each function interface is parameterized by a target address and
522 a matching call signature.
523
524 Since APIs often consist of hundreds of functions (see Table \ref{tab:libs}),
525 \code{dynbind} can create and install a batch of function wrappers for a library
526 with a single call by using a \emph{library signature} that
527 consists of concatenated function names and signatures separated by semicolons.
528
529 For example, to install wrappers to the C functions
530 \code{sqrt}, \code{sin} and \code{cos} from the math library, one
531 could use:
532
533 \begin{example}
534 > dynbind( c("msvcrt","m","m.so.6"),
535 + "sqrt(d)d;sin(d)d;cos(d)d;" )
536 \end{example}
537
538 The function call has the side-effect that three R wrapper functions are
539 created and stored in an environment which defaults to the global environment.
540 Let us review the \code{sin} wrapper (on the 64-bit Version of R running
541 on Mac OS X 10.6):
542 \begin{example}
543 > sin
544 function (...)
545 .dyncall.default(<pointer: 0x7fff81fd13f0>,
546 "d)d", ...)
547 \end{example}
548
549 The wrapper directly uses the address of the resolved \code{sin} symbol.
550 In addition, the wrapper uses \code{.dyncall.default}, which is a
551 concrete selector of a particular calling convention, as outlined below.
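
The structure of a library signature can be made explicit with a small C
sketch that splits it into function names and call signatures. This is an
illustration only and not the parser used by \code{dynbind}: the function
\verb@parse_library_signature@ and its visitor argument are hypothetical, and
white space between entries is not handled.

\begin{smallverbatim}
#include <stdio.h>
#include <string.h>

/* Visit each "name(signature;" entry of libsig.
   Returns the number of entries or -1 if libsig is malformed. */
int parse_library_signature(const char *libsig,
        void (*visit)(const char *name, const char *sig))
{
    char entry[128];
    int count = 0;
    while (*libsig != '\0') {
        const char *semi = strchr(libsig, ';');
        size_t len;
        char *paren;
        if (semi == NULL)
            return -1;              /* entry is not terminated by ';' */
        len = (size_t)(semi - libsig);
        if (len >= sizeof entry)
            return -1;
        memcpy(entry, libsig, len);
        entry[len] = '\0';
        paren = strchr(entry, '(');
        if (paren == NULL || paren == entry)
            return -1;              /* function name is missing */
        *paren = '\0';              /* split name from call signature */
        if (visit != NULL)
            visit(entry, paren + 1);
        count++;
        libsig = semi + 1;
    }
    return count;
}

void print_entry(const char *name, const char *sig)
{
    printf("%s: %s\n", name, sig);
}

int main(void)
{
    int n = parse_library_signature("sqrt(d)d;sin(d)d;cos(d)d;",
                                    print_entry);
    printf("%d wrappers\n", n);     /* prints 3 wrappers */
    return 0;
}
\end{smallverbatim}

Each visited pair corresponds to one wrapper: the name is resolved to an
address and the remaining text is used as its call signature.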
552
553 \subsection{Calling Conventions}
554
555 Calling conventions specify how arguments and return values are passed
556 across sub-routines and functions at machine level. This information
557 is vital for interfacing with the binary interface of C libraries.
558 The package has support for multiple calling conventions.
559 A non-default calling convention is selected via the named argument
560 \code{callmode} of \code{.dyncall}.
561 Most current OSs and platforms only have support for a single \code{"default"} calling convention
562 at run-time.
563
564 An important exception is the Microsoft Windows platform
565 on the 32-bit \emph{i386} processor architecture:
566 While the default C calling convention on \emph{i386} is \code{"cdecl"} (selected by \code{"default"}),
567 system shared libraries from Microsoft such as \file{KERNEL32.DLL},
568 \file{USER32.DLL} and the OpenGL library \file{OPENGL32.DLL}
569 use the \code{"stdcall"} calling convention.
570 The \code{callmode} argument has an effect only on this platform, where it
571 selects the calling convention to be used;
572 all other platforms currently ignore it.
573
574 \subsection{Handling of C Types in R}
575
576 C APIs often make use of high-level C \verb@struct@
577 and \verb@union@ types for exchanging information.
578 Thus, to make interoperability work at that level the handling of C
579 type information is addressed by the package.
580
581 Let us consider the following hypothetical example:
582 A user-interface library has a function to set the 2D coordinates
583 and dimension of a graphical output window. The coordinates are specified using a C
584 \code{struct Rect} data type and the C function receives a
585 pointer to that object:
586
587 \begin{smallverbatim}
588 void setWindowRect(struct Rect *pRect);
589 \end{smallverbatim}
590
591 The structure type is defined as follows:
592
593 \begin{smallverbatim}
594 struct Rect {
595 short x, y;
596 unsigned short w, h;
597 };
598 \end{smallverbatim}
599
600 Before we can issue a call, we have to allocate an object of that size and
601 initialize the fields with values encoded in C types, which are not
602 part of R data types.
603 The framework provides helper functions and objects to deal with C data types
604 in R. Type information objects can be created with a description of the
605 C aggregate structure.
606 First, we create a type information object in R for the \code{struct Rect}
607 C data type via \code{parseStructInfos} using a \emph{structure type signature}.
608
609 \begin{smallverbatim}
610 > parseStructInfos("Rect{ssSS}x y w h;")
611 \end{smallverbatim}
612
613 After registration, an R object named \code{Rect} is installed, which
614 contains C type information that corresponds to \code{struct Rect}.
615 The format of a \emph{structure type signature} has the following
616 pattern:
617
618 \begin{center}
619 \emph{Struct-name} \verb@'{'@ \emph{Field-types} \verb@'}'@ \emph{Field-names} \verb@';'@
620 \end{center}
621
622 \emph{Field-types} use the same type signature encoding as that of
623 \emph{call signatures} for argument and return types (Table \ref{tab:signature}).
624 \emph{Field-names} consist of a list of white-space separated names,
625 labeling each field component.
626
627 An instance of a C type can be allocated via \code{new.struct}:
628
629 \begin{smallverbatim}
630 > r <- new.struct(Rect)
631 \end{smallverbatim}
632
633 Finally, the extraction (\verb@'$'@, \verb@'['@) and
634 replacement (\verb@'$<-'@, \verb@'[<-'@) operators can be used to access
635 structure fields symbolically. During value transfer between R and C,
636 automatic conversion of values with respect to the underlying C field
637 type takes place.
638
639 \begin{smallverbatim}
640 > r$x <- -10 ; r$y <- -20 ; r$w <- 40 ; r$h <- 30
641 \end{smallverbatim}
642
643 In this example, R \code{numeric} values are converted on the fly to \code{signed}- and
644 \code{unsigned short} integers (usually 16-bit values). When the object gets printed on the prompt,
645 a detailed picture of the data object is given:
646
647 \begin{smallverbatim}
648 > r
649 struct Rect {
650 x: -10
651 y: -20
652 w: 40
653 h: 30
654 }
655 \end{smallverbatim}
656
657 At low-level, one can see that \code{r} is stored as an R \code{raw} vector object:
658
659 \begin{smallverbatim}
660 > r[]
661 [1] f6 ff ec ff 28 00 1e 00
662 attr(,"struct")
663 [1] "Rect"
664 \end{smallverbatim}
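
The raw bytes can be cross-checked with a short C program that fills the same
structure and dumps its memory. This is a sketch under the assumption of a
little-endian platform with 16-bit \code{short} and no padding, in which case
it reports 8 bytes in the same order as the R \code{raw} vector above; the
helper \verb@rect_bytes@ is hypothetical.

\begin{smallverbatim}
#include <stdio.h>
#include <string.h>

struct Rect {
    short x, y;
    unsigned short w, h;
};

/* Format the raw bytes of *r as space-separated hex pairs into out. */
void rect_bytes(const struct Rect *r, char *out, size_t size)
{
    unsigned char raw[sizeof *r];
    size_t i, pos = 0;
    if (size == 0)
        return;
    out[0] = '\0';
    memcpy(raw, r, sizeof raw);     /* copy the object representation */
    for (i = 0; i < sizeof raw && pos + 3 < size; i++)
        pos += (size_t)sprintf(out + pos, i ? " %02x" : "%02x", raw[i]);
}

int main(void)
{
    struct Rect r = { -10, -20, 40, 30 };
    char buf[64];
    rect_bytes(&r, buf, sizeof buf);
    /* byte order and size depend on the platform */
    printf("%u bytes: %s\n", (unsigned)sizeof r, buf);
    return 0;
}
\end{smallverbatim}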
665
666 To follow the example, we issue a foreign function call to \code{setWindowRect}
667 via \code{.dyncall} and pass in the \code{r} object,
668 assuming the library is loaded and the symbol is resolved and
669 stored in an external pointer object named \code{setWindowRectAddr}:
670
671 \begin{smallverbatim}
672 > .dyncall( setWindowRectAddr, "*<Rect>)v", r)
673 \end{smallverbatim}
674
675 We make use of a typed pointer expression \code{'*<Rect>'}
676 instead of the untyped pointer signature \code{'p'}, which would
677 also work but does not prevent users from passing other objects
678 that do not reference a \code{struct Rect} data object.
679 Typed pointer expressions increase type-safety and use the
680 pattern \verb@'*<@\emph{Type-Name}\verb@>'@.
681 The invocation will be rejected if the argument passed in is not
682 of C type \code{Rect}. As \code{r} is tagged with an attribute
683 \code{struct} that refers to \code{Rect}, the call will be issued.
684
685 Typed pointers can also occur as return types that - once the
686 type information is available - permit the manipulation of returned objects
687 in the same symbolic manner as above.
688
689 C \verb@union@ types are supported as well but use the \code{parseUnionInfos}
690 function instead for registration and a slightly different signature format:
691
692 \begin{center}
693 \emph{Union-name} \verb@'|'@ \emph{Field-types} \verb@'}'@ \emph{Field-names} \verb@';'@
694 \end{center}
695
The underlying low-level C type read and write operations and conversions
from R data types are performed by the functions \code{.pack} and
\code{.unpack}. These can be used for various low-level operations as well,
such as dereferencing pointers to pointers.
700
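To give a feel for what reading and writing a C value at a byte offset involves, here is a rough base R analogue. It does not call \code{.pack} or \code{.unpack}; the helpers \code{write\_field} and \code{read\_field} are hypothetical, use zero-based offsets as in C, and assume little-endian 16-bit fields:

```r
# Hypothetical helpers: store and fetch a 16-bit integer at a zero-based
# byte offset inside a raw buffer (little-endian layout assumed).
write_field <- function(buf, offset, value, size = 2L) {
  buf[offset + seq_len(size)] <-
    writeBin(as.integer(value), raw(), size = size, endian = "little")
  buf
}
read_field <- function(buf, offset, size = 2L, signed = TRUE) {
  readBin(buf[offset + seq_len(size)], what = "integer", size = size,
          signed = signed, endian = "little")
}

buf <- raw(8)                       # zero-initialised 8-byte buffer
buf <- write_field(buf, 0L, -10)    # field at offset 0
buf <- write_field(buf, 4L, 40)     # field at offset 4
print(buf)
print(read_field(buf, 0L))
print(read_field(buf, 4L, signed = FALSE))
```
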
701 R objects such as external pointers and atomic raw, integer and numeric
702 vectors can be used as aggregate C types via the attribute \code{struct}.
703 To \emph{cast} a type in the style of C, one can use \code{as.struct}.
704
705 \subsection{Wrapping R functions as C callbacks}
706
707 Some C libraries, such as user-interface toolkits and I/O processing
708 frameworks, use \emph{callbacks} as part of their interface to enable
709 registration and activation of user-supplied event handlers.
710 A callback is a user-defined function that has a library-defined
function type. Callbacks are usually registered via a registration function
712 offered by the library interface and are activated later from within
713 a library run-time context.
714
715 \pkg{rdyncall} has support for wrapping ordinary R
716 functions as C callbacks via the function
717 \code{new.callback}. Callback wrappers are defined by a \emph{callback
718 signature} and the user-supplied R function to be wrapped. \emph{Callback signatures} look very
719 similar to \emph{call signatures} and should match the
720 functional type of the underlying C callback.
721 \code{new.callback} returns an external pointer that can
722 be used as a low-level function pointer for the registration as a C callback.
See Section \emph{Parsing XML using Expat} below for
an application of callbacks.
725
726 \subsection{Foreign Library Interface}
727
At the highest level, \pkg{rdyncall} provides the front-end function
\code{dynport} to dynamically set up an interface to a C Application
Programming Interface. This includes loading the corresponding
shared C library and resolving symbols. During the binding process,
a new R environment (a name space \citep{RNameSpace} up to version 0.7.4) is populated with thin R wrapper
objects that represent abstractions of their C counterparts, such as
functions, pointers to functions, type-information objects for C struct and union
types, and symbolic constant equivalents of C enums and macro defines.
736 The mechanism aims to work across platforms, given that the corresponding
737 shared libraries of a \emph{DynPort} have been installed in a
738 system standard location on the host.
739
740 An initial repository of \emph{DynPorts} is available in the package
741 that provides bindings for several popular C APIs, see Table \ref{tab:libs}
742 for examples of available bindings.
743
744 \section{Sample Applications}
745
746 We give two examples with different application contexts that demonstrate
747 the direct usage of C APIs from within R through the \pkg{rdyncall} package.
748 The R interface to C libraries looks very
749 similar to the actual C API. For details on the usage of a particular
750 C library, the programming manuals and documentation of the libraries
751 should be consulted.
752
Before loading R bindings via \code{dynport}, the shared library should
have been installed on the system. Currently this has to be
done manually, and the installation method depends on the target OS (see the manual
page on `rdyncall-demos' for details).
757 While \emph{OpenGL} is most often pre-installed on typical desktop-systems,
758 \emph{SDL} and \emph{Expat} sometimes have to be installed explicitly.
759
760 \subsection{OpenGL Programming in R}
761
762
In the first example, we make use of the Simple DirectMedia Layer library (SDL)
\citep{SDL,Pendleton:2003:GPS,www:sdl-alternative} and
the Open Graphics Library (OpenGL) \citep{Board05} to implement
a portable multimedia application skeleton in R.
767
768 We first need to load bindings to SDL and OpenGL via dynports:
769
770 \begin{example}
771 > dynport(SDL)
772 > dynport(GL)
773 \end{example}
774
Now we initialize the SDL library, in particular the video subsystem, and
open a window surface of $640 \times 480$ pixels in 32-bit color
depth with support for OpenGL rendering:
778
779 \begin{smallverbatim}
780 > SDL_Init(SDL_INIT_VIDEO)
781 > surface <- SDL_SetVideoMode(640,480,32,SDL_OPENGL)
782 \end{smallverbatim}
783
784 Next, we implement the application loop which updates the display repeatedly
785 and processes the event queue until a \emph{quit} request is
786 issued by the user via the window close button.
787
788 \begin{smallverbatim}
> mainloop <- function()
  {
    ev <- new.struct(SDL_Event)
    quit <- FALSE
    while(!quit) {
      draw()
      while(SDL_PollEvent(ev)) {
        if (ev$type == SDL_QUIT) {
          quit <- TRUE
        }
      }
    }
  }
802 \end{smallverbatim}
803
804 SDL event processing is implemented by collecting events that occur in a
805 queue.
806 Once per update frame, typical SDL applications poll the queue by
807 calling \code{SDL\_PollEvent} with a pointer to a user-allocated buffer
808 of C type \code{union SDL\_Event}.
809 Event records have a common type identifier which is set to \code{SDL\_QUIT}
810 when a quit event has occurred e.g. when users press a close button on a window.
811
812 Next, we implement our \code{draw} function making use of
813 the OpenGL 1.1 API. We clear the background with a blue color
814 and draw a light-green rectangle.
815
816 \begin{smallverbatim}
> draw <- function()
  {
    glClearColor(0,0,1,0)
    glClear(GL_COLOR_BUFFER_BIT)
    glColor3f(0.5,1,0.5)
    glRectf(-0.5,-0.5,0.5,0.5)
    SDL_GL_SwapBuffers()
  }
825 \end{smallverbatim}
826
827 Now we can run the application mainloop.
828
829 \begin{smallverbatim}
830 > mainloop()
831 \end{smallverbatim}
832
To stop the application, we hit the close button of the window.
A similar example is also available via \code{demo(SDL)}. Here the \code{draw} function
displays a rotating 3D cube, depicted in Figure \ref{fig:demo_SDL}.
836
837 \begin{figure}
838 \centering
839 \includegraphics[scale=0.35]{img_SDL.png}
840 \caption{\label{fig:demo_SDL}
841 \code{demo(SDL)}}
842 \end{figure}
843
\code{demo(randomfield)} gives a slightly more scientific application of OpenGL and R:
random fields of size $512 \times 512$ are generated via blending of 5000 texture-mapped 2D Gaussian kernels.
The \emph{frames per second} counter in the window title gives the number of matrices generated per second (see Figure \ref{fig:demo_randomfield}).
When clicking on the animation window, the current frame and matrix are passed to R and plotted.
While several dozen matrices are computed per second using OpenGL,
it takes several seconds to plot a single matrix in R using \code{image()}.
850
851 \begin{figure}
852 \centering
853 \includegraphics[scale=0.35]{img_randomfield.png}
854 \caption{\label{fig:demo_randomfield}
855 \code{demo(randomfield)}}
856 \end{figure}
857
858 \subsection{Parsing XML using Expat}
859
In the second example, we use the Expat XML Parser library
\citep{www:expat,Kim:2001:TSJ} to implement a stream-oriented XML parser suitable
for very large documents.
863
Expat is widely used and is likely to be installed already
on many OS distributions; otherwise it is
available from package repositories or can be built as a shared library
from source.
868
869 In Expat, custom XML parsers are implemented by defining
870 functions that are registered as callbacks to be invoked on
871 events that occur during parsing, such as the start and end of XML tags.
872 In our second example, we create a simple parser skeleton that
873 prints the start and end tag names.
874
875 First we load R bindings for Expat via \code{dynport}.
876
877 \begin{smallverbatim}
878 > dynport(expat)
879 \end{smallverbatim}
880
881 Next we create an abstract parser object via the C function
882 \code{XML\_ParserCreate} that receives one argument of type C string
883 to specify a desired character encoding that overrides the document
encoding declaration. We want to pass a null pointer (\code{NULL}) here.
In the \code{.dyncall} FFI, C null pointer values for pointer types are
expressed via the R \code{NULL} value:
887
888 \begin{smallverbatim}
889 > p <- XML_ParserCreate(NULL)
890 \end{smallverbatim}
891
892 The C interface for registration of start and end-tag event handler
893 callbacks is given below:
894
895 \begin{smallverbatim}
896 /* Language C, from file expat.h: */
897 typedef void (*XML_StartElementHandler)
898 (void *userData, const XML_Char *name,
899 const XML_Char **atts);
900 typedef void (*XML_EndElementHandler)
901 (void *userData, const XML_Char *name);
902 void XML_SetElementHandler(XML_Parser parser,
903 XML_StartElementHandler start,
904 XML_EndElementHandler end);
905 \end{smallverbatim}
906
907 We implement the callbacks as R functions which print the event and
908 tag name. They are wrapped as C callback pointers via \code{new.callback}
909 using a matching \emph{callback signature}.
The second argument \code{name}, of type C string in both callbacks (\code{XML\_StartElementHandler} and \code{XML\_EndElementHandler}),
is of primary interest; this argument carries the XML tag name.
912 C strings are handled in a special way by the \code{.dyncall} FFI, because they
913 have to be copied as R \code{character} objects.
914 The special type signature \code{'Z'} is used to denote a
915 C string type.
916 The other arguments are simply denoted as untyped pointers using \code{'p'}:
917
918 \begin{smallverbatim}
> start <- new.callback("pZp)v",
    function(ignored1,tag,ignored2)
      cat("Start tag:", tag, "\n")
  )
> end <- new.callback("pZ)v",
    function(ignored,tag)
      cat("End tag:", tag, "\n")
  )
> XML_SetElementHandler(p, start, end)
928 \end{smallverbatim}
929
930 To test the parser, we create a sample document stored in a \code{character}
931 object named \code{text} and pass it to the parse function \code{XML\_Parse}:
932
933 \begin{smallverbatim}
934 > text <- "<hello> <world> </world> </hello>"
935 > XML_Parse( p, text, nchar(text), 1)
936 \end{smallverbatim}
937
938 The resulting output is given below:
939
940 \begin{smallverbatim}
941 Start tag: hello
942 Start tag: world
943 End tag: world
944 End tag: hello
945 \end{smallverbatim}
946
947 Expat supports processing of very large XML documents in a chunk-based manner by
948 calling \code{XML\_Parse} several times, where the last argument is used
949 as indicator for the final chunk of the document.
950
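Chunk-based processing can be sketched in base R as follows. The helper \code{parse\_in\_chunks} and its \code{feed} argument are hypothetical; in a real session \code{feed} would wrap a call such as \code{XML\_Parse(p, chunk, len, is\_final)}, which is replaced here by a function that merely records its arguments. Chunks are cut by character count, which equals the byte count only for single-byte text such as the ASCII sample:

```r
# Hypothetical driver: cut 'text' into chunks and hand each one to 'feed'
# together with its length in bytes and a flag marking the final chunk.
parse_in_chunks <- function(text, chunk_size, feed) {
  stopifnot(nchar(text) > 0, chunk_size > 0)
  starts <- seq(1L, nchar(text), by = chunk_size)
  for (i in seq_along(starts)) {
    chunk <- substr(text, starts[i], starts[i] + chunk_size - 1L)
    is_final <- as.integer(i == length(starts))
    feed(chunk, nchar(chunk, type = "bytes"), is_final)
  }
  invisible(length(starts))
}

# Stand-in for the parser call: record what would have been passed on.
calls <- list()
text <- "<hello> <world> </world> </hello>"
n <- parse_in_chunks(text, 10L, function(chunk, len, is_final) {
  calls[[length(calls) + 1L]] <<- list(chunk, len, is_final)
})
print(n)
```

Only the last chunk is flagged as final, which mirrors the role of the last argument of \code{XML\_Parse} described above.
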
951 \section{Architecture}
952
953 The core implementation of the FFI, callbacks and loading of
954 code are mainly based on the suite of libraries of the \emph{DynCall}
955 project \citep{dyncall}.
956
957 \subsection{Dynamic calls}
958
959 The FFI offered by \pkg{rdyncall} is based on the \pkg{dyncall}
960 library, which provides an abstraction for making arbitrary
961 machine-level calls with support for multiple calling conventions
and most C argument and return types.\footnote{\emph{Inline} structure types are currently not fully supported.}
963
964 For each processor architecture, the supported calling conventions
965 are abstracted in a \emph{Call Virtual Machine} (CallVM)
966 object. The \pkg{dyncall} library offers a universal C interface that can
967 be used from within scripting language interpreter contexts to build
968 up a machine-level call in a structured manner.
969
970 A CallVM comprises a state machine and a call kernel. The state machine
971 is implemented in C and keeps track of internal buffers for pre-loading argument
972 values that get arranged for specific storage locations, such as stack or
973 special register sets according to the processor architecture and the chosen
974 calling conventions.
The actual invocation of a foreign function call is conducted by
the call kernel, a small piece of code that is implemented in
assembly and that provides a generic call facility for a particular
calling convention.
It prepares machine-level calls by copying data to registers and to the
call stack according to the relevant calling convention, and finally
executes the machine call to a target address.
982
983 From a scripting language interpreter perspective, the invocation of a
984 foreign function call through the CallVM is conducted in three consecutive
985 phases using the \pkg{dyncall} C API:
986
987 \begin{enumerate}
988 \item \emph{Setup Phase:} The desired calling convention has to be
989 chosen which, in most cases, is just the \emph{default C} calling convention.
However, more specialized and platform-specific calling conventions are
available as well, in particular for the 32-bit Windows OS.
\item \emph{Argument Loading Phase:} Arguments are passed in
\emph{left-to-right} order, following the parameter order of the C/C++
function or method declaration. Argument values are stored in buffers
according to the processor architecture and selected calling convention.
996 \item \emph{Call and Return-Value Receive Phase:}
997 A return-type specific call function is chosen and the target address
998 of the foreign code is passed, which gets called via the Call Kernel.
999 \end{enumerate}
1000
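How a call signature can drive these three phases is sketched below in base R. Nothing here calls the \pkg{dyncall} C API: \code{parse\_call\_signature} and \code{call\_plan} are hypothetical helpers that only split a signature and list the steps an interpreter binding would take. Only single-character type codes are handled, so typed pointer expressions are out of scope for this sketch:

```r
# Hypothetical helper: split a simple call signature such as "pZp)v" into
# argument type codes and a return type code.
parse_call_signature <- function(sig) {
  parts <- strsplit(sig, ")", fixed = TRUE)[[1]]
  if (length(parts) != 2L || nchar(parts[2]) != 1L)
    stop("malformed call signature: ", sig)
  args <- if (nzchar(parts[1])) strsplit(parts[1], "")[[1]] else character(0)
  list(args = args, ret = parts[2])
}

# List the steps of the three phases for a given signature.
call_plan <- function(sig) {
  s <- parse_call_signature(sig)
  c("setup: default C calling convention",
    sprintf("load argument %d of type '%s'", seq_along(s$args), s$args),
    sprintf("call with return type '%s'", s$ret))
}

print(call_plan("pZp)v"))
```

The arguments are listed in left-to-right order, and the return type code selects the final call step.
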
The architecture makes it straightforward to implement an FFI
for a dynamic language interpreter using a text parser for call signatures
to drive the conversion of arguments and results.
Similar FFIs with a text-based interface have been implemented for other language
interpreters such as Ruby, Python and Lua; see the DynCall source repository \citep{dyncall}.
1006
1007 Both the C interface of dyncall and the signature format use the abstract
1008 C/C++ type system and give no indication about the effective size of
a particular type. In experiments with several C APIs bound via \pkg{rdyncall},
it turned out that the signatures work across platforms
as long as the fundamental type definitions of the C API do not change across platforms.
In our tests and the presented examples, a wide range of
C APIs have this property, and their type signatures remain valid
even when switching between 32- and 64-bit platforms.
1015
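That the abstract C types carry no fixed size can be observed from within R itself. The short sketch below only queries the host running the session through base R's \code{.Machine}; the values printed differ between platforms (for instance, \code{long} is commonly 4 bytes on 64-bit Windows and 8 bytes on 64-bit Linux), whereas a signature naming these types stays the same:

```r
# Sizes in bytes of some C types on the host running this R session.
sizes <- c(long        = .Machine$sizeof.long,
           "long long" = .Machine$sizeof.longlong,
           pointer     = .Machine$sizeof.pointer)
print(sizes)
```
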
1016 \subsection{Dynamic callbacks}
1017
1018 The \pkg{dyncallback} library provides a framework to implement
1019 dynamic callbacks for language interpreters to wrap scripting functions
1020 as C function pointers.
The framework offers a universal C interface for a callback handler that
is implemented once for a particular interpreter.
1023 The handler receives callback calls from C and forwards the call,
1024 including conversion of arguments, to a scripting function.
1025
1026 Handlers need to access machine-level arguments whose location
1027 can be on the stack, or in registers,
1028 depending on the processor architecture and calling convention.
1029 For that reason, the handler interface receives an abstract argument
1030 iterator that gives structured access to the arguments for
1031 passing over to the high-level language.
Callbacks are created via an interface that pools a handler,
language context, scripting function reference,
callback type-information and other user data into a
\emph{single} native C function pointer, such that even very
low-level C callbacks that lack a user-data argument can be
handled with the underlying technique.\footnote{This includes
callbacks for sort routines of the Standard C library, which lack user-data.}
1039
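Binding state to a function that has no user-data parameter has a familiar analogue at the R level in closures. The sketch below is only an analogy in base R and does not use \pkg{dyncallback}; \code{sort\_with} and \code{make\_compare} are hypothetical names. The comparator takes exactly two arguments, as the comparison function of C's \code{qsort} does, and the sort direction travels with the closure rather than through an extra argument:

```r
# Insertion sort driven by a two-argument comparator, in the style of qsort:
# compare(a, b) > 0 means that a should come after b.
sort_with <- function(x, compare) {
  for (i in seq_along(x)[-1]) {
    key <- x[i]
    j <- i - 1L
    while (j >= 1L && compare(x[j], key) > 0) {
      x[j + 1L] <- x[j]
      j <- j - 1L
    }
    x[j + 1L] <- key
  }
  x
}

# The closure carries 'decreasing' along; the comparator signature stays fixed.
make_compare <- function(decreasing) {
  function(a, b) if (decreasing) b - a else a - b
}

print(sort_with(c(3, 1, 2), make_compare(FALSE)))
print(sort_with(c(3, 1, 2), make_compare(TRUE)))
```
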
1040 \subsection{Portability and Stability}
1041
The requirements for porting the \emph{DynCall} libraries to
a new processor and/or platform are high: the calling conventions of a target processor platform have to be studied in detail,
state machines have to be implemented in C, and a small amount of code has to be written in
assembly, which may not even be portable across build tools on the same platform.
Nevertheless, \pkg{dyncall} (as of version 0.7) supports many processor architectures such as
Intel i386 (x86), AMD 64 (x64), PowerPC 32-bit, ARM (including Thumb extension), MIPS 32/64-bit and SPARC 32/64-bit,
including support for several platform-, processor- and compiler-specific calling conventions.
1049 \pkg{dyncallback} also supports major processor architectures such as Intel i386 (x86), AMD 64 (x64) and ARM and offers
1050 partial support for PowerPC 32-bit (support for Mac OS X/Darwin).
1051 Besides the processor architecture, the libraries are also explicitly ported and tested on
1052 various OS such as Linux, Mac OS X, Windows, the BSD family, Solaris, Haiku, Minix and Plan9.
1053 Support for embedded platforms such as Playstation Portable, Nintendo DS and iPhone OS is available as well.
1054
1055 \emph{DynCall} contains a suite of testing tools for quality assurance. Included are test-case generators written in
1056 Lua and Python. Extreme call and callback scenarios are tested here to ensure correct passing of arguments and results.
Before a release, the libraries and tests are built for a large set of architectures on
\pkg{DynOS} \citep{dynos}, a batch-build system that uses full system emulators such as
\pkg{QEmu} \citep{qemu} and \pkg{GXEmul} \citep{gxemul} and various operating-system images
to test release candidates and create pre-built binary releases of the library.
1061
1062 \subsection{Text-based Signature Interfaces}
1063
1064 A common property of the service interface presented here is the use of
1065 signature text formats. Signatures are used
1066 as descriptors for types, such as foreign function calls, callbacks and
1067 aggregate data types.
The reasons for using signatures as a high-level user interface
to such services are as follows:
1070
1071 \begin{enumerate}
1072 \item Cross-language interface: Text format interfaces are available across
1073 high-level languages. Examples for cross-language text-based
1074 interfaces include regular expressions or \code{printf}-style formatted output
1075 descriptions.
1076
1077 \item Developer-friendly:
1078 The simplicity and compactness of the text-format enables developers
1079 to bridge with foreign code in interactive and rapid development
1080 sessions.
C type signatures can be derived by hand with minimum effort:
fundamental types are encoded with a single character, and the
upper-case form of a character denotes the corresponding \code{unsigned} type.
1084
1085 \item Machine-neutral:
1086 In contrast to binary encoded type libraries, the data format is not affected
1087 by the endian model of the underlying platform.
1088
1089 \item Parser-friendly:
1090 The signature format can be used as driver code to perform foreign function
1091 calls. Implementations of parsers match the sequential
1092 design of \pkg{dyncall}'s CallVM and \pkg{dyncallback}'s argument iterator interface.
1093 \end{enumerate}
1094
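The case convention for integer types can be illustrated with a small lookup table in base R. The type codes below are assumed to follow the \pkg{dyncall} signature convention and should be checked against its manual; \code{signed\_codes} and \code{c\_type\_of} are hypothetical names introduced for this sketch:

```r
# Assumed single-character codes for signed integer types.
signed_codes <- c(c = "char", s = "short", i = "int",
                  j = "long", l = "long long")

# Map a type code to a C type name; an upper-case code gives the unsigned type.
c_type_of <- function(code) {
  lower <- tolower(code)
  if (!lower %in% names(signed_codes))
    stop("not an integer type code: ", code)
  base <- signed_codes[[lower]]
  if (identical(code, lower)) base else paste("unsigned", base)
}

print(vapply(c("s", "S", "i", "I"), c_type_of, character(1)))
```
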
1095 \subsection{Creation of DynPort files}
1096
In this section we describe the tool-chain that creates the
universal bindings called \emph{DynPort}. The process described
here is applied once on a build machine; the generated output
is used later at run-time across platforms to drive the
dynamic linkage and binding procedure.
1102 \emph{DynPort} files can be created automatically from
1103 C header files using a tool-chain as depicted in
1104 Figure \ref{fig:gen_dynport}.
1105
1106 \begin{figure}
1107 \centering
1108 \includegraphics[scale=0.45]{img_gen_dynport.pdf}
1109 \caption{\label{fig:gen_dynport}
1110 Tool-chain to create \emph{DynPort} files from C headers}
1111 \end{figure}
1112
1113 The tool-chain comprises several freely available components that
1114 are briefly described next:
\pkg{GCC-XML} \citep{gccxml} is a modified version of the GCC compiler
which translates C sources to XML documents.
\pkg{xsltproc}, distributed as part of the \pkg{libxslt} library
\citep{libxslt}, is an XSLT processor that transforms XML documents to
XML, text or binary formats according to style-sheets written in
the \emph{XSL Transformations} \citep{Clark:01:XTV} language.
1121
1122 To extract library binding specifications, a main C source file is created that
1123 consists of one or more \code{\#include} statements that
1124 reference library and/or system header files to process.
1125 The header files should have been previously installed on
1126 the build machine.
1127 In a preprocessing phase, the GNU C Macro Processor is used to process
1128 all \code{\#include} statements using standard system search paths
1129 to create a concatenated \emph{All-In-One} source file free of any
1130 \code{\#include} statements.
1131 GCC-XML transforms C header declarations to XML.
An XSL style-sheet implements the transformation of XML to
type signature formats using an XSLT processor.
1134 C Macro \code{\#define} statements are handled separately by a custom
1135 C Preprocessor implemented in C++ using the boost wave library \citep{boostwave}.
1136 An optional filter stage is used to include only elements with
1137 a certain pattern such as a common prefix usually found in many
1138 libraries e.g. '\code{SDL\_}'.
1139 In a last step, the various fragments are assembled into a single
1140 text-file which represents the \emph{DynPort} file.
The overall build process is managed by \emph{make} files, and a repository of recipes
has been set up to extend support for additional
dynports and libraries in a structured and coordinated way.
1144
1145
1146 \section{Summary and Outlook}
1147
1148 This paper introduces the \pkg{rdyncall} package (Version 0.7.3 on CRAN as of this writing) that contributes an improved Foreign Function Interface for R.
The FFI facilitates \emph{direct} invocation of foreign functions \emph{without} the need to compile additional wrapper code in C.
1150 Based on the FFI, a dynamic cross-platform linkage framework to wrap and access \emph{whole} C interfaces of native libraries from R
1151 is discussed.
1152 Instead of \emph{compiling} bindings for every library-and-language combination,
1153 R bindings of a library are created dynamically at run-time in a data-driven manner via
1154 \emph{DynPort} files - a cross-platform universal type information format.
1155 C libraries are made accessible in R as though they were extension packages and
1156 the R interface looks very similar to that of C.
1157 This enables system-level programming in R and brings a new wave of possibilities for R developers
1158 such as using OpenGL directly in R across platforms as described in the example.
1159 An initial repository of \emph{DynPort}s for standard cross-platform portable
1160 C libraries comes with the package.
1161
1162 The implementation is based on libraries from the \emph{DynCall} project that implement non-trivial
1163 facilities such as an abstraction to machine-level function calls supporting
1164 multiple calling conventions and the handling of C callbacks from within scripting language interpreter environments.
1165 The libraries have been ported across major R platforms.
Work is in progress to support missing architectures in \pkg{dyncallback} such as PowerPC System V 32-bit, PowerPC 64-bit, and 32/64-bit MIPS and SPARC.
The handling of foreign aggregate data types, which is currently implemented in R and C,
is planned to be reimplemented in portable C as part of \emph{DynCall}, in cooperation with the developers of \emph{BridJ} \citep{bridj}.
1169 Currently, \emph{DynPort} files are written as R scripts with
1170 inline text chunks created from the \emph{DynPort} tool chain.
1171 For the Lua Programming Language \citep{SPE::IerusalimschyFF1996}, a similar framework named \pkg{luadyncall} is in
1172 development using a language-neutral format for \emph{DynPort} files.
1173 The need to install additional shared libraries still represents a hurdle for ordinary R users.
1174 We plan to find a common abstraction layer for installation systems, package managers and software distribution services
1175 across OS-distributions, and to integrate meta installation information into the \emph{DynPort} file format.
1176
The \emph{DynPort} facility in \pkg{rdyncall} constitutes an initial step in building up an infrastructure between
1178 scripting languages and C libraries.
1179 Analogous to the way in which R users enjoy quick access to the large pool of R software
1180 managed by CRAN, we envision an archive network in which C library developers can distribute
1181 their work across languages, and users get quick access to the pool of C libraries from within
1182 scripting languages via automatic installation of precompiled components and using
1183 universal type information for cross-platform and cross-language dynamic bindings.
1184
1185 \bibliography{FLI}
1186
1187 \end{document}