"VL-e PoC Documentation::Adding R packages", owner=>"Jan Just Keijser", email=>"janjust@nikhef.nl")); ?>
The R toolkit is part of the PoC distribution and has been compiled without
any external R packages. Some applications might require a custom R package to
be loaded. This document serves as a tutorial on how to build and add a
custom R package and how to submit a job on the VL-e PoC environment that
uses (requires) the package.
For this tutorial the package RMySQL was chosen, but the same approach applies
to other R packages. A large repository of R packages can be found on the
Comprehensive R Archive Network (CRAN).
This HOWTO applies to R 2.4.0, as found in the PoC R2 distribution.
See the PoC R1 version of this
HOWTO for instructions on how to do this for R 2.2.0.
The RMySQL package can be downloaded from this webpage. At the time of writing
the latest version was 0.6-0. A list of archived versions can be found here.
The website indicates that RMySQL depends on DBI, so that package also needs to
be downloaded: DBI-0.2-3.
After downloading the packages we build and install them on a system on which
the VL-e PoC distribution is installed. This is done in a regular user's
home directory.
The RMySQL package requires the MySQL client library libmysqlclient.so to be
installed. This file is part of the MySQL-shared RPM on CentOS/Scientific
Linux 3, or of the mysql RPM on CentOS/Scientific Linux 4.
On RHEL4 it can be installed using

    # yum install mysql

For older RHEL3 systems, which will shortly be no longer supported, use

    # apt-get install MySQL-shared

Next, we build the software in the user's home directory.

    mkdir ~/src
    mkdir ~/R
    cd ~/src
    tar xzvf DBI_0.2-3.tar.gz
    tar xzvf RMySQL_0.6-0.tar.gz

Then we build the R packages:

    R CMD INSTALL --no-docs -l ~/R DBI
    R CMD INSTALL --no-docs -l ~/R RMySQL

which should result in output similar to
    * Installing *source* package 'DBI' ...
    ** R
    ** inst
    ** save image
    [1] TRUE
    <snip>
    ** building package indices ...
    * DONE (DBI)
    * Installing *source* package 'RMySQL' ...
    checking for gcc... gcc
    checking for C compiler default output file name... a.out
    checking whether the C compiler works... yes
    checking whether we are cross compiling... no
    checking for suffix of executables...
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ANSI C... none needed
    checking how to run the C preprocessor... gcc -E
    checking for compress in -lz... yes
    checking for getopt_long in -lc... yes
    checking for mysql_init in -lmysqlclient... no
    checking for egrep... grep -E
    checking for ANSI C header files... yes
    checking for sys/types.h... yes
    checking for sys/stat.h... yes
    checking for stdlib.h... yes
    checking for string.h... yes
    checking for memory.h... yes
    checking for strings.h... yes
    checking for inttypes.h... yes
    checking for stdint.h... yes
    checking for unistd.h... yes
    checking mysql.h usability... no
    checking mysql.h presence... no
    checking for mysql.h... no
    checking for mysql_init in -lmysqlclient... no
    checking for mysql_init in -lmysqlclient... no
    checking for mysql_init in -lmysqlclient... no
    checking for mysql_init in -lmysqlclient... yes
    mysqlclient found in -L/usr/lib/mysql
    checking /usr/local/include/mysql/mysql.h usability... no
    checking /usr/local/include/mysql/mysql.h presence... no
    checking for /usr/local/include/mysql/mysql.h... no
    checking /usr/include/mysql/mysql.h usability... yes
    checking /usr/include/mysql/mysql.h presence... yes
    checking for /usr/include/mysql/mysql.h... yes
    configure: creating ./config.status
    config.status: creating src/Makevars
    ** libs
    gcc -I/opt/vl-e/r_2.4/lib/R/include -I/usr/include/mysql -I/usr/local/include -fpic -O2 -g -march=i386 -mcpu=i686 -c RS-DBI.c -o RS-DBI.o
    gcc -I/opt/vl-e/r_2.4/lib/R/include -I/usr/include/mysql -I/usr/local/include -fpic -O2 -g -march=i386 -mcpu=i686 -c RS-MySQL.c -o RS-MySQL.o
    gcc -shared -L/usr/local/lib -o RMySQL.so RS-DBI.o RS-MySQL.o -L/usr/lib/mysql -lmysqlclient -lz -L/opt/vl-e/r_2.4/lib/R/lib -lR
    ** R
    ** inst
    ** preparing package for lazy loading
    Loading required package: DBI
    Creating a new generic function for "format" in "RMySQL"
    Creating a new generic function for "print" in "RMySQL"
    <snip>
    ** building package indices ...
    * DONE (RMySQL)
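Before stripping files it can be worth confirming that both packages really ended up in the private library directory. The following is a minimal sketch, not part of the PoC distribution: the helper name check_r_packages is hypothetical, and it relies only on the fact that an installed R package directory contains a DESCRIPTION file.

```shell
#!/bin/bash
# check_r_packages LIBDIR PKG...
# Hypothetical helper: reports, for each PKG, whether LIBDIR/PKG/DESCRIPTION
# exists. Returns 0 only if every package was found.
check_r_packages() {
    local libdir pkg missing
    libdir=$1
    missing=0
    shift
    for pkg in "$@"; do
        if [ -f "$libdir/$pkg/DESCRIPTION" ]; then
            echo "ok: $pkg"
        else
            echo "missing: $pkg"
            missing=1
        fi
    done
    return "$missing"
}
```

For the build above this would be called as `check_r_packages ~/R DBI RMySQL`.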
To use the packages on the grid, we need to create a distribution tarball that we can send along with the grid job. To do this, we strip some unnecessary files from the build we have just made:
    cd ~/R
    rm R.css
    cd DBI
    rm -rf NEWS TODO doc man
    cd ../RMySQL
    rm -rf NEWS README* THANKS TODO WindowsPath.txt INSTALL INSTALL.win
    rm -rf doc gnu man newFunctionNames.txt

Next, we add any external dependencies. For the RMySQL package we need the
libmysqlclient.so.14 file (see RHEL4 sample above). We cannot assume
that this file is present on the worker nodes where our grid job will run,
so we add the file to our installation package:
    cp /usr/lib/mysql/libmysqlclient.so.14 ~/R/RMySQL/libs
    cd
    tar czvf RMySQL-libs.tar.gz R

The resulting file can be downloaded here.
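Since a forgotten file only shows up once the job runs on a worker node, the tarball contents can be checked locally first. This is a sketch under stated assumptions: check_tarball is a hypothetical helper name, and it simply searches the output of `tar tzf` for each expected path as a fixed string.

```shell
#!/bin/bash
# check_tarball TARBALL ENTRY...
# Hypothetical helper: verifies that every ENTRY (matched as a fixed
# substring) appears in the listing of the gzip-compressed TARBALL.
check_tarball() {
    local tarball entry listing
    tarball=$1
    shift
    # Returns 2 if the tarball cannot be read at all.
    listing=$(tar tzf "$tarball") || return 2
    for entry in "$@"; do
        if ! printf '%s\n' "$listing" | grep -q -F -- "$entry"; then
            echo "not in tarball: $entry"
            return 1
        fi
    done
    echo "tarball looks complete"
}
```

For this tutorial a call could look like `check_tarball ~/RMySQL-libs.tar.gz R/DBI/ R/RMySQL/libs/libmysqlclient.so.14`.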
In order to use our custom R package we need to send the package tarball
along with the rest of our grid job. An InputSandbox can contain a few
megabytes and our tarball is only a few hundred kilobytes. If the input
sandbox were to become too large then we would have to resort to using
a VO_SW directory, but that is outside the scope of this tutorial.
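To get a feeling for how close a job is to the sandbox limit, the sizes of all input files can be added up before submission. The sketch below is illustrative only: sandbox_size_ok is a hypothetical helper, and the limit is passed in as a parameter because the exact limit of a given resource broker is not stated in this document.

```shell
#!/bin/bash
# sandbox_size_ok LIMIT_BYTES FILE...
# Hypothetical helper: prints the combined size of the files in bytes and
# returns 0 only if that total does not exceed LIMIT_BYTES.
sandbox_size_ok() {
    local limit total size f
    limit=$1
    total=0
    shift
    for f in "$@"; do
        # Returns 2 if a file cannot be read.
        size=$(wc -c < "$f") || return 2
        total=$((total + size))
    done
    echo "$total"
    [ "$total" -le "$limit" ]
}
```

Assuming, purely as an example, a limit of 5 MB, the three input files of this tutorial would be checked with `sandbox_size_ok 5000000 R.sh RMySQL-libs.tar.gz Rtest.R`.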
To add our custom R package the following .jdl file is used:

    Executable = "R.sh";
    Stdoutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"R.sh", "RMySQL-libs.tar.gz", "Rtest.R" };
    OutputSandBox = {"std.out","std.err"};

Alternatively you can download the file directly here.
The R.jdl file lists R.sh, which is shown below. The R.sh script unpacks the
tarball containing our custom R package, sets up a few environment variables
and then runs the R test script:

    #!/bin/bash
    tar xzf RMySQL-libs.tar.gz
    export R_LIBS=$PWD/R
    export LD_LIBRARY_PATH=$PWD/R/RMySQL/libs
    R --no-save < Rtest.R

Alternatively you can download the file directly here.
- The tar command will create a directory 'R' in the current directory
  (aka $PWD).
- The export R_LIBS command tells R to look for packages in the directory
  $PWD/R.
- The export LD_LIBRARY_PATH command is required to allow R to find the
  libmysqlclient.so.1[24] library.
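Note that R.sh as shown overwrites any R_LIBS or LD_LIBRARY_PATH value that the worker node may already have set. If that turns out to matter on a particular site, a variant that prepends to the existing value could look like the sketch below. The helper name prepend_path is hypothetical; whether worker nodes actually rely on a preset LD_LIBRARY_PATH is an assumption, not something this tutorial verified.

```shell
#!/bin/bash
# prepend_path NEW OLD
# Hypothetical helper: prints NEW, followed by ":OLD" only when OLD is
# non-empty, so that no empty path element is created.
prepend_path() {
    printf '%s\n' "$1${2:+:$2}"
}

# Same variables as in R.sh, but keeping whatever was set before.
export R_LIBS=$(prepend_path "$PWD/R" "${R_LIBS:-}")
export LD_LIBRARY_PATH=$(prepend_path "$PWD/R/RMySQL/libs" "${LD_LIBRARY_PATH:-}")
```

Avoiding the empty element matters because the dynamic loader treats an empty entry in LD_LIBRARY_PATH as the current directory.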
The Rtest.R script is a very simple test script to verify that we can connect
to a MySQL database:

    require(RMySQL)
    con <- dbConnect(MySQL(), user="db-user", password="db-passwd",
                     dbname="db-name", host="db-host")
    dbGetQuery(con, "select * from mod_users where username='janjust'")

Alternatively you can download the file directly here.
The Rtest.R script is run like any other R script. Please note that the
--no-save parameter is useful to make sure the script finishes automatically
without asking any questions about saving workspaces.
During the development of this tutorial several issues showed up. Here are a few tips and tricks on how to troubleshoot such issues:
--no-save
correctly;
The libmysqlclient.so.1[24] file is not installed by default on all grid nodes
(nor should it be). So I had to add the .so file to the tarball, modify the
R.sh script and test all over again.
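A missing shared library such as libmysqlclient can often be spotted before a job is submitted by running `ldd` on the compiled package library and looking for "not found" entries. The filter below is a small sketch of that idea. The helper name missing_libs is hypothetical, and it assumes the usual `name => not found` output format of `ldd` on Linux.

```shell
#!/bin/bash
# missing_libs
# Hypothetical helper: reads `ldd` output on stdin and prints the name of
# every library that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example use (run on a machine that resembles the worker node, without
# LD_LIBRARY_PATH pointing at the bundled copy):
#   ldd ~/R/RMySQL/libs/RMySQL.so | missing_libs
```

Every name printed this way is a candidate for inclusion in the tarball next to RMySQL.so.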