Run R scripts on the OSPool¶
This tutorial describes how to run R scripts on the OSPool. We'll first run the program locally as a test. After that we'll create a submit file, submit it to the OSPool using OSG Connect, and look at the results when the jobs finish.
Run R scripts on OSG¶
Access R on the submit host¶
First we'll need to create a working directory, you can either
$ tutorial ROR
- type the following: $ mkdir tutorial-R; cd tutorial-R
R is run using containers on the OSG. To test it out on the submit node, we can run:
$ singularity shell \ /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-r:3.5.0
Other Supported R Versions¶
To see a list of all Singularity containers containing R, look at the list of OSPool Supported Containers
The previous command sometimes takes a minute or so to start. Once it starts, you should see the following prompt:
Now, we can try to run R by typing
R in our terminal:
$ Singularity osgvo-r:3.5.0:~> R R version 3.5.1 (2018-07-02) -- "Feather Spray" Copyright (C) 2018 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. >
You can quit out with
> q() Save workspace image? [y/n/c]: n Singularity osgvo-r:3.5.0:~>
Great! R works. We'll leave the container running for the next step. See below on how to exit from the container.
Run R code¶
Now that we can run R, let's create a small script. Create the file
hello_world.R using a text editor like nano or vim that contains the following:
We will then exit the text editing program and run this script. To run this script, we will use the
Rscript command (equivalent to
R CMD BATCH) which accepts the script as command line argument. This approach makes
R much less verbose, and it's easier to parse the output later. The command in our script will look like this:
Singularity osgvo-r:3.5.0:~> Rscript hello_world.R
If this works, we will have
 "Hello World!" printed to our terminal. Once we have this output, we'll exit the container for now with
Singularity osgvo-r:3.5.0:~> exit $
Build the HTCondor job¶
To prepare our R job to run on the OSPool, we need to create a wrapper for our R environment, based on the setup we did in previous sections. Create the file
R-wrapper.sh with this text inside the file:
1 2 3 4 5 6 7
Once done, exit the file. We will now change the permissions on the wrapper script:
$ chmod +x R-wrapper.sh
Now that we've created a wrapper, let's build a HTCondor submit file around it. Using a text editor, create a file called
R.submit with the following text inside it:
universe = vanilla log = R.log.$(Cluster).$(Process) error = R.err.$(Cluster).$(Process) output = R.out.$(Cluster).$(Process) +SingularityImage = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-r:3.5.0" executable = R-wrapper.sh transfer_input_files = hello_world.R request_cpus = 1 request_memory = 1GB request_disk = 1GB queue 1
The path you put in the
+SingularityImage option should match whatever you used
to test R above.
R.submit file may have included a few lines that you are unfamiliar with. For example,
$(Process) are variables that will be replaced with the job's cluster and process id. This is useful when you have many jobs submitted in the same file. Any output and errors will be placed in a separate file for each job.
Also, did you see the
transfer_input_files line? This tells HTCondor what files to transfer with the job to the worker node. You don't have to tell it to transfer the executable, HTCondor is smart enough to know that the job will need that. But any extra files, such as our R script file, will need to be explicitly listed to be transferred with the job. You can use transfer_input_files for input data to the job, as shown in Transferring data with HTCondor.
Submit and analyze the output¶
Finally, submit the job!
$ condor_submit R.submit Submitting job(s).......... 1 job(s) submitted to cluster 3796250. $ condor_q user $ condor_q -- Schedd: login03.osgconnect.net : <126.96.36.199:9618?... @ 05/13/19 09:51:04 OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS user ID: 3796250 5/13 09:50 _ _ 1 1 3796250.0 ...
You can follow the status of your job cluster with the
connect watch command, which shows
condor_q output that refreshes each 5 seconds. Press
control-C to stop watching.
Since our jobs prints to standard out, we can check the output files. Let's see what one looks like:
$ cat R.out.3796250.0  "Hello World!"
We recommend you read about how to steer your jobs with HTCondor job requirements - this will allow you to select good resources for your workload. Please see this page
For assistance or questions, please email the OSG User Support team at email@example.com.