3. DataSHIELD filters

We just created our first DataSHIELD package. Cool. Now, let’s continue this journey by taking a look at the filter parameters, how to use them, and how to create new ones.

If the reader has some prior experience with DataSHIELD, the concept may sound familiar, if not, there is no problem. On the base package, there are a collection of filters that affect the behaviour of different functions (of that same package). More precisely they are there to prevent disclosure of individual data, aka the core objective of the technology. These filter values can be configured by the data owners at their own likings.

This might bring some light to the developers. It is your job to make an analysis package secure. It is your job to implement disclosure traps. DataSHIELD just brings you an infrastructure with the tools to build those security layers.

Having said that, when we build a new package we have two choices, 1 use the current filter values from dsBase, 2 create our own new filter variables.

Note

Note that if we choose to use dsBase filter values, that package has to be available on the same environment (Opal Rock Profile or Armadillo Analisis environment) where our new package will run. Generally speaking this will be the case, as the base package is almost mandatory to perform any kind of task, however, it does not need to be like this.

Before digging into the two options we just introduced. Let’s take a look at what exactly are these “filter values”, how are defined, and how do they fit within the structure of a regular R package.

Options slot of an R package

Filter values are only available on the server, as we do not want the analysts to be able to manipulate them, just the data owners. For that reason they are defined on the server packages. More precisely they are defined on the DATASHIELD file of the R package. Please note that as discussed on the previous section, this information could go to the DESCRIPTION file wit the declaration of aggregate and assign functions as additional metadata. Take a look for example at the dsBase DESCRIPTION file. During this workshop we will be using the DATASHIELD file.

Now let’s continue by checking how to access those values within our package functions.

Accessing a filter value

In order to retrieve the value of a filter on our package functions we will use the getOption function. If we want to retrieve the value of the my.new.filter filter, we will do:

getOption("my.new.filter")
Note

If we try to retrieve a value of an option that is not defined, it will return NULL, this can be convenient to handle exceptions.

Following that we can see that if we want to use a filter from dsBase, we just have to take a look at the available ones and use them:

getOption("default.nfilter.glm")

Also if we define our own filter (on our package DATASHIELD file):

Options:
    nice.filter=5

Here we see that the filter contains a numeric variable. However they are not limited to that, they can also contain strings and other R data types.

Options:
    even.nicer.filter="hello"

We retrieve it by:

getOption("nice.filter")

Exercise 3

For this exercise, we will continue working on our new packges.

  1. Create a new filter named myFilter. Set the default value as numeric 6.

  2. Modify the getColnamesDS function to retrieve the myFilter value.

  3. If the number of columns of the data frame is less than myFilter return an error.

In R we can throw errors (or exceptions) using the stop function.

E.g.

stop("This is my error message")
  1. Test your new development using DSLite. Does it throw the error we expect?