τ-Argus from R - tabular version
¶
Package {rtauargus} ¶
Source: Package {rtauargus} ¶
vignettes/rtauargus.Rmd
rtauargus.Rmd
Introduction
Presentation of the package
The rtauargus package offers an R interface for τ-Argus. It allows to :
- create inputs (asc, tab and rda files) from R data;
- generate the sequence of instructions to be executed in batch mode (arb file);
- launch a batch τ-Argus ;
- retrieve the results in R.
The syntax of some of the arguments closely matches the batch syntax of τ-Argus. This allows a large number of functions to be used without multiplying the arguments of the functions. The package will also be able to adapt more easily to possible modifications of the software (new methods available, additional options…). The syntax rules for writing batch are given in the τ-Argus reference manual and will be specified in a dedicated help section.
The package was developed on the basis of open source versions of τ-Argus (versions 4.2 and above), in particular the latest version available at the time of development (4.2.3).
It is not compatible with version 3.5.**_
Setup
The following setup should be made before the first use (and no longer afterwards)
τ-Argus
rtauargus fonctions using τ-Argus requires that the software can be used from the workstation. The github repository of τ-Argus is here: https://github.com/sdcTools/tauargus. The latest releases can be downloaded here: https://github.com/sdcTools/tauargus/releases.
Quick Start
This section explains how to perform a minimal configuration of the package and how to apply suppressive methods in a single instruction.
Location of τ-Argus
When loading the package, the console displays some information:
In particular, a plausible location for the τ-Argus software is predefined. This can be changed for the duration of the R session, as follows:
loc_tauargus <- "Y:/Logiciels/TauArgus/TauArgus4.2.3/TauArgus.exe"
options(rtauargus.tauargus_exe = loc_tauargus)
With this small adjustment done, the package is ready to be used.
For a more customized configuration, see the specific vignettes
Protecting a single tabular data with tab_rtauargus()
function
The tab_rtauargus()
function performs a full processing
to protect the table and retrieves the results immediately in R.
Completely abstracting from the inner workings of τ-Argus, it allows the entire processing to be made in a single instruction. All intermediate files are created in a local directory.
tab_rtauargus()
requires the following arguments :
-
tabular
: a data.frame containing the table ; -
dir_name
: the directory for the outputs; -
files_name
: all the τ-Argus files will be named with it (different extensions); -
explanatory_vars
: the name of all explanatory variables intabular
; -
secret_var
orsafety_rules
: the way to apply primary suppression (explain later) -
totcode
: the code for total of each explanatory variable intabular
All the arguments and their default options will be detailed ( where?).
Minimal example
For the following demonstration, a fictitious table will be used:
act_size <-
data.frame(
ACTIVITY = c("01","01","01","02","02","02","06","06","06","Total","Total","Total"),
SIZE = c("tr1","tr2","Total","tr1","tr2","Total","tr1","tr2","Total","tr1","tr2","Total"),
VAL = c(100,50,150,30,20,50,60,40,100,190,110,300),
N_OBS = c(10,5,15,2,5,7,8,6,14,20,16,36),
MAX = c(20,15,20,20,10,20,16,38,38,20,38,38)
)
act_size
#> ACTIVITY SIZE VAL N_OBS MAX
#> 1 01 tr1 100 10 20
#> 2 01 tr2 50 5 15
#> 3 01 Total 150 15 20
#> 4 02 tr1 30 2 20
#> 5 02 tr2 20 5 10
#> 6 02 Total 50 7 20
#> 7 06 tr1 60 8 16
#> 8 06 tr2 40 6 38
#> 9 06 Total 100 14 38
#> 10 Total tr1 190 20 20
#> 11 Total tr2 110 16 38
#> 12 Total Total 300 36 38
As primary rules, we use the two following ones:
- The n-k dominance rule with n=1 and k = 85
- The minimum frequency rule with n = 3 and a safety range of 10.
To get the results for the dominance rule, we need to specify the
largest contributor to each cell, corresponding to the MAX
variable in the tabular data.
ex1 <- tab_rtauargus(
act_size,
dir_name = "tauargus_files/ex1",
files_name = "ex1",
explanatory_vars = c("ACTIVITY","SIZE"),
safety_rules = "FREQ(3,10)|NK(1,85)",
value = "VAL",
freq = "N_OBS",
maxscore = "MAX",
totcode = c(ACTIVITY="Total",SIZE="Total")
)
#> Start of batch procedure; file: Z:\SDC\OutilsConfidentialite\rtauargus\vignettes\tauargus_files\ex1\ex1.arb
#> <OPENTABLEDATA> "Z:\SDC\OutilsConfidentialite\rtauargus\vignettes\tauargus_files\ex1\ex1.tab"
#> <OPENMETADATA> "Z:\SDC\OutilsConfidentialite\rtauargus\vignettes\tauargus_files\ex1\ex1.rda"
#> <SPECIFYTABLE> "ACTIVITY""SIZE"|"VAL"||
#> <SAFETYRULE> FREQ(3,10)|NK(1,85)
#> <READTABLE> 1
#> Tables have been read
#> <SUPPRESS> MOD(1,5,1,0,0)
#> Start of the modular protection for table ACTIVITY x SIZE | VAL
#> End of modular protection. Time used 0 seconds
#> Number of suppressions: 2
#> <WRITETABLE> (1,4,,"Z:\SDC\OutilsConfidentialite\rtauargus\vignettes\tauargus_files\ex1\ex1.csv")
#> Table: ACTIVITY x SIZE | VAL has been written
#> Output file name: Z:\SDC\OutilsConfidentialite\rtauargus\vignettes\tauargus_files\ex1\ex1.csv
#> End of TauArgus run
By default, the function displays in the console the logbook content
in which user can read all steps run by τ-Argus. This can be retrieved
in the logbook.txt file. With verbose = FALSE
, the function
can be silenced.
By default, the function returns the original dataset with one
variable more, called Status
, directly resulting from
τ-Argus and describing the status of each cell as follows:
-A
: primary secret cell because of frequency rule;
-B
: primary secret cell because of dominance rule (1st
contributor); -C
: primary secret cell because of frequency
rule (more contributors in case when n>1); -D
: secondary
secret cell; -V
: valid cells - no need to mask.
ex1
#> ACTIVITY SIZE VAL N_OBS MAX Status
#> 1 01 Total 150 15 20 V
#> 2 01 tr1 100 10 20 V
#> 3 01 tr2 50 5 15 V
#> 4 02 Total 50 7 20 V
#> 5 02 tr1 30 2 20 A
#> 6 02 tr2 20 5 10 D
#> 7 06 Total 100 14 38 V
#> 8 06 tr1 60 8 16 D
#> 9 06 tr2 40 6 38 B
#> 10 Total Total 300 36 38 V
#> 11 Total tr1 190 20 20 V
#> 12 Total tr2 110 16 38 V
All the files generated by the function are written in the specified
directory (dir_name
argument). The default format for the
protected table is csv but it can be changed. All the τ-Argus files
(.tab, .rda, .arb and .txt) are written in the same directory, too. To
go further, you can consult the latest version of the τ-Argus manual is
downloadable here: https://research.cbs.nl/casc/Software/TauManualV4.1.pdf.
Example with hierarchy and primary secret already done
Data
For this example, we’d like to protect a table in which companies’ turnover is broken down by business sectors and size. To load the data, do:
data("turnover_act_size")
head(turnover_act_size)
#> # A tibble: 6 x 5
#> ACTIVITY SIZE N_OBS TOT MAX
#> <chr> <chr> <int> <dbl> <dbl>
#> 1 AZ Total 405 44475. 6212.
#> 2 BE Total 12878 24827613. 1442029.
#> 3 FZ Total 28043 8907311. 1065833.
#> 4 GI Total 62053 26962063. 3084242.
#> 5 JZ Total 8135 8584917. 3957364.
#> 6 KZ Total 8140 62556596. 10018017.
The meaning of each variable is:
-ACTIVITY
: business sector, hierarchical variables with
three levels described in the activity_corr_table
dataset.
The root is noted “Total”; -SIZE
: size of the companies
(Number of employees in three categories + overall category “Total”);
-N_OBS
: Frequency, number of companies; -TOT
:
turnover value in euros; -MAX
: turnover of the company
which contributes the most to the cell.
Hierarchy’s file
Before performing the tab_rtauargus()
function, we have
to prepare the hierarchical information into the appropriate format for
τ-Argus, .i.e. a .hrc
file. From a correspondence
table, the write_hrc2()
function does the job for you.
Here, the correspondence table describes the nesting of the three levels of business sectors, from the most aggregated to the least one:
data(activity_corr_table)
head(activity_corr_table)
#> A10 A21 A88
#> 1 AZ A 01
#> 2 AZ A 02
#> 3 AZ X X
#> 4 BE B 06
#> 5 BE B 07
#> 6 BE B 08
hrc_file_activity <- write_hrc2(
corr_table = activity_corr_table,
file_name = "hrc/activity.hrc"
)
Primary secret
In this example, we’ll apply the primary secret ourselves, i.e. not with the help of τ-Argus. The idea is to use τ-Argus with an apriori file. For that purpose, we create a boolean variable to specify which cells don’t comply with the primary secret rules. Using the same rules as before, we get:
Running τ-Argus
Two arguments has to be added to the tab_rtauargus()
function:
-secret_var
, indicating the name of the variable in
tabular
containing the primary secret (apriori)
information; -hrc
, indicating the name of the hierarchy
file to use for ACTIVITY
variable.
Since the primary suppression was already specified, no need to use
the arguments, safety_rules
and maxscore
. The
first one is set by default to “MAN(10)”, so as to say that a 10%
Interval Protection is applied.
By default, tab_rtauargus()
runs the Modular method to
perform the secondary secret. Here, we choose to use the Optimal method
by changing the suppress
argument.
ex2 <- tab_rtauargus(
turnover_act_size,
dir_name = "tauargus_files/ex2",
files_name = "ex2",
explanatory_vars = c("ACTIVITY","SIZE"),
value = "TOT",
freq = "N_OBS",
secret_var = "is_secret_prim",
hrc = c(ACTIVITY = hrc_file_activity),
totcode = c(ACTIVITY="Total",SIZE="Total"),
suppress = "OPT(1,5)",
verbose=FALSE
)
Result
str(ex2)
#> 'data.frame': 414 obs. of 9 variables:
#> $ ACTIVITY : chr "01" "01" "02" "02" ...
#> $ SIZE : chr "Total" "tr1" "Total" "tr1" ...
#> $ N_OBS : int 18 18 387 381 6 1 1 4 4 84 ...
#> $ TOT : num 853 853 43623 35503 8120 ...
#> $ MAX : num 303 303 6212 6212 4812 ...
#> $ is_secret_freq: logi FALSE FALSE FALSE FALSE FALSE TRUE ...
#> $ is_secret_dom : logi FALSE FALSE FALSE FALSE FALSE TRUE ...
#> $ is_secret_prim: logi FALSE FALSE FALSE FALSE FALSE TRUE ...
#> $ Status : chr "V" "V" "V" "V" ...
table(ex2$Status)
#>
#> B D V
#> 77 64 273
As we can see in the table()
results, all the primary
secret has the status “B” in the output produced by τ-Argus. To adjust
this, we can do:
Protecting several tables at the same time, with
tab_muli_manager()
function
The function tab_multi_manager()
can deal with a whole
set of (linked or not) tables. It is an iterating process, performing
secondary suppression with one table at a time and it ensures that the
common cells have the same status. When a common cell is concerned by
secondary suppression, it reverberates the secret on each table that
shares this common cell. The process ends when the secondary secret is
consistent in every tables. See more details in vignettes Manage the
protection of linked tables
Data
For this example, two tables will be used :
data("turnover_act_size")
data("turnover_act_cj")
str(turnover_act_cj)
#> tibble [406 x 5] (S3: tbl_df/tbl/data.frame)
#> $ ACTIVITY: chr [1:406] "AZ" "BE" "FZ" "GI" ...
#> $ CJ : chr [1:406] "Total" "Total" "Total" "Total" ...
#> $ N_OBS : int [1:406] 405 12878 28043 62053 8135 8140 11961 41359 26686 25108 ...
#> $ TOT : num [1:406] 44475 24827613 8907311 26962063 8584917 ...
#> $ MAX : num [1:406] 6212 1442029 1065833 3084242 3957364 ...
The second tabular dataset provides the turnover of companies broken
down by business sectors (ACTIVITY
) and type of company
(CJ
). The latter has three categories and the overall
category is noted “Total”.
Since the two tables share one common explanatory variable
(ACTIVITY
), they can’t be treated separately without risk
of inconsistent protection.
Primary secret
The first step consists in indicating whether each cell complies with the primary rules, or not. A boolean variable is created, equal to TRUE if the cell doesn’t comply.
Here, we use the same rules as previously.
Running τ-Argus
Now that the primary secret has been specified for both tables, we can run the process.
ex3 <- tab_multi_manager(
list_tables = list_data_2_tabs,
list_explanatory_vars = list(
act_size = c("ACTIVITY", "SIZE"),
act_cj = c("ACTIVITY", "CJ")
),
hrc = c(ACTIVITY = hrc_file_activity),
dir_name = "tauargus_files/ex3",
value = "TOT",
freq = "N_OBS",
secret_var = "is_secret_prim",
totcode = "Total"
)
#> --- Current table to treat: act_size ---
#> --- Current table to treat: act_cj ---
#> --- Current table to treat: act_size ---
By default, the function uses a wrapper of
tab_rtauargus()
function, called
tab_rtauargus2()
, to apply secondary secret with τ-Argus.
Many default parameters are set. In particular:
- Interval Protection fixed to 10;
- Suppress method set to “MOD(1,5,1,0,0);
- Output.
When running, the function displays at each iteration which table is treated.
The vignette Manage the protection of linked tables provides
a full presentation of tab_multi_manager()
function.
About this vignette
- Authors: Julien Jamme & Nathanael Rastout
- Last update: 11/01/2024
- Version of rtauargus used: 1.2.0
- Version of τ-Argus used : TauArgus 4.2.3
- R version used : 4.2.3