Prepares all the files needed by Tau-Argus, launches the software with the appropriate settings, and returns the result.
Usage
tab_rtauargus(
tabular,
explanatory_vars,
files_name = NULL,
dir_name = NULL,
totcode = getOption("rtauargus.totcode"),
hrc = NULL,
secret_var = NULL,
secret_no_pl = NULL,
cost_var = NULL,
value = "value",
freq = "freq",
ip = 10,
maxscore = NULL,
suppress = "MOD(1,5,1,0,0)",
safety_rules = paste0("MAN(", ip, ")"),
show_batch_console = FALSE,
output_type = 4,
output_options = "",
unif_labels = TRUE,
split_tab = FALSE,
nb_tab_option = "smart",
limit = 14700,
...
)
Arguments
- tabular
data.frame containing the tabulated data and an additional boolean variable indicating the primary secret.
- explanatory_vars
Vector of explanatory (categorical) variables.
Example: c("A21", "TREFF", "REG") for a table crossing A21 x TREFF x REG.
- files_name
String used to name all the files needed for the process. All files share the same name; only their extensions differ.
- dir_name
String indicating the path of the directory in which to save all the files (.rda, .hst, .txt, .arb, .csv) generated by the function.
- totcode
Code(s) representing the total of a categorical variable (see section 'Specific parameters' for this parameter's syntax). If unspecified for a variable (neither by default nor explicitly), it is set to the value of the rtauargus.totcode option.
- hrc
Information on hierarchical variables (see section 'Hierarchical variables').
- secret_var
Name of the boolean variable specifying the secret, primary or not: TRUE if the cell must be masked, FALSE otherwise. It is exported to the apriori file.
- secret_no_pl
Name of a boolean variable indicating the cells to which the protection levels will not be applied. If secret_no_pl = NULL (default), the protection levels are applied to every cell whose secret_var status is TRUE.
- cost_var
Numeric variable used to change the suppression cost of a cell, which is taken into account by the secondary suppression algorithms. By default, the cost is the value of the cell. It can be specified for each cell; use NA for the cells whose cost does not need to be changed.
- value
Name of the column containing the value of the cells.
- freq
Name of the column containing the cell frequency.
- ip
Value of the safety margin (protection interval) in %; must be an integer.
- maxscore
Name of the column containing the value of the largest contributor to a cell.
- suppress
Algorithm for secondary suppression (Tau-Argus batch syntax), along with its parameters, if any.
- safety_rules
Rule(s) for primary suppression, as a character string in Tau-Argus batch syntax. If the primary suppression has been handled with an apriori file, specify a manual safety range, for example "MAN(10)".
- show_batch_console
Whether to display the batch progress in the console.
- output_type
Type of the output file (Tau-Argus codification). The package default is "2" (csv for pivot-table). For SBS files use "4", which is the default of this function.
- output_options
Additional options for the output file. The package default is "AS+" (print status). To specify no options, use "", which is the default of this function.
- unif_labels
Boolean: whether the explanatory variables have to be standardized.
- split_tab
Boolean: whether to reduce the dimension to 3 when processing a table of dimension 4 or 5 (defaults to FALSE).
- nb_tab_option
Strategy used to choose variables automatically when splitting: "min": minimize the number of tables; "max": maximize the number of tables; "smart": minimize the number of tables under the constraint of their row count.
- limit
Numeric, used to choose which variables to merge (if nb_tab_option = "smart") and to split any table whose number of rows exceeds this limit, in order to avoid Tau-Argus failures.
- ...
Any parameter of the tab_rda, tab_arb or run_arb functions that is relevant to the processing of tabular.
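As a minimal sketch of how some of these arguments fit together, the snippet below builds a toy tabular with a boolean primary-secret column and shows how the default safety_rules string is derived from ip. The toy data and its column names are hypothetical and only illustrate the expected shapes; the thresholds mirror those used in the Examples section, and no call to Tau-Argus is made.

```r
# Toy tabulated data (hypothetical values), with the total row included
tabular <- data.frame(
  ACTIVITY = c("A", "B", "Total"),
  TOT      = c(100, 50, 150),  # cell value        -> value = "TOT"
  N_OBS    = c(2, 10, 12),     # cell frequency    -> freq = "N_OBS"
  MAX      = c(90, 10, 90),    # largest contributor -> maxscore = "MAX"
  stringsAsFactors = FALSE
)

# Boolean primary-secret flag, to be passed as secret_var = "is_secret_prim":
# frequency rule (fewer than 3 units) or dominance rule (share above 85%)
tabular$is_secret_prim <- (tabular$N_OBS > 0 & tabular$N_OBS < 3) |
  (tabular$MAX / tabular$TOT > 0.85)

# Default safety rule built from ip, as in the Usage section
ip <- 10
safety_rules <- paste0("MAN(", ip, ")")
```

Here only the first row is flagged, and safety_rules evaluates to "MAN(10)", the manual safety range suggested when primary suppression is supplied through an apriori file.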
Value
If output_type equals 4 and split_tab = FALSE, the original tabular is returned with a new column called Status, giving the status of each cell as reported by Tau-Argus: "A" for a primary secret due to the frequency rule, "B" for a primary secret due to the dominance rule, "D" for a secondary secret and "V" for a cell that is not secret.
If split_tab = TRUE, the original tabular is returned with additional boolean columns indicating the status of each cell at each iteration of the protection process, as with the tab_multi_manager() function. TRUE denotes a cell that has to be suppressed. The last column gives the final status of the suppression process for the original table.
If split_tab = FALSE and output_type is not 4, the raw result from Tau-Argus is returned.
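The Status codes returned when output_type is 4 and split_tab = FALSE can be decoded with a simple lookup. The sketch below assumes a hypothetical result data.frame named res with a Status column; it does not call Tau-Argus.

```r
# Meaning of the Status codes described above
status_labels <- c(
  A = "primary secret (frequency rule)",
  B = "primary secret (dominance rule)",
  D = "secondary secret",
  V = "not secret"
)

# Hypothetical result, standing in for the output of tab_rtauargus()
res <- data.frame(
  ACTIVITY = c("A", "B", "Total"),
  Status   = c("A", "D", "V"),
  stringsAsFactors = FALSE
)

# Human-readable label and a boolean flag for suppressed cells
res$status_label  <- unname(status_labels[res$Status])
res$is_suppressed <- res$Status != "V"
```

A boolean flag of this kind is convenient when the masked table has to be filtered or when the number of suppressed cells has to be counted.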
Standardization of explanatory variables and hierarchies
The boolean argument unif_labels helps prevent some common errors when using Tau-Argus. Tau-Argus requires that, within a given level of a hierarchy, all labels have the same number of characters. When the argument is set to TRUE, tab_rtauargus standardizes the explanatory variables to prevent this issue.
Hierarchical explanatory variables (those associated with an hrc file) are then modified in the tabular data, and a new hrc file consistent with the modified tabular is created. In the output, these modifications are removed.
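To illustrate the constraint that standardization addresses, the sketch below pads a set of labels to a common number of characters. The helper pad_labels and the padding character are hypothetical; this shows the general idea only and is not the package's actual implementation.

```r
# Hypothetical helper: right-pad labels so they all share the same width
pad_labels <- function(labels, pad = "_") {
  width <- max(nchar(labels))
  # strrep() is vectorized over the number of repetitions
  paste0(labels, strrep(pad, width - nchar(labels)))
}

# Labels of unequal length within one level of a hierarchy
padded <- pad_labels(c("A", "AB", "ABC"))
```

After padding, every label has three characters, so labels within the level satisfy the equal-length requirement described above.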
Examples
if (FALSE) {
  library(dplyr)
  data(turnover_act_size)

  # Prepare data with primary secret ----
  turnover_act_size <- turnover_act_size %>%
    mutate(
      is_secret_freq = N_OBS > 0 & N_OBS < 3,
      is_secret_dom = ifelse(MAX == 0, FALSE, MAX / TOT > 0.85),
      is_secret_prim = is_secret_freq | is_secret_dom
    )

  # Make hrc file of business sectors ----
  data(activity_corr_table)
  hrc_file_activity <- activity_corr_table %>%
    write_hrc2(file_name = "hrc/activity")

  # Compute the secondary secret ----
  # Path to the Tau-Argus executable (adapt to your installation)
  options(
    rtauargus.tauargus_exe =
      "Y:/Logiciels/TauArgus/TauArgus4.2.3/TauArgus.exe"
  )
  res <- tab_rtauargus(
    tabular = turnover_act_size,
    files_name = "turn_act_size",
    dir_name = "tauargus_files",
    explanatory_vars = c("ACTIVITY", "SIZE"),
    hrc = c(ACTIVITY = hrc_file_activity),
    totcode = c(ACTIVITY = "Total", SIZE = "Total"),
    secret_var = "is_secret_prim",
    value = "TOT",
    freq = "N_OBS",
    verbose = FALSE
  )

  # Reduce dims feature ----
  data(datatest1)
  res_dim4 <- tab_rtauargus(
    tabular = datatest1,
    dir_name = "tauargus_files",
    explanatory_vars = c("A10", "treff", "type_distrib", "cj"),
    totcode = rep("Total", 4),
    secret_var = "is_secret_prim",
    value = "pizzas_tot_abs",
    freq = "nb_obs_rnd",
    split_tab = TRUE
  )
}