The protocol is designed to run on a multi-GPU platform, with the commands provided in .slurm files.
### How the classifiers are trained
The classifiers are trained to distinguish cover images from stegos, and new stegos are generated in each batch from the corresponding cost map. Because stegos embedding a payload of any size can be generated during training, the classifier can, if desired, be trained with curriculum learning (depending on the parameters --CL and --start_emb_rate).
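
As a rough illustration of the curriculum learning schedule, the sketch below follows the description given for start_emb_rate in the parameter list (the embedding rate is decreased by 0.1 every two epochs). The helper name `curriculum_emb_rate` is hypothetical, and clamping the rate at the target emb_rate is an assumption; the actual behaviour lives in the method fit of the class Fitter in train.py.

```python
def curriculum_emb_rate(epoch, start_emb_rate, target_emb_rate, step=0.1, every=2):
    """Embedding rate to use at a given (0-indexed) epoch.

    Hypothetical helper: the rate starts at start_emb_rate and loses
    `step` every `every` epochs. Stopping at target_emb_rate is an
    assumption, not something stated in this README.
    """
    decreased = start_emb_rate - step * (epoch // every)
    # round() only hides floating-point noise such as 0.7000000000000001
    return round(max(decreased, target_emb_rate), 10)


# Example: start at 1.0 and end at the final embedding rate 0.4
schedule = [curriculum_emb_rate(e, 1.0, 0.4) for e in range(14)]
print(schedule)
```

Since the stegos are regenerated in each batch from the cost map, changing the payload only requires passing a different rate to the embedding simulator at each epoch.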
### Structure of the results of a run of the protocol
A run of the protocol is an experiment for given values of QF, emb_rate, initial cost maps, different steganalysts...
A run of the protocol saves all its outputs in a folder, which is defined by the parameter --data_dir_prot.
The organization of this folder is shown in the illustration and described in the following.
...
@@ -14,74 +14,75 @@ At the beginning, it creates file description.txt which resumes all parameters p
Adversarial images are saved in "data_adv_$i/adv_final/" and optimized cost maps in "data_adv_$i/adv_cost/".
Evaluations of the classifier with architecture $model trained at iteration $j on the adversarial images generated at iteration $i are saved in "data_adv_$i/eval_$model_$j/". There are two files: "logits.npy" of size (10000,2), containing the raw logits given by the classifier, and "probas.npy" of size (10000,), containing the stego class probabilities given by the softmax of the logits. The images are ordered as in the file --permutation_files.npy given as input to the protocol.
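
The relation between the two files can be sketched as follows. This is a minimal example, not the code of eval_classifier.py: the function name `stego_probas` is hypothetical, and it assumes that column 1 of the logits is the stego class.

```python
import numpy as np


def stego_probas(logits):
    """Softmax over the two classes, keeping only the stego probability.

    Assumption: logits has shape (n_images, 2) and column 1 is the
    stego class (column 0 being cover).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp[:, 1] / exp.sum(axis=1)


# Toy logits for three images; a real "logits.npy" has shape (10000, 2)
logits = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
probas = stego_probas(logits)
print(probas.shape)  # (3,)
```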
### Here are the steps of the protocol:
If --begin_step=0, the run of the protocol begins with an initialization, which consists of the following steps:
* training of the classifiers (depending on --model) between cover and stego, in folder 'train_$model_0/'.
* evaluation of the classifiers on both databases: cover and initial stegos.
The files produced at iteration $k are:
* adversarial stegos stored in a .npy file in the folder "data_adv_$k/adv_final/"
* optimized cost maps of shape 3*image_size*image_size stored in the folder "data_adv_$k/adv_cost/"
* evaluation of all the classifiers (all possible models trained at all previous iterations) on this new database of stegos, in "data_adv_$k/eval_$model_$i/" for i between 0 and k-1.
* index of the database of stego images (vector of shape 10000 containing integers between 0 and k) stored in "data_train_$k/index.npy"
* training of the classifier "model", stored in "train_$model_k/"
* evaluation of this new classifier on cover (in "cover/eval_$model_k/") and on all the databases of stegos saved in "data_adv_$i/adv_final/" (in "data_adv_$i/eval_$model_k/") for all i between 0 and k.
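
To make the naming scheme above concrete, the following sketch lists the evaluation folders that iteration k is expected to write. The function names and the root path "/path/to/data_dir_prot" are hypothetical; only the folder patterns come from the list above.

```python
from pathlib import PurePosixPath


def eval_dir(root, model, adv_iter, clf_iter):
    """Folder holding the evaluation of classifier clf_iter on stegos adv_iter."""
    return root / f"data_adv_{adv_iter}" / f"eval_{model}_{clf_iter}"


def expected_eval_dirs(data_dir_prot, models, k):
    """Evaluation folders produced at iteration k (k >= 1), per the list above."""
    root = PurePosixPath(data_dir_prot)
    dirs = []
    for model in models:
        # Classifiers of iterations 0..k-1 evaluated on the new stegos
        dirs += [eval_dir(root, model, k, i) for i in range(k)]
        # New classifier of iteration k evaluated on cover...
        dirs.append(root / "cover" / f"eval_{model}_{k}")
        # ...and on every stego database from 0 to k
        dirs += [eval_dir(root, model, i, k) for i in range(k + 1)]
    return dirs


dirs = expected_eval_dirs("/path/to/data_dir_prot", ["srnet"], k=2)
for d in dirs:
    print(d)
```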
# Parameters to pass in main.py:
* begin_step: first iteration of the protocol. Should be equal to 0 if you have never launched it.
* number_step: number of further iterations of the protocol to launch.
* folder_model: absolute path to the folder './models/' stored in this folder. It contains the models and the architectures of the steganalyst networks.
* data_dir_prot: folder where all the stegos and classifiers produced during the algorithm are saved.
* data_dir_cover: folder containing all cover images in the .npy format. One file per image is required.
* data_dir_stego_0: folder containing the stegos at iteration 0, in the same format as the cover images.
* cost_dir: folder containing all initial modification maps at iteration 0. Each cost map is saved in a single file in the .npy format. It has the shape
  * 3 x image_size x image_size: in that case, the first channel is the cost of -1, the second the cost of 0 and the last the cost of +1
  * or image_size x image_size: in that case, the single channel is the symmetric cost of doing -1 or +1
* strategy='minmax': one of the three available strategies: 'minmax' (default), 'random' or 'lastit'
* image_size='512': height of the square images used
* QF: quality factor of the images used. For a given QF, the quantization table should be saved in folder 'folder_model/' in the format 'c_quant_$QF.npy'
* emb_rate: embedding rate in bpnzAC (bits per non-zero AC coefficient) for JPEG images, or bpp (bits per pixel) for spatial images
* model: string naming the models against which Alice optimizes herself. 3 models are implemented: 'xunet', 'srnet' and 'efnet'. Multiple models should be separated by a comma ','. For example, models='xunet,srnet,efnet'
* version_eff: string whose value ranges from 'b0', 'b1', ... to 'b7'. This is the size of the EfficientNet model.
* stride: 1 or 2, the stride of the convolution at the beginning of EfficientNet.
* batch_size_classif_ef, batch_size_classif_sr, batch_size_classif_xu: batch size to use during the training of each classifier (EfficientNet, SRNet or XU-Net)
* batch_size_eval_ef, batch_size_eval_sr, batch_size_eval_xu: batch size to use during the evaluation of each network.
* epoch_num_ef, epoch_num_sr, epoch_num_xu: number of training epochs for each classifier
* CL_ef, CL_sr, CL_xu: 'yes' to use curriculum learning during training, 'no' otherwise.
* start_emb_rate_ef, start_emb_rate_sr, start_emb_rate_xu: if the parameter CL is set to 'yes' for the corresponding classifier, starting value of the embedding rate during training. During training, it is decreased by subtracting 0.1 every two epochs. This can be modified in the method fit of the class Fitter defined in the file train.py.
* pair_training_ef, pair_training_sr, pair_training_xu: 'yes' to use pair training during the training of each network, 'no' otherwise.
* n_iter_max_backpack: integer, maximum number of gradient descent steps in the optimization with backpack
* tau_0: float, starting value of the temperature controlling the smoothness of the softmax Gumbel distribution.
* precision: float, tolerance used in the stopping condition of the gradient descent: an inequality like f(stego) "<" f(cover) is replaced by f(stego) "<" f(cover)+precision
* N_samples: integer, number of stego samples used during the gradient descent to check the average detectability of the cost map.
* attack: 'SGE', 'DoTanh' or 'DoCoTanh', for softmax Gumbel estimation, double tanh or double compact tanh. This selects the differentiable function used to approximate the discrete changes.
* attack_last: 'no' or 'yes', controls how many previous classifiers are attacked during the gradient descent. With 'no', the attack needs to load all previous classifiers, which might not fit in GPU memory. With 'yes' (the default), it only takes the classifiers trained in the 3 previous iterations. This number can be changed in the python file script_attack.py, in the line "n_classifier_min = max(params.iteration_step-3, 0)".
* lr: float, learning rate of the ADAM optimizer used for the gradient descent. Advice: use 0.5 for QF 75 and 0.05 for QF 100.
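
The two cost map layouts accepted in cost_dir can be brought to a common 3-channel form as sketched below. This is only an illustration: the helper `as_three_channel_costs` is hypothetical, and giving a zero cost to "no change" when expanding a symmetric map is an assumption that this README does not state.

```python
import numpy as np


def as_three_channel_costs(cost_map):
    """Return a cost map of shape (3, H, W): costs of -1, 0 and +1.

    Hypothetical helper. A 2-D map is read as the symmetric cost of
    doing -1 or +1; the cost of 0 is then set to zero (assumption).
    """
    cost_map = np.asarray(cost_map, dtype=np.float64)
    if cost_map.ndim == 3 and cost_map.shape[0] == 3:
        return cost_map  # already in the (-1, 0, +1) layout
    if cost_map.ndim == 2:
        zero = np.zeros_like(cost_map)
        return np.stack([cost_map, zero, cost_map])
    raise ValueError(f"unexpected cost map shape {cost_map.shape}")


# Toy 4x4 symmetric map; real maps are image_size x image_size
symmetric = np.ones((4, 4))
costs = as_three_channel_costs(symmetric)
print(costs.shape)  # (3, 4, 4)
```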
# In this folder:
##### FOLDER:
* models: folder containing some constants, and .py files describing three different models
  * NUMPY FILES:
    * DCT_4.npy: preprocessing kernel of size 4x4 containing the DCT coefficients used in the XU-Net architecture
    * c_quant_75.npy, c_quant_95.npy, c_quant_100.npy: quantization tables for QF 75, 95 and 100
    * permutation_files.npy: permutation of the files (the first images form the training set, the next ones the validation set and the last ones the test set)
  * PYTHON FILES:
    * efficientnet.py, srnet.py and xunet.py, describing the three steganalyst models
##### PYTHON FILES:
* main.py: script to call to launch a protocol. It runs during the whole protocol and calls the other scripts to carry out the individual actions. At the beginning, it creates the file description.txt, which summarizes the parameters of the experiment.
* backpack.py: the class Backpack with SGE, which provides the output of a classifier and the gradient of this output w.r.t. the costs.
* double_tanh.py: the same class as Backpack, but with Double Tanh or Double Compact Tanh instead of SGE
* eval_classifier.py: function which evaluates a classifier on a database
* generate_train_db.py: produces the index (file index.npy) of the adversarial stegos picked by the minmax strategy. It stores the result of np.argmin(np.max(probas,axis=1),axis=0).
* data_loader.py: contains input_fn to feed the network
* script_attack.py: script which launches the gradient descent for each image.
* script_evalute_classif.py: script which launches the evaluation of any classifier on any database.
* script_train.py: script which launches the training of a classifier
* train.py: definition of the class Fitter, used for training a classifier.
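
The minmax selection stored by generate_train_db.py can be illustrated on a toy array. The expression is the one quoted above; the layout of `probas` as (stego databases, classifiers, images) is an assumption made for this sketch, chosen so that the result has one entry per image.

```python
import numpy as np

# Toy stego probabilities, assumed layout:
# probas[i, j, n] = probability given by classifier j to image n
# of the stego database generated at iteration i.
probas = np.array([
    [[0.9, 0.2], [0.8, 0.6]],  # stego database 0: classifiers 0 and 1
    [[0.5, 0.7], [0.6, 0.3]],  # stego database 1: classifiers 0 and 1
])

# Worst case over the classifiers for each database and image
worst_case = np.max(probas, axis=1)

# For each image, keep the database whose worst case is the least detectable
index = np.argmin(worst_case, axis=0)
print(index)  # [1 0]
```

For the first image, the best classifier detects database 0 with probability 0.9 and database 1 with 0.6, so the stego from database 1 is kept; for the second image the worst cases are 0.6 and 0.7, so database 0 is kept.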