# Advanced Usage: FileComparator and FileMutator

Describing file validations in YAML involves three main properties: resources, comparators, and mutators. Before listing the full format, FileComparator and FileMutator will be covered briefly.

## Overview of FileComparator

If you want finer-grained control over what counts as an acceptable file than a simple existence check, you can get that with FileComparators and FileMutators. Note that when you choose to use these options, NRG_Selenium must actually have a copy of the files to work with, so instead of simply doing a REST call to check for file existence, it will download a local copy of the files. A FileComparator (or more accurately, a subclass extending FileComparator) defines a check for whether or not a file created by a pipeline is considered acceptable.

## Overview of FileMutator

Ok, so you can compare files, except you know that the file created in XNAT has a timestamp in it, so a direct comparison won't work for you. We can solve this problem with FileMutators. A subclass extending FileMutator defines some way to change a file before a FileComparator is applied. NRG_Selenium provides:

1. ReplaceAllMutator, which replaces each of the keys in the "replacements" map (as a regex) with their corresponding value. The most common use case for this would be to replace different types of timestamps with generic Strings like "TIMESTAMP" so that the MD5 checksum would be correct and consistent.
2. DecompressGzipMutator, which simply takes a gzipped file and uncompresses it.
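To make the idea concrete, here is a minimal Python sketch of what the two mutators do conceptually. It is not NRG_Selenium's implementation: the function names `replace_all`, `decompress_gzip`, and `md5_hex` and the sample file contents are hypothetical, and Python's `re` module stands in for whatever regex engine the tool uses.

```python
import gzip
import hashlib
import re


def replace_all(data: bytes, replacements: dict[str, str]) -> bytes:
    """Replace every match of each regex key with its value (hypothetical helper)."""
    text = data.decode("utf-8")
    for pattern, value in replacements.items():
        text = re.sub(pattern, value, text)
    return text.encode("utf-8")


def decompress_gzip(data: bytes) -> bytes:
    """Return the uncompressed contents of a gzipped byte string."""
    return gzip.decompress(data)


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, as a comparator might compute after mutation."""
    return hashlib.md5(data).hexdigest()


# Two runs of the same pipeline that differ only in an 8-digit date stamp
# (made-up contents for illustration).
run_a = b"44.4 | 55.5 | 99.9 | 20140312"
run_b = b"44.4 | 55.5 | 99.9 | 20150101"
rules = {r"\d{8}": "DATE"}

mutated_a = replace_all(run_a, rules)
mutated_b = replace_all(run_b, rules)
```

Before mutation the two runs hash differently; after mutation both reduce to the same bytes, so a checksum comparison becomes stable across runs.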

## More Advanced FileComparator usage

NRG_Selenium has several FileComparator subclasses:

1. One of the simplest comparators provided is MD5_Comparator. As the name implies, the comparator computes an MD5 hash of the file, and the check passes if and only if the hash equals the value in the file's "md5" property.
2. FileSizeComparator checks the size of the generated file against the file's "expectedSize" property. An optional "tolerance" property (assumed to be 0 if not included) specifies the maximum allowed percent error between the actual size and the expected size.
3. TextComparator is satisfied if and only if the contents of the file are equal to the value in the file's "expectedText" property.
4. The next comparator is actually a family of comparators: ImageComparator. If a comparator extends this class, the pipeline file to which it is applied is compared against an imaging file known to be good. These reference files should be stored in the session resources, in a folder whose name is given by the resource's "secondaryResources" property (covered below).
   1. ImageDeviationComparator extends ImageComparator: This comparator allows two optional properties (assumed to be 0 if not supplied), "gray" and "color". "gray" is an integer giving the maximum total deviation of the pixels if the image is grayscale, and "color" is an integer giving the maximum total deviation of the RGB components of the pixels if the image is in color. More formally, denote the pixels in the $k$-th slice of the image stack with coordinates $(i, j)$ by $O_{i, j, k}$ for the original image and $G_{i, j, k}$ for the generated image. Let $m_G, m_C$ be the maximum deviations allowed for grayscale and color images, respectively. Then the comparator is considered satisfied if and only if:

      $$\sum_{i, j, k} \| O_{i, j, k} - G_{i, j, k} \|_1 \le m_G$$

      if the image is grayscale, or

      $$\sum_{i, j, k} \| O_{i, j, k} - G_{i, j, k} \|_1 \le m_C$$

      if the image is in color. In the above, $\| \cdot \|_1$ represents the 1-norm (also known as the Taxicab norm). If the image is grayscale, then the pixel is a single number, so the 1-norm simply reduces to the absolute value over $\mathbb{R}$. If the image is in color, then the pixel has 3 components (RGB), so the pixel is actually a vector in $\mathbb{R}^3$ (technically also $\mathbb{Z}^3$), so the 1-norm serves to add up the absolute value of the deviation in each component.

   2. NumberPixelsComparator extends ImageComparator: This comparator specifies the maximum number of pixels that are allowed to differ at all between the generated and original files. This is done with the "maxDifferingPixels" property, assumed to be 0 if not provided. More formally, denote the pixels in the $k$-th slice of the image stack with coordinates $(i, j)$ by $O_{i, j, k}$ for the original image and $G_{i, j, k}$ for the generated image. Let $M$ be the maximum number of pixels allowed to differ. Then the comparator is considered satisfied if and only if:

      $$\sum_{i, j, k} d(O_{i, j, k}, G_{i, j, k}) \le M$$

      where the above metric $d$ is the discrete metric:

      $$d(a, b) = \begin{cases} 0 & \text{if } a = b \\ 1 & \text{if } a \ne b \end{cases}$$

   3. PercentPixelsComparator extends ImageComparator: This comparator specifies the maximum percentage of pixels (e.g. 2.5) that are allowed to differ at all between the generated and original files. This is done with the "maxPercentError" property, assumed to be 0 if not provided. More formally, suppose the image is a stack of $z$ slices with resolution $x \times y$. Denote the pixels in the $k$-th slice of the image stack with coordinates $(i, j)$ by $O_{i, j, k}$ for the original image and $G_{i, j, k}$ for the generated image. Let $P$ be the maximum percent of pixels allowed to differ. Then the comparator is considered satisfied if and only if:

      $$\frac{100}{x y z} \sum_{i, j, k} d(O_{i, j, k}, G_{i, j, k}) \le P$$

      where the above metric $d$ is the discrete metric:

      $$d(a, b) = \begin{cases} 0 & \text{if } a = b \\ 1 & \text{if } a \ne b \end{cases}$$

   4. PixelClusterComparator extends ImageComparator: This comparator takes a bit of a different approach than its siblings. The "maxClusterSize" property should specify the maximum number of differing pixels in a "cluster" for the whole image (a cluster is a set of adjacent pixels such that from any one pixel in the cluster, you can reach any other pixel by moving 1 unit in the $x$ or $y$ direction at a time without leaving the cluster). More formally, suppose the image is a stack of $z$ slices. Let $V_i$ be the set of coordinates in $\mathbb{Z}^2$ such that the pixels at that coordinate differ in the $i$-th slice of the original and generated images. Define an edge set $E_i$ such that $\{v_1, v_2\} \in E_i$ if and only if $\|v_1 - v_2\|_1 = 1$. Define a graph $G_i = (V_i, E_i)$ (here $G_i$ denotes a graph, not the generated image). Denote the number of vertices in the largest connected component of $G_i$ by $M_{cc}(G_i)$. Finally, let $M$ be the maximum allowed size of a cluster. Then, the comparator is satisfied if and only if:

      $$\max_{1 \le i \le z} M_{cc}(G_i) \le M$$
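
The four image checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NRG_Selenium's code: an image stack is modeled as a nested list (slices, then rows, then pixels), a pixel is an `int` for grayscale or a 3-tuple for RGB, and every function name is hypothetical.

```python
from collections import deque


def pixel_deviation(o, g):
    """1-norm of the difference between two pixels (int or RGB tuple)."""
    if isinstance(o, tuple):
        return sum(abs(a - b) for a, b in zip(o, g))
    return abs(o - g)


def pixel_pairs(original, generated):
    """Yield (o, g) for every pixel position in two equally shaped stacks."""
    for slice_o, slice_g in zip(original, generated):
        for row_o, row_g in zip(slice_o, slice_g):
            yield from zip(row_o, row_g)


def total_deviation(original, generated):
    """Quantity compared against "gray" or "color"."""
    return sum(pixel_deviation(o, g) for o, g in pixel_pairs(original, generated))


def differing_pixels(original, generated):
    """Quantity compared against "maxDifferingPixels"."""
    return sum(1 for o, g in pixel_pairs(original, generated) if o != g)


def percent_differing(original, generated):
    """Quantity compared against "maxPercentError"."""
    total = sum(1 for _ in pixel_pairs(original, generated))
    if total == 0:
        return 0.0
    return 100.0 * differing_pixels(original, generated) / total


def largest_cluster(original, generated):
    """Largest 4-connected group of differing pixels in any single slice."""
    best = 0
    for slice_o, slice_g in zip(original, generated):
        differing = {
            (i, j)
            for i, (row_o, row_g) in enumerate(zip(slice_o, slice_g))
            for j, (o, g) in enumerate(zip(row_o, row_g))
            if o != g
        }
        while differing:
            # Breadth-first search over one connected component.
            queue, size = deque([differing.pop()]), 1
            while queue:
                i, j = queue.popleft()
                for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if n in differing:
                        differing.remove(n)
                        queue.append(n)
                        size += 1
            best = max(best, size)
    return best


# A single 3x3 grayscale slice with three differing pixels (made-up data).
original = [[[0, 0, 0], [0, 0, 0], [0, 0, 0]]]
generated = [[[1, 2, 0], [0, 0, 0], [0, 0, 5]]]
```

Each comparator would then pass when the computed quantity is at most its configured limit, for example `total_deviation(original, generated) <= gray` for a grayscale image.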

## Defining the format

The comparators and mutators properties are maps where the keys are simple strings that can be used within the resources block to reference the appropriate object. At the root level, the "type" of resources should be specified, along with any additional data to locate the resources in question:

| "type" | Meaning | Extra config required |
|---|---|---|
| assessor_xsi | Resources attached to an assessor under the session | "xsiType": specifying the xsiType of the assessor |
| session | Resources under the session | none |
| scan | Resources attached to a scan within the session | "scanId": specifying the scan |

The "resources" property should be a list of objects with the following properties:

| Property | Required | Meaning/usage |
|---|---|---|
| "folder" | true | The name of the resource folder to find. |
| "regex" | false | If true, specifies that the "folder" property is a regex to find the actual name. |
| "secondaryResources" | false | The name of a resource folder under the session where source files are stored to use for comparison. |
| "files" | false | A string list of files for which no explicit comparison is required (other than that they were created). |
| "complexFiles" | false | A list of objects defined in the following table. |

A file defined in "complexFiles" should have the following properties:

| Property | Required | Meaning/usage |
|---|---|---|
| "name" | true | The name of the file to find. |
| "regex" | false | If true, specifies that the "name" property is a regex to find the actual name. |
| "comparator" | false | The key for the FileComparator to use. |
| "mutator" | false | The key for the FileMutator to use. |
| "md5" | false | The MD5 checksum for the file. |
| "expectedText" | false | The expected contents of the file. |
| "expectedSize" | false | The expected size of the file in bytes. |
| "compareTo" | false | The name of the file in "secondaryResources" to which this file should be compared, if there is a name mismatch. |
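
As a rough illustration of how the "md5", "expectedText", and "expectedSize" properties relate to the non-image comparators, here is a small Python sketch. The function names and the `spec` dictionary are hypothetical, and it assumes the size tolerance is a percent error measured relative to "expectedSize"; the actual tool may compute it differently.

```python
import hashlib


def md5_matches(data: bytes, spec: dict) -> bool:
    """Pass if the file's MD5 hex digest equals the "md5" property."""
    return hashlib.md5(data).hexdigest() == spec["md5"].lower()


def text_matches(data: bytes, spec: dict) -> bool:
    """Pass if the file contents equal the "expectedText" property."""
    return data.decode("utf-8") == spec["expectedText"]


def size_matches(data: bytes, spec: dict, tolerance: float = 0.0) -> bool:
    """Pass if the percent error against "expectedSize" is within tolerance.

    Assumption: the error is taken relative to the expected size.
    """
    expected = spec["expectedSize"]
    if expected == 0:
        return len(data) == 0
    percent_error = abs(len(data) - expected) / expected * 100.0
    return percent_error <= tolerance


# Made-up file contents and a matching file entry for illustration.
data = b"44.4 | 55.5 | 99.9 | DATE"
spec = {
    "name": "generated_values.txt",
    "md5": hashlib.md5(data).hexdigest(),
    "expectedText": "44.4 | 55.5 | 99.9 | DATE",
    "expectedSize": len(data),
}
```

With a tolerance of 0 the size check is an exact match; raising the tolerance lets the generated file drift by that percentage before the check fails.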

The FileComparators each have a unique "type" that defines which subclass to instantiate, along with possibly other data needed:

| FileComparator | "type" | Other data |
|---|---|---|
| MD5_Comparator | MD5 | none |
| FileSizeComparator | FileSize | none |
| TextComparator | TextEquals | none |
| ImageDeviationComparator | ImageDeviation | "gray" and "color" nonnegative integers |
| NumberPixelsComparator | NumPixels | "maxDifferingPixels" nonnegative integer |
| PercentPixelsComparator | PercentPixels | "maxPercentError" nonnegative double |
| PixelClusterComparator | Cluster | "maxClusterSize" nonnegative integer |

The FileMutators are analogous:

| FileMutator | "type" | Other data |
|---|---|---|
| DecompressGzipMutator | ungzip | none |
| ReplaceAllMutator | replaceAll | "replacements" string-string map |

## Putting it all together

A fairly diverse example is listed below:

```yaml
type: assessor_xsi
xsiType: 'xnat:qcAssessmentData'
resources:
  - folder: DATA
    secondaryResources: QC_files
    complexFiles:
      - name: generated_values.txt
        comparator: text_equals
        mutator: timestamp
        expectedText: '44.4 | 55.5 | 99.9 | DATE'
      - name: snapshot.png
        comparator: images_equal
        compareTo: original_snapshot.png
      - name: generated.nii.gz
        comparator: images_equal
        mutator: ungzip
    files:
      - otherdata1.txt
      - otherdata2.txt
  - folder: LOG
    files:
      - logfile.log
mutators:
  ungzip:
    type: ungzip
  timestamp:
    type: replaceAll
    replacements:
      '\d{8}': 'DATE'
comparators:
  text_equals:
    type: TextEquals
  images_equal:
    type: ImageDeviation
    gray: 0
    color: 0
```
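
Because files refer to comparators and mutators by key, a typo in a key is an easy mistake to make. The following Python sketch shows one way to check a parsed configuration for undefined keys. It assumes the YAML has already been loaded into a plain dictionary (abbreviated here from the example above), and the function `undefined_references` is hypothetical, not part of NRG_Selenium.

```python
def undefined_references(config: dict) -> list[tuple[str, str, str]]:
    """Return (file name, property, key) for every key with no definition."""
    problems = []
    for resource in config.get("resources", []):
        for entry in resource.get("complexFiles", []):
            for prop, section in (("comparator", "comparators"),
                                  ("mutator", "mutators")):
                key = entry.get(prop)
                if key is not None and key not in config.get(section, {}):
                    problems.append((entry["name"], prop, key))
    return problems


# The example configuration as it would look after YAML parsing (abbreviated).
config = {
    "type": "assessor_xsi",
    "xsiType": "xnat:qcAssessmentData",
    "resources": [
        {
            "folder": "DATA",
            "secondaryResources": "QC_files",
            "complexFiles": [
                {"name": "generated_values.txt", "comparator": "text_equals",
                 "mutator": "timestamp"},
                {"name": "snapshot.png", "comparator": "images_equal",
                 "compareTo": "original_snapshot.png"},
                {"name": "generated.nii.gz", "comparator": "images_equal",
                 "mutator": "ungzip"},
            ],
        },
        {"folder": "LOG", "files": ["logfile.log"]},
    ],
    "mutators": {
        "ungzip": {"type": "ungzip"},
        "timestamp": {"type": "replaceAll", "replacements": {r"\d{8}": "DATE"}},
    },
    "comparators": {
        "text_equals": {"type": "TextEquals"},
        "images_equal": {"type": "ImageDeviation", "gray": 0, "color": 0},
    },
}

# Same configuration with the comparators block removed, to show a failure.
broken = {**config, "comparators": {}}
```

Running `undefined_references(config)` on the complete example returns an empty list, while the `broken` variant reports each file whose comparator key no longer resolves.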
