Advanced Usage: FileComparator and FileMutator
Describing file validations in YAML involves three main properties: resources, comparators, and mutators. Before listing the full format, FileComparator and FileMutator are covered briefly.
Overview of FileComparator
If you want finer-grained control over what is considered acceptable for the files, you can get it with FileComparators and FileMutators. Note that these options require an actual copy of the files, so instead of simply making a REST call to check for file existence, NRG_Selenium will download a local copy of the files to work with. A FileComparator (or more accurately, a subclass extending FileComparator) defines a check for whether or not a file created by a pipeline is considered acceptable.
Overview of FileMutator
Ok, so you can compare files, but suppose you know that the file created in XNAT has a timestamp in it, so a direct comparison of its contents won't work for you. We can solve this problem with FileMutators. A subclass extending FileMutator defines some way to change a file before a FileComparator is applied. NRG_Selenium provides:
- ReplaceAllMutator, which replaces each of the keys in the "replacements" map (as a regex) with their corresponding value. The most common use case for this would be to replace different types of timestamps with generic Strings like "TIMESTAMP" so that the MD5 checksum would be correct and consistent.
- DecompressGzipMutator, which simply takes a gzipped file and decompresses it.
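The effect of the two mutators can be illustrated with a minimal Python sketch. This is not NRG_Selenium's implementation (which is a set of FileMutator subclasses); the function names `replace_all` and `decompress_gzip` and the sample date stamps are hypothetical, chosen only to show why mutating before comparing gives a consistent checksum.

```python
import gzip
import hashlib
import re


def replace_all(text, replacements):
    """Apply each regex -> value pair in turn, like a "replacements" map."""
    for pattern, value in replacements.items():
        text = re.sub(pattern, value, text)
    return text


def decompress_gzip(data):
    """Return the decompressed bytes of a gzipped payload."""
    return gzip.decompress(data)


# Hypothetical output of two pipeline runs that differ only in a date stamp.
run_a = "44.4 | 55.5 | 99.9 | 20150301"
run_b = "44.4 | 55.5 | 99.9 | 20150302"
replacements = {r"\d{8}": "DATE"}

mutated_a = replace_all(run_a, replacements)
mutated_b = replace_all(run_b, replacements)

# After mutation the MD5 checksums agree even though the raw texts do not.
md5_a = hashlib.md5(mutated_a.encode("utf-8")).hexdigest()
md5_b = hashlib.md5(mutated_b.encode("utf-8")).hexdigest()

# Round trip through gzip to show what the decompression step recovers.
restored = decompress_gzip(gzip.compress(run_a.encode("utf-8")))
```

Both mutated strings become `44.4 | 55.5 | 99.9 | DATE`, so any comparator applied afterwards sees identical input regardless of when the pipeline ran.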
More Advanced FileComparator usage
NRG_Selenium has several FileComparator subclasses:
- One of the simplest comparators provided is MD5_Comparator. As the name implies, the comparator computes the MD5 hash of the file, and the check passes if and only if the hash equals the value of the file's "md5" property.
- FileSizeComparator allows an optional "tolerance" property (assumed to be 0 if not included) to specify the maximum allowed percent error between the actual size of the generated file and the expected size (as listed in the file's "expectedSize" property).
- TextComparator is satisfied if and only if the contents of the file are equal to the value in the file's "expectedText" property.
- The next comparator is actually a family of comparators: ImageComparator. If a comparator extends this class, the pipeline file to which it is applied is compared against an imaging file known to be good. These reference files should be stored in the session resources, in a folder whose name is given by the resource's "secondaryResources" property (covered below).
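The three non-image checks in the list above (MD5, file size, text equality) can be sketched in Python as follows. This is an illustration only, not NRG_Selenium's code: the function names are hypothetical, the percent error is assumed to be measured relative to the expected size, and the text is assumed to be UTF-8.

```python
import hashlib


def md5_matches(data, expected_md5):
    """Pass if and only if the MD5 hex digest equals the expected value."""
    return hashlib.md5(data).hexdigest() == expected_md5.lower()


def size_within_tolerance(actual_size, expected_size, tolerance_percent=0.0):
    """Pass if the percent error is at most the tolerance.

    Assumption: the error is taken relative to the expected size.
    """
    if expected_size == 0:
        return actual_size == 0
    percent_error = abs(actual_size - expected_size) / expected_size * 100.0
    return percent_error <= tolerance_percent


def text_equals(data, expected_text, encoding="utf-8"):
    """Pass if and only if the decoded contents equal the expected text."""
    return data.decode(encoding) == expected_text


contents = b"44.4 | 55.5 | 99.9 | DATE"
checks = {
    "md5": md5_matches(contents, hashlib.md5(contents).hexdigest()),
    "size": size_within_tolerance(1020, 1000, tolerance_percent=2.5),
    "text": text_equals(contents, "44.4 | 55.5 | 99.9 | DATE"),
}
```

With a tolerance of 2.5, a 1020-byte file passes against an expected size of 1000 bytes (2% error), while a 1030-byte file (3% error) would not.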
ImageDeviationComparator extends ImageComparator: This comparator allows two optional properties (assumed to be 0 if not supplied): "gray" and "color". The first is an integer giving the maximum total deviation of the pixels if the image is grayscale, and the second is an integer giving the maximum total deviation of the RGB components of the pixels if the image is in color. More formally, denote the pixel in the $k$-th slice of the image stack with coordinates $(i, j)$ by $p_{ijk}$ for the original image and $q_{ijk}$ for the generated image. Let $D_{\text{gray}}$ and $D_{\text{color}}$ be the maximum deviations allowed for grayscale and color images, respectively. Then the comparator is considered satisfied if and only if:

$$\sum_{i,j,k} \lVert p_{ijk} - q_{ijk} \rVert_1 \le D_{\text{gray}}$$

if the image is grayscale, or

$$\sum_{i,j,k} \lVert p_{ijk} - q_{ijk} \rVert_1 \le D_{\text{color}}$$

if the image is in color. In the above, $\lVert \cdot \rVert_1$ represents the 1-norm (also known as the Taxicab norm). If the image is grayscale, then the pixel is a single number, so the 1-norm simply reduces to the absolute value over $\mathbb{R}$. If the image is in color, then the pixel has 3 components (RGB), so the pixel is actually a vector in $\mathbb{R}^3$ (technically also $\mathbb{Z}^3$), so the 1-norm serves to add up the absolute value of the deviation in each component.
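As a rough illustration of the total-deviation check, the sketch below sums the 1-norm of the pixel differences over a small image stack. It is not the library's implementation: the nested-list image layout (slices, then rows, then pixels), the use of tuples for RGB pixels, and the function names are all assumptions made for the example.

```python
def total_deviation(original, generated):
    """Sum the 1-norm of the pixel differences over a stack of slices.

    Each image is a list of slices, each slice a list of rows, each row a
    list of pixels. A grayscale pixel is an int; a color pixel is an
    (R, G, B) tuple.
    """
    total = 0
    for slice_a, slice_b in zip(original, generated):
        for row_a, row_b in zip(slice_a, slice_b):
            for p, q in zip(row_a, row_b):
                if isinstance(p, tuple):
                    # Color: add the absolute deviation of each component.
                    total += sum(abs(a - b) for a, b in zip(p, q))
                else:
                    # Grayscale: the 1-norm is just the absolute value.
                    total += abs(p - q)
    return total


def deviation_ok(original, generated, gray=0, color=0):
    """Compare the total deviation against the limit for the image kind."""
    is_color = isinstance(original[0][0][0], tuple)
    limit = color if is_color else gray
    return total_deviation(original, generated) <= limit


# One 2x2 grayscale slice: deviations of 2 and 1, so the total is 3.
gray_a = [[[10, 20], [30, 40]]]
gray_b = [[[10, 22], [29, 40]]]

# One 1x2 color slice: deviations of 5 (red) and 3 (blue), so the total is 8.
color_a = [[[(255, 0, 0), (0, 0, 0)]]]
color_b = [[[(250, 0, 3), (0, 0, 0)]]]
```

Here `deviation_ok(gray_a, gray_b, gray=3)` passes and `gray=2` fails; the color pair needs `color` of at least 8.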
NumberPixelsComparator extends ImageComparator: This comparator specifies the maximum number of pixels that are allowed to differ at all between the generated and original files. This is done with the "maxDifferingPixels" property, assumed to be 0 if not provided. More formally, denote the pixel in the $k$-th slice of the image stack with coordinates $(i, j)$ by $p_{ijk}$ for the original image and $q_{ijk}$ for the generated image. Let $N$ be the maximum number of pixels allowed to differ. Then the comparator is considered satisfied if and only if:

$$\sum_{i,j,k} d(p_{ijk}, q_{ijk}) \le N$$

where the above metric $d$ is the discrete metric:

$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \ne y \end{cases}$$
PercentPixelsComparator extends ImageComparator: This comparator specifies the maximum percentage of pixels that are allowed to differ at all between the generated and original files (e.g. 2.5). This is done with the "maxPercentError" property, assumed to be 0 if not provided. More formally, suppose the image is a stack of $n$ slices with resolution $w \times h$. Denote the pixel in the $k$-th slice of the image stack with coordinates $(i, j)$ by $p_{ijk}$ for the original image and $q_{ijk}$ for the generated image. Let $P$ be the maximum percent of pixels allowed to differ. Then the comparator is considered satisfied if and only if:

$$\frac{100}{w \, h \, n} \sum_{i,j,k} d(p_{ijk}, q_{ijk}) \le P$$

where the above metric $d$ is the discrete metric:

$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \ne y \end{cases}$$
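The two pixel-counting checks (an absolute count and a percentage) differ only in how the count of differing pixels is normalized, which a short Python sketch can make concrete. The nested-list image layout and the function names are assumptions for illustration, not NRG_Selenium's API.

```python
def count_differing_pixels(original, generated):
    """Count pixels that differ at all (the discrete metric summed up)."""
    return sum(
        1
        for slice_a, slice_b in zip(original, generated)
        for row_a, row_b in zip(slice_a, slice_b)
        for p, q in zip(row_a, row_b)
        if p != q
    )


def total_pixels(image):
    """Number of pixels in the whole stack (slices x rows x columns)."""
    return sum(len(row) for image_slice in image for row in image_slice)


def num_pixels_ok(original, generated, max_differing_pixels=0):
    return count_differing_pixels(original, generated) <= max_differing_pixels


def percent_pixels_ok(original, generated, max_percent_error=0.0):
    differing = count_differing_pixels(original, generated)
    percent = 100.0 * differing / total_pixels(original)
    return percent <= max_percent_error


# Two 2x2 slices (8 pixels in total); exactly one pixel differs.
original = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
generated = [[[0, 0], [0, 7]], [[0, 0], [0, 0]]]
```

One differing pixel out of eight is 12.5%, so this pair passes with "maxDifferingPixels" of 1 or "maxPercentError" of 12.5, and fails with the default of 0 for either. Note that the size of the difference (7 here) does not matter, only that the pixels differ.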
PixelClusterComparator extends ImageComparator: This comparator takes a bit of a different approach than its siblings. The "maxClusterSize" property should specify the maximum number of differing pixels in a "cluster" for the whole image (a cluster is a string of pixels adjacent to each other such that from any one pixel in the cluster, you can reach any other pixel by moving 1 unit in the $x$ or $y$ direction at a time without leaving the cluster). More formally, suppose the image is a stack of $n$ slices. Let $V_k$ be the set of coordinates in $\mathbb{Z}^2$ such that the pixels at that coordinate differ in the $k$-th slice of the original and generated images. Define an edge set $E_k$ such that $\{u, v\} \in E_k$ if and only if $\lVert u - v \rVert_1 = 1$. Define a graph $G_k = (V_k, E_k)$. Denote the number of vertices in the largest connected component of $G_k$ by $c_k$. Finally, let $M$ be the maximum allowed size of a cluster. Then, the comparator is satisfied if and only if:

$$\max_{1 \le k \le n} c_k \le M$$
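The cluster definition amounts to finding the largest 4-connected component of differing pixels, which can be sketched in Python with a simple flood fill. This is an illustrative sketch rather than the library's implementation; the function names and nested-list image layout are hypothetical, and it assumes clusters are measured within each slice separately (adjacency only in the x and y directions).

```python
def largest_cluster(slice_a, slice_b):
    """Size of the largest 4-connected group of differing pixels in a slice."""
    differing = {
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(slice_a, slice_b))
        for x, (p, q) in enumerate(zip(row_a, row_b))
        if p != q
    }
    largest = 0
    while differing:
        # Flood-fill one connected component, removing pixels as they are seen.
        stack = [differing.pop()]
        size = 0
        while stack:
            x, y = stack.pop()
            size += 1
            for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbor in differing:
                    differing.remove(neighbor)
                    stack.append(neighbor)
        largest = max(largest, size)
    return largest


def cluster_ok(original, generated, max_cluster_size=0):
    """Pass if no slice contains a cluster larger than the limit."""
    return all(
        largest_cluster(a, b) <= max_cluster_size
        for a, b in zip(original, generated)
    )


# One 4x3 slice. The differing pixels form an L-shaped cluster of size 3 in
# the top-left corner plus two isolated pixels that touch only diagonally.
slice_a = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
slice_b = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
```

Five pixels differ in this example, but the largest cluster has size 3 because diagonal neighbors do not count as adjacent, so a "maxClusterSize" of 3 passes while 2 fails.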
Defining the format
The comparators and mutators properties are maps where the keys are simple strings that can be used within the resources block to reference the appropriate object. At the root level, the "type" of resources should be specified, along with any additional data to locate the resources in question:
"type" | Meaning | Extra config required |
---|---|---|
assessor_xsi | Resources attached to an assessor under the session | "xsiType": specifying the xsiType of the assessor |
session | Resources under the session | none |
scan | Resources attached to a scan within the session | "scanId": specifying the scan |
The "resources" property should be a list of objects with the following properties:
Property | Required | Meaning/usage |
---|---|---|
"folder" | true | The name of the resource folder to find. |
"regex" | false | If true, specifies that the "folder" property is a regex to find the actual name. |
"secondaryResources" | false | The name of a resource folder under the session where source files are stored to use for comparison. |
"files" | false | A string list of files for which no explicit comparison is required (other than they were created). |
"complexFiles" | false | A list of objects defined in the following table... |
A file defined in "complexFiles" should have the following properties:
Property | Required | Meaning/usage |
---|---|---|
"name" | true | The name of the file to find. |
"regex" | false | If true, specifies that the "name" property is a regex to find the actual name. |
"comparator" | false | The key for the FileComparator to use. |
"mutator" | false | The key for the FileMutator to use. |
"md5" | false | The MD5 checksum for the file. |
"expectedText" | false | The expected contents of the file. |
"expectedSize" | false | The expected size of the file in bytes. |
"compareTo" | false | The name of the file in "secondaryResources" to which this file should be compared, if there is a name mismatch. |
The FileComparators each have a unique "type" that defines which subclass to instantiate, along with possibly other data needed:
FileComparator | "type" | Other data |
---|---|---|
MD5_Comparator | MD5 | none |
FileSizeComparator | FileSize | "tolerance" nonnegative number (optional, assumed to be 0) |
TextComparator | TextEquals | none |
ImageDeviationComparator | ImageDeviation | "gray" and "color" nonnegative integers |
NumberPixelsComparator | NumPixels | "maxDifferingPixels" nonnegative integer |
PercentPixelsComparator | PercentPixels | "maxPercentError" nonnegative double |
PixelClusterComparator | Cluster | "maxClusterSize" nonnegative integer |
The FileMutators are analogous:
FileMutator | "type" | Other data |
---|---|---|
DecompressGzipMutator | ungzip | none |
ReplaceAllMutator | replaceAll | "replacements" string-string map |
Putting it all together
A fairly diverse example is listed below:

type: assessor_xsi
xsiType: 'xnat:qcAssessmentData'
resources:
  - folder: DATA
    secondaryResources: QC_files
    complexFiles:
      - name: generated_values.txt
        comparator: text_equals
        mutator: timestamp
        expectedText: '44.4 | 55.5 | 99.9 | DATE'
      - name: snapshot.png
        comparator: images_equal
        compareTo: original_snapshot.png
      - name: generated.nii.gz
        comparator: images_equal
        mutator: ungzip
    files:
      - otherdata1.txt
      - otherdata2.txt
  - folder: LOG
    files:
      - logfile.log
mutators:
  ungzip:
    type: ungzip
  timestamp:
    type: replaceAll
    replacements:
      '\d{8}': 'DATE'
comparators:
  text_equals:
    type: TextEquals
  images_equal:
    type: ImageDeviation
    gray: 0
    color: 0
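Because each "comparator" and "mutator" value in a complex file is a key into the top-level comparators and mutators maps, a quick consistency check can catch typos before a test run. The Python sketch below is a hypothetical helper, not something NRG_Selenium is documented to provide; it works on a plain dictionary that mirrors the relevant parts of the example, as a YAML parser would produce.

```python
config = {
    "type": "assessor_xsi",
    "xsiType": "xnat:qcAssessmentData",
    "resources": [
        {
            "folder": "DATA",
            "secondaryResources": "QC_files",
            "complexFiles": [
                {"name": "generated_values.txt", "comparator": "text_equals",
                 "mutator": "timestamp"},
                {"name": "snapshot.png", "comparator": "images_equal",
                 "compareTo": "original_snapshot.png"},
                {"name": "generated.nii.gz", "comparator": "images_equal",
                 "mutator": "ungzip"},
            ],
            "files": ["otherdata1.txt", "otherdata2.txt"],
        },
        {"folder": "LOG", "files": ["logfile.log"]},
    ],
    "mutators": {
        "ungzip": {"type": "ungzip"},
        "timestamp": {"type": "replaceAll", "replacements": {r"\d{8}": "DATE"}},
    },
    "comparators": {
        "text_equals": {"type": "TextEquals"},
        "images_equal": {"type": "ImageDeviation", "gray": 0, "color": 0},
    },
}


def unresolved_references(cfg):
    """List (file name, property, key) for every key that is not defined."""
    problems = []
    lookups = (("comparator", "comparators"), ("mutator", "mutators"))
    for resource in cfg.get("resources", []):
        for complex_file in resource.get("complexFiles", []):
            for prop, section in lookups:
                key = complex_file.get(prop)
                if key is not None and key not in cfg.get(section, {}):
                    problems.append((complex_file["name"], prop, key))
    return problems


problems = unresolved_references(config)
```

For the example configuration the list is empty, since every referenced key (text_equals, images_equal, timestamp, ungzip) is defined in the corresponding map.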