Describing file validations in YAML involves 3 main properties: **resources**, **comparators**, and **mutators**. Before listing the full format, **FileComparator** and **FileMutator** will be covered briefly.

## Overview of FileComparator

If you want finer-grained control over what is considered acceptable for the files, you can get it with **FileComparator**s and **FileMutator**s. Note that when you choose to use these options, NRG_Selenium must actually have a copy of the files to work with, so instead of simply making a REST call to check for file existence, it will download a local copy of the files. A **FileComparator** (or more accurately, a subclass extending **FileComparator**) defines a check for whether or not a file created by a pipeline is considered acceptable.

## Overview of FileMutator

Ok, so you can compare files, except you know that the file created in XNAT has a timestamp in it, so the above method won't work for you. We can solve this problem with **FileMutator**s. A subclass extending **FileMutator** defines some way to change a file *before* a **FileComparator** is applied. NRG_Selenium provides:

- **ReplaceAllMutator**, which replaces each of the keys in the "replacements" map (as a regex) with its corresponding value. The most common use case for this would be to replace different types of timestamps with generic strings like "TIMESTAMP" so that the MD5 checksum would be correct and consistent.
- **DecompressGzipMutator**, which simply takes a gzipped file and uncompresses it.
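To make the idea of mutating before comparing concrete, here is a minimal Python sketch. It is not NRG_Selenium's code: the helper names `replace_all`, `decompress_gzip`, and `md5_hex` and the sample strings are hypothetical, and the sketch only assumes that a replaceAll-style mutator applies each regex in the map in turn.

```python
import gzip
import hashlib
import re


def replace_all(text: str, replacements: dict) -> str:
    """Apply every regex -> replacement pair to the text (hypothetical helper)."""
    for pattern, value in replacements.items():
        text = re.sub(pattern, value, text)
    return text


def decompress_gzip(data: bytes) -> bytes:
    """Return the uncompressed bytes of a gzipped payload."""
    return gzip.decompress(data)


def md5_hex(text: str) -> str:
    """MD5 checksum of the text, as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two made-up pipeline outputs that differ only in an 8-digit date stamp.
run_a = "values: 44.4 | 55.5 | generated 20240101"
run_b = "values: 44.4 | 55.5 | generated 20240315"
replacements = {r"\d{8}": "TIMESTAMP"}

mutated_a = replace_all(run_a, replacements)
mutated_b = replace_all(run_b, replacements)

# The raw checksums differ, but the mutated checksums agree.
print(md5_hex(run_a) == md5_hex(run_b))
print(md5_hex(mutated_a) == md5_hex(mutated_b))
```

Once the date is normalized, both runs hash to the same value, which is what lets a checksum comparison succeed on files that embed a timestamp.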

## More Advanced FileComparator usage

NRG_Selenium has several **FileComparator** subclasses:

- **MD5_Comparator** is one of the simplest comparators provided. As the name implies, the comparator computes an MD5 hash of the file, and the check is considered passed if and only if the hash equals the value of the file's "md5" property.
- **FileSizeComparator** allows an optional "tolerance" property (assumed to be 0 if not included) to specify the maximum allowed percent error between the actual size of the generated file and the expected size (as listed in the file's "expectedSize" property).
- **TextComparator** is satisfied if and only if the contents of the file are equal to the value of the file's "expectedText" property.
- **ImageComparator** is actually a family of comparators. If a comparator extends this class, the pipeline file to which it is applied is compared against an imaging file known to be good. These known-good files should be stored under the session resources in a folder whose name is specified by the resource's "secondaryResources" property (covered below).
    - **ImageDeviationComparator** extends **ImageComparator**. This comparator allows two optional properties (assumed to be 0 if not supplied): "gray" and "color". Denote the pixels in the $k$-th slice of the image stack with coordinates $(i, j)$ by $O_{i,j,k}$ for the original image and $G_{i,j,k}$ for the generated image, and let $m_G$ and $m_C$ be the values of "gray" and "color", respectively. The comparator is considered satisfied if and only if

      $$\sum_{i,j,k} d_1(G_{i,j,k}, O_{i,j,k}) = \sum_{i,j,k} \| G_{i,j,k} - O_{i,j,k} \|_{1} \leq m_G$$

      if the image is grayscale, or

      $$\sum_{i,j,k} d_1(G_{i,j,k}, O_{i,j,k}) = \sum_{i,j,k} \| G_{i,j,k} - O_{i,j,k} \|_{1} \leq m_C$$

      if the image is in color. In the above, $\| \cdot \|_1$ represents the 1-norm (also known as the Taxicab norm). If the image is grayscale, then the pixel is a single number, so the 1-norm simply reduces to the absolute value over $\mathbb{R}$. If the image is in color, then the pixel has 3 components (RGB), so the pixel is actually a vector in $\mathbb{R}^3$ (technically also $\mathbb{Z}^3$), so the 1-norm serves to add up the absolute value of the deviation in each component.
    - **NumberPixelsComparator** extends **ImageComparator**. This comparator specifies the maximum number of pixels that are allowed to differ at all between the generated and original files. This is done with the "maxDifferingPixels" property, assumed to be 0 if not provided. More formally, denote the pixels in the $k$-th slice of the image stack with coordinates $(i, j)$ by $O_{i,j,k}$ for the original image and $G_{i,j,k}$ for the generated image. Let $M$ be the maximum number of pixels allowed to differ. Then the comparator is considered satisfied if and only if

      $$\sum_{i,j,k} d(G_{i,j,k}, O_{i,j,k}) \leq M$$

      where the metric $d$ is the discrete metric:

      $$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases}$$

    - **PercentPixelsComparator** extends **ImageComparator**. This comparator specifies the maximum percentage of pixels that are allowed to differ at all between the generated and original files (e.g. 2.5). This is done with the "maxPercentError" property, assumed to be 0 if not provided. More formally, suppose the image is a stack of $z$ slices with resolution $x \times y$. Denote the pixels in the $k$-th slice of the image stack with coordinates $(i, j)$ by $O_{i,j,k}$ for the original image and $G_{i,j,k}$ for the generated image. Let $P$ be the maximum percent of pixels allowed to differ. Then the comparator is considered satisfied if and only if

      $$\frac{100\sum_{i,j,k} d(G_{i,j,k}, O_{i,j,k})}{xyz} \leq P$$

      where the metric $d$ is again the discrete metric:

      $$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases}$$

    - **PixelClusterComparator** extends **ImageComparator**. This comparator takes a somewhat different approach than its siblings. The "maxClusterSize" property should specify the maximum number of differing pixels in a "cluster" for the whole image (a cluster is a group of differing pixels such that from any one pixel in the cluster, you can reach any other pixel by moving 1 unit in the x or y direction at a time without leaving the cluster). More formally, suppose the image is a stack of $z$ slices. Let $V_i$ be the set of coordinates in $\mathbb{Z}^2$ such that the pixels at that coordinate differ in the $i$-th slice of the original and generated images. Define an edge set $E_i$ such that $\{v_1, v_2\} \in E_i$ if and only if $\|v_1 - v_2\|_1 = 1$. Define a graph $G_i = (V_i, E_i)$. Denote the number of vertices in the largest connected component of $G_i$ by $M_{cc}(G_i)$. Finally, let $M$ be the maximum allowed size of a cluster. Then the comparator is satisfied if and only if

      $$\max_{i = 1, \ldots, z} M_{cc}(G_i) \leq M$$
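The four image checks above can be sketched in a few lines of Python. This is an illustrative reading of the formulas, not NRG_Selenium's implementation: the function names are hypothetical, and the sketch assumes an image stack is a nested list indexed as `stack[k][i][j]`, with each pixel either an int (grayscale) or a 3-tuple of ints (RGB).

```python
from collections import deque


def pixels(stack):
    """Yield (k, i, j, pixel) for every pixel in a stack of 2D slices."""
    for k, image in enumerate(stack):
        for i, row in enumerate(image):
            for j, pixel in enumerate(row):
                yield k, i, j, pixel


def l1_distance(g, o):
    """1-norm of the difference between two pixels (int or RGB tuple)."""
    if isinstance(g, int):
        return abs(g - o)
    return sum(abs(a - b) for a, b in zip(g, o))


def total_deviation(generated, original):
    """Left-hand side of the ImageDeviation inequality."""
    return sum(l1_distance(g, original[k][i][j]) for k, i, j, g in pixels(generated))


def differing_pixels(generated, original):
    """Sum of the discrete metric over all pixels."""
    return sum(1 for k, i, j, g in pixels(generated) if g != original[k][i][j])


def percent_differing(generated, original):
    """Percentage of pixels that differ at all."""
    total = sum(1 for _ in pixels(generated))
    return 100.0 * differing_pixels(generated, original) / total


def largest_cluster(generated, original):
    """Size of the largest 4-connected group of differing pixels in any slice."""
    best = 0
    for k, image in enumerate(generated):
        differing = {(i, j) for i, row in enumerate(image)
                     for j, g in enumerate(row) if g != original[k][i][j]}
        while differing:
            # Breadth-first search over one connected component.
            queue, size = deque([differing.pop()]), 1
            while queue:
                i, j = queue.popleft()
                for neighbour in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if neighbour in differing:
                        differing.remove(neighbour)
                        queue.append(neighbour)
                        size += 1
            best = max(best, size)
    return best


# A made-up single-slice grayscale example with three differing pixels.
original = [[[0, 0, 0],
             [0, 0, 0],
             [0, 0, 0]]]
generated = [[[5, 2, 0],
              [0, 0, 0],
              [0, 0, 1]]]

print(total_deviation(generated, original))
print(differing_pixels(generated, original))
print(largest_cluster(generated, original))
```

In this toy stack the two differing pixels in the top row are adjacent and form one cluster, while the pixel in the bottom-right corner is a cluster of its own, so a cluster check can be stricter or looser than a raw pixel count depending on how the differences are distributed.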

## Defining the format

The **comparators** and **mutators** properties are maps where the keys are simple strings that can be used within the **resources** block to reference the appropriate object. At the root level, the "type" of resources should be specified, along with any additional data to locate the resources in question:

"type" | Meaning | Extra config required |
---|---|---|
assessor_xsi | Resources attached to an assessor under the session | "xsiType": specifying the xsiType of the assessor |
session | Resources under the session | none |
scan | Resources attached to a scan within the session | "scanId": specifying the scan |

The "resources" property should be a list of objects with the following properties:

Property | Required | Meaning/usage |
---|---|---|
"folder" | true | The name of the resource folder to find. |
"regex" | false | If true, specifies that the "folder" property is a regex to find the actual name. |
"secondaryResources" | false | The name of a resource folder under the session where source files are stored to use for comparison. |
"files" | false | A string list of files for which no explicit comparison is required (other than that they were created). |
"complexFiles" | false | A list of objects defined in the following table... |

A file defined in "complexFiles" should have the following properties:

Property | Required | Meaning/usage |
---|---|---|
"name" | true | The name of the file to find. |
"regex" | false | If true, specifies that the "name" property is a regex to find the actual name. |
"comparator" | false | The key for the FileComparator to use. |
"mutator" | false | The key for the FileMutator to use. |
"md5" | false | The MD5 checksum for the file. |
"expectedText" | false | The expected contents of the file. |
"expectedSize" | false | The expected size of the file in bytes. |
"compareTo" | false | The name of the file in "secondaryResources" to which this file should be compared, if there is a name mismatch. |

The **FileComparator**s each have a unique "type" that defines which subclass to instantiate, along with possibly other data needed:

FileComparator | "type" | Other data |
---|---|---|
MD5_Comparator | MD5 | none |
FileSizeComparator | FileSize | none |
TextComparator | TextEquals | none |
ImageDeviationComparator | ImageDeviation | "gray" and "color" nonnegative integers |
NumberPixelsComparator | NumPixels | "maxDifferingPixels" nonnegative integer |
PercentPixelsComparator | PercentPixels | "maxPercentError" nonnegative double |
PixelClusterComparator | Cluster | "maxClusterSize" nonnegative integer |
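For the three non-image comparator types in the table, the checks are simple enough to sketch directly. The Python below is only an illustration with hypothetical function names. In particular, it assumes that the percent error for a file size check is measured relative to the expected size, which the description above does not state explicitly.

```python
import hashlib


def md5_matches(data: bytes, expected_md5: str) -> bool:
    """MD5-style check: the hex digest must equal the file's "md5" value."""
    return hashlib.md5(data).hexdigest() == expected_md5.lower()


def size_within_tolerance(data: bytes, expected_size: int, tolerance: float = 0.0) -> bool:
    """FileSize-style check.

    Assumption: percent error is computed relative to the expected size.
    """
    percent_error = 100.0 * abs(len(data) - expected_size) / expected_size
    return percent_error <= tolerance


def text_equals(data: bytes, expected_text: str) -> bool:
    """TextEquals-style check: file contents must equal "expectedText"."""
    return data.decode("utf-8") == expected_text


content = b"hello"  # stand-in for a downloaded pipeline file

print(md5_matches(content, "5d41402abc4b2a76b9719d911017c592"))
print(size_within_tolerance(content, expected_size=4, tolerance=25.0))
print(text_equals(content, "hello"))
```

With a tolerance of 0, the size check only passes when the sizes match exactly, which mirrors the default described for **FileSizeComparator**.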

The **FileMutator**s are analogous:

FileMutator | "type" | Other data |
---|---|---|
DecompressGzipMutator | ungzip | none |
ReplaceAllMutator | replaceAll | "replacements" string-string map |

## Putting it all together

A fairly diverse example is listed below:

    type: assessor_xsi
    xsiType: 'xnat:qcAssessmentData'
    resources:
      - folder: DATA
        secondaryResources: QC_files
        complexFiles:
          - name: generated_values.txt
            comparator: text_equals
            mutator: timestamp
            expectedText: '44.4 | 55.5 | 99.9 | DATE'
          - name: snapshot.png
            comparator: images_equal
            compareTo: original_snapshot.png
          - name: generated.nii.gz
            comparator: images_equal
            mutator: ungzip
        files:
          - otherdata1.txt
          - otherdata2.txt
      - folder: LOG
        files:
          - logfile.log
    mutators:
      ungzip:
        type: ungzip
      timestamp:
        type: replaceAll
        replacements:
          '\d{8}': 'DATE'
    comparators:
      text_equals:
        type: TextEquals
      images_equal:
        type: ImageDeviation
        gray: 0
        color: 0
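Because each file refers to its comparator and mutator by key, a mistyped key is an easy error to make. The Python sketch below shows one way such references could be checked once the YAML has been parsed into nested dicts and lists. The function `undefined_references` is hypothetical, the `config` literal mirrors only part of the example above, and nothing here is known to be a check that NRG_Selenium itself performs.

```python
def undefined_references(config: dict) -> list:
    """Return (file name, property, key) triples whose key is not defined."""
    missing = []
    for resource in config.get("resources", []):
        for entry in resource.get("complexFiles", []):
            for prop, section in (("comparator", "comparators"),
                                  ("mutator", "mutators")):
                key = entry.get(prop)
                if key is not None and key not in config.get(section, {}):
                    missing.append((entry["name"], prop, key))
    return missing


# Parsed form of part of the example configuration.
config = {
    "type": "assessor_xsi",
    "xsiType": "xnat:qcAssessmentData",
    "resources": [
        {
            "folder": "DATA",
            "secondaryResources": "QC_files",
            "complexFiles": [
                {"name": "generated_values.txt", "comparator": "text_equals",
                 "mutator": "timestamp",
                 "expectedText": "44.4 | 55.5 | 99.9 | DATE"},
                {"name": "generated.nii.gz", "comparator": "images_equal",
                 "mutator": "ungzip"},
            ],
        },
        {"folder": "LOG", "files": ["logfile.log"]},
    ],
    "mutators": {
        "ungzip": {"type": "ungzip"},
        "timestamp": {"type": "replaceAll", "replacements": {r"\d{8}": "DATE"}},
    },
    "comparators": {
        "text_equals": {"type": "TextEquals"},
        "images_equal": {"type": "ImageDeviation", "gray": 0, "color": 0},
    },
}

print(undefined_references(config))
```

An empty result means every "comparator" and "mutator" key used under "complexFiles" resolves to an entry in the corresponding root-level map.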