How To Perform Batch Anonymization with DicomBrowser
Once you write a script to anonymize fields within a single scan, you can apply that anonymization to any number of other scans that came from the same scanner. Here is how to do that.
Step-by-step guide
- Download the Linux version of the DicomBrowser binary distribution, which includes the command-line tools.
- (Optional) Write a DicomEdit script for any simple changes, such as deleting an attribute, or setting an attribute value to either a fixed value or a simple function of other attribute values in the same file. The more changes you can put into this script, the simpler the rest of this process will be. The DicomEdit language is described in detail in its own documentation.
- Write a remapping config file. This is an XML file that describes the spreadsheet that will be built from the data. The root element is <Columns>; each subelement describes either one or two columns in the spreadsheet:

    <LEVEL>tag</LEVEL>

where tag is the 32-bit DICOM tag for the attribute to be displayed, expressed as two 16-bit numbers separated by a comma and surrounded by parentheses, e.g., (0008,0080) for Institution Name; and LEVEL is one of Global, Patient, Study, or Series, describing the highest domain over which the attribute will have a single value: Patient ID, for example, should be at the Patient level.

The LEVEL element shown above adds one column to the spreadsheet, with column title equal to the name of the specified DICOM attribute, and each entry in the column the original value of that attribute in the DICOM files.
To allow an attribute value to be changed, add the remap attribute:

    <LEVEL remap="Remap Column Name">tag</LEVEL>

This element will add two columns to the spreadsheet: one showing the original data values, as described above, and a new column with the given column name, in which replacement values can be specified. The new values column will appear immediately to the right of the original values column, unless the optional attribute append="true" is used, in which case the new values column will be placed at the far right, after all the original values columns.
Example:
    <Columns>
      <Global remap="Fixed Institution Name">(0008,0080)</Global>
      <Patient remap="Anon Patient Name">(0010,0010)</Patient>
      <Patient remap="Anon PatientID">(0010,0020)</Patient>
      <Study>(0020,0010)</Study>
      <Study>(0008,0020)</Study>
      <Series>(0020,0011)</Series>
      <Series>(0008,0031)</Series>
    </Columns>
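If you have many columns, you may prefer to generate the config file rather than type it by hand. The following is only a sketch of that idea, not part of DicomBrowser: build_columns is a hypothetical helper, and the only things it takes from the description above are the four level names, the (gggg,eeee) tag format (DICOM tags are written in hexadecimal), and the optional remap attribute.

```python
import re
import xml.etree.ElementTree as ET

# The four levels named in the text above.
LEVELS = ("Global", "Patient", "Study", "Series")
# A tag is two 4-digit hexadecimal numbers in parentheses: (gggg,eeee).
TAG_PATTERN = re.compile(r"^\(([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\)$")


def build_columns(specs):
    """Build a <Columns> element from (level, tag, remap_name_or_None) tuples.

    Hypothetical helper for illustration; it only checks the level name
    and the tag format before adding each subelement.
    """
    root = ET.Element("Columns")
    for level, tag, remap in specs:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        if not TAG_PATTERN.match(tag):
            raise ValueError(f"malformed DICOM tag: {tag!r}")
        column = ET.SubElement(root, level)
        column.text = tag
        if remap is not None:
            # A remap attribute asks for a second, editable column.
            column.set("remap", remap)
    return root


if __name__ == "__main__":
    example = [
        ("Global", "(0008,0080)", "Fixed Institution Name"),
        ("Patient", "(0010,0020)", "Anon PatientID"),
        ("Study", "(0020,0010)", None),
    ]
    root = build_columns(example)
    ET.indent(root)  # pretty-printing; needs Python 3.9 or later
    print(ET.tostring(root, encoding="unicode"))
```

Redirect the printed output into a file and pass that file to the -c option in the steps that follow.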
- Generate a spreadsheet from the data.

    DicomSummarize -c remap-config-file.xml -v remap.csv [directory-1 ...]

where the arguments in brackets are a list of directories containing the source DICOM data. Remember this list of directories, because you’ll use it again when you run the remapper.
- Edit the spreadsheet using your favorite CSV editor. This spreadsheet will contain all the columns you defined, plus some additional columns needed to uniquely identify each patient, study, and series.

Each new value (remap) column should be filled with values. In some cases, some cells in the spreadsheet can be left blank: for a Patient-level remap, one value must be specified for each patient; if the spreadsheet contains multiple rows for each patient, the column need only be filled in one row for each patient. Similarly, for a Study-level remap, the value need only be filled in once for each study.
(This is a little complicated, but the remapper does some consistency checking. If you don’t fill in a required cell, the remapper will complain. If you give, for example, a Patient-level remap column multiple values for a single patient, the remapper will complain.)
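If the spreadsheet is large, filling in the remap columns can be scripted. Here is a minimal sketch of the rule described above (one value per patient, other rows left blank) together with a rough pre-check. It is not the remapper's own consistency check. The function names are hypothetical, the column titles "Patient ID" and "Anon PatientID" are assumptions based on the example config, and the sketch assumes that a single column is enough to tell patients apart, which may not hold for your spreadsheet.

```python
import csv
import io
import itertools


def fill_remap_column(rows, key_column, remap_column, new_value_for):
    """Write a replacement value into the first row of each group only."""
    seen = set()
    for row in rows:
        key = row[key_column]
        if key not in seen:
            seen.add(key)
            row[remap_column] = new_value_for(key)
    return rows


def check_remap_column(rows, key_column, remap_column):
    """Return a list of problems: groups with no value or conflicting values."""
    values = {}
    for row in rows:
        found = values.setdefault(row[key_column], set())
        value = (row.get(remap_column) or "").strip()
        if value:
            found.add(value)
    problems = []
    for key, found in values.items():
        if not found:
            problems.append(f"{key}: no value in {remap_column!r}")
        elif len(found) > 1:
            problems.append(f"{key}: conflicting values {sorted(found)}")
    return problems


if __name__ == "__main__":
    # Stand-in for remap.csv; the real file has more columns.
    text = "Patient ID,Anon PatientID\nJones^Ann,\nJones^Ann,\nSmith^Bob,\n"
    rows = list(csv.DictReader(io.StringIO(text)))
    counter = itertools.count(1)
    fill_remap_column(rows, "Patient ID", "Anon PatientID",
                      lambda key: f"ANON{next(counter):03d}")
    for row in rows:
        print(row)
    print(check_remap_column(rows, "Patient ID", "Anon PatientID"))
```

To use this on a real spreadsheet, read remap.csv with csv.DictReader, apply the functions, and write the rows back with csv.DictWriter using the same column order.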
- Run the remapper.

    DicomRemap -c remap-config-file.xml -o <path-to-output-directory> -v remap.csv [directory-1 ...]

where the remap config XML should be the same file you used to generate the spreadsheet, remap.csv is the spreadsheet you generated and then edited, and the list of directories is the same list of source directories you passed to DicomSummarize.
If you’re anonymizing many files, go get a cup of coffee, because it will take a while. If all goes well, the program will finish quietly, and the output directory will be full of anonymized files. If all doesn’t go well, the program will print some error messages providing clues as to what went wrong.
You can add an anonymization script to be applied at this stage by using the -d option.
- Inspect the contents of the output directory to verify that what you thought was going to happen happened. There are many ways to make mistakes along the way, and there may be bugs that I haven’t found yet.
Run DicomSummarize or DicomRemap with the -h option to get more information about their command-line arguments.
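Because DicomSummarize and DicomRemap must be given the same config file and the same list of source directories, it can help to build both command lines from one set of values. The sketch below does only that. The function names are hypothetical, it uses only the -c, -v, and -o options shown above, and it assumes both tools are on your PATH under those names.

```python
import shlex


def summarize_command(config, spreadsheet, directories):
    """Argument list for generating the spreadsheet."""
    return ["DicomSummarize", "-c", config, "-v", spreadsheet, *directories]


def remap_command(config, output_dir, spreadsheet, directories):
    """Argument list for running the remapper on the edited spreadsheet."""
    return ["DicomRemap", "-c", config, "-o", output_dir,
            "-v", spreadsheet, *directories]


if __name__ == "__main__":
    # Placeholder paths; substitute your own.
    config = "remap-config-file.xml"
    spreadsheet = "remap.csv"
    directories = ["scans/subject-1", "scans/subject-2"]
    print(shlex.join(summarize_command(config, spreadsheet, directories)))
    print(shlex.join(remap_command(config, "anonymized", spreadsheet, directories)))
```

Each list can be passed to subprocess.run(..., check=True). Remember to edit the spreadsheet between the two commands.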