XW2021 Round Table: Deployment Strategies for Scalable XNATs

Time: Day 1, 10:00 – 10:30 am CDT

Host: Chip Schweiss, director of IT Operations at the CIL.

Panelists: Brian Holt (Radiologics), Bennett Landman (Vanderbilt), Chris Cannistraci (Mt. Sinai)


Questions and Answers

When possible and as time permitted, questions that were brought up in the Q&A module during each talk were addressed in real time by the presenter. Other responses were entered in the Q&A interface itself. Those written responses are included below.

Is the ZFS pool an appliance of some kind or a built server?
  • Answered Live
  • ZFS storage is on JBOD disk arrays, each managed by a pair of OmniOS hosts: one primary, the other a fallback.  Storage is published to the application layer via NFS.  We presently use OmniOS due to its proven stability with ZFS at our scale.
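The publish-via-NFS step described above can be sketched with illumos/OmniOS commands. The pool and dataset names (`tank/xnat`), the storage host name (`storage01`), and the client subnet are all hypothetical, not the CIL's actual layout:

```shell
# On the storage head: create a dataset for the XNAT archive
# (pool name "tank" is a placeholder)
zfs create tank/xnat

# Publish it over NFS to the application-layer subnet (placeholder 10.0.0.0/24)
zfs set sharenfs="rw=@10.0.0.0/24,root=@10.0.0.0/24" tank/xnat

# On an XNAT application host: mount the published share
# (use `mount -t nfs` instead of `-F nfs` on Linux clients)
mount -F nfs storage01:/tank/xnat /data/xnat
```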
Do you offer XNAT as a cloud AWS or Azure image? (Answered live in the session)
Have you considered Ceph as data storage?
  • Yes. Ceph is a scalable distributed file system and/or block storage.
In the future, can this QA log be made available with the Zoom/LiveStream for posterity? Lots of good answers to questions here.
  • Answered Live
  • Yes! It is being recorded and will be published on the wiki. I think Zoom will also track the questions within the recording!
To benefit from distributed computing, can XNAT support distributed database systems/storage? (Answered live in the session)
Can you elaborate on processing nodes? Are you able to ingest DICOM from multiple nodes?
  • Yes.  We use Slurm to queue jobs.
  • Yes.  We dedicate multiple XNAT instances that are not exposed to human users to do the heavy lifting.
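A minimal sketch of queuing an XNAT processing job through Slurm, as mentioned above. The pipeline script path, archive path, and resource limits are assumptions for illustration, not the CIL's actual configuration:

```shell
#!/bin/bash
#SBATCH --job-name=xnat-proc     # job name shown in squeue
#SBATCH --cpus-per-task=4        # CPUs for the pipeline step
#SBATCH --mem=16G                # memory limit
#SBATCH --time=02:00:00          # wall-clock limit

# Hypothetical pipeline invocation; the session directory is assumed to be
# the same NFS-published archive the XNAT instances read from.
/opt/pipelines/run_pipeline.sh /data/xnat/archive/PROJECT01/arc001/SESSION01
```

A dedicated processing instance would submit this with `sbatch job.sh` and poll the queue with `squeue`.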
Can XNAT store nifti files?
  • Yes, absolutely. There is no built-in importer for NIFTI-formatted image sessions, but we recently added a tutorial on how to perform scripted uploads of this data to the XNAT Admin 101 course in XNAT Academy.

    You can also convert DICOM to NIFTI using the Container Service plugin.
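    A scripted upload of the kind the tutorial covers can be sketched with `curl` against the XNAT REST API. The host, credentials, project/subject/session/scan identifiers, and file name below are all placeholders:

```shell
# Placeholders for a hypothetical XNAT server and session
XNAT=https://xnat.example.org
AUTH=admin:secret

# Create a NIFTI resource on a scan (no-op if it already exists)
curl -u "$AUTH" -X PUT \
  "$XNAT/data/projects/PROJ01/subjects/SUBJ01/experiments/SESS01/scans/1/resources/NIFTI"

# Upload a NIFTI file into that resource
curl -u "$AUTH" -X PUT \
  "$XNAT/data/projects/PROJ01/subjects/SUBJ01/experiments/SESS01/scans/1/resources/NIFTI/files/scan.nii.gz" \
  --data-binary @scan.nii.gz
```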

Are there any plans to incorporate surface viewers such as Brainbrowser or integrate with OHIF plugins in order to view Freesurfer data?
  • There's been some work in this direction at Radiologics and Flywheel, using vtk.js. Nothing is ready for release yet. If there are other OHIF plugins for surface viewing, it seems like they could be integrated, but I'm not familiar with them.
@Chip can you tell us more about the support of Slurm for distributed processing of XNAT-hosted image data? Is that something new or tailored to your deployments? Thanks.
  • Mike Hodge on the Human Connectome Project has recently converted SGE-based job submissions to Slurm.
Are you able to separate XNAT project storage, so that ML projects can live on high-speed storage and other projects on a less expensive long-term storage environment?
  • This is why we have both ZFS and Ceph storage.  ZFS is about 2/3 the cost per TB of Ceph.  We also have a couple of "archive" ZFS pools (raidz3) that are built to be highly resilient rather than performant.
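For reference, a raidz3 "archive" pool of the resilient-but-not-performant kind mentioned above could be created as follows. The pool name and disk device names are hypothetical:

```shell
# One raidz3 vdev of 11 disks: any 3 disks can fail without data loss,
# at the cost of write performance. Device names are placeholders.
zpool create archive01 raidz3 \
  c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
  c0t6d0 c0t7d0 c0t8d0 c0t9d0 c0t10d0
```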
We are moving towards deploying XNAT in Kubernetes and are developing Helm charts for this. Is there anyone who goes down this road as well? It would be nice to share thoughts and experiences.
  • From Ryan Sullivan in the Chat: "I'll be presenting a poster on the Australian Imaging Service deployment method using Kubernetes on multi-cloud in Spatial Chat (Poster Room 4) tomorrow for anyone interested in k8s"
General question: how do you handle data being sent to HPC for computation? Do you have to copy it?
  • We have the datasets mounted directly on the HPC.  In most cases we read directly from storage.
Is there a document for how to turn single server storage into distributed storage for XNAT?
  • No.  I would be happy to write up how to create NFS referrals on ZFS to accomplish this.
If a job is cancelled on the HPC is there a way to trickle that cancel signal back to XNAT in order to update the job status?
  • Yes.  I'm not clear on what mechanisms are used.  If you email me, I will find out from Mike Hodge, who recently converted SGE-based jobs to use Slurm on the HPC.
Regarding previous presentations on infrastructure: it is very interesting to understand the different approaches to on-premises vs. AWS/cloud provisioning. Would anyone be willing to share a cost/benefit analysis of such decisions (not literal costs, but perhaps proportions), or perhaps chat in a breakout session at some point? (Discussed in Spatial Chat)
