XNAT Hardware for Enterprise Storage

Enterprise Class Storage for XNAT - the Hardware Setup

The XNAT hardware setup at the University of Iowa is a robust enterprise class data storage system, serving multiple XNAT instances (a site-wide repository as well as an instance set up for the multi-site PREDICT-HD project). This information was gathered from a conversation between Adam Harding at Iowa and Chip Schweiss at WashU.

Server

The server setup is based on a virtual machine (VM) infrastructure, using VMware ESX version 4. We've considered a clustered solution but have not pursued it yet. A ZFS storage back end provides the data pools on which the VMs live and run.

As a security best practice, each application (such as XNAT) has its own dedicated Unix user. Each application is installed, with its own Java and Tomcat, into this user's account, and each account is backed up on the ZFS storage system. This setup allows us to quickly deploy (or redeploy, migrate, or recover from catastrophic failure) onto a fresh system by simply 1) installing the OS on a system, 2) connecting the application's filesystem to the system that runs it, and 3) applying all the system and application settings by connecting the system to the configuration management system (Puppet).

This setup is very robust, and performance isn't an issue. The vast majority of servers are controlled by Puppet, or will be soon. We also use Hudson (now called Jenkins; our server predates the "split").

One of the primary benefits of our use of ZFS is snapshots and replication, both of which are automated by Puppet. Since snapshots are of filesystems rather than files, there are efficiency advantages over conventional rsync: ZFS send/receive transfers only the blocks that store data, and because it doesn't have to traverse the filesystem, it can simply stream blocks. With its low-overhead default compression enabled and approaching a compression ratio of 1.5:1, there are fewer blocks to read, write, or send across the network.
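To make the compression point concrete, here is a small back-of-the-envelope sketch in Python. The 1.5:1 ratio comes from the text above; the function name and the 1 TB example size are illustrative assumptions, not measurements from the Iowa system.

```python
def physical_size(logical_size: float, compression_ratio: float) -> float:
    """Return the on-disk size for a given logical size and compression ratio.

    A ratio of 1.5 means 1.5 units of logical data fit in 1 unit on disk.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return logical_size / compression_ratio


# Illustrative only: 1.0 TB of logical data at the ~1.5:1 ratio quoted above.
logical_tb = 1.0
on_disk_tb = physical_size(logical_tb, 1.5)
saved_fraction = 1 - on_disk_tb / logical_tb

print(f"{on_disk_tb:.2f} TB on disk, {saved_fraction:.0%} fewer blocks to move")
```

At that ratio, roughly a third fewer blocks have to be read, written, or sent for the same logical data.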

The only downside to this approach is that it's all-or-nothing: you get the whole filesystem or none of it, and thus can't access individual files or directories. ZFS can also send incremental streams, which are even more efficient. End-user home accounts and laboratory shares also use this infrastructure, and on those filesystems the read-only snapshots are directly accessible to users.
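As a rough illustration of the snapshot and send/receive workflow described above, the following Python sketch only builds the command lines and does not run them. The `zfs snapshot`, `zfs send` (with `-i` for an incremental stream), and `zfs receive` subcommands are standard ZFS; the dataset, snapshot, and host names are hypothetical, and this is not the Puppet automation used at Iowa.

```python
import shlex
from typing import List, Optional


def snapshot_cmd(dataset: str, snap: str) -> List[str]:
    """Command to take a snapshot of a whole filesystem (dataset)."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def send_cmd(dataset: str, snap: str, base: Optional[str] = None) -> List[str]:
    """Command to stream a snapshot; with `base`, only the changes since it."""
    cmd = ["zfs", "send"]
    if base is not None:
        # Incremental stream: blocks that changed between `base` and `snap`.
        cmd += ["-i", f"{dataset}@{base}"]
    cmd.append(f"{dataset}@{snap}")
    return cmd


def replicate_pipeline(dataset: str, snap: str, host: str, target: str,
                       base: Optional[str] = None) -> str:
    """Shell pipeline that pipes the stream to `zfs receive` on another host."""
    send = shlex.join(send_cmd(dataset, snap, base))
    receive = shlex.join(["ssh", host, "zfs", "receive", target])
    return f"{send} | {receive}"


# Hypothetical names, for illustration only.
print(shlex.join(snapshot_cmd("tank/xnat", "tuesday")))
print(replicate_pipeline("tank/xnat", "tuesday", "backup-host", "backup/xnat",
                         base="monday"))
```

Note that every command operates on a dataset and snapshot name, never on a file path, which is the all-or-nothing property mentioned above.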

(An aside from Chip: We also use Puppet on our system at WashU. Snapshots of the VMs are taken every night. Sync can be an issue, so snapshots are taken at 4am, when there is the least activity; this gets as close to a real-time snapshot as possible. The test system uses clones of the machines and doesn't touch production.)

Storage

As raw data, before compression, we have roughly 600-700 TB of managed/snapshotted/backed-up ZFS pools, including backup/replication space. We use the same management system for our shared cluster and its disk space, though it's administratively separate.

On the shared cluster, Lustre provides roughly 100 TB of high-speed scratch space, using ZFS for staging data. There is no need for a management front end for ZFS; it's all managed by Puppet. We use Puppet to automate snapshot, retention, and replication policies. It can send a snapshot of "these" files to "that" system, and back data up across town for full DRE. Also, an additional 100 TB of users' home accounts are on the same ZFS infrastructure as already described.
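The retention part of such a policy can be pictured with a short Python sketch. It assumes snapshots are identified by date and that the policy is simply "keep the newest N"; both are assumptions made for illustration, since the actual Puppet-managed policies are not described here.

```python
from datetime import date
from typing import List, Tuple


def split_by_retention(snapshots: List[date], keep: int) -> Tuple[List[date], List[date]]:
    """Split snapshot dates into (kept, expired), keeping the `keep` newest."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(snapshots, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]


# Hypothetical example: ten daily snapshots, keep the most recent seven.
daily = [date(2012, 1, day) for day in range(1, 11)]
kept, expired = split_by_retention(daily, keep=7)

print("keep:  ", [d.isoformat() for d in kept])
print("expire:", [d.isoformat() for d in expired])
```

A real policy would typically layer several such rules (for example daily, weekly, and monthly tiers), but the split between "keep" and "expire" is the core decision.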

We use Sun/Oracle Grid Engine for scheduling and resource management on our clusters, including the large shared cluster. XNAT's Pipeline Engine needs to run on a host configured for job submission, and we're currently considering a small but dedicated cluster for XNAT use.

(Aside from Chip: Our XNAT runs on a submit host to the Sun Grid Engine.)
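
For readers unfamiliar with Grid Engine, a submit host is simply a machine allowed to run `qsub`. The Python sketch below builds such a command line without running it; `-N` (job name), `-cwd` (run in the current directory), and `-q` (queue) are standard `qsub` options, while the script name, job name, and queue name are hypothetical. It does not show how XNAT's Pipeline Engine itself submits jobs.

```python
from typing import List, Optional


def qsub_cmd(script: str, job_name: str, queue: Optional[str] = None) -> List[str]:
    """Build a Grid Engine `qsub` command line for a job script."""
    cmd = ["qsub", "-N", job_name, "-cwd"]
    if queue is not None:
        cmd += ["-q", queue]  # restrict the job to a specific queue
    cmd.append(script)
    return cmd


# Hypothetical names, for illustration only.
print(" ".join(qsub_cmd("run_pipeline.sh", "xnat_job", queue="xnat.q")))
```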