XNAT for Data Sharing: ConnectomeDB
Data sharing entails an investigator distributing their data openly, semi-openly, or within closed collaborations. Large NIH studies are required to share data, and many smaller projects have realized the benefits of sharing. However, data sharing requires an application that gives researchers multiple levels of access control, so they can determine which data is accessible to whom.
With the increasing prevalence of sharing, XNAT is being used more frequently in this context. The ConnectomeDB, which distributes more than 2 petabytes of data for the Human Connectome Project, is a prime example.
XNAT's access control system allows investigators to make their data openly accessible to users of their XNAT instance, accessible by request, or completely closed. Its support for anonymization and DICOM metadata review helps ensure subject privacy and compliance with HIPAA regulations. In XNAT, investigators can harmonize their scan labeling scheme with commonly used terms. Finally, its extensible data model allows investigators to share a variety of non-imaging data and derived image data alongside their imaging studies.
Currently, XNAT is working with the Human Connectome Project and the Cancer Imaging Archive on each project's data sharing needs, and is the backbone of a publicly available imaging resource at XNAT Central.
Project Aims:
The Human Connectome Project was founded in 2011 as the charter project to create and share the largest dataset of brain imaging from healthy young adults to date. The HCP imaging protocol was complex, spanning 3T MRI, 7T MRI, and MEG modalities, and included structural scans, resting-state and task functional scans, and diffusion scans. Moreover, the project needed to distribute data in both unprocessed and preprocessed formats, along with other types of data, including task analysis and group-average data.
In addition to the imaging data, the HCP carried out exhaustive behavioral and clinical data gathering, including information that needed to be restricted from the general public, either for potential identifiability (e.g. exact age, ethnicity, or family status) or for sensitivity (e.g. alcohol and substance use, or family history of mental disorders).
Why use XNAT?
The HCP informatics team deployed two XNAT instances – one internal application ("IntraDB") to store and manage all incoming scans and data, and one external application ("ConnectomeDB") for pipeline processing and data sharing.
XNAT has native support for all of the data types that the HCP needed to distribute, and the extensibility of its front end and permissions model allowed the HCP team to construct a heavily customized data-sharing UI. Each user account was assigned an "open access", "restricted access", or "sensitive access" permission level, depending on the level of usage for which that user had been approved. This allowed ConnectomeDB to maintain IRB approval while distributing highly restricted and sensitive data to select investigators.
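The tiered permission idea can be illustrated with a minimal Python sketch. This is not XNAT's or ConnectomeDB's actual implementation: the AccessTier enum, the FIELD_TIERS mapping, the field names, and the visible_fields helper are all hypothetical, and the sketch assumes the three tiers are strictly ordered so that a higher tier sees everything a lower tier sees. The restricted and sensitive field examples follow the categories described above.

```python
from enum import IntEnum


class AccessTier(IntEnum):
    """Hypothetical ordering of the three access levels named in the text."""
    OPEN = 1
    RESTRICTED = 2
    SENSITIVE = 3


# Hypothetical mapping from data field to the minimum tier allowed to see it.
FIELD_TIERS = {
    "age_range": AccessTier.OPEN,               # hypothetical open-access field
    "exact_age": AccessTier.RESTRICTED,         # potentially identifying
    "ethnicity": AccessTier.RESTRICTED,         # potentially identifying
    "family_status": AccessTier.RESTRICTED,     # potentially identifying
    "substance_use": AccessTier.SENSITIVE,      # sensitive
    "family_history_mental_disorders": AccessTier.SENSITIVE,  # sensitive
}


def visible_fields(record: dict, user_tier: AccessTier) -> dict:
    """Return only the fields of `record` that `user_tier` may see.

    Fields missing from FIELD_TIERS are treated as SENSITIVE, so an
    unclassified field is never exposed to a lower tier by accident.
    """
    return {
        name: value
        for name, value in record.items()
        if FIELD_TIERS.get(name, AccessTier.SENSITIVE) <= user_tier
    }
```

Defaulting unknown fields to the most restrictive tier is a design choice of this sketch: it fails closed, which suits a setting where IRB compliance depends on never over-sharing.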
Additionally, XNAT's built-in project controls allowed for multiple phased releases of data as the project hit certain milestones: 500 subjects, 900 subjects, and 1200 subjects with completed imaging and processing.
By the final release of data, ConnectomeDB was storing and distributing nearly 2 petabytes of data. To facilitate downloads of hundreds of gigabytes, this XNAT instance was integrated with an Aspera download server, which delivers extremely high-speed downloads outside of the HTTP protocol. HCP developers also built a custom download selector, allowing users to select only the imaging modalities and processing levels that were of interest to their research.
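The selection logic behind such a download selector might look roughly like the Python sketch below. It is an illustration only, not the HCP team's code: the Package class, its field names, and the select_packages and total_size_gb helpers are hypothetical, and the sketch does not model the Aspera transfer itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Package:
    """Hypothetical description of one downloadable data package."""
    subject: str
    modality: str      # e.g. "structural", "diffusion"
    processing: str    # e.g. "unprocessed", "preprocessed"
    size_gb: float


def select_packages(
    packages: Iterable[Package],
    modalities: Iterable[str],
    processing_levels: Iterable[str],
) -> List[Package]:
    """Keep only packages matching a requested modality AND processing level."""
    wanted_modalities = set(modalities)
    wanted_levels = set(processing_levels)
    return [
        p for p in packages
        if p.modality in wanted_modalities and p.processing in wanted_levels
    ]


def total_size_gb(packages: Iterable[Package]) -> float:
    """Sum package sizes so a user can see the download size before starting."""
    return sum(p.size_gb for p in packages)
```

Reporting the total size before the transfer begins is useful when a full selection can run to hundreds of gigabytes.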
Who are the Primary Users?
IntraDB is used only by a small team of scan technicians, quality assurance staff, and data managers internal to the HCP.
ConnectomeDB has nearly 10,000 users at the open access permission level, and nearly 1,000 investigators who have been granted restricted access. As of November 2017, nearly 10 petabytes of data had been downloaded.
What Inherent Features of XNAT Have Been Most Useful?
XNAT's extensibility has been by far its most useful feature. Without it, managing a project of this scope and scale would not have been possible. Additionally, trust in XNAT's security model enabled us to release data to a wide audience without breaching the project's IRB requirements.