This document discusses container job scheduling as it pertains to the XNAT container service.
Scheduling is a necessary part of executing container jobs on a cluster or in a high-performance computing environment. The container service can currently communicate with a single Docker server, and all containers run on the same machine as Docker. In this scenario, the container service is acting as a scheduler that assigns every job to a single worker node. When the number or size of the jobs grows, a single worker node will no longer provide adequate resources to service all the jobs simultaneously. Additional worker nodes can be added to a cluster and jobs assigned to nodes as appropriate, or the jobs can be held in a queue until resources are available, or both. These solutions are part of the job of a scheduler.
"Scheduling is the method by which work specified by some means is assigned to resources that complete the work." - Wikipedia: Scheduling (Computing)
The requirements of the scheduler will quickly grow beyond what the container service can do. We need to look at integrating with some scheduling system.
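To make the "assign to a node or hold in a queue" behavior described above concrete, here is a minimal sketch in Python. It is only an illustration: the Node, Job, and Scheduler names are hypothetical, CPUs stand in for all resources, and nothing here reflects how the container service or any real scheduler is implemented.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpus: int  # assumption: one resource dimension stands in for all resources


@dataclass
class Job:
    name: str
    cpus: int


@dataclass
class Scheduler:
    nodes: list
    queue: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)  # job name -> (node, job)

    def submit(self, job):
        """Queue the job, then place whatever currently fits."""
        self.queue.append(job)
        self._dispatch()

    def finish(self, job_name):
        """Release a finished job's resources and re-check the queue."""
        node, job = self.running.pop(job_name)
        node.free_cpus += job.cpus
        self._dispatch()

    def _dispatch(self):
        """Place each queued job on the first node with room.

        Jobs that do not fit stay queued, so a smaller job submitted
        later may start before a larger one that is still waiting.
        """
        still_waiting = deque()
        while self.queue:
            job = self.queue.popleft()
            node = next((n for n in self.nodes if n.free_cpus >= job.cpus), None)
            if node is None:
                still_waiting.append(job)  # hold until resources free up
            else:
                node.free_cpus -= job.cpus
                self.running[job.name] = (node, job)
        self.queue = still_waiting
```

With two nodes offering 4 and 2 CPUs, submitting jobs that need 4, 2, and 3 CPUs places the first two immediately and leaves the third queued until the 4-CPU job finishes. Even this toy version has to make policy choices (ordering, fairness, multiple resource types), which is part of why delegating to an existing scheduling system is attractive.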
The pipeline engine delegates job scheduling to a customizable script. When XNAT prepares to launch a pipeline execution, it builds the command-line string to be executed and passes that to a script called schedule. This script, as it exists when distributed with the pipeline engine, does nothing but execute whatever arguments it was given; it is a simple pass-through. But the schedule script can be overridden by particular XNAT installations, which allows customizable job scheduling. On the CNDA and other NRG-managed XNAT installations, schedule passes its arguments (which, remember, are the command-line string of a pipeline to be executed) to another tool called PipelineJobSubmitter. The latter tool was written against the Distributed Resource Management Application API (DRMAA), which allows it to communicate with the NRG's Sun Grid Engine (and later Open Grid Engine) for job scheduling.
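The pass-through pattern described above can be sketched in a few lines of Python. This is not the actual schedule script shipped with the pipeline engine; the function names are hypothetical, and the override assumes, purely for illustration, that the submitter is an executable that accepts the pipeline command line as its own arguments.

```python
import subprocess
import sys


def schedule(argv):
    """Default behavior: run the given command line unchanged.

    Returns the exit code of the command, or 0 if there is nothing to run.
    """
    if not argv:
        return 0
    return subprocess.call(argv)


def schedule_via_submitter(argv, submitter="PipelineJobSubmitter"):
    """Hypothetical site override: hand the same arguments to a submitter tool.

    Assumption: the submitter is on PATH and takes the pipeline command
    line as its arguments; the real invocation may differ.
    """
    return subprocess.call([submitter] + list(argv))


def main():
    """Entry point a wrapper script would call with its own arguments."""
    sys.exit(schedule(sys.argv[1:]))
```

The point of the design is that XNAT only ever calls one entry point. Swapping schedule for schedule_via_submitter inside main changes where the job runs without changing anything on the XNAT side.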
Kevin Archie designed and partially built the Onde service in 2013 to serve as a next-generation XNAT processing system. (Some links to old wiki docs: Onde requirements, Onde design, Example "CIFTI Average" job request)
At its core, Onde was designed as a job scheduler. It was made to manage compute nodes and job queues. The structure of the jobs themselves was not a primary concern.
It was never finished, so we can't really evaluate it. I just wanted to include some of the history of this project.
Some constraints to keep in mind as we consider various schedulers:
SUMMARY: Docker swarm would work ok for our purposes, but its strongest features are things we will never use.
Swarm is more of an "orchestration" tool than a "scheduling" tool. Its core purpose is to maintain replicas of "services" across all your nodes. Services are, roughly, a spec for how to run a container with some resource requirements and a desired state, like how many replicas of the service you want to be up at a given time. The job of the swarm is to maintain the service in the desired state.
For instance, a service can specify how many replicas should exist, and the manager will make sure that number of replicas is maintained. If some worker goes down, or a task becomes unresponsive, the manager will start identical replica tasks on other worker nodes. Replication is a key selling point for swarm services, but it seems useless for our purposes. Every job we run is doing something unique, either because it was given unique command-line arguments or it has mounted different files.
We can use the swarm job-scheduling features, but that's it. Which is not to say that this is bad. Being able to submit a service spec to a manager node and have that service be sent out to a worker node is still very useful to us, even if we will never replicate any services.
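The desired-state idea described above can be illustrated with a toy reconciliation pass in Python. This is not Swarm's actual algorithm or API; the reconcile function, its arguments, and the least-loaded placement rule are all assumptions made for the sake of the example.

```python
def reconcile(desired_replicas, tasks, healthy_nodes):
    """Return the task placement after one reconciliation pass.

    tasks is a list of node names, one entry per running replica.
    healthy_nodes is the set of node names that are currently reachable.
    """
    # Tasks on nodes that went away are considered lost.
    alive = [node for node in tasks if node in healthy_nodes]

    # Start replacement replicas on the least-loaded healthy node
    # (assumed placement rule; ties go to the alphabetically first node).
    while len(alive) < desired_replicas and healthy_nodes:
        target = min(sorted(healthy_nodes), key=alive.count)
        alive.append(target)

    # If there are more replicas than desired, keep only the first ones.
    return alive[:desired_replicas]
```

For a one-off job we would always ask for a single replica, so the only part of this loop that matters to us is placement: reconcile(1, [], {"n1", "n2"}) simply picks a node to run the job on.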
My entire understanding comes from browsing their documentation and from this article: Docker Swarm vs. Kubernetes: Comparison of the Two Giants in Container Orchestration. I think Kubernetes will not work for us right now because...
Like Docker Swarm, Kubernetes is more of an orchestration tool that can also do job scheduling, and not a dedicated scheduler.