Revamp job avoidance and accounting
Created by: julianhess
- Avoidance
  - Avoided shards are now implemented as Slurm noops, by starting the batch paused and then cancelling any noop'd shards
  - `delocalization.py` computes SHA1 checksums for every output
  - `delocalization.py` saves output patterns to the job manifest, so that `Orchestrator.job_avoid()` can match them
  - `Orchestrator.job_avoid()` totally revamped to read from individual shard manifests, rather than the entire job dataframe
- Accounting
  - We save accounting information for each shard to disk, and can reload it in the exact format returned by `Backend.sacct()`. This lets us keep track of accounting info across avoided jobs
  - Simplified accounting in `Orchestrator.wait_for_jobs_to_finish`
- Job exit, localization, and teardown exit codes are all saved to disk by `entrypoint.sh`
- Add hashing features to `utils.py`
- Docker backend can connect to a preexisting controller container, in which case the backend won't attempt to stop the container after it exits
- Bump version to 0.10
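
For reviewers, here is a rough sketch of the idea behind checksum-based avoidance: record each output pattern and the SHA1 of every file it matched in a per-shard manifest, then skip a shard only if the manifest still agrees with what is on disk. This is not the code in this PR. The function names, the `manifest.json` filename, and the JSON layout are all hypothetical, chosen only to illustrate the approach.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical manifest filename; the real manifest format may differ.
MANIFEST_NAME = "manifest.json"


def sha1_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_shard_manifest(shard_dir, patterns):
    """Record each output pattern, the files it matched, and their checksums.

    `patterns` maps an output name to a glob pattern relative to `shard_dir`.
    """
    shard_dir = Path(shard_dir)
    manifest = {}
    for name, pattern in patterns.items():
        files = sorted(
            p for p in shard_dir.glob(pattern)
            if p.is_file() and p.name != MANIFEST_NAME
        )
        manifest[name] = {
            "pattern": pattern,
            "files": {str(p.relative_to(shard_dir)): sha1_file(p) for p in files},
        }
    (shard_dir / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2))
    return manifest


def can_avoid_shard(shard_dir, patterns):
    """Return True only if the manifest matches the requested patterns and
    every recorded output is still present with its recorded checksum."""
    shard_dir = Path(shard_dir)
    manifest_path = shard_dir / MANIFEST_NAME
    if not manifest_path.is_file():
        return False
    manifest = json.loads(manifest_path.read_text())
    if set(manifest) != set(patterns):
        return False
    for name, entry in manifest.items():
        # A changed pattern or an empty match means the shard must rerun.
        if entry["pattern"] != patterns[name] or not entry["files"]:
            return False
        for rel_path, checksum in entry["files"].items():
            path = shard_dir / rel_path
            if not path.is_file() or sha1_file(path) != checksum:
                return False
    return True
```

In this sketch, a shard whose outputs were modified or deleted after the manifest was written fails the check and is rerun, as is a shard whose requested output patterns have changed.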