Generate input scripts for HPC scheduler jobs, submit them to HPC systems, and poke until they finish

Last update: Nov 30, 2022

Overview

DPDispatcher

DPDispatcher is a Python package used to generate input scripts for HPC (High Performance Computing) scheduler systems (Slurm/PBS/LSF/dpcloudserver), submit these scripts to HPC systems, and poke until they finish.
DPDispatcher monitors (pokes) these jobs until they finish and downloads the result files (if the jobs are running on remote systems connected via SSH).

For more information, check the documentation.

Installation

DPDispatcher can be installed with pip:

pip install dpdispatcher

Usage

See Getting Started for usage.

Contributing

DPDispatcher is maintained by DeepModeling's developers and welcomes contributions from others. See the Contributing Guide to become a contributor! 🤓

Comments

paramiko.ssh_exception.SSHException: Server connection dropped

I recently encountered a problem when using dpgen to run the model deviation step. When paramiko downloads a large file, it reports the following error: paramiko.ssh_exception.SSHException: Server connection dropped. The error occurs when downloading the model_devi tar file (123 GB), which matches the situation described in this article: https://zhuanlan.zhihu.com/p/102372919.

Currently, dpgen downloads the trajectories to the local machine and then obtains the candidate structures by analyzing them locally. If this were changed to analyze the trajectories on the remote server and download only the resulting candidate structures, it would avoid downloading large trajectory files, greatly speed up the process, and avoid the error above. I hope the developers can consider my suggestion. Thank you very much.

opened by wankiwi 8
add ratio_unfinished param to allow jobs in a submission to fail or be discarded

To speed up dpgen jobs, it may be practical to accelerate the FP stage by simply skipping slow jobs, or by waiting for them to finish asynchronously.

While running dpgen jobs, we found that the FP phase takes a large part of each iteration. This is because the DFT calculations for some candidates are really hard and time-consuming, and we need to wait for that whole long tail to finish before moving on to the next iteration. We found that the proportion of those candidates is very small, possibly less than 1%.

So we propose a new parameter, "ratio_unfinished", to optimize execution: if most of the fp jobs have finished, we can discard the remaining jobs and go directly to the next iteration. We think this is acceptable since the ratio is very small. Furthermore, we could also enable an asynchronous check that waits for these jobs to finish and downloads their results asynchronously, but that step needs to be done in dpgen. For now, we modify dpdispatcher as the first step.

Following are our test numbers, which show significant time savings:

opened by shazj99 7
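The "ratio_unfinished" rule proposed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the PR: the `Status` enum and the `can_proceed` function are hypothetical, and "unfinished" is taken to mean any job that has not finished successfully.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical job states, used only for this sketch."""
    FINISHED = "finished"
    RUNNING = "running"
    FAILED = "failed"


def can_proceed(statuses, ratio_unfinished):
    """Return True when the fraction of jobs that have not finished is at most
    ratio_unfinished, i.e. the remaining jobs may be discarded."""
    if not statuses:
        return True
    unfinished = sum(1 for s in statuses if s is not Status.FINISHED)
    return unfinished / len(statuses) <= ratio_unfinished


# 99 finished jobs and one slow DFT straggler
jobs = [Status.FINISHED] * 99 + [Status.RUNNING]
print(can_proceed(jobs, 0.05))  # True: discard the straggler and move on
print(can_proceed(jobs, 0.0))   # False: wait for every job
```

Setting ratio_unfinished to 0.0 corresponds to waiting for the whole long tail; a small positive value lets the iteration continue once only a few stragglers remain.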
paramiko.ssh_exception.AuthenticationException: Authentication failed.

When I updated dpgen from 10.0 to 10.6, I encountered an SSH error.

dpgen: 10.0 to 10.6

SSH error with dpgen 10.0

I solved the problem by changing look_for_keys in client.py for version 10.0; however, this doesn't work for version 10.6.

SSH error with dpgen 10.6

Traceback (most recent call last):
  File "/data/jccao/app/deepmd2_1_5/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpgen/main.py", line 185, in main
    args.func(args)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpgen/generator/run.py", line 3642, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpgen/generator/run.py", line 3607, in run_iter
    run_train (ii, jdata, mdata)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpgen/generator/run.py", line 598, in run_train
    submission = make_submission(
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpgen/dispatcher/Dispatcher.py", line 359, in make_submission
    machine = Machine.load_from_dict(abs_mdata_machine)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpdispatcher/machine.py", line 129, in load_from_dict
    context = BaseContext.load_from_dict(machine_dict)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpdispatcher/base_context.py", line 35, in load_from_dict
    context = context_class.load_from_dict(context_dict)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpdispatcher/ssh_context.py", line 350, in load_from_dict
    ssh_context = cls(
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpdispatcher/ssh_context.py", line 323, in __init__
    self.ssh_session = SSHSession(**remote_profile)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpdispatcher/ssh_context.py", line 44, in __init__
    self._setup_ssh()
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpdispatcher/utils.py", line 162, in wrapper
    return func(*args, **kwargs)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/dpdispatcher/ssh_context.py", line 166, in _setup_ssh
    ts.auth_password(self.username, self.password)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/paramiko/transport.py", line 1564, in auth_password
    return self.auth_handler.wait_for_response(my_event)
  File "/data/jccao/app/deepmd2_1_5/lib/python3.10/site-packages/paramiko/auth_handler.py", line 245, in wait_for_response
    raise e
paramiko.ssh_exception.AuthenticationException: Authentication failed.

Details
enhancement

opened by caojiachun 6
Why do jobs start running in a different folder when I run the same submission twice?

Hi, guys.

I ran into a problem: first, I run a job submission, then I stop it (using Ctrl+C). Then I restart the same job, but dpdispatcher does not resume the submission from the previous folder. Instead, it starts a new calculation in a new folder.

What causes this, and how can I fix it?

Thanks for any help.

opened by LiuGaoyong 6
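A plausible explanation, stated here as an assumption: the work folder appears to be named after a hash of the submission's contents (the 40-character hexadecimal folder names in the make_model_devi traceback further down this page, and the v0.4.9 release note that "the hash of the submission may change", both point that way). Under that model, anything that changes the hashed content between the two runs (a parameter, a file list, or even the dpdispatcher version) produces a new folder. The sketch below uses SHA-1 over sorted JSON; `submission_hash` and the example fields are hypothetical, not dpdispatcher's actual hashing code.

```python
import hashlib
import json


def submission_hash(submission: dict) -> str:
    """Hypothetical: derive a stable folder name from a submission's settings."""
    # sort_keys makes the digest independent of dict insertion order
    payload = json.dumps(submission, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


base = {"command": "dp train input.json", "group_size": 1, "queue_name": "gpu"}
same = {"queue_name": "gpu", "group_size": 1, "command": "dp train input.json"}
changed = dict(base, group_size=2)

print(submission_hash(base) == submission_hash(same))     # True: same folder
print(submission_hash(base) == submission_hash(changed))  # False: new folder
```

If this model holds, restarting after Ctrl+C with byte-for-byte identical inputs should resolve to the same folder, so a new folder is a hint that something in the submission changed between runs.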
`check_status()` always returns `JobStatus.unknown` unless `qstat -x` returns a nonzero exit code
Some environment info:

dpdispatcher: latest master branch
python: 3.8
"pbs": torque-6.1

Description

dpdispatcher works well on machines with batch_type = slurm or shell, but not with PBS. I suspect that qstat -x (line 53) behaves differently on my nodes than on yours. https://github.com/deepmodeling/dpdispatcher/blob/60f5c90ef3b57dbbb270aeea8565bde74a53fdbd/dpdispatcher/pbs.py#L48-L78

If a job finished moments ago, qstat prints nothing to stdout and writes only to stderr. In that case, the code in check_status() goes into line 56 and returns JobStatus.finished or JobStatus.terminated, and I do get the correct result after a while (files downloaded, Python exits with 0).

In all other cases, however, the script raises a RuntimeError:

RuntimeError: job_state for job ...<json info of the job, not important>... is unknown

Locate the problem

On my nodes, running qstat -x in a shell (not from the script) seems to return output in XML format, like this:

<?xml version="1.0"?>
<Data></Data>

I'm not sure whether I did something wrong or whether "pbs" works differently on my system than on yours. Perhaps @felix5572 knows this well. Maybe you are using PBS Pro, whereas I'm using Torque.

By the way, I can also get job_state from my qstat -x output via the <job_state> tag, rather than with a simple .split()[-2]:

<?xml version="1.0"?>
<Data>
  <Job>
    ....
    <job_state>C</job_state>
    ....
  </Job>
</Data>
opened by saltball 6
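The tag-based approach suggested in this issue can be sketched with Python's standard xml.etree.ElementTree. It assumes the `<Data><Job>...<job_state>` layout shown in the reporter's Torque output (PBS Pro output may differ), and `torque_job_state` is a hypothetical helper, not the code in dpdispatcher's pbs.py. In Torque, the state letter C means the job has completed.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def torque_job_state(qstat_xml: str) -> Optional[str]:
    """Return the text of <Data><Job><job_state> from `qstat -x` output,
    or None when the output contains no job (e.g. <Data></Data>)."""
    root = ET.fromstring(qstat_xml)  # root element is <Data>
    node = root.find("./Job/job_state")
    if node is None or node.text is None:
        return None
    return node.text.strip()


finished = '<?xml version="1.0"?><Data><Job><job_state>C</job_state></Job></Data>'
empty = '<?xml version="1.0"?><Data></Data>'
print(torque_job_state(finished))  # C
print(torque_job_state(empty))     # None
```

An empty `<Data></Data>` yields None, so the caller still has to decide how to map "no job found" onto a status rather than reporting it as unknown.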
Task has no attribute 'load_from_json'

Traceback (most recent call last):
  File "submitted.py", line 6, in <module>
    task0 = Task.load_from_json('json/task.json')
AttributeError: type object 'Task' has no attribute 'load_from_json'

Here is the environment I used:

Name                Version     Build                Channel
bcrypt              3.2.0       pypi_0               pypi
ca-certificates     2021.5.30   h4653dfc_0           conda-forge
certifi             2021.5.30   pypi_0               pypi
cffi                1.14.6      pypi_0               pypi
charset-normalizer  2.0.4       pypi_0               pypi
cryptography        3.4.7       pypi_0               pypi
dargs               0.2.6       pypi_0               pypi
dpdispatcher        0.3.39      pypi_0               pypi
idna                3.2         pypi_0               pypi
libcxx              12.0.1      h168391b_0           conda-forge
libffi              3.3         h9f76cd9_2           conda-forge
ncurses             6.2         h9aa5885_4           conda-forge
openssl             1.1.1k      h3422bc3_1           conda-forge
paramiko            2.7.2       pypi_0               pypi
pip                 21.2.4      pyhd8ed1ab_0         conda-forge
pycparser           2.20        pypi_0               pypi
pynacl              1.4.0       pypi_0               pypi
python              3.8.10      hf9733c0_1_cpython   conda-forge
python_abi          3.8         2_cp38               conda-forge
readline            8.1         hedafd6a_0           conda-forge
requests            2.26.0      pypi_0               pypi
setuptools          57.4.0      py38h10201cd_0       conda-forge
six                 1.16.0      pypi_0               pypi
sqlite              3.36.0      h72a2b83_0           conda-forge
tk                  8.6.10      hf7e6567_1           conda-forge
urllib3             1.26.6      pypi_0               pypi
wheel               0.37.0      pyhd8ed1ab_1         conda-forge
xz                  5.2.5       h642e427_1           conda-forge
zlib                1.2.11      h31e879b_1009        conda-forge
documentation enhancement

opened by csu1505110121 5
Add prepend_script and append_script for job.resource
It is sometimes necessary to execute commands before or after a task is submitted, to start up or shut down the required environment. Most needs for a prepend script like this can be met by sourcing a static script on the remote server (although even that can be difficult when the remote server is not very stable), but scripts to be executed after tasks finish are hard to express with the current dpdispatcher resources. In this PR, prepend_script and append_script parameters have been added to job.resource as lists, in which each item is a single line, like this:

"prepend_script": [
    "conda activate test_env",
    "export PATH=/path/to/package:$PATH",
    "send_an_email_to [email protected]",
    "sleep 1919810"
]

and the expected output:

conda activate test_env
export PATH=/path/to/package:$PATH
send_an_email_to [email protected]
sleep 1919810

Another small change adds delay=True to the logger, so that dpdispatcher.log is not created when there is no log output.
opened by Cloudac7 4
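To show how a list of one-line strings maps onto the generated script, here is a minimal sketch. The helper `render_job_script`, the `#!/bin/bash` header, and the ordering (prepend lines, then the task command, then append lines) are assumptions for illustration, not dpdispatcher's actual script template.

```python
def render_job_script(command, prepend_script=(), append_script=()):
    """Build a job script with each prepend/append list item on its own line.

    Hypothetical helper: the "#!/bin/bash" header and the ordering
    (prepend lines, then the command, then append lines) are assumptions.
    """
    lines = ["#!/bin/bash", *prepend_script, command, *append_script]
    return "\n".join(lines) + "\n"


script = render_job_script(
    "dp train input.json",
    prepend_script=["conda activate test_env", "export PATH=/path/to/package:$PATH"],
    append_script=["echo all tasks done"],
)
print(script, end="")
```

Because each item becomes exactly one line, every item needs its own separating comma: in JSON a missing comma is a parse error, and in a Python list literal two adjacent strings are silently joined into a single item.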
Re-running dpdispatcher will re-upload forward_files to the remote and replace them
I found that re-running dpdispatcher re-uploads forward_files to the remote and replaces them. This can cause some problems:

If the forward_files are large, retransferring them takes a lot of time.

Sometimes forward_files are rewritten during the calculation, and replacing them then causes an error.

So it seems to me that, since dpdispatcher already checks for an old submission, there is no need to re-upload forward_files.
opened by LavendaRaphael 4
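One way the re-upload could be avoided is sketched below, assuming that a checksum of each forward file is recorded when it is first uploaded: on a re-run, only files whose local content has changed are sent again, so large unchanged inputs are not retransferred and remote copies rewritten during the calculation are not overwritten. The manifest and both helper functions are hypothetical; the issue reports that dpdispatcher currently re-uploads everything.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(local_files, manifest):
    """Return the files whose content differs from the digest recorded at the last upload.

    `manifest` (hypothetical) maps str(path) -> SHA-256 recorded when the file
    was last uploaded; files missing from the manifest are always uploaded.
    """
    return [p for p in local_files if manifest.get(str(p)) != file_digest(p)]


with tempfile.TemporaryDirectory() as tmp:
    big_input = Path(tmp) / "conf.lmp"
    big_input.write_text("unchanged input\n")
    manifest = {str(big_input): file_digest(big_input)}  # recorded at first upload

    rerun_unchanged = files_to_upload([big_input], manifest)  # nothing to send
    big_input.write_text("edited locally\n")
    rerun_edited = files_to_upload([big_input], manifest)     # conf.lmp is sent again
```

Comparing against a record of what was uploaded, rather than against the remote copy, is what keeps files modified on the remote side from being clobbered.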
Fix: NoneType Error on DPCloudServer

After a job group is deleted manually on Bohrium, a NoneType error is raised when submitting the job in the same path. To solve this problem, I added a check for None and a note to help the user submit the job again. (Only the last commit in this PR is applied.)

opened by HuangJiameng 3
Support optional compress for SSHContext

For a large number of small files, compression can be very CPU-intensive and time-consuming; see https://github.com/deepmodeling/dpgen/issues/766. This PR adds a tar_compress key to the remote_profile dict. If it is True, the archive is compressed during upload and download; otherwise, compression is skipped.

opened by HuangJiameng 3
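The switch described in this PR can be illustrated with Python's standard tarfile module: the only difference between the two paths is the open mode. This is a sketch under assumptions: `make_archive` is a hypothetical helper, and only the `tar_compress` name mirrors the remote_profile key from the PR; it is not dpdispatcher's implementation.

```python
import tarfile
import tempfile
from pathlib import Path


def make_archive(src_dir: str, archive_path: str, tar_compress: bool = True) -> str:
    """Pack src_dir into a tar archive, gzip-compressed only if tar_compress is True."""
    mode = "w:gz" if tar_compress else "w"  # plain "w" writes an uncompressed tar
    with tarfile.open(archive_path, mode) as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "task"
    src.mkdir()
    for i in range(100):  # many small files, the case the PR targets
        (src / f"f{i}.txt").write_text("small file\n")

    gz = make_archive(str(src), str(Path(tmp) / "upload.tgz"), tar_compress=True)
    plain = make_archive(str(src), str(Path(tmp) / "upload.tar"), tar_compress=False)
    gz_magic = Path(gz).read_bytes()[:2]        # gzip files start with 1f 8b
    plain_magic = Path(plain).read_bytes()[:2]
    with tarfile.open(plain, "r:*") as tar:     # "r:*" auto-detects compression
        n_members = len(tar.getnames())         # the directory plus 100 files
```

Skipping compression trades network bandwidth for CPU time, which pays off when the files are small and numerous or the link between the machines is fast.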
add wait_time to Resources for delayed submission

For some special queues or hosts, job submission may require a pause after each job is submitted, to prevent crashes when many tasks are submitted together. Therefore, a wait_time parameter is added to Resources to support this case.

The default value of wait_time is 0, and it accepts an int. If it is set to a value larger than 0, dpdispatcher sleeps for wait_time seconds after each job submission.

Also, a condition is added to the mirror_gitee action to prevent an error message when pushing to a forked repository instead of the main one.

opened by Cloudac7 3
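The delayed-submission behaviour described in this PR amounts to a sleep after each submit call. In the minimal sketch below, `submit_all` and the `submit` callable are hypothetical stand-ins, not dpdispatcher's internals; only the `wait_time` semantics (an int, default 0, sleep after each submission when greater than 0) come from the PR text.

```python
import time


def submit_all(jobs, submit, wait_time=0):
    """Submit jobs one by one, sleeping wait_time seconds after each submission.

    `submit` is any callable that submits one job and returns its id (hypothetical).
    """
    if not isinstance(wait_time, int) or wait_time < 0:
        raise ValueError("wait_time must be a non-negative int")
    job_ids = []
    for job in jobs:
        job_ids.append(submit(job))
        if wait_time > 0:
            time.sleep(wait_time)  # give the scheduler time before the next submit
    return job_ids


stamps = []


def fake_submit(job):
    """Stand-in for a real scheduler call; records when it was invoked."""
    stamps.append(time.monotonic())
    return f"id-{job}"


ids = submit_all(["a", "b"], fake_submit, wait_time=1)
gap = stamps[1] - stamps[0]  # roughly one second between the two submissions
```

With the default wait_time of 0 the loop submits back to back, so existing configurations keep their behaviour.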
RuntimeError: Authentication failed, try to provide password
We have two batch job systems (Torque and LSF), and I log in to both with an RSA key (id_rsa). The results are:
In Torque: RuntimeError: Authentication failed, try to provide password
In LSF: it works.

So I tried connecting with paramiko directly:

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# Each cluster was tested with its own connect() call
ssh.connect("Torque ip", username="user", key_filename="/home/user/.ssh/id_rsa")
ssh.connect("LSF ip", username="user", key_filename="/home/user/.ssh/id_rsa")

In Torque: ValueError: q must be exactly 160, 224, or 256 bits long (raised by paramiko)
In LSF: it works.
wontfix
opened by scott-5 3
paramiko.ssh_exception.AuthenticationException: Authentication failed.

Summary

When I use dpgen run param.json machine.json to run a job, I frequently get errors: paramiko.ssh_exception.AuthenticationException: Authentication failed.

DeePMD-kit Version

2.1.1

TensorFlow Version

tf=2.5.0

Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

python=3.8.5 cudatoolkit-11.3.1 gcc=7.5

Details

Hello, dear developers,

When I use dpgen run param.json machine.json to run a job, I frequently get errors like the following (the output starts with dpgen's citation banner): Please cite: Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 107206.

Description

Traceback (most recent call last):
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpgen/main.py", line 185, in main
    args.func(args)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpgen/generator/run.py", line 3642, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpgen/generator/run.py", line 3628, in run_iter
    run_fp (ii, jdata, mdata)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpgen/generator/run.py", line 3018, in run_fp
    run_fp_inner(iter_index, jdata, mdata, forward_files, backward_files, _vasp_check_fin,
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpgen/generator/run.py", line 2985, in run_fp_inner
    submission = make_submission(
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpgen/dispatcher/Dispatcher.py", line 359, in make_submission
    machine = Machine.load_from_dict(abs_mdata_machine)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpdispatcher/machine.py", line 134, in load_from_dict
    context = BaseContext.load_from_dict(machine_dict)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpdispatcher/base_context.py", line 41, in load_from_dict
    context = context_class.load_from_dict(context_dict)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpdispatcher/ssh_context.py", line 350, in load_from_dict
    ssh_context = cls(
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpdispatcher/ssh_context.py", line 323, in __init__
    self.ssh_session = SSHSession(**remote_profile)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpdispatcher/ssh_context.py", line 44, in __init__
    self._setup_ssh()
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpdispatcher/utils.py", line 162, in wrapper
    return func(*args, **kwargs)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/dpdispatcher/ssh_context.py", line 166, in _setup_ssh
    ts.auth_password(self.username, self.password)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/paramiko/transport.py", line 1564, in auth_password
    return self.auth_handler.wait_for_response(my_event)
  File "/HOME/zhoujy/.conda/envs/deepmd-kit2.1.1/lib/python3.9/site-packages/paramiko/auth_handler.py", line 245, in wait_for_response
    raise e
paramiko.ssh_exception.AuthenticationException: Authentication failed.

Here is my machine.json. The hostname, username, and password are correct; I want to run the fp step on a remote cluster:

{
  "api_version": "1.0",
  "train": [
    {
      "machine": {
        "context_type": "local",
        "batch_type": "Slurm",
        "machine_type": "Slurm",
        "local_root": "./",
        "_remote_root": "/data/run01/scz5616/HHM/dpmd-project/WFpro/S2tet/work",
        "remote_root": "/HOME/zhoujy/run/dp-test/work"
      },
      "resources": {
        "module_list": [],
        "_source_list": [
          "/data/run01/scz5616/HHM/dpmd-project/WFpro/S2tet/train.sh"
        ],
        "source_list": ["/HOME/zhoujy/run/dp-test/train.sh"],
        "cpu_per_node": 6,
        "number_node": 1,
        "gpu_per_node": 1,
        "queue_name": "gpu_c128",
        "_exclude_list": [],
        "_time_limit": "24:0:0",
        "group_size": 1
      },
      "command": "dp"
    }
  ],
  "model_devi": [
    {
      "machine": {
        "context_type": "local",
        "batch_type": "Slurm",
        "machine_type": "Slurm",
        "local_root": "./",
        "_remote_root": "/data/run01/scz5616/HHM/dpmd-project/WFpro/S2tet/work",
        "remote_root": "/HOME/zhoujy/run/dp-test/work"
      },
      "resources": {
        "_module_list": [],
        "_source_list": [
          "/data/run01/scz5616/HHM/dp-test/lammps.sh"
        ],
        "cpu_per_node": 6,
        "number_node": 1,
        "gpu_per_node": 1,
        "queue_name": "gpu",
        "_exclude_list": [],
        "_time_limit": "23:0:0",
        "group_size": 1
      },
      "command": "lmp"
    }
  ],
  "fp": [
    {
      "machine": {
        "context_type": "ssh",
        "batch_type": "Slurm",
        "_machine_type": "Slurm",
        "local_root": "./",
        "remote_root": "/public1/ws133/sc94566/zhou/work",
        "remote_profile": {
          "hostname": "36.103.203.6",
          "username": "[email protected]",
          "port": 22,
          "password": "xxxxxxxxxxxxxxxxxxxx"
        }
      },
      "resources": {
        "number_node": 1,
        "cpu_per_node": 64,
        "_custom_flags": [
          "-p G1Part_sce"
        ],
        "queue_name": "amd_256",
        "_with_mpi": false,
        "source_list": [
          "/public1/ws133/sc94566/zhou/env.sh"
        ],
        "_time_limit": "120:0:0",
        "_comment": "that's all",
        "group_size": 100
      },
      "command": "ulimit -s unlimited; srun -n 64 vasp_std"
    }
  ]
}
bug

opened by zhoujingyu13687306871 2

RuntimeError in make_model_devi step

After I updated dpdispatcher to 0.4.18, I got the following error when DP-GEN ran the make_model_devi step. Downgrading dpdispatcher to 0.4.17 resolves it.

2022-09-20 04:00:40,341 - INFO : job: 31fcd1c1d95b2fedff35615bf29adbc61e3057e5 315398 finished
INFO:dpgen:-------------------------iter.000007 task 02--------------------------
INFO:dpgen:-------------------------iter.000007 task 03--------------------------
INFO:dpgen:-------------------------iter.000007 task 04--------------------------
Traceback (most recent call last):
  File "/home/kwwan/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/main.py", line 185, in main
    args.func(args)
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 3914, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 3787, in run_iter
    run_model_devi (ii, jdata, mdata)
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 1614, in run_model_devi
    run_md_model_devi(iter_index,jdata,mdata)
  File "/home/kwwan/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 1608, in run_md_model_devi
    submission.run_submission()
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/submission.py", line 176, in run_submission
    self.generate_jobs()
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/submission.py", line 340, in generate_jobs
    self.bind_machine(self.machine)
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/submission.py", line 163, in bind_machine
    self.machine.context.bind_submission(self)
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/ssh_context.py", line 389, in bind_submission
    self.block_checkcall(f"mv {old_remote_root} {self.remote_root}")
  File "/home/kwwan/software/Anaconda3/lib/python3.8/site-packages/dpdispatcher/ssh_context.py", line 537, in block_checkcall
    raise RuntimeError("Get error code %d in calling %s through ssh with job: %s . message: %s" %
RuntimeError: Get error code 1 in calling mv /data/home/scv3616/run/wankw/temp/dpmd_remote/447fbf8e9ee0ecc33a67e8f01f1847a2d3888f29 /data/home/scv3616/run/wankw/temp/dpmd_remote/5b3271c64c830aca6cfc836322191dc2482054ad through ssh with job: 5b3271c64c830aca6cfc836322191dc2482054ad . message:

opened by wankiwi 5

ratio_unfinished for group_size > 1

I want dpdispatcher to handle failed tasks via the ratio_unfinished parameter, but I found that it doesn't work as expected when group_size exceeds 1: even if only one task in a group fails, all tasks in that group are deleted, which is unexpected.
enhancement

opened by LavendaRaphael 0
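The effect the reporter describes can be quantified with a small sketch: if tasks are packed into jobs of group_size and a whole job is dropped as soon as any of its tasks fails, one failure costs up to group_size tasks, whereas task-level handling would lose only the failed task. All three functions are hypothetical illustrations, not dpdispatcher's code.

```python
def group_tasks(tasks, group_size):
    """Pack tasks into jobs of at most group_size tasks, in order."""
    return [tasks[i:i + group_size] for i in range(0, len(tasks), group_size)]


def discarded_job_level(tasks, failed, group_size):
    """Tasks lost if a whole job is dropped as soon as any of its tasks fails."""
    return sum(len(job) for job in group_tasks(tasks, group_size)
               if any(t in failed for t in job))


def discarded_task_level(tasks, failed):
    """Tasks lost if only the failed tasks themselves are dropped."""
    return sum(1 for t in tasks if t in failed)


tasks = list(range(100))
failed = {7}
for size in (1, 10, 50):
    print(size, discarded_job_level(tasks, failed, size), discarded_task_level(tasks, failed))
# 1 1 1
# 10 10 1
# 50 50 1
```

With group_size = 1 the two policies coincide, which is consistent with the parameter working as expected only in that case.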
Add more examples
I have added two examples below:

https://docs.deepmodeling.com/projects/dpdispatcher/en/latest/examples/expanse.html

https://docs.deepmodeling.com/projects/dpdispatcher/en/latest/examples/shell.html

Here, I would like to solicit more examples to add to the documentation. They should cover more job scheduling packages including PBS, LSF, Lebesgue, etc.
documentation
opened by njzjz 0

Releases(v0.5.1)

v0.5.1(Jan 6, 2023)
What's Changed

fix local context with uploading files in the subdirectory by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/300

Full Changelog: https://github.com/deepmodeling/dpdispatcher/compare/v0.5.0...v0.5.1
v0.5.0(Jan 5, 2023)
What's Changed

fix codecov by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/274

Add prepend_script and append_script for job.resource by @Cloudac7 in https://github.com/deepmodeling/dpdispatcher/pull/273

Remove os.chdir() method; add support for other key_file types other than RSA by @Cloudac7 in https://github.com/deepmodeling/dpdispatcher/pull/275

add tests for Python 3.11, macos, and windows by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/276

skip building docker out of deepmodeling by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/278

add cloudserver to the docker by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/282

add Optional to type hints when default is None by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/283

avoid compressing duplicated files by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/284

fix shell when filename contains special characters by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/285

add type checker by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/286

disable tqdm when stderr is redirected by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/277

fix bohrium remote root by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/287

fix pass action by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/292

Support redirect log from bohrium which will be used by dflow by @KZHIWEI in https://github.com/deepmodeling/dpdispatcher/pull/298

add look_for_keys option by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/299

Full Changelog: https://github.com/deepmodeling/dpdispatcher/compare/v0.4.19...v0.5.0
v0.4.19(Nov 3, 2022)
What's Changed

migrate from setup.py to pyproject.toml by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/265

document different contexts and batches by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/266

Fix typo in pr#266 by @HuangJiameng in https://github.com/deepmodeling/dpdispatcher/pull/267

Change Lebesgue API Service To Bohrium API Service by @KZHIWEI in https://github.com/deepmodeling/dpdispatcher/pull/268

drop Python 3.6 support by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/270

support machine and context alias by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/269

Full Changelog: https://github.com/deepmodeling/dpdispatcher/compare/v0.4.18...v0.4.19
v0.4.18(Sep 18, 2022)
What's Changed

add retry for totp authentication by @PKUfjh in https://github.com/deepmodeling/dpdispatcher/pull/246

fix authing using secrets and totp by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/251

change source of mock-ssh-server; uncomment 3.8 by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/252

add ci tests on slurm by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/253

fix codecov upload by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/255

add tests for openpbs by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/256

add tests for slurm job array by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/257

migrate ssh ci test to docker by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/258

support port and key_filename for rsync by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/249

add tests for LazyLocalContext by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/259

add tests for empty transfer by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/260

ssh: move remote_root when it changes by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/261

Full Changelog: https://github.com/deepmodeling/dpdispatcher/compare/v0.4.17...v0.4.18
v0.4.17(Sep 2, 2022)

v0.4.16(Aug 11, 2022)

v0.4.15(Aug 2, 2022)

v0.4.14(Jul 12, 2022)

v0.4.13(Jul 6, 2022)

v0.4.12(Jun 30, 2022)
v0.4.11(Jun 22, 2022)
What's Changed

docs: use dargs directive by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/196

catch socket.timeout for ut by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/197

docs: add links for classes, methods, and parameters by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/198

set Machine and BaseContext as abstract classes by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/199

follow symlink in LocalContext downloading by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/201

Full Changelog: https://github.com/deepmodeling/dpdispatcher/compare/v0.4.10...v0.4.11
v0.4.10(Jun 9, 2022)
v0.4.9(May 9, 2022)
Breaking Change

enable strict check for arguments by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/183

The hash of the submission may change in this version. Do not upgrade dpdispatcher before a submission is finished.

What's Changed

Fix symlink subdirs not uploaded to remote by @LavendaRaphael in https://github.com/deepmodeling/dpdispatcher/pull/185

allow batch_type with strict check; check kwargs when batch_type exists by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/186

doc: add links to DP-GUI by @njzjz in https://github.com/deepmodeling/dpdispatcher/pull/187

Full Changelog: https://github.com/deepmodeling/dpdispatcher/compare/v0.4.8...v0.4.9
v0.4.8(Apr 18, 2022)

v0.4.7(Mar 17, 2022)

v0.4.6(Feb 17, 2022)

v0.4.5(Feb 11, 2022)

v0.4.4(Feb 8, 2022)

v0.4.3(Jan 19, 2022)

v0.4.2(Dec 23, 2021)

v0.4.1(Dec 4, 2021)

0.4.0(Dec 2, 2021)

update submission json data format

v0.3.46(Nov 20, 2021)

v0.3.45(Nov 1, 2021)

v0.3.44(Oct 29, 2021)

v0.3.43(Oct 12, 2021)

v0.3.42(Sep 28, 2021)

v0.3.41(Sep 9, 2021)

v0.3.40(Sep 9, 2021)

Owner

DeepModeling

Define the future of scientific computing together

GitHub Repository https://docs.deepmodeling.org/projects/dpdispatcher/

dragonscales is a highly customizable asynchronous job-scheduler framework

dragonscales 🐉 dragonscales is a highly customizable asynchronous job-scheduler framework. This framework is used to scale the execution of multiple

2 May 16, 2022

A flexible python library for building your own cron-like system, with REST APIs and a Web UI.

Nextdoor Scheduler ndscheduler is a flexible python library for building your own cron-like system to schedule jobs, which is to run a tornado process

1k Dec 15, 2022

Python job scheduling for humans.

schedule Python job scheduling for humans. Run Python functions (or any other callable) periodically using a friendly syntax. A simple to use API for

10.4k Jan 02, 2023

Remote task execution tool

Gunnery Gunnery is a multipurpose task execution tool for distributed systems with web-based interface. If your application is divided into multiple s

747 Nov 09, 2022

Python-Repeated-Timer is an open-source & highly performing timer using only standard-libraries.

Python Repeated Timer Python-Repeated-Timer is an open-source & highly performing timer using only standard-libraries.

3 Oct 09, 2022

A task scheduler with task scheduling, timing and task completion time tracking functions

A task scheduler with task scheduling, timing and task completion time tracking functions. Could be helpful for time management in daily life.

0 Jan 15, 2022

Crontab jobs management in Python

Plan Plan is a Python package for writing and deploying cron jobs. Plan will convert Python code to cron syntax. You can easily manage you

1.2k Dec 28, 2022

Another Scheduler is a Kubernetes controller that automatically starts, stops, or restarts pods from a deployment at a specified time using a cron annotation.

Another Scheduler Another Scheduler is a Kubernetes controller that automatically starts, stops, or restarts pods from a deployment at a specified tim

66 Nov 19, 2022

Here is the live demonstration of endpoints and celery worker along with RabbitMQ

whelp-task Here is the live demonstration of endpoints and celery worker along with RabbitMQ Before running the application make sure that you have yo

0 Nov 14, 2021

The easiest way to automate your data

Hello, world! 👋 We've rebuilt data engineering for the data science era. Prefect is a new workflow management system, designed for modern infrastruct

10.9k Jan 04, 2023

Automate SQL Jobs Monitoring with python

Automate_SQLJobsMonitoring_python Using python 3rd party modules we can automate

1 Dec 27, 2021

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

CoSA is a scheduler for spatial DNN accelerators that generate high-performance schedules in one shot using mixed integer programming

44 Dec 13, 2022

Vertigo is an application used to schedule @code4tomorrow classes.

Vertigo Vertigo is an application used to schedule @code4tomorrow classes. It uses the Google Sheets API and is deployed using AWS. Documentation Lear

4 Feb 10, 2022

A simple scheduler tool that provides desktop notifications about classes and opens their meet links in the browser automatically at the start of the class.

This application provides desktop notifications about classes and opens their meet links in browser automatically at the start of the class.

14 Jun 29, 2022

Clepsydra is a mini framework for task scheduling

Intro Clepsydra is a mini framework for task scheduling All parts are designed to be replaceable. Main ideas are: No pickle! Tasks are stored in reada

15 Nov 04, 2022

A calendaring app for Django. It is now stable, Please feel free to use it now. Active development has been taken over by bartekgorny.

Django-schedule A calendaring/scheduling application, featuring: one-time and recurring events calendar exceptions (occurrences changed or cancelled)

814 Dec 26, 2022

Ffxiv-blended-job-icons - All action icons for each class/job are blended together to create new backgrounds for each job/class icon!

ffxiv-blended-job-icons All action icons for each class/job are blended together to create new backgrounds for each job/class icon! I used python to c

2 Jul 07, 2022
