A modified version of DeepMind's AlphaFold2 that splits the CPU part (MSA and template search) from the GPU part (the prediction model)

Overview

ParallelFold

Author: Bozitao Zhong

This is a modified version of DeepMind's AlphaFold2 that splits the local AlphaFold2 pipeline into a CPU part (MSA and template search) and a GPU part (the prediction model).

How to install

First, install AlphaFold2. You can choose one of the following methods to install AlphaFold locally:

  • Use the official Docker-based version from DeepMind.
  • Use one of the community versions that install AlphaFold without Docker.
  • Use my guide, which is based on the non-Docker version and can be adapted to different CUDA versions (CUDA driver >= 10.1).

Then put these 4 files into your AlphaFold folder; this folder should already contain the original run_alphafold.py. I use a run_alphafold.sh script to run AlphaFold conveniently (an approach borrowed from the non-Docker version).

4 files:

  • run_alphafold.py: a modified version of the original run_alphafold.py that skips the featurization step when feature.pkl already exists in the output folder (see the sketch after this list)
  • run_alphafold.sh: bash script that runs run_alphafold.py
  • run_feature.py: a modified version of the original run_alphafold.py that exits the Python process after writing feature.pkl
  • run_feature.sh: bash script that runs run_feature.py
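
For reference, the skip logic mentioned in the first item boils down to a cache check around the data pipeline. Below is a minimal sketch of the idea; `data_pipeline` and its `process()` call follow the upstream run_alphafold.py, and the exact code in this repository may differ:

    import os
    import pickle

    def get_feature_dict(fasta_path, msa_output_dir, output_dir, data_pipeline):
        """Return input features, reusing feature.pkl when it already exists."""
        features_path = os.path.join(output_dir, 'feature.pkl')
        if os.path.exists(features_path):
            # The CPU stage (run_feature.sh) already ran:
            # skip MSA and template search entirely.
            with open(features_path, 'rb') as f:
                return pickle.load(f)
        # No cached features: run the full CPU pipeline and cache the result.
        feature_dict = data_pipeline.process(
            input_fasta_path=fasta_path, msa_output_dir=msa_output_dir)
        with open(features_path, 'wb') as f:
            pickle.dump(feature_dict, f, protocol=4)
        return feature_dict

run_feature.py implements the complementary half: it writes feature.pkl and then exits before any model parameters are loaded, so the GPU stage can be scheduled separately.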

How to run

First, run run_feature.sh on CPUs:

./run_feature.sh -d data -o output -m model_1 -f input/test3.fasta -t 2021-07-27

Eight CPUs are enough; in my tests, more CPUs did not improve speed.

A GPU can accelerate the hhblits step (but I assume you chose this repo precisely because GPU time is expensive).

The featurization step writes feature.pkl and the MSA folder into your output folder: ./output/JOBNAME/

PS: I put my input files in an input folder to keep my files organized; you can skip this convention.
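
To sanity-check the CPU stage before queueing GPU time, you can open the pickle and inspect the feature shapes. A quick check (not part of the repo's scripts; replace JOBNAME with your job's name):

    import pickle

    with open('./output/JOBNAME/feature.pkl', 'rb') as f:
        features = pickle.load(f)

    # Each value is typically a numpy array whose shape depends on the
    # sequence length and MSA depth.
    for name, value in features.items():
        print(name, getattr(value, 'shape', type(value)))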

Second, run run_alphafold.sh on a GPU:

./run_alphafold.sh -d data -o output -m model_1,model_2,model_3,model_4,model_5 -f input/test.fasta -t 2021-07-27

If feature.pkl was written successfully, the featurization step will now be very fast.

I have also uploaded the scripts I use on the SJTU HPC cluster (with Slurm): sub_alphafold.slurm and sub_feature.slurm.

Other Files

In the ./alphafold folder, I modified some Python files (hhblits.py, hmmsearch.py, jackhmmer.py) to give these steps more CPUs for acceleration. In testing, however, these processes did not speed up when given more CPUs, probably because DeepMind runs them as wrapped subprocesses. I am trying to improve this (work in progress).
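
Concretely, the modification just raises the thread count each wrapper hands to its binary. The sketch below is a simplified stand-in for the query logic in hhblits.py, assuming the upstream layout where an n_cpu parameter (default 4 for HHblits) becomes the binary's -cpu flag; it is an illustration, not the exact diff:

    import subprocess

    def run_hhblits(binary_path, input_fasta, output_a3m, databases, n_cpu=4):
        """Simplified stand-in for the query call in alphafold/data/tools/hhblits.py.

        Raising n_cpu hands more threads to the hhblits binary; in my tests,
        however, wall time barely improved beyond the default.
        """
        cmd = [binary_path, '-i', input_fasta, '-oa3m', output_a3m,
               '-cpu', str(n_cpu)]  # the only knob this modification touches
        for database_path in databases:
            cmd += ['-d', database_path]
        subprocess.run(cmd, check=True)

The jackhmmer wrapper passes the analogous --cpu flag (default 8); since each binary manages its own parallelism internally, raising these values alone may not help, which matches the test result above.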

If you have any questions, please report them in the issues.

Comments
  • Still having problems after running the scripts

    Hello Dr. Zhong! I found that after running the scripts, the CPU part runs normally, but the GPU part fails for both short sequences (200+ aa) and long sequences (1800+ aa). My script is as follows:

    #!/bin/bash
    module load anaconda/2020.11
    source activate /data/home/zhoujy/run/alphafold2
    ./run_feature.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1 -f /data/home/zhoujy/run/input/Q9NYP9.fasta -t 2021-07-27
    ./run_alphafold.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1,model_2,model_3,model_4,model_5 -f /data/home/zhoujy/run/input/Q9NYP9.fasta -t 2021-07-27

    Submitted with one GPU card.

    The error output is as follows:

    87 I0927 17:05:14.162350 139818804778816 xla_bridge.py:226] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
    88 I0927 17:05:23.883118 139818804778816 run_alphafold.py:272] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
    89 I0927 17:05:23.883379 139818804778816 run_alphafold.py:285] Using random seed 491376288278862761 for the data pipeline
    90 I0927 17:05:23.892619 139818804778816 run_alphafold.py:151] Running model model_1
    91 I0927 17:05:34.480318 139818804778816 model.py:131] Running predict with shape(feat) = {'aatype': (4, 233), 'residue_index': (4, 233), 'seq_length': (4,), 'template_aatype': (4, 4, 233), 'template_all_atom_masks': (4, 4, 233, 37), 'template_all_atom_positions': (4, 4, 233, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 233), 'msa_mask': (4, 508, 233), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 233, 3), 'template_pseudo_beta_mask': (4, 4, 233), 'atom14_atom_exists': (4, 233, 14), 'residx_atom14_to_atom37': (4, 233, 14), 'residx_atom37_to_atom14': (4, 233, 37), 'atom37_atom_exists': (4, 233, 37), 'extra_msa': (4, 5120, 233), 'extra_msa_mask': (4, 5120, 233), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 233), 'true_msa': (4, 508, 233), 'extra_has_deletion': (4, 5120, 233), 'extra_deletion_value': (4, 5120, 233), 'msa_feat': (4, 508, 233, 49), 'target_feat': (4, 233, 22)}
    92 2021-09-27 17:05:35.143686: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Running ptxas --version returned 32512
    93 2021-09-27 17:05:35.324896: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    94 Fatal Python error: Aborted
    95
    96 Thread 0x00007f2a1a311740 (most recent call first):
    97 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 387 in backend_compile
    98 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 324 in xla_primitive_callable
    99 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/util.py", line 188 in cached
    100 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/util.py", line 195 in wrapper
    101 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 275 in apply_primitive
    102 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 612 in process_primitive
    103 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 267 in bind
    104 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 388 in shift_right_logical
    105 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/prng.py", line 229 in threefry_seed
    106 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/prng.py", line 191 in seed_with_impl
    107 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/random.py", line 105 in PRNGKey
    108 File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/model.py", line 133 in predict
    109 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 158 in predict_structure
    110 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 289 in main
    111 File "/data/home/zhoujy/.local/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
    112 File "/data/home/zhoujy/.local/lib/python3.8/site-packages/absl/app.py", line 312 in run
    113 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 316 in <module>

    Lines 92-113: this error appears regardless of sequence length. What could be causing it? @Zuricho

    opened by zhoujingyu13687306871 16
  • Where can I find the protein sequence?

    After reading the article "AlphaFold Deployment and Optimization on HPC Platform", I want to run some experiments following the article, but I cannot find the protein sequence online. Can you tell me how to download the FASTA file used in the article?

    opened by yanchenmochen 4
  • How to run GPU part?

    How do I run model inference (the GPU part of the process) after the featurization step? Does the model inference step automatically find feature.pkl in some folder?

    opened by hrzolix 4
  • How to accelerate the HHBLITS step with GPU

    Hello! Thanks for your good work! I have some questions about it:

    Q1: Do you know how to accelerate the HHblits step with a GPU?

    Q2: I use --cpu 8 to run jackhmmer, but it always uses only 2 CPUs and I don't know why.

    opened by Licko0909 4
  • 2022-01-11 09:19:03.536275: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided. Fatal Python error: Aborted

    2022-01-11 09:19:02.638037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 28422 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:41:00.0, compute capability: 7.0)
    I0111 09:19:03.171788 47078973446272 model.py:165] Running predict with shape(feat) = {'aatype': (4, 45), 'residue_index': (4, 45), 'seq_length': (4,), 'template_aatype': (4, 4, 45), 'template_all_atom_masks': (4, 4, 45, 37), 'template_all_atom_positions': (4, 4, 45, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 45), 'msa_mask': (4, 508, 45), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 45, 3), 'template_pseudo_beta_mask': (4, 4, 45), 'atom14_atom_exists': (4, 45, 14), 'residx_atom14_to_atom37': (4, 45, 14), 'residx_atom37_to_atom14': (4, 45, 37), 'atom37_atom_exists': (4, 45, 37), 'extra_msa': (4, 5120, 45), 'extra_msa_mask': (4, 5120, 45), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 45), 'true_msa': (4, 508, 45), 'extra_has_deletion': (4, 5120, 45), 'extra_deletion_value': (4, 5120, 45), 'msa_feat': (4, 508, 45, 49), 'target_feat': (4, 45, 22)}
    2022-01-11 09:19:03.503247: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Running ptxas --version returned 32512
    2022-01-11 09:19:03.536275: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    Fatal Python error: Aborted

    Thread 0x00002ad16d7d1880 (most recent call first):
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 360 in backend_compile
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 297 in xla_primitive_callable
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 179 in cached
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 186 in wrapper
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 248 in apply_primitive
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 603 in process_primitive
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 264 in bind
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 382 in shift_right_logical
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/random.py", line 75 in PRNGKey
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/alphafold/model/model.py", line 167 in predict
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 210 in predict_structure
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 429 in main
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312 in run
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 455 in <module>

    ./run_alphafold.sh: line 233: 7015 Aborted python $alphafold_script --fasta_paths=$fasta_path --model_names=$model_selection --data_dir=$data_dir --output_dir=$output_dir --jackhmmer_binary_path=$jackhmmer_binary_path --hhblits_binary_path=$hhblits_binary_path --hhsearch_binary_path=$hhsearch_binary_path --hmmsearch_binary_path=$hmmsearch_binary_path --hmmbuild_binary_path=$hmmbuild_binary_path --kalign_binary_path=$kalign_binary_path --uniref90_database_path=$uniref90_database_path --mgnify_database_path=$mgnify_database_path --bfd_database_path=$bfd_database_path --small_bfd_database_path=$small_bfd_database_path --uniclust30_database_path=$uniclust30_database_path --uniprot_database_path=$uniprot_database_path --pdb70_database_path=$pdb70_database_path --pdb_seqres_database_path=$pdb_seqres_database_path --template_mmcif_dir=$template_mmcif_dir --max_template_date=$max_template_date --obsolete_pdbs_path=$obsolete_pdbs_path --db_preset=$db_preset --model_preset=$model_preset --benchmark=$benchmark --amber_relaxation=$amber_relaxation --recycling=$recycling --run_feature=$run_feature --logtostderr

    opened by chenshixinnb 3
  • ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74.

    I set up the conda environment following your steps. Running import jax; print(jax.devices()) inside the environment raises: ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74. How can I solve this? Thanks!

    opened by chenshixinnb 3
  • Something went wrong when I ran the job

    Hi, dear author. I installed the required modules according to the linked requirements, but the following error occurred when I ran the script. Can you help me find out what is causing it? My installation steps were as follows:

    1. conda create --prefix=/data/home/zhoujy/run/alphafold2 python=3.8
    2. conda activate /data/home/zhoujy/run/alphafold2
    3. conda install cudatoolkit=10.1 cudnn
    4. pip install tensorflow==2.3.0
    5. pip install biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0
    6. pip install --upgrade jax jaxlib==0.1.69+cuda101 -f https://storage.googleapis.com/jax-releases/jax_releases.html

    and then , I run the script:

    #!/bin/bash
    module load anaconda/2020.11
    source activate /data/home/zhoujy/run/alphafold2
    ./run_feature.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1 -f /data/home/zhoujy/run/input/Tb927.10.2950.fasta -t 2021-07-27

    The result is as follows:

    Traceback (most recent call last):
      File "/data/run01/zhoujy/ParallelFold-main/run_feature.py", line 33, in <module>
        from alphafold.model import data
      File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/data.py", line 20, in <module>
        from alphafold.model import utils
      File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/utils.py", line 21, in <module>
        import haiku as hk
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/__init__.py", line 17, in <module>
        from haiku import data_structures
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/data_structures.py", line 17, in <module>
        from haiku._src.data_structures import to_immutable_dict
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/_src/data_structures.py", line 30, in <module>
        from haiku._src import utils
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/_src/utils.py", line 24, in <module>
        import jax
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/__init__.py", line 16, in <module>
        from .api import (
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/api.py", line 38, in <module>
        from . import core
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 31, in <module>
        from . import dtypes
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/dtypes.py", line 31, in <module>
        from .lib import xla_client
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/lib/__init__.py", line 51, in <module>
        from jaxlib import pytree
    ImportError: cannot import name 'pytree' from 'jaxlib' (/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jaxlib/__init__.py)

    Why? I need your help.

    opened by zhoujingyu13687306871 2
  • Limit RAM usage

    I'm trying to run a FASTA file of length 3643. The MSA part finished, but the inference part tried to allocate 80 GB of VRAM on the GPU, which I don't have access to; the graphics cards are NVIDIA Tesla V100 16 GB. Now I'm trying to run inference on CPU, which is a very slow process, and the job keeps using more and more RAM as time passes. Can I limit the RAM usage somehow? Or can I run inference on more graphics cards, perhaps as a parallel process?

    opened by hrzolix 1
  • GPU utilization problem

    Hello Dr. Zhong! After many attempts yesterday, it now runs. However, I noticed that when running the run_alphafold.sh script, the GPU computation part spends a very long time running on the CPU, with GPU utilization staying at 0 for long stretches. I tried to compute a protein with a sequence length of 2000 using four V100 cards, and it took 9 days. Is this speed normal? Also, in the earlier TensorFlow installation step, is it necessary to install the GPU version of TensorFlow?

    @Zuricho

    opened by zhoujingyu13687306871 1
  • Error after GPU part

    Hi, after installation the "CPU part" (jackhmmer and hhblits) works well. But when I start the GPU part, I get this error message: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    1st part: ./run_feature.sh -d data -o ./tmp -m model_1,model_2,model_3,model_4,model_5 -f ./query/1crn.fasta -t 2021-07-27
    2nd part: ./run_alphafold.sh -d data -o ./tmp -m model_1,model_2,model_3,model_4,model_5 -f ./query/1crn.fasta -t 2021-07-27

    Full error message:

    File "/softwares/alphafold/run_alphafold.py", line 316, in <module>
      app.run(main)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
      _run_main(main, args)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
      sys.exit(main(argv))
    File "/softwares/alphafold/run_alphafold.py", line 289, in main
      predict_structure(
    File "/softwares/alphafold/run_alphafold.py", line 188, in predict_structure
      relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
    File "/softwares/alphafold/alphafold/relax/relax.py", line 58, in process
      out = amber_minimize.run_pipeline(
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
      ret.update(get_violation_metrics(prot))
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
      structural_violations, struct_metrics = find_violations(prot)
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 338, in find_violations
      violations = folding.find_structural_violations(
    File "/softwares/alphafold/alphafold/model/folding.py", line 757, in find_structural_violations
      atom14_atom_radius = batch['atom14_atom_exists'] * utils.batched_gather(
    File "/softwares/alphafold/alphafold/model/utils.py", line 39, in batched_gather
      return take_fn(params, indices)
    File "/softwares/alphafold/alphafold/model/utils.py", line 36, in <lambda>
      take_fn = lambda p, i: jnp.take(p, i, axis=axis)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5383, in take
      return _take(a, indices, None if axis is None else operator.index(axis), out,
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
      return fun(*args, **kwargs)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/api.py", line 411, in cache_miss
      out_flat = xla.xla_call(
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1618, in bind
      return call_bind(self, fun, *args, **params)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1609, in call_bind
      outs = primitive.process(top_trace, fun, tracers, params)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1621, in process
      return trace.process_call(self, fun, tracers, params)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 615, in process_call
      return primitive.impl(f, *tracers, **params)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 622, in _xla_call_impl
      compiled_fun = _xla_callable(fun, device, backend, name, donated_invars,
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 262, in memoized_fun
      ans = call(fun, *args)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 694, in _xla_callable
      return lower_xla_callable(fun, device, backend, name, donated_invars, *arg_specs).compile().unsafe_call
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 702, in lower_xla_callable
      jaxpr, out_avals, consts = pe.trace_to_jaxpr_final(
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1522, in trace_to_jaxpr_final
      jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(fun, main, in_avals)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1500, in trace_to_subjaxpr_dynamic
      ans = fun.call_wrapped(*in_tracers)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 166, in call_wrapped
      ans = self.f(*args, **dict(self.params, **kwargs))
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5390, in _take
      _check_arraylike("take", a)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 559, in _check_arraylike
      raise TypeError(msg.format(fun_name, type(arg), pos))
    jax._src.traceback_util.UnfilteredStackTrace: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.


    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/softwares/alphafold/run_alphafold.py", line 316, in <module>
        app.run(main)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/softwares/alphafold/run_alphafold.py", line 289, in main
        predict_structure(
      File "/softwares/alphafold/run_alphafold.py", line 188, in predict_structure
        relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
      File "/softwares/alphafold/alphafold/relax/relax.py", line 58, in process
        out = amber_minimize.run_pipeline(
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
        ret.update(get_violation_metrics(prot))
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
        structural_violations, struct_metrics = find_violations(prot)
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 338, in find_violations
        violations = folding.find_structural_violations(
      File "/softwares/alphafold/alphafold/model/folding.py", line 757, in find_structural_violations
        atom14_atom_radius = batch['atom14_atom_exists'] * utils.batched_gather(
      File "/softwares/alphafold/alphafold/model/utils.py", line 39, in batched_gather
        return take_fn(params, indices)
      File "/softwares/alphafold/alphafold/model/utils.py", line 36, in <lambda>
        take_fn = lambda p, i: jnp.take(p, i, axis=axis)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5383, in take
        return _take(a, indices, None if axis is None else operator.index(axis), out,
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5390, in _take
        _check_arraylike("take", a)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 559, in _check_arraylike
        raise TypeError(msg.format(fun_name, type(arg), pos))
    TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    opened by ebettler 1
  • Running ParallelFold on reduced database?

    Is it possible to run ParallelFold on reduced_dbs, or is that not yet supported? I tried to use -c reduced_dbs but it did not work. Then I tried modifying the bfd_path set in run_alphafold.sh, but somehow it threw a directory/file-not-found error (I'm pretty sure the file is there, because I'm able to run AlphaFold with it). Thank you for your help in advance!

    opened by xinyu-g 0
  • Did CPU acceleration fail?

    Yesterday I ran some experiments on a server with ./run_alphafold.sh -d /dataset/ -o result -p monomer -m model_2 -i input/T1061.fasta and read the log, which left me confused; T1061 is 949 aa.

    I0822 07:33:00.806264 140553952322112 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpxbrk9wt6/output.sE 0.0001 -E 0.0001 --cpu 8 -N 1 input/T1061.fasta /dataset//uniref90/uniref90.fasta"
    I0822 07:33:01.157015 140553952322112 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0822 07:37:27.058227 140553952322112 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 265.901 seconds
    I0822 07:37:27.072012 140553952322112 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpnn6am537/output.sE 0.0001 -E 0.0001 --cpu 8 -N 1 input/T1061.fasta /dataset//mgnify/mgy_clusters_2018_12.fa"
    I0822 07:37:27.439405 140553952322112 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0822 07:42:42.192071 140553952322112 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 314.752 seconds
    I0822 07:42:42.364272 140553952322112 hhsearch.py:85] Launching subprocess "/opt/conda/bin/hhsearch -i /tmp/tmpog4q4684/query.a3m -o /tmp/tmpog40/pdb70"
    I0822 07:42:42.712445 140553952322112 utils.py:36] Started HHsearch query
    I0822 07:44:18.199999 140553952322112 utils.py:40] Finished HHsearch query in 95.487 seconds
    I0822 07:44:18.555797 140553952322112 hhblits.py:128] Launching subprocess "/opt/conda/bin/hhblits -i input/T1061.fasta -cpu 4 -oa3m /tmp/tmpz9oq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /dataset//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_optst30_2018_08"
    I0822 07:44:19.050110 140553952322112 utils.py:36] Started HHblits query
    I0822 09:01:02.278290 140553952322112 utils.py:40] Finished HHblits query in 4603.228 seconds

    feature extraction spend time: 5305.185729026794
    feature extraction completed successfully

    I printed the feature-extraction time and found that the 5305 seconds is almost equal to the sum of all the database search times. But according to the article, I expected the feature-extraction time to be roughly equal to the HHblits search alone, so can you explain this confusing result?

    opened by yanchenmochen 3
  • failed to alloc 2147483648 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS

    When I use the code to compute T1050.fasta, which is composed of 700 residues, the command line outputs this problem. The environment is an A100 GPU on Ubuntu, but I use newer versions of jax and jaxlib; could that be the cause?

    (parafold) [email protected]:~# pip list | grep jax
    jax       0.3.15
    jaxlib    0.3.15+cuda11.cudnn82

    opened by yanchenmochen 3
  • Too many command-line arguments

    Hi,

    First of all, thanks for developing this tool, I'm looking forward to playing with it!

    I installed ParallelFold on an Ubuntu 18 machine, with the full AlphaFold database on an external drive.

    When running the command: $ ./run_alphafold.sh -d /media/qhr/"My Passport"/alphafold/AlphaFold_DB -o output -p monomer_ptm -i input/GA98.fasta -m model_1 -f

    I get the Error: Too many command-line arguments.

    I also get the same error when calling run_alphafold.py directly: $ python3 run_alphafold.py --fasta_paths=input/GA98.fasta --model_preset=monomer --data_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB --output_dir=output --uniref90_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/uniref90 --mgnify_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/mgnify --template_mmcif_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/pdb_mmcif --obsolete_pdbs_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/pdb_mmcif/obsolete.dat --use_gpu_relax=True bfd_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/bfd --max_template_date=2020-05-14

    Is it possible that the space in the name of the external drive "My Passport" is causing this error?

    Thanks! Ana

    opened by AnaValero 1
  • AlphaFold2 vs. Parafold timings

    I have a fundamental question about the difference between the AlphaFold2 and Parafold running procedures: how can I determine whether Parafold runs the first step (the two Jackhmmer searches and the HHblits search) in parallel, unlike the sequential searches performed by AlphaFold2?

    Snippets of log files obtained from running Alphafold2 and Parafold

    Alphafold2 log:

    I0409 14:04:28.020900 139865793787712 run_alphafold.py:376] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
    I0409 14:04:28.021180 139865793787712 run_alphafold.py:393] Using random seed 1420247507508611084 for the data pipeline
    I0409 14:04:28.021463 139865793787712 run_alphafold.py:161] Predicting seq1
    I0409 14:04:28.037414 139865793787712 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpm1u84thu/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1/fasta_files/seq1.fasta /alphafold_data//uniref90/uniref90.fasta"
    I0409 14:04:28.111756 139865793787712 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0409 14:10:17.276236 139865793787712 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 349.164 seconds
    I0409 14:10:17.462168 139865793787712 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpub1qi595/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq1.fasta /alphafold_data//mgnify/mgy_clusters_2018_12.fa"
    I0409 14:10:17.513182 139865793787712 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0409 14:16:32.112656 139865793787712 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 374.599 seconds
    I0409 14:16:33.369129 139865793787712 hhsearch.py:85] Launching subprocess "/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpyot74k7r/query.a3m -o /tmp/tmpyot74k7r/output.hhr -maxseq 1000000 -d /alphafold_data//pdb70/pdb70"
    I0409 14:16:33.466009 139865793787712 utils.py:36] Started HHsearch query
    I0409 14:22:32.148045 139865793787712 utils.py:40] Finished HHsearch query in 358.682 seconds
    I0409 14:22:32.838686 139865793787712 hhblits.py:128] Launching subprocess "/.conda/envs/alphafold/bin/hhblits -i /fasta_files/seq1.fasta -cpu 4 -oa3m /tmp/tmpedyoxta1/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    I0409 14:22:32.926801 139865793787712 utils.py:36] Started HHblits query
    I0409 18:56:30.223437 139865793787712 utils.py:40] Finished HHblits query in 16437.296 seconds
    

    Parafold log:

    I0427 21:17:27.915049 140305630689088 run_alphafold.py:397] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
    I0427 21:17:27.915312 140305630689088 run_alphafold.py:414] Using random seed 1534697036303804749 for the data pipeline
    I0427 21:17:27.915629 140305630689088 run_alphafold.py:165] Predicting seq2
    I0427 21:17:27.925500 140305630689088 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp5fo28348/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq2.fasta /alphafold_data//uniref90/uniref90.fasta"
    I0427 21:17:27.996705 140305630689088 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0427 21:23:54.643056 140305630689088 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 386.646 seconds
    I0427 21:23:54.829476 140305630689088 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmprs3za6w_/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq2.fasta /alphafold_data//mgnify/mgy_clusters_2018_12.fa"
    I0427 21:23:54.875119 140305630689088 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0427 21:31:38.409492 140305630689088 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 463.534 seconds
    I0427 21:31:39.768360 140305630689088 hhsearch.py:85] Launching subprocess "/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpjgr58ebb/query.a3m -o /tmp/tmpjgr58ebb/output.hhr -maxseq 1000000 -d /alphafold_data//pdb70/pdb70"
    I0427 21:31:39.850885 140305630689088 utils.py:36] Started HHsearch query
    I0427 21:39:23.420352 140305630689088 utils.py:40] Finished HHsearch query in 463.569 seconds
    I0427 21:39:24.173583 140305630689088 hhblits.py:128] Launching subprocess "/.conda/envs/alphafold/bin/hhblits -i /fasta_files/seq2.fasta -cpu 4 -oa3m /tmp/tmpmzl5arhr/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    I0427 21:39:24.259592 140305630689088 utils.py:36] Started HHblits query
    I0428 01:34:31.302148 140305630689088 utils.py:40] Finished HHblits query in 14107.042 seconds
    

    They look similar to me, and both use 8, 8, and 4 CPUs for the three searches, respectively. Please clarify this for me.

    Thank you Aditi

    opened by adi1bioinfo 0
  • An error in feature generation

    Hi, when I used your new version to make feature.pkl, this error occurred. Could you give any advice on how to solve it?

    FATAL Flags parsing error: Unknown command line flag 'model_names'. Did you mean: model_preset ? Pass --helpshort or --helpfull to see help on flags.

    opened by YiningWang2 1