E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

E2EDNA 2.0 - OpenMM Implementation of E2EDNA!

An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides.

Michael Kilgour, Tao Liu, Ilya S. Dementyev, Lena Simine

mjakilgour gmail com

For the original version of E2EDNA, see J. Chem. Inf. Model. 2021, 61, 9, 4139–4144 (https://doi.org/10.1021/acs.jcim.1c00696) and https://github.com/InfluenceFunctional/E2EDNA

Installation

This installation path has been tested on macOS and relies on the conda and pip package managers.

  1. Download the E2EDNA 2.0 package from this repository.
  2. Register and download NUPACK from http://www.nupack.org/downloads. You will need the path to the ~/nupack-###/source/package directory.
  3. In the E2EDNA2 directory, modify the macos_installation.sh script: update the path to NUPACK (see step 2).
  4. From the E2EDNA2 folder, run macos_installation.sh.
    • Caveat: if the conda activate e2edna command gives an error, or if the e2edna environment has not been activated after the script finishes, either replace the activation command with source activate /path-to-env/e2edna,
    • or copy and paste the commands from the script, unmodified, into the command line and run them one by one. This works around the unconfigured shell issue.
  5. Register and download MMB from https://simtk.org/projects/rnatoolbox . Place the Installer### folder into the e2edna folder. NB: contrary to the recommendation in the MMB installation guide, do not set DYLD_LIBRARY_PATH. This avoids interference with the OpenMM module.
  6. Update 3 paths in main.py:
 params['workdir'] = '/path-to-e2edna/localruns'                         # working directory   
       
 params['mmb dir'] = '/path-to-e2edna/e2edna/Installer.###/lib'          # path to MMB dylib files
      
 params['mmb']     = '/path-to-e2edna/Installer.###/bin/MMB-executable'  # path to MMB executable    
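Before launching a run, it can help to confirm that the three paths above actually exist. The following is a minimal sketch, not part of E2EDNA: the helper name missing_paths is hypothetical, and it only assumes that params is a plain dictionary with the three keys shown above.

```python
import os


def missing_paths(params, keys=("workdir", "mmb dir", "mmb")):
    """Return the keys whose configured path does not exist on disk.

    Hypothetical helper for illustration; E2EDNA does not ship this function.
    """
    return [key for key in keys if not os.path.exists(params.get(key, ""))]
```

An empty list means all three paths were found; otherwise the returned keys point at the entries in main.py that still need updating.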

Running a job

Quickstart

  • Set 'params' in main.py, as indicated in "Installation".
  • Run the bash script automate_tests.sh to test all 8 modes automatically.
  • Alternatively, a single run can be carried out by specifying run_num, mode, the aptamer sequence, and the ligand's structure file. For example,
python main.py --run_num=1 --mode='free aptamer' --aptamerSeq='TAATGTTAATTG' --ligand='False' --ligandType='' --ligandSeq=''
python main.py --run_num=2 --mode='full dock' --aptamerSeq='TAATGTTAATTG' --ligand='YQTQ.pdb' --ligandType='peptide' --ligandSeq='YQTQTNSPRRAR'
    
# --ligand='False'        # if no ligand. --ligandType and --ligandSeq will be ignored.
# --ligandType='peptide'  # or 'DNA' or 'RNA' or 'other'. Assuming 'other' ligand can be described by Amber14 force field.
# --ligandSeq=''          # if no sequence. For instance, when ligandType is 'other'
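When scripting several runs, the command line above can be assembled programmatically. This is a minimal sketch under the assumption that the flags are exactly those shown in the two examples; build_command is a hypothetical helper, and the resulting list is meant to be passed to subprocess.run, so no shell quoting is needed.

```python
def build_command(run_num, mode, aptamer_seq, ligand=None,
                  ligand_type="", ligand_seq=""):
    """Return an argument list for one main.py run (hypothetical helper)."""
    if ligand is None:
        # With no ligand the README passes the literal string 'False';
        # --ligandType and --ligandSeq are then ignored.
        ligand, ligand_type, ligand_seq = "False", "", ""
    return [
        "python", "main.py",
        f"--run_num={run_num}",
        f"--mode={mode}",
        f"--aptamerSeq={aptamer_seq}",
        f"--ligand={ligand}",
        f"--ligandType={ligand_type}",
        f"--ligandSeq={ligand_seq}",
    ]
```

For example, subprocess.run(build_command(1, "free aptamer", "TAATGTTAATTG"), check=True) would correspond to the first command shown above.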

Functionality: Eight different modes of operation

E2EDNA 2.0 takes in a DNA aptamer sequence in FASTA format, and optionally a short peptide or other small molecule, and returns details of the aptamer structure and binding behaviour. This code implements several distinct analysis modes so users may customize the level of computational cost and accuracy.

  • 2d structure → returns NUPACK or seqfold analysis of aptamer secondary structure. Very fast, O(<1s). If using NUPACK, includes probability of observing a certain fold and of suboptimal folds within kT of the minimum.
  • 3d coarse → returns MMB fold of the best secondary structure. Fast O(5-30 mins). Results in a strained 3D structure which obeys base pairing rules and certain stacking interactions.
  • 3d smooth → identical to '3d coarse', with a short MD relaxation in solvent. Less than double the cost of '3d coarse', depending on relaxation time.
  • coarse dock → uses the 3D structure from '3d coarse' as the initial condition for a LightDock simulation, and returns best docking configurations and scores. Depending on docking parameters, adds O(5-30mins) to '3d coarse'.
  • smooth dock → identical to 'coarse dock', instead using the relaxed structure from '3d smooth'. Similar cost.
  • free aptamer → fold the aptamer in MMB and run extended MD sampling to identify a representative, equilibrated 2D and 3D structure. Slow O(hours).
  • full dock → returns the best docking configurations and scores from a LightDock run using the fully equilibrated aptamer structure from 'free aptamer'. Similar cost (LightDock is relatively cheap).
  • full binding → same steps as 'full dock', followed by an extended MD simulation of the best binding configuration. Slowest O(hours).
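One way to read the list above is as a table from mode name to pipeline stages. The sketch below encodes that reading. The stage labels and the stages_for helper are illustrative assumptions, not the names E2EDNA uses internally.

```python
# Stage labels are illustrative; they paraphrase the mode descriptions above.
MODE_STAGES = {
    "2d structure": ["2d"],
    "3d coarse":    ["2d", "fold"],
    "3d smooth":    ["2d", "fold", "relax"],
    "coarse dock":  ["2d", "fold", "dock"],
    "smooth dock":  ["2d", "fold", "relax", "dock"],
    "free aptamer": ["2d", "fold", "sample"],
    "full dock":    ["2d", "fold", "sample", "dock"],
    "full binding": ["2d", "fold", "sample", "dock", "sample complex"],
}


def stages_for(mode):
    """Return the stages implied by a mode, or fail with the valid choices."""
    try:
        return MODE_STAGES[mode]
    except KeyError:
        valid = ", ".join(repr(m) for m in MODE_STAGES)
        raise ValueError(f"invalid mode {mode!r} (choose from {valid})") from None
```

Failing early with the list of valid names gives a clearer message than a later lookup error when a mode is misspelled.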

Test run: inputs and outcomes

Running the script automate_tests.sh automatically runs short, lightweight simulations of all 8 modes. Here we explain which outputs to look for and what success looks like.

  • Mode 1: 2d structure

Input: FASTA sequence, e.g., CGCGCGCGCGCGC

Outputs:

Success evaluation: observe the dot-parenthesis representation of the 2D structure, e.g., ..(...)..

  • Mode 2: 3d coarse

Input: '3d unrefined', FASTA sequence, e.g., CGCGCGCGCGCGC

Outputs:

Success evaluation: visualize foldedAptamer_0.pdb in VMD or PyMOL

  • Mode 3: 3d smooth

Input: FASTA sequence, e.g., CGCGCGCGCGCGC

Outputs:

Success evaluation:

  • Mode 4: coarse dock

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., 'target.pdb'

Outputs:

Success evaluation:

  • Mode 5: smooth dock

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., 'target.pdb'

Outputs:

Success evaluation:

  • Mode 6: free aptamer

Given a DNA sequence, its secondary structure is predicted and represented by a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. The last step runs a molecular dynamics simulation to sample the conformational space and identify the representative conformation from the MD trajectory. (if we ask contact predictor for >1 ssStructure)

Input: FASTA sequence, e.g., CGCGCGCGCGCGC

Modifications to the code: set params['mode'] = 'free aptamer' and params['sequence'] = 'CGCGCGCGCGCGC'

Outputs:

    • Secondary structure prediction: e.g., ((....))....((.(...).)).. in "record.txt"
    • MMB folded structure: "foldedAptamer_0.pdb"
    • MD simulation, binary trajectory: "clean_foldedAptamer_0_processed_complete_trajectory.dcd"
    • MD simulation, topology: "clean_foldedAptamer_0_processed.pdb"
    • Representative conformation: "repStructure_0.pdb"

Success evaluation:

    • The DCD trajectory file is generated, and "log.txt" shows that the MD sampling of the free aptamer is 100% complete.
    • Visualize the MD trajectory of the free aptamer using the topology and the binary trajectory file.
    • Visualize the representative conformation of the DNA aptamer.

  • Mode 7: full dock

Given a DNA sequence, its secondary structure is predicted and represented by a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Next, a molecular dynamics simulation samples the conformational space and identifies the representative conformation from the MD trajectory. Finally, a target ligand of interest is docked to the representative structure (the ligand structure must be provided as a PDB file).

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., 'target.pdb'

Modifications to the code: set params['mode'] = 'full dock', params['sequence'] = 'CGCGCGCGCGCGC' and params['target'] = 'target.pdb' # need to update the code for this.

Outputs:

    • Secondary structure prediction: e.g., ((....))....((.(...).)).. in "record.txt"
    • MMB folded structure: "foldedAptamer_0.pdb"
    • MD simulation, binary trajectory: "clean_foldedAptamer_0_processed_complete_trajectory.dcd"
    • MD simulation, topology: "clean_foldedAptamer_0_processed.pdb"
    • Representative conformation: "repStructure_0.pdb"
    • Docking, aptamer-ligand complex structure: "top_1.pdb". The docking score is in "record.txt".

Success evaluation:

    • The DCD trajectory file is generated, and "log.txt" shows that the MD sampling of the free aptamer is 100% complete.
    • Visualize the MD trajectory of the free aptamer using the topology and the binary trajectory file.
    • Visualize the representative conformation of the DNA aptamer.
    • Visualize the aptamer-ligand complex structure.

  • Mode 8: full binding

Given a DNA sequence, its secondary structure is predicted and represented by a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Next, a molecular dynamics simulation samples the conformational space and identifies the representative conformation from the MD trajectory. A target ligand of interest is then docked to the representative structure (the ligand structure must be provided as a PDB file). Finally, the aptamer-ligand complex is sampled by MD simulation to investigate its dynamics.

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., 'target.pdb'

Modifications to the code: set params['mode'] = 'full binding', params['sequence'] = 'CGCGCGCGCGCGC' and params['target'] = 'target.pdb' # need to update the code for this.

Outputs:

    • Secondary structure prediction: e.g., ((....))....((.(...).)).. in "record.txt"
    • MMB folded structure: "foldedAptamer_0.pdb"
    • MD simulation of free aptamer, binary trajectory: "clean_foldedAptamer_0_processed_complete_trajectory.dcd"
    • MD simulation of free aptamer, topology: "clean_foldedAptamer_0_processed.pdb"
    • Representative conformation: "repStructure_0.pdb"
    • Docking, aptamer-ligand complex structure: "top_1.pdb". The docking score is in "record.txt".
    • MD simulation of aptamer-ligand complex, binary trajectory: "clean_complex_0_0_processed_trajectory.dcd"
    • MD simulation of aptamer-ligand complex, topology: "clean_complex_0_0_processed.pdb"

Success evaluation:

    • The DCD trajectory file of the free aptamer is generated, and "log.txt" shows that the MD sampling of the free aptamer is 100% complete.
    • Visualize the MD trajectory of the free aptamer using the topology and the binary trajectory file.
    • Visualize the representative conformation of the DNA aptamer.
    • Visualize the aptamer-ligand complex structure.
    • The DCD trajectory file of the complex is generated, and "log_complex.txt" shows that the MD sampling of the aptamer-ligand complex is 100% complete.
    • Visualize the MD trajectory of the aptamer-ligand complex using its binary trajectory and topology files. Note that the aptamer might appear far from the target ligand, which could be a result of the periodic boundary conditions. (Open question: should the pipeline correct this, or leave it to the user?)
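The file names listed for Modes 6 to 8 can be checked automatically after a test run. The following is a minimal sketch, assuming that all of these files sit directly in the run folder (the README does not state this explicitly); missing_outputs and the EXPECTED table are hypothetical names.

```python
import os

# File names are taken from the mode descriptions above.
FREE_APTAMER_FILES = [
    "record.txt",
    "log.txt",
    "foldedAptamer_0.pdb",
    "clean_foldedAptamer_0_processed_complete_trajectory.dcd",
    "clean_foldedAptamer_0_processed.pdb",
    "repStructure_0.pdb",
]

EXPECTED = {
    "free aptamer": FREE_APTAMER_FILES,
    "full dock": FREE_APTAMER_FILES + ["top_1.pdb"],
    "full binding": FREE_APTAMER_FILES + [
        "top_1.pdb",
        "log_complex.txt",
        "clean_complex_0_0_processed_trajectory.dcd",
        "clean_complex_0_0_processed.pdb",
    ],
}


def missing_outputs(run_dir, mode):
    """Return the expected output files that are absent from run_dir."""
    return [name for name in EXPECTED[mode]
            if not os.path.isfile(os.path.join(run_dir, name))]
```

An empty result only shows that the files exist. Confirming that sampling reached 100% still requires reading the log files.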

The MD simulation might stop at the onset with a "Particle coordinate is nan" error. This could be caused by an overly aggressive energy minimization that pushes coordinates out of bounds, after which the integrator cannot operate on the resulting nonsensical coordinate values. In this case, re-run the pipeline.
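The advice to re-run can be automated with a small retry wrapper. This is a sketch under assumptions: run_once stands for any callable that launches the pipeline and raises an exception carrying the error text, and the wrapper name and attempt limit are illustrative. The marker is matched case-insensitively because the exact capitalisation of the message may differ.

```python
def run_with_retries(run_once, max_attempts=3,
                     marker="particle coordinate is nan"):
    """Call run_once(), retrying only when the NaN-coordinate error appears."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except Exception as err:
            is_nan_error = marker in str(err).lower()
            if not is_nan_error or attempt == max_attempts:
                raise  # unrelated failure, or retries exhausted
```

Any other exception is re-raised immediately, so genuine configuration errors are not hidden by the retries.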

MMB folding could take a while if multiple refolding takes place for any tricky sequence.
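The dot-parenthesis strings written to "record.txt" can be turned into explicit base pairs for inspection. A minimal sketch follows, assuming only the plain '.', '(' and ')' symbols shown in the examples; parse_dot_bracket is a hypothetical helper and uses 0-based positions.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs for matching parentheses."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the hairpin .(((....))). this yields the pairs (1, 10), (2, 9) and (3, 8), i.e. a three-base-pair stem closing a four-base loop.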

__ work in progress__

Physical Parameters

Default force field is AMBER 14. Other AMBER fields and explicit water models are trivial to implement. Implicit water requires moving to building systems from AMBER prmtop files. CHARMM may also be easily implemented, but hasn't been tested. AMOEBA 2013 parameters do not include nucleic acids, and AMOEBABIO18 parameters are not implemented in OpenMM.

* params['force field'] = 'AMBER'
* params['water model'] = 'tip3p'

Default parameters here - for guidance on adjustments start here.

params['box offset'] = 1.0 # nanometers
params['barostat interval'] = 25
params['friction'] = 1.0 # 1/picosecond
params['nonbonded method'] = PME
params['nonbonded cutoff'] = 1.0 # nanometers
params['ewald error tolerance'] = 5e-4
params['constraints'] = HBonds
params['rigid water'] = True
params['constraint tolerance'] = 1e-6
params['pressure'] = 1 

Increasing the hydrogen mass, e.g., to 4 amu, enables longer time steps of up to ~3-4 fs. See documentation for details.

params['hydrogen mass'] = 1.0 # in amu
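To illustrate what raising the hydrogen mass does: as OpenMM documents for its hydrogenMass option, any mass added to a hydrogen is subtracted from the heavy atom it is bonded to, so the total mass is unchanged. The arithmetic can be sketched as follows; the function name and the default masses are illustrative, not taken from E2EDNA.

```python
def repartition(heavy_mass, n_hydrogens, target_h_mass=4.0, h_mass=1.008):
    """Return (new_heavy_mass, new_h_mass) after hydrogen mass repartitioning.

    Masses are in amu. The extra mass given to each bonded hydrogen is
    removed from the heavy atom, keeping the group's total mass constant.
    """
    added_per_h = target_h_mass - h_mass
    new_heavy = heavy_mass - n_hydrogens * added_per_h
    if new_heavy <= 0:
        raise ValueError("target hydrogen mass is too large for this heavy atom")
    return new_heavy, target_h_mass
```

For a carbon (12.011 amu) carrying two hydrogens, the carbon drops to about 6.03 amu while each hydrogen becomes 4 amu, and the group total stays at about 14.03 amu.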

Temperature, pH and ionic strength are taken into account for 2D folding in NUPACK, ion concentration in MD simulation, and protonation of molecules for MD (safest near 7-7.4).

params['temperature'] = 310 # Kelvin - used to predict secondary structure and for MD thermostatting
params['ionic strength'] = .163 # mmol - used to predict secondary structure and add ions to simulation box
params['pH'] = 7.4 # simulation will automatically protonate the peptide up to this pH
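The temperature also sets the energy scale kT that is used when comparing suboptimal folds with the minimum-energy fold. As a rough worked example (assuming free energies in kcal/mol, the unit NUPACK reports), kT at 310 K is about 0.62 kcal/mol. The helper names below are illustrative.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol K)


def thermal_energy(temperature_k):
    """Return kT (per mole) in kcal/mol at the given temperature in Kelvin."""
    return R_KCAL_PER_MOL_K * temperature_k


def relative_weight(delta_g, temperature_k):
    """Boltzmann weight of a fold lying delta_g (kcal/mol) above the minimum."""
    return math.exp(-delta_g / thermal_energy(temperature_k))
```

A fold exactly kT above the minimum therefore has a relative weight of exp(-1), roughly 0.37, compared with the minimum-energy fold.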

The peptide backbone constraint constant is the constant used to constrain backbone dihedrals. A minimum of 10000, as it is currently set, is recommended for good constraints (deviations < 5° were always seen with this value). For more info, please read README_CONSTRAINTS.md.

params['peptide backbone constraint constant'] = 10000

Implicit Solvent

params['implicit solvent'] = True
if params['implicit solvent']:
    params['implicit solvent model'] = OBC1  # only meaningful if implicit solvent is True
    params['leap template'] = 'leap_template.in'
    # TODO add more options to params: implicitSolventSaltConc, soluteDielectric, solventDielectric, implicitSolventKappa

Starting with a folded DNA aptamer structure (instead of just a FASTA sequence)

params['skip MMB'] = True  # it will skip '2d analysis' and 'do MMB'
if params['skip MMB'] is True:
    params['folded initial structure'] = 'foldedSequence_0.pdb'  # if wishing to skip MMB, must provide a folded structure
Comments
  • JOSS Review

    Hi all,

    Thanks for the invitation to review and congrats on the submission.

    The general idea behind this submission is sound, and follows up on a 2021 publication from the same authors on E2EDNA v1.0, published in JCIM. From my understanding, the code is essentially a re-write to use OpenMM instead of Tinker as the MD engine. While this is valuable - makes it simpler to install/run - the authors do not realize, in my opinion, this change to its fullest potential. The authors' repository is not so much a "package" in the traditional sense, but more of a collection of scripts that automate a certain rigid protocol. I would rather see, for instance, NUPACK being an optional dependency - as a user, I could simply provide my own DNA molecules instead of being forced to use NUPACK. In this sense, I think this repository could use more work to stand out on its own compared to last year's publication.

    In addition to these comments, I have a general comment on the repository itself. The authors should take some time to clean up files that are no longer useful for the protocol or that are simply part of the development workflow. Folders named old, or IDE config folders (.idea) should not be part of a published version of the repository, especially when they are even marked to be ignored in the .gitignore file. Same with the existence of both a requirements.txt file and an environment.yml file, whereas only the latter is used. As such, I believe that the authors should spend some time cleaning up the repository and setting up a more "traditional" structure to help potential users navigate through their code base more easily.

    Further, I have a few starter questions about the manuscript, code and, licenses that I think should be clarified. Hopefully these will help the authors improve their work and repository/code.

    Licenses

    • You're licensing the tool under the Apache license but you are including data (parameter sets) that falls under a different license. In particular, I see the parameter files for the Amoeba forcefield taken from Tinker/OpenMM almost verbatim. Did you check with the appropriate developers if this sharing of the forcefield parameter files is allowed under their license, without any attribution?

    Installation

    • The installation process is quite complex. As a user, I'd have to register and download NUPACK and MMB, as well as edit a series of files in order to get a functional installation. This is simply a suggestion for the developers to keep in mind.

    • Related to the point above, have the authors considered using conda directly to install their software, instead of a custom shell script? pdbfixer is available as a conda package, and you could specify pip packages there too, e.g. lightdock. The installation could be reduced to a simple: 1) install nupack 2) install mmb 3) run conda env create -f e2edna-env.yml.

    • On this last point, the authors should strip the granular version of the env yaml file otherwise conda will struggle with versions on anything but the authors' hardware.

    • According to the README, the code is only tested on MacOS, although I'd imagine the most use would be on a compute cluster running Linux. Have the authors tried running their code on Linux?

    Misc

    • In several sections of their documentation, the authors mention "OpenDNA". Was this the previous name of this package?
    • It would be greatly beneficial for a user to have config files with installation paths, simulation settings etc, instead of having to edit source code. Would the authors be open to this change?

    Comments on the Manuscript

    • In the "Statement of Need", the authors mention an "all-python" package several times. Being pedantic, this is not entirely true as their code relies on quite some compiled code in their dependencies (lightdock, openmm).
    opened by JoaoRodrigues 12
  • Feature Request: Argument parsing

    Hello,

    Would you be interested in more fully utilizing command-line argument parsing (e.g. using argparse)? I always feel a bit uncomfortable having to edit source code to use a program. It would be great if you could set the parameters strictly from the CL at runtime, such as workdir, mmb dir, and mmb, instead of editing main.py which is tracked by git.

    Additionally, using argparse would give the opportunity to provide a very helpful user interface. For instance, the user could run: python main.py --help to get a help message explaining what their options are.

    enhancement 
    opened by schackartk 10
  • 7 feature request argument parsing

    Overview

    This pull request implements argparse so that the user is less likely to need to edit source code in main.py. However, more work will need to be done to include parameters related to environmental conditions like pH, etc.

    Other than implementing argparse, functionality is the same. Some things are still a bit awkward because I didn't want to change too much beyond that.

    Affected files

    The following files have changes:

    • main.py: add shebang line, add argument parsing and validation
    • automate_tests.sh: update arguments to align with argparse
    • README.md: describe current functionality and arguments

    Notes

    main.py

    There were a few things that may need to be changed to work most efficiently and predictably.

    The relationship between --ligand, --ligand_type, and --ligand_seq is a bit complex and can probably be improved. Ideally, I think --ligand would be optional, yielding a default of None. This makes more sense than having to use --ligand False. Then --ligand_type and --ligand_seq could also be optional with a default of None (instead of an empty string). Only when --ligand is present would you validate that the others are there, and call parser.error() if not. I also think the authors should consider whether --ligand_seq is truly required if --ligand_type is either 'peptide', 'DNA' or 'RNA'. Currently this is enforced (by parser.error()), but if it is actually optional, that should be updated.
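The relationship proposed in this paragraph could look roughly like the sketch below. It is not the code from the pull request: the flag names follow the description above, the list of ligand types is taken from the README, and the rule that a sequence is required for every type except 'other' is an assumption.

```python
import argparse


def get_args(argv=None):
    """Parse ligand options, validating them only when --ligand is given."""
    parser = argparse.ArgumentParser(description="Ligand argument sketch")
    parser.add_argument("--ligand", metavar="PDB", default=None,
                        help="ligand structure file; omit for no ligand")
    parser.add_argument("--ligand_type", default=None,
                        choices=["peptide", "DNA", "RNA", "other"])
    parser.add_argument("--ligand_seq", metavar="SEQ", default=None)
    args = parser.parse_args(argv)

    if args.ligand is None:
        if args.ligand_type or args.ligand_seq:
            parser.error("--ligand_type and --ligand_seq require --ligand")
    else:
        if args.ligand_type is None:
            parser.error("--ligand_type is required when --ligand is given")
        # Assumption: only 'other' ligands may omit the sequence.
        if args.ligand_type != "other" and args.ligand_seq is None:
            parser.error("--ligand_seq is required for this --ligand_type")
    return args
```

Because parser.error() prints the usage string and exits with status 2, invalid combinations fail before any simulation work starts.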

    I left the code that uses different params based on whether it is run as local or cluster, but I am not sure if it is necessary. I especially think that the hard-coded paths used when it is cluster should be removed, and turned into arguments. In which case, it is the same as the usual arguments, and may make --device obsolete if there is no difference between local and cluster.

    I implemented wildcards to help the user find their MMB paths (lib and executable) within the --mmb_dir and --mmb. I am hoping the defaults will make it so users don't have to change this argument.

    I removed the operating system argument and instead used platform to detect it. This new implementation has only been tested on my WSL system, so please check this works. One issue is if the result of platform.system().lower() doesn't match an expected value on mac. Initially mine returned Linux, which is why I ran lower() to make it 'linux' which is compatible with the previous implementation.
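The concern about unexpected values can be handled by mapping the detected name explicitly. In this sketch, platform.system() is the real standard-library call (it returns 'Darwin' on macOS and 'Linux' on Linux), while the target labels 'macos' and 'linux' and the function name are assumptions about what the rest of the code expects.

```python
import platform

# Assumed labels; adjust to whatever the rest of the code expects.
_OS_LABELS = {"darwin": "macos", "linux": "linux"}


def detect_os(system_name=None):
    """Map platform.system() onto a supported label, or fail clearly."""
    name = (system_name or platform.system()).lower()
    try:
        return _OS_LABELS[name]
    except KeyError:
        raise ValueError(f"unsupported operating system: {name!r}") from None
```

Passing system_name explicitly makes the mapping testable on a single machine.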

    Lots of argument validation now happens in get_args(), so hopefully more helpful error messages are produced.

    I added a feature so that both --aptamer and --ligand_seq can be names of files. In that case, the file contents are read in and used as the sequences. Literal strings can still be used instead of file names.
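A minimal sketch of this file-or-literal behaviour is shown below. It is not the pull request's code: resolve_sequence is a hypothetical name, and skipping FASTA header lines that start with '>' is an added assumption.

```python
import os


def resolve_sequence(value):
    """Return the sequence from a file if `value` is a path, else `value`."""
    if os.path.isfile(value):
        with open(value) as handle:
            # Assumption: drop blank lines and FASTA '>' header lines.
            lines = [line.strip() for line in handle
                     if line.strip() and not line.startswith(">")]
        return "".join(lines)
    return value.strip()
```

One consequence of this design is that a literal sequence that happens to match an existing file name would be read as a file.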

    Readme.md

    I hope my additions are helpful in describing the current functionality.

    One thing I was uncertain about is the description of ligand type saying "(default: Amber14)". I didn't see this anywhere that params were set. It is not the default for any of the arguments I set up. If this needs to be a default, please take note of this.

    Conclusions

    Currently, all modes in automate_tests.sh run for me, so it seems that these changes are compatible. It would be great to have unit and integration tests with pytest to confirm.

    Please check that it works on MacOS still, as I have only tested on WSL.

    No additional dependencies have been added, only core libraries were used.

    Please feel free to make any changes you see fit or discuss!

    enhancement 
    opened by schackartk 7
  • Question: GPL-3.0 license required for this repo because of lightdock?

    Hello @brianjimenez - Hope this message finds you well.

    I am trying to figure out what license is the best choice for our E2EDNA 2.0 software and am aware that LightDock is licensed under GNU GPLv3. According to the license guide website (link) provided by GitHub, the GNU GPLv3 seems to require "larger works using a licensed work" to be under the same license. Currently our E2EDNA 2.0 is under Apache-2 license which does not include the condition of "same license". In my opinion, Apache-2 license could give some flexibility because a future version of the E2EDNA software may provide multiple options of different auto-docker package.

    A little summary of how LightDock is used in E2EDNA 2.0 now: lightdock-0.9.2 is installed by pip and the python scripts such as lightdock3.py are directly called without modification. Does our way of using LightDock fall into the category where we can only choose GNU GPLv3 for our E2EDNA 2.0? I am not sure of this question therefore would like to hear the LightDock developer's opinions.

    Thank you very much!

    question 
    opened by taoliu032 4
  • Lightdock Rust nucleic support

    Dear E2EDNA2 developers,

    Since you are using LightDock in parts of your pipeline, the 0.2.0 release of the Rust implementation of the framework may be of interest. This new release adds support for protein-nucleic complex prediction and typically runs 5x-6x faster than the Python+C implementation of LightDock, while using two orders of magnitude less memory. There is more information on how to compile and use the Rust version here.

    Hope it helps!

    enhancement 
    opened by brianjimenez 3
  • Enhancement: Avoid runtime exception when "run" folder exists

    If the output directory for the current run already exists, right now an exception is produced:

    Start automating tests one by one...
    ====================================
    TESTING MODE #1: '2d structure'
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
        os.mkdir(self.workDir)
    FileExistsError: [Errno 17] File exists: '/home/ken/personal/E2EDNA2/localruns/run1'
    
    END OF TEST #1. Results are saved to folder "run1", where:
            2d structure: in record.txt
    

    An exception could be avoided by validating that the output directory does not exist and providing a useful message such as "The output directory for this run already exists at './localruns/run1'". An optional -f/--force flag could also be provided to overwrite the output directory.
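One possible shape for this check is sketched below. The function name and the force parameter are illustrative, and deleting the old folder is only one interpretation of "overwrite".

```python
import os
import shutil


def prepare_run_dir(path, force=False):
    """Create a fresh run directory, refusing to clobber one unless forced."""
    if os.path.exists(path):
        if not force:
            raise FileExistsError(
                f"The output directory for this run already exists at "
                f"'{path}'; use the force option to overwrite it."
            )
        shutil.rmtree(path)  # force: discard the previous run's contents
    os.makedirs(path)  # also creates missing parent directories
    return path
```

Since os.makedirs creates intermediate directories, the same helper also covers the case where the parent working directory does not exist yet.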

    opened by schackartk 2
  • Bug: Runtime exception when params['workdir'] does not exist

    When the directory in the variable params['workdir'] does not exist, the program fails at runtime:

    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
        os.mkdir(self.workDir)
    FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1'
    

    This could be fixed by checking for the directory, and creating it if it does not exist:

    if not os.path.isdir(params['workdir']):
        os.mkdir(params['workdir'])
    
    opened by schackartk 2
  • Error: 'str' object is not callable; in opendna.py, line 535

    Hello,

    I am excited to try out this tool!

    I have installed all dependencies successfully (I believe), and I am running the script automate_tests.sh. Most tests are passing, but tests 4, 5, 7, and 8 are failing during the docking step with the same exception.

    TESTING MODE #4: 'coarse dock'
    Starting Fresh Run 4
    Simulation mode: coarse dock
    Simulating TAATGTTAATTG with YQTQ.pdb
    Getting Secondary Structure(s)
    Running over 1 possible 2D structures.
    2D structure #0 is                              : .(((....))).
    
    Folding Aptamer from Sequence. Fold speed = quick.
    Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
    2D structure after MMB folding (from MDAnalysis): .(((....))).
    Initial fold fidelity = 1.000
    Initial fold fidelity = 1.000 (from MDAnalysis)
    Folded the aptamer and generated the folded structure: foldedAptamer_0.pdb
    
    No relaxation (smoothing) of the folded aptamer.
    
    Docking
    Traceback (most recent call last):
      File "main.py", line 230, in <module>
        opendnaOutput = opendna.run()  # retrieve binding information (eventually this should become a normalized c-number)    
      File "/home/ken/personal/E2EDNA2/opendna.py", line 297, in run
        outputDict['dock scores {}'.format(self.i)] = self.dock(self.pdbDict['representative aptamer {}'.format(self.i)], self.targetPDB)  # eg, "peptide.pdb" which can be created given peptide sequence by buildPeptide in function dock
      File "/home/ken/personal/E2EDNA2/opendna.py", line 535, in dock
        ld.run()
    TypeError: 'str' object is not callable
    

    I am unsure what the underlying problem is, but maybe it has to do with a mistake between:

    • The instance variable run on line 487 of instances.py: self.run = params['ld run']
    • The method run() on line 504 of instances.py: def run(self):

    Because the instance variable from line 487 is the string value set on line 220 in main.py: params['ld run'] = 'lightdock3.py'. Maybe this variable is somehow shadowing the method run(), and so it is failing to "call" str() (i.e. 'lightdock3.py'())?
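The suspected shadowing can be reproduced in isolation. In the sketch below, the class names are hypothetical stand-ins and only the attribute/method name clash mirrors the description above. Assigning self.run in __init__ stores a string on the instance, and instance attributes take precedence over methods defined on the class.

```python
class ShadowedDocker:
    def __init__(self, params):
        self.run = params["ld run"]  # instance attribute hides the method

    def run(self):
        return "docking"


class RenamedDocker:
    def __init__(self, params):
        self.run_script = params["ld run"]  # distinct name, no clash

    def run(self):
        return f"would call {self.run_script}"


params = {"ld run": "lightdock3.py"}
try:
    ShadowedDocker(params).run()
except TypeError as err:
    message = str(err)  # "'str' object is not callable"
result = RenamedDocker(params).run()
```

Renaming either the attribute or the method removes the clash, which is consistent with the diagnosis given in this report.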

    I would appreciate any help with resolving this.

    Thank you!

    bug 
    opened by schackartk 2
  • Bug: Mysterious error when using invalid mode

    If the mode is misspelled or an invalid choice, an exception occurs:

    $ python main.py --run_num=1 --mode='fulldock' --aptamerSeq='GCGCGCGCGATATATAT' --ligand='my_ligand.pdb' --ligandType='other' --ligandSeq=''
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 52, in __init__
        if self.actionDict['make workdir']:
    KeyError: 'make workdir'
    

    The exception doesn't seem to mention the invalid --mode, so the user may be confused as to what happened.

    I have confirmed that this runs fine once the mode name is corrected.

    This issue is resolved in #8 by using argparse and specifying the valid choices. Here is what is displayed from the code in that pull request:

    $ ./main.py -r 1 -m 'fulldock' -a aptamers/my_aptamer.txt -l my_ligand.pdb -t other -f
    usage: main.py [-h] [-f] -r INT -m MODE -a SEQ -l PDB [-t TYPE] [-s SEQ]
                   [-d RUN] [-p DEV] [-w DIR] [-md DIR] [-mb MMB]
    main.py: error: argument -m/--mode: invalid choice: 'fulldock' (choose from '2d structure', '3d coarse', '3d smooth', 'coarse dock', 'smooth dock', 'free aptamer', 'full dock', 'full binding')
    
    opened by schackartk 1
  • Enhancement: More control over output location

    It seems a bit restrictive to enforce that the output directory be structured as {workdir}/run{runnum}/. Most tools allow you to specify the output directory yourself.

    This could be useful to the user (myself included) for organizing runs, and automating using a workflow manager. For instance, if I am running several combinations of aptamer, ligands, and modes, I may want my output directories to be {aptamer}/{ligand}/{mode}/. This structure is meaningful to me unlike the folder name "run1".

    While this is not resolved in #8 , it would reduce the number of arguments. Instead of having both --workdir and --run_num, you could just have a single --outdir argument.

    enhancement 
    opened by schackartk 1
  • Bug: Ligand file in a folder causes exception

    If the ligand pdb file is in a folder instead of the root of the repo, an exception occurs:

    $ ls ligands/
    my_ligand.pdb
    
    $ python main.py --run_num=1 --mode='full dock' --aptamerSeq='GCGCGCGCGATATATAT' --ligand='ligands/my_ligand.pdb' --ligandType='other' --ligandSeq=''
    Starting Fresh Run 1
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 179, in setup
        copyfile(self.targetPDB, self.workDir + '/' + self.targetPDB)
      File "/home/ken/personal/E2EDNA2/env/lib/python3.7/shutil.py", line 121, in copyfile
        with open(dst, 'wb') as fdst:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1/ligands/my_ligand.pdb'
    

    I don't see any reason that the ligand file should not be in a folder, so this should not fail.
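The traceback suggests the destination is built by appending the full relative path of the ligand to the run folder. One possible fix, sketched here with a hypothetical helper name, is to copy the file under its base name only.

```python
import os
import shutil


def stage_ligand(ligand_path, work_dir):
    """Copy the ligand into work_dir under its file name, dropping folders."""
    destination = os.path.join(work_dir, os.path.basename(ligand_path))
    shutil.copyfile(ligand_path, destination)
    return destination
```

With this approach 'ligands/my_ligand.pdb' lands at '<workdir>/my_ligand.pdb'. Later steps would then need to refer to the base name rather than the original path.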

    bug 
    opened by schackartk 2
Releases (v2.0.0)
  • v2.0.0 (May 16, 2022)

    This release is associated with the JOSS publication: https://doi.org/10.21105/joss.04182 The release has also been archived on Zenodo: https://doi.org/10.5281/zenodo.6546661

    Clarification: once downloaded from the links below, the archive folder is named "E2EDNA2-2.0.0". This refers to version v2.0.0 of E2EDNA; the name "E2EDNA2" is inherited from the repository name.

    To view the repository: https://github.com/siminegroup/E2EDNA2/tree/v2.0.0

    Full Changelog: https://github.com/siminegroup/E2EDNA2/commits/v2.0.0

    Source code(tar.gz)
    Source code(zip)
Owner
computational chemistry group at McGill University