E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

E2EDNA 2.0 - OpenMM Implementation of E2EDNA!

An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides.

Michael Kilgour, Tao Liu, Ilya S. Dementyev, Lena Simine

mjakilgour [at] gmail [dot] com

For the original version of E2EDNA, see J. Chem. Inf. Model. 2021, 61, 9, 4139–4144 (https://doi.org/10.1021/acs.jcim.1c00696) and https://github.com/InfluenceFunctional/E2EDNA

Installation

This installation path has been tested on macOS and relies on the conda and pip package managers.

  1. Download the E2EDNA 2.0 package from this repository.
  2. Register for and download NUPACK from http://www.nupack.org/downloads; you will need the path to the ~/nupack-###/source/package directory.
  3. In the E2EDNA2 directory, modify the macos_installation.sh script to update the path to NUPACK (see step 2).
  4. From the E2EDNA2 folder, run macos_installation.sh.
    • Caveat: if the conda activate e2edna command gives an error, or if the e2edna environment has not been activated after the script finishes, either replace the activation command with source activate /path-to-env/e2edna
    • or copy the commands from the script, unmodified, into the command line and run them one by one; this works around the unconfigured-shell issue.
  5. Register for and download MMB from https://simtk.org/projects/rnatoolbox. Place the Installer### folder in the e2edna folder. NB: contrary to the recommendations of the MMB installation guide, do not set DYLD_LIBRARY_PATH; this avoids interference with the OpenMM module.
  6. Update three paths in main.py:
     params['workdir'] = '/path-to-e2edna/localruns'                         # working directory
     params['mmb dir'] = '/path-to-e2edna/e2edna/Installer.###/lib'          # path to MMB dylib files
     params['mmb']     = '/path-to-e2edna/Installer.###/bin/MMB-executable'  # path to MMB executable
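Before the first run, it can help to confirm that these three paths point where you expect. The sketch below uses a hypothetical helper that is not part of E2EDNA 2.0. It assumes `params` is the dictionary from main.py. It also assumes the working directory must already exist, because a missing workdir makes the run fail at startup with a FileNotFoundError.

```python
import os


def check_install_paths(params):
    """Return a list of problems with the paths set in main.py (hypothetical helper)."""
    problems = []
    # run folders (run1, run2, ...) are created inside workdir, so it must exist
    if not os.path.isdir(params["workdir"]):
        problems.append(f"working directory does not exist: {params['workdir']}")
    # directory holding the MMB dynamic libraries
    if not os.path.isdir(params["mmb dir"]):
        problems.append(f"MMB library directory not found: {params['mmb dir']}")
    # the MMB binary must exist and be executable by the current user
    if not (os.path.isfile(params["mmb"]) and os.access(params["mmb"], os.X_OK)):
        problems.append(f"MMB executable missing or not executable: {params['mmb']}")
    return problems
```

For example, `for problem in check_install_paths(params): print(problem)` prints nothing when all three paths look usable.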

Running a job

Quickstart

  • Set 'params' in main.py, as indicated in "Installation".
  • Run the bash script automate_tests.sh to test all 8 modes automatically.
  • Alternatively, carry out a single run by specifying the run number, mode, aptamer sequence, and ligand structure file. For example:
python main.py --run_num=1 --mode='free aptamer' --aptamerSeq='TAATGTTAATTG' --ligand='False' --ligandType='' --ligandSeq=''
python main.py --run_num=2 --mode='full dock' --aptamerSeq='TAATGTTAATTG' --ligand='YQTQ.pdb' --ligandType='peptide' --ligandSeq='YQTQTNSPRRAR'
    
# --ligand='False'        # if there is no ligand; --ligandType and --ligandSeq are then ignored.
# --ligandType='peptide'  # or 'DNA', 'RNA', or 'other'; an 'other' ligand is assumed to be describable by the Amber14 force field.
# --ligandSeq=''          # if there is no sequence, e.g., when ligandType is 'other'.
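When you script many runs, for example looping over modes the way automate_tests.sh does, building the argument list in Python can be less error-prone than assembling shell strings. This is a minimal sketch that assumes only the flags shown above. `build_command` is a hypothetical helper, not part of the package.

```python
import shlex


def build_command(run_num, mode, aptamer_seq, ligand=None, ligand_type="", ligand_seq=""):
    """Assemble a main.py invocation from the Quickstart flags (hypothetical helper)."""
    cmd = [
        "python", "main.py",
        f"--run_num={run_num}",
        f"--mode={mode}",
        f"--aptamerSeq={aptamer_seq}",
    ]
    if ligand is None:
        # no ligand: main.py takes the string 'False'; type and sequence are ignored
        cmd += ["--ligand=False", "--ligandType=", "--ligandSeq="]
    else:
        cmd += [f"--ligand={ligand}", f"--ligandType={ligand_type}", f"--ligandSeq={ligand_seq}"]
    return cmd


cmd = build_command(2, "full dock", "TAATGTTAATTG", ligand="YQTQ.pdb",
                    ligand_type="peptide", ligand_seq="YQTQTNSPRRAR")
print(shlex.join(cmd))  # shell-quoted form, for logging or pasting into a terminal
```

Run `subprocess.run(cmd, check=True)` from the E2EDNA2 folder. Passing the list directly avoids shell-quoting problems with mode names that contain spaces, such as 'full dock'.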

Functionality: Eight different modes of operation

E2EDNA 2.0 takes in a DNA aptamer sequence in FASTA format, and optionally a short peptide or other small molecule, and returns details of the aptamer structure and binding behaviour. This code implements several distinct analysis modes so users may customize the level of computational cost and accuracy.

  • 2d structure → returns NUPACK or seqfold analysis of the aptamer secondary structure. Very fast, O(<1s). With NUPACK, this includes the probability of observing a given fold and the suboptimal folds within kT of the minimum.
  • 3d coarse → returns MMB fold of the best secondary structure. Fast O(5-30 mins). Results in a strained 3D structure which obeys base pairing rules and certain stacking interactions.
  • 3d smooth → identical to '3d coarse', with a short MD relaxation in solvent. Less than double the cost of '3d coarse', depending on relaxation time.
  • coarse dock → uses the 3D structure from '3d coarse' as the initial condition for a LightDock simulation, and returns best docking configurations and scores. Depending on docking parameters, adds O(5-30mins) to '3d coarse'.
  • smooth dock → identical to 'coarse dock', instead using the relaxed structure from '3d smooth'. Similar cost.
  • free aptamer → fold the aptamer in MMB and run extended MD sampling to identify a representative, equilibrated 2D and 3D structure. Slow O(hours).
  • full dock → returns the best docking configurations and scores from a LightDock run using the fully equilibrated aptamer structure from 'free aptamer'. Similar cost (LightDock is relatively cheap).
  • full binding → same steps as 'full dock', with a follow-up extended MD simulation of the best binding configuration. Slowest, O(hours).
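The modes are nested: each one adds stages to a cheaper one. The mapping below is transcribed from the descriptions above as a reading aid. It is not the pipeline's internal configuration, and the stage names are hypothetical labels. It also shows one way to reject an unknown mode with a message that lists the valid choices.

```python
# Stage names are hypothetical labels for the steps described in the mode list.
STAGES = [
    "2d structure",           # NUPACK/seqfold secondary structure
    "MMB fold",               # coarse 3D fold
    "short MD relaxation",    # '3d smooth' relaxation
    "extended MD (aptamer)",  # 'free aptamer' sampling
    "LightDock docking",
    "extended MD (complex)",  # 'full binding' follow-up
]

MODE_STAGES = {
    "2d structure": {"2d structure"},
    "3d coarse": {"2d structure", "MMB fold"},
    "3d smooth": {"2d structure", "MMB fold", "short MD relaxation"},
    "coarse dock": {"2d structure", "MMB fold", "LightDock docking"},
    "smooth dock": {"2d structure", "MMB fold", "short MD relaxation", "LightDock docking"},
    "free aptamer": {"2d structure", "MMB fold", "extended MD (aptamer)"},
    "full dock": {"2d structure", "MMB fold", "extended MD (aptamer)", "LightDock docking"},
    "full binding": {"2d structure", "MMB fold", "extended MD (aptamer)",
                     "LightDock docking", "extended MD (complex)"},
}


def stages_for(mode):
    """Return the stages a mode runs, in pipeline order."""
    if mode not in MODE_STAGES:
        raise ValueError(f"invalid mode {mode!r}; choose from {sorted(MODE_STAGES)}")
    return [stage for stage in STAGES if stage in MODE_STAGES[mode]]


print(stages_for("smooth dock"))
```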

Test run: inputs and outcomes

Running the script automate_tests.sh performs short, lightweight simulations of all 8 modes. Below we explain which outputs to look for and what success looks like.

  • Mode 1: 2d structure

Input: FASTA sequence, e.g., CGCGCGCGCGCGC

Outputs: secondary structure prediction in “record.txt”

Success evaluation: check the dot-parenthesis representation of the 2D structure, e.g., ..(...)..
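In dot-parenthesis notation, each matched pair of parentheses marks a base pair and each dot marks an unpaired base. To compare structures programmatically rather than by eye, you can recover the paired positions with a stack-based parser. The sketch below uses that generic technique; it is not code from E2EDNA 2.0.

```python
def pairs_from_dot_bracket(structure):
    """Return sorted 0-based (i, j) base-pair indices from dot-parenthesis notation."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)  # opening base: remember its position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # pair with the most recent opening base
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(pairs_from_dot_bracket(".(((....)))."))  # [(1, 10), (2, 9), (3, 8)]
```

The example string is the 2D structure reported for the 12-mer TAATGTTAATTG in the test logs.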

  • Mode 2: 3d coarse

Input: ‘3d unrefined’, FASTA sequence, e.g., CGCGCGCGCGCGC

Outputs: MMB folded structure “foldedAptamer_0.pdb”

Success evaluation: visualize foldedAptamer_0.pdb in VMD or PyMOL.

  • Mode 3: 3d smooth

Input: FASTA sequence, e.g., CGCGCGCGCGCGC

Outputs:

Success evaluation:

  • Mode 4: coarse dock

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of the target ligand, e.g., ‘target.pdb’

Outputs:

Success evaluation:

  • Mode 5: smooth dock

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of the target ligand, e.g., ‘target.pdb’

Outputs:

Success evaluation:

  • Mode 6: free aptamer

Given a DNA sequence, its secondary structure is predicted and represented as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Finally, a molecular dynamics simulation samples its conformational space, and a representative conformation is identified from the MD trajectory (relevant when the contact predictor is asked for more than one secondary structure).

Input: FASTA sequence, e.g., CGCGCGCGCGCGC

Modifications to the code: set params['mode'] = 'free aptamer' and params['sequence'] = 'CGCGCGCGCGCGC'

Outputs:
    • Secondary structure prediction, e.g., ((....))....((.(...).)).., in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation: binary trajectory “clean_foldedAptamer_0_processed_complete_trajectory.dcd”, topology “clean_foldedAptamer_0_processed.pdb”, representative conformation “repStructure_0.pdb”

Success evaluation: the DCD trajectory file is generated, and “log.txt” shows that the MD sampling of the free aptamer is 100% complete. Visualize the MD trajectory of the free aptamer using the topology and binary trajectory files, and visualize the representative conformation of the DNA aptamer.
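A quick programmatic check for the files listed above can complement visual inspection. This is a hypothetical helper. It checks only that the files exist, not that log.txt reports 100% completion, because the log format is not documented here.

```python
from pathlib import Path

# Expected 'free aptamer' outputs, as listed above.
FREE_APTAMER_OUTPUTS = [
    "record.txt",
    "foldedAptamer_0.pdb",
    "clean_foldedAptamer_0_processed_complete_trajectory.dcd",
    "clean_foldedAptamer_0_processed.pdb",
    "repStructure_0.pdb",
    "log.txt",
]


def missing_outputs(run_dir, expected=FREE_APTAMER_OUTPUTS):
    """Return the expected file names that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in expected if not (run_dir / name).is_file()]
```

For example, `missing_outputs("localruns/run1")` returns an empty list when every expected file is present.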

  • Mode 7: full dock

Given a DNA sequence, its secondary structure is predicted and represented as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Next, a molecular dynamics simulation samples its conformational space, and a representative conformation is identified from the MD trajectory. Finally, a target ligand of interest (whose structure must be provided as a PDB file) is docked to the representative structure.

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of the target ligand, e.g., ‘target.pdb’

Modifications to the code: set params['mode'] = 'full dock', params['sequence'] = 'CGCGCGCGCGCGC', and params['target'] = 'target.pdb' (the code still needs to be updated for this).

Outputs:
    • Secondary structure prediction, e.g., ((....))....((.(...).)).., in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation: binary trajectory “clean_foldedAptamer_0_processed_complete_trajectory.dcd”, topology “clean_foldedAptamer_0_processed.pdb”, representative conformation “repStructure_0.pdb”
    • Docking: aptamer-ligand complex structure “top_1.pdb”; docking score in “record.txt”

Success evaluation: the DCD trajectory file is generated, and “log.txt” shows that the MD sampling of the free aptamer is 100% complete. Visualize the MD trajectory of the free aptamer using the topology and binary trajectory files, the representative conformation of the DNA aptamer, and the aptamer-ligand complex structure.

  • Mode 8: full binding

Given a DNA sequence, its secondary structure is predicted and represented as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Next, a molecular dynamics simulation samples its conformational space, and a representative conformation is identified from the MD trajectory. A target ligand of interest (whose structure must be provided as a PDB file) is then docked to the representative structure. Finally, the aptamer-ligand complex is sampled by MD simulation to investigate its dynamics.

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of the target ligand, e.g., ‘target.pdb’

Modifications to the code: set params['mode'] = 'full binding', params['sequence'] = 'CGCGCGCGCGCGC', and params['target'] = 'target.pdb' (the code still needs to be updated for this).

Outputs:
    • Secondary structure prediction, e.g., ((....))....((.(...).)).., in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation of the free aptamer: binary trajectory “clean_foldedAptamer_0_processed_complete_trajectory.dcd”, topology “clean_foldedAptamer_0_processed.pdb”, representative conformation “repStructure_0.pdb”
    • Docking: aptamer-ligand complex structure “top_1.pdb”; docking score in “record.txt”
    • MD simulation of the aptamer-ligand complex: binary trajectory “clean_complex_0_0_processed_trajectory.dcd”, topology “clean_complex_0_0_processed.pdb”

Success evaluation: “log.txt” shows that the MD sampling of the free aptamer is 100% complete, and the DCD trajectory file is generated. Visualize the MD trajectory of the free aptamer using the topology and binary trajectory files, the representative conformation of the DNA aptamer, and the aptamer-ligand complex structure. The complex's DCD trajectory file is generated, and “log_complex.txt” shows that the MD sampling of the aptamer-ligand complex is 100% complete. Visualize the MD trajectory of the aptamer-ligand complex using its binary trajectory and topology files. Note that the aptamer may appear far from the target ligand, which can be an artifact of periodic boundary conditions. (Open question: should the pipeline correct this, or leave it to the user?)
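Periodic boundary conditions can make the aptamer and ligand look far apart. To see why, compare the raw distance between two points with the minimum-image distance, which uses the nearest periodic copy. The sketch below assumes an orthorhombic box, with coordinates and box edges in the same units (e.g., nm). For visualization, trajectory analysis tools can re-image whole trajectories.

```python
import math


def minimum_image_displacement(a, b, box):
    """Displacement from point a to point b using the nearest periodic image.

    Assumes an orthorhombic box with edge lengths `box`; each component is
    shifted by a whole number of box lengths into roughly [-L/2, L/2].
    """
    disp = []
    for ai, bi, edge in zip(a, b, box):
        d = bi - ai
        d -= edge * round(d / edge)  # wrap into the nearest periodic image
        disp.append(d)
    return tuple(disp)


def minimum_image_distance(a, b, box):
    return math.sqrt(sum(d * d for d in minimum_image_displacement(a, b, box)))


# Two centers 9 nm apart along x in a 10 nm box are really 1 nm apart.
print(minimum_image_distance((0.5, 5.0, 5.0), (9.5, 5.0, 5.0), (10.0, 10.0, 10.0)))
```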

The MD simulation may stop at the outset with a "Particle coordinate is nan" error. This can happen when energy minimization is too aggressive, so that coordinates go out of bounds and the integrator cannot work with the resulting nonsensical values. In this case, re-run the pipeline.
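If re-running is the remedy, a small wrapper can do it automatically. This sketch rests on two assumptions. First, `run_once` is a hypothetical zero-argument callable that performs one pipeline run. Second, a fresh run draws new random starting conditions (e.g., initial velocities), so a retry can succeed where the previous attempt failed.

```python
def run_with_retries(run_once, max_attempts=3):
    """Call run_once() again if it fails with a NaN-coordinate error (sketch)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except Exception as exc:
            is_nan_error = "particle coordinate is nan" in str(exc).lower()
            # re-raise any other error, or the NaN error on the final attempt
            if not is_nan_error or attempt == max_attempts:
                raise
            print(f"Attempt {attempt} produced NaN coordinates; re-running.")
```

Retrying only on this specific message keeps genuine setup errors, such as a missing file, from being masked.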

MMB folding can take a while for tricky sequences that require multiple refolding attempts.

(work in progress)

Physical Parameters

The default force field is AMBER 14. Other AMBER force fields and explicit water models are trivial to implement. Implicit water requires switching to building systems from AMBER prmtop files. CHARMM may also be easy to implement but has not been tested. AMOEBA 2013 parameters do not include nucleic acids, and AMOEBABIO18 parameters are not implemented in OpenMM.

params['force field'] = 'AMBER'
params['water model'] = 'tip3p'

Default parameters are listed below; for guidance on adjusting them, see the OpenMM documentation.

from openmm.app import PME, HBonds  # OpenMM constants referenced below

params['box offset'] = 1.0 # nanometers
params['barostat interval'] = 25
params['friction'] = 1.0 # 1/picosecond
params['nonbonded method'] = PME
params['nonbonded cutoff'] = 1.0 # nanometers
params['ewald error tolerance'] = 5e-4
params['constraints'] = HBonds
params['rigid water'] = True
params['constraint tolerance'] = 1e-6
params['pressure'] = 1
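Two geometric constraints link 'box offset' and 'nonbonded cutoff'. OpenMM rejects a periodic cutoff larger than half the box edge. As a common rule of thumb, periodic images of the solute should also be at least one cutoff apart. The sketch below checks both. It makes a hypothetical assumption about how 'box offset' is applied: a cubic box whose edge equals the solute's largest extent plus the offset on each side.

```python
def check_periodic_box(solute_extent_nm, box_offset_nm=1.0, cutoff_nm=1.0):
    """Sanity-check the nonbonded cutoff against an estimated cubic box (sketch).

    Assumes box edge = largest solute extent + box_offset_nm on each side.
    """
    box_edge = solute_extent_nm + 2 * box_offset_nm
    # OpenMM rejects periodic cutoffs larger than half the box edge
    if cutoff_nm > box_edge / 2:
        raise ValueError(f"cutoff {cutoff_nm} nm exceeds half the box edge ({box_edge / 2} nm)")
    # rule of thumb: solute images (2 * offset apart) should be at least one cutoff apart
    if 2 * box_offset_nm < cutoff_nm:
        raise ValueError("box offset too small: the solute would interact with its own image")
    return box_edge
```

With the defaults above (1.0 nm offset, 1.0 nm cutoff), a 3 nm aptamer gives a 5 nm box, and both checks pass.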

Increasing the hydrogen mass (e.g., to 4 amu) enables longer time steps, up to ~3-4 fs; see the OpenMM documentation for details.

params['hydrogen mass'] = 1.0 # in amu
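OpenMM documents the hydrogenMass option of createSystem as mass-conserving: any mass added to a hydrogen bonded to a heavy atom is subtracted from that heavy atom. Presumably params['hydrogen mass'] is passed through to that option. The bookkeeping can be sketched in plain Python, with masses in amu; the list-based data layout is an assumption.

```python
def repartition_hydrogen_mass(masses, bonds, elements, hydrogen_mass=4.0):
    """Mass-conserving hydrogen mass repartitioning (sketch).

    masses: atomic masses in amu; bonds: (i, j) atom-index pairs;
    elements: element symbols. Each hydrogen bonded to a heavy atom is set to
    hydrogen_mass, and the added mass is subtracted from that heavy atom.
    """
    new = list(masses)
    for i, j in bonds:
        if elements[i] == "H" and elements[j] != "H":
            i, j = j, i  # make i the heavy atom and j the hydrogen
        if elements[j] == "H" and elements[i] != "H":
            transfer = hydrogen_mass - new[j]
            new[j] = hydrogen_mass
            new[i] -= transfer
    return new


# Methyl group: carbon plus three hydrogens.
print(repartition_hydrogen_mass([12.011, 1.008, 1.008, 1.008],
                                [(0, 1), (0, 2), (3, 0)], ["C", "H", "H", "H"]))
```

For this methyl group, raising each hydrogen to 4 amu leaves the carbon at about 3.035 amu, and the total mass is unchanged.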

Temperature, pH, and ionic strength are taken into account for 2D folding in NUPACK, for the ion concentration in the MD simulation, and for protonating molecules for MD (safest near pH 7-7.4).

params['temperature'] = 310 # Kelvin - used to predict secondary structure and for MD thermostatting
params['ionic strength'] = 0.163 # mol/L (M) - used to predict secondary structure and add ions to simulation box
params['pH'] = 7.4 # simulation will automatically protonate the peptide up to this pH
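Two unit conversions sit behind these parameters. NUPACK 4 takes its temperature in Celsius. For a 1:1 salt such as NaCl, ionic strength equals the salt concentration, so a rough ion-pair count follows from n = c·V·N_A. The sketch below assumes 'ionic strength' is given in mol/L and that the box is cubic. It ignores the solute's volume, so the count is a slight overestimate, and the pipeline's actual ion count may be computed differently.

```python
AVOGADRO = 6.02214076e23  # particles per mol


def kelvin_to_celsius(t_kelvin):
    """Convert params['temperature'] (K) to Celsius."""
    return t_kelvin - 273.15


def ion_pairs_for_box(ionic_strength_molar, box_edge_nm):
    """Rough count of 1:1 salt ion pairs for a cubic box (solute volume ignored).

    n = c * V * N_A, with V in litres (1 nm^3 = 1e-24 L).
    """
    volume_litres = box_edge_nm ** 3 * 1e-24
    return round(ionic_strength_molar * volume_litres * AVOGADRO)


print(kelvin_to_celsius(310))
print(ion_pairs_for_box(0.163, 5.0))
```

At 0.163 M, a 5 nm cubic box holds about 12 ion pairs, and 310 K corresponds to 36.85 °C.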

The peptide backbone constraint constant is the force constant used to constrain peptide backbone dihedrals. A minimum of 10000, the current setting, is recommended for good constraints (deviations below 5° were always observed with this value). For more information, see README_CONSTRAINTS.md.

params['peptide backbone constraint constant'] = 10000
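One common way to restrain a dihedral is a harmonic penalty on the angular deviation, wrapped so that angles on either side of ±180° count as close. The sketch below uses that form, but it is an assumption that E2EDNA 2.0 uses exactly this functional form and these units; README_CONSTRAINTS.md describes the actual implementation. If k is in kJ/mol/rad², a 5° deviation at k = 10000 costs about 38 kJ/mol. That is roughly 15 kT at 310 K, which is consistent with the small deviations reported above.

```python
import math


def dihedral_restraint_energy(theta, theta0, k=10000.0):
    """Harmonic dihedral restraint with periodic wrapping (sketch).

    theta, theta0 in radians; k assumed to be in kJ/mol/rad^2. The difference
    is wrapped into [0, pi], so 179 deg and -179 deg count as 2 deg apart.
    """
    dtheta = abs(theta - theta0) % (2 * math.pi)
    dtheta = min(dtheta, 2 * math.pi - dtheta)
    return 0.5 * k * dtheta ** 2


print(dihedral_restraint_energy(math.radians(-60), math.radians(-65)))  # 5 degree deviation
```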

Implicit Solvent

params['implicit solvent'] = True
if params['implicit solvent']:
    params['implicit solvent model'] = OBC1  # only meaningful if implicit solvent is True
    params['leap template'] = 'leap_template.in'
    # TODO add more options to params: implicitSolventSaltConc, soluteDielectric, solventDielectric, implicitSolventKappa

Starting with a folded DNA aptamer structure (instead of just a FASTA sequence)

params['skip MMB'] = True  # it will skip '2d analysis' and 'do MMB'
if params['skip MMB'] is True:
    params['folded initial structure'] = 'foldedSequence_0.pdb'  # if wishing to skip MMB, must provide a folded structure
Comments
  • JOSS Review

    Hi all,

    Thanks for the invitation to review and congrats on the submission.

    The general idea behind this submission is sound, and it follows up on a 2021 publication from the same authors on E2EDNA v1.0, published in JCIM. From my understanding, the code is essentially a rewrite that uses OpenMM instead of Tinker as the MD engine. While this is valuable (it makes the code simpler to install and run), in my opinion the authors do not realize this change to its fullest potential. The authors' repository is not so much a "package" in the traditional sense as a collection of scripts that automate a certain rigid protocol. I would rather see, for instance, NUPACK as an optional dependency: as a user, I could simply provide my own DNA molecules instead of being forced to use NUPACK. In this sense, I think this repository needs more work to stand out on its own compared to last year's publication.

    In addition to these comments, I have a general comment on the repository itself. The authors should take some time to clean up files that are no longer useful for the protocol or that are simply part of the development workflow. Folders named old, or IDE config folders (.idea), should not be part of a published version of the repository, especially when they are even marked to be ignored in the .gitignore file. The same goes for having both a requirements.txt file and an environment.yml file when only the latter is used. As such, I believe the authors should spend some time cleaning up the repository and setting up a more "traditional" structure to help potential users navigate their code base more easily.

    Further, I have a few starter questions about the manuscript, code, and licenses that I think should be clarified. Hopefully these will help the authors improve their work and repository/code.

    Licenses

    • You're licensing the tool under the Apache license, but you are including data (parameter sets) that fall under a different license. In particular, I see the parameter files for the AMOEBA force field taken almost verbatim from Tinker/OpenMM. Did you check with the appropriate developers whether sharing these force field parameter files without any attribution is allowed under their license?

    Installation

    • The installation process is quite complex. As a user, I'd have to register and download NUPACK and MMB, as well as edit a series of files in order to get a functional installation. This is simply a suggestion for the developers to keep in mind.

    • Related to the point above, have the authors considered using conda directly to install their software, instead of a custom shell script? pdbfixer is available as a conda package, and you could specify pip packages there too, e.g. lightdock. The installation could be reduced to a simple: 1) install nupack 2) install mmb 3) run conda env create -f e2edna-env.yml.

    • On this last point, the authors should strip the granular version pins from the env YAML file; otherwise, conda will struggle to resolve versions on anything but the authors' hardware.

    • According to the README, the code is only tested on MacOS, although I'd imagine the most use would be on a compute cluster running Linux. Have the authors tried running their code on Linux?

    Misc

    • In several sections of their documentation, the authors mention "OpenDNA". Was this the previous name of this package?
    • It would be greatly beneficial for a user to have config files with installation paths, simulation settings, etc., instead of having to edit source code. Would the authors be open to this change?

    Comments on the Manuscript

    • In the "Statement of Need", the authors mention an "all-python" package several times. Being pedantic, this is not entirely true as their code relies on quite some compiled code in their dependencies (lightdock, openmm).
    opened by JoaoRodrigues
  • Feature Request: Argument parsing

    Hello,

    Would you be interested in making fuller use of command-line argument parsing (e.g., using argparse)? I always feel a bit uncomfortable having to edit source code to use a program. It would be great if parameters such as workdir, mmb dir, and mmb could be set strictly from the command line at runtime, instead of by editing main.py, which is tracked by git.

    Additionally, using argparse would give the opportunity to provide a very helpful user interface. For instance, the user could run: python main.py --help to get a help message explaining what their options are.
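As a rough illustration of this request, the sketch below uses argparse `choices` for the mode, restricted to the eight mode names listed earlier, and plain options for the paths. The flag names `--workdir`, `--mmb_dir`, and `--mmb` are illustrative assumptions, not necessarily what main.py accepts.

```python
import argparse

MODES = ["2d structure", "3d coarse", "3d smooth", "coarse dock",
         "smooth dock", "free aptamer", "full dock", "full binding"]


def get_args(argv=None):
    """Sketch of runtime configuration via argparse; flag names are illustrative."""
    parser = argparse.ArgumentParser(description="E2EDNA 2.0 pipeline (sketch)")
    parser.add_argument("-m", "--mode", required=True, choices=MODES,
                        help="simulation mode")
    parser.add_argument("-w", "--workdir", default="localruns",
                        help="working directory for run outputs")
    parser.add_argument("--mmb_dir", help="directory containing the MMB library files")
    parser.add_argument("--mmb", help="path to the MMB executable")
    return parser.parse_args(argv)


args = get_args(["--mode", "free aptamer", "--workdir", "/tmp/runs"])
print(args.mode, args.workdir)
```

argparse generates `--help` output automatically. It also rejects an invalid mode with a message that lists the valid choices.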

    enhancement 
    opened by schackartk
  • 7 feature request argument parsing

    Overview

    This pull request implements argparse so that the user is less likely to need to edit source code in main.py. However, more work will be needed to include parameters related to environmental conditions such as pH.

    Other than implementing argparse, functionality is the same. Some things are still a bit awkward because I didn't want to change too much beyond that.

    Affected files

    The following files have changes:

    • main.py: add shebang line, add argument parsing and validation
    • automate_tests.sh: update arguments to align with argparse
    • README.md: describe current functionality and arguments

    Notes

    main.py

    There were a few things that may need to be changed to work most efficiently and predictably.

    The relationship between --ligand, --ligand_type, and --ligand_seq is a bit complex and can probably be improved. Ideally, I think --ligand would be optional, with a default of None; this makes more sense than having to use --ligand False. Then --ligand_type and --ligand_seq could also be optional, with a default of None (instead of an empty string). Only when --ligand is present would you validate that the others are there, and call parser.error() if not. I also think the authors should consider whether --ligand_seq is truly required when --ligand_type is 'peptide', 'DNA', or 'RNA'. Currently this is enforced (by parser.error()), but if it is actually optional, that should be updated.
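Here is a minimal sketch of the validation described above, assuming the flag names used in this PR. It keeps --ligand_seq required for peptide, DNA, and RNA ligands, mirroring the current behavior. If the sequence turns out to be optional, drop that check.

```python
import argparse


def get_ligand_args(argv=None):
    """Sketch: --ligand optional (default None); type and seq checked only when a ligand is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-l", "--ligand", default=None, help="ligand PDB file")
    parser.add_argument("-t", "--ligand_type", default=None,
                        choices=["peptide", "DNA", "RNA", "other"])
    parser.add_argument("-s", "--ligand_seq", default=None)
    args = parser.parse_args(argv)

    if args.ligand is None:
        # no ligand: type and sequence are irrelevant
        args.ligand_type = args.ligand_seq = None
    else:
        if args.ligand_type is None:
            parser.error("--ligand_type is required when --ligand is given")
        if args.ligand_type in ("peptide", "DNA", "RNA") and not args.ligand_seq:
            parser.error(f"--ligand_seq is required for ligand type {args.ligand_type!r}")
    return args
```

`parser.error()` prints the usage line and the message, then exits with status 2.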

    I left the code that uses different params based on whether it is run as local or cluster, but I am not sure if it is necessary. I especially think that the hard-coded paths used when it is cluster should be removed, and turned into arguments. In which case, it is the same as the usual arguments, and may make --device obsolete if there is no difference between local and cluster.

    I implemented wildcards to help the user find their MMB paths (lib and executable) within the --mmb_dir and --mmb. I am hoping the defaults will make it so users don't have to change this argument.

    I removed the operating system argument and instead used platform to detect it. This new implementation has only been tested on my WSL system, so please check this works. One issue is if the result of platform.system().lower() doesn't match an expected value on mac. Initially mine returned Linux, which is why I ran lower() to make it 'linux' which is compatible with the previous implementation.
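For reference, `platform.system()` returns 'Darwin' on macOS, so lower-casing alone gives 'darwin'. A small mapping avoids that mismatch. Per the note above, 'linux' matches the previous implementation. The target name 'macos' is an assumption about what that implementation expected.

```python
import platform


def detect_os(system_name=None):
    """Map platform.system() to short OS names (sketch).

    platform.system() returns 'Darwin' on macOS and 'Linux' on Linux
    (including WSL); 'macos' as the target name is an assumption.
    """
    name = (system_name or platform.system()).lower()
    mapping = {"darwin": "macos", "linux": "linux"}
    if name not in mapping:
        raise RuntimeError(f"unsupported operating system: {name!r}")
    return mapping[name]
```

Calling `detect_os()` with no argument inspects the current machine. Passing a name, as in `detect_os("Darwin")`, makes the mapping easy to test on any system.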

    Lots of argument validation now happens in get_args(), so hopefully more helpful error messages are produced.

    I added a feature so that both --aptamer and --ligand_seq can be names of files. In that case, the file contents are read in and used as the sequences. Literal strings can still be used instead of file names.
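One way to accept both file names and literal strings is to check whether the argument names an existing file. This sketch is not the PR's code. Skipping FASTA header lines (those starting with '>') is an added assumption. The approach has one ambiguity: a literal sequence that happens to match a file name in the current directory would be read as a file.

```python
import os


def read_sequence(value):
    """Return a sequence given either literally or as the path to a text/FASTA file."""
    if os.path.isfile(value):
        with open(value) as fh:
            # skip FASTA header lines and join wrapped sequence lines
            return "".join(line.strip() for line in fh if not line.startswith(">"))
    return value  # not a file: treat the argument as the literal sequence
```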

    Readme.md

    I hope my additions are helpful in describing the current functionality.

    One thing I was uncertain about is the description of ligand type saying "(default: Amber14)". I didn't see this set anywhere in params, and it is not the default of any argument I set up. If this needs to be a default, please take note.

    Conclusions

    Currently, all modes in automate_tests.sh run for me, so it seems that these changes are compatible. It would be great to have unit and integration tests with pytest to confirm.

    Please check that it works on MacOS still, as I have only tested on WSL.

    No additional dependencies have been added, only core libraries were used.

    Please feel free to make any changes you see fit or discuss!

    enhancement 
    opened by schackartk
  • Question: GPL-3.0 license required for this repo because of lightdock?

    Hello @brianjimenez - Hope this message finds you well.

    I am trying to figure out which license is the best choice for our E2EDNA 2.0 software, and I am aware that LightDock is licensed under GNU GPLv3. According to the license guide website provided by GitHub, GNU GPLv3 seems to require "larger works using a licensed work" to be released under the same license. Currently, E2EDNA 2.0 is under the Apache-2.0 license, which does not include a "same license" condition. In my opinion, the Apache-2.0 license could give some flexibility, because a future version of E2EDNA may offer options for several different auto-docking packages.

    A short summary of how LightDock is used in E2EDNA 2.0 now: lightdock-0.9.2 is installed by pip, and Python scripts such as lightdock3.py are called directly without modification. Does our way of using LightDock fall into the category where we can only choose GNU GPLv3 for E2EDNA 2.0? I am not sure about this, so I would like to hear the LightDock developers' opinions.

    Thank you very much!

    question 
    opened by taoliu032
  • Lightdock Rust nucleic support

    Dear E2EDNA2 developers,

    Since you are using LightDock in some parts of your pipeline, the 0.2.0 release of the Rust implementation of the framework may be of interest to you. This new release adds support for protein-nucleic acid complex prediction. It typically runs 5-6x faster than the Python+C implementation of the Python LightDock flavor and uses two orders of magnitude less memory. There is more information on how to compile and use the Rust version here.

    Hope it helps!

    enhancement 
    opened by brianjimenez
  • Enhancement: Avoid runtime exception when "run" folder exists

    If the output directory for the current run already exists, right now an exception is produced:

    Start automating tests one by one...
    ====================================
    TESTING MODE #1: '2d structure'
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
        os.mkdir(self.workDir)
    FileExistsError: [Errno 17] File exists: '/home/ken/personal/E2EDNA2/localruns/run1'
    
    END OF TEST #1. Results are saved to folder "run1", where:
            2d structure: in record.txt
    

    An exception could be avoided by validating that the output directory does not exist, and providing a useful message such as "The output directory for this run already exists at './localrun/run1'", and an optional -f/--force flag could be provided to overwrite the output directory.
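Here is a minimal sketch of the suggested behavior. `prepare_run_dir` and its wiring to a -f/--force flag are hypothetical. Because it uses os.makedirs, it also creates the parent working directory if that directory is missing.

```python
import os
import shutil


def prepare_run_dir(path, force=False):
    """Create a run's output directory without silently overwriting an old one (sketch)."""
    if os.path.isdir(path):
        if not force:
            # an uncaught SystemExit prints this message and exits with status 1
            raise SystemExit(f"The output directory for this run already exists at {path!r}. "
                             "Use -f/--force to overwrite it.")
        shutil.rmtree(path)  # --force: discard the previous run's outputs
    os.makedirs(path)  # also creates a missing parent workdir
    return path
```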

    opened by schackartk
  • Bug: Runtime exception when params['workdir'] does not exist

    When the directory in the variable params['workdir'] does not exist, the program fails at runtime:

    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
        os.mkdir(self.workDir)
    FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1'
    

    This could be fixed by checking for the directory, and creating it if it does not exist:

    import os

    if not os.path.isdir(params['workdir']):
        os.makedirs(params['workdir'])  # also creates any missing parent directories
    
    opened by schackartk
  • Error: 'str' object is not callable; in opendna.py, line 535

    Hello,

    I am excited to try out this tool!

    I have installed all dependencies successfully (I believe), and I am running the script automate_tests.sh. Most tests are passing, but tests 4, 5, 7, and 8 are failing during the docking step with the same exception.

    TESTING MODE #4: 'coarse dock'
    Starting Fresh Run 4
    Simulation mode: coarse dock
    Simulating TAATGTTAATTG with YQTQ.pdb
    Getting Secondary Structure(s)
    Running over 1 possible 2D structures.
    2D structure #0 is                              : .(((....))).
    
    Folding Aptamer from Sequence. Fold speed = quick.
    Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
    2D structure after MMB folding (from MDAnalysis): .(((....))).
    Initial fold fidelity = 1.000
    Initial fold fidelity = 1.000 (from MDAnalysis)
    Folded the aptamer and generated the folded structure: foldedAptamer_0.pdb
    
    No relaxation (smoothing) of the folded aptamer.
    
    Docking
    Traceback (most recent call last):
      File "main.py", line 230, in <module>
        opendnaOutput = opendna.run()  # retrieve binding information (eventually this should become a normalized c-number)    
      File "/home/ken/personal/E2EDNA2/opendna.py", line 297, in run
        outputDict['dock scores {}'.format(self.i)] = self.dock(self.pdbDict['representative aptamer {}'.format(self.i)], self.targetPDB)  # eg, "peptide.pdb" which can be created given peptide sequence by buildPeptide in function dock
      File "/home/ken/personal/E2EDNA2/opendna.py", line 535, in dock
        ld.run()
    TypeError: 'str' object is not callable
    

    I am unsure what the underlying problem is, but maybe it has to do with a mistake between:

    • The instance variable run on line 487 of instances.py: self.run = params['ld run']
    • The method run() on line 504 of instances.py: def run(self):

    Because the instance variable from line 487 is the string value set on line 220 in main.py: params['ld run'] = 'lightdock3.py'. Maybe this variable is somehow shadowing the method run(), and so it is failing to "call" str() (i.e. 'lightdock3.py'())?
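The suspected cause can be reproduced in isolation. In Python, an instance attribute takes precedence over a method of the same name defined on the class, so `self.run = ...` in `__init__` hides `run()`. The class names below are hypothetical stand-ins for the class in instances.py. Renaming the attribute (here to `run_script`, also hypothetical) restores the method.

```python
class LightDockRunner:
    """Hypothetical stand-in reproducing the suspected bug."""

    def __init__(self, params):
        self.run = params["ld run"]  # instance attribute named 'run'...

    def run(self):  # ...shadows this method on every instance
        return "docking"


try:
    LightDockRunner({"ld run": "lightdock3.py"}).run()
except TypeError as exc:
    print(exc)  # 'str' object is not callable


class FixedLightDockRunner:
    """Same idea with the attribute renamed so run() stays reachable."""

    def __init__(self, params):
        self.run_script = params["ld run"]  # hypothetical new attribute name

    def run(self):
        return f"docking with {self.run_script}"


print(FixedLightDockRunner({"ld run": "lightdock3.py"}).run())
```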

    I would appreciate any help with resolving this.

    Thank you!

    bug 
    opened by schackartk
  • Bug: Mysterious error when using invalid mode

    If the mode is misspelled or an invalid choice, an exception occurs:

    $ python main.py --run_num=1 --mode='fulldock' --aptamerSeq='GCGCGCGCGATATATAT' --ligand='my_ligand.pdb' --ligandType='other' --ligandSeq=''
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 52, in __init__
        if self.actionDict['make workdir']:
    KeyError: 'make workdir'
    

    The exception doesn't seem to mention the invalid --mode, so the user may be confused as to what happened.

    I have confirmed that this runs fine once the mode name is corrected.

    This issue is resolved in #8 by using argparse and specifying the valid choices. Here is what is displayed from the code in that pull request:

    $ ./main.py -r 1 -m 'fulldock' -a aptamers/my_aptamer.txt -l my_ligand.pdb -t other -f
    usage: main.py [-h] [-f] -r INT -m MODE -a SEQ -l PDB [-t TYPE] [-s SEQ]
                   [-d RUN] [-p DEV] [-w DIR] [-md DIR] [-mb MMB]
    main.py: error: argument -m/--mode: invalid choice: 'fulldock' (choose from '2d structure', '3d coarse', '3d smooth', 'coarse dock', 'smooth dock', 'free aptamer', 'full dock', 'full binding')
    
    opened by schackartk
  • Enhancement: More control over output location

    It seems a bit restrictive to enforce that the output directory be structured as {workdir}/run{runnum}/. Most tools allow you to specify the output directory yourself.

    This could be useful to the user (myself included) for organizing runs, and automating using a workflow manager. For instance, if I am running several combinations of aptamer, ligands, and modes, I may want my output directories to be {aptamer}/{ligand}/{mode}/. This structure is meaningful to me unlike the folder name "run1".

    While this is not resolved in #8, it would reduce the number of arguments: instead of having both --workdir and --run_num, you could have a single --outdir argument.

    enhancement 
    opened by schackartk
  • Bug: Ligand file in a folder causes exception

    If the ligand pdb file is in a folder instead of the root of the repo, an exception occurs:

    $ ls ligands/
    my_ligand.pdb
    
    $ python main.py --run_num=1 --mode='full dock' --aptamerSeq='GCGCGCGCGATATATAT' --ligand='ligands/my_ligand.pdb' --ligandType='other' --ligandSeq=''
    Starting Fresh Run 1
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 179, in setup
        copyfile(self.targetPDB, self.workDir + '/' + self.targetPDB)
      File "/home/ken/personal/E2EDNA2/env/lib/python3.7/shutil.py", line 121, in copyfile
        with open(dst, 'wb') as fdst:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1/ligands/my_ligand.pdb'
    

    I don't see any reason that the ligand file should not be in a folder, so this should not fail.
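The traceback suggests the destination path is built by appending the ligand's full relative path to the work directory. That makes a nonexistent ligands/ subfolder necessary inside the run folder. One fix, sketched below, keeps only the file name via os.path.basename. `copy_ligand` is a hypothetical helper, and any later code that refers to the copied ligand would presumably need to use the same base name.

```python
import os
from shutil import copyfile


def copy_ligand(target_pdb, work_dir):
    """Copy the ligand PDB into work_dir, keeping only its file name (sketch)."""
    dest = os.path.join(work_dir, os.path.basename(target_pdb))
    copyfile(target_pdb, dest)
    return dest
```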

    bug 
    opened by schackartk
Releases (v2.0.0)
  • v2.0.0 (May 16, 2022)

    This release is associated with the JOSS publication: https://doi.org/10.21105/joss.04182 The release has also been archived on Zenodo: https://doi.org/10.5281/zenodo.6546661

    Clarification: once downloaded from below, the archive folder will be named "E2EDNA2-2.0.0". This refers to version v2.0.0 of E2EDNA; the name "E2EDNA2" is inherited from the repository name.

    To view the repository: https://github.com/siminegroup/E2EDNA2/tree/v2.0.0

    Full Changelog: https://github.com/siminegroup/E2EDNA2/commits/v2.0.0

    Source code(tar.gz)
    Source code(zip)
Owner
computational chemistry group at McGill University