Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

Overview

Please see our release notes for the latest updates to Selene.

Installation

We recommend using Selene with Python 3.6 or above. Package installation should only take a few minutes (less than 10 minutes, typically ~2-3 minutes) with any of these methods (conda, pip, source).

First, install PyTorch. If you have an NVIDIA GPU, install a version of PyTorch that supports it; Selene will run much faster with a discrete GPU. The library is currently compatible with PyTorch versions between 0.4.1 and 1.4.0. We will continue to update Selene to be compatible with the latest version of PyTorch.
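If you want to check that an installed PyTorch falls inside this range, comparing version strings is enough. The sketch below is illustrative only: `is_supported_torch` and `parse_version` are hypothetical helpers (not part of Selene), both ends of the range are assumed to be inclusive, and only the numeric `major.minor.patch` prefix is compared, so build suffixes such as `+cu101` are ignored.

```python
import re

# Supported range stated above; treating both ends as inclusive is an assumption.
MIN_TORCH = (0, 4, 1)
MAX_TORCH = (1, 4, 0)


def parse_version(version):
    """Turn a string such as '1.4.0+cu101' or '1.4' into a tuple like (1, 4, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", str(version))
    if match is None:
        raise ValueError("Unrecognized version string: {!r}".format(version))
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_supported_torch(version):
    """Return True if `version` lies within [MIN_TORCH, MAX_TORCH]."""
    return MIN_TORCH <= parse_version(version) <= MAX_TORCH


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print(torch.__version__, is_supported_torch(torch.__version__))
```

Tuple comparison is lexicographic, so a version such as `(1, 0, 0)` correctly sorts between `(0, 4, 1)` and `(1, 4, 0)`.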

Installing selene with Anaconda (for Linux):

conda install -c bioconda selene-sdk

Installing selene with pip:

pip install selene-sdk

Note that we do not recommend pip-installing older versions of Selene (below 0.4.0), as these releases were less stable.

We currently only have a source distribution available for pip-installation.

Installing selene from source:

First, download the latest commits from the source repository (or download the latest tagged version of Selene for a stable release):

git clone https://github.com/FunctionLab/selene.git

The setup.py script requires NumPy, Cython, and setuptools. Please make sure you have these already installed.

If you plan on working in the selene repository directly, we recommend setting up a conda environment using selene-cpu.yml or selene-gpu.yml (if CUDA is enabled on your machine) and activating it. These environment YAML files list specific versions of package dependencies that we have used in the past to test Selene.

Selene contains some Cython files. You can build these by running

python setup.py build_ext --inplace

If you would like to locally install Selene, you can run

python setup.py install

About Selene

Selene is composed of a command-line interface and an API (the selene-sdk Python package). Users supply their data, model architecture, and configuration parameters, and Selene runs the user-specified operations (training, evaluation, prediction) for that sequence-based model.

For a more detailed overview of the components in the Selene software development kit (SDK), please consult the page here.

Help

Please post bugs or feature requests to our GitHub issues.

Join our Google group if you have questions about the package, case studies, or model development.

Documentation

The documentation for Selene is available here. If you are interested in running Selene through the command-line interface (CLI), this document describes how the configuration file format (used by the CLI) works and details all the possible configuration parameters you may need to build your own configuration file.

Examples

We provide two sets of examples: Jupyter notebook tutorials and the case studies described in our manuscript. The Jupyter notebooks are more accessible in that they can be easily perused and run on a laptop. We also take the opportunity to show how Selene can be used through the CLI (via configuration files) as well as through the API. Finally, the notebooks are particularly useful for demonstrating the various visualization components that Selene contains. The API, along with the visualization functions, is much less emphasized in the manuscript's case studies.

In the case studies, we demonstrate more complex use cases (e.g. training on much larger datasets) that we could not present in a Jupyter notebook. Further, we show how you can use the outputs of variant effect prediction in a subsequent statistical analysis (case 3). These examples reflect how we most often use Selene in our own projects, whereas the Jupyter notebooks survey the many different ways and contexts in which we can use Selene.

We recommend that the examples be run on a machine with a CUDA-enabled GPU. All examples take significantly longer when run on a CPU machine. (See the following sections for time estimates.)

Important: The tutorials and manuscript examples were originally run on Selene version 0.1.3, and later with Selene 0.2.0 (PyTorch version 0.4.1). Selene has since been updated, and files such as selene-gpu.yml specify PyTorch version 1.0.0. Please note that models created with an older version of PyTorch (such as those downloadable with the manuscript case studies) are NOT compatible with newer versions of PyTorch. If you run into errors loading trained model weights files, they are likely the result of differences in PyTorch or CUDA toolkit versions.

Tutorials

Tutorials for Selene are available here.

It is possible to run the tutorials (Jupyter notebook examples) on a standard CPU machine, but the training examples take about 2-3 days to run to completion on CPU. You can also change the training parameters (e.g. the total number of steps) so that they complete much faster.

The non-training examples (variant effect prediction, in silico mutagenesis) run fairly quickly: variant effect prediction might take 20-30 minutes, and in silico mutagenesis 10-15 minutes.

Please see the README in the tutorials directory for links to and descriptions of the specific tutorials.

Manuscript case studies

The code to reproduce case studies in the manuscript is available here.

Each case has its own directory and README describing how to run these cases. We recommend consulting the step-by-step breakdown of each case study that we provide in the methods section of the manuscript as well.

The manuscript examples were only tested on GPU. Our GPU (NVIDIA Tesla V100) time estimates:

  • Case study 1 finishes in about 1.5 days on a GPU node.
  • Case study 2 takes 6-7 days to run training (with the work distributed across 4 V100s) and evaluation.
  • Case study 3 (variant effect prediction) takes about 1 day to run.

The case studies in the manuscript focus on developing deep learning models for classification tasks. Selene does support training and evaluating sequence-based regression models, and we have provided a tutorial to demonstrate this.

Comments
  • Added in silico mutagenesis heatmap

    Added method and unit tests for creating in silico mutagenesis result matrix for one feature as described in #27

    This matrix is then passed to seaborn and plotted as a heatmap.
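    A minimal NumPy sketch of such a single-feature matrix follows. It is not the code from this PR: `ism_feature_matrix` is a hypothetical helper, predictions for the mutated sequences are assumed to be already computed and passed in as a dict keyed by `(position, base)`, and each cell holds the change relative to the reference prediction. Rows follow `A, C, G, T`, columns follow sequence positions, and reference-base cells are left as NaN.

```python
import numpy as np


def ism_feature_matrix(ref_sequence, ref_score, mutant_scores, bases="ACGT"):
    """Return a (len(bases), len(ref_sequence)) matrix of prediction changes
    for one feature: rows are substituted bases, columns are positions.

    ref_score:     the model's prediction for the unmutated sequence.
    mutant_scores: dict mapping (position, base) -> prediction for the
                   sequence carrying that single-base substitution.
    """
    matrix = np.full((len(bases), len(ref_sequence)), np.nan)
    for (position, base), score in mutant_scores.items():
        matrix[bases.index(base), position] = score - ref_score
    # Keep reference-base cells as NaN so a heatmap can leave them blank.
    for position, ref_base in enumerate(ref_sequence):
        if ref_base in bases:
            matrix[bases.index(ref_base), position] = np.nan
    return matrix
```

    Passing the result to `seaborn.heatmap` would then draw it, with the NaN cells masked out.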

    opened by evancofer 10
  • Naive question about  evaluate

    Hi,

    I want to test the trained model from https://github.com/FunctionLab/selene/blob/master/tutorials/quickstart_training/quickstart_training.ipynb using the .yml file shown below:

    ops: [evaluate]
    model: {
        # TODO: update this line with the absolute path to the file.
        path: /root/selene/learn/deeperdeepsea.py,
        class: DeeperDeepSEA,
        class_args: {
            sequence_length: 1000,
            n_targets: 1,
        },
        non_strand_specific: mean
    }
    sampler: !obj:selene_sdk.samplers.file_samplers.BedFileSampler {
        filepath: /root/selene/learn/training_outputs/test_data.bed,
        reference_sequence: !obj:selene_sdk.sequences.Genome {
            input_path: /root/selene/learn/male.hg19.fasta
        },
        n_samples: 1200,
        sequence_length: 1000,
        targets_avail: True,
        n_features: 2,
    }
    evaluate_model: !obj:selene_sdk.EvaluateModel {
        features: !obj:selene_sdk.utils.load_features_list {
            input_path: /root/selene/learn/distinct_features.txt
        },
        trained_model_path: /root/selene/learn/training_outputs/best_model.pth.tar,
        n_test_samples: 1200,
        batch_size: 64,
        report_gt_feature_n_positives: 10,
        use_cuda: True,
    }
    random_seed: 1447
    output_dir: /root/selene/learn/training_outputs
    create_subdirectory: False

    Then I got this error:

    Traceback (most recent call last):
      File "main.py", line 12, in <module>
        parse_configs_and_run(configs)
      File "/root/anaconda3/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py", line 341, in parse_configs_and_run
        execute(operations, configs, current_run_output_dir)
      File "/root/anaconda3/lib/python3.7/site-packages/selene_sdk/utils/config_utils.py", line 208, in execute
        evaluate_model = instantiate(evaluate_model_info)
      File "/root/anaconda3/lib/python3.7/site-packages/selene_sdk/utils/config.py", line 239, in instantiate
        return _instantiate_proxy_tuple(proxy, bindings)
      File "/root/anaconda3/lib/python3.7/site-packages/selene_sdk/utils/config.py", line 144, in _instantiate_proxy_tuple
        obj = proxy.callable(**kwargs)
      File "/root/anaconda3/lib/python3.7/site-packages/selene_sdk/evaluate_model.py", line 134, in __init__
        if type(self.reference_sequence) == Genome and
    AttributeError: 'EvaluateModel' object has no attribute 'reference_sequence'

    Would you like to help me debug?

    Thank you very much!

    Yours sincerely, Shusen

    opened by biomg 7
  • Can we integrate different histone modifications ChIP-Seq datasets?

    Hi, I am new to your tool, and after going through your case studies I have a general overview of the framework. I want to ask: can I supply peaks from ChIP-seq of different histone modifications (like H3K27ac and H3K9ac) as features for a particular tissue and then use them to train the model?

    opened by ashishjain1988 6
  • question about prediction on sequences score

    Hi, I have been reading the documentation and I'm still not sure what the output scores from getting predictions with a trained model mean. I noticed that the scores all range from 0 to 1. Is this the probability that a TF will bind to an input region, and what is this probability based on?

    Another question: if I set "center_bin_to_predict" to 200 when training the model, and "feature_thresholds" to 0.5, do my input TF binding regions have to be at least 200 bp long for Selene to classify them as "binding regions"?

    thanks!

    Michelle

    opened by ymkng 6
  • Updating the config file inputs to accept dynamically loaded classes within the selene package

    Currently, I think this issue is only applicable to the sampler module. We may eventually extend the same idea to other modules in selene.

    A user should be able to choose what kind of data sampler they want to use when training their model. Each of these samplers has very different input requirements. For example, an IntervalsSampler specific config file would have a lot more parameters compared to a MatFilesSampler config file. In the YAML file, a user should specify the type of sampler they want to use. The value of type should be the class name of the sampler. The remaining parameters could be specified under a parameter key:

    type: IntervalsSampler
    parameters:
        genome: <path to indexed genome fasta file>
        query_feature_data: <path to tabix-indexed features .bed.gz file>
        ...
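    The dispatch this proposal describes (look up a class by the name given in `type`, then pass the entries under `parameters` as keyword arguments) can be sketched in plain Python. The two sampler classes below are hypothetical stand-ins with made-up constructor arguments, not Selene's real samplers, and `build_from_config` is likewise a hypothetical helper; the dict it receives has the shape a YAML parser would produce from a config section like the one above.

```python
# Hypothetical stand-ins; Selene's real samplers take different arguments.
class IntervalsSampler:
    def __init__(self, genome, query_feature_data, **kwargs):
        self.genome = genome
        self.query_feature_data = query_feature_data
        self.options = kwargs


class MatFilesSampler:
    def __init__(self, train_path, validate_path):
        self.train_path = train_path
        self.validate_path = validate_path


# Registry of the classes a config file is allowed to name via `type`.
SAMPLERS = {cls.__name__: cls for cls in (IntervalsSampler, MatFilesSampler)}


def build_from_config(section, registry=SAMPLERS):
    """Instantiate the class named by section['type'] with section['parameters']."""
    type_name = section["type"]
    if type_name not in registry:
        raise ValueError(
            f"Unknown type {type_name!r}; expected one of {sorted(registry)}")
    return registry[type_name](**section.get("parameters", {}))
```

    Restricting lookups to an explicit registry, rather than importing arbitrary names, keeps error messages clear when a config names an unknown class.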
    
    opened by kathyxchen 6
  • Fix YAML documentation for `metrics` configuration parameter in `train_model`

    Thanks @bmacedo-lgtm for catching this issue:

    It turns out that the YAML parser we are using will only accept a single custom metrics function with the syntax currently described in the Selene CLI docs:

    metrics: {
         roc_auc: !import:sklearn.metrics.roc_auc_score,
         average_precision: !import:sklearn.metrics.average_precision_score
     },
    

    That is, the snippet above, as described in our docs, will crash because the parser expects every !import:<method/module> tag to come with a dictionary of input arguments, even if the dictionary is empty.

    It seems like Selene will run with no errors if I just modify this to

    metrics: {
         roc_auc: !import:sklearn.metrics.roc_auc_score {},
         average_precision: !import:sklearn.metrics.average_precision_score {}
     },
    

    but more testing is needed to better characterize and handle this bug.

    bug documentation 
    opened by kathyxchen 5
  • selene inaccurate predictions

    I followed the Getting Started tutorial and trained the GM12878 CTCF binding model. Then I input enhancer regions and compared the model's predictions to experimental CTCF binding in these regions. The model's predictions were vastly different from the experimental ChIP-seq binding, and I'm wondering why that is.

    best,

    Michelle

    opened by ymkng 5
  • Naive question about tutorial

    Hi,

    Thanks for developing this package. It's very useful for people who want to get started with machine learning in genomics.

    I have a very naive question about the Getting Started tutorial. I would like to know the specific role of the "deepsea_TF_intervals.txt" file in training the model. Are you using the deepsea_TF_intervals file to predict the CTCF binding sites? Or is the model trained on a certain number of CTCF peaks and then tries to predict the rest? In that case, what is the role of deepsea_TF_intervals?

    Thanks, Goutham A

    PS: Is there a Google group where I can ask methodological or theoretical questions?

    opened by gouthamatla 5
  • Heatmap visualization of in silico mutagenesis and variant effect prediction for a single genomic feature.

    We would like to include a module or function that allows users to visualize the effects of single base substitutions. This would be a heatmap of the changes in model prediction for a single genomic feature, where the columns of the heatmap correspond to each base position in the input sequence. The rows of the heatmap are the base substitutions.

    In in silico mutagenesis, the rows would be all 4 bases (with the reference/original base grayed out) because we substitute in all other bases.
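    Generating the in silico mutagenesis inputs described above (every position, every non-reference base) takes only a few lines of plain Python. `single_base_substitutions` is a hypothetical helper for illustration, not a Selene function.

```python
def single_base_substitutions(sequence, bases="ACGT"):
    """Yield (position, alt_base, mutated_sequence) for every single-base
    substitution of `sequence`, skipping the reference base at each position."""
    for position, ref_base in enumerate(sequence):
        for alt_base in bases:
            if alt_base == ref_base:
                continue  # the reference base is the grayed-out row entry
            mutated = sequence[:position] + alt_base + sequence[position + 1:]
            yield position, alt_base, mutated
```

    For a sequence of length L made only of A, C, G, and T, this yields 3L mutated sequences.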

    @jzthree I forgot what we decided for visualization of variant effect prediction. Would you be able to provide a description here?

    opened by kathyxchen 5
  • Slowdown of variant effect prediction with `torch.backends.cudnn.deterministic = True`

    The recent addition of

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    

    has led to a 10x slowdown of variant effect prediction for one of my DL models (e.g. taking 1000s to process 10k variants, where it previously took 100s). I've confirmed that this is the cause of the slowdown by running the same script with these 2 lines commented in and out.

    I'm wondering whether this is because the model was not trained with these settings; I would hope this slowdown doesn't happen for models that are trained with deterministic=True. @rfriedman22, I would love to get your input. I'm also going to try training some models with and without the random seed to see if I can figure out what's going on and how we can incorporate a workaround.

    opened by kathyxchen 4
  • Allow for assessing performance in multi-class classification setting

    Reference Issues/PRs

    No issue to reference.

    What does this implement/fix? Explain your changes.

    In the case of multi-class classification, each prediction task is not entirely separate. A user might want to compute the micro average of the ROC or PR curves, or compute the F1 score (micro or macro averaged). Computing these metrics requires knowledge of all targets and all predictions, at the same time. Currently the implementation does not allow this because it loops over each column of predictions separately, computes performance metrics, and then averages them together at the end.

    I added a condition to the compute_score function that allows for this. In the config file, the user specifies the targets as single values (i.e. a column vector). The NN will output K predictions per example, corresponding to the probability that the example belongs to each of K classes.

    Performance of the original implementation can be rescued by either (1) one-hot encoding the targets or (2) using implementations of performance metrics that use macro averaging, and specifying them in the config file.

    What testing did you do to verify the changes in this PR?

    Ran Selene using data where targets is a column vector of integer values 0,...,K-1 and NN architecture outputs K values for each example, corresponding to probabilities of the example belonging to each of the K classes. Wrote custom wrappers of performance metrics and confirmed that the metrics are only called once per epoch, rather than K times.
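    The distinction drawn in this PR, scoring all K outputs jointly versus looping over columns, can be illustrated with NumPy. This sketch assumes integer class labels in a column vector and an `(n_examples, K)` probability matrix; `one_hot` and `macro_f1` are hypothetical helpers, and macro F1 is written out by hand rather than taken from a particular library.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Convert a column vector of integer labels 0..K-1 into an (n, K) 0/1
    matrix, so that a per-column metric loop treats each class as its own task."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, n_classes), dtype=int)
    encoded[np.arange(labels.size), labels] = 1
    return encoded


def macro_f1(labels, probabilities):
    """Macro-averaged F1, computed once from all targets and all predictions."""
    labels = np.asarray(labels).reshape(-1)
    predicted = np.argmax(probabilities, axis=1)  # most probable class per example
    scores = []
    for k in range(probabilities.shape[1]):
        tp = np.sum((predicted == k) & (labels == k))
        fp = np.sum((predicted == k) & (labels != k))
        fn = np.sum((predicted != k) & (labels == k))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))
```

    `macro_f1` needs every prediction at once (it takes an argmax across columns), which is exactly what a column-by-column loop cannot provide.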

    opened by rfriedman22 4
  • print selene_sdk version, add config and model file to output, add random suffix to output directory name

    Reference Issues/PRs

    What does this implement/fix? Explain your changes.

    What testing did you do to verify the changes in this PR?

    opened by ygliu2016 3
  • Model source code file should be copied to the output directory

    At present, Selene does not save a copy of the Python file containing the model source code. This seems like an oversight; the output directory contents alone should be enough to use a model trained with Selene.

    enhancement 
    opened by evancofer 0
  • Loggers never close

    I've found that in the case of train_model, none of the logging handlers ever close (i.e. selene_sdk.train_model.log, selene_sdk.train_model.train.txt, and selene_sdk.train_model.validation.txt). I assume this happens in evaluate_model too, but I haven't looked too closely.

    This creates problems because I frequently write scripts that create multiple trainers. Since they are created in the same script, the trainers end up sharing loggers: if trainer 1 is made before trainer 2, then trainer 2 logs to the trainer 1 log as well as the trainer 2 log.

    It would be useful if somewhere within the TrainModel object, the Handlers could be removed from the loggers at some point. To me it seems most useful after calling train_and_validate but it could also be done in an implementation of __del__.

    In a related issue, it would be helpful if I could set a flag telling the logger not to write to stderr. Since the logs are getting written elsewhere anyways, it feels redundant to write the logs to stderr, especially since it gets mixed in with my own custom logging.
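    Until Selene handles this itself, a caller can detach handlers with the standard `logging` module. This is a minimal sketch; `close_logger_handlers` is a hypothetical helper, not part of Selene, and its `stderr_only` option corresponds to the flag requested above.

```python
import logging


def close_logger_handlers(logger_name, stderr_only=False):
    """Remove (and close) handlers attached to the named logger.

    With stderr_only=True, only console StreamHandlers are removed, which
    keeps file logging while silencing output to stderr.
    """
    logger = logging.getLogger(logger_name)
    for handler in list(logger.handlers):  # copy: the list is mutated below
        # FileHandler subclasses StreamHandler, so check for it explicitly.
        is_file = isinstance(handler, logging.FileHandler)
        is_stream = isinstance(handler, logging.StreamHandler)
        if stderr_only and (is_file or not is_stream):
            continue
        logger.removeHandler(handler)
        handler.close()
```

    Selene's actual logger names are not stated in this issue; `logging.Logger.manager.loggerDict` lists every logger created so far and can be used to find them before calling the helper, e.g. after `train_and_validate` returns.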

    opened by rfriedman22 0
  • Updating seqweaver

    Reference Issues/PRs

    What does this implement/fix? Explain your changes.

    One new script (selene_sdk/updating_seqweaver.py) implements a class which constructs and trains new data for seqweaver. selene_sdk/targets/genomic_features.py was modified to add strand specificity when parsing the bed file. I had to change the indexing order in order to maintain the following bed file column order: [chr, start, end, strand, target]. selene_sdk/targets/_genomic_features.pyx was modified to fix the indexing after adding the strand column.

    What testing did you do to verify the changes in this PR?

    I tested the update_seqweaver.py script, which uses the other scripts as an API, using the original training dataset for Seqweaver (~218 CLIP-seq datasets for 90 RBPs).
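    For illustration, parsing one line in the `[chr, start, end, strand, target]` column order named above might look like this. `BedRecord` and `parse_stranded_bed_line` are hypothetical names, not the code in `genomic_features.py`, and accepting `.` as an "unknown strand" value is an assumption.

```python
from collections import namedtuple

BedRecord = namedtuple("BedRecord", ["chrom", "start", "end", "strand", "target"])


def parse_stranded_bed_line(line):
    """Parse one tab-separated line laid out as chr, start, end, strand, target."""
    chrom, start, end, strand, target = line.rstrip("\n").split("\t")[:5]
    if strand not in ("+", "-", "."):
        raise ValueError(f"Unexpected strand value: {strand!r}")
    start, end = int(start), int(end)
    if start >= end:
        raise ValueError(f"Empty or inverted interval: {start}-{end}")
    return BedRecord(chrom, start, end, strand, target)
```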

    opened by aviyalitman 0
  • Variant effect prediction REF mismatch

    Dear Selene/Sei developers, thank you for the colossal work you've undertaken.

    I have a question regarding the variant effect prediction functionality of the Sei model. I am using the model to calculate variant effects for gnomAD variants, and there seem to be many mismatches between the REF alleles of the gnomAD variants and the GRCh38 assembly FASTA file. So my question is: in these cases, does Selene use the REF from the given VCF file, or the corresponding nucleotide from the GRCh38 FASTA file?

    opened by okurman 1
Releases (0.5.0)
  • 0.5.0 (Jun 8, 2021)

    Version 0.5.0

    New functionality

    • sampler.MultiSampler: MultiSampler accepts any Selene sampler for each of the train, validation, and test partitions, whereas previously MultiFileSampler only accepted FileSamplers. We will deprecate MultiFileSampler in our next major release.
    • DataLoader: Parallel data loading based on PyTorch's DataLoader class, which can be used with Selene's MultiSampler and MultiFileSampler classes (see: sampler.SamplerDataLoader, sampler.H5DataLoader).
    • To support parallelism via multiprocessing, the sampler used by SamplerDataLoader needs to be picklable. To enable this, file-opening operations are delayed until a method that needs the file is called. There is no change to the API, and setting init_unpicklable=True in __init__ for Genome and all OnlineSampler classes will fully reproduce the functionality in selene_sdk<=0.4.8.
    • sampler.RandomPositionsSampler: added support for center_bin_to_predict taking a list/tuple of two integers to specify the region from which to query the targets. By default (center_bin_to_predict=<int>), targets are queried from a bin of that size at the center of the sequence, but the region can instead be specified as start and end integers that are not at the center.
    • EvaluateModel: accepts a list of metrics (by default computing ROC AUC and average precision) with which to evaluate the test dataset.
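    The pattern behind the picklability change (store only the path at construction time, open the file on first use, and drop the open handle when pickling) can be sketched with the standard library. `LazyTextFile` is a hypothetical class used only to illustrate the idea; Selene's Genome and samplers implement it in their own way.

```python
class LazyTextFile:
    """Holds a file path and opens the file only when it is first needed."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # nothing is opened in __init__

    def _ensure_open(self):
        if self._handle is None:
            self._handle = open(self.path)
        return self._handle

    def read_all(self):
        handle = self._ensure_open()
        handle.seek(0)
        return handle.read()

    def __getstate__(self):
        # Open file objects cannot be pickled, so ship only the path;
        # each worker process reopens the file on its first read.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state
```

    Because `__getstate__` drops the handle, pickling the object (as multiprocessing-based loading may require) succeeds even after the file has been opened, and the unpickled copy simply reopens the file when it is next read.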

    Usage

    • Command-line interface (CLI): You can now run the CLI directly with python -m selene_sdk (if you have cloned the repository, make sure you have either installed selene_sdk locally via python setup.py install, or placed selene_sdk in the same directory as your script or added it to PYTHONPATH). Developers can make a copy of the selene_sdk/cli.py script and use it the same way that selene_cli.py was used in earlier versions of Selene (python -u cli.py <config-yml> [--lr]).

    Bug fixes

    • EvaluateModel: use_features_ord allows you to evaluate a trained model on only a subset of the chromatin features (targets) predicted by the model. If you are using a FileSampler for your test dataset, you now have the option to pass in a subsetted matrix; however, this matrix must be ordered the same way as features (the original ordering of the model's predicted targets), not in the ordering of use_features_ord. The final model predictions and targets (test_predictions.npz and test_targets.npz) will be written according to the use_features_ord list and ordering.
    • MatFileSampler: Previously the MatFileSampler reset the pointer to the start of the matrix too early (going back to the first sample before we had finished sampling the whole matrix).
    • CLI learning rate: Edge cases (e.g. not specifying the learning rate via CLI or config) previously were not handled correctly and did not throw an informative error.
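    The use_features_ord ordering rule can be made concrete with NumPy. This sketch assumes predictions arrive as an `(n_samples, len(features))` array ordered like features, and that a pre-subsetted targets matrix keeps the features ordering; `subset_by_features` is a hypothetical helper, not Selene's implementation.

```python
import numpy as np


def subset_by_features(predictions, subset_targets, features, use_features_ord):
    """Return (predictions, targets) with columns in use_features_ord order.

    predictions:    (n, len(features)) array, columns ordered like `features`.
    subset_targets: (n, len(use_features_ord)) array whose columns follow the
                    order in which those features appear in `features`.
    """
    position = {name: i for i, name in enumerate(features)}
    # Column of each requested feature in the full prediction matrix.
    pred_cols = [position[name] for name in use_features_ord]
    # The subset targets are stored in `features` order, so sort the requested
    # features by their index in `features` to locate each one's column.
    in_features_order = sorted(use_features_ord, key=position.__getitem__)
    target_cols = [in_features_order.index(name) for name in use_features_ord]
    return predictions[:, pred_cols], subset_targets[:, target_cols]
```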
  • 0.4.8 (May 10, 2020)

    Enhancements

    • PyTorch now has flexible state dict loading, which allows users more flexibility in loading models that were trained with older/newer versions of PyTorch. Selene has been updated to use this parameter.
    • Added HeartENN model architecture ahead of publication.
  • 0.4.7 (Apr 28, 2020)

  • 0.4.6 (Apr 24, 2020)

    Updates

    • Allow users to pass in individual sequences to get_predictions in the AnalyzeSequences class and get the model predictions directly (as opposed to having them written to an output file).
  • 0.4.5 (Feb 25, 2020)

    Updates

    • Specify upper & lower bounds for Selene's torch dependency
    • Add '.' as a valid delimiter for VCF multiallelic parsing
    • Allow users to evaluate on subsets of features in EvaluateModel

    Bugfixes:

    • BASES_ARR type consistency (specified as a list only) and resetting it for Lua-trained vs. Selene-trained models.
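    Base ordering matters because it determines which input column each nucleotide lands in: a model trained with one ordering receives scrambled inputs under another. A minimal NumPy sketch of one-hot encoding with a configurable base order follows; `one_hot_encode` is a hypothetical helper rather than Selene's Genome API, and filling unknown bases (e.g. N) with 0.25 in every column is an assumption of this sketch.

```python
import numpy as np


def one_hot_encode(sequence, bases=("A", "C", "G", "T"), unknown_value=0.25):
    """Encode a DNA string as a (len(sequence), len(bases)) float array.

    Column j corresponds to bases[j]; characters outside `bases`
    (such as 'N') get `unknown_value` in every column.
    """
    index = {base: j for j, base in enumerate(bases)}
    encoding = np.zeros((len(sequence), len(bases)), dtype=np.float32)
    for i, char in enumerate(sequence.upper()):
        j = index.get(char)
        if j is None:
            encoding[i, :] = unknown_value
        else:
            encoding[i, j] = 1.0
    return encoding
```

    For example, C is column 1 under the order A, C, G, T but column 2 under A, G, C, T.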
  • 0.4.4 (Dec 18, 2019)

    Updates

    • Refactored variant effect prediction to simplify the code
    • Removed contains_unk column from output of get_predictions_from_fasta in AnalyzeSequences class

    Bugfixes

    • Fixed variant effect prediction handling for odd-length sequences
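    Odd sequence lengths are a common source of off-by-one errors when centering a variant. The sketch below shows one self-consistent convention (0-based, half-open coordinates, with the variant placed at index `sequence_length // 2` of the window); it is an illustrative assumption, not necessarily the convention Selene or ExPecto uses, and `variant_window` is a hypothetical helper.

```python
def variant_window(position, sequence_length):
    """Return the half-open (start, end) window of `sequence_length` bases that
    places the 0-based variant `position` at window index sequence_length // 2.

    Works for odd and even lengths alike. Near chromosome ends, `start` can be
    negative or `end` can exceed the chromosome length; real code would have to
    pad or skip such variants.
    """
    start = position - sequence_length // 2
    end = start + sequence_length
    return start, end
```

    With an odd length the variant sits exactly in the middle; with an even length it sits just right of the midpoint.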
  • 0.4.3 (Nov 11, 2019)

    Updates:

    • Add a column contains_unk to BED/VCF predictions. This boolean column indicates whether a sequence contains any unknown bases.

    Bugfixes:

    • MultiModelWrapper can be used with CUDA.
  • 0.4.2 (Sep 23, 2019)

    Updates:

    • MultiModelWrapper for model evaluation

    Bugfixes:

    • Type check for GenomicFeatures is less strict (can accept int if threshold is 1) (#106)
    • Syntax error in EvaluateModel (#110)
    • RandomPositionsSampler sampling bounds (#114)
    • LR scheduler correctly tracks min loss now (#115)
    • Get predictions for BED file: fix edge case of a single-entry BED file (#118)
  • 0.4.1 (Jul 30, 2019)

  • 0.4.0 (Jul 30, 2019)

    Updates:

    • Variant effect prediction: adjustments made to variant centering and strand-specific sequence handling so that the sequence context fetched for a variant matches the implementation for code associated with DeepSEA and SeqWeaver (https://hb.flatironinstitute.org/asdbrowser/help, https://github.com/FunctionLab/expecto)
    • Predicting on sequences accepts BED file as input
    • Add compatibility with Lua-trained DeepSEA and SeqWeaver models (converted to PyTorch); the models themselves will be officially released through the ASD browser on HumanBase in the coming weeks.
    • Simplified the prediction handlers' output for variant effect prediction: sequences where the reference allele doesn't match the reference genome are no longer diverted to a new file. Instead, a ref_match column has been added that denotes whether the allele matches.
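    The ref_match check reduces to comparing the VCF REF allele with the reference genome at the variant's position. A minimal sketch follows, assuming 1-based VCF positions (as in the VCF format) and the chromosome held in memory as a string; `ref_matches` is a hypothetical helper, not Selene's code.

```python
def ref_matches(chrom_sequence, pos, ref):
    """Return True if the VCF REF allele matches the reference sequence.

    chrom_sequence: the chromosome as a string (0-based indexing).
    pos:            the VCF POS field, which is 1-based.
    ref:            the VCF REF allele, possibly several bases long.
    """
    start = pos - 1  # convert the 1-based VCF position to a 0-based index
    observed = chrom_sequence[start:start + len(ref)]
    return observed.upper() == ref.upper()
```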

    Bug fixes:

    • Predicting on sequences: previously did not output anything if N < batch size
  • 0.3.0 (Mar 15, 2019)

    Selene version 0.3.0. Tested previously as a pre-release.

    The updates to 0.3.0:

    • Saving outputs for variant effect prediction to HDF5 or TSV files (used to only be TSV).
    • Allowing users to set a write memory limit for how many predictions to store (for prediction, in silico mutagenesis, variant effect prediction) before writing them to a file.
    • Major refactor for the predict module
    • Updating variant effect prediction sequence creation so that it matches how the sequences are created in ExPecto (that is, how the variant is centered in an N bp sequence).
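    A write memory limit amounts to buffering prediction rows and flushing them once a threshold is reached. This sketch counts the buffer in rows rather than bytes, and `BufferedTSVWriter` with its `max_rows` parameter is hypothetical; it is not how Selene's prediction handlers are implemented.

```python
import csv
import io


class BufferedTSVWriter:
    """Collect prediction rows in memory and write them out in batches."""

    def __init__(self, file_obj, max_rows=1000):
        self._writer = csv.writer(file_obj, delimiter="\t", lineterminator="\n")
        self._buffer = []
        self.max_rows = max_rows

    def add(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.max_rows:
            self.flush()

    def flush(self):
        # One bulk write per batch instead of one write per prediction.
        self._writer.writerows(self._buffer)
        self._buffer.clear()


if __name__ == "__main__":
    out = io.StringIO()
    writer = BufferedTSVWriter(out, max_rows=2)
    for i in range(3):
        writer.add(["chr1", i, 0.5])
    writer.flush()  # write whatever is still buffered
    print(out.getvalue())
```

    A larger `max_rows` means fewer, bigger writes at the cost of more memory held between flushes.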

    Bug fix:

    • Loading model checkpoint in the TrainModel class.
  • 0.2.0 (Dec 13, 2018)

    Bug fixes

    • max_steps typo in TrainModel (can now continue training from a model checkpoint)
    • API ordering mismatch for get_data_and_targets between online samplers and file samplers (EvaluateModel can now run on both kinds of samplers)

    Enhancements

    • Significant improvements to the CLI/config file documentation: https://selene.flatironinstitute.org/overview/cli.html
    • Allow callback handlers so that users can specify different kinds of metrics for training
    • Support for training regression models using MatFileSampler: https://github.com/FunctionLab/selene/blob/master/tutorials/regression_mpra_example/regression_mpra_example.ipynb
    • Allow saving new checkpoints after a certain number of steps in training (as opposed to overwriting the same one)
    • Improved standard output logging for training
    • Updated MatFileSampler so it no longer loads all data directly into memory if using an HDF5 file
    • Allow users to have the option of loading the test set at the start of training, or (default) waiting until evaluation starts (if ops: [train, evaluate]).
  • 0.1.3 (Oct 4, 2018)

    IMPORTANT: For a manuscript submission, I have updated this tag with commits containing ONLY changes to some examples and READMEs. We will avoid making further forced updates to tags from now on (and forced updates will never happen if it is related to package code).

  • 0.1.2 (Sep 25, 2018)

    The previous release of Selene had a bug where the tabix-indexed blacklist files could not be loaded for selene_sdk.sequences.Genome classes. This release should resolve that issue.

  • 0.1.0 (Sep 7, 2018)

    In addition to adding a sampler that loads in .mat or .bed files for sampling in training/testing/validation modes (MultiFileSampler), we also have updated selene.sequences.Genome to include an input of blacklist_regions in its constructor. This allows users to specify whether certain regions of the genome should be ignored entirely (e.g. never get sampled when using an online sampler).
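    Excluding blacklisted regions from an online sampler can be sketched as rejection sampling: draw a position and redraw it if it falls inside a blacklisted interval. This illustration keeps the blacklist in memory as half-open (start, end) intervals on a single chromosome, whereas Selene reads blacklist regions from a tabix-indexed file (per the 0.1.2 notes); `in_blacklist` and `sample_position` are hypothetical helpers.

```python
import random


def in_blacklist(position, blacklist):
    """True if `position` lies in any half-open (start, end) interval."""
    return any(start <= position < end for start, end in blacklist)


def sample_position(chrom_length, blacklist, rng=random, max_tries=1000):
    """Draw a uniform 0-based position on [0, chrom_length) outside the blacklist.

    A real sampler would check the whole sequence window around the position,
    not just the single base.
    """
    for _ in range(max_tries):
        position = rng.randrange(chrom_length)
        if not in_blacklist(position, blacklist):
            return position
    raise RuntimeError("Could not find a non-blacklisted position; "
                       "is most of the chromosome blacklisted?")
```

    Rejection sampling keeps the draw uniform over the allowed positions; it only becomes slow when most of the chromosome is blacklisted, which `max_tries` guards against.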

  • 0.0.1 (Aug 6, 2018)

  • 0.0.0 (Aug 6, 2018)

    This release contains basic functionality to train, evaluate, and apply common sequence-level models. We used DeepSEA and use cases that build off of that model to determine what we should include in the first release. Please consult the tutorials for more information.

Owner
Troyanskaya Laboratory