Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Overview

Pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. Retrieval using sparse representations is provided via integration with our group's Anserini IR toolkit, which is built on Lucene. Retrieval using dense representations is provided via integration with Facebook's Faiss library.

Pyserini is primarily designed to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections.

With Pyserini, it's easy to reproduce runs on a number of standard IR test collections! A low-effort way to try things out is to look at our online notebooks, which will allow you to get started with just a few clicks.

Package Installation

Install via PyPI (requires Python 3.6+):

pip install pyserini

Sparse retrieval depends on Anserini, which is itself built on Lucene, and thus requires Java 11.

Dense retrieval depends on neural networks and requires a more complex set of dependencies. A pip installation will automatically pull in the 🤗 Transformers library to satisfy the package requirements. Pyserini also depends on PyTorch and Faiss, but since these packages may require platform-specific custom configuration, they are not explicitly listed in the package requirements. We leave the installation of these packages to you.

The software ecosystem is rapidly evolving and a potential source of frustration is incompatibility among different versions of underlying dependencies. We provide additional detailed installation instructions here.

Development Installation

If you're planning on just using Pyserini, then the pip instructions above are fine. However, if you're planning on contributing to the codebase or want to work with the latest not-yet-released features, you'll need a development installation. For this, clone our repo with the --recurse-submodules option to make sure the tools/ submodule also gets cloned.

The tools/ directory, which contains evaluation tools and scripts, is actually this repo, integrated as a Git submodule (so that it can be shared across related projects). Build as follows (you might get warnings, but they are okay to ignore):

cd tools/eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make && cd ../../..
cd tools/eval/ndeval && make && cd ../../..

Next, you'll need to clone and build Anserini. It makes sense to put both pyserini/ and anserini/ in a common folder. After you've successfully built Anserini, copy the fatjar, which will be target/anserini-X.Y.Z-SNAPSHOT-fatjar.jar, into pyserini/resources/jars/. As with the pip installation, a potential source of frustration is incompatibility among different versions of underlying dependencies. For these and other issues, we provide additional detailed installation instructions here.

You can confirm everything is working by running the unit tests:

python -m unittest

Assuming all tests pass, you should be ready to go!

Quick Links

How do I search?

Pyserini supports sparse retrieval (e.g., BM25 ranking using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches.

Sparse Retrieval

The SimpleSearcher class provides the entry point for sparse retrieval using bag-of-words representations. Anserini supports a number of pre-built indexes for common collections that it'll automatically download for you and store in ~/.cache/pyserini/indexes/. Here's how to use a pre-built index for the MS MARCO passage ranking task and issue a query interactively:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
hits = searcher.search('what is a lobster roll?')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

The results should be as follows:

 1 7157707 11.00830
 2 6034357 10.94310
 3 5837606 10.81740
 4 7157715 10.59820
 5 6034350 10.48360
 6 2900045 10.31190
 7 7157713 10.12300
 8 1584344 10.05290
 9 533614  9.96350
10 6234461 9.92200
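
The loop above indexes hits[0] through hits[9], so it would raise an IndexError for a query that returns fewer than ten hits. A minimal defensive sketch follows. Note that format_hits is a hypothetical helper (not part of Pyserini), and it assumes only that hits is a Python sequence whose items expose the docid and score attributes used above.

```python
from types import SimpleNamespace


def format_hits(hits, k=10):
    """Return up to k formatted result lines: rank, docid, score."""
    return [
        f'{rank:2} {hit.docid:7} {hit.score:.5f}'
        for rank, hit in enumerate(hits[:k], start=1)
    ]


# Stand-in objects so the sketch runs without an index; with Pyserini,
# pass the list returned by searcher.search(...) instead.
demo_hits = [
    SimpleNamespace(docid='7157707', score=11.0083),
    SimpleNamespace(docid='533614', score=9.9635),
]
for line in format_hits(demo_hits):
    print(line)
```

Slicing with hits[:k] simply yields a shorter list when fewer than k results come back.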

To further examine the results:

# Grab the raw text:
hits[0].raw

# Grab the raw Lucene Document:
hits[0].lucene_document

Pre-built indexes are hosted on University of Waterloo servers. The following method will list available pre-built indexes:

SimpleSearcher.list_prebuilt_indexes()

A description of what's available can be found here. Alternatively, see this answer for how to download an index manually.

Dense Retrieval

The SimpleDenseSearcher class provides the entry point for dense retrieval, and its usage is quite similar to SimpleSearcher. The only additional thing we need to specify for dense retrieval is the query encoder.

from pyserini.dsearch import SimpleDenseSearcher, TctColBertQueryEncoder

encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
searcher = SimpleDenseSearcher.from_prebuilt_index(
    'msmarco-passage-tct_colbert-hnsw',
    encoder
)
hits = searcher.search('what is a lobster roll')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

If you encounter an error (on macOS), you'll need the following:

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

The results should be as follows:

 1 7157710 70.53742
 2 7157715 70.50040
 3 7157707 70.13804
 4 6034350 69.93666
 5 6321969 69.62683
 6 4112862 69.34587
 7 5515474 69.21354
 8 7157708 69.08416
 9 6321974 69.06841
10 2920399 69.01737

Hybrid Sparse-Dense Retrieval

The HybridSearcher class provides the entry point to perform hybrid sparse-dense retrieval:

from pyserini.search import SimpleSearcher
from pyserini.dsearch import SimpleDenseSearcher, TctColBertQueryEncoder
from pyserini.hsearch import HybridSearcher

ssearcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
dsearcher = SimpleDenseSearcher.from_prebuilt_index(
    'msmarco-passage-tct_colbert-hnsw',
    encoder
)
hsearcher = HybridSearcher(dsearcher, ssearcher)
hits = hsearcher.search('what is a lobster roll')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

The results should be as follows:

 1 7157715 71.56022
 2 7157710 71.52962
 3 7157707 71.23887
 4 6034350 70.98502
 5 6321969 70.61903
 6 4112862 70.33807
 7 5515474 70.20574
 8 6034357 70.11168
 9 5837606 70.09911
10 7157708 70.07636

In general, hybrid retrieval will be more effective than dense retrieval, which will be more effective than sparse retrieval.
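
The scores in the three result lists above are consistent with a simple linear interpolation: each hybrid score equals the dense score plus 0.1 times the sparse score, and a document missing from one list receives that list's lowest score. The sketch below reproduces the hybrid ranking from the sparse and dense lists under that assumption. The weight 0.1 and the fallback rule are inferred from the printed numbers, not taken from the HybridSearcher source, and interpolate is a hypothetical name.

```python
def interpolate(dense, sparse, alpha=0.1):
    """Fuse two {docid: score} maps as dense + alpha * sparse.

    A docid missing from one map falls back to that map's lowest score.
    The weight and the fallback rule are assumptions inferred from the
    example output, not a statement about Pyserini's implementation.
    """
    min_dense = min(dense.values())
    min_sparse = min(sparse.values())
    fused = {
        docid: dense.get(docid, min_dense) + alpha * sparse.get(docid, min_sparse)
        for docid in set(dense) | set(sparse)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Top-10 sparse (BM25) scores from the sparse retrieval example.
sparse_scores = {
    '7157707': 11.00830, '6034357': 10.94310, '5837606': 10.81740,
    '7157715': 10.59820, '6034350': 10.48360, '2900045': 10.31190,
    '7157713': 10.12300, '1584344': 10.05290, '533614': 9.96350,
    '6234461': 9.92200,
}
# Top-10 dense scores from the dense retrieval example.
dense_scores = {
    '7157710': 70.53742, '7157715': 70.50040, '7157707': 70.13804,
    '6034350': 69.93666, '6321969': 69.62683, '4112862': 69.34587,
    '5515474': 69.21354, '7157708': 69.08416, '6321974': 69.06841,
    '2920399': 69.01737,
}

fused = interpolate(dense_scores, sparse_scores)
for rank, (docid, score) in enumerate(fused[:10], start=1):
    print(f'{rank:2} {docid:7} {score:.5f}')
```

Because the dense scores are much larger in magnitude than the BM25 scores, a small weight on the sparse side is enough to reorder near-ties among the dense results.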

How do I fetch a document?

Another commonly used feature in Pyserini is to fetch a document (i.e., its text) given its docid. This is easy to do:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
doc = searcher.doc('7157715')

From doc, you can access its contents as well as its raw representation. The contents hold the representation of what's actually indexed; the raw representation is usually the original "raw document". A simple example can illustrate this distinction: for an article from CORD-19, raw holds the complete JSON of the article, which obviously includes the article contents, but has metadata and other information as well. The contents contain extracts from the article that are actually indexed (for example, the title and abstract). In most cases, contents can be deterministically reconstructed from raw. When building the index, we specify flags to store contents and/or raw; it is rarely the case that we store both, since that would be a waste of space. In the case of the pre-built msmarco-passage index, we only store raw. Thus:

# Document contents: what's actually indexed.
# Note, this is not stored in the pre-built msmarco-passage index.
doc.contents()

# Raw document
doc.raw()

As you'd expect, doc.id() returns the docid, which is 7157715 in this case. Finally, doc.lucene_document() returns the underlying Lucene Document (i.e., a Java object). With that, you get direct access to the complete Lucene API for manipulating documents.

Since each text in the MS MARCO passage corpus is a JSON object, we can read the document into Python and manipulate it:

import json
json_doc = json.loads(doc.raw())

json_doc['contents']
# 'contents' of the document:
# A Lobster Roll is a bread roll filled with bite-sized chunks of lobster meat...

Every document has a docid, of type string, assigned by the collection it is part of. In addition, Lucene assigns each document a unique internal id (confusingly, Lucene also calls this the docid), which is an integer numbered sequentially starting from zero to one less than the number of documents in the index. This can be a source of confusion but the meaning is usually clear from context. Where there may be ambiguity, we refer to the external collection docid and Lucene's internal docid to be explicit. Programmatically, the two are distinguished by type: the first is a string and the second is an integer.

As an important side note, Lucene's internal docids are not stable across different index instances. That is, in two different index instances of the same collection, Lucene is likely to have assigned different internal docids for the same document. This is because the internal docids are assigned based on document ingestion order; this will vary due to thread interleaving during indexing (which is usually performed on multiple threads).

The doc method in searcher takes either a string (interpreted as an external collection docid) or an integer (interpreted as Lucene's internal docid) and returns the corresponding document. Thus, a simple way to iterate through all documents in the collection (and for example, print out its external collection docid) is as follows:

for i in range(searcher.num_docs):
    print(searcher.doc(i).docid())
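
To make the distinction between the two id spaces concrete, here is a toy model (an illustration only, not how Lucene or Pyserini is implemented). ToyIndex and its methods are hypothetical names. It assigns internal ids by ingestion order, so the same external docid gets different internal ids in two indexes built from the same collection in a different order.

```python
class ToyIndex:
    """Toy model of an index: internal ids follow ingestion order."""

    def __init__(self, ingestion_order):
        # Position in this list plays the role of Lucene's internal docid.
        self._by_internal = list(ingestion_order)

    @property
    def num_docs(self):
        return len(self._by_internal)

    def doc(self, key):
        """Look up by internal id (int) or external docid (str).

        Returns the external docid as a stand-in for the document,
        or None if a string key is not in the index.
        """
        if isinstance(key, int):
            return self._by_internal[key]
        return key if key in self._by_internal else None

    def internal_id(self, docid):
        """Return the internal id assigned to an external docid."""
        return self._by_internal.index(docid)


# Same collection, ingested in two different orders.
index_a = ToyIndex(['doc1', 'doc2', 'doc3'])
index_b = ToyIndex(['doc3', 'doc1', 'doc2'])

print(index_a.internal_id('doc3'), index_b.internal_id('doc3'))
print([index_a.doc(i) for i in range(index_a.num_docs)])
```

The practical consequence is the one stated above: persist external collection docids, never internal ones, if results need to be compared across index instances.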

How do I index and search my own documents?

To build sparse indexes (i.e., Lucene inverted indexes) on your own document collections, follow the instructions below. To build dense indexes (e.g., the output of transformer encoders) on your own document collections, see instructions here. The following covers English documents; if you want to index and search multilingual documents, check out this answer.

Pyserini (via Anserini) provides ingestors for document collections in many different formats. The simplest, however, is the following JSON format:

{
  "id": "doc1",
  "contents": "this is the contents."
}

A document is simply comprised of two fields, a docid and contents. Pyserini accepts collections comprised of these documents organized in three different ways:

  • Folder with each JSON in its own file, like this.
  • Folder with files, each of which contains an array of JSON documents, like this.
  • Folder with files, each of which contains a JSON on an individual line, like this (often called JSONL format).
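
The JSONL layout in the last bullet can be produced with a few lines of standard-library Python. This is a minimal sketch: write_jsonl is a hypothetical helper, the input is assumed to be an iterable of (docid, text) pairs, and the output location is a temporary directory chosen only for the demonstration.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(docs, out_path):
    """Write (docid, text) pairs as one JSON object per line."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open('w', encoding='utf-8') as f:
        for docid, text in docs:
            # Field names match the JSON format shown above.
            record = {'id': str(docid), 'contents': text}
            f.write(json.dumps(record, ensure_ascii=False) + '\n')
    return out_path


my_docs = [
    ('doc1', 'this is the contents.'),
    ('doc2', 'another short document.'),
]
out_file = write_jsonl(my_docs, Path(tempfile.mkdtemp()) / 'collection' / 'docs.jsonl')
print(out_file.read_text(encoding='utf-8'))
```

The folder containing the resulting file is what you would pass to the indexer as its input directory.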

So, the quickest way to get started is to write a script that converts your documents into the above format. Then, you can invoke the indexer (here, we're indexing JSONL, but any of the other formats work as well):

python -m pyserini.index -collection JsonCollection \
                         -generator DefaultLuceneDocumentGenerator \
                         -threads 1 \
                         -input integrations/resources/sample_collection_jsonl \
                         -index indexes/sample_collection_jsonl \
                         -storePositions -storeDocvectors -storeRaw

Three options control the type of index that is built:

  • -storePositions: builds a standard positional index
  • -storeDocvectors: stores doc vectors (required for relevance feedback)
  • -storeRaw: stores raw documents

If you don't specify any of the three options above, Pyserini builds an index that only stores term frequencies. This is sufficient for simple "bag of words" querying (and yields the smallest index size).

Once indexing is done, you can use SimpleSearcher to search the index:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher('indexes/sample_collection_jsonl')
hits = searcher.search('document')

for i in range(len(hits)):
    print(f'{i+1:2} {hits[i].docid:4} {hits[i].score:.5f}')

You should get something like the following:

 1 doc2 0.25620
 2 doc3 0.23140

If you want to perform a batch retrieval run (e.g., directly from the command line), organize all your queries in a tsv file, like here. The format is simple: the first field is a query id, and the second field is the query itself. Note that the file extension must end in .tsv so that Pyserini knows what format the queries are in.
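
A queries file in this two-field layout can be written with plain Python. The sketch below is illustrative: write_topics_tsv is a hypothetical helper, the query texts are made up, and the check on the file name simply mirrors the .tsv requirement stated above.

```python
import tempfile
from pathlib import Path


def write_topics_tsv(queries, out_path):
    """Write {query_id: query_text} as 'id<TAB>text' lines."""
    out_path = Path(out_path)
    if out_path.suffix != '.tsv':
        raise ValueError('the queries file name must end in .tsv')
    with out_path.open('w', encoding='utf-8') as f:
        for qid, text in queries.items():
            # Tabs or newlines inside a query would corrupt the two-field layout.
            if '\t' in text or '\n' in text:
                raise ValueError(f'query {qid} contains a tab or newline')
            f.write(f'{qid}\t{text}\n')
    return out_path


queries = {'1': 'first example query', '2': 'second example query'}
topics_file = write_topics_tsv(queries, Path(tempfile.mkdtemp()) / 'my_queries.tsv')
print(topics_file.read_text(encoding='utf-8'))
```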

Then, you can run:

$ python -m pyserini.search --topics integrations/resources/sample_queries.tsv \
                            --index indexes/sample_collection_jsonl \
                            --output run.sample.txt \
                            --bm25

$ cat run.sample.txt 
1 Q0 doc2 1 0.256200 Anserini
1 Q0 doc3 2 0.231400 Anserini
2 Q0 doc1 1 0.534600 Anserini
3 Q0 doc1 1 0.256200 Anserini
3 Q0 doc2 2 0.256199 Anserini
4 Q0 doc3 1 0.483000 Anserini

Note that the output run file is in standard TREC format.
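
Each line of a TREC run file has six whitespace-separated columns: query id, the literal Q0, docid, rank, score, and a run tag. As a minimal sketch, the run shown above can be loaded into a per-query ranking like this (parse_trec_run is a hypothetical helper, not a Pyserini function):

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Parse TREC run lines into {qid: [(rank, docid, score), ...]}."""
    run = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        qid, _q0, docid, rank, score, _tag = line.split()
        run[qid].append((int(rank), docid, float(score)))
    for ranked in run.values():
        ranked.sort()  # order each query's hits by rank
    return dict(run)


sample_run = """\
1 Q0 doc2 1 0.256200 Anserini
1 Q0 doc3 2 0.231400 Anserini
2 Q0 doc1 1 0.534600 Anserini
3 Q0 doc1 1 0.256200 Anserini
3 Q0 doc2 2 0.256199 Anserini
4 Q0 doc3 1 0.483000 Anserini
""".splitlines()

run = parse_trec_run(sample_run)
for qid, ranked in run.items():
    print(qid, [docid for _rank, docid, _score in ranked])
```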

You can also add extra fields to your documents when needed, e.g., text features. For example, the SpaCy Named Entity Recognition (NER) result of contents could be stored as an additional field NER.

{
  "id": "doc1",
  "contents": "The Manhattan Project and its atomic bomb helped bring an end to World War II. Its legacy of peaceful uses of atomic energy continues to have an impact on history and science.",
  "NER": {
            "ORG": ["The Manhattan Project"],
            "MONEY": ["World War II"]
         }
}
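
A document like the one above can be assembled by grouping entity mentions under their labels. In this sketch, add_ner_field is a hypothetical helper and the entities are passed in as (text, label) pairs. With spaCy, such pairs could be collected from doc.ents as (ent.text, ent.label_), but the code below deliberately avoids depending on spaCy and hard-codes the pairs from the example.

```python
import json
from collections import defaultdict


def add_ner_field(doc, entities):
    """Return a copy of doc with entities grouped by label under 'NER'."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return {**doc, 'NER': dict(grouped)}


base_doc = {
    'id': 'doc1',
    'contents': 'The Manhattan Project and its atomic bomb helped bring an end to World War II.',
}
# Entity pairs copied from the example document above.
entities = [('The Manhattan Project', 'ORG'), ('World War II', 'MONEY')]

print(json.dumps(add_ner_field(base_doc, entities), indent=2))
```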

Reproduction Guides

With Pyserini, it's easy to reproduce runs on a number of standard IR test collections!

Sparse Retrieval

Dense Retrieval

Baselines

Pyserini provides baselines for a number of datasets.

Additional Documentation

Known Issues

Anserini is designed to work with JDK 11. There was a JRE path change above JDK 9 that breaks pyjnius 1.2.0, as documented in this issue, also reported in Anserini here and here. This issue was fixed with pyjnius 1.2.1 (released December 2019). The previous error was documented in this notebook and this notebook documents the fix.

Release History

With v0.11.0.0 and before, Pyserini versions adopted the convention of X.Y.Z.W, where X.Y.Z tracks the version of Anserini, and W is used to distinguish different releases on the Python end. Starting with Anserini v0.12.0, Anserini and Pyserini versions have become decoupled.

Comments
  • Dense search replication, starting from hgf model

    Here's what I think our end target is: start with the hgf model from the model hub, and assume that's fixed.

    1. Be able to encode corpus and queries - scripts for doing so should be in https://github.com/castorini/pyserini/tree/master/scripts
    2. Scripts for building hnsw index, also in scripts/
    3. (1) and (2) are what we store as "pre-built".

    This will allow replication and bring every part of the pipeline in sync - other than training the encoder model.

    @MXueguang @justram @jacklin64 thoughts?

    opened by lintool 18
  • Multiple language support?

    Hi,

    Does pyserini currently support languages other than English? Specifically, I am asking about using features such as creating an index by python -m pyserini.index -collection JsonCollection -generator DefaultLuceneDocumentGenerator ... and using searcher.search. If yes, how do I integrate it in a Python script?

    Thank you!

    opened by velocityCavalry 16
  • SimpleSearcher.search memory leak

    When calling the search method of SimpleSearcher, I noticed RAM usage increasing with every new iteration. Could you please tell me how to reduce the memory leak?

    opened by dmitrijeuseew 16
  • Fold qrels into pyserini directly

    Follow up to #310 - there, we folded the eval scripts directly into pyserini. Now let's do the same with the qrels.

    In actuality, the qrels are already in the anserini jar, since this entire directory is included in the fatjar: https://github.com/castorini/anserini/tree/master/src/main/resources/topics-and-qrels

    Trick is how to get the qrels out...

    This is, in fact, how we can access the topics in anserini: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/topicreader/Topics.java#L22 https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/topicreader/TopicReader.java#L143

    And pyserini just wraps the Java methods above.


    With that background, I propose to apply the same treatment to qrels.

    1. Add a method in Anserini (on the Java end) to read qrels from resources/topics-and-qrels/ into a String. We can use the same "ids" as the topics. Build around here: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/util/Qrels.java
    2. On the Python end, we call the Java method, which reads the qrels as a string. Then we write back the string into ~/.cache/pyserini.
    3. Our eval scripts can then reference ~/.cache/pyserini.

    And at the end of the day, we'll be able to do this directly:

    $ python -m pyserini.search --topics robust04 --index robust04 --output run.robust04.txt --bm25
    $ python -m pyserini.eval.trec_eval --qrels robust04 -m map -m P.30 run.robust04.txt
    

    (With no need to download any intermediate data... everything is self contained!)

    @MXueguang thoughts? Do you like it? Any better way?

    opened by lintool 16
  • Add automate downloading of indexes

    Currently, this change supports 'ms-marco-passage', 'ms-marco-doc' and 'TREC Disks 4 & 5'.

    • If the index exists, skip the download and use the index under '(pyserini)/indexes'.
    • If not, download the index to cache (~/.cache/pyserini/indexes) and extract the index to (pyserini)/indexes. Finally, delete the gz file in cache. Should we keep the gz file in cache?
    opened by qguo96 16
  • Resolve tiny differences between Anserini and Pyserini on MS MARCO: query iteration order

    If we look at the Python replications: https://github.com/castorini/pyserini/blob/master/docs/pypi-replication.md Compared against Anserini replications: e.g., https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc-leaderboard.md

    We'll note tiny differences - e.g., for MS MARCO doc, baselines - pyserini:

    #####################
    MRR @100: 0.2770296928568709
    QueriesRanked: 5193
    #####################
    

    Compared to anserini:

    #####################
    MRR @100: 0.2770296928568702
    QueriesRanked: 5193
    #####################
    

    Previously, we tracked it down in issue #257.

    I'd like to fix it so we get identical results moving forward - my proposed fix is a bit janky, but it'll work: Let's just store, in Python code, an array of integers corresponding to ids of the queries in the original queries file. When we're iterating over the dataset in pyserini.search, we just follow the order of the integers.

    Slightly better, we introduce a new query iterator abstraction and hide this implementation detail in there. So the query iterator would take in the current dictionary, and an optional array holding the iteration order.

    Thoughts @MXueguang? I was thinking you could work on this?

    opened by lintool 15
  • DPR replication docs

    Hi @MXueguang - when everything is implemented DPR should probably get its own separate replication page, like for MS MARCO: https://github.com/castorini/pyserini/blob/master/docs/experiments-msmarco-passage.md

    Containing sparse, hybrid, and dense retrieval.

    Then we can add a replication log also - starting point for people interested in working more on it.

    opened by lintool 14
  • Incorrect encoding on Windows

    When using pyserini under Windows, it seems that the encoding of strings is breaking when passed to the JNI via the pyjnius package.

    It happens when a string is encoded as UTF-8 like this JString(my_str.encode('utf-8')) (e.g., https://github.com/castorini/pyserini/blob/master/pyserini/search/_searcher.py#L114). It only occurs under Windows as it must collide with the default Windows encoding CP-1252.

    I discussed this issue with the maintainers of pyjnius and it seems that to make it work independently from the platform, the .encode('utf-8') could simply be dropped.

    Was there a reason why this manual encoding was used in pyserini?

    I created a branch with the changes, I could do a PR if you wish.

    opened by stekiri 13
  • Dense retrieval draft

    An example of usage. Since the dense index doesn't contain raw data, I loaded the corpus separately.

    import numpy as np
    from pyserini.search import SimpleDenseSearcher
    
    searcher = SimpleDenseSearcher.from_prebuilt_index('msmarco_passage_0', 'collection.tsv')
    
    query_emb = np.random.random(768).astype('float32')
    result = searcher.search(query_emb)
    
    result[0].raw
    >> 'Lander, WY Sales Tax Rate. The current total local sales tax rate in Lander, WY is 5.000%. The December 2015 total local sales tax rate was also 5.000%. Lander, WY is in Fremont County. Lander is in the following zip codes: 82520.'
    
    result[0].docid
    >> '350921'
    
    result[0].score
    >> 0.42547345
    
    searcher.doc('123')
    >> Document(docid='123', raw='With a number of condo developments springing up in the city, it can be difficult to narrow down your choices for the perfect Montreal condo for sale. Our skilled agents organize your steps towards meeting your goals with our condo projects located in popular and trendy neighbourhoods.')
    
    opened by MXueguang 13
  • IndexOutOfBoundsException calling get_term_counts

    This is code to print the top tf.idf-weighted terms from documents in a run:

    import heapq

    from pyserini.index import IndexReader

    reader = IndexReader.from_prebuilt_index('robust04')
    for topic, docs in run.items():
        print('---', topic)
        for doc in docs:
            print('---', doc)
            vec = reader.get_document_vector(doc)
            weighted = []
            for term, tf in vec.items():
                print('---', term, tf)
                df, cf = reader.get_term_counts(term)
                tfidf = tf / df
                heapq.heappush(weighted, (tfidf, term))
            for weight, term in heapq.nlargest(10, weighted):
                print(topic, doc, term, weight)
    

    The run I am iterating is a BM25 retrieval run on robust04 from Pyserini. On topic 301, document FBIS4-40260, term 'it' (tf=2), I get the following error:

    Traceback (most recent call last):
      File "/Users/soboroff/pyserini-fire/./top-terms.py", line 33, in <module>
        df, cf = reader.get_term_counts(term)
      File "/Users/soboroff/pyserini-fire/venv/lib/python3.10/site-packages/pyserini/index/_base.py", line 259, in get_term_counts
        term_map = self.object.getTermCountsWithAnalyzer(self.reader, JString(term.encode('utf-8')), analyzer)
      File "jnius/jnius_export_class.pxi", line 884, in jnius.JavaMethod.__call__
      File "jnius/jnius_export_class.pxi", line 1056, in jnius.JavaMethod.call_staticmethod
      File "jnius/jnius_utils.pxi", line 91, in jnius.check_exception
    jnius.JavaException: JVM exception occurred: Index 0 out of bounds for length 0 java.lang.IndexOutOfBoundsException
    
    opened by isoboroff 12
  • Unable to do Dense search against own index

    My environment:

    • OS - Ubuntu 18.04
    • Java 11.0.11
    • Python 3.8.8
    • Python Package versions:
      • torch 1.8.1
      • faiss-cpu 1.7.0
      • pyserini 0.12.0

    Problem 1

    I followed instructions to create my own minimal index and was able to run the Sparse Retrieval example successfully. However, when I tried to run the Dense retrieval example using the TctColBertQueryEncoder, I encountered the following issues that seem to be caused by me having a newer version of the transformers library, where the requires_faiss and requires_pytorch methods have been replaced with a more general requires_backends method in transformers.file_utils. The following files were affected.

    pyserini/dsearch/_dsearcher.py
    pyserini/dsearch/_model.py
    

    Problem 2

    Replacing them in place in the Pyserini code in my site-packages allowed me to move forward, but now I get the error message:

    RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/impl/io.cpp:81: Error: 'f' failed: could not open /path/to/lucene_index/index for reading: No such file or directory
    

    The /path/to/lucene_index above is a folder where my lucene index was built using pyserini.index. I am guessing that an additional ANN index might be required to be built from the data to allow Dense searching to happen? I looked in the help for pyserini.index but there did not seem to be anything that indicated creation of ANN index.

    I can live with the first problem (since I have a local solution) but obviously some fix to that would be nice. For the second problem, some documentation or help with building a local index for dense searching will be very much appreciated.

    Thanks!

    opened by sujitpal 12
  • Broken links in prebuilt READMEs

    From here: https://github.com/castorini/pyserini/blob/master/docs/prebuilt-indexes.md

    Link to robust04 README is broken. Might want to go through and make sure they all work...

    opened by lintool 0
  • Fill in missing conditions in MS MARCO V1 repro matrix

    Here: https://castorini.github.io/pyserini/2cr/msmarco-v1-passage.html

    We're missing a bunch of conditions that we should add.

    @MXueguang this is probably pretty easy to do right?

    opened by lintool 0
  • Refactor Dependencies

    Initial PR Based on https://github.com/castorini/pyserini/issues/1375

    Modularize imports so that LuceneSearcher does not rely on Faiss, torch, and transformers

    opened by ToluClassics 1
  • Importing LuceneSearcher relies on FAISS and Torch

    Currently, importing LuceneSearcher fails if faiss and torch aren't installed. (They aren't installed by design because they're platform-specific, see: https://github.com/castorini/pyserini#installation)

    This is likely caused by the imports in the following __init__ file: https://github.com/castorini/pyserini/blob/master/pyserini/search/__init__.py#L23-L26

    A fix would need to modularize those imports.

    If no one gets to it before me, I will attempt to send a PR to fix this.

    opened by cakiki 1
Releases(pyserini-0.19.2)
Owner
Castorini
Deep learning for natural language processing and information retrieval at the University of Waterloo