Overview

Pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. Retrieval using sparse representations is provided via integration with our group's Anserini IR toolkit, which is built on Lucene. Retrieval using dense representations is provided via integration with Facebook's Faiss library.

Pyserini is primarily designed to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections.

With Pyserini, it's easy to reproduce runs on a number of standard IR test collections! A low-effort way to try things out is to look at our online notebooks, which will allow you to get started with just a few clicks.

Package Installation

Install via PyPI (requires Python 3.6+):

pip install pyserini

Sparse retrieval depends on Anserini, which is itself built on Lucene, and thus requires Java 11.

Dense retrieval depends on neural networks and requires a more complex set of dependencies. A pip installation will automatically pull in the 🤗 Transformers library to satisfy the package requirements. Pyserini also depends on PyTorch and Faiss, but since these packages may require platform-specific custom configuration, they are not explicitly listed in the package requirements. We leave the installation of these packages to you.
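Since PyTorch and Faiss aren't pulled in automatically, it can be handy to check whether they're importable before attempting dense retrieval. A minimal sketch (the helper below is ours, not part of Pyserini):

```python
import importlib.util

def missing_optional_deps(pkgs=("torch", "faiss")):
    """Return the subset of optional packages that are not importable."""
    return [p for p in pkgs if importlib.util.find_spec(p) is None]

# An empty list means the dense retrieval dependencies are in place.
print(missing_optional_deps())
```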

The software ecosystem is rapidly evolving and a potential source of frustration is incompatibility among different versions of underlying dependencies. We provide additional detailed installation instructions here.

Development Installation

If you're planning on just using Pyserini, then the pip instructions above are fine. However, if you're planning on contributing to the codebase or want to work with the latest not-yet-released features, you'll need a development installation. For this, clone our repo with the --recurse-submodules option to make sure the tools/ submodule also gets cloned.

The tools/ directory, which contains evaluation tools and scripts, is actually this repo, integrated as a Git submodule (so that it can be shared across related projects). Build as follows (you might see warnings, which are okay to ignore):

cd tools/eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make && cd ../../..
cd tools/eval/ndeval && make && cd ../../..

Next, you'll need to clone and build Anserini. It makes sense to put both pyserini/ and anserini/ in a common folder. After you've successfully built Anserini, copy the fatjar, which will be at target/anserini-X.Y.Z-SNAPSHOT-fatjar.jar, into pyserini/resources/jars/. As with the pip installation, a potential source of frustration is incompatibility among different versions of underlying dependencies. For these and other issues, we provide additional detailed installation instructions here.

You can confirm everything is working by running the unit tests:

python -m unittest

Assuming all tests pass, you should be ready to go!

Quick Links

How do I search?

Pyserini supports sparse retrieval (e.g., BM25 ranking using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches.

Sparse Retrieval

The SimpleSearcher class provides the entry point for sparse retrieval using bag-of-words representations. Pyserini supports a number of pre-built indexes for common collections that it'll automatically download for you and store in ~/.cache/pyserini/indexes/. Here's how to use a pre-built index for the MS MARCO passage ranking task and issue a query interactively:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
hits = searcher.search('what is a lobster roll?')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

The results should be as follows:

 1 7157707 11.00830
 2 6034357 10.94310
 3 5837606 10.81740
 4 7157715 10.59820
 5 6034350 10.48360
 6 2900045 10.31190
 7 7157713 10.12300
 8 1584344 10.05290
 9 533614  9.96350
10 6234461 9.92200

To further examine the results:

# Grab the raw text:
hits[0].raw

# Grab the raw Lucene Document:
hits[0].lucene_document

Pre-built indexes are hosted on University of Waterloo servers. The following method will list available pre-built indexes:

SimpleSearcher.list_prebuilt_indexes()

A description of what's available can be found here. Alternatively, see this answer for how to download an index manually.

Dense Retrieval

The SimpleDenseSearcher class provides the entry point for dense retrieval, and its usage is quite similar to SimpleSearcher. The only additional thing we need to specify for dense retrieval is the query encoder.

from pyserini.dsearch import SimpleDenseSearcher, TctColBertQueryEncoder

encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
searcher = SimpleDenseSearcher.from_prebuilt_index(
    'msmarco-passage-tct_colbert-hnsw',
    encoder
)
hits = searcher.search('what is a lobster roll')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

If you encounter an error (on macOS), you'll need the following:

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

The results should be as follows:

 1 7157710 70.53742
 2 7157715 70.50040
 3 7157707 70.13804
 4 6034350 69.93666
 5 6321969 69.62683
 6 4112862 69.34587
 7 5515474 69.21354
 8 7157708 69.08416
 9 6321974 69.06841
10 2920399 69.01737

Hybrid Sparse-Dense Retrieval

The HybridSearcher class provides the entry point to perform hybrid sparse-dense retrieval:

from pyserini.search import SimpleSearcher
from pyserini.dsearch import SimpleDenseSearcher, TctColBertQueryEncoder
from pyserini.hsearch import HybridSearcher

ssearcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
encoder = TctColBertQueryEncoder('castorini/tct_colbert-msmarco')
dsearcher = SimpleDenseSearcher.from_prebuilt_index(
    'msmarco-passage-tct_colbert-hnsw',
    encoder
)
hsearcher = HybridSearcher(dsearcher, ssearcher)
hits = hsearcher.search('what is a lobster roll')

for i in range(0, 10):
    print(f'{i+1:2} {hits[i].docid:7} {hits[i].score:.5f}')

The results should be as follows:

 1 7157715 71.56022
 2 7157710 71.52962
 3 7157707 71.23887
 4 6034350 70.98502
 5 6321969 70.61903
 6 4112862 70.33807
 7 5515474 70.20574
 8 6034357 70.11168
 9 5837606 70.09911
10 7157708 70.07636

In general, hybrid retrieval will be more effective than dense retrieval, which will be more effective than sparse retrieval.
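The exact fusion that HybridSearcher performs is an implementation detail, but the general idea is to interpolate sparse and dense scores over the union of candidate docids. A toy sketch (function name, weight, and scores are all made up for illustration):

```python
def fuse(sparse_hits, dense_hits, alpha=0.1, k=10):
    """Toy score fusion: weighted sum over the union of candidate docids.

    sparse_hits and dense_hits map docid -> score; a docid missing from
    one list contributes zero from that list.
    """
    docids = set(sparse_hits) | set(dense_hits)
    fused = {d: alpha * sparse_hits.get(d, 0.0) + dense_hits.get(d, 0.0)
             for d in docids}
    # Sort by fused score, descending, and keep the top k.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(fuse({'d1': 10.0, 'd2': 9.0}, {'d2': 70.0, 'd3': 69.0}, k=3))
```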

How do I fetch a document?

Another commonly used feature in Pyserini is to fetch a document (i.e., its text) given its docid. This is easy to do:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
doc = searcher.doc('7157715')

From doc, you can access its contents as well as its raw representation. The contents hold the representation of what's actually indexed; the raw representation is usually the original "raw document". A simple example can illustrate this distinction: for an article from CORD-19, raw holds the complete JSON of the article, which obviously includes the article contents, but has metadata and other information as well. The contents contain extracts from the article that are actually indexed (for example, the title and abstract). In most cases, contents can be deterministically reconstructed from raw. When building the index, we specify flags to store contents and/or raw; it is rarely the case that we store both, since that would be a waste of space. In the case of the pre-built msmarco-passage index, we only store raw. Thus:

# Document contents: what's actually indexed.
# Note, this is not stored in the pre-built msmarco-passage index.
doc.contents()
                                                                                                   
# Raw document
doc.raw()

As you'd expect, doc.id() returns the docid, which is 7157715 in this case. Finally, doc.lucene_document() returns the underlying Lucene Document (i.e., a Java object). With that, you get direct access to the complete Lucene API for manipulating documents.

Since each text in the MS MARCO passage corpus is a JSON object, we can read the document into Python and manipulate it:

import json
json_doc = json.loads(doc.raw())

json_doc['contents']
# 'contents' of the document:
# A Lobster Roll is a bread roll filled with bite-sized chunks of lobster meat...

Every document has a docid, of type string, assigned by the collection it is part of. In addition, Lucene assigns each document a unique internal id (confusingly, Lucene also calls this the docid), which is an integer numbered sequentially starting from zero to one less than the number of documents in the index. This can be a source of confusion but the meaning is usually clear from context. Where there may be ambiguity, we refer to the external collection docid and Lucene's internal docid to be explicit. Programmatically, the two are distinguished by type: the first is a string and the second is an integer.

As an important side note, Lucene's internal docids are not stable across different index instances. That is, in two different index instances of the same collection, Lucene is likely to have assigned different internal docids for the same document. This is because the internal docids are assigned based on document ingestion order; this will vary due to thread interleaving during indexing (which is usually performed on multiple threads).

The doc method in searcher takes either a string (interpreted as an external collection docid) or an integer (interpreted as Lucene's internal docid) and returns the corresponding document. Thus, a simple way to iterate through all documents in the collection (and for example, print out its external collection docid) is as follows:

for i in range(searcher.num_docs):
    print(searcher.doc(i).docid())

How do I index and search my own documents?

To build sparse indexes (i.e., Lucene inverted indexes) on your own document collections, follow the instructions below. To build dense indexes (e.g., the output of transformer encoders) on your own document collections, see instructions here. The following covers English documents; if you want to index and search multilingual documents, check out this answer.

Pyserini (via Anserini) provides ingestors for document collections in many different formats. The simplest, however, is the following JSON format:

{
  "id": "doc1",
  "contents": "this is the contents."
}

A document simply comprises two fields, a docid and contents. Pyserini accepts collections comprised of these documents organized in three different ways:

  • Folder with each JSON in its own file, like this.
  • Folder with files, each of which contains an array of JSON documents, like this.
  • Folder with files, each of which contains a JSON on an individual line, like this (often called JSONL format).

So, the quickest way to get started is to write a script that converts your documents into the above format. Then, you can invoke the indexer (here, we're indexing JSONL, but any of the other formats work as well):

python -m pyserini.index -collection JsonCollection \
                         -generator DefaultLuceneDocumentGenerator \
                         -threads 1 \
                         -input integrations/resources/sample_collection_jsonl \
                         -index indexes/sample_collection_jsonl \
                         -storePositions -storeDocvectors -storeRaw

Three options control the type of index that is built:

  • -storePositions: builds a standard positional index
  • -storeDocvectors: stores doc vectors (required for relevance feedback)
  • -storeRaw: stores raw documents

If you don't specify any of the three options above, Pyserini builds an index that only stores term frequencies. This is sufficient for simple "bag of words" querying (and yields the smallest index size).
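The conversion script mentioned above can be as simple as the following sketch (the helper name and file path are ours, purely illustrative):

```python
import json

def write_jsonl(docs, path):
    """Write (docid, text) pairs in the JSONL format Pyserini ingests."""
    with open(path, 'w') as f:
        for docid, text in docs:
            # One JSON object per line, with the two required fields.
            f.write(json.dumps({'id': docid, 'contents': text}) + '\n')

write_jsonl([('doc1', 'this is the contents.')], 'docs.jsonl')
```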

Once indexing is done, you can use SimpleSearcher to search the index:

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher('indexes/sample_collection_jsonl')
hits = searcher.search('document')

for i in range(len(hits)):
    print(f'{i+1:2} {hits[i].docid:4} {hits[i].score:.5f}')

You should get something like the following:

 1 doc2 0.25620
 2 doc3 0.23140

If you want to perform a batch retrieval run (e.g., directly from the command line), organize all your queries in a tsv file, like here. The format is simple: the first field is a query id, and the second field is the query itself. Note that the file extension must end in .tsv so that Pyserini knows what format the queries are in.
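Generating that file from Python is equally simple; a minimal sketch (the query ids and text below are made up):

```python
queries = [('1', 'what is a lobster roll'),
           ('2', 'lobster roll recipe')]

# One query per line: query id, a tab, then the query text.
with open('queries.tsv', 'w') as f:
    for qid, query in queries:
        f.write(f'{qid}\t{query}\n')
```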

Then, you can run:

$ python -m pyserini.search --topics integrations/resources/sample_queries.tsv \
                            --index indexes/sample_collection_jsonl \
                            --output run.sample.txt \
                            --bm25

$ cat run.sample.txt 
1 Q0 doc2 1 0.256200 Anserini
1 Q0 doc3 2 0.231400 Anserini
2 Q0 doc1 1 0.534600 Anserini
3 Q0 doc1 1 0.256200 Anserini
3 Q0 doc2 2 0.256199 Anserini
4 Q0 doc3 1 0.483000 Anserini

Note that the output run file is in the standard TREC format.
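Each line of a TREC run file holds six whitespace-delimited fields: query id, the literal Q0, docid, rank, score, and a run tag. Reading a run back into Python is straightforward; a minimal sketch (the helper name is ours):

```python
from collections import defaultdict

def read_trec_run(path):
    """Parse a TREC run file into {qid: [(docid, rank, score), ...]}."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _q0, docid, rank, score, _tag = line.split()
            run[qid].append((docid, int(rank), float(score)))
    return run
```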

You can also add extra fields to your documents when needed, e.g., text features. For example, the spaCy named entity recognition (NER) results for contents could be stored in an additional field named NER.

{
  "id": "doc1",
  "contents": "The Manhattan Project and its atomic bomb helped bring an end to World War II. Its legacy of peaceful uses of atomic energy continues to have an impact on history and science.",
  "NER": {
            "ORG": ["The Manhattan Project"],
            "MONEY": ["World War II"]
         }
}

Reproduction Guides

With Pyserini, it's easy to reproduce runs on a number of standard IR test collections!

Sparse Retrieval

Dense Retrieval

Baselines

Pyserini provides baselines for a number of datasets.

Additional Documentation

Known Issues

Anserini is designed to work with JDK 11. There was a JRE path change above JDK 9 that breaks pyjnius 1.2.0, as documented in this issue, also reported in Anserini here and here. This issue was fixed with pyjnius 1.2.1 (released December 2019). The previous error was documented in this notebook and this notebook documents the fix.

Release History

With v0.11.0.0 and before, Pyserini versions adopted the convention of X.Y.Z.W, where X.Y.Z tracks the version of Anserini, and W is used to distinguish different releases on the Python end. Starting with Anserini v0.12.0, Anserini and Pyserini versions have become decoupled.

Comments
  • Dense search replication, starting from hgf model

    Here's what I think our end target is: start with the hgf model from the model hub - assume that's fixed.

    1. Be able to encode corpus and queries - scripts for doing so should be in https://github.com/castorini/pyserini/tree/master/scripts
    2. Scripts for building hnsw index, also in scripts/
    3. (1) and (2) are what we store as "pre-built".

    This will allow replication and bring every part of the pipeline in sync - other than training the encoder model.

    @MXueguang @justram @jacklin64 thoughts?

    opened by lintool 18
  • Multiple language support?

    Hi,

    Does pyserini currently support languages other than English? Specifically, I am asking about using features such as creating an index by python -m pyserini.index -collection JsonCollection -generator DefaultLuceneDocumentGenerator ... and using searcher.search. If yes, how do I integrate it in a python script?

    Thank you!

    opened by velocityCavalry 16
  • SimpleSearcher.search memory leak

    When calling search method of SimpleSearcher I noticed RAM usage increase with every new iteration. Could you tell me please how to decrease memory leak?

    opened by dmitrijeuseew 16
  • Fold qrels into pyserini directly

    Follow up to #310 - there, we folded the eval scripts directly into pyserini. Now let's do the same with the qrels.

    In actuality, the qrels are already in the anserini jar, since this entire directory is included in the fatjar: https://github.com/castorini/anserini/tree/master/src/main/resources/topics-and-qrels

    Trick is how to get the qrels out...

    This is, in fact, how we can access the topics in anserini: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/topicreader/Topics.java#L22 https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/topicreader/TopicReader.java#L143

    And pyserini just wraps the Java methods above.


    With that background, I propose to apply the same treatment to qrels.

    1. Add a method in Anserini (on the Java end) to read qrels from resources/topics-and-qrels/ into a String. We can use the same "ids" as the topics. Build around here: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/util/Qrels.java
    2. On the Python end, we call the Java method, which reads the qrels as a string. Then we write back the string into ~/.cache/pyserini.
    3. Our eval scripts can then reference ~/.cache/pyserini.

    And at the end of the day, we'll be able to do this directly:

    $ python -m pyserini.search --topics robust04 --index robust04 --output run.robust04.txt --bm25
    $ python -m pyserini.eval.trec_eval --qrels robust04 -m map -m P.30 run.robust04.txt
    

    (With no need to download any intermediate data... everything is self contained!)

    @MXueguang thoughts? Do you like it? Any better way?

    opened by lintool 16
  • Add automate downloading of indexes

    Currently, this change supports 'ms-marco-passage', 'ms-marco-doc' and 'TREC Disks 4 & 5'.

    • If the index exists, skip the download and use the index under '(pyserini)/indexes'.
    • If not, download the index to cache(~/.cache/pyserini/indexes) and extract the index to (pyserini)/indexes. Finally, delete the gz file in cache. Should we keep the gz file in cache?
    opened by qguo96 16
  • Resolve tiny differences between Anserini and Pyserini on MS MARCO: query iteration order

    If we look at the Python replications: https://github.com/castorini/pyserini/blob/master/docs/pypi-replication.md Compared against Anserini replications: e.g., https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc-leaderboard.md

    We'll note tiny differences - e.g., for MS MARCO doc, baselines - pyserini:

    #####################
    MRR @100: 0.2770296928568709
    QueriesRanked: 5193
    #####################
    

    Compared to anserini:

    #####################
    MRR @100: 0.2770296928568702
    QueriesRanked: 5193
    #####################
    

    Previously, we tracked it down issue #257

    I'd like to fix it so we get identical results moving forward - my proposed fix is a bit janky, but it'll work: Let's just store, in Python code, an array of integers corresponding to ids of the queries in the original queries file. When we're iterating over the dataset in pyserini.search, we just follow the order of the integers.

    Slightly better, we introduce a new query iterator abstraction and hide this implementation detail in there. So the query iterator would take in the current dictionary, and an optional array holding the iteration order.

    Thoughts @MXueguang? I was thinking you could work on this?

    opened by lintool 15
  • DPR replication docs

    Hi @MXueguang - when everything is implemented DPR should probably get its own separate replication page, like for MS MARCO: https://github.com/castorini/pyserini/blob/master/docs/experiments-msmarco-passage.md

    Covering sparse, hybrid, and dense retrieval.

    Then we can add a replication log also - starting point for people interested in working more on it.

    opened by lintool 14
  • Incorrect encoding on Windows

    When using pyserini under Windows, it seems that the encoding of strings is breaking when passed to the JNI via the pyjnius package.

    It happens when a string is encoded as UTF-8 like this JString(my_str.encode('utf-8')) (e.g., https://github.com/castorini/pyserini/blob/master/pyserini/search/_searcher.py#L114). It only occurs under Windows as it must collide with the default Windows encoding CP-1252.

    I discussed this issue with the maintainers of pyjnius and it seems that to make it work independently from the platform, the .encode('utf-8') could simply be dropped.

    Was there a reason why this manual encoding was used in pyserini?

    I created a branch with the changes, I could do a PR if you wish.

    opened by stekiri 13
  • Dense retrieval draft

    An example of usage: since the dense index doesn't contain raw data, I loaded the corpus separately.

    import numpy as np
    from pyserini.search import SimpleDenseSearcher
    
    searcher = SimpleDenseSearcher.from_prebuilt_index('msmarco_passage_0', 'collection.tsv')
    
    query_emb = np.random.random(768).astype('float32')
    result = searcher.search(query_emb)
    
    result[0].raw
    >> 'Lander, WY Sales Tax Rate. The current total local sales tax rate in Lander, WY is 5.000%. The December 2015 total local sales tax rate was also 5.000%. Lander, WY is in Fremont County. Lander is in the following zip codes: 82520.'
    
    result[0].docid
    >> '350921'
    
    result[0].score
    >> 0.42547345
    
    searcher.doc('123')
    >> Document(docid='123', raw='With a number of condo developments springing up in the city, it can be difficult to narrow down your choices for the perfect Montreal condo for sale. Our skilled agents organize your steps towards meeting your goals with our condo projects located in popular and trendy neighbourhoods.')
    
    opened by MXueguang 13
  • IndexOutOfBoundsException calling get_term_counts

    This is code to print the top tf.idf-weighted terms from documents in a run:

    reader = IndexReader.from_prebuilt_index('robust04')
    for topic, docs in run.items():
        print('---', topic)
        for doc in docs:
            print('---', doc)
            vec = reader.get_document_vector(doc)
            weighted = []
            for term, tf in vec.items():
                print('---', term, tf)
                df, cf = reader.get_term_counts(term)
                tfidf = tf / df
                heapq.heappush(weighted, (tfidf, term))
            for weight, term in heapq.nlargest(10, weighted):
                print(topic, doc, term, weight)
    

    The run I am iterating is a BM25 retrieval run on robust04 from Pyserini. On topic 301, document FBIS4-40260, term 'it' (tf=2), I get the following error:

    Traceback (most recent call last):
      File "/Users/soboroff/pyserini-fire/./top-terms.py", line 33, in <module>
        df, cf = reader.get_term_counts(term)
      File "/Users/soboroff/pyserini-fire/venv/lib/python3.10/site-packages/pyserini/index/_base.py", line 259, in get_term_counts
        term_map = self.object.getTermCountsWithAnalyzer(self.reader, JString(term.encode('utf-8')), analyzer)
      File "jnius/jnius_export_class.pxi", line 884, in jnius.JavaMethod.__call__
      File "jnius/jnius_export_class.pxi", line 1056, in jnius.JavaMethod.call_staticmethod
      File "jnius/jnius_utils.pxi", line 91, in jnius.check_exception
    jnius.JavaException: JVM exception occurred: Index 0 out of bounds for length 0 java.lang.IndexOutOfBoundsException
    
    opened by isoboroff 12
  • Unable to do Dense search against own index

    My environment:

    • OS - Ubuntu 18.04
    • Java 11.0.11
    • Python 3.8.8
    • Python Package versions:
      • torch 1.8.1
      • faiss-cpu 1.7.0
      • pyserini 0.12.0

    Problem 1

    I followed instructions to create my own minimal index and was able to run the Sparse Retrieval example successfully. However, when I tried to run the Dense retrieval example using the TctColBertQueryEncoder, I encountered the following issues that seem to be caused by me having a newer version of the transformers library, where the requires_faiss and requires_pytorch methods have been replaced with a more general requires_backends method in transformers.file_utils. The following files were affected.

    pyserini/dsearch/_dsearcher.py
    pyserini/dsearch/_model.py
    

    Problem 2

    Replacing them in place in the Pyserini code in my site-packages allowed me to move forward, but now I get the error message:

    RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/impl/io.cpp:81: Error: 'f' failed: could not open /path/to/lucene_index/index for reading: No such file or directory
    

    The /path/to/lucene_index above is a folder where my lucene index was built using pyserini.index. I am guessing that an additional ANN index might be required to be built from the data to allow Dense searching to happen? I looked in the help for pyserini.index but there did not seem to be anything that indicated creation of ANN index.

    I can live with the first problem (since I have a local solution) but obviously some fix to that would be nice. For the second problem, some documentation or help with building a local index for dense searching will be very much appreciated.

    Thanks!

    opened by sujitpal 12
  • Broken links in prebuilt READMEs

    From here: https://github.com/castorini/pyserini/blob/master/docs/prebuilt-indexes.md

    Link to robust04 README is broken. Might want to go through and make sure they all work...

    opened by lintool 0
  • Fill in missing conditions in MS MARCO V1 repro matrix

    Here: https://castorini.github.io/pyserini/2cr/msmarco-v1-passage.html

    We're missing a bunch of conditions that we should add.

    @MXueguang this is probably pretty easy to do right?

    opened by lintool 0
  • Refactor Dependencies

    Initial PR Based on https://github.com/castorini/pyserini/issues/1375

    Modularize imports so that LuceneSearcher does not rely on Faiss, torch, and transformers

    opened by ToluClassics 1
  • Importing LuceneSearcher relies on FAISS and Torch

    Currently, importing LuceneSearcher fails if faiss and torch aren't installed. (They aren't installed by design because they're platform-specific, see: https://github.com/castorini/pyserini#installation)

    This is likely caused by the imports in the following __init__ file: https://github.com/castorini/pyserini/blob/master/pyserini/search/__init__.py#L23-L26

    A fix would need to modularize those imports.

    If no one gets to it before me, I will attempt to send a PR to fix this.

    opened by cakiki 1
Releases: pyserini-0.19.2
Owner: Castorini (deep learning for natural language processing and information retrieval at the University of Waterloo)