PECOS - Predictions for Enormous and Correlated Output Spaces

Overview

PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval. PECOS' design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.

Given an input, PECOS identifies a small set (10-100) of relevant outputs from amongst an extremely large (~100MM) candidate set and ranks these outputs in terms of relevance.

Features

Extreme Multi-label Ranking and Classification

  • XR-Linear (pecos.xmc.xlinear): recursive linear models that learn to traverse an input from the root of a hierarchical label tree to a few leaf-node clusters, and return the top-k relevant labels within those clusters as predictions. See more details in the PECOS paper (Yu et al., 2020).

    • fast real-time inference in C++
    • can handle output spaces with ~100MM labels
  • XR-Transformer (pecos.xmc.xtransformer): a Transformer-based XMC framework that fine-tunes pre-trained Transformers recursively on multi-resolution objectives. It can be used to generate the top-k relevant labels for a given instance, or simply as a fine-tuning engine for task-aware embeddings. See technical details in the XR-Transformer paper (Zhang et al., 2021).

    • easy to extend with many pre-trained Transformer models from Hugging Face transformers.
    • establishes the state of the art on public XMC benchmarks.
  • ANN Search with HNSW (pecos.ann.hnsw): a PECOS Approximate Nearest Neighbor (ANN) search module that implements the Hierarchical Navigable Small World Graphs (HNSW) algorithm (Malkov et al., TPAMI 2018); a usage sketch follows this list.

    • Supports both sparse and dense input features
    • SIMD optimization for both dense/sparse distance computation
    • Supports thread-safe graph construction in parallel on multi-core shared memory machines
    • Supports thread-safe Searchers to do inference in parallel, which reduces inference overhead
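
As a rough illustration of the HNSW module, the sketch below builds an index over a feature matrix and queries it with thread-safe searchers. This is a minimal sketch: the class name HNSW, the searchers_create helper, and the exact return format of predict are assumptions based on the feature list above, not a verified API.

>>> from pecos.ann.hnsw import HNSW
# X: indexed feature matrix (dense ndarray or scipy CSR); Xq: query matrix (assumed prepared)
>>> model = HNSW.train(X)                             # build the HNSW graph (multi-threaded)
>>> searchers = model.searchers_create(4)             # thread-safe searchers for parallel inference (name assumed)
>>> result = model.predict(Xq, searchers=searchers)   # top-k nearest neighbors per query; return format may differ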

Requirements and Installation

  • Python (>=3.6)
  • Pip (>=19.3)

See other dependencies in setup.py. You should install PECOS in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

Supported Platforms

  • Ubuntu 18.04 and 20.04
  • Amazon Linux 2

Installation from Wheel

PECOS can be installed using pip as follows:

python3 -m pip install libpecos
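
As a quick sanity check after installation: the package is distributed on PyPI as libpecos but imported as pecos (as in the Quick Tour below), so the following should run without error:

python3 -c "import pecos"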

Installation from Source

Prerequisite builder tools

  • For Ubuntu (18.04, 20.04):
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
  • For Amazon Linux 2 Image:
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'

One needs to install at least one BLAS library to compile PECOS, e.g. OpenBLAS:

  • For Ubuntu (18.04, 20.04):
sudo apt-get install -y libopenblas-dev
  • For Amazon Linux 2 Image and AMI:
sudo amazon-linux-extras install epel -y
sudo yum install openblas-devel -y

Install and develop locally

git clone https://github.com/amzn/pecos
cd pecos
python3 -m pip install --editable ./

Quick Tour

To get a glimpse of how PECOS works, here is a quick tour of using the PECOS API for the XMR problem.

Toy Example

The eXtreme Multi-label Ranking (XMR) problem is defined by two matrices: an instance-to-feature matrix X and an instance-to-label matrix Y.

Some toy data matrices are available in the tst-data folder.
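
To make the tour self-contained, the matrices can be loaded with pecos.utils.smat_util, the same utility used for evaluation below. The file names here are placeholders; point them at the toy .npz matrices on your machine.

>>> from pecos.utils import smat_util
# Placeholder paths for the toy matrices
>>> X = smat_util.load_matrix("X.trn.npz")   # training instance-to-feature matrix (CSR)
>>> Y = smat_util.load_matrix("Y.trn.npz")   # training instance-to-label matrix (CSR)
>>> Xt = smat_util.load_matrix("X.tst.npz")  # test features
>>> Yt = smat_util.load_matrix("Y.tst.npz")  # test ground-truth labels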

PECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):

>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory

# Build the hierarchical label tree and train an XR-Linear model
>>> label_feat = LabelEmbeddingFactory.create(Y, X)
>>> cluster_chain = Indexer.gen(label_feat)
>>> model = XLinearModel.train(X, Y, C=cluster_chain)
>>> model.save("./save-models")
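
Here LabelEmbeddingFactory.create(Y, X) builds label embeddings from the training data (aggregating each label's positive-instance features, i.e. PIFA), Indexer.gen clusters those embeddings into the hierarchical label tree, and XLinearModel.train fits the recursive linear models along that tree.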

After training the model, we run prediction and evaluation:

>>> from pecos.utils import smat_util
>>> Yt_pred = model.predict(Xt)
# print precision and recall at k=10
>>> print(smat_util.Metrics.generate(Yt, Yt_pred))

PECOS also offers an optimized C++ implementation for fast real-time inference:

>>> model = XLinearModel.load("./save-models", is_predict_only=True)
>>> for i in range(Xt.shape[0]):
>>>   yt_pred = model.predict(Xt[i], threads=1)
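
The tour above uses the linear path; for the Transformer-based path (pecos.xmc.xtransformer), a minimal sketch could look like the following. The helper names and signatures (XTransformer, MLProblemWithText) are assumptions based on the module layout mentioned on this page, not a verified API.

>>> from pecos.xmc.xtransformer.model import XTransformer
>>> from pecos.xmc.xtransformer.module import MLProblemWithText
# trn_corpus / tst_corpus: lists of raw texts; Y: CSR instance-to-label matrix (assumed prepared)
>>> prob = MLProblemWithText(trn_corpus, Y)   # assumed wrapper bundling texts and labels
>>> xtf = XTransformer.train(prob)            # recursive fine-tuning on multi-resolution objectives
>>> Yt_pred = xtf.predict(tst_corpus)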

Citation

If you find PECOS useful, please consider citing the PECOS paper (Yu et al., 2020).

Other papers from our group that build on PECOS include the XR-Transformer paper (Zhang et al., 2021) referenced above.

License

Copyright (2021) Amazon.com, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • text2text model evaluation not working

    text2text model evaluation not working

    Description

    Model evaluation is not working properly; it fails to output the precision and recall.

    How to Reproduce?

    I ran the following line of code:

    python3 -m pecos.apps.text2text.evaluate --pred-path ./test-prediction.txt --truth-path ./test.txt --text-item-path ./output-labels.txt
    

    where --pred-path is the path to the file produced during model prediction and --truth-path is the path to the test file, e.g. Out1, Out2, Out3 \t cheap door, where Out1, Out2 and Out3 are line numbers in the output file given by
    --text-item-path ./output-labels.txt

    What have you tried to solve it?

    Error message or code output

    Traceback (most recent call last):
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 130, in <module>
        do_evaluation(args)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 119, in do_evaluation
        Y_true = smat.csr_matrix((val_t, (row_id_t, col_id_t)), shape=(num_samples_t, len(item_dict)))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 55, in __init__
        dtype=dtype))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 196, in __init__
        self._check()
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 285, in _check
        raise ValueError('column index exceeds matrix dimensions')
    ValueError: column index exceeds matrix dimensions
    

    Environment

    • Operating system:
    • Python version:
    • PECOS version:


    bug 
    opened by Khalid-Usman 13
  • Format of yt label

    Format of yt label

    Hello,

    Hope you are doing well. I have two questions about the format.

    Question 1: What is the optimal format for the label matrix Yt? Is it preferable to have Yt as:

    (A) one-hot encoded, with only one 1 per row, or (B) multi-hot encoded, with multiple 1s per row (as is the case for Xt)?

    When prediction is done, it seems to output only one 1 per row.

    Question 2:

    Is there any constraint on Xt containing a mix of dense and sparse inputs instead of sparse inputs only?

    enhancement 
    opened by arita37 7
  • some formatting

    some formatting

    Hi, thanks for this.

    I would just like to confirm the format of the inputs.

    X: CSR format, x(i,k) = val_x. Can val_x be a float, or does it need to be a binary or [0,1] value?

    Y: CSR format, y(i,k) = val_y. Does it need to be binary (0 or 1)?

    Thx

    opened by arita37 7
  • Online Inference Latency for XR-TRANSFORMER

    Online Inference Latency for XR-TRANSFORMER

    hi!

    When I use XR-Transformer for prediction (per input), the online inference latency comes to about 400 ms. Why is this?

    The system I use is Ubuntu 18.04, and XR-Transformer is evaluated on an Nvidia Tesla V100 GPU.

    Thanks!

    opened by xiaokening 6
  • Issue with --label-embed-type pifa_lf_concat::Z=${Z_pifa_file}

    Issue with --label-embed-type pifa_lf_concat::Z=${Z_pifa_file}

    Description

    I am trying to use the --label-embed-type parameter in training and it produces this error: ValueError: Object arrays cannot be loaded when allow_pickle=False, coming from the np.load() function.

    I have tested loading the NPZ file for z_labels (both compressed and uncompressed); it produces this error whenever allow_pickle=False. I was able to load the data by setting allow_pickle=True for the np.load() function.

    Could you please add a description of this file format, or can we pass this parameter as an input?

    This is the data I have after loading the npz file with allow_pickle=True:

    [array(['Trump', 'Bus', 'Trolly '], dtype='<U23')
     array(['Show', 'Disp'], dtype='<U20')
     array(['Recap rew'], dtype='<U24')
     array(['Core, '], dtype='<U32')
     array(['Hoe'], dtype='<U10')
     array(['Plan'], dtype='<U21')]
    

    How to Reproduce?

    Execute model training with numpy version 1.21.2

    python -m pecos.apps.text2text.train \
      --label-embed-type pifa_lf_concat::Z=${Z_pifa_file} \
      -i ${train_file} \
      -m ${model_folder}
    

    Error message or code output

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 311, in <module>
        train(args)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 302, in train
        workspace_folder=args.workspace_folder,
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/model.py", line 325, in train
        Z = smat_util.load_matrix(val)
      File "/home/jupyter/pecos_git/pecos/pecos/utils/smat_util.py", line 117, in load_matrix
        mat = np.load(src)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
        pickle_kwargs=pickle_kwargs)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/format.py", line 743, in read_array
        raise ValueError("Object arrays cannot be loaded when "
    ValueError: Object arrays cannot be loaded when allow_pickle=False
    

    Environment

    • Operating system: Unix Ubuntu (on GCP)
    • Python version: 3.8
    • PECOS version: 0.1.0
    • numpy version: 1.21.2
    bug 
    opened by zusmani 6
  • Pecos killed on ranker training step

    Pecos killed on ranker training step

    Description

    Training was killed at this step:

    Data: Amazon-670k Model: X-Transformer

    [2022-12-01 21:38:23,019][pecos.xmc.xtransformer.model][INFO] - Start training ranker...
    [2022-12-01 21:38:24,001][pecos.xmc.base][INFO] - Training Layer 0 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:39:05,191][pecos.xmc.base][INFO] - Training Layer 1 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:40:24,829][pecos.xmc.base][INFO] - Training Layer 2 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:43:25,293][pecos.xmc.base][INFO] - Training Layer 3 of 4 Layers in HierarchicalMLModel, neg_mining=tfn+man..
    

    Environment

    Distributor ID:	Ubuntu
    Description:	Ubuntu 18.04.6 LTS
    Release:	18.04
    Codename:	bionic
    Python 3.8.15
    libpecos~=0.4.0
    1 RTX A4500, 32 vCPU, and 250 GB RAM
    

    What could it be? Is it possible to resume training from that stage?

    bug 
    opened by celsofranssa 4
  • How to Use XR-Transformer in Text2Text App

    How to Use XR-Transformer in Text2Text App

    Description

    I want to use XR-Transformer in the text2text app, following the parameters given here. But setting --params-path to this .json file raises the error:

    Traceback (most recent call last):
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 345, in <module>
        train(args)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 328, in train
        t2t_model = Text2Text.train(
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 317, in train
        pred_params = pred_params.override_with_kwargs(kwargs)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 126, in override_with_kwargs
        self.xlinear_params.override_with_kwargs(pred_kwargs)
    AttributeError: 'NoneType' object has no attribute 'override_with_kwargs'
    

    References

    enhancement 
    opened by lyy1994 4
  • Examples with text

    Examples with text

    Description

    The current example of X and Y only has numeric values. Could you please provide an example where X and Y are both text? I think the paper/method is targeted at solving such problems.

    enhancement 
    opened by xyan326 4
  • Add memory-mapped utility module

    Add memory-mapped utility module

    Issue #, if available: N/A

    Description of changes: Add memory-mapped utility module.

    Users can test with the code below: copy it into a file test_mmap_util.cpp placed at pecos/core/util/, and run:

    gcc -lm -ldl -lstdc++ -fopenmp -std=c++14 -lgcc -lgomp -O3  -I ./ test_mmap_util.cpp
    ./a.out
    

    Output:

    Generate a Bar with data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    Save Bar into mmap file: ./bar_test_mmap.txt
    Load a new Bar from saved mmap file...
    Loaded Bar data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    
    #include <iostream>
    #include "mmap_util.hpp"
    
    using namespace pecos::mmap_util;
    
    // Nested class mmap example
    // Bar contains a Foo instance
    class Foo {
        public:
            Foo() {}
            ~Foo() { foo_1.clear(); }
    
            void init_data() {
                foo_1.resize(10, 0);
                for (int i=0; i<foo_1.size(); ++i) { foo_1[i] = i; }
                foo_2 = 1.0;
            }
    
            void print() {
                std::cout << "---Foo---" << std::endl;
                std::cout << "foo_1: ";
                for (int i=0; i<foo_1.size(); ++i) { std::cout << foo_1[i] << " "; }
                std::cout << std::endl;
                std::cout << "foo_2: " << foo_2 << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo_1.save_to_mmap_store(mmap_s);
                mmap_s.fput_one<double>(foo_2);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo_1.load_from_mmap_store(mmap_s);
                foo_2 = mmap_s.fget_one<double>();
            }
    
        private:
            MmapableVector<int> foo_1;
            double foo_2;
    };
    
    class Bar {
        public:
            Bar() { }
            ~Bar() { bar.clear(); mmap_store.close(); }
    
            void init_data() {
                foo.init_data();
                bar.resize(5, 0);
                for (int i=0; i<bar.size(); ++i) { bar[i] = 5.0; }
            }
    
            void print() {
                std::cout << "---Bar---" << std::endl;
                foo.print();
                std::cout << "bar: ";
                for (int i=0; i<bar.size(); ++i) { std::cout << bar[i] << " "; }
                std::cout << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save(const std::string & file_name) const {
                // Create a mmapfile for dump at the most outer layer class
                // You cannot reuse (i.e, close and reopen) mmap_store, since it may hold the data storage
                MmapStore mmap_s = MmapStore();
                mmap_s.open(file_name, "w");
    
                save_to_mmap_store(mmap_s);
    
                // Metadata dump and fp closure is automatically done at MmapStore destructor when this function ends
                // You can make it happen earlier with explicitly calling close()
                mmap_s.close();
            }
            void load(const std::string & file_name, const bool pre_load=true) {
                mmap_store.open(file_name, pre_load?"r":"r_lazy");
                load_from_mmap_store(mmap_store);
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo.save_to_mmap_store(mmap_s);
                bar.save_to_mmap_store(mmap_s);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo.load_from_mmap_store(mmap_s);
                bar.load_from_mmap_store(mmap_s);
            }
    
        private:
            Foo foo;
            MmapableVector<double> bar;
            // Mmap Data storage at the most outer layer class
            MmapStore mmap_store;
    };
    
    
    int main() {
        std::string f_name = "./bar_test_mmap.txt";
    
        std::cout << "Generate a Bar with data:" << std::endl;
        Bar bar;
        bar.init_data();
        bar.print();
    
        std::cout << "Save Bar into mmap file: " << f_name << std::endl;
        bar.save(f_name);
    
        std::cout << "Load a new Bar from saved mmap file..." << std::endl;
        Bar new_bar;
        new_bar.load(f_name, /*pre_load=*/true);
    
        std::cout << "Loaded Bar data:" << std::endl;
        new_bar.print();
    
        return 0;
    }
    

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 3
  • Is there at least one example showing how to use Pecos from a plain text dataset?

    Is there at least one example showing how to use Pecos from a plain text dataset?

    It has been difficult to infer how to use PECOS properly. The usage examples are split over several README.md files and across the issues.

    Then, could you provide a toy example of an end-to-end approach (using XR-Transformer for instance)?

    Consider the following scenario: We have the training and testing samples in plain text

    #train samples:
        text: raw_text_1, labels: [L1, L7, ..., L3]
        text: raw_text_2, labels: [L8, L9]
        ...
        text: raw_text_N, labels: [L1, L7, ..., L4]
    
    #test samples:
        text: test_raw_text_1
        text: test_raw_text_2
        ...
        text: test_raw_text_M
    

    and someone has to:

    1. prepare the data to the accepted format;
    2. train the model;
    3. predict the top k labels.
    opened by celsofranssa 3
  • bug of installing from source

    bug of installing from source

    Description

    There are some problems when installing PECOS from source by following the README.md.

    How to Reproduce?

    python3 -m pip install --editable ./
    Obtaining file:///home/workspace/lishengchao/pecos
    Requirement already satisfied: scipy>=1.4.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.6.1)
    Requirement already satisfied: scikit-learn>=0.24.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (0.24.1)
    Requirement already satisfied: torch>=1.8.0 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.8.0)
    Collecting sentencepiece!=0.1.92,>=0.1.86
      Using cached https://repo.huaweicloud.com/repository/pypi/packages/68/91/ded0f64f90abfc5413c620fc345a0aef1e7ff5addda8704cc6b3bf589c64/sentencepiece-0.1.96-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
    Requirement already satisfied: transformers>=4.1.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (4.8.2)
    Collecting numpy>=1.19.5
      Using cached https://repo.huaweicloud.com/repository/pypi/packages/38/c0/c45c5eb0e25247d5fbb333fd0b56e570ba21cf0e3dca3abad174fb780e8c/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
    Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (2.1.0)
    Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (1.0.1)
    Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.8/site-packages (from torch>=1.8.0->libpecos==0.3.0) (3.7.4.3)
    Collecting huggingface-hub==0.0.12
      Downloading https://repo.huaweicloud.com/repository/pypi/packages/2f/ee/97e253668fda9b17e968b3f97b2f8e53aa0127e8807d24a547687423fe0b/huggingface_hub-0.0.12-py3-none-any.whl (37 kB)
    Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2021.4.4)
    Requirement already satisfied: sacremoses in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.0.45)
    Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2.24.0)
    Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (21.3)
    Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.10.3)
    Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (4.62.3)
    Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (3.0.12)
    Requirement already satisfied: pyyaml in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (5.4.1)
    Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (1.15.0)
    Requirement already satisfied: click in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (7.1.2)
    Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (3.0.4)
    Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2.10)
    Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (1.25.11)
    Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2020.12.5)
    Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->transformers>=4.1.1->libpecos==0.3.0) (3.0.6)
    Installing collected packages: sentencepiece, numpy, libpecos, huggingface-hub
      Attempting uninstall: numpy
        Found existing installation: numpy 1.19.2
        Uninstalling numpy-1.19.2:
          Successfully uninstalled numpy-1.19.2
      Running setup.py develop for libpecos
        ERROR: Command errored out with exit status 1:
         command: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; file='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps
             cwd: /home/workspace/lishengchao/pecos/
        Complete output (28 lines):
        Set version to 0.3.0
        running develop
        running egg_info
        creating libpecos.egg-info
        writing libpecos.egg-info/PKG-INFO
        writing dependency_links to libpecos.egg-info/dependency_links.txt
        writing requirements to libpecos.egg-info/requires.txt
        writing top-level names to libpecos.egg-info/top_level.txt
        writing manifest file 'libpecos.egg-info/SOURCES.txt'
        reading manifest file 'libpecos.egg-info/SOURCES.txt'
        reading manifest template 'MANIFEST.in'
        warning: no files found matching '*.c' under directory 'pecos/core'
        writing manifest file 'libpecos.egg-info/SOURCES.txt'
        running build_ext
        building 'pecos.core.libpecos_float32' extension
        INFO: C compiler: gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

    creating build
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/pecos
    creating build/temp.linux-x86_64-3.8/pecos/core
    INFO: compile options: '-Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c'
    extra options: '-fopenmp -O3 -std=c++14'
    INFO: gcc: pecos/core/libpecos.cpp
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /tmp/ccNJQf5g.s: Assembler messages:
    /tmp/ccNJQf5g.s: Fatal error: can't close build/temp.linux-x86_64-3.8/pecos/core/libpecos.o: Input/output error
    error: Command "gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c pecos/core/libpecos.cpp -o build/temp.linux-x86_64-3.8/pecos/core/libpecos.o -fopenmp -O3 -std=c++14" failed with exit status 1
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; file='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

    Environment

    • Ubuntu 18.04
    • Python 3.8
    • PECOS 0.3.0


    bug 
    opened by xiaokening 3
  • Memory-mapped XLinear Model

    Memory-mapped XLinear Model

    Issue #, if available: N/A

    Description of changes:

    • Memory-mapped PECOS XLinear model
      • Greatly reduces loading time.
      • Ideal for large models when users want to quickly try a few inferences without waiting for the full model to load into memory.
      • Also capable of inference with large models that cannot fit in memory.

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 0
Releases(v0.4.0)
  • v0.4.0(Aug 9, 2022)

    Highlights

    • Enable distributed XR-Transformer fine-tuning
    • Enable the capability of large-batch prediction for ANN HNSW
    • Release interactive hands-on tutorial materials

    Enhancements

    • Unit test for sorted_csc, sorted_csr by @chepingt in https://github.com/amzn/pecos/pull/139
    • Unit test for csr_row_softmax by @houyuhan98 in https://github.com/amzn/pecos/pull/141
    • Bump numpy from 1.21.0 to 1.22.0 by @dependabot in https://github.com/amzn/pecos/pull/145 https://github.com/amzn/pecos/pull/146
    • Release the materials for the PECOS hands-on tutorial in KDD 2022 by @hallogameboy in https://github.com/amzn/pecos/pull/153 https://github.com/amzn/pecos/pull/154 https://github.com/amzn/pecos/pull/161
    • Enable the capability of large-batch prediction for HNSW by @OctoberChang in https://github.com/amzn/pecos/pull/156
    • Distributed XR-Transformer fine-tuning by @jiong-zhang in https://github.com/amzn/pecos/pull/144 https://github.com/amzn/pecos/pull/162

    Bug Fixes

    • Fix argument-passing issue in smat_util.sorted_csc by @jiong-zhang in https://github.com/amzn/pecos/pull/134
    • Fix indptr overflow issue in block_diag_csr() by @OctoberChang in https://github.com/amzn/pecos/pull/136
    • Fix the yum group install command in README by @hallogameboy in https://github.com/amzn/pecos/pull/138
    • Change file names for windows compatibility by @YangyiLi001 in https://github.com/amzn/pecos/pull/143
    • Avoid triggering CodeQL on push for Dependabot branches by @weiliw-amz in https://github.com/amzn/pecos/pull/148
    • Fix Pypi release version error by @weiliw-amz in https://github.com/amzn/pecos/pull/163

    Deprecation

    • Deprecate imbalanced hierarchical K-means from clustering and semantic indexing by @hallogameboy in https://github.com/amzn/pecos/pull/151

    New Contributors

    • @chepingt made their first contribution in https://github.com/amzn/pecos/pull/139
    • @houyuhan98 made their first contribution in https://github.com/amzn/pecos/pull/141
    • @YangyiLi001 made their first contribution in https://github.com/amzn/pecos/pull/143
    • @xiusic made their first contribution in https://github.com/amzn/pecos/pull/147

    Full Changelog: https://github.com/amzn/pecos/compare/v0.3.0...v0.4.0

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Apr 1, 2022)

    Highlights

    • Enable distributed training for XLinear
    • Enable PECOS for aarch64(arm64) CPU Architecture
    • Enhance pecos.ann.hnsw with Function Multi-Versioning (FMV) technique to automatically select the best supported SIMD instructions (SSE, AVX2, AVX512) at runtime
    • Reduce CPU memory usage in pecos.xmc.xtransformer training

    Enhancements

    • Add distilbert model. by @mo-fu in https://github.com/amzn/pecos/pull/97
    • add CNAME by @jiong-zhang in https://github.com/amzn/pecos/pull/104
    • Bump numpy from 1.20.3 to 1.21.0 in /examples/qp2q by @dependabot in https://github.com/amzn/pecos/pull/110
    • enable Function Multi-Versioning (FMV) to support AVX512 by @rofuyu in https://github.com/amzn/pecos/pull/111
    • Modify supported Python version by @weiliw-amz in https://github.com/amzn/pecos/pull/113
    • Enabling PECOS for aarch64(arm64) CPU Architecture by @weiliw-amz in https://github.com/amzn/pecos/pull/114
    • Update OpenBLAS Version for x86 Wheel Build by @weiliw-amz in https://github.com/amzn/pecos/pull/117
    • SIMD Functions for aarch64(ARM64) by @weiliw-amz in https://github.com/amzn/pecos/pull/115
    • Add profile_util module by @weiliw-amz in https://github.com/amzn/pecos/pull/121
    • Fix FMV setup link flag and add test wheel CI by @weiliw-amz in https://github.com/amzn/pecos/pull/119
    • Fix xlinear.reconstruct_model; Add PII embedding by @weiliw-amz in https://github.com/amzn/pecos/pull/120
    • Add Distributed PECOS XLinear Modules by @weiliw-amz in https://github.com/amzn/pecos/pull/123
    • Add distributed PECOS README by @weiliw-amz in https://github.com/amzn/pecos/pull/127
    • update HNSW README and save/load in Python API by @OctoberChang in https://github.com/amzn/pecos/pull/129
    • Improve XR-Transformer memory efficiency by @jiong-zhang in https://github.com/amzn/pecos/pull/128

    Bug Fixes

    • properly set Text2Text prediction argument by @OctoberChang in https://github.com/amzn/pecos/pull/101
    • Fix HiearchicalMLModel pred-params initialization and add bugs by @weiliw-amz in https://github.com/amzn/pecos/pull/103
    • minor bug fix in XR-Transformer exp script by @jiong-zhang in https://github.com/amzn/pecos/pull/106
    • fixed multithreading bugs in py hierarchical kmeans by @OctoberChang in https://github.com/amzn/pecos/pull/108
    • set pytest of hierarchical kmeans with single thread by @OctoberChang in https://github.com/amzn/pecos/pull/109
    • Fix relative path in distributed README by @weiliw-amz in https://github.com/amzn/pecos/pull/130

    Experiment Codes for Publications

    • add overlap-clustering (Liu et al.) in NeurIPS21 by @xuanqing94 in https://github.com/amzn/pecos/pull/98
    • add MACLR codes by @xyh97 in https://github.com/amzn/pecos/pull/100
    • update experiment code for pecos jmlr paper by @OctoberChang in https://github.com/amzn/pecos/pull/107
    • update Philip's experiment code into example folder by @OctoberChang in https://github.com/amzn/pecos/pull/118

    New Contributors

    • @mo-fu made their first contribution in https://github.com/amzn/pecos/pull/97
    • @xuanqing94 made their first contribution in https://github.com/amzn/pecos/pull/98
    • @xyh97 made their first contribution in https://github.com/amzn/pecos/pull/100
    • @dependabot made their first contribution in https://github.com/amzn/pecos/pull/110

    Full Changelog: https://github.com/amzn/pecos/compare/v0.2.3...v0.3.0

    Source code(tar.gz)
    Source code(zip)
  • v0.2.3(Nov 15, 2021)

  • v0.2.2(Nov 4, 2021)

  • v0.2.1(Oct 27, 2021)

    Highlights

    • Removed support for Ubuntu 16.04
    • Implemented XR-Transformer
    • Enabled HNSW functionality
    • Enabled cost-sensitive learning in PECOS

    Enhancements

    ANN HNSW

    • Initial implementation of HNSW in C++ with single-thread [#44] (@OctoberChang)
    • Refactor HNSW in C++ to support sparse/dense features and multi-threading [#49] (@rofuyu)
    • Initial implementation of HNSW Python interface [#53] (@OctoberChang)
    • Refactor HNSW python API and readme markdown [#63] (@OctoberChang)
    • Refactor HNSW C++ to reuse priority queue for different inference calls within the same Searcher [#65] (@rofuyu)
    • Enable HNSW save/load functionality [#71] (@OctoberChang)
    • Add serialization version in HNSW save/load [#77] (@rofuyu)
    • Enable HNSW python command line interface [#79] (@OctoberChang)

    Cost-sensitive Learning

    • Enable Cost-Sensitive Learning via XLinear API/CLI [#64] (@jiong-zhang)
    • Enable cost sensitive for text2text CLI [#75] (@jiong-zhang)

    XR-Transformer [#27, #64] (@jiong-zhang)

    • Refactor pecos.xmc.xtransformer and enable end2end XR-Transformer training
    • CLI tool for generating embeddings pecos.xmc.xtransformer.encode
    • Faster transformer text tokenizers using huggingface's C implementation
    • Allow training XR-Transformer without numerical features.

    Better control over parameters for XLinear, XTransformer and Text2text [#64, #78, #80] (@jiong-zhang)

    • Enable advanced control of parameters via JSON input file
    • Add utility tool to generate parameter skeleton for further modification

    Other new functionalities

    • Added support for predicting on select outputs [#37, #43, #47] (@bhl00)
    • Added new primal solver L2R_L2LOSS_SVC_PRIMAL for XLinear [#67] (@yuhchenlin)
    • Add Makefile for easy format, install, clean and unittest. [#12] (@weiliw-amz)

    Bug Fixes

    • (#17) Fixed issues with obtaining GitHub information when installing from .zip. [#21, #29] (@weiliw-amz)
    • (#42) Fixed transformer training issue on single GPU [#14] (@jiong-zhang)
    • Removed PECOS source-installation dependency on NumPy BLAS library. [#81] (@weiliw-amz)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Apr 26, 2021)
