PECOS - Predictions for Enormous and Correlated Output Spaces

Overview

PECOS - Predictions for Enormous and Correlated Output Spaces

PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval. PECOS' design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.

Given an input, PECOS identifies a small set (10-100) of relevant outputs from amongst an extremely large (~100MM) candidate set and ranks these outputs in terms of relevance.

Features

Extreme Multi-label Ranking and Classification

  • X-Linear (pecos.xmc.xlinear): recursive linear models learning to traverse an input from the root of a hierarchical label tree to a few leaf node clusters, and return top-k relevant labels within the clusters as predictions. See more details in the PECOS paper (Yu et al., 2020).

    • fast real-time inference in C++
    • can handle 100MM output space
  • XR-Transformer (pecos.xmc.xtransformer): Transformer-based XMC framework that fine-tunes pre-trained transformers recursively on multi-resolution objectives. It can be used to generate top-k relevant labels for a given instance or simply as a fine-tuning engine for task-aware embeddings. See technical details in the XR-Transformer paper (Zhang et al., 2021).

    • easy to extend with many pre-trained Transformer models from huggingface transformers.
    • establishes state-of-the-art results on public XMC benchmarks.
  • ANN Search with HNSW (pecos.ann.hnsw): a PECOS Approximate Nearest Neighbor (ANN) search module that implements the Hierarchical Navigable Small World Graphs (HNSW) algorithm (Malkov et al., TPAMI 2018).

    • Supports both sparse and dense input features
    • SIMD optimization for both dense/sparse distance computation
    • Supports thread-safe graph construction in parallel on multi-core shared memory machines
    • Supports thread-safe Searchers to do inference in parallel, which reduces inference overhead
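
As a rough illustration of the tree traversal described for X-Linear above, the sketch below runs a beam search over a two-level label tree in plain Python. The tree layout, the weight vectors, and the names sigmoid, dot and beam_search are hypothetical and chosen for illustration only; this is not the pecos.xmc.xlinear implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    # w and x are sparse vectors stored as {feature_index: value}
    return sum(w.get(j, 0.0) * v for j, v in x.items())


def beam_search(x, cluster_weights, label_weights, children, beam_size=2, topk=3):
    # Level 1: score every cluster with its linear model and keep the best few.
    cluster_scores = {c: sigmoid(dot(w, x)) for c, w in cluster_weights.items()}
    beam = sorted(cluster_scores, key=cluster_scores.get, reverse=True)[:beam_size]
    # Level 2: score only the labels that live inside the kept clusters.
    label_scores = {}
    for c in beam:
        for label in children[c]:
            label_scores[label] = cluster_scores[c] * sigmoid(dot(label_weights[label], x))
    ranked = sorted(label_scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:topk]


# Toy tree: 3 clusters, 2 labels each (all values are made up).
cluster_weights = {0: {0: 2.0}, 1: {1: 2.0}, 2: {2: 2.0}}
children = {0: ["a", "b"], 1: ["c", "d"], 2: ["e", "f"]}
label_weights = {
    "a": {0: 1.0}, "b": {0: -1.0},
    "c": {1: 1.0}, "d": {1: -1.0},
    "e": {2: 1.0}, "f": {2: -1.0},
}
x = {0: 1.0, 1: 0.5}
top = beam_search(x, cluster_weights, label_weights, children)
print(top)
```

With a beam size of 2, the labels under the third cluster are never scored, which is where the savings come from when the label set is very large.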

Requirements and Installation

  • Python (>=3.6)
  • Pip (>=19.3)

See other dependencies in setup.py. You should install PECOS in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

Supported Platforms

  • Ubuntu 18.04 and 20.04
  • Amazon Linux 2

Installation from Wheel

PECOS can be installed using pip as follows:

python3 -m pip install libpecos

Installation from Source

Prerequisite builder tools

  • For Ubuntu (18.04, 20.04):
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
  • For Amazon Linux 2 Image:
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'

One needs to install at least one BLAS library to compile PECOS, e.g. OpenBLAS:

  • For Ubuntu (18.04, 20.04):
sudo apt-get install -y libopenblas-dev
  • For Amazon Linux 2 Image and AMI:
sudo amazon-linux-extras install epel -y
sudo yum install openblas-devel -y

Install and develop locally

git clone https://github.com/amzn/pecos
cd pecos
python3 -m pip install --editable ./

Quick Tour

For a glimpse of how PECOS works, here is a quick tour of the PECOS API for the XMR problem.

Toy Example

The eXtreme Multi-label Ranking (XMR) problem is defined by two matrices: an instance-to-feature matrix X and an instance-to-label matrix Y, both in SciPy CSR format.

Some toy data matrices are available in the tst-data folder.
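
For readers unfamiliar with sparse layouts, here is a minimal sketch of how a small feature matrix X and label matrix Y could be written in CSR form (data, indices, indptr) using plain Python lists. The helper name to_csr and the toy values are assumptions made for illustration; in practice one would typically build these matrices with scipy.sparse.csr_matrix.

```python
def to_csr(rows, num_cols):
    """Convert a list of {column: value} rows into CSR arrays."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col in sorted(row):
            if not 0 <= col < num_cols:
                raise ValueError("column index exceeds matrix dimensions")
            indices.append(col)
            data.append(float(row[col]))
        # indptr[i + 1] marks where row i ends in data/indices
        indptr.append(len(indices))
    return data, indices, indptr


# X: 3 instances x 4 features (toy real-valued features)
X_rows = [{0: 0.5, 3: 1.2}, {1: 2.0}, {0: 0.1, 2: 0.7}]
# Y: 3 instances x 5 labels; a row may hold several positive labels
Y_rows = [{1: 1, 4: 1}, {0: 1}, {2: 1, 3: 1, 4: 1}]

X_csr = to_csr(X_rows, num_cols=4)
Y_csr = to_csr(Y_rows, num_cols=5)
print(X_csr)
print(Y_csr)
```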

PECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):

>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory

# Build hierarchical label tree and train a XR-Linear model
>>> label_feat = LabelEmbeddingFactory.create(Y, X)
>>> cluster_chain = Indexer.gen(label_feat)
>>> model = XLinearModel.train(X, Y, C=cluster_chain)
>>> model.save("./save-models")
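
LabelEmbeddingFactory.create(Y, X) builds one feature vector per label. One common choice, which the PECOS paper calls PIFA (positive instance feature aggregation), sums the feature vectors of each label's positive instances and normalizes the result. The plain-Python sketch below illustrates that idea on toy data; the function name pifa and the dict-based sparse format are assumptions, and this is not the library's code.

```python
import math
from collections import defaultdict


def pifa(X_rows, Y_rows):
    """Return {label: L2-normalized sum of the features of its positive instances}."""
    sums = defaultdict(lambda: defaultdict(float))
    for x, labels in zip(X_rows, Y_rows):
        for label in labels:
            for j, v in x.items():
                sums[label][j] += v
    emb = {}
    for label, vec in sums.items():
        norm = math.sqrt(sum(v * v for v in vec.values()))
        # Leave all-zero vectors unnormalized to avoid dividing by zero.
        emb[label] = {j: v / norm for j, v in vec.items()} if norm > 0 else dict(vec)
    return emb


# Toy data: 3 instances, 2 features, 2 labels (values are made up).
X_rows = [{0: 1.0}, {0: 1.0, 1: 2.0}, {1: 3.0}]
Y_rows = [[0], [0, 1], [1]]
label_feat = pifa(X_rows, Y_rows)
print(label_feat)
```

Labels that share many positive instances end up with similar vectors, which is what makes clustering them into a label tree meaningful.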

After learning the model, we run prediction and evaluation:

>>> from pecos.utils import smat_util
>>> Yt_pred = model.predict(Xt)
# print precision and recall at k=10
>>> print(smat_util.Metrics.generate(Yt, Yt_pred))
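
To make the reported numbers concrete, the sketch below computes precision@k and recall@k by hand for two toy instances. It is an illustration under simple assumptions (the function name precision_recall_at_k is hypothetical, and smat_util.Metrics may handle edge cases such as instances without true labels differently).

```python
def precision_recall_at_k(true_labels, ranked_preds, k):
    """Average precision@k and recall@k over all instances."""
    prec_sum, rec_sum = 0.0, 0.0
    for truth, ranked in zip(true_labels, ranked_preds):
        hits = len(set(ranked[:k]) & truth)
        prec_sum += hits / k
        # Instances without true labels contribute zero recall in this sketch.
        rec_sum += hits / len(truth) if truth else 0.0
    n = len(true_labels)
    return prec_sum / n, rec_sum / n


# Ground-truth label sets and predictions ranked from most to least relevant.
true_labels = [{1, 4}, {0}]
ranked_preds = [[4, 2, 1], [3, 0, 5]]

p_at_1, r_at_1 = precision_recall_at_k(true_labels, ranked_preds, k=1)
p_at_3, r_at_3 = precision_recall_at_k(true_labels, ranked_preds, k=3)
print(p_at_1, r_at_1, p_at_3, r_at_3)
```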

PECOS also offers an optimized C++ implementation for fast real-time inference:

>>> model = XLinearModel.load("./save-models", is_predict_only=True)
>>> for i in range(Xt.shape[0]):
...   y_tst_pred = model.predict(Xt[i], threads=1)

Citation

If you find PECOS useful, please consider citing the PECOS paper (Yu et al., 2020).

License

Copyright (2021) Amazon.com, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • text2text model evaluation not working

    Description

    Model evaluation fails instead of printing the precision and recall.

    How to Reproduce?

    I ran the following command:

    python3 -m pecos.apps.text2text.evaluate --pred-path ./test-prediction.txt --truth-path ./test.txt --text-item-path ./output-labels.txt
    

    where --pred-path is the path of the file produced during model prediction and --truth-path is the path of the test file, whose lines look like: Out1, Out2, Out3 \t cheap door. Here Out1, Out2 and Out3 are the line numbers in the output file given by --text-item-path ./output-labels.txt.

    What have you tried to solve it?

    Error message or code output

    Traceback (most recent call last):
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 130, in <module>
        do_evaluation(args)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 119, in do_evaluation
        Y_true = smat.csr_matrix((val_t, (row_id_t, col_id_t)), shape=(num_samples_t, len(item_dict)))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 55, in __init__
        dtype=dtype))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 196, in __init__
        self._check()
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 285, in _check
        raise ValueError('column index exceeds matrix dimensions')
    ValueError: column index exceeds matrix dimensions
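
The ValueError is raised while the ground-truth matrix is built, which suggests that some label ID in the truth file is not smaller than the number of lines in the label file. A minimal pre-check could look like the sketch below. It assumes each truth line has the form 'ID1,ID2,...<TAB>text' with zero-based IDs that index lines of the label file; that format is an assumption not confirmed in this thread, and find_bad_label_ids is a hypothetical helper, not part of PECOS.

```python
def find_bad_label_ids(truth_lines, num_labels):
    """Return (line_number, label_id) pairs whose ID falls outside [0, num_labels)."""
    bad = []
    for lineno, line in enumerate(truth_lines, start=1):
        ids_field, _, _text = line.rstrip("\n").partition("\t")
        for token in ids_field.split(","):
            token = token.strip()
            if not token:
                continue
            label_id = int(token)
            if not 0 <= label_id < num_labels:
                bad.append((lineno, label_id))
    return bad


# Toy stand-ins for output-labels.txt (3 lines, so valid IDs are 0..2) and test.txt.
labels = ["wooden door", "steel door", "door handle"]
truth = ["0,2\tcheap door\n", "1, 3\tfront door lock\n"]

bad_ids = find_bad_label_ids(truth, num_labels=len(labels))
print(bad_ids)
```

If the IDs in the truth file were written as one-based line numbers, the largest ID would land exactly one past the last valid column, which would produce this error.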
    

    Environment

    • Operating system:
    • Python version:
    • PECOS version:

    (Add as much information about your environment as possible, e.g. dependencies versions.)

    bug 
    opened by Khalid-Usman 13
  • Format of yt label

    Hello,

    I hope you are well. I have two questions about the format.

    Question 1: What is the optimal format for the label matrix Yt? Is it preferable to have Yt as:

    (A) one-hot encoded, with only one 1 per row, or (B) multi-hot encoded, with multiple 1s per row (as is the case for Xt)?

    When prediction is done, it seems to output only one 1 per row.

    Question 2:

    Is there any constraint on Xt being a mix of dense and sparse input instead of sparse input only?

    enhancement 
    opened by arita37 7
  • some formatting

    Hi, thanks for this.

    I would just like to confirm the format of the input.

    X: CSR format, x(i,k) = valx. Can valx be a float? Does it need to be binary or a value in [0,1]?

    Y: CSR format, y(i,k) = valy. Does it need to be binary (0 or 1)?

    Thanks

    opened by arita37 7
  • Online Inference Latency for XR-TRANSFORMER

    Hi!

    When I use XR-Transformer to predict one input at a time, the online inference latency reaches 400 ms. Why is this?

    The system I use is Ubuntu 18.04, and XR-Transformer is evaluated on an NVIDIA Tesla V100 GPU.

    Thanks!

    opened by xiaokening 6
  • Issue with --label-embed-type pifa_lf_concat::Z=${Z_pifa_file}

    Description

    I am trying to use the --label-embed-type parameter in training and it produces this error, coming from the np.load() function: ValueError: Object arrays cannot be loaded when allow_pickle=False

    I tested loading the NPZ file for z_labels (both compressed and uncompressed). It produces this error when allow_pickle=False; I can load the data by setting allow_pickle=True in np.load().

    Can you please add a description of this file format, or can we set this parameter as an input?

    This is the data I have after loading npz file with allow_pickle = True

    [array(['Trump', 'Bus', 'Trolly '], dtype='<U23')
     array(['Show', 'Disp'], dtype='<U20')
     array(['Recap rew'], dtype='<U24')
     array(['Core, '], dtype='<U32')
     array(['Hoe'], dtype='<U10')
     array(['Plan'], dtype='<U21')]
    

    How to Reproduce?

    Execute model training with numpy version 1.21.2

    python -m pecos.apps.text2text.train \
      --label-embed-type pifa_lf_concat::Z=${Z_pifa_file} \
      -i ${train_file} \
      -m ${model_folder}
    

    Error message or code output

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 311, in <module>
        train(args)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 302, in train
        workspace_folder=args.workspace_folder,
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/model.py", line 325, in train
        Z = smat_util.load_matrix(val)
      File "/home/jupyter/pecos_git/pecos/pecos/utils/smat_util.py", line 117, in load_matrix
        mat = np.load(src)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
        pickle_kwargs=pickle_kwargs)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/format.py", line 743, in read_array
        raise ValueError("Object arrays cannot be loaded when "
    ValueError: Object arrays cannot be loaded when allow_pickle=False
    

    Environment

    • Operating system: Unix Ubuntu (on GCP)
    • Python version: 3.8
    • PECOS version: 0.1.0
    • numpy version: 1.21.2
    bug 
    opened by zusmani 6
  • Pecos killed on ranker training step

    Description

    The training has been killed on this training step:

    Data: Amazon-670k Model: X-Transformer

    [2022-12-01 21:38:23,019][pecos.xmc.xtransformer.model][INFO] - Start training ranker...
    [2022-12-01 21:38:24,001][pecos.xmc.base][INFO] - Training Layer 0 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:39:05,191][pecos.xmc.base][INFO] - Training Layer 1 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:40:24,829][pecos.xmc.base][INFO] - Training Layer 2 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:43:25,293][pecos.xmc.base][INFO] - Training Layer 3 of 4 Layers in HierarchicalMLModel, neg_mining=tfn+man..
    

    Environment

    Distributor ID:	Ubuntu
    Description:	Ubuntu 18.04.6 LTS
    Release:	18.04
    Codename:	bionic
    Python 3.8.15
    libpecos~=0.4.0
    1 RTX A4500, 32 vCPU, and 250 GB RAM
    

    What could it be? Is it possible to resume training from that stage?

    bug 
    opened by celsofranssa 4
  • How to Use XR-Transformer in Text2Text App

    Description

    I want to use XR-Transformer in the text2text app, following the parameters given here. But setting --params-path to this .json file raises the error:

    Traceback (most recent call last):
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 345, in <module>
        train(args)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 328, in train
        t2t_model = Text2Text.train(
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 317, in train
        pred_params = pred_params.override_with_kwargs(kwargs)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 126, in override_with_kwargs
        self.xlinear_params.override_with_kwargs(pred_kwargs)
    AttributeError: 'NoneType' object has no attribute 'override_with_kwargs'
    

    References

    enhancement 
    opened by lyy1994 4
  • Examples with text

    Description

    The current example of X and Y only has numeric values. Could you please provide an example where X and Y are both text? I think the paper/method is targeted at such problems.

    enhancement 
    opened by xyan326 4
  • Add memory-mapped utility module

    Issue #, if available: N/A

    Description of changes: Add memory-mapped utility module.

    Users can test it with the code below: copy it into a file test_mmap_util.cpp placed at pecos/core/util/, and run:

    gcc -lm -ldl -lstdc++ -fopenmp -std=c++14 -lgcc -lgomp -O3  -I ./ test_mmap_util.cpp
    ./a.out
    

    Output:

    Generate a Bar with data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    Save Bar into mmap file: ./bar_test_mmap.txt
    Load a new Bar from saved mmap file...
    Loaded Bar data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    
    test_mmap_util.cpp:

    #include <iostream>
    #include "mmap_util.hpp"
    
    using namespace pecos::mmap_util;
    
    // Nested class mmap example
    // Bar contains a Foo instance
    class Foo {
        public:
            Foo() {}
            ~Foo() { foo_1.clear(); }
    
            void init_data() {
                foo_1.resize(10, 0);
                for (int i=0; i<foo_1.size(); ++i) { foo_1[i] = i; }
                foo_2 = 1.0;
            }
    
            void print() {
                std::cout << "---Foo---" << std::endl;
                std::cout << "foo_1: ";
                for (int i=0; i<foo_1.size(); ++i) { std::cout << foo_1[i] << " "; }
                std::cout << std::endl;
                std::cout << "foo_2: " << foo_2 << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo_1.save_to_mmap_store(mmap_s);
                mmap_s.fput_one<double>(foo_2);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo_1.load_from_mmap_store(mmap_s);
                foo_2 = mmap_s.fget_one<double>();
            }
    
        private:
            MmapableVector<int> foo_1;
            double foo_2;
    };
    
    class Bar {
        public:
            Bar() { }
            ~Bar() { bar.clear(); mmap_store.close(); }
    
            void init_data() {
                foo.init_data();
                bar.resize(5, 0);
                for (int i=0; i<bar.size(); ++i) { bar[i] = 5.0; }
            }
    
            void print() {
                std::cout << "---Bar---" << std::endl;
                foo.print();
                std::cout << "bar: ";
                for (int i=0; i<bar.size(); ++i) { std::cout << bar[i] << " "; }
                std::cout << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save(const std::string & file_name) const {
                // Create a mmapfile for dump at the most outer layer class
                // You cannot reuse (i.e, close and reopen) mmap_store, since it may hold the data storage
                MmapStore mmap_s = MmapStore();
                mmap_s.open(file_name, "w");
    
                save_to_mmap_store(mmap_s);
    
                // Metadata dump and fp closure is automatically done at MmapStore destructor when this function ends
                // You can make it happen earlier with explicitly calling close()
                mmap_s.close();
            }
            void load(const std::string & file_name, const bool pre_load=true) {
                mmap_store.open(file_name, pre_load?"r":"r_lazy");
                load_from_mmap_store(mmap_store);
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo.save_to_mmap_store(mmap_s);
                bar.save_to_mmap_store(mmap_s);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo.load_from_mmap_store(mmap_s);
                bar.load_from_mmap_store(mmap_s);
            }
    
        private:
            Foo foo;
            MmapableVector<double> bar;
            // Mmap Data storage at the most outer layer class
            MmapStore mmap_store;
    };
    
    
    int main() {
        std::string f_name = "./bar_test_mmap.txt";
    
        std::cout << "Generate a Bar with data:" << std::endl;
        Bar bar;
        bar.init_data();
        bar.print();
    
        std::cout << "Save Bar into mmap file: " << f_name << std::endl;
        bar.save(f_name);
    
        std::cout << "Load a new Bar from saved mmap file..." << std::endl;
        Bar new_bar;
        // load() takes a bool pre_load flag (true opens with "r", false with "r_lazy")
        new_bar.load(f_name, true);
    
        std::cout << "Loaded Bar data:" << std::endl;
        new_bar.print();
    
        return 0;
    }
    

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 3
  • Is there at least one example showing how to use Pecos from a plain text dataset?

    It has been difficult to work out how to use PECOS properly. The usage instructions are split across several README.md files and the issues.

    Could you provide a toy example of an end-to-end approach (using XR-Transformer, for instance)?

    Consider the following scenario: We have the training and testing samples in plain text

    #train samples:
        text: raw_text_1, labels: [L1, L7, ..., L3]
        text: raw_text_2, labels: [L8, L9]
        ...
        text: raw_text_N, labels: [L1, L7, ..., L4]
    
    #test samples:
        text: test_raw_text_1
        text: test_raw_text_2
        ...
        text: test_raw_text_M
    

    and someone has to:

    1. prepare the data to the accepted format;
    2. train the model;
    3. predict the top k labels.
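
Step 1 can be sketched in plain Python. The sketch assumes the text2text app expects one training line per sample of the form 'ID1,ID2,...<TAB>text', plus a label file whose line positions (assumed zero-based here) define the IDs, as the evaluation issue above describes. The function name build_text2text_files and the sample data are hypothetical.

```python
def build_text2text_files(samples):
    """Map label names to line indices and format 'id1,id2<TAB>text' training lines."""
    label_to_id = {}
    train_lines = []
    for text, labels in samples:
        ids = []
        for label in labels:
            if label not in label_to_id:
                # A new label gets the next free line index in the label file.
                label_to_id[label] = len(label_to_id)
            ids.append(label_to_id[label])
        # Collapse tabs and newlines inside the text so they cannot break the format.
        clean_text = " ".join(text.split())
        train_lines.append(",".join(str(i) for i in sorted(ids)) + "\t" + clean_text)
    label_lines = sorted(label_to_id, key=label_to_id.get)
    return train_lines, label_lines


samples = [
    ("raw text 1", ["L1", "L7", "L3"]),
    ("raw\ttext 2", ["L8", "L7"]),
]
train_lines, label_lines = build_text2text_files(samples)
print(train_lines)
print(label_lines)
```

Writing train_lines and label_lines to disk, one entry per line, would give the two files that the training and evaluation commands quoted in the issues above take as input.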
    opened by celsofranssa 3
  • bug of installing from source

    Description

    There are some problems when installing PECOS from source following README.md.

    How to Reproduce?

    python3 -m pip install --editable ./
    Obtaining file:///home/workspace/lishengchao/pecos
    Requirement already satisfied: scipy>=1.4.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.6.1)
    Requirement already satisfied: scikit-learn>=0.24.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (0.24.1)
    Requirement already satisfied: torch>=1.8.0 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.8.0)
    Collecting sentencepiece!=0.1.92,>=0.1.86
      Using cached https://repo.huaweicloud.com/repository/pypi/packages/68/91/ded0f64f90abfc5413c620fc345a0aef1e7ff5addda8704cc6b3bf589c64/sentencepiece-0.1.96-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
    Requirement already satisfied: transformers>=4.1.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (4.8.2)
    Collecting numpy>=1.19.5
      Using cached https://repo.huaweicloud.com/repository/pypi/packages/38/c0/c45c5eb0e25247d5fbb333fd0b56e570ba21cf0e3dca3abad174fb780e8c/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
    Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (2.1.0)
    Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (1.0.1)
    Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.8/site-packages (from torch>=1.8.0->libpecos==0.3.0) (3.7.4.3)
    Collecting huggingface-hub==0.0.12
      Downloading https://repo.huaweicloud.com/repository/pypi/packages/2f/ee/97e253668fda9b17e968b3f97b2f8e53aa0127e8807d24a547687423fe0b/huggingface_hub-0.0.12-py3-none-any.whl (37 kB)
    Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2021.4.4)
    Requirement already satisfied: sacremoses in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.0.45)
    Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2.24.0)
    Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (21.3)
    Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.10.3)
    Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (4.62.3)
    Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (3.0.12)
    Requirement already satisfied: pyyaml in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (5.4.1)
    Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (1.15.0)
    Requirement already satisfied: click in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (7.1.2)
    Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (3.0.4)
    Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2.10)
    Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (1.25.11)
    Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2020.12.5)
    Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->transformers>=4.1.1->libpecos==0.3.0) (3.0.6)
    Installing collected packages: sentencepiece, numpy, libpecos, huggingface-hub
      Attempting uninstall: numpy
        Found existing installation: numpy 1.19.2
        Uninstalling numpy-1.19.2:
          Successfully uninstalled numpy-1.19.2
      Running setup.py develop for libpecos
    ERROR: Command errored out with exit status 1:
    command: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; __file__='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
    cwd: /home/workspace/lishengchao/pecos/
    Complete output (28 lines):
    Set version to 0.3.0
    running develop
    running egg_info
    creating libpecos.egg-info
    writing libpecos.egg-info/PKG-INFO
    writing dependency_links to libpecos.egg-info/dependency_links.txt
    writing requirements to libpecos.egg-info/requires.txt
    writing top-level names to libpecos.egg-info/top_level.txt
    writing manifest file 'libpecos.egg-info/SOURCES.txt'
    reading manifest file 'libpecos.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching '*.c' under directory 'pecos/core'
    writing manifest file 'libpecos.egg-info/SOURCES.txt'
    running build_ext
    building 'pecos.core.libpecos_float32' extension
    INFO: C compiler: gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

    creating build
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/pecos
    creating build/temp.linux-x86_64-3.8/pecos/core
    INFO: compile options: '-Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c'
    extra options: '-fopenmp -O3 -std=c++14'
    INFO: gcc: pecos/core/libpecos.cpp
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /tmp/ccNJQf5g.s: Assembler messages:
    /tmp/ccNJQf5g.s: Fatal error: can't close build/temp.linux-x86_64-3.8/pecos/core/libpecos.o: Input/output error
    error: Command "gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c pecos/core/libpecos.cpp -o build/temp.linux-x86_64-3.8/pecos/core/libpecos.o -fopenmp -O3 -std=c++14" failed with exit status 1
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; __file__='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

    Environment

    • Ubuntu 18.04
    • Python 3.8
    • PECOS 0.3.0

    (Add as much information about your environment as possible, e.g. dependencies versions.)

    bug 
    opened by xiaokening 3
  • Memory-mapped XLinear Model

    Issue #, if available: N/A

    Description of changes:

    • Memory-mapped PECOS XLinear model
      • Greatly reduces loading time.
      • Ideal for large models when users want to quickly try a few inferences without waiting for the full model to load into memory.
      • Also enables inference with large models that cannot fit in memory.
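
The general benefit of memory mapping can be illustrated with Python's standard mmap module: a value is read straight from the file without first loading the whole file into memory. This is a generic sketch with a made-up file layout (consecutive float64 values); it is not the PECOS mmap_util API, and write_weights and read_weight are hypothetical names.

```python
import mmap
import os
import struct
import tempfile


def write_weights(path, weights):
    """Store floats as consecutive little-endian float64 values."""
    with open(path, "wb") as f:
        f.write(struct.pack("<%dd" % len(weights), *weights))


def read_weight(path, index):
    """Read a single value through a memory map without loading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Each float64 occupies 8 bytes, so the value sits at offset index * 8.
            return struct.unpack_from("<d", mm, index * 8)[0]


path = os.path.join(tempfile.mkdtemp(), "weights.bin")
write_weights(path, [0.5 * i for i in range(1000)])
value = read_weight(path, 742)
print(value)
```

Only the pages that are actually touched get read from disk, which is why opening a memory-mapped model can be much faster than loading it in full.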

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 0
Releases(v0.4.0)
  • v0.4.0(Aug 9, 2022)

    Highlights

    • Enable distributed XR-Transformer fine-tuning
    • Enable the capability of large-batch prediction for ANN HNSW
    • Release interactive hands-on tutorial materials

    Enhancements

    • Unit test for sorted_csc, sorted_csr by @chepingt in https://github.com/amzn/pecos/pull/139
    • Unit test for csr_row_softmax by @houyuhan98 in https://github.com/amzn/pecos/pull/141
    • Bump numpy from 1.21.0 to 1.22.0 by @dependabot in https://github.com/amzn/pecos/pull/145 https://github.com/amzn/pecos/pull/146
    • Release the materials for the PECOS hands-on tutorial in KDD 2022 by @hallogameboy in https://github.com/amzn/pecos/pull/153 https://github.com/amzn/pecos/pull/154 https://github.com/amzn/pecos/pull/161
    • Enable the capability of large-batch prediction for HNSW by @OctoberChang in https://github.com/amzn/pecos/pull/156
    • Distributed XR-Transformer fine-tuning by @jiong-zhang in https://github.com/amzn/pecos/pull/144 https://github.com/amzn/pecos/pull/162

    Bug Fixes

    • Fix argument-passing issue in smat_util.sorted_csc by @jiong-zhang in https://github.com/amzn/pecos/pull/134
    • Fix indptr overflow issue in block_diag_csr() by @OctoberChang in https://github.com/amzn/pecos/pull/136
    • Fix the yum group install command in README by @hallogameboy in https://github.com/amzn/pecos/pull/138
    • Change file names for windows compatibility by @YangyiLi001 in https://github.com/amzn/pecos/pull/143
    • Avoid triggering CodeQL on push for Dependabot branches by @weiliw-amz in https://github.com/amzn/pecos/pull/148
    • Fix Pypi release version error by @weiliw-amz in https://github.com/amzn/pecos/pull/163

    Deprecation

    • Deprecate imbalanced hierarchical K-means from clustering and semantic indexing by @hallogameboy in https://github.com/amzn/pecos/pull/151

    New Contributors

    • @chepingt made their first contribution in https://github.com/amzn/pecos/pull/139
    • @houyuhan98 made their first contribution in https://github.com/amzn/pecos/pull/141
    • @YangyiLi001 made their first contribution in https://github.com/amzn/pecos/pull/143
    • @xiusic made their first contribution in https://github.com/amzn/pecos/pull/147

    Full Changelog: https://github.com/amzn/pecos/compare/v0.3.0...v0.4.0

  • v0.3.0(Apr 1, 2022)

    Highlights

    • Enable distributed training for XLinear
    • Enable PECOS for aarch64(arm64) CPU Architecture
    • Enhance pecos.ann.hnsw with Function Multi-Versioning (FMV) technique to automatically select the best supported SIMD instructions (SSE, AVX2, AVX512) at runtime
    • Reduce CPU memory usage in pecos.xmc.xtransformer training

    Enhancements

    • Add distilbert model. by @mo-fu in https://github.com/amzn/pecos/pull/97
    • add CNAME by @jiong-zhang in https://github.com/amzn/pecos/pull/104
    • Bump numpy from 1.20.3 to 1.21.0 in /examples/qp2q by @dependabot in https://github.com/amzn/pecos/pull/110
    • enable Function Multi-Versioning (FMV) to support AVX512 by @rofuyu in https://github.com/amzn/pecos/pull/111
    • Modify supported Python version by @weiliw-amz in https://github.com/amzn/pecos/pull/113
    • Enabling PECOS for aarch64(arm64) CPU Architecture by @weiliw-amz in https://github.com/amzn/pecos/pull/114
    • Update OpenBLAS Version for x86 Wheel Build by @weiliw-amz in https://github.com/amzn/pecos/pull/117
    • SIMD Functions for aarch64(ARM64) by @weiliw-amz in https://github.com/amzn/pecos/pull/115
    • Add profile_util module by @weiliw-amz in https://github.com/amzn/pecos/pull/121
    • Fix FMV setup link flag and add test wheel CI by @weiliw-amz in https://github.com/amzn/pecos/pull/119
    • Fix xlinear.reconstruct_model; Add PII embedding by @weiliw-amz in https://github.com/amzn/pecos/pull/120
    • Add Distributed PECOS XLinear Modules by @weiliw-amz in https://github.com/amzn/pecos/pull/123
    • Add distributed PECOS README by @weiliw-amz in https://github.com/amzn/pecos/pull/127
    • update HNSW README and save/load in Python API by @OctoberChang in https://github.com/amzn/pecos/pull/129
    • Improve XR-Transformer memory efficiency by @jiong-zhang in https://github.com/amzn/pecos/pull/128

    Bug Fixes

    • properly set Text2Text prediction argument by @OctoberChang in https://github.com/amzn/pecos/pull/101
    • Fix HierarchicalMLModel pred-params initialization and add bugs by @weiliw-amz in https://github.com/amzn/pecos/pull/103
    • minor bug fix in XR-Transformer exp script by @jiong-zhang in https://github.com/amzn/pecos/pull/106
    • fixed multithreading bugs in py hierarchical kmeans by @OctoberChang in https://github.com/amzn/pecos/pull/108
    • set pytest of hierarchical kmeans with single thread by @OctoberChang in https://github.com/amzn/pecos/pull/109
    • Fix relative path in distributed README by @weiliw-amz in https://github.com/amzn/pecos/pull/130

    Experiment Codes for Publications

    • add overlap-clustering (Liu et al.) in NeurIPS21 by @xuanqing94 in https://github.com/amzn/pecos/pull/98
    • add MACLR codes by @xyh97 in https://github.com/amzn/pecos/pull/100
    • update experiment code for pecos jmlr paper by @OctoberChang in https://github.com/amzn/pecos/pull/107
    • update Philip's experiment code into example folder by @OctoberChang in https://github.com/amzn/pecos/pull/118

    New Contributors

    • @mo-fu made their first contribution in https://github.com/amzn/pecos/pull/97
    • @xuanqing94 made their first contribution in https://github.com/amzn/pecos/pull/98
    • @xyh97 made their first contribution in https://github.com/amzn/pecos/pull/100
    • @dependabot made their first contribution in https://github.com/amzn/pecos/pull/110

    Full Changelog: https://github.com/amzn/pecos/compare/v0.2.3...v0.3.0

  • v0.2.3(Nov 15, 2021)

  • v0.2.2(Nov 4, 2021)

  • v0.2.1(Oct 27, 2021)

    Highlights

    • Removed support for Ubuntu 16.04
    • Implemented XR-Transformer
    • Enabled HNSW functionality
    • Enabled cost-sensitive learning in PECOS

    Enhancements

    ANN HNSW

    • Initial single-threaded implementation of HNSW in C++ [#44] (@OctoberChang)
    • Refactor HNSW in C++ to support sparse/dense features and multi-threading [#49] (@rofuyu)
    • Initial implementation of HNSW Python interface [#53] (@OctoberChang)
    • Refactor HNSW python API and readme markdown [#63] (@OctoberChang)
    • Refactor HNSW C++ to reuse priority queue for different inference calls within the same Searcher [#65] (@rofuyu)
    • Enable HNSW save/load functionality [#71] (@OctoberChang)
    • Add serialization version in HNSW save/load [#77] (@rofuyu)
    • Enable HNSW python command line interface [#79] (@OctoberChang)
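
Storing a serialization version alongside a saved index lets the loader reject files written in a format it does not understand. The sketch below illustrates that idea with the standard library only; the file name `index.json`, the version number, and the functions `save_index` and `load_index` are hypothetical and do not reflect the on-disk format PECOS uses.

```python
import json
import os
import tempfile

SERIALIZATION_VERSION = 1  # hypothetical version number


def save_index(folder: str, neighbors: dict) -> None:
    """Write the graph together with the format version that produced it."""
    os.makedirs(folder, exist_ok=True)
    payload = {"version": SERIALIZATION_VERSION, "neighbors": neighbors}
    with open(os.path.join(folder, "index.json"), "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_index(folder: str) -> dict:
    """Load the graph, refusing files written with an unknown format version."""
    with open(os.path.join(folder, "index.json"), encoding="utf-8") as f:
        payload = json.load(f)
    if payload.get("version") != SERIALIZATION_VERSION:
        raise ValueError(f"unsupported serialization version: {payload.get('version')}")
    return payload["neighbors"]


# Round trip through a temporary folder: node id -> list of neighbor ids.
with tempfile.TemporaryDirectory() as tmp:
    save_index(tmp, {"0": [1, 2], "1": [0], "2": [0]})
    restored = load_index(tmp)
print(restored)
```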

    Cost-sensitive Learning

    • Enable Cost-Sensitive Learning via XLinear API/CLI [#64] (@jiong-zhang)
    • Enable cost sensitive for text2text CLI [#75] (@jiong-zhang)
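
In general, cost-sensitive learning weights each training example's loss by a cost, so that mistakes on important examples are penalized more. Here is a minimal sketch of that idea using a squared hinge loss. The function name and the per-example `costs` argument are assumptions for illustration; this is not the PECOS implementation or its API.

```python
from typing import Sequence


def weighted_squared_hinge(
    scores: Sequence[float],
    labels: Sequence[int],
    costs: Sequence[float],
) -> float:
    """Sum of cost-weighted squared hinge losses.

    scores: model outputs for each example
    labels: +1 / -1 targets
    costs:  non-negative per-example weights (hypothetical relevance values)
    """
    total = 0.0
    for s, y, c in zip(scores, labels, costs):
        margin_violation = max(0.0, 1.0 - y * s)
        total += c * margin_violation ** 2
    return total


scores = [0.5, -2.0, 0.0]
labels = [1, -1, 1]
# Uniform costs recover the ordinary loss; raising one cost emphasizes that example.
uniform = weighted_squared_hinge(scores, labels, [1.0, 1.0, 1.0])
weighted = weighted_squared_hinge(scores, labels, [1.0, 1.0, 3.0])
print(uniform, weighted)
```

With uniform costs the third example contributes 1.0 to the loss; tripling its cost triples that contribution while the others are unchanged.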

    XR-Transformer [#27, #64] (@jiong-zhang)

    • Refactor pecos.xmc.xtransformer and enable end2end XR-Transformer training
    • CLI tool for generating embeddings: pecos.xmc.xtransformer.encode
    • Faster transformer text tokenizers using Hugging Face's fast tokenizers (implemented in Rust)
    • Allow training XR-Transformer without numerical features.

    Better control over parameters for XLinear, XTransformer and Text2text [#64, #78, #80] (@jiong-zhang)

    • Enable advanced control of parameters via JSON input file
    • Add utility tool to generate parameter skeleton for further modification
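
The workflow described above (generate a parameter skeleton, edit it, then pass it back in) can be sketched with the standard library. The dataclasses and every parameter name below are hypothetical placeholders chosen for illustration; they are not PECOS's actual parameter schema or CLI.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrainParams:
    # Hypothetical parameter names, for illustration only.
    threshold: float = 0.1
    max_iter: int = 100
    solver: str = "L2R_L2LOSS_SVC_DUAL"


@dataclass
class PredParams:
    # Hypothetical parameter names, for illustration only.
    beam_size: int = 10
    only_topk: int = 20


@dataclass
class Params:
    train_params: TrainParams = field(default_factory=TrainParams)
    pred_params: PredParams = field(default_factory=PredParams)


def generate_skeleton() -> str:
    """Serialize the default parameters as a JSON skeleton for users to edit."""
    return json.dumps(asdict(Params()), indent=2)


def load_params(text: str) -> Params:
    """Rebuild Params from JSON; keys missing from the file keep their defaults."""
    raw = json.loads(text)
    return Params(
        train_params=TrainParams(**raw.get("train_params", {})),
        pred_params=PredParams(**raw.get("pred_params", {})),
    )


skeleton = generate_skeleton()
edited = json.loads(skeleton)
edited["pred_params"]["beam_size"] = 50  # the user edits one value in the file
params = load_params(json.dumps(edited))
print(params.pred_params.beam_size, params.train_params.max_iter)
```

Emitting the defaults as a skeleton means users only need to touch the values they care about, and unknown keys fail loudly at load time because the dataclass constructors reject them.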

    Other new functionalities

    • Added support for predicting on select outputs [#37, #43, #47] (@bhl00)
    • Added new primal solver L2R_L2LOSS_SVC_PRIMAL for XLinear [#67] (@yuhchenlin)
    • Add Makefile for easy format, install, clean and unittest. [#12] (@weiliw-amz)
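
Assuming the solver name L2R_L2LOSS_SVC_PRIMAL follows the LIBLINEAR naming convention, it refers to L2-regularized, L2-loss (squared hinge) support vector classification solved in the primal. The sketch below evaluates that objective and its gradient in plain Python; the function name and signature are illustrative, and this is not the PECOS solver itself.

```python
from typing import List, Sequence, Tuple


def l2r_l2loss_svc_primal(
    w: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    C: float = 1.0,
) -> Tuple[float, List[float]]:
    """Return (objective, gradient) of 0.5*||w||^2 + C * sum(max(0, 1 - y_i w.x_i)^2)."""
    obj = 0.5 * sum(wj * wj for wj in w)
    grad = list(w)  # the gradient of the regularizer is w itself
    for xi, yi in zip(X, y):
        slack = 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi))
        if slack > 0.0:  # only margin violators contribute to the loss
            obj += C * slack * slack
            for j, xj in enumerate(xi):
                grad[j] -= 2.0 * C * slack * yi * xj
    return obj, grad


# Two examples, one per class, evaluated at the zero weight vector.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1, -1]
obj, grad = l2r_l2loss_svc_primal([0.0, 0.0], X, y, C=1.0)
print(obj, grad)
```

Unlike the plain hinge loss, the squared hinge is differentiable everywhere, which is what makes gradient-based primal solvers practical for this objective.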

    Bug Fixes

    • (#17) Fixed failure to obtain GitHub information when installing from a .zip archive. [#21, #29] (@weiliw-amz)
    • (#42) Fixed transformer training issue on single GPU [#14] (@jiong-zhang)
    • Removed the dependency on the NumPy BLAS library when installing PECOS from source. [#81] (@weiliw-amz)
  • v0.1.0(Apr 26, 2021)

Owner
Amazon