PECOS - Predictions for Enormous and Correlated Output Spaces

Overview

PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval. PECOS' design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.

Given an input, PECOS identifies a small set (10-100) of relevant outputs from amongst an extremely large (~100MM) candidate set and ranks these outputs in terms of relevance.

Features

Extreme Multi-label Ranking and Classification

  • XR-Linear (pecos.xmc.xlinear): recursive linear models that learn to traverse an input from the root of a hierarchical label tree to a few leaf-node clusters, and return the top-k relevant labels within those clusters as predictions. See more details in the PECOS paper (Yu et al., 2020).

    • fast real-time inference in C++
    • can handle output spaces with ~100MM labels
  • XR-Transformer (pecos.xmc.xtransformer): a Transformer-based XMC framework that fine-tunes pre-trained Transformers recursively on multi-resolution objectives. It can be used to generate the top-k relevant labels for a given instance, or simply as a fine-tuning engine for task-aware embeddings. See technical details in the XR-Transformer paper (Zhang et al., 2021).

    • easy to extend with many pre-trained Transformer models from Hugging Face transformers.
    • establishes the state of the art on public XMC benchmarks.
  • ANN Search with HNSW (pecos.ann.hnsw): a PECOS Approximate Nearest Neighbor (ANN) search module that implements the Hierarchical Navigable Small World Graphs (HNSW) algorithm (Malkov et al., TPAMI 2018); a usage sketch follows this list.

    • Supports both sparse and dense input features
    • SIMD optimization for both dense/sparse distance computation
    • Supports thread-safe graph construction in parallel on multi-core shared memory machines
    • Supports thread-safe Searchers to do inference in parallel, which reduces inference overhead
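
As a rough illustration of the HNSW module, the sketch below builds an index over a feature matrix and queries it with thread-safe searchers. This is a minimal sketch: the class name HNSW, the searchers_create helper, and the exact return format of predict are assumptions based on the feature list above, not a verified API.

>>> from pecos.ann.hnsw import HNSW
# X: indexed feature matrix (dense ndarray or scipy CSR); Xq: query matrix (assumed prepared)
>>> model = HNSW.train(X)                             # build the HNSW graph (multi-threaded)
>>> searchers = model.searchers_create(4)             # thread-safe searchers for parallel inference (name assumed)
>>> result = model.predict(Xq, searchers=searchers)   # top-k nearest neighbors per query; return format may differ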

Requirements and Installation

  • Python (>=3.6)
  • Pip (>=19.3)

See other dependencies in setup.py. You should install PECOS in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

Supported Platforms

  • Ubuntu 18.04 and 20.04
  • Amazon Linux 2

Installation from Wheel

PECOS can be installed using pip as follows:

python3 -m pip install libpecos
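
As a quick sanity check after installation: the package is distributed on PyPI as libpecos but imported as pecos (as in the Quick Tour below), so the following should run without error:

python3 -c "import pecos"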

Installation from Source

Prerequisite builder tools

  • For Ubuntu (18.04, 20.04):
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
  • For Amazon Linux 2 Image:
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'

One needs to install at least one BLAS library to compile PECOS, e.g. OpenBLAS:

  • For Ubuntu (18.04, 20.04):
sudo apt-get install -y libopenblas-dev
  • For Amazon Linux 2 Image and AMI:
sudo amazon-linux-extras install epel -y
sudo yum install openblas-devel -y

Install and develop locally

git clone https://github.com/amzn/pecos
cd pecos
python3 -m pip install --editable ./

Quick Tour

To get a glimpse of how PECOS works, here is a quick tour of using the PECOS API for the XMR problem.

Toy Example

The eXtreme Multi-label Ranking (XMR) problem is defined by two matrices: an instance-to-feature matrix X and an instance-to-label matrix Y.

Some toy data matrices are available in the tst-data folder.
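
To make the tour self-contained, the matrices can be loaded with pecos.utils.smat_util, the same utility used for evaluation below. The file names here are placeholders; point them at the toy .npz matrices on your machine.

>>> from pecos.utils import smat_util
# Placeholder paths for the toy matrices
>>> X = smat_util.load_matrix("X.trn.npz")   # training instance-to-feature matrix (CSR)
>>> Y = smat_util.load_matrix("Y.trn.npz")   # training instance-to-label matrix (CSR)
>>> Xt = smat_util.load_matrix("X.tst.npz")  # test features
>>> Yt = smat_util.load_matrix("Y.tst.npz")  # test ground-truth labels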

PECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):

>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory

# Build the hierarchical label tree and train an XR-Linear model
>>> label_feat = LabelEmbeddingFactory.create(Y, X)
>>> cluster_chain = Indexer.gen(label_feat)
>>> model = XLinearModel.train(X, Y, C=cluster_chain)
>>> model.save("./save-models")
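
Here LabelEmbeddingFactory.create(Y, X) builds label embeddings from the training data (aggregating each label's positive-instance features, i.e. PIFA), Indexer.gen clusters those embeddings into the hierarchical label tree, and XLinearModel.train fits the recursive linear models along that tree.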

After training the model, we run prediction and evaluation:

>>> from pecos.utils import smat_util
>>> Yt_pred = model.predict(Xt)
# print precision and recall at k=10
>>> print(smat_util.Metrics.generate(Yt, Yt_pred))

PECOS also offers an optimized C++ implementation for fast real-time inference:

>>> model = XLinearModel.load("./save-models", is_predict_only=True)
>>> for i in range(Xt.shape[0]):
>>>   yt_pred = model.predict(Xt[i], threads=1)
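
The tour above uses the linear path; for the Transformer-based path (pecos.xmc.xtransformer), a minimal sketch could look like the following. The helper names and signatures (XTransformer, MLProblemWithText) are assumptions based on the module layout mentioned on this page, not a verified API.

>>> from pecos.xmc.xtransformer.model import XTransformer
>>> from pecos.xmc.xtransformer.module import MLProblemWithText
# trn_corpus / tst_corpus: lists of raw texts; Y: CSR instance-to-label matrix (assumed prepared)
>>> prob = MLProblemWithText(trn_corpus, Y)   # assumed wrapper bundling texts and labels
>>> xtf = XTransformer.train(prob)            # recursive fine-tuning on multi-resolution objectives
>>> Yt_pred = xtf.predict(tst_corpus)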

Citation

If you find PECOS useful, please consider citing the PECOS paper (Yu et al., 2020).

Other papers from our group that build on PECOS include the XR-Transformer paper (Zhang et al., 2021) referenced above.

License

Copyright (2021) Amazon.com, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • text2text model evaluation not working

    text2text model evaluation not working

    Description

    Model evaluation is not working properly; it fails to output the precision and recall.

    How to Reproduce?

    I ran the following line of code:

    python3 -m pecos.apps.text2text.evaluate --pred-path ./test-prediction.txt --truth-path ./test.txt --text-item-path ./output-labels.txt
    

    where --pred-path is the path to the file produced during model prediction and --truth-path is the path to the test file, e.g. Out1, Out2, Out3 \t cheap door, where Out1, Out2 and Out3 are line numbers in the output file given by
    --text-item-path ./output-labels.txt

    What have you tried to solve it?

    Error message or code output

    Traceback (most recent call last):
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 130, in <module>
        do_evaluation(args)
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 119, in do_evaluation
        Y_true = smat.csr_matrix((val_t, (row_id_t, col_id_t)), shape=(num_samples_t, len(item_dict)))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 55, in __init__
        dtype=dtype))
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 196, in __init__
        self._check()
      File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 285, in _check
        raise ValueError('column index exceeds matrix dimensions')
    ValueError: column index exceeds matrix dimensions
    

    Environment

    • Operating system:
    • Python version:
    • PECOS version:


    bug 
    opened by Khalid-Usman 13
  • Format of yt label

    Format of yt label

    Hello,

    Hope you are doing well. I have two questions about the format.

    Question 1: What is the optimal format for the label matrix Yt? Is it preferable to have Yt as:

    (A) one-hot encoded, with only one 1 per row, or (B) multi-hot encoded, with multiple 1s per row (as is the case for Xt)?

    When prediction is done, it seems to output only one 1 per row.

    Question 2:

    Is there any constraint on Xt containing a mix of dense and sparse inputs instead of sparse inputs only?

    enhancement 
    opened by arita37 7
  • some formatting

    some formatting

    Hi, thanks for this.

    I would just like to confirm the format of the inputs.

    X: CSR format, x(i,k) = val_x. Can val_x be a float, or does it need to be a binary or [0,1] value?

    Y: CSR format, y(i,k) = val_y. Does it need to be binary (0 or 1)?

    Thx

    opened by arita37 7
  • Online Inference Latency for XR-TRANSFORMER

    Online Inference Latency for XR-TRANSFORMER

    hi!

    When I use XR-Transformer for prediction (per input), the online inference latency comes to about 400 ms. Why is this?

    The system I use is Ubuntu 18.04, and XR-Transformer is evaluated on an Nvidia Tesla V100 GPU.

    Thanks!

    opened by xiaokening 6
  • Issue with --label-embed-type pifa_lf_concat::Z=${Z_pifa_file}

    Issue with --label-embed-type pifa_lf_concat::Z=${Z_pifa_file}

    Description

    I am trying to use the --label-embed-type parameter in training and it produces this error: ValueError: Object arrays cannot be loaded when allow_pickle=False, coming from the np.load() function.

    I have tested loading the NPZ file for z_labels (both compressed and uncompressed); it produces this error whenever allow_pickle=False. I was able to load the data by setting allow_pickle=True for the np.load() function.

    Could you please add a description of this file format, or can we pass this parameter as an input?

    This is the data I have after loading the npz file with allow_pickle=True:

    [array(['Trump', 'Bus', 'Trolly '], dtype='<U23')
     array(['Show', 'Disp'], dtype='<U20')
     array(['Recap rew'], dtype='<U24')
     array(['Core, '], dtype='<U32')
     array(['Hoe'], dtype='<U10')
     array(['Plan'], dtype='<U21')]
    

    How to Reproduce?

    Execute model training with numpy version 1.21.2

    python -m pecos.apps.text2text.train \
      --label-embed-type pifa_lf_concat::Z=${Z_pifa_file} \
      -i ${train_file} \
      -m ${model_folder}
    

    Error message or code output

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 311, in <module>
        train(args)
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/train.py", line 302, in train
        workspace_folder=args.workspace_folder,
      File "/home/jupyter/pecos_git/pecos/pecos/apps/text2text/model.py", line 325, in train
        Z = smat_util.load_matrix(val)
      File "/home/jupyter/pecos_git/pecos/pecos/utils/smat_util.py", line 117, in load_matrix
        mat = np.load(src)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
        pickle_kwargs=pickle_kwargs)
      File "/opt/conda/lib/python3.7/site-packages/numpy/lib/format.py", line 743, in read_array
        raise ValueError("Object arrays cannot be loaded when "
    ValueError: Object arrays cannot be loaded when allow_pickle=False
    

    Environment

    • Operating system: Unix Ubuntu (on GCP)
    • Python version: 3.8
    • PECOS version: 0.1.0
    • numpy version: 1.21.2
    bug 
    opened by zusmani 6
  • Pecos killed on ranker training step

    Pecos killed on ranker training step

    Description

    Training was killed at this step:

    Data: Amazon-670k Model: X-Transformer

    [2022-12-01 21:38:23,019][pecos.xmc.xtransformer.model][INFO] - Start training ranker...
    [2022-12-01 21:38:24,001][pecos.xmc.base][INFO] - Training Layer 0 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:39:05,191][pecos.xmc.base][INFO] - Training Layer 1 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:40:24,829][pecos.xmc.base][INFO] - Training Layer 2 of 4 Layers in HierarchicalMLModel, neg_mining=tfn..
    [2022-12-01 21:43:25,293][pecos.xmc.base][INFO] - Training Layer 3 of 4 Layers in HierarchicalMLModel, neg_mining=tfn+man..
    

    Environment

    Distributor ID:	Ubuntu
    Description:	Ubuntu 18.04.6 LTS
    Release:	18.04
    Codename:	bionic
    Python 3.8.15
    libpecos~=0.4.0
    1 RTX A4500, 32 vCPU, and 250 GB RAM
    

    What could it be? Is it possible to resume training from that stage?

    bug 
    opened by celsofranssa 4
  • How to Use XR-Transformer in Text2Text App

    How to Use XR-Transformer in Text2Text App

    Description

    I want to use XR-Transformer in the text2text app, following the parameters given here. But setting --params-path to this .json file raises the error:

    Traceback (most recent call last):
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 345, in <module>
        train(args)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/train.py", line 328, in train
        t2t_model = Text2Text.train(
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 317, in train
        pred_params = pred_params.override_with_kwargs(kwargs)
      File "/home/huziyuan/miniconda3/envs/huggingface/lib/python3.9/site-packages/pecos/apps/text2text/model.py", line 126, in override_with_kwargs
        self.xlinear_params.override_with_kwargs(pred_kwargs)
    AttributeError: 'NoneType' object has no attribute 'override_with_kwargs'
    

    References

    enhancement 
    opened by lyy1994 4
  • Examples with text

    Examples with text

    Description

    The current example of X and Y only has numeric values. Could you please provide an example where X and Y are both text? I think the paper/method is targeted at solving such problems.

    enhancement 
    opened by xyan326 4
  • Add memory-mapped utility module

    Add memory-mapped utility module

    Issue #, if available: N/A

    Description of changes: Add memory-mapped utility module.

    Users can test with the code below: copy it into a file test_mmap_util.cpp placed at pecos/core/util/, and run:

    gcc -lm -ldl -lstdc++ -fopenmp -std=c++14 -lgcc -lgomp -O3  -I ./ test_mmap_util.cpp
    ./a.out
    

    Output:

    Generate a Bar with data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    Save Bar into mmap file: ./bar_test_mmap.txt
    Load a new Bar from saved mmap file...
    Loaded Bar data:
    ---Bar---
    ---Foo---
    foo_1: 0 1 2 3 4 5 6 7 8 9 
    foo_2: 1
    ---------
    bar: 5 5 5 5 5 
    ---------
    
    #include <iostream>
    #include "mmap_util.hpp"
    
    using namespace pecos::mmap_util;
    
    // Nested class mmap example
    // Bar contains a Foo instance
    class Foo {
        public:
            Foo() {}
            ~Foo() { foo_1.clear(); }
    
            void init_data() {
                foo_1.resize(10, 0);
                for (int i=0; i<foo_1.size(); ++i) { foo_1[i] = i; }
                foo_2 = 1.0;
            }
    
            void print() {
                std::cout << "---Foo---" << std::endl;
                std::cout << "foo_1: ";
                for (int i=0; i<foo_1.size(); ++i) { std::cout << foo_1[i] << " "; }
                std::cout << std::endl;
                std::cout << "foo_2: " << foo_2 << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo_1.save_to_mmap_store(mmap_s);
                mmap_s.fput_one<double>(foo_2);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo_1.load_from_mmap_store(mmap_s);
                foo_2 = mmap_s.fget_one<double>();
            }
    
        private:
            MmapableVector<int> foo_1;
            double foo_2;
    };
    
    class Bar {
        public:
            Bar() { }
            ~Bar() { bar.clear(); mmap_store.close(); }
    
            void init_data() {
                foo.init_data();
                bar.resize(5, 0);
                for (int i=0; i<bar.size(); ++i) { bar[i] = 5.0; }
            }
    
            void print() {
                std::cout << "---Bar---" << std::endl;
                foo.print();
                std::cout << "bar: ";
                for (int i=0; i<bar.size(); ++i) { std::cout << bar[i] << " "; }
                std::cout << std::endl;
                std::cout << "---------" << std::endl;
            }
    
            void save(const std::string & file_name) const {
                // Create a mmapfile for dump at the most outer layer class
                // You cannot reuse (i.e, close and reopen) mmap_store, since it may hold the data storage
                MmapStore mmap_s = MmapStore();
                mmap_s.open(file_name, "w");
    
                save_to_mmap_store(mmap_s);
    
                // Metadata dump and fp closure is automatically done at MmapStore destructor when this function ends
                // You can make it happen earlier with explicitly calling close()
                mmap_s.close();
            }
            void load(const std::string & file_name, const bool pre_load=true) {
                mmap_store.open(file_name, pre_load?"r":"r_lazy");
                load_from_mmap_store(mmap_store);
            }
    
            void save_to_mmap_store(MmapStore& mmap_s) const {
                foo.save_to_mmap_store(mmap_s);
                bar.save_to_mmap_store(mmap_s);
            }
    
            void load_from_mmap_store(MmapStore& mmap_s) {
                foo.load_from_mmap_store(mmap_s);
                bar.load_from_mmap_store(mmap_s);
            }
    
        private:
            Foo foo;
            MmapableVector<double> bar;
            // Mmap Data storage at the most outer layer class
            MmapStore mmap_store;
    };
    
    
    int main() {
        std::string f_name = "./bar_test_mmap.txt";
    
        std::cout << "Generate a Bar with data:" << std::endl;
        Bar bar;
        bar.init_data();
        bar.print();
    
        std::cout << "Save Bar into mmap file: " << f_name << std::endl;
        bar.save(f_name);
    
        std::cout << "Load a new Bar from saved mmap file..." << std::endl;
        Bar new_bar;
        new_bar.load(f_name, /*pre_load=*/true);
    
        std::cout << "Loaded Bar data:" << std::endl;
        new_bar.print();
    
        return 0;
    }
    

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 3
  • Is there at least one example showing how to use Pecos from a plain text dataset?

    Is there at least one example showing how to use Pecos from a plain text dataset?

    It has been difficult to infer how to use PECOS properly. The usage examples are split over several README.md files and across the issues.

    Then, could you provide a toy example of an end-to-end approach (using XR-Transformer for instance)?

    Consider the following scenario: We have the training and testing samples in plain text

    #train samples:
        text: raw_text_1, labels: [L1, L7, ..., L3]
        text: raw_text_2, labels: [L8, L9]
        ...
        text: raw_text_N, labels: [L1, L7, ..., L4]
    
    #test samples:
        text: test_raw_text_1
        text: test_raw_text_2
        ...
        text: test_raw_text_M
    

    and someone has to:

    1. prepare the data to the accepted format;
    2. train the model;
    3. predict the top k labels.
    opened by celsofranssa 3
  • bug of installing from source

    bug of installing from source

    Description

    There are some problems when installing PECOS from source by following the README.md.

    How to Reproduce?

    python3 -m pip install --editable ./
    Obtaining file:///home/workspace/lishengchao/pecos
    Requirement already satisfied: scipy>=1.4.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.6.1)
    Requirement already satisfied: scikit-learn>=0.24.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (0.24.1)
    Requirement already satisfied: torch>=1.8.0 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (1.8.0)
    Collecting sentencepiece!=0.1.92,>=0.1.86
      Using cached https://repo.huaweicloud.com/repository/pypi/packages/68/91/ded0f64f90abfc5413c620fc345a0aef1e7ff5addda8704cc6b3bf589c64/sentencepiece-0.1.96-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
    Requirement already satisfied: transformers>=4.1.1 in /opt/conda/lib/python3.8/site-packages (from libpecos==0.3.0) (4.8.2)
    Collecting numpy>=1.19.5
      Using cached https://repo.huaweicloud.com/repository/pypi/packages/38/c0/c45c5eb0e25247d5fbb333fd0b56e570ba21cf0e3dca3abad174fb780e8c/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
    Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (2.1.0)
    Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.8/site-packages (from scikit-learn>=0.24.1->libpecos==0.3.0) (1.0.1)
    Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.8/site-packages (from torch>=1.8.0->libpecos==0.3.0) (3.7.4.3)
    Collecting huggingface-hub==0.0.12
      Downloading https://repo.huaweicloud.com/repository/pypi/packages/2f/ee/97e253668fda9b17e968b3f97b2f8e53aa0127e8807d24a547687423fe0b/huggingface_hub-0.0.12-py3-none-any.whl (37 kB)
    Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2021.4.4)
    Requirement already satisfied: sacremoses in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.0.45)
    Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (2.24.0)
    Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (21.3)
    Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (0.10.3)
    Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (4.62.3)
    Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (3.0.12)
    Requirement already satisfied: pyyaml in /opt/conda/lib/python3.8/site-packages (from transformers>=4.1.1->libpecos==0.3.0) (5.4.1)
    Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (1.15.0)
    Requirement already satisfied: click in /opt/conda/lib/python3.8/site-packages (from sacremoses->transformers>=4.1.1->libpecos==0.3.0) (7.1.2)
    Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (3.0.4)
    Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2.10)
    Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (1.25.11)
    Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers>=4.1.1->libpecos==0.3.0) (2020.12.5)
    Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->transformers>=4.1.1->libpecos==0.3.0) (3.0.6)
    Installing collected packages: sentencepiece, numpy, libpecos, huggingface-hub
      Attempting uninstall: numpy
        Found existing installation: numpy 1.19.2
        Uninstalling numpy-1.19.2:
          Successfully uninstalled numpy-1.19.2
      Running setup.py develop for libpecos
        ERROR: Command errored out with exit status 1:
         command: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; file='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps
             cwd: /home/workspace/lishengchao/pecos/
        Complete output (28 lines):
        Set version to 0.3.0
        running develop
        running egg_info
        creating libpecos.egg-info
        writing libpecos.egg-info/PKG-INFO
        writing dependency_links to libpecos.egg-info/dependency_links.txt
        writing requirements to libpecos.egg-info/requires.txt
        writing top-level names to libpecos.egg-info/top_level.txt
        writing manifest file 'libpecos.egg-info/SOURCES.txt'
        reading manifest file 'libpecos.egg-info/SOURCES.txt'
        reading manifest template 'MANIFEST.in'
        warning: no files found matching '*.c' under directory 'pecos/core'
        writing manifest file 'libpecos.egg-info/SOURCES.txt'
        running build_ext
        building 'pecos.core.libpecos_float32' extension
        INFO: C compiler: gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

    creating build
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/pecos
    creating build/temp.linux-x86_64-3.8/pecos/core
    INFO: compile options: '-Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c'
    extra options: '-fopenmp -O3 -std=c++14'
    INFO: gcc: pecos/core/libpecos.cpp
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /tmp/ccNJQf5g.s: Assembler messages:
    /tmp/ccNJQf5g.s: Fatal error: can't close build/temp.linux-x86_64-3.8/pecos/core/libpecos.o: Input/output error
    error: Command "gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ipecos/core -I/usr/include/ -I/usr/local/include -I/opt/conda/include/python3.8 -c pecos/core/libpecos.cpp -o build/temp.linux-x86_64-3.8/pecos/core/libpecos.o -fopenmp -O3 -std=c++14" failed with exit status 1
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /opt/conda/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/workspace/lishengchao/pecos/setup.py'"'"'; file='"'"'/home/workspace/lishengchao/pecos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

    Environment

    • Ubuntu 18.04
    • Python 3.8
    • PECOS 0.3.0


    bug 
    opened by xiaokening 3
  • Memory-mapped XLinear Model

    Memory-mapped XLinear Model

    Issue #, if available: N/A

    Description of changes:

    • Memory-mapped PECOS XLinear model
      • Greatly reduces loading time.
      • Ideal for large models when users want to quickly try a few inferences without waiting for the full model to load into memory.
      • Also capable of inference with large models that cannot fit in memory.

    By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

    opened by weiliw-amz 0
Releases(v0.4.0)
  • v0.4.0(Aug 9, 2022)

    Highlights

    • Enable distributed XR-Transformer fine-tuning
    • Enable the capability of large-batch prediction for ANN HNSW
    • Release interactive hands-on tutorial materials

    Enhancements

    • Unit test for sorted_csc, sorted_csr by @chepingt in https://github.com/amzn/pecos/pull/139
    • Unit test for csr_row_softmax by @houyuhan98 in https://github.com/amzn/pecos/pull/141
    • Bump numpy from 1.21.0 to 1.22.0 by @dependabot in https://github.com/amzn/pecos/pull/145 https://github.com/amzn/pecos/pull/146
    • Release the materials for the PECOS hands-on tutorial in KDD 2022 by @hallogameboy in https://github.com/amzn/pecos/pull/153 https://github.com/amzn/pecos/pull/154 https://github.com/amzn/pecos/pull/161
    • Enable the capability of large-batch prediction for HNSW by @OctoberChang in https://github.com/amzn/pecos/pull/156
    • Distributed XR-Transformer fine-tuning by @jiong-zhang in https://github.com/amzn/pecos/pull/144 https://github.com/amzn/pecos/pull/162

    Bug Fixes

    • Fix argument-passing issue in smat_util.sorted_csc by @jiong-zhang in https://github.com/amzn/pecos/pull/134
    • Fix indptr overflow issue in block_diag_csr() by @OctoberChang in https://github.com/amzn/pecos/pull/136
    • Fix the yum group install command in README by @hallogameboy in https://github.com/amzn/pecos/pull/138
    • Change file names for windows compatibility by @YangyiLi001 in https://github.com/amzn/pecos/pull/143
    • Avoid triggering CodeQL on push for Dependabot branches by @weiliw-amz in https://github.com/amzn/pecos/pull/148
    • Fix Pypi release version error by @weiliw-amz in https://github.com/amzn/pecos/pull/163

    Deprecation

    • Deprecate imbalanced hierarchical K-means from clustering and semantic indexing by @hallogameboy in https://github.com/amzn/pecos/pull/151

    New Contributors

    • @chepingt made their first contribution in https://github.com/amzn/pecos/pull/139
    • @houyuhan98 made their first contribution in https://github.com/amzn/pecos/pull/141
    • @YangyiLi001 made their first contribution in https://github.com/amzn/pecos/pull/143
    • @xiusic made their first contribution in https://github.com/amzn/pecos/pull/147

    Full Changelog: https://github.com/amzn/pecos/compare/v0.3.0...v0.4.0

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Apr 1, 2022)

    Highlights

    • Enable distributed training for XLinear
    • Enable PECOS for aarch64(arm64) CPU Architecture
    • Enhance pecos.ann.hnsw with Function Multi-Versioning (FMV) technique to automatically select the best supported SIMD instructions (SSE, AVX2, AVX512) at runtime
    • Reduce CPU memory usage in pecos.xmc.xtransformer training

    Enhancements

    • Add distilbert model. by @mo-fu in https://github.com/amzn/pecos/pull/97
    • add CNAME by @jiong-zhang in https://github.com/amzn/pecos/pull/104
    • Bump numpy from 1.20.3 to 1.21.0 in /examples/qp2q by @dependabot in https://github.com/amzn/pecos/pull/110
    • enable Function Multi-Versioning (FMV) to support AVX512 by @rofuyu in https://github.com/amzn/pecos/pull/111
    • Modify supported Python version by @weiliw-amz in https://github.com/amzn/pecos/pull/113
    • Enabling PECOS for aarch64(arm64) CPU Architecture by @weiliw-amz in https://github.com/amzn/pecos/pull/114
    • Update OpenBLAS Version for x86 Wheel Build by @weiliw-amz in https://github.com/amzn/pecos/pull/117
    • SIMD Functions for aarch64(ARM64) by @weiliw-amz in https://github.com/amzn/pecos/pull/115
    • Add profile_util module by @weiliw-amz in https://github.com/amzn/pecos/pull/121
    • Fix FMV setup link flag and add test wheel CI by @weiliw-amz in https://github.com/amzn/pecos/pull/119
    • Fix xlinear.reconstruct_model; Add PII embedding by @weiliw-amz in https://github.com/amzn/pecos/pull/120
    • Add Distributed PECOS XLinear Modules by @weiliw-amz in https://github.com/amzn/pecos/pull/123
    • Add distributed PECOS README by @weiliw-amz in https://github.com/amzn/pecos/pull/127
    • update HNSW README and save/load in Python API by @OctoberChang in https://github.com/amzn/pecos/pull/129
    • Improve XR-Transformer memory efficiency by @jiong-zhang in https://github.com/amzn/pecos/pull/128

    Bug Fixes

    • properly set Text2Text prediction argument by @OctoberChang in https://github.com/amzn/pecos/pull/101
    • Fix HiearchicalMLModel pred-params initialization and add bugs by @weiliw-amz in https://github.com/amzn/pecos/pull/103
    • minor bug fix in XR-Transformer exp script by @jiong-zhang in https://github.com/amzn/pecos/pull/106
    • fixed multithreading bugs in py hierarchical kmeans by @OctoberChang in https://github.com/amzn/pecos/pull/108
    • set pytest of hierarchical kmeans with single thread by @OctoberChang in https://github.com/amzn/pecos/pull/109
    • Fix relative path in distributed README by @weiliw-amz in https://github.com/amzn/pecos/pull/130

    Experiment Codes for Publications

    • add overlap-clustering (Liu et al.) in NeurIPS21 by @xuanqing94 in https://github.com/amzn/pecos/pull/98
    • add MACLR codes by @xyh97 in https://github.com/amzn/pecos/pull/100
    • update experiment code for pecos jmlr paper by @OctoberChang in https://github.com/amzn/pecos/pull/107
    • update Philip's experiment code into example folder by @OctoberChang in https://github.com/amzn/pecos/pull/118

    New Contributors

    • @mo-fu made their first contribution in https://github.com/amzn/pecos/pull/97
    • @xuanqing94 made their first contribution in https://github.com/amzn/pecos/pull/98
    • @xyh97 made their first contribution in https://github.com/amzn/pecos/pull/100
    • @dependabot made their first contribution in https://github.com/amzn/pecos/pull/110

    Full Changelog: https://github.com/amzn/pecos/compare/v0.2.3...v0.3.0

    Source code(tar.gz)
    Source code(zip)
  • v0.2.3(Nov 15, 2021)

  • v0.2.2(Nov 4, 2021)

  • v0.2.1(Oct 27, 2021)

    Highlights

    • Removed support for Ubuntu 16.04
    • Implemented XR-Transformer
    • Enabled HNSW functionality
    • Enabled cost-sensitive learning in PECOS

    Enhancements

    ANN HNSW

    • Initial implementation of HNSW in C++ with single-thread [#44] (@OctoberChang)
    • Refactor HNSW in C++ to support sparse/dense features and multi-threading [#49] (@rofuyu)
    • Initial implementation of HNSW Python interface [#53] (@OctoberChang)
    • Refactor HNSW python API and readme markdown [#63] (@OctoberChang)
    • Refactor HNSW C++ to reuse priority queue for different inference calls within the same Searcher [#65] (@rofuyu)
    • Enable HNSW save/load functionality [#71] (@OctoberChang)
    • Add serialization version in HNSW save/load [#77] (@rofuyu)
    • Enable HNSW python command line interface [#79] (@OctoberChang)

    Cost-sensitive Learning

    • Enable Cost-Sensitive Learning via XLinear API/CLI [#64] (@jiong-zhang)
    • Enable cost sensitive for text2text CLI [#75] (@jiong-zhang)

    XR-Transformer [#27, #64] (@jiong-zhang)

    • Refactor pecos.xmc.xtransformer and enable end2end XR-Transformer training
    • CLI tool for generating embeddings pecos.xmc.xtransformer.encode
    • Faster transformer text tokenizers using huggingface's C implementation
    • Allow training XR-Transformer without numerical features.

    Better control over parameters for XLinear, XTransformer and Text2text [#64, #78, #80] (@jiong-zhang)

    • Enable advanced control of parameters via JSON input file
    • Add utility tool to generate parameter skeleton for further modification

    Other new functionalities

    • Added support for predicting on select outputs [#37, #43, #47] (@bhl00)
    • Added new primal solver L2R_L2LOSS_SVC_PRIMAL for XLinear [#67] (@yuhchenlin)
    • Add Makefile for easy format, install, clean and unittest. [#12] (@weiliw-amz)

    Bug Fixes

    • (#17) Fixed issues with obtaining GitHub information when installing from .zip. [#21, #29] (@weiliw-amz)
    • (#42) Fixed transformer training issue on single GPU [#14] (@jiong-zhang)
    • Removed PECOS source-installation dependency on NumPy BLAS library. [#81] (@weiliw-amz)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Apr 26, 2021)
