A scikit-learn based module for multi-label et al. classification

Overview

scikit-multilearn

scikit-multilearn is a Python module capable of performing multi-label learning tasks. It is built on top of various scientific Python packages (numpy, scipy) and follows a similar API to that of scikit-learn.

Features

  • Native Python implementation. Native Python implementations of a variety of multi-label classification algorithms. To see the list of all supported classifiers, check this link.

  • Interface to Meka. A Meka wrapper class is implemented for reference purposes and integration. This provides access to all methods available in MEKA, MULAN, and WEKA — the reference standard in the field.

  • Builds upon giants! Team up with the power of numpy and scikit-learn. You can use scikit-learn's base classifiers as scikit-multilearn's classifiers. In addition, the two packages follow a similar API.

Dependencies

In most cases you will want to follow the requirements defined in the requirements/*.txt files in the package.

Base dependencies

scipy
numpy
future
scikit-learn
liac-arff # for loading ARFF files
requests # for dataset module
networkx # for NetworkX-based community detection clusterers
python-louvain # for NetworkX-based community detection clusterers
keras

GPL-incurring dependencies for two clusterers

python-igraph # for igraph-based clusterers
python-graphtool # for graphtool-based clusterers

Note: Installing graphtool is complicated; please see the graphtool install instructions.

Installation

To install scikit-multilearn, simply type the following command:

$ pip install scikit-multilearn

This will install the latest release from the Python package index. If you wish to install the bleeding-edge version, then clone this repository and run setup.py:

$ git clone https://github.com/scikit-multilearn/scikit-multilearn.git
$ cd scikit-multilearn
$ python setup.py install

Basic Usage

Before proceeding to classification, this library assumes that you have a dataset with the following matrices:

  • X_train, X_test: training and test feature matrices of size (n_samples, n_features)
  • y_train, y_test: training and test label matrices of size (n_samples, n_labels)

Suppose we want to apply a problem-transformation method called Binary Relevance, which treats each label as a separate single-label classification problem, with a support vector machine (SVM) as the base classifier. We simply perform the following steps:

# Import BinaryRelevance from skmultilearn
from skmultilearn.problem_transform import BinaryRelevance

# Import SVC classifier from sklearn
from sklearn.svm import SVC

# Setup the classifier
classifier = BinaryRelevance(classifier=SVC(), require_dense=[False,True])

# Train
classifier.fit(X_train, y_train)

# Predict
y_pred = classifier.predict(X_test)
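
To make the idea behind Binary Relevance concrete, here is a minimal pure-Python sketch of the transformation itself, not the library's implementation. It trains one independent copy of a base classifier per label column and stacks the per-label predictions. MajorityClassifier and BinaryRelevanceSketch are hypothetical names used only for illustration.

```python
import copy


class MajorityClassifier:
    """Hypothetical stand-in for a real base classifier: predicts the most common label."""

    def fit(self, X, y):
        # Ties are resolved in favour of 1.
        self.label_ = 1 if 2 * sum(y) >= len(y) else 0
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class BinaryRelevanceSketch:
    """Train one independent binary classifier per label column."""

    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, Y):
        n_labels = len(Y[0])
        self.models_ = []
        for j in range(n_labels):
            column = [row[j] for row in Y]          # targets of the j-th binary problem
            model = copy.deepcopy(self.classifier)  # fresh copy per label
            self.models_.append(model.fit(X, column))
        return self

    def predict(self, X):
        per_label = [model.predict(X) for model in self.models_]
        # Transpose from (n_labels, n_samples) to (n_samples, n_labels).
        return [list(row) for row in zip(*per_label)]


X_train = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8]]
Y_train = [[1, 0], [1, 0], [0, 1]]
model = BinaryRelevanceSketch(MajorityClassifier()).fit(X_train, Y_train)
predictions = model.predict([[0.5, 0.5], [0.2, 0.3]])
print(predictions)
```

Because each label gets its own model, this approach ignores correlations between labels; that trade-off is what other problem-transformation methods try to address.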

More examples and use-cases can be seen in the documentation. For using the MEKA wrapper, check this link.

Contributing

This project is open for contributions. Here are some of the ways for you to contribute:

  • Bug reports and fixes
  • Feature requests
  • Use-case demonstrations
  • Documentation updates

In case you want to implement your own multi-label classifier, please read our Developer's Guide to help you integrate your implementation in our API.

To make a contribution, just fork this repository, push the changes in your fork, open up an issue, and make a Pull Request!

We're also available on Slack! Just go to our Slack group.

Cite

If you used scikit-multilearn in your research or project, please cite our work:

@ARTICLE{2017arXiv170201460S,
   author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
   title = "{A scikit-based Python environment for performing multi-label classification}",
   journal = {ArXiv e-prints},
   archivePrefix = "arXiv",
   eprint = {1702.01460},
   year = 2017,
   month = feb
}
Comments
  • Meka wrapper on Windows

    Hi everyone,

    I am new to Python and the scikit-multilearn repository, so thank you in advance for your patience. I tried executing the following code through the Spyder application on Anaconda:

    from sklearn.datasets import make_multilabel_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import hamming_loss
    from skmultilearn.ext import Meka
    
    X, y = make_multilabel_classification(sparse = True,
        return_indicator = 'sparse')
    
    X_train, X_test, y_train, y_test = train_test_split(X,
        y,
        test_size=0.33)
    
    meka = Meka(
        meka_classifier = "meka.classifiers.multilabel.LC",
        weka_classifier = "weka.classifiers.bayes.NaiveBayes",
        meka_classpath = "C:/Program Files/meka-release-1.9.2-SNAPSHOT/lib/",
        java_command = "C:/Program Files/Java")
    
    meka.fit(X_train, y_train)
    

    and got the following error:

    > meka.fit(X_train, y_train)
    > Traceback (most recent call last):
    > 
    >   File "<ipython-input-2-f6b53f2230d2>", line 1, in <module>
    >     meka.fit(X_train, y_train)
    > 
    >   File "C:\Users\Edward Yapp\Anaconda3\lib\site-packages\skmultilearn\ext\meka.py", line 153, in fit
    >     self.remove_temporary_files([train_arff, classifier_dump_file])
    > 
    >   File "C:\Users\Edward Yapp\Anaconda3\lib\site-packages\skmultilearn\ext\meka.py", line 80, in remove_temporary_files
    >     os.remove(file_name.name)
    > 
    > PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\EDWARD~1\\AppData\\Local\\Temp\\tmpruhwexku'
    

    Perhaps it is something to do with different file closing mechanisms between Windows and Linux? I tried commenting out the os.remove parts of the code and obtained this subsequent error:

    > meka.fit(X_train, y_train)
    > Traceback (most recent call last):
    > 
    >   File "<ipython-input-3-f6b53f2230d2>", line 1, in <module>
    >     meka.fit(X_train, y_train)
    > 
    >   File "C:\Users\Edward Yapp\Anaconda3\lib\site-packages\skmultilearn\ext\meka.py", line 163, in fit
    >     self.run_meka_command(input_args)
    > 
    >   File "C:\Users\Edward Yapp\Anaconda3\lib\site-packages\skmultilearn\ext\meka.py", line 113, in run_meka_command
    >     meka_command), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    > 
    >   File "C:\Users\Edward Yapp\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 210, in __init__
    >     super(SubprocessPopen, self).__init__(*args, **kwargs)
    > 
    >   File "C:\Users\Edward Yapp\Anaconda3\lib\subprocess.py", line 709, in __init__
    >     restore_signals, start_new_session)
    > 
    >   File "C:\Users\Edward Yapp\Anaconda3\lib\subprocess.py", line 997, in _execute_child
    >     startupinfo)
    > 
    > FileNotFoundError: [WinError 2] The system cannot find the file specified
    

    Any advice would be much appreciated. Let me know if there is more information that I could provide.

    Regards, Edward

    opened by ekyy2 24
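
The PermissionError in the report above is consistent with how Windows treats open files: a file that still has an open handle generally cannot be deleted, whereas Linux allows it. A minimal standard-library sketch of the safer pattern follows: create the temporary file with delete=False, close the handle, and only then remove it. write_temp_file is a hypothetical helper, not part of scikit-multilearn.

```python
import os
import tempfile


def write_temp_file(payload: str) -> str:
    """Write payload to a temporary file and return its path with the handle closed."""
    handle = tempfile.NamedTemporaryFile(mode="w", suffix=".arff", delete=False)
    try:
        handle.write(payload)
    finally:
        # Closing before any os.remove() call matters on Windows.
        handle.close()
    return handle.name


path = write_temp_file("@relation demo\n")
existed_before = os.path.exists(path)
os.remove(path)  # safe now that no handle is open
existed_after = os.path.exists(path)
print(existed_before, existed_after)
```

The later FileNotFoundError may have a separate cause: java_command in the snippet points at a directory (C:/Program Files/Java) rather than an executable, so checking the value with shutil.which or os.path.isfile is a reasonable first step.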
  • MlKNN:TypeError: __init__() takes 1 positional argument but 2 were given

    When MLkNN is used and X_train, y_train are passed to fit, it always reports an error saying that one extra argument was given. Passing two arguments should obviously be allowed. I don't know where the error is.

    opened by Foreverkobe1314 12
  • ModuleNotFoundError: No module named 'skmultilearn.model_selection'

    At the risk of being told this is a Stackoverflow issue:

    Trying to run: iterative_train_test_split example from the http://scikit.ml/stratification.html page.

    Did a pip install:

    scikit-multilearn in /anaconda3/lib/python3.6/site-packages (0.0.5)

    would this be the wrong version?

    opened by MaartenKool 11
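
The release notes list the multi-label stratification algorithm under 0.1.0, so a 0.0.5 install would plausibly lack skmultilearn.model_selection. A quick way to confirm what is installed can be sketched with the standard library (importlib.metadata, Python 3.8+); installed_version and version_tuple are hypothetical helpers.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str):
    """Turn '0.2.0' into (0, 2, 0), stopping at the first non-numeric part."""
    parts = []
    for part in version.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


found = installed_version("scikit-multilearn")
if found is None:
    print("scikit-multilearn is not installed in this interpreter")
elif version_tuple(found) < (0, 1, 0):
    print(f"{found} predates 0.1.0; try: pip install --upgrade scikit-multilearn")
else:
    print(f"{found} should include skmultilearn.model_selection")
```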
  • Weka wrapper issue

    Hello everyone,

    I am trying to run the MEKA wrapper in my python code using skmultilearn. I am using the code in the paragraph 4.2 in http://scikit.ml/meka.html step by step. However, I got this error:

    File "C:\Users\ferna\Anaconda3\lib\site-packages\skmultilearn\ext\meka.py", line 374, in parse_output predictions = self.output.split(predictions_split_head)[1].split(

    IndexError: list index out of range

    I have tried the code on three different machines and the error persists. You can find it in the attached figure.

    What is wrong?

    opened by FernandoSaez95 10
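
An IndexError on split(...)[1] usually means the marker string never appeared in the text, so split returned a single-element list. The sketch below shows that failure mode and a defensive alternative using str.partition. The marker text and extract_predictions are hypothetical and do not reflect MEKA's actual output format.

```python
MARKER = "==== PREDICTIONS ===="  # hypothetical section header


def extract_predictions(output: str, marker: str = MARKER) -> str:
    """Return the text after marker, or raise a descriptive error if it is missing."""
    _, found, tail = output.partition(marker)
    if not found:
        # Surface the raw output: it often contains the real underlying error.
        raise ValueError(f"marker not found; raw output was: {output!r}")
    return tail.strip()


good = "log line\n==== PREDICTIONS ====\n[1, 0, 1]\n"
bad = "Exception in thread main: class not found"

print(extract_predictions(good))
pieces = bad.split(MARKER)
print(len(pieces))  # a single piece, so indexing [1] would raise IndexError
try:
    extract_predictions(bad)
    raised = False
except ValueError as error:
    raised = True
    print("caught:", error)
```

Inspecting the raw output in this way is often the quickest route to the underlying problem, for example a wrong classpath or Java command.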
  • get_params and set_params fixed / cross validation test case added

    I can't really test the cross-validation itself, although cloning (sklearn.base.clone) works: Python gets a SIGKILL from somewhere. Something may be wrong on my side; it should work. Unlike in the other pull request, deep copying of the classifier works here (hasattr(getattr(self, attr), 'get_params') is the correct check).

    See #29 for reference.

    opened by ChristianSch 9
  • Problem with installing scikit-multilearn

    May I know how to install scikit-multilearn?

    I have tried doing pip install scikit-multilearn from command prompt and it managed to install successfully. "Requirement already satisfied: scikit-learn in C:\Users\Name\appdata\local\continuum\anaconda3\envs\name_env\lib\site-packages"

    But when I tried to run the command after activating my environment, "from skmultilearn.problem_transform import BinaryRelevance" on spyder, I had the error: ImportError: No module named 'skmultilearn'

    I am using Python 3.6.3 :: Anaconda

    Can anybody help me with this?

    invalid 
    opened by Hancminnah 8
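
A pip message like the one quoted shows where a package landed, but Spyder may be running a different interpreter. The following standard-library sketch prints which interpreter is active and whether it can see a module; module_available is a hypothetical helper.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(name) is not None


print("interpreter:", sys.executable)
print("skmultilearn importable:", module_available("skmultilearn"))
```

If the interpreter path is not inside the environment where pip installed the package, running `python -m pip install scikit-multilearn` with that same interpreter installs it where the running code can find it.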
  • skmultilearn.embedding ImportError: No module named 'openne'

    Is there a specific installation requirement for openne when using skmultilearn.embedding? I installed scikit-multilearn 0.2.0 today via pip.

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-5-0e6022fdf0e0> in <module>
    ----> 1 from skmultilearn.embedding import CLEMS
    
    ~/repository/.venv/lib/python3.5/site-packages/skmultilearn/embedding/__init__.py in <module>
         30 
         31 if not (sys.version_info[0] == 2 or platform.architecture()[0] == '32bit'):
    ---> 32     from .openne import OpenNetworkEmbedder
         33 
         34     __all__.append('OpenNetworkEmbedder')
    
    ~/repository/.venv/lib/python3.5/site-packages/skmultilearn/embedding/openne.py in <module>
          1 from copy import copy
    ----> 2 from openne.gf import GraphFactorization
          3 from openne.graph import Graph
          4 from openne.grarep import GraRep
          5 from openne.hope import HOPE
    
    ImportError: No module named 'openne'
    

    I tried installing openne from the official repo into the same python environment. It seems to have successfully installed the package but did not solve the issue. (by the way, I did not install requirements.txt specified by openne because my environment already had the required packages installed)

    openne installation procedure and messages
    ~/repository$ git clone https://github.com/thunlp/OpenNE.git && cd OpenNE/src
    ~/repository/OpenNE/src$ pipenv run python setup.py install
    running install
    running bdist_egg
    running egg_info
    creating openne.egg-info
    writing dependency_links to openne.egg-info/dependency_links.txt
    writing top-level names to openne.egg-info/top_level.txt
    writing openne.egg-info/PKG-INFO
    writing manifest file 'openne.egg-info/SOURCES.txt'
    reading manifest file 'openne.egg-info/SOURCES.txt'
    writing manifest file 'openne.egg-info/SOURCES.txt'
    installing library code to build/bdist.linux-x86_64/egg
    running install_lib
    running build_py
    creating build
    creating build/lib
    creating build/lib/openne
    copying openne/__main__.py -> build/lib/openne
    copying openne/graph.py -> build/lib/openne
    copying openne/classify.py -> build/lib/openne
    copying openne/line.py -> build/lib/openne
    copying openne/hope.py -> build/lib/openne
    copying openne/walker.py -> build/lib/openne
    copying openne/node2vec.py -> build/lib/openne
    copying openne/grarep.py -> build/lib/openne
    copying openne/lap.py -> build/lib/openne
    copying openne/gf.py -> build/lib/openne
    copying openne/tadw.py -> build/lib/openne
    copying openne/__init__.py -> build/lib/openne
    copying openne/lle.py -> build/lib/openne
    copying openne/sdne.py -> build/lib/openne
    creating build/lib/openne/gcn
    copying openne/gcn/models.py -> build/lib/openne/gcn
    copying openne/gcn/train.py -> build/lib/openne/gcn
    copying openne/gcn/metrics.py -> build/lib/openne/gcn
    copying openne/gcn/utils.py -> build/lib/openne/gcn
    copying openne/gcn/layers.py -> build/lib/openne/gcn
    copying openne/gcn/__init__.py -> build/lib/openne/gcn
    copying openne/gcn/gcnAPI.py -> build/lib/openne/gcn
    copying openne/gcn/inits.py -> build/lib/openne/gcn
    warning: build_py: byte-compiling is disabled, skipping.
    
    creating build/bdist.linux-x86_64
    creating build/bdist.linux-x86_64/egg   
    creating build/bdist.linux-x86_64/egg/openne
    creating build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/gcn/models.py -> build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/gcn/train.py -> build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/gcn/metrics.py -> build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/gcn/utils.py -> build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/gcn/layers.py -> build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/gcn/__init__.py -> build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/gcn/gcnAPI.py -> build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/gcn/inits.py -> build/bdist.linux-x86_64/egg/openne/gcn
    copying build/lib/openne/__main__.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/graph.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/classify.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/line.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/hope.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/walker.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/node2vec.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/grarep.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/lap.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/gf.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/tadw.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/__init__.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/lle.py -> build/bdist.linux-x86_64/egg/openne
    copying build/lib/openne/sdne.py -> build/bdist.linux-x86_64/egg/openne
    warning: install_lib: byte-compiling is disabled, skipping.
    
    creating build/bdist.linux-x86_64/egg/EGG-INFO
    copying openne.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
    copying openne.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
    copying openne.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
    copying openne.egg-info/not-zip-safe -> build/bdist.linux-x86_64/egg/EGG-INFO
    copying openne.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
    creating dist
    creating 'dist/openne-0.0.0-py3.5.egg' and adding 'build/bdist.linux-x86_64/egg' to it
    removing 'build/bdist.linux-x86_64/egg' (and everything under it)
    Processing openne-0.0.0-py3.5.egg
    creating /home/hatta/repository/.venv/lib/python3.5/site-packages/openne-0.0.0-py3.5.egg
    Extracting openne-0.0.0-py3.5.egg to /home/hatta/repository/.venv/lib/python3.5/site-packages
    Adding openne 0.0.0 to easy-install.pth file
    
    Installed /home/hatta/repository/.venv/lib/python3.5/site-packages/openne-0.0.0-py3.5.egg
    Processing dependencies for openne==0.0.0
    Finished processing dependencies for openne==0.0.0
    

    Any help appreciated! Thanks in advance.

    opened by kznovo 7
  • AttributeError: module 'arff' has no attribute 'COO'

    I used MOA to generate a multi-label data set, but reading it raises an error:

    from skmultilearn.dataset import load_from_arff
    
    path_to_arff_file = 'dataset/Drift-RTG8.arff'
    label_count = 8
    label_location = "start"
    arff_file_is_sparse = True
    
    x, y, feature_names, label_names = load_from_arff(
        path_to_arff_file,
        label_count=label_count,
        label_location=label_location,
        load_sparse=arff_file_is_sparse,
        return_attribute_definitions=True
    )
    
    print(x, y, feature_names[:3], label_names[:3])

    Error message

    "E:\Program Files\python\python.exe" "E:\Program Files\pycharm\PyCharm 2017.1.4\helpers\pycharm_jb_unittest_runner.py" --path "E:/Program Files/multi-Label/test.py"
    Testing started at 9:33 ...
    Launching unittests with arguments python -m unittest E:/Program Files/multi-Label/test.py in E:\Program Files\multi-Label
    Traceback (most recent call last):
      File "E:\Program Files\pycharm\PyCharm 2017.1.4\helpers\pycharm_jb_unittest_runner.py", line 35, in <module>
        main(argv=args, module=None, testRunner=unittestpy.TeamcityTestRunner, buffer=not JB_DISABLE_BUFFERING)
      File "E:\Program Files\python\lib\unittest\main.py", line 100, in __init__
        self.parseArgs(argv)
      File "E:\Program Files\python\lib\unittest\main.py", line 147, in parseArgs
        self.createTests()
      File "E:\Program Files\python\lib\unittest\main.py", line 159, in createTests
        self.module)
      File "E:\Program Files\python\lib\unittest\loader.py", line 220, in loadTestsFromNames
        suites = [self.loadTestsFromName(name, module) for name in names]
      File "E:\Program Files\python\lib\unittest\loader.py", line 220, in <listcomp>
        suites = [self.loadTestsFromName(name, module) for name in names]
      File "E:\Program Files\python\lib\unittest\loader.py", line 154, in loadTestsFromName
        module = __import__(module_name)
      File "E:\Program Files\multi-Label\test.py", line 13, in <module>
        return_attribute_definitions=True
      File "E:\Program Files\python\lib\site-packages\skmultilearn\dataset.py", line 223, in load_from_arff
        open(filename, 'r'), encode_nominal=encode_nominal, return_type=arff.COO
    AttributeError: module 'arff' has no attribute 'COO'
    
    Process finished with exit code 1
    Empty test suite.

    Drift-RTG8.arff file

    @relation 'SYN_Z3.0L8X10S1: -C 8'

    @attribute class0 {0,1}
    @attribute class1 {0,1}
    @attribute class2 {0,1}
    @attribute class3 {0,1}
    @attribute class4 {0,1}
    @attribute class5 {0,1}
    @attribute class6 {0,1}
    @attribute class7 {0,1}
    @attribute nominal1 {value1,value2,value3,value4,value5}
    @attribute nominal2 {value1,value2,value3,value4,value5}
    @attribute nominal3 {value1,value2,value3,value4,value5}
    @attribute nominal4 {value1,value2,value3,value4,value5}
    @attribute nominal5 {value1,value2,value3,value4,value5}
    @attribute nominal6 {value1,value2,value3,value4,value5}
    @attribute nominal7 {value1,value2,value3,value4,value5}
    @attribute nominal8 {value1,value2,value3,value4,value5}
    @attribute nominal9 {value1,value2,value3,value4,value5}
    @attribute nominal10 {value1,value2,value3,value4,value5}
    
    @data
    
    {1 1,4 1,7 1,9 value4,10 value3,11 value5,12 value5,13 value5,14 value5,15 value2,16 value4,17 value4}
    {5 1,7 1,8 value5,9 value4,10 value3,11 value4,12 value3,13 value5,14 value3,15 value3,16 value2,17 value5}
    {2 1,7 1,10 value3,11 value3,12 value3,13 value3,15 value4,17 value3}
    {1 1,2 1,4 1,7 1,8 value4,9 value3,10 value2,11 value4,12 value4,13 value2,14 value5,16 value5}

    opened by haokeliu 7
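
The dependency list in this README names liac-arff for ARFF loading. One plausible cause of this error, stated here as an assumption, is that a different distribution providing a module also called arff is installed instead. The check below reports whether the importable arff module exposes COO; module_has_attribute is a hypothetical helper.

```python
import importlib


def module_has_attribute(module_name: str, attribute: str):
    """Return True/False for an importable module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


status = module_has_attribute("arff", "COO")
messages = {
    None: "arff is not installed",
    False: "arff is installed but has no COO; check which distribution provides it",
    True: "arff exposes COO",
}
print(messages[status])
```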
  • Error while performing Binary Relevance or Label Powerset

    I am trying to perform a simple classification using Binary Relevance or Label Powerset. I consistently encounter the error despite also trying to convert it into a sparse matrix. How do I overcome this?

    Here is my code:

    import pandas as pd
    import numpy as np
    from scipy import sparse
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import train_test_split
    from skmultilearn.problem_transform import BinaryRelevance, LabelPowerset
    from sklearn.metrics import f1_score
    
    data = pd.read_csv("a_lucene_results.csv")
    y = data[['isA','isB','isC']]
    to_drop = ['id','isA','isB','isC']
    X = data.drop(to_drop,axis=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    #X_train = sparse.csr_matrix(X_train)   # Here's the initialization of the sparse matrix.
    #X_test = sparse.csr_matrix(X_test)
    #y_train = sparse.csr_matrix(y_train)   # Here's the initialization of the sparse matrix.
    #y_test = sparse.csr_matrix(y_test)
    clf = BinaryRelevance(GaussianNB())
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("The macro averaged F1-score is: %.3f" %(f1_score(y_pred, y_test, average='macro')))
    

    However, I always get this:

    Traceback (most recent call last):
      File "exp2.py", line 26, in <module>
        clf.fit(X_train, y_train)
      File "/home/ankur218/anaconda3/lib/python3.5/site-packages/skmultilearn/problem_transform/br.py", line 60, in fit
        X, sparse_format='csr', enforce_sparse=True)
      File "/home/ankur218/anaconda3/lib/python3.5/site-packages/skmultilearn/base/base.py", line 97, in ensure_input_format
        return matrix_creation_function_for_format(sparse_format)(X)
      File "/home/ankur218/anaconda3/lib/python3.5/site-packages/scipy/sparse/compressed.py", line 79, in __init__
        self._set_self(self.__class__(coo_matrix(arg1, dtype=dtype)))
      File "/home/ankur218/anaconda3/lib/python3.5/site-packages/scipy/sparse/compressed.py", line 32, in __init__
        arg1 = arg1.asformat(self.format)
      File "/home/ankur218/anaconda3/lib/python3.5/site-packages/scipy/sparse/base.py", line 287, in asformat
        return getattr(self, 'to' + format)()
      File "/home/ankur218/anaconda3/lib/python3.5/site-packages/scipy/sparse/coo.py", line 342, in tocsr
        data = np.empty_like(self.data, dtype=upcast(self.dtype))
      File "/home/ankur218/anaconda3/lib/python3.5/site-packages/scipy/sparse/sputils.py", line 51, in upcast
        raise TypeError('no supported conversion for types: %r' % (args,))
    TypeError: no supported conversion for types: (dtype('O'),)
    
    opened by ansin218 7
  • MLARAM bug

    MLARAM bug " return numpy.array(numpy.matrix(allranks))" raises ValueError: matrix must be 2-dimensional

    Hello,

    when running the following code, I get the error mentioned in the title:

    from skmultilearn.neurofuzzy import MLARAM
    mam = MLARAM(vigilance=0.9, threshold=0.02, neurons=[])
    from sklearn.datasets import make_multilabel_classification
    x, y = make_multilabel_classification(sparse = True, n_labels = 5,
      return_indicator = 'sparse', allow_unlabeled = False)
    mam.fit(x.todense(), y.todense())
    mam.predict(x.todense())
    

    Full traceback:

    
    Traceback (most recent call last):
      File "C:\Program Files\WinPython-32bit-3.6.1.0Qt5\python-3.6.1\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-5-1c3b03743990>", line 1, in <module>
        mam.predict(x.todense())
      File "C:\Program Files\WinPython-32bit-3.6.1.0Qt5\python-3.6.1\lib\site-packages\skmultilearn\neurofuzzy\MLARAMfast.py", line 141, in predict
        ranks = self.predict_proba(X)
      File "C:\Program Files\WinPython-32bit-3.6.1.0Qt5\python-3.6.1\lib\site-packages\skmultilearn\neurofuzzy\MLARAMfast.py", line 233, in predict_proba
        return numpy.array(numpy.matrix(allranks))
      File "C:\Program Files\WinPython-32bit-3.6.1.0Qt5\python-3.6.1\lib\site-packages\numpy\matrixlib\defmatrix.py", line 274, in __new__
        raise ValueError("matrix must be 2-dimensional")
    ValueError: matrix must be 2-dimensional
    

    Also, note that MLARAM does not support sparse matrices contrary to what is mentioned in the comments. As an aside, in the same method, the test

    if len(X) == 0:
        return
    

    fails on sparse matrices and should probably be replaced by something like

            if scipy.sparse.issparse(X):
                if X.getnnz() == 0:
                    return
            elif len(X) == 0:
                return
    

    regards,

    Simon

    opened by simon-m 7
  • Error in the Example of hyperparameter tuning

    I ran the test code from the documentation page http://scikit.ml/api/model_estimation.html#model-estimation about hyper-parameter tuning and got an error in the ensemble classifier case.

    Code:

    from sklearn.datasets import make_multilabel_classification
    from skmultilearn.ensemble.rakeld import RakelD
    from skmultilearn.problem_transform import BinaryRelevance, LabelPowerset
    from sklearn.model_selection import GridSearchCV
    from sklearn.naive_bayes import MultinomialNB
    
    x, y = make_multilabel_classification(sparse=True, n_labels=5,
        return_indicator='sparse', allow_unlabeled=False)
    
    parameters = {
        'labelset_size': range(2, 3),
        'classifier': [LabelPowerset(), BinaryRelevance()],
        'classifier__classifier': [MultinomialNB()],
        'classifier__classifier__alpha': [0.7, 1.0],
    }
    
    clf = GridSearchCV(RakelD(), parameters, scoring='f1_macro')
    clf.fit(x, y)

    Error:

    ValueError: Found input variables with inconsistent numbers of samples: [66, 1]

    opened by jpzhangvincent 7
  • IterativeStratification use in medical and some datasets ValueError: Only one class present in y_true. ROC AUC score is not defined in that case

    This means that for some labels, y[train] contains only the zero class, even though I am sure that each of these labels has at least two positive samples. IterativeStratification does not work well in this case.

    opened by CquptZA 0
  • Random state parameter doomed to fail in IterativeStratification

    Hi there,

    First of all great job on stratification - it is already very useful in our recent project :)

    I encountered a small issue though: I'm trying to use IterativeStratification, and one of its parameters, random_state, seems to be a trap.

    This code works fine:

    import numpy as np
    from skmultilearn.model_selection import IterativeStratification
    test_size = 0.2
    
    stratifier = IterativeStratification(
        n_splits=2, order=2,
        sample_distribution_per_fold=[
            test_size, 1.0 - test_size],
    )
    
    train_indices, test_indices = next(
        stratifier.split(
            X=np.random.random((100,4)), 
            y=(np.random.random((100,4)) > 0.5).astype(int)    
        )
    )
    

    while this

    import numpy as np
    from skmultilearn.model_selection import IterativeStratification
    test_size = 0.2
    
    stratifier = IterativeStratification(
        n_splits=2, order=2,
        sample_distribution_per_fold=[
            test_size, 1.0 - test_size],
        random_state = 42
    )
    
    train_indices, test_indices = next(
        stratifier.split(
            X=np.random.random((100,4)), 
            y=(np.random.random((100,4)) > 0.5).astype(int)    
        )
    )
    

    produces

    ValueError: Setting a random_state has no effect since shuffle is False. You should leave random_state to its default (None), or set shuffle=True.
    

    However, shuffle is hardcoded to False in the superclass call of IterativeStratification:

    https://github.com/scikit-multilearn/scikit-multilearn/blob/e6eabf0062abca4a482d0e24426c61b4788fc6b3/skmultilearn/model_selection/iterative_stratification.py#L184

    opened by kamilc-bst 2
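
The error message comes from a consistency check of the kind sketched below: a base splitter rejects random_state when shuffle is False, so a subclass that hardcodes shuffle=False can never accept a seed. All three classes are simplified, hypothetical stand-ins written for illustration, not the scikit-learn or scikit-multilearn code.

```python
class BaseSplitterSketch:
    """Hypothetical base class that validates shuffle and random_state together."""

    def __init__(self, n_splits, shuffle, random_state):
        if not shuffle and random_state is not None:
            raise ValueError(
                "Setting a random_state has no effect since shuffle is False."
            )
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.random_state = random_state


class HardcodedShuffle(BaseSplitterSketch):
    """Mirrors the reported behaviour: shuffle is fixed to False."""

    def __init__(self, n_splits=3, random_state=None):
        super().__init__(n_splits, shuffle=False, random_state=random_state)


class ExposedShuffle(BaseSplitterSketch):
    """One possible fix: let the caller choose shuffle."""

    def __init__(self, n_splits=3, shuffle=False, random_state=None):
        super().__init__(n_splits, shuffle=shuffle, random_state=random_state)


try:
    HardcodedShuffle(n_splits=2, random_state=42)
    outcome = "accepted"
except ValueError:
    outcome = "rejected"

fixed = ExposedShuffle(n_splits=2, shuffle=True, random_state=42)
print(outcome, fixed.random_state)
```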
  • MLkNN breaks when sklearn >= 1.0

    Hi, dear colleagues, thank you for your work.

    This is a bug report.

    With scikit-learn==0.24.0, this code (X_train is a dense 2D numpy array, y_train is a sparse scipy matrix with the same number of rows) works:

    classifier = MLkNN(k=2)
    classifier.fit(X=X_train, y=y_train)
    

    With scikit-learn==1.0 and scikit-learn==1.1.3, however, I get:

        classifier.fit(X=X_train, y=y_train)
      File "C:\Users\<me>\AppData\Local\Programs\Python\Python38\lib\site-packages\skmultilearn\adapt\mlknn.py", line 218, in fit
        self._cond_prob_true, self._cond_prob_false = self._compute_cond(X, self._label_cache)
      File "C:\Users\<me>\AppData\Local\Programs\Python\Python38\lib\site-packages\skmultilearn\adapt\mlknn.py", line 165, in _compute_cond
        self.knn_ = NearestNeighbors(self.k).fit(X)
    TypeError: __init__() takes 1 positional argument but 2 were given
    

    Thanks.

    opened by alexeyev 0
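
The traceback points at NearestNeighbors(self.k), a positional call. A TypeError of this shape is what Python raises when a constructor accepts its parameters as keyword-only, which would explain why the same code works with one scikit-learn version and fails with another. The sketch reproduces the mechanism with a hypothetical class; passing the value by keyword (n_neighbors= in scikit-learn's NearestNeighbors) works under both conventions.

```python
class KeywordOnlyNeighbors:
    """Hypothetical estimator whose parameters must be passed by keyword."""

    def __init__(self, *, n_neighbors=5):
        self.n_neighbors = n_neighbors


try:
    KeywordOnlyNeighbors(2)  # positional, like NearestNeighbors(self.k)
    message = ""
except TypeError as error:
    message = str(error)

by_keyword = KeywordOnlyNeighbors(n_neighbors=2)  # works under both conventions
print(message)
print(by_keyword.n_neighbors)
```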
  • predict_proba() with LinearSVC as classifier

    Hi, predict_proba() calls self.classifier.predict_proba(), but when the classifier is set to LinearSVC, it does not have predict_proba(); it only has decision_function(). Is there any workaround?

    opened by graceleetr 0
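
scikit-learn's CalibratedClassifierCV can wrap an estimator that only offers decision_function and expose predict_proba, and SVC(probability=True) is another option. As a dependency-free illustration of the underlying idea, the sketch below maps decision scores through a logistic function. The resulting values are uncalibrated scores in (0, 1), not true probabilities, and every class name here is hypothetical.

```python
import math


class ThresholdScorer:
    """Hypothetical margin classifier exposing only decision_function."""

    def decision_function(self, X):
        # Signed distance of the first feature from 0.5.
        return [row[0] - 0.5 for row in X]


class SigmoidProbaWrapper:
    """Adds predict_proba by squashing decision scores with a logistic function."""

    def __init__(self, estimator):
        self.estimator = estimator

    def predict_proba(self, X):
        scores = self.estimator.decision_function(X)
        positive = [1.0 / (1.0 + math.exp(-score)) for score in scores]
        # Columns follow the usual [score for class 0, score for class 1] layout.
        return [[1.0 - p, p] for p in positive]


wrapped = SigmoidProbaWrapper(ThresholdScorer())
proba = wrapped.predict_proba([[0.5], [2.5], [-1.5]])
print(proba)
```

For real use, a calibrated wrapper fitted on held-out data is generally preferable to a fixed logistic squashing like this one.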
  • Add ability to fix RNG state so that folds from IterativeStratification are reproducible

    Currently, it is impossible to pass an RNG seed to IterativeStratification, which makes getting reproducible results from it impossible. This PR exposes the shuffle parameter of the base class in the IterativeStratification constructor. It also makes some changes that allow the CV results to become reproducible.

    Notably, it changes all the np.random.choice calls within IterativeStratification to use the RNG seeded in the constructor (or the global NumPy RNG if the seed is None). It also makes some changes to the _fold_tie_break function to allow it to use the RNG state.

    These changes make the folds produced by IterativeStratification reproducible if one passes random_state to the constructor.

    This should fix #144. I also should mention that credit for investigating the causes of non-reproducibility should go to @VaelK and @blackcat84 (see #144 )

    opened by x0wllaar 1
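
The core of the change described here is replacing calls to a global random source with a generator owned by the object. A minimal standard-library sketch of that pattern follows; TieBreaker is a hypothetical class and uses Python's random.Random rather than NumPy.

```python
import random


class TieBreaker:
    """Hypothetical helper that picks among tied folds using its own RNG."""

    def __init__(self, random_state=None):
        # A private generator: seeding it does not touch the global RNG.
        self._rng = random.Random(random_state)

    def choose(self, candidates):
        return self._rng.choice(candidates)


def run(seed):
    breaker = TieBreaker(random_state=seed)
    return [breaker.choose([0, 1, 2, 3]) for _ in range(10)]


first = run(seed=42)
second = run(seed=42)
print(first == second)
```

With the same seed, two runs make identical choices, which is the property needed for reproducible folds.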
Releases(0.2.0)
  • 0.2.0(Dec 10, 2018)

    A new feature release:

    • first python implementation of multi-label SVM (MLTSVM)
    • a general multi-label embedding framework with several embedders supported (LNEMLC, CLEMS)
    • balanced k-means clusterer from HOMER implemented
    • wrapper for Keras model use in scikit-multilearn
  • 0.1.0(Sep 3, 2018)

    Fix a lot of bugs and generally improve stability, cross-platform functionality standard and unit test coverage. This release has been tested with a large set of unit tests that work across Windows

    Also, new features:

    • multi-label stratification algorithm and stratification quality measures
    • a robust reorganization of label space division, along with a working stochastic blockmodel approach and a new underlying layer: graph builders that allow using graph models for dividing the label space based not just on label co-occurrence but on any kind of network relationships between labels you can come up with
    • meka wrapper works fully cross-platform now, including windows 10
    • multi-label data set downloading and load/save functionality brought in, like sklearn's dataset
    • kNN models support sparse input
    • MLARAM models support sparse input
    • BSD-compatible label space partitioning via NetworkX
    • dependence on GPL libraries made optional
    • working predict_proba added for label space partitioning methods
    • MLARAM moved from neurofuzzy to adapt
    • test coverage increased to 94%
    • Classifier Chains allow specifying the chain order
    • lots of documentation updates
  • 0.0.5(Feb 25, 2017)

    • a general matrix-based label space clusterer has been added which can cluster the output space using any scikit-learn compatible clusterer (incl. k-means)
    • support for more single-class and multi-class classifiers: you can now use problem transformation approaches with your favourite neural network/deep learning libraries: theano, tensorflow, keras, scikit-neuralnetworks
    • support for label powerset based stratified kfold added
    • graph-tool clusterer supports weighted graphs again and includes stochastic blockmodel calibration
    • bugs were fixed in: classifier chains and hierarchical neuro-fuzzy classifiers
  • 0.0.4(Feb 10, 2017)

    • *kNN classifiers support sparse matrices properly
    • support for the new model_selection API from scikit-learn
    • extended graph-based label space clusterers to allow taking the probability of a label occurring alone into consideration
    • compatible with newest graphtool
    • support the case when meka decides that an observation doesn't have any labels assigned
    • HARAM classifier provided by Fernando Benitez from University of Konstanz
    • predict_proba added to problem transformation classifiers
    • ported to python 3
Owner
A multi-label classification library for Python.