Predictive AI layer for existing databases.

Overview

MindsDB

MindsDB workflow Python supported PyPi Version PyPi Downloads MindsDB Community MindsDB Website

MindsDB is an open-source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning models using SQL queries. Tweet

Predictive AI layer for existing databases
MindsDB

Try it out

Contributing

To contribute to mindsdb, please check out our Contribution guide.

Current contributors

Made with contributors-img.

Report Issues

Please help us by reporting any issues you may have while using MindsDB.

License

Comments
  • [Bug]: Exception

    [Bug]: Exception "Lightgbm mixer not supported for type: tags" when following process plant tutorial

    Is there an existing issue for this?

    • [x] I have searched the existing issues

    Current Behaviour

    I'm following the Manufacturing process quality tutorial and during training I get the error Exception: Lightgbm mixer not supported for type: tags, raised at: /opt/conda/lib/python3.7/site-packages/mindsdb/interfaces/model/learn_process.py#177.

    Expected Behaviour

    I'd expect AutoML to train the model successfully

    Steps To Reproduce

    Follow the tutorial steps until training.
    

    Anything else?

    The tutorials seem to be of mixed quality, some resulting in errors, some in low model performance or "null" predictions (for the bus ride tutorial).

    bug documentation 
    opened by philfuchs 22
  • install issues on windows 10

    install issues on windows 10

    Describe the bug When installing mindsdb, the following error message is output with the following command.

    command: pip install --requirement reqs.txt

    error message: ERROR: Could not find a version that satisfies the requirement torch>=1.0.1.post2 (from lightwood==0.6.4->-r reqs.txt (line 25)) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2) ERROR: No matching distribution found for torch>=1.0.1.post2 (from lightwood==0.6.4->-r reqs.txt (line 25))

    Desktop (please complete the following information):

    • OS: windows 10

    Additional context I think pytorch for Windows is currently not available through PYPI, so using the commands in https://pytorch.org is a better way.

    bug 
    opened by YottaGin 22
  • Integration merlion issue2377

    Integration merlion issue2377

    1. Modify sql_query.py to support to define customized return columns and dtype of ml_handlers;
    2. [issue2377] Merlion integrated, forecaster: default, sarima, prophet, mses, detector: default, isolation forest, windstats, prophet;
    3. Corresponding test cases added, /mindsdb/tests/unit/test_merlion_handler.
    opened by rfsiten 21
  • [Bug]:  metadata-generation-failed

    [Bug]: metadata-generation-failed

    Is there an existing issue for this?

    • [X] I have searched the existing issues

    Current Behavior

    I keep on trying to pip install mindsdb. Screenshot 2022-05-10 11 13 31

    This is the message that it comes up with.

    Things tried:

    • Installing wheel
    • Upgrading pip
    • Installing sktime
    • googling/StackOverflow

    Expected Behavior

    I anticipated that I would be able to download the package.

    Steps To Reproduce

    pip install mindsdb
    

    Anything else?

    No response

    bug 
    opened by TyanBr 20
  • [Bug]: Not able to preview data due to internal server error HTTP 500

    [Bug]: Not able to preview data due to internal server error HTTP 500

    Is there an existing issue for this?

    • [X] I have searched the existing issues

    Current Behavior

    I was trying the mindsdb get started tutorial, that time i encountered HTTP 500 error, is it due to URL authorization scope not found ? or anything else. I tried googling why this happens but couldn't get a proper explanation. I can't do further steps it keeps on giving me a HTTP 500 error. Is it a common issue, coz i keep getting it on the website while running the query for the tutorial and can't preview my tables. Says that the query is not proper, i ran the exact code from tutorial. Is it is some issue from my side...anyway do help me overcome it. I really like mindsdb concept and an AI/ML enthusiast :)

    Here is the ss of what im getting while running the queries:

    Screenshot 2022-07-14 012410

    It says query with error but i am running the code from the tutorial the syntax is intact. Is there any changes to be made in the syntax?

    Expected Behavior

    I should be able to preview the home rentals data that i will train in the further steps.

    Steps To Reproduce

    Go to the mindsdb website and then start the home rentals demo tutorial. Run the demo code given in the step 1. Try to preview the data
    

    Anything else?

    nothing

    bug 
    opened by prathikshetty2002 19
  • Could not load module ModelInterface

    Could not load module ModelInterface

    hello can someone help to solve this :

    • ERROR:mindsdb-logger-f7442ec0-574d-11ea-a55e-106530eaf271:c:\users\drpbengrir\appdata\local\programs\python\python37\lib\site-packages\mindsdb\libs\controllers\transaction.py:188 - Could not load module ModelInterface
    bug 
    opened by simofilahi 18
  • FileNotFoundError: [Errno 2] No such file or directory: '/home/milia/.venvs/mindsdb/lib/python3.6/site-packages/mindsdb_storage/1_0_5/suicide_rates_light_model_metadata.pickle'

    FileNotFoundError: [Errno 2] No such file or directory: '/home/milia/.venvs/mindsdb/lib/python3.6/site-packages/mindsdb_storage/1_0_5/suicide_rates_light_model_metadata.pickle'

    Describe the bug A FileNotFoundError occurs when running the predict.py script described below.

    The full traceback is the following:

    Traceback (most recent call last):
      File "predict.py", line 12, in <module>
        result = Predictor(name='suicide_rates').predict(when={'country':'Greece','year':1981,'sex':'male','age':'35-54','population':300000})
      File "/home/milia/.venvs/mindsdb/lib/python3.6/site-packages/mindsdb/libs/controllers/predictor.py", line 472, in predict
        transaction = Transaction(session=self, light_transaction_metadata=light_transaction_metadata, heavy_transaction_metadata=heavy_transaction_metadata, breakpoint=breakpoint)
      File "/home/milia/.venvs/mindsdb/lib/python3.6/site-packages/mindsdb/libs/controllers/transaction.py", line 53, in __init__
        self.run()
      File "/home/milia/.venvs/mindsdb/lib/python3.6/site-packages/mindsdb/libs/controllers/transaction.py", line 259, in run
        self._execute_predict()
      File "/home/milia/.venvs/mindsdb/lib/python3.6/site-packages/mindsdb/libs/controllers/transaction.py", line 157, in _execute_predict
        with open(CONFIG.MINDSDB_STORAGE_PATH + '/' + self.lmd['name'] + '_light_model_metadata.pickle', 'rb') as fp:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/milia/.venvs/mindsdb/lib/python3.6/site-packages/mindsdb_storage/1_0_5/suicide_rates_light_model_metadata.pickle'
    

    To Reproduce Steps to reproduce the behavior:

    1. Create a train.py script using the dataset: https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016#master.csv. The train.py script is the one below:
    from mindsdb import Predictor
    
    Predictor(name='suicide_rates').learn(
        to_predict='suicides_no', # the column we want to learn to predict given all the data in the file
        from_data="master.csv" # the path to the file where we can learn from, (note: can be url)
    )
    
    1. Run the train.py script.
    2. Create and run the predict.py script:
    from mindsdb import Predictor
    
    # use the model to make predictions
    result = Predictor(name='suicide_rates').predict(when={'country':'Greece','year':1981,'sex':'male','age':'35-54','population':300000})
    
    # you can now print the results
    print(result)
    
    1. See error

    Expected behavior What was expected was to see the results.

    Desktop (please complete the following information):

    • OS: Ubuntu 18.04.2 LTS
    • mindsdb 1.0.5
    • python 3.6.7
    bug 
    opened by mlliarm 18
  • MySQL / Singlestore DB SSL support

    MySQL / Singlestore DB SSL support

    Problem I cannot connect my Singlestore DB (MySQL driver) because Mindsdb doesn't support SSL options.

    Describe the solution you'd like Full support for MySQL SSL (key, cert, ca).

    Describe alternatives you've considered No alternative is possible at the moment, security first.

    enhancement question 
    opened by pierre-b 17
  • Caching historical data for streams

    Caching historical data for streams

    I'll be using an example here and generalize whenever needed, let's say we have the following dataset we train a timeseries predictor on:

    time,gb,target,aux
    1     ,A, 7,        foo
    2     ,A, 10,      foo
    3     ,A,  12,     bar
    4     ,A,  14,     bar
    2     ,B,  5,       foo
    4     ,B,  9,       foo
    

    In this case target is what we are predicting, gb is the column we are grouping on and we are ordering by time. aux is an unrelated column that's not timeseries in nature and just used "normally".

    We train a predictor with a window of n

    Then let's say we have an input stream that looks something like this:

    time, gb, target, aux
    6,      A ,   33,     foo
    7,      A,    54,     foo
    

    Caching

    First, we will need to store, for each value of the column gb n recrods.

    So, for example, if n==1 we would save the last row in the data above, if n==2 we would save both, whem new rows come in, we un-cache the older rows`.

    Infering

    Second, when a new datapoint comes into the input stream we'll need to "infer" that the prediction we have to make is actually for the "next" datapoint. Which is to say that when: 7, A, 54, foo comes in we need to infer that we need to actually make predictions for:

    8, A, <this is what we are predicting>, foo

    The challenge here is how do we infer that the next timestamp is 8, one simple way to do this is to just subtract from the previous record, but that's an issue for the first observation (since we don't have a previous record to substract from, unless we cache part of the training data) or we could add a feature to native, to either:

    a) Provide a delta argument for each group by (representing by how much we increment the order column[s]) b) Have an argument when doing timeseries prediction that tells it to predict for the "next" row and then do the inferences under the cover.

    @paxcema let me know which of these features would be easy to implement in native, since you're now the resident timeseries expert.

    enhancement 
    opened by George3d6 16
  • Data extraction with mindsdb v.2

    Data extraction with mindsdb v.2

    This is a followup to #334, which was about the same use case and dataset, but different version of MindsDB with different errors.

    Your Environment

    Google Colab.

    • Python version: 3.6
    • Pip version: 19.3.1
    • Mindsdb version you tried to install: 2.13.8

    Describe the bug Running .learn() fails.

    [nltk_data]   Package stopwords is already up-to-date!
    
    /usr/local/lib/python3.6/dist-packages/lightwood/mixers/helpers/ranger.py:86: UserWarning: This overload of addcmul_ is deprecated:
    	addcmul_(Number value, Tensor tensor1, Tensor tensor2)
    Consider using one of the following signatures instead:
    	addcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
      exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
    
    Downloading: 100%
    232k/232k [00:00<00:00, 1.31MB/s]
    
    
    Downloading: 100%
    442/442 [00:06<00:00, 68.0B/s]
    
    
    Downloading: 100%
    268M/268M [00:06<00:00, 43.1MB/s]
    
    
    Token indices sequence length is longer than the specified maximum sequence length for this model (606 > 512). Running this sequence through the model will result in indexing errors
    ERROR:mindsdb-logger-ac470732-3303-11eb-bbe9-0242ac1c0002---eb4b7352-566f-4a1b-aef2-c286163e1a10:/usr/local/lib/python3.6/dist-packages/mindsdb_native/libs/controllers/transaction.py:173 - Could not load module ModelInterface
    
    ERROR:mindsdb-logger-ac470732-3303-11eb-bbe9-0242ac1c0002---eb4b7352-566f-4a1b-aef2-c286163e1a10:/usr/local/lib/python3.6/dist-packages/mindsdb_native/libs/controllers/transaction.py:239 - index out of range in self
    
    ---------------------------------------------------------------------------
    
    IndexError                                Traceback (most recent call last)
    
    <ipython-input-13-a5e3bd095e46> in <module>()
          7 mdb.learn(
          8     from_data=train,
    ----> 9     to_predict='birth_year' # the column we want to learn to predict given all the data in the file
         10 )
    
    20 frames
    
    /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
       1812         # remove once script supports set_grad_enabled
       1813         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
    -> 1814     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
       1815 
       1816 
    
    IndexError: index out of range in self
    

    https://colab.research.google.com/drive/1a6WRSoGK927m3eMkVdlwBhW6-3BtwaAO?usp=sharing

    To Reproduce

    1. Rerun this notebook on Google Colab https://github.com/opendataby/vybary2019/blob/e08c32ac51e181ddce166f8a4fbf968f81bd2339/canal03-parsing-with-mindsdb.ipynb
    bug 
    opened by abitrolly 15
  • Upload new Predictor

    Upload new Predictor

    Your Environment Scout The Scout upload Prediction does not function when a zip file is trying to upload.

    Errror is "Just .zip files are allowed" even though the zip file was selected. 
    
    

    Question on prediction, is there any way to upload a model of Tensorflow, or do we need to Convert TensorFlow model into the MindDB prediction model?

    bug question 
    opened by Winthan 15
  • [Bug]: TS dates not interpreted correctly

    [Bug]: TS dates not interpreted correctly

    Is there an existing issue for this?

    • [X] I have searched the existing issues

    Current Behavior

    Predictions from a timeseries model (using >LATEST) gives dates starting Dec 1st, however there are Dec dates (e.g. 6th) within the training data.

    Expected Behavior

    Predictions using ">LATEST" should start 1 day after the latest date in the training data.

    Steps To Reproduce

    CREATE MODEL mindsdb.callsdata_time
    FROM files
    (SELECT * from CallData)
    PREDICT CallsOfferedTo5
    ORDER BY DateTime
    GROUP BY PrecisionQueue
    WINDOW 8    
    HORIZON 4;
    
    
    SELECT m.DateTime as date,
      m.CallsOfferedTo5 as forecast
      FROM mindsdb.callsdata_time as m
      JOIN files.CallData as t
      WHERE t.DateTime > LATEST
      AND t.PrecisionQueue = 5000;
    
    
    
    ### Anything else?
    
    **Training data**
    [export (3) (1).csv](https://github.com/mindsdb/mindsdb/files/10335508/export.3.1.csv)
    
    **Example of dates in Dec**
    ![image](https://user-images.githubusercontent.com/34073127/210332080-d2471ece-5c3a-4a3c-942c-ea1e41a2d4cd.png)
    
    **Model output**
    ![image](https://user-images.githubusercontent.com/34073127/210332293-418f2a5a-c6d7-4bba-98a7-1effb8e1bca2.png)
    
    bug 
    opened by tomhuds 1
  • [ENH] Override dtypes - LW handler

    [ENH] Override dtypes - LW handler

    Description

    Enables support for overriding automatically inferred data types when using the Lightwood handler.

    Type of change

    (Please delete options that are not relevant)

    • [ ] 🐛 Bug fix (non-breaking change which fixes an issue)
    • [x] ⚡ New feature (non-breaking change which adds functionality)
    • [ ] 📢 Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [x] 📄 This change requires a documentation update

    What is the solution?

    If the user provides:

    CREATE ...
    USING 
    dtype_dict = {
       'col1': 'type',
        ...
    };
    

    The handler will extract this mapping and use it to build a modified problem definition, which will trigger the default encoder belonging to each new dtype (note: manually specifying encoders is out of scope for this PR and will be added at a later date).

    Checklist:

    • [x] My code follows the style guidelines(PEP 8) of MindsDB.
    • [x] I have commented my code, particularly in hard-to-understand areas.
    • [ ] I have updated the documentation, or created issues to update them.
    • [x] I fixed|updated|added unit tests and integration tests for each feature (if applicable).
    • [x] I have checked that my code additions will fail neither code linting checks nor unit test.
    • [ ] I have shared a short loom video or screenshots demonstrating any new functionality.
    opened by paxcema 0
  • updated all links

    updated all links

    Description

    Please include a summary of the change and the issue it solves.

    Fixes #4263 Fixes copied modified content of databases.mdx into its copy databases2.mdx Fixes removed https://docs.mindsdb.com from links

    Type of change

    (Please delete options that are not relevant)

    • [ ] 🐛 Bug fix (non-breaking change which fixes an issue)
    • [ ] ⚡ New feature (non-breaking change which adds functionality)
    • [ ] 📢 Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [x] 📄 Documentation update

    What is the solution?

    (Describe at a high level how the feature was implemented) As in Description.

    Checklist:

    • [ ] My code follows the style guidelines(PEP 8) of MindsDB.
    • [ ] I have commented my code, particularly in hard-to-understand areas.
    • [x] I have updated the documentation.
    • [ ] I fixed|updated|added unit tests and integration tests for each feature (if applicable).
    • [ ] I have checked that my code additions will fail neither code linting checks nor unit test.
    • [ ] I have shared a short loom video or screenshots demonstrating any new functionality.
    documentation 
    opened by martyna-mindsdb 0
  • updated links

    updated links

    Description

    Please include a summary of the change and the issue it solves.

    Fixes #4261

    Type of change

    (Please delete options that are not relevant)

    • [ ] 🐛 Bug fix (non-breaking change which fixes an issue)
    • [ ] ⚡ New feature (non-breaking change which adds functionality)
    • [ ] 📢 Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [x] 📄 Documentation update

    What is the solution?

    Updated file names in links wherever necessary.

    Checklist:

    • [ ] My code follows the style guidelines(PEP 8) of MindsDB.
    • [ ] I have commented my code, particularly in hard-to-understand areas.
    • [x] I have updated the documentation.
    • [ ] I fixed|updated|added unit tests and integration tests for each feature (if applicable).
    • [ ] I have checked that my code additions will fail neither code linting checks nor unit test.
    • [ ] I have shared a short loom video or screenshots demonstrating any new functionality.
    documentation 
    opened by martyna-mindsdb 0
  • Predictions for ranges/sets of values

    Predictions for ranges/sets of values

    Is there an existing issue for this?

    • [X] I have searched the existing issues

    Is your feature request related to a problem? Please describe.

    I'd like to retrieve a prediction filtering over a range of values rather than only individual exact values. Example: select rental_price, rental_price_explain from mindsdb.home_rentals_model where sqft between 1000 and 1200 and neighborhood in ('berkeley_hills', 'westbrae', 'downtown');

    This currently generates an error: Only 'and' and '=' operations allowed in WHERE clause, found: BetweenOperation(op='between', args=( Identifier(parts=['sqft']), Constant(value=1000), Constant(value=1200) )

    Describe the solution you'd like.

    I'd like to have the ability to use filter criteria such as "in", "between", or "or" logic to retrieve a single prediction.

    Describe an alternate solution.

    No response

    Anything else? (Additional Context)

    https://mindsdbcommunity.slack.com/archives/C01S2T35H18/p1672320517307839

    opened by rbkrejci 0
Releases(v22.12.4.3)
[ICML 2021] Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data

Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data This repo provides the source code & data of our paper: Break-It-Fix-It: Unsupervised

Michihiro Yasunaga 86 Nov 30, 2022
This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (ICLR 2022)

Equivariant Subgraph Aggregation Networks (ESAN) This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (IC

Beatrice Bevilacqua 59 Dec 13, 2022
Indices Matter: Learning to Index for Deep Image Matting

IndexNet Matting This repository includes the official implementation of IndexNet Matting for deep image matting, presented in our paper: Indices Matt

Hao Lu 357 Nov 26, 2022
Self-Supervised depth kalilia

Self-Supervised depth kalilia

24 Oct 15, 2022
Evaluating AlexNet features at various depths

Linear Separability Evaluation This repo provides the scripts to test a learned AlexNet's feature representation performance at the five different con

Yuki M. Asano 32 Dec 30, 2022
⚡️Optimizing einsum functions in NumPy, Tensorflow, Dask, and more with contraction order optimization.

Optimized Einsum Optimized Einsum: A tensor contraction order optimizer Optimized einsum can significantly reduce the overall execution time of einsum

Daniel Smith 653 Dec 30, 2022
Few-Shot Object Detection via Association and DIscrimination

Few-Shot Object Detection via Association and DIscrimination Code release of our NeurIPS 2021 paper: Few-Shot Object Detection via Association and DIs

Cao Yuhang 49 Dec 18, 2022
An alarm clock coded in Python 3 with Tkinter

Tkinter-Alarm-Clock An alarm clock coded in Python 3 with Tkinter. Run python3 Tkinter Alarm Clock.py in a terminal if you have Python 3. NOTE: This p

CodeMaster7000 1 Dec 25, 2021
Geometric Algebra package for JAX

JAXGA - JAX Geometric Algebra GitHub | Docs JAXGA is a Geometric Algebra package on top of JAX. It can handle high dimensional algebras by storing onl

Robin Kahlow 36 Dec 22, 2022
Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Aspect-level Sentiment Classification Code and dataset for ACL2018 [paper] ‘‘Exploiting Document Knowledge for Aspect-level Sentiment Classification’’

Ruidan He 146 Nov 29, 2022
Official PyTorch implementation of paper: Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation (ICCV 2021 Oral Presentation)

SML (ICCV 2021, Oral) : Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Standardi

SangHun 61 Dec 27, 2022
BARTScore: Evaluating Generated Text as Text Generation

This is the Repo for the paper: BARTScore: Evaluating Generated Text as Text Generation Updates 2021.06.28 Release online evaluation Demo 2021.06.25 R

NeuLab 196 Dec 17, 2022
An NLP library with Awesome pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications.

简体中文 | English News [2021-10-12] PaddleNLP 2.1版本已发布!新增开箱即用的NLP任务能力、Prompt Tuning应用示例与生成任务的高性能推理! 🎉 更多详细升级信息请查看Release Note。 [2021-08-22]《千言:面向事实一致性的生

6.9k Jan 01, 2023
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Core ML Tools Use coremltools to convert machine learning models from third-party libraries to the Core ML format. The Python package contains the sup

Apple 3k Jan 08, 2023
1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection tool

yuxzho 94 Dec 25, 2022
DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition, TPAMI 2021

DVG-Face: Dual Variational Generation for HFR This repo is a PyTorch implementation of DVG-Face: Dual Variational Generation for Heterogeneous Face Re

52 Dec 30, 2022
百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline

项目说明: 百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline 比赛链接:https://aistudio.baidu.com/aistudio/competition/detail/66?isFromLuge=true 官方的baseline版本是基于paddlepadd

周俊贤 54 Nov 23, 2022
An implementation on "Curved-Voxel Clustering for Accurate Segmentation of 3D LiDAR Point Clouds with Real-Time Performance"

Lidar-Segementation An implementation on "Curved-Voxel Clustering for Accurate Segmentation of 3D LiDAR Point Clouds with Real-Time Performance" from

Wangxu1996 135 Jan 06, 2023
LETR: Line Segment Detection Using Transformers without Edges

LETR: Line Segment Detection Using Transformers without Edges Introduction This repository contains the official code and pretrained models for Line S

mlpc-ucsd 157 Jan 06, 2023
《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

The Most Important Thing. Our code is developed based on: LXMERT: Learning Cross-Modality Encoder Representations from Transformers

53 Dec 16, 2022