Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets.


Adansons Base Document

Product Concept

  • Adansons Base is a data management tool that organizes metadata of unstructured data and creates and organizes datasets.
  • It makes dataset creation more efficient, helps you find essential insights from training results, and improves AI performance.

More detail ↓↓↓

See our product page:

0. Get Access Key

Enter your email address in the form below to join our Slack and receive an access key.

Invitation Form:

1. Installation

Adansons Base includes a Command Line Interface (CLI) and a Python SDK, and you can install both with a single pip command.

pip install git+

Note: if you want to use the CLI from any directory, install Base with the Python that is installed globally on your computer.

2. Configuration

2.1 with CLI

When you run any Base CLI command for the first time, Base asks for the access key provided on our Slack.

Base then verifies that the access key is correct.

If you don't have an access key, see 0. Get Access Key.

For example, this command lists your projects:

base list
Welcome to Adansons Base!!

Let's start with your access key provided on our slack.

Please register your access_key: xxxxxxxxxx

Successfully configured as [email protected]


2.2 Environment Variables

If you don’t want to configure Base interactively, you can set environment variables instead.

BASE_ACCESS_KEY is your access key. BASE_USER_ID identifies the user; it is the email address you submitted via our form.

export BASE_ACCESS_KEY=xxxxxxxxxx
export BASE_USER_ID=xxxx@example.com
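
If you want to check from Python that both variables are visible before running Base, a minimal sketch could look like the following. The helper name `load_base_credentials` is hypothetical and is not part of the Base SDK; only the variable names BASE_ACCESS_KEY and BASE_USER_ID come from this document.

```python
import os


def load_base_credentials(environ=os.environ):
    """Return (access_key, user_id), or None if either variable is missing.

    Hypothetical helper for illustration; Base's own lookup logic may differ.
    """
    access_key = environ.get("BASE_ACCESS_KEY")
    user_id = environ.get("BASE_USER_ID")
    if not access_key or not user_id:
        return None
    return access_key, user_id


# Use an explicit mapping here so the example does not depend on your shell.
creds = load_base_credentials(
    {"BASE_ACCESS_KEY": "xxxxxxxxxx", "BASE_USER_ID": "xxxx@example.com"}
)
print(creds)
```

Calling `load_base_credentials()` with no argument reads the real environment instead.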

3. Tutorial 1: Organize metadata and create a dataset

Let’s start the Base tutorial with the MNIST dataset.

Step 0. Prepare the sample dataset

First, install the dependency needed to download the dataset.

pip install pypng

Then download the MNIST download script from our Base repository.

curl -sSL >

Run the download-mnist script. You can pass any download folder as the last argument (default: “~/dataset/mnist”). On Windows, replace it with a Windows path such as “C:\dataset\mnist”.

python3 ./ ~/dataset/mnist

Note: Base can link data files located anywhere on your local computer, so if you have already downloaded the MNIST dataset, you can use it.

After downloading, you can see the data files in ~/dataset/mnist.

└── dataset
     └── mnist
          ├── train
          │ 	 ├── 0
          │ 	 │   ├── 1.png
          │ 	 │   ├── ...
          │ 	 │   └── 59987.png
          │ 	 ├── ...
          │ 	 └── 9
          └──	test
                ├── 0
                └── ...

Step 1. Create a new project

Create the mnist project with the base new command.

base new mnist
Your Project UID

save Project UID in local file (~/.base/projects)

Base issues a Project Unique ID (UID) and automatically saves it to a local file.

Step 2. Import data files

After Step 0, you have many PNG image files in the “~/dataset/mnist” directory.

Let’s upload the metadata derived from their paths into the mnist project with the base import command.

base import mnist --directory ~/dataset/mnist --extension png --parse "{dataType}/{label}/{id}.png"

Note: if you changed the download folder, replace “~/dataset/mnist” in the command above.

Check datafiles...
found 70000 files with png extension.
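
To make the --parse rule more concrete, here is a small sketch of how a rule such as "{dataType}/{label}/{id}.png" can be matched against the end of a file path to produce metadata. This is only an illustration written with Python's re module; the function names are hypothetical and Base's actual parser may behave differently.

```python
import re


def rule_to_regex(rule):
    """Build a regex with one named group per {key} in the parsing rule."""
    pattern = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", rule):
        pattern += re.escape(rule[pos:m.start()])   # literal text between keys
        pattern += f"(?P<{m.group(1)}>[^/]+)"       # one path segment per key
        pos = m.end()
    pattern += re.escape(rule[pos:])
    return re.compile(pattern + "$")                # anchor at the end of the path


def parse_path(path, rule):
    """Return a dict of metadata extracted from path, or None if it does not match."""
    match = rule_to_regex(rule).search(path)
    return match.groupdict() if match else None


meta = parse_path(
    "/home/xxxx/dataset/mnist/train/0/1.png",
    "{dataType}/{label}/{id}.png",
)
print(meta)  # {'dataType': 'train', 'label': '0', 'id': '1'}
```

Note that every extracted value is a string, which matches the way labels are written as strings in the queries later in this tutorial.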

Step 3. Import external metadata files

If you have external metadata files, you can integrate them into the existing project database with the --external-file option.

This time, we use wrongImagesInMNISTTestset.csv, published on GitHub by youkaichao.

This extra metadata corrects wrong labels in the MNIST test dataset.

With this extra metadata, Base lets you evaluate your model more strictly and correctly.

Download the external CSV:

curl -SL > ~/Downloads/wrongImagesInMNISTTestset.csv
base import mnist --external-file --path ~/Downloads/wrongImagesInMNISTTestset.csv -a dataType:test
1 tables found!
now estimating the rule for table joining...

1 table joining rule was estimated!
Below table joining rule will be applied...

Rule no.1

        key 'index'     ->      connected to 'id' key on exist table
        key 'originalLabel'     ->      connected to 'label' key on exist table
        key 'correction'        ->      newly added

1 tables will be applied
Table 1 sample record:
        {'index': 8, 'originalLabel': 5, 'correction': '-1'}

Do you want to perform table join?
        Base will join tables with that rule described above.

        'y' will be accepted to approve.

        Enter a value: y
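
The joining rule printed above can be pictured with a small in-memory sketch: external columns 'index' and 'originalLabel' are matched against the existing keys 'id' and 'label', and the unmatched column 'correction' is added to the matching record. The function below is hypothetical and is not Base's implementation. Treating `-a dataType:test` as an extra fixed key that a matched record must carry is an assumption, and every record except the sample row shown in the output is made up for illustration.

```python
def join_external(records, rows, key_map, extra):
    """Attach new columns from external rows to the records they match.

    key_map: external column -> existing key (the "connected to" lines).
    extra:   fixed key/value pairs a matched record must also have (assumed).
    Returns the number of rows that found a matching record.
    """
    index = {}
    for rec in records:
        sig = tuple(str(rec[k]) for k in key_map.values())
        sig += tuple(str(rec[k]) for k in extra)
        index[sig] = rec

    joined = 0
    for row in rows:
        sig = tuple(str(row[c]) for c in key_map)
        sig += tuple(str(v) for v in extra.values())
        rec = index.get(sig)
        if rec is None:
            continue
        for col, val in row.items():
            if col not in key_map:      # "newly added" columns
                rec[col] = val
        joined += 1
    return joined


records = [
    {"id": "8", "label": "5", "dataType": "test"},
    {"id": "8", "label": "5", "dataType": "train"},   # hypothetical record
    {"id": "9", "label": "9", "dataType": "test"},    # hypothetical record
]
rows = [{"index": 8, "originalLabel": 5, "correction": "-1"}]

n = join_external(
    records, rows,
    key_map={"index": "id", "originalLabel": "label"},
    extra={"dataType": "test"},
)
print(n, records[0])
```

Only the test record with id 8 and label 5 receives the new 'correction' value; the train record with the same id and label is left untouched.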

Step 4. Filter and export dataset with CLI

Now we are ready to create a dataset.

Let’s pick out the training data files whose label is 1, 2, or 3 from the mnist project with the base search command.

You can use the --conditions option for a magical search filter and the --query option for advanced filtering.

Be careful: without the -s, --summary option, you may get very large output on your console.

(Check the search docs for more information.)

base search mnist --conditions "train" --query "label in ['1','2','3']"

Note: when you use an in or not in query, you have to specify each element as a string in a list without spaces, like “['1','2','3']”.

18831 files

Note: If you specify neither conditions nor a query, Base returns all data files.
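
As a rough mental model of the two filters, the sketch below applies a value-only condition and an "in" query to a few in-memory records. It assumes that --conditions matches against metadata values regardless of key and that several conditions are separated by commas; both are assumptions, the function names are hypothetical, and the records are made up. It is not how Base evaluates searches internally.

```python
def matches_conditions(record, conditions):
    """True if every comma-separated condition equals some metadata value."""
    values = {str(v) for v in record.values()}
    wanted = [c.strip() for c in conditions.split(",") if c.strip()]
    return all(c in values for c in wanted)


def matches_in_query(record, key, allowed):
    """True if the record's value for `key` is one of the allowed strings."""
    return str(record.get(key)) in allowed


# Hypothetical metadata records shaped like the mnist project's keys.
records = [
    {"dataType": "train", "label": "1", "id": "3"},
    {"dataType": "train", "label": "7", "id": "15"},
    {"dataType": "test", "label": "2", "id": "1"},
    {"dataType": "train", "label": "3", "id": "7"},
]

hits = [
    r for r in records
    if matches_conditions(r, "train")
    and matches_in_query(r, "label", ["1", "2", "3"])
]
print(len(hits), "files")
```

With an empty conditions string, `matches_conditions` accepts every record, which mirrors the note that a search without conditions or query returns all data files.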

Step 5. Filter and export dataset with Python SDK

In a Python script, you can filter and export a dataset easily with the Project class and the Files class. (See the SDK docs.)

from base import Project, Dataset

# export dataset as you want to use
project = Project("mnist")
files = project.files(conditions="train", query=["label in ['1','2','3']"])

print(files[0])
# this returns path-like `File` object
# -> '/home/xxxx/dataset/mnist/0/12909.png'
print(files[0].label)
# this returns the value of attribute 'label' of first `File` object
# -> '0'

# preprocess_func is a preprocessing function you define yourself
dataset = Dataset(files, target_key="label", transform=preprocess_func)
x_train, x_test, y_train, y_test = dataset.train_test_split(split_rate=0.2)

# or use with torch
import torch

dataset = Dataset(files, target_key="label", transform=preprocess_func)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

Finally, let’s try one of the most characteristic use cases of Adansons Base.

In the external file you imported in Step 3, some MNIST test data files are annotated as “-1” in the correction column. This means those files are difficult to classify even for humans.

You should therefore exclude those files from your dataset to evaluate your AI models more properly.

# you can exclude files which have "-1" on "correction" with below code
eval_files = project.files(conditions="test", query=["correction != -1"])

print(len(eval_files))
# this returns the number of files matched with requested conditions or query
# -> 9963

eval_dataset = Dataset(eval_files, target_key="label", transform=preprocess_func)
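
Most test files have no 'correction' value at all, and the count of 9963 above suggests they still pass the "correction != -1" query. The sketch below illustrates that reading with plain dictionaries. The helper `not_equal` is hypothetical and only approximates what Base does; apart from the sample row with index 8, the records are invented for the example.

```python
def not_equal(record, key, value):
    """Evaluate 'key != value'; a record without the key counts as a match."""
    return str(record.get(key)) != str(value)


test_records = [
    {"id": "8", "label": "5", "correction": "-1"},   # hard even for humans
    {"id": "9", "label": "9"},                       # hypothetical, no correction
    {"id": "10", "label": "0", "correction": "6"},   # hypothetical relabel
]

kept = [r for r in test_records if not_equal(r, "correction", -1)]
print(len(kept))  # 2
```

Comparing as strings keeps the example consistent with the sample record, where the correction value is stored as the string '-1'.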

4. API Reference

4.1 Command Reference

Command Reference

4.2 Python Reference

Python Reference

  • update README

    close #17


    Make the mnist tutorial code in the README easier to understand.

    Description of the changes

    Write concrete examples of preprocessing functions.


    opened by cv-dote 7
  • _Feature/#93


    close #93


    Change the error message that contained only a status code to an easier-to-understand one.

    Description of the changes

    • Changed the error message in archive_project() in


    opened by cv-dote 2
  • can't operate Files which doesn't have condition attribute.

    Error messages, stack traces, or logs

    We cannot operate on Files objects that have no conditions attribute set.

        413             files.reprtext = files.reprtext + other.reprtext
        414             files.expression += " + " + other.expression
    --> 415             files.conditions = self.conditions + "," + other.conditions
        416             files.query = sorted(
        417                 set([*(self.query), *(other.query)]),
    TypeError: can only concatenate str (not "NoneType") to str

    Steps to reproduce

    I will either change the initial value of conditions from None to '' or stop concatenating conditions and query, because the concatenation is unnecessary.

    opened by YU-SUKETAKAHASHI 2
  • Insert progress bar while base import


    Show the user how much more time it will take to import the data to decrease frustration.


    Show progress bar while importing dataset in CLI. The progress information can be % or anything else.

    opened by sbilxxxx 2
  • Notebook for ImageNet Evaluation

    close #80


    Reproduce the experiment to re-evaluate ImageNet excluding error data.

    Description of the changes

    • Notebook for ImageNet Evaluation
    • error_data.csv


    opened by ShuntaroSuzuki 1
  • add parser.validate_parsing_rule

    close #69


    When the input parsing_rule does not include a {XX} pattern, an error should be printed, but "Success!" is printed instead.

    Description of the changes

    • add Parser.validate_parsing_rule
    • check parsing_rule is valid in Project.add_datafiles


    opened by ShuntaroSuzuki 1
  • Feature Request for `base search --query`

    When I run the base search mnist --query "id <= 1200" command, the values are currently compared in lexical order as str, not as int. So, for example, data with id=10000 is also returned in this case.

    opened by 31159piko-suke 1
  • operated Files object can not filter properly

    Error messages, stack traces, or logs

    I concatenate Files objects.

    project = Project("glia")
    files1 = project.files(conditions="20220418", query=["hour >= 018"], sort_key='hour')
    files2 = project.files(conditions="20220419", sort_key='hour')
    files3 = project.files(conditions="20220420", query=["hour <= 009"], sort_key='hour')
    files = files1 + files2 + files3

    Then I filter the concatenated Files, but it does not work.

    filtered_files = files.filter(query=['hour > 020'])
    >>> 0

    The bug is caused by the .query attribute of the concatenated Files. Because the .query attributes of files1 and files3 are also concatenated, there is no File that satisfies these queries.

    >>>['hour >= 018', 'hour <= 009']

    Steps to reproduce

    ~~I think the concatenated Files should have the empty .query attribute.~~ ~~Files is already queried, so the elements itself has query information.~~ ~~Hence filtered Files don't have to remember its query.~~

    I will change the filter method so that it no longer concatenates queries.

    Before: filtered_files.query = query + self.query

    After: filtered_files.query = query

    opened by YU-SUKETAKAHASHI 1
  • mapping from string to integer does not seem to be working

    The mapping from string to integer does not seem to be working in base Dataset class that creates convert_dict.


    convert_dict={'8': 0, '1': 1, '6': 2, '9': 3, '5': 4, '4': 5, '7': 6, '2': 7, '0': 8, '3': 9}
    opened by 31159piko-suke 1
  • the response of `base show` command is difficult to understand


    base show returns raw data about the keys I imported. It is difficult to understand, and I want a summarized view.

    [email protected] ~ % base show mnist
    projects mnist
    {'LowerValue': '0', 'EditorList': ['[email protected]'], 'Creator': '[email protected]', 'ValueHash': '6dd1c6ef359fc0290897273dfee97dd6d1f277334b9a53f07056500409fd0f3a', 'LastEditor': '[email protected]', 'UpperValue': '59999', 'ValueType': 'str', 'CreatedTime': '1651429889.986235', 'LastModifiedTime': '1651430744.0796146', 'KeyHash': 'a56145270ce6b3bebd1dd012b73948677dd618d496488bc608a3cb43ce3547dd', 'KeyName': 'id', 'RecordedCount': 70000}
    {'LowerValue': '0', 'EditorList': ['[email protected]'], 'Creator': '[email protected]', 'ValueHash': '6dd1c6ef359fc0290897273dfee97dd6d1f277334b9a53f07056500409fd0f3a', 'LastEditor': '[email protected]', 'UpperValue': '59999', 'ValueType': 'int', 'CreatedTime': '1651429889.986235', 'LastModifiedTime': '1651430744.0796146', 'KeyHash': 'a56145270ce6b3bebd1dd012b73948677dd618d496488bc608a3cb43ce3547dd', 'KeyName': 'index', 'RecordedCount': 70000}
    {'LowerValue': '0or6', 'EditorList': ['[email protected]'], 'Creator': '[email protected]', 'ValueHash': '665c5c8dca33d1e21cbddcf524c7d8e19ec4b6b1576bbb04032bdedd8e79d95a', 'LastEditor': '[email protected]', 'UpperValue': '-1', 'ValueType': 'str', 'CreatedTime': '1651430744.0796146', 'LastModifiedTime': '1651430744.0796146', 'KeyHash': '34627e3242f2ca21f540951cb5376600aebba58675654dd5f61e860c6948bffa', 'KeyName': 'correction', 'RecordedCount': 74}
    {'LowerValue': '0', 'EditorList': ['[email protected]'], 'Creator': '[email protected]', 'ValueHash': '0c2fb8f0d59d60a0a5e524c7794d1cf091a377e5c0d3b2cf19324432562555e1', 'LastEditor': '[email protected]', 'UpperValue': '9', 'ValueType': 'str', 'CreatedTime': '1651429889.986235', 'LastModifiedTime': '1651430744.0796146', 'KeyHash': '1aca80e8b55c802f7b43740da2990e1b5735bbb323d93eb5ebda8395b04025e2', 'KeyName': 'label', 'RecordedCount': 70000}
    {'LowerValue': '0', 'EditorList': ['[email protected]'], 'Creator': '[email protected]', 'ValueHash': '0c2fb8f0d59d60a0a5e524c7794d1cf091a377e5c0d3b2cf19324432562555e1', 'LastEditor': '[email protected]', 'UpperValue': '9', 'ValueType': 'int', 'CreatedTime': '1651429889.986235', 'LastModifiedTime': '1651430744.0796146', 'KeyHash': '1aca80e8b55c802f7b43740da2990e1b5735bbb323d93eb5ebda8395b04025e2', 'KeyName': 'originalLabel', 'RecordedCount': 70000}
    {'LowerValue': 'test', 'EditorList': ['[email protected]'], 'Creator': '[email protected]', 'ValueHash': '0e546bb01e2c9a9d1c388fca8ce3fabdde16084aee10c58becd4767d39f62ab7', 'LastEditor': '[email protected]', 'UpperValue': 'train', 'ValueType': 'str', 'CreatedTime': '1651429889.986235', 'LastModifiedTime': '1651430744.0796146', 'KeyHash': '9c98c4cbd490df10e7dc42f441c72ef835e3719d147241e32b962a6ff8c1f49d', 'KeyName': 'dataType', 'RecordedCount': 70000}
    opened by kenichihiguchi 1
  • No support for Japanese external files.

    Before using the post method, we should encode the data to utf8 like below at

    data = data.encode('utf-8')
    res =, json.dumps(data), headers=HEADER)
    opened by ynntech 1
  •  Feature Request for `base search --condition ` command


    When I pass a value that does not exist with the base search --condition something command, I currently get all of the file information. I want a response saying that there is no value "something".

    enhancement good first issue 
    opened by ynntech 0
  • Explain behavior when multiple `--query` given in `base search`


    When you give multiple --query options in base search, you get the intersection of the given queries as the result.
    (E.g.: base search mnist --conditions "test" --query "correction == -1" --query "label in ['1','2','3']") Add a description of this to the docs.

    opened by kuriyan1204 0
  • v0.1.2(Jun 11, 2022)

    What's Changed

    improve features

    • able to specify original table join rule with base import --external-file command
      • if the estimated rule is not correct, you can select "m" to download a definition YML file
    • add base import --external-file --extract suboption to get structured and extracted table as CSV
    • add base import --external-file --estimate-rule suboption to preview estimated table join rule
    • able to filter missing values with the query "Key is None"

    fix bugs

    • CSV export error with base search [PROJECT] --export CSV command
    • and some bugs

    and update documents


    • remove attributes(conditions and query) and fix bugs by @31159piko-suke in
    • Feature/#61 by @31159piko-suke in
    • enabled export csv file by search --export csv command by @31159piko-suke in
    • Update jupyternotebook :Consistent with by @ynntech in
    • fixed path specification error with search --export by @31159piko-suke in
    • Enabled evaluate number as int in query by @31159piko-suke in
    • add parser.validate_parsing_rule by @ShuntaroSuzuki in
    • enable specify table joining rule by base import by @31159piko-suke in
    • solved issue#36 by @31159piko-suke in
    • v0.1.2 by @kenichihiguchi in

    New Contributors

    • @ShuntaroSuzuki made their first contribution in

  • v0.1.1(May 18, 2022)

    What's Changed

    improve features

    • update the output of base show [PROJECT] command to know what keys in the project easily
    • create a progress bar at datafile import command
    • support + and | operators with base.files.Files() class

    fix bugs

    • crash when importing external files that include Japanese
    • one-hot vector mapping doesn't work well on base.dataset.Dataset() class (this feature will be temporarily removed)

    and update documents


    • update README by @cv-dote in
    • fixed link for SDK docs by @kenichihiguchi in
    • add link to medium by @kenichihiguchi in
    • Feature/#16 by @kuriyan1204 in
    • Update filename in tutorial notebook by @ynntech in
    • Support Japanese by @ynntech in
    • create actions yml file for dev and main branch by @31159piko-suke in
    • temporarily removed convert dict and onehot vector by @31159piko-suke in
    • make it possible to check progress in base import by @31159piko-suke in
    • Supported + and | operators for Files by @YU-SUKETAKAHASHI in
    • Added .metadata attr to File by @YU-SUKETAKAHASHI in
    • Fixed error statements when parsing fails. by @YU-SUKETAKAHASHI in
    • Feature/#32 by @ynntech in
    • added description for Files and Dataset by @31159piko-suke in
    • update base show output to know keys on metadata DB easily by @kenichihiguchi in
    • v0.1.1 by @ynntech in
    • increment version 0.1.0 -> 0.1.1 by @kenichihiguchi in
    • v0.1.1 by @kenichihiguchi in

    New Contributors

    • @kuriyan1204 made their first contribution in
    • @31159piko-suke made their first contribution in
    • @YU-SUKETAKAHASHI made their first contribution in

  • v0.1.0(Apr 25, 2022)
