Simple but maybe too simple config management through python data classes. We use it for machine learning.

Overview

👩‍✈️ Coqpit

CI

Simple, light-weight and no dependency config handling through python data classes with to/from JSON serialization/deserialization.

Currently it is being used by 🐸 TTS.

Why I need this

What I need from a ML configuration library...

  1. Fixing a general config schema in Python to guide users about expected values.

    Python is good but not universal. Sometimes you train a ML model and use it on a different platform. So, you need your model configuration file importable by other programming languages.

  2. Simple dynamic value and type checking with default values.

    If you are a beginner in a ML project, it is hard to guess the right values for your ML experiment. Therefore it is important to have some default values and know what range and type of input are expected for each field.

  3. Ability to decompose large configs.

    As you define more fields for the training dataset, data preprocessing, model parameters, etc., your config file tends to get quite large but in most cases, they can be decomposed, enabling flexibility and readability.

  4. Inheritance and nested configurations.

    Simply helps to keep configurations consistent and easier to maintain.

  5. Ability to override values from the command line when necessary.

    For instance, you might need to define a path for your dataset, and this changes for almost every run. Then the user should be able to override this value easily over the command line.

    It also allows easy hyper-parameter search without changing your original code. Basically, you can run different models with different parameters just using command line arguments.

  6. Defining dynamic or conditional config values.

    Sometimes you need to define certain values depending on the other values. Using python helps to define the underlying logic for such config values.

  7. No dependencies

    You don't want to install a ton of libraries for just configuration management. If you install one, then it is better to be just native python.

🔍 Examples

👉 Simple Coqpit

import os
from dataclasses import asdict, dataclass, field

from coqpit.coqpit import MISSING, Coqpit, check_argument


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_d: float = 10.21
    val_c: str = "Coqpit is great!"
    # mandatory field
    # raise an error when accessing the value if it is not changed. It is a way to define
    val_k: int = MISSING
    # optional field
    val_dict: dict = field(default_factory=lambda: {"val_aa": 10, "val_ss": "This is in a dict."})
    # list of list
    val_listoflist: List[List] = field(default_factory=lambda: [[1, 2], [3, 4]])
    val_listofunion: List[List[Union[str]]] = field(default_factory=lambda: [[1, 3], [1, "Hi!"]])

    def check_values(
        self,
    ):  # you can define explicit constraints on the fields using `check_argument()`
        """Check config fields"""
        c = asdict(self)
        check_argument("val_a", c, restricted=True, min_val=10, max_val=2056)
        check_argument("val_b", c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument("val_c", c, restricted=True)


if __name__ == "__main__":
    file_path = os.path.dirname(os.path.abspath(__file__))
    config = SimpleConfig()

    # try MISSING class argument
    try:
        k = config.val_k
    except AttributeError:
        print(" val_k needs a different value before accessing it.")
    config.val_k = 1000

    # try serialization and deserialization
    print(config.serialize())
    print(config.to_json())
    config.save_json(os.path.join(file_path, "example_config.json"))
    config.load_json(os.path.join(file_path, "example_config.json"))
    print(config.pprint())

    # try `dict` interface
    print(*config)
    print(dict(**config))

    # value assignment by mapping
    config["val_a"] = -999
    print(config["val_a"])
    assert config.val_a == -999

👉 Serialization

import os
from dataclasses import asdict, dataclass, field
from coqpit import Coqpit, check_argument
from typing import List, Union


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_c: str = "Coqpit is great!"

    def check_values(self,):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_a', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_b', c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument('val_c', c, restricted=True)


@dataclass
class NestedConfig(Coqpit):
    val_d: int = 10
    val_e: int = None
    val_f: str = "Coqpit is great!"
    sc_list: List[SimpleConfig] = None
    sc: SimpleConfig = SimpleConfig()
    union_var: Union[List[SimpleConfig], SimpleConfig] = field(default_factory=lambda: [SimpleConfig(),SimpleConfig()])

    def check_values(self,):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_d', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_e', c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument('val_f', c, restricted=True)
        check_argument('sc_list', c, restricted=True, allow_none=True)
        check_argument('sc', c, restricted=True, allow_none=True)


if __name__ == '__main__':
    file_path = os.path.dirname(os.path.abspath(__file__))
    # init 🐸 dataclass
    config = NestedConfig()

    # save to a json file
    config.save_json(os.path.join(file_path, 'example_config.json'))
    # load a json file
    config2 = NestedConfig(val_d=None, val_e=500, val_f=None, sc_list=None, sc=None, union_var=None)
    # update the config with the json file.
    config2.load_json(os.path.join(file_path, 'example_config.json'))
    # now they should be having the same values.
    assert config == config2

    # pretty print the dataclass
    print(config.pprint())

    # export values to a dict
    config_dict = config.to_dict()
    # crate a new config with different values than the defaults
    config2 = NestedConfig(val_d=None, val_e=500, val_f=None, sc_list=None, sc=None, union_var=None)
    # update the config with the exported valuess from the previous config.
    config2.from_dict(config_dict)
    # now they should be having the same values.
    assert config == config2

👉 argparse handling and parsing.

import argparse
import os
from dataclasses import asdict, dataclass, field
from typing import List

from coqpit.coqpit import Coqpit, check_argument
import sys


@dataclass
class SimplerConfig(Coqpit):
    val_a: int = field(default=None, metadata={'help': 'this is val_a'})


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = field(default=10,
                       metadata={'help': 'this is val_a of SimpleConfig'})
    val_b: int = field(default=None, metadata={'help': 'this is val_b'})
    val_c: str = "Coqpit is great!"
    mylist_with_default: List[SimplerConfig] = field(
        default_factory=lambda:
        [SimplerConfig(val_a=100),
         SimplerConfig(val_a=999)],
        metadata={'help': 'list of SimplerConfig'})

    # mylist_without_default: List[SimplerConfig] = field(default=None, metadata={'help': 'list of SimplerConfig'})  # NOT SUPPORTED YET!

    def check_values(self, ):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_a', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_b',
                       c,
                       restricted=True,
                       min_val=128,
                       max_val=4058,
                       allow_none=True)
        check_argument('val_c', c, restricted=True)


def main():
    # initial config
    config = SimpleConfig()
    print(config.pprint())

    # reference config that we like to match with the config above
    config_ref = SimpleConfig(val_a=222,
                              val_b=999,
                              val_c='this is different',
                              mylist_with_default=[
                                  SimplerConfig(val_a=222),
                                  SimplerConfig(val_a=111)
                              ])

    # create and init argparser with Coqpit
    parser = argparse.ArgumentParser()
    parser = config.init_argparse(parser)
    parser.print_help()
    args = parser.parse_args()

    # parse the argsparser
    config.parse_args(args)
    config.pprint()
    # check the current config with the reference config
    assert config == config_ref


if __name__ == '__main__':
    sys.argv.extend(['--coqpit.val_a', '222'])
    sys.argv.extend(['--coqpit.val_b', '999'])
    sys.argv.extend(['--coqpit.val_c', 'this is different'])
    sys.argv.extend(['--coqpit.mylist_with_default.0.val_a', '222'])
    sys.argv.extend(['--coqpit.mylist_with_default.1.val_a', '111'])
    main()

🤸‍♀️ Merging coqpits

import os
from dataclasses import dataclass
from coqpit.coqpit import Coqpit, check_argument


@dataclass
class CoqpitA(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_d: float = 10.21
    val_c: str = "Coqpit is great!"


@dataclass
class CoqpitB(Coqpit):
    val_d: int = 25
    val_e: int = 257
    val_f: float = -10.21
    val_g: str = "Coqpit is really great!"


if __name__ == '__main__':
    file_path = os.path.dirname(os.path.abspath(__file__))
    coqpita = CoqpitA()
    coqpitb = CoqpitB()
    coqpitb.merge(coqpita)
    print(coqpitb.val_a)
    print(coqpitb.pprint())
Comments
  • Allow file-like objects when saving and loading

    Allow file-like objects when saving and loading

    Allow users to save the configs to arbitrary locations through file-like objects. Would e.g. simplify coqui-ai/TTS#683 without adding an fsspec dependency to this library.

    opened by agrinh 6
  • Latest PR causes an issue when a `Serializable` has default None

    Latest PR causes an issue when a `Serializable` has default None

    https://github.com/coqui-ai/coqpit/blob/5379c810900d61ae19d79b73b03890fa103487dd/coqpit/coqpit.py#L539

    @reuben I am on it but if you have an easy fix go for it. Right now it breaks all the TTS trainings.

    opened by erogol 2
  • [feature request] change the `arg_perfix` of coqpit

    [feature request] change the `arg_perfix` of coqpit

    Is it possible to change the arg_perfix when using Coqpit object to another value / empty string? I see the option is supported in the code by changing arg_perfix, but not sure how to access it using the proposed API.

    Thanks for the package, looks very useful!

    opened by mosheman5 1
  • Setup CI to push new tags to PyPI automatically

    Setup CI to push new tags to PyPI automatically

    I'm gonna add a workflow to automatically upload new tags to PyPI. @erogol when you have a chance could you transfer the coqpit project on PyPI to the coqui user?[0] Then you can add your personal account as a maintainer also, so you don't have to change your local setup.

    In the mean time I'll iterate on testpypi.

    [0] https://pypi.org/user/coqui/

    opened by reuben 1
  • Fix rsetattr

    Fix rsetattr

    rsetattr() is updated to pass the new test cases below.

    I don't know if it is the right solution. It might be that rsetattr confuses when coqpit is used as a prefix.

    opened by erogol 0
  • [feature request] Warning when unexpected key is loaded but not present in class

    [feature request] Warning when unexpected key is loaded but not present in class

    Here is an toy scenario where it would be nice to have a warning

    from dataclasses import dataclass
    from coqpit import Coqpit
    
    @dataclass
    class SimpleConfig(Coqpit):
        val_a: int = 10
        val_b: int = None
    
    if __name__ == "__main__":
        config = SimpleConfig()
    
        tmp_config = config.to_dict()
        tmp_config["unknown_key"] = "Ignored value"
        config.from_dict(tmp_config)
        print(config.to_json())
    

    There the value of config.to_json() is

    {
        "val_a": 10,
        "val_b": null
    }
    

    Which is expected behaviour, but we should get a warning that some keys were ignored (IMO)

    feature request 
    opened by WeberJulian 6
  • [feature request] Add `is_defined`

    [feature request] Add `is_defined`

    Use coqpit.is_defined('field') to check if "field" in coqpit and coqpit.field is not None:

    It is a common condition when you parse out a coqpit object.

    feature request 
    opened by erogol 0
  • Allow grouping of argparse fields according to subclassing

    Allow grouping of argparse fields according to subclassing

    When using inheritance to extend config definitions the resulting ArgumentParser has all fields flattened out. It would be nice to group fields by class and allow some control over ordering.

    opened by reuben 2
Releases(v0.0.17)
Owner
coqui
Coqui, a startup providing open speech tech for everyone 🐸
coqui
Dicionario-git-github - Dictionary created to help train new users of Git and GitHub applications

Dicionário 📕 Dicionário criado com o objetivo de auxiliar no treinamento de nov

Felippe Rafael 1 Feb 07, 2022
Program Input Nilai Mahasiswa Menggunakan Fungsi

PROGRAM INPUT NILAI MAHASISWA MENGGUNAKAN FUNGSI Nama : Maulana Reza Badrudin Nim : 312110510 Matkul : Bahas Pemograman DESKRIPSI Deklarasi dicti

Maulana Reza Badrudin 1 Jan 05, 2022
🌈Python cheatsheet for all standard libraries(Continuously Updated)

Python Standard Libraries Cheatsheet Depend on Python v3.9.8 All code snippets have been tested to ensure they work properly. Fork me on GitHub. 中文 En

nick 12 Dec 27, 2022
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!

Streamify A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! Description Objective The project will stre

Ankur Chavda 206 Dec 30, 2022
Adam with minor modifications which give significant improvement

BAdam Modification of Adam [1] optimizer with increased stability and better performance. Tricks used: Decoupled weight decay as in AdamW [2]. Such de

19 May 11, 2022
Automatically remove user join messages when the user leaves the server.

CleanLeave Automatically remove user join messages when the user leaves the server. Installation You will need to install poetry to run this bot local

11 Sep 19, 2022
TrackGen - The simplest tropical cyclone track map generator

TrackGen - The simplest tropical cyclone track map generator Usage Each line is a point to be plotted on the map Each field gives information about th

TrackGen 6 Jul 20, 2022
Cirq is a Python library for writing, manipulating, and optimizing quantum circuits and running them against quantum computers and simulators

Cirq is a Python library for writing, manipulating, and optimizing quantum circuits and running them against quantum computers and simulators. Install

quantumlib 3.6k Jan 07, 2023
Mannaggia is a python application to praise or more likely to curse the saints

Mannaggia-py 👼 Remember Mannaggia? This is a Python remake of it, with new features. mannaggia is a python application to praise or more likely to cu

Christian Visintin 9 Aug 12, 2022
Weakly-Divisable - Takes an interger and seee if it is weakly divisible by seven

Weakly Divisble Project by Diana Arce-Hernandez, Ryan McAlpine, and Rommel Ravan

Diana Arce-Hernandez 1 Jan 12, 2022
Lightweight Scheduled Blocks Checker for Current Epoch. No cardano-node Required, data is taken from blockfrost.io

ReLeaderLogs For Cardano Stakepool Operators: Lightweight Scheduled Blocks Checker for Current Epoch. No cardano-node Required, data is taken from blo

SNAKE (Cardano Stakepool) 2 Oct 19, 2021
K2HASH Python library - NoSQL Key Value Store(KVS) library

k2hash_python Overview k2hash_python is an official python driver for k2hash. Install Firstly you must install the k2hash shared library: curl -o- htt

Yahoo! JAPAN 3 Oct 19, 2022
A python package that computes an optimal motion plan for approaching a red light

redlight_approach redlight_approach is a Python package that computes an optimal motion plan during traffic light approach. RLA_demo.mov Given the par

Jonathan Roy 4 Oct 27, 2022
InverterApi - This project has been designed to take monitoring data from Voltronic, Axpert, Mppsolar PIP, Voltacon, Effekta

InverterApi - This project has been designed to take monitoring data from Voltronic, Axpert, Mppsolar PIP, Voltacon, Effekta

Josep Escobar 2 Sep 03, 2022
LiteX-Acorn-Baseboard is a baseboard developed around the SQRL's Acorn board (or Nite/LiteFury) expanding their possibilities

LiteX-Acorn-Baseboard is a baseboard developed around the SQRL's Acorn board (or Nite/LiteFury) expanding their possibilities

33 Nov 26, 2022
Some shitty programs just to brush up on my understanding of binary conversions.

Binary Converters Some shitty programs just to brush up on my understanding of binary conversions. Supported conversions formats = "unsigned-binary" |

Tim 2 Jan 09, 2022
📦 A Human's Ultimate Guide to setup.py.

📦 setup.py (for humans) This repo exists to provide an example setup.py file, that can be used to bootstrap your next Python project. It includes som

Navdeep Gill 5k Jan 04, 2023
Anki cards generator for Leetcode

Leetcode Anki card generator Summary By running this script you'll be able to generate Anki cards with all the leetcode problems. I personally use it

Pavel Safronov 150 Dec 25, 2022
A program to calculate the are of a triangle. made with Python.

Area-Calculator What is Area-Calculator? Area-Calculator is a program to find out the area of a triangle easily. fully made with Python. Needed a pyth

Chandula Janith 0 Nov 27, 2021
Awesome & interesting talks about programming

Programming Talks I watch a lot of talks that I love to share with my friends, fellows and coworkers. As I consider all GitHubbers my friends (oh yeah

Veit Heller 7k Dec 26, 2022