An Explainable Leaderboard for NLP

Overview

ExplainaBoard: An Explainable Leaderboard for NLP

Introduction

ExplainaBoard is an interpretable, interactive, and reliable leaderboard with seven (so far) new features (F) compared with generic leaderboards.

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of different evaluated datasets?
  • F5: Common errors: What common mistakes do the top-5 systems make?
  • F6: Fine-grained errors: Where will errors occur?
  • F7: System Combination: Is there potential complementarity between different systems?

Website

We deploy ExplainaBoard as a Web toolkit, which includes 9 NLP tasks, 40 datasets and 300 systems. Detailed information is as follows.

Task

Task                      Sub-task          Datasets  Models  Attributes
Text Classification       Sentiment         8         40      2
Text Classification       Topics            4         18      2
Text Classification       Intention         1         3       2
Text-Span Classification  Aspect Sentiment  4         20      4
Text-Pair Classification  NLI               2         6       7
Sequence Labeling         NER               3         74      9
Sequence Labeling         POS               3         14      4
Sequence Labeling         Chunking          3         14      9
Sequence Labeling         CWS               7         64      7
Structure Prediction      Semantic Parsing  4         12      4
Text Generation           Summarization     2         36      7

Download System Outputs

We have not released the datasets or corresponding system outputs that require licenses. If you hold the licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful to you, please cite our work.

Test Your Results

pip install -r requirements.txt

Description of Each Directory

  • task-[task_name]: fine-grained analysis for each task, which generates fine-grained analysis results in JSON format. For example, task-mlqa can calculate the fine-grained F1 scores for different systems and output the corresponding JSON files in task-mlqa/output/.

  • meta-eval is a sort of controller, which can be used to start the fine-grained analysis of all tasks and to analyze the output JSON files.

    • calculate fine-grained results for all tasks: ./meta-eval/run-allTasks.sh
        cd ./meta-eval/
        ./run-allTasks.sh
    • merge the JSON files of all tasks into a CSV file, which is useful for a later SQL import: ./meta-eval/genCSV/json2csv.py
        cd ./meta-eval/genCSV/
        python json2csv.py > explainabord.csv
  • src stores auxiliary code.
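The merge step described above can be pictured roughly as follows. This is a minimal sketch, not the actual json2csv.py: the flat JSON layout (a list of dicts per file) and the column names `task`, `system`, `metric` and `value` are assumptions made purely for illustration.

```python
import csv
import io
import json
from pathlib import Path
from typing import Iterable

# Hypothetical column layout; the real json2csv.py may use different fields.
COLUMNS = ["task", "system", "metric", "value"]


def json_files_to_csv(paths: Iterable[Path]) -> str:
    """Merge per-task JSON result files into a single CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for path in sorted(paths):
        with open(path, encoding="utf-8") as f:
            records = json.load(f)  # assumed: a list of flat dicts
        for record in records:
            # Missing keys become empty cells so every row has the same shape.
            writer.writerow({col: record.get(col, "") for col in COLUMNS})
    return buffer.getvalue()
```

Writing one row per (task, system, metric) keeps the CSV easy to bulk-load into a single SQL table.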

Submit Your Results

You can submit your system's output through this form, following the format description.

Acknowledgement

We thank all the authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, Li Dong. We also thank Vijay Viswanathan, Yiran Chen, and Hiroaki Hayashi for useful discussion and feedback about ExplainaBoard.

Comments
  • Is the current applicable condition of t-test correct?

    opened by tetsuok 22
  • Allowed specification of the metric #dimensions

    This PR loosens the restriction that sufficient statistics must be a vector, and allows them to be a tensor with the dimension equal to Metric.stats_ndim().

    It also demonstrates how this works on the NLGMetaEvaluation metric.
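To make the idea concrete, here is a minimal sketch of a metric that declares the rank of its sufficient statistics and of a check that enforces it. Only the method name `stats_ndim()` comes from the PR description; the classes `VectorMetric` and `MatrixMetric`, the helper functions, and the use of nested lists in place of arrays are all hypothetical.

```python
from typing import Any, List


def nesting_depth(value: Any) -> int:
    """Return how many list levels wrap the scalars in `value`."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth


class VectorMetric:
    """Hypothetical metric whose per-sample statistics are a flat vector."""

    def stats_ndim(self) -> int:
        return 1


class MatrixMetric(VectorMetric):
    """Hypothetical metric whose per-sample statistics are a 2-D table."""

    def stats_ndim(self) -> int:
        return 2


def check_stats(metric: VectorMetric, per_sample_stats: List[Any]) -> None:
    """Raise ValueError unless every sample's statistics have the declared rank."""
    expected = metric.stats_ndim()
    for stats in per_sample_stats:
        actual = nesting_depth(stats)
        if actual != expected:
            raise ValueError(f"expected {expected}-D stats, got {actual}-D")
```

Letting each metric report its own rank keeps the shared aggregation code generic while still catching shape mistakes early.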

    @pfliu-nlp and @odashi : could you please check this PR as a potential solution to the discussion in https://github.com/neulab/ExplainaBoard/pull/527 ?

    (sorry, after sending the review request I made a change of naming from dim->ndim, which I think is more in line with the naming in numpy)

    opened by neubig 12
  • test_generate_system_analysis in integration_tests.summarization_test.SummarizationTest is too slow

    Commit 8c514c3d81a079d967d208f8bc330c2f202620bb (#437) increases the execution time of integration_tests.summarization_test.SummarizationTest. When I measured on my GCP VM, the test time increased by 430 seconds (from 6 seconds to 436 seconds), which is too slow to run as an automated test in pull requests. Slow tests need to be removed or replaced with faster, more focused tests. In general, slow tests drain productivity: updating pull requests takes longer, developers try to pack large commits into pull requests to work around slow CI, and pull requests become expensive to review, which makes it harder to identify bugs or design flaws in code review.

    Repro steps

    rm -rf ~/.cache/explainaboard
    time python -m unittest -v integration_tests.summarization_test.SummarizationTest
    

    Output

    test_datalab_loader (integration_tests.summarization_test.SummarizationTest) ... skipped 'time consuming'
    test_default_features_dont_modify_condgen (integration_tests.summarization_test.SummarizationTest) ... ok
    test_generate_system_analysis (integration_tests.summarization_test.SummarizationTest) ... WARNING:datalabs.load:Couldn't find a directory or a dataset named 'cnn_dailymail' in this version. It was picked from the master branch on github instead.
    WARNING:datalabs.builder:No config specified, defaulting to: cnn_dailymail/3.0.0
    WARNING:datalabs.builder:Reusing dataset cnn_dailymail (/home/t/.cache/expressai/datalab/cnn_dailymail/3.0.0/3.0.0/6e2f5d689f0225c4f22eb78d11ba7a21399810c5cb853edafe39b1d006a1ff95)
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [06:20<00:00, 755.03it/s]
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [00:29<00:00, 9616.19it/s]
    INFO:explainaboard:caching stats for cnn_dailymail None
    calculating example-level features: 3it [00:00, 51.88it/s]
    calculating token-level features: 3it [00:00, 139.83it/s]
    /home/t/explainaboard-fork/explainaboard/metrics/metric.py:336: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
      return stats_t.interval(
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 349.50it/s]
    ok
    test_generate_system_human_eval (integration_tests.summarization_test.SummarizationTest) ... skipped 'Not yet fixed in v0.11'
    test_load_tsv (integration_tests.summarization_test.SummarizationTest) ... ok
    
    ----------------------------------------------------------------------
    Ran 5 tests in 438.659s
    
    OK (skipped=2)
    python -m unittest -v integration_tests.summarization_test.SummarizationTest  434.35s user 2.58s system 98% cpu 7:22.46 total
    
    opened by tetsuok 12
  • Use 'confidence' instead of deprecated 'alpha' for scipy.stats.t.interval

    Reducing the heavy logging uncovered DeprecationWarnings that had been buried in the tests. We get the following DeprecationWarning in the tests that invoke the scipy.stats.t.interval method:

    test_hits (explainaboard.tests.test_metric.TestMetric) ... /home/runner/work/ExplainaBoard/ExplainaBoard/explainaboard/metrics/metric.py:338: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
    

    This PR fixes the warning as the warning suggests.

    opened by tetsuok 12
  • Cache pip dependencies to speed up CI

    This PR attempts to speed up both the unit-tests and integration-tests CI jobs. Every CI job spends about 2 minutes installing pip packages. That step accounts for about 90% of the total time of unit-tests and about 30% of the total time of integration-tests. It can be skipped by creating virtual environments and caching the packages installed into them using actions/cache. Note that actions/[email protected] doesn't support caching installed packages. It only avoids re-downloading by caching the packages downloaded from PyPI under ~/.cache/pip.

    Dependencies listed in setup.py are moved to requirements.txt so that lock files can be generated for every Python version from requirements.txt. The generated lock files are used as cache keys so that caches are properly invalidated when dependencies are updated. Unless dependencies change, every CI job should be reproducible (with respect to installing pip dependencies). Making the CI jobs reproducible and faster comes at the expense of periodically updating these lock files. Maintaining lock files for dependencies is pretty common in other programming languages such as JS and Rust. The update can be done by running cicd/gen_requirements_lock.sh.
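The reason a lock file works as a cache key can be sketched in a few lines: hashing its contents gives a key that stays stable until a pinned dependency changes. The function name `cache_key`, the `venv` prefix, and the key layout below are assumptions for illustration, not what the CI configuration in this PR actually uses.

```python
import hashlib
from pathlib import Path


def cache_key(lock_file: Path, python_version: str, prefix: str = "venv") -> str:
    """Build a cache key that changes whenever the lock file changes."""
    digest = hashlib.sha256(lock_file.read_bytes()).hexdigest()
    # Including the Python version keeps environments for different
    # interpreters from sharing (and corrupting) one cache entry.
    return f"{prefix}-py{python_version}-{digest[:16]}"
```

Any edit to the lock file yields a new key, so a stale virtual environment is never restored after a dependency update.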

    opened by tetsuok 12
  • Refactor/loaders

    1. Commit 1: refactored Loader.__init__()
    • made data a required argument
    • all loaders now call the __init__ method of the base loader
    2. Commit 2: implemented file-specific loaders to simplify the task-specific loaders
    • implements TSVFileLoader, JSONFileLoader, DatalabFileLoader and CoNLLFileLoader, which know how to load a certain type of file given the fields
    • refactored all the existing loaders to use these file-specific loaders instead
    • QAMultipleChoiceLoader and KgLinkTailPredictionLoader still use custom load() methods because they support user-defined features. The way they load these extra features is different, so I decided to leave them for now. It'll be easy to incorporate user-defined features into the file loaders (we just need to update the fields based on self.user_defined_features_configs)
    • hellaswag is removed in https://github.com/neulab/ExplainaBoard/commit/4b93b9542b714754eb91d718cd82b98ab706d11c
    • This refactor makes it easier to do #141 in the future. We just need to have two sets of file loaders for each task-specific loader. One is for the (input, reference_output) file and the other one is for the predictions file.
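The split between file-specific and task-specific loaders can be illustrated with a small sketch. It is not the code from this PR: `FieldSpec` and `TSVFileLoaderSketch` are hypothetical names, and the idea that a task loader only supplies a list of field mappings is an assumption based on the description above.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List, TextIO


@dataclass
class FieldSpec:
    """Maps one source column to a named field, with a type converter."""

    src_index: int
    target_name: str
    convert: Callable[[str], object] = str


class TSVFileLoaderSketch:
    """Knows how to read TSV; knows nothing about any particular task."""

    def __init__(self, fields: List[FieldSpec]) -> None:
        self.fields = fields

    def load(self, stream: TextIO) -> List[Dict[str, object]]:
        reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [
            {f.target_name: f.convert(row[f.src_index]) for f in self.fields}
            for row in reader
            if row  # skip blank lines
        ]


# A task-specific loader would then only need to declare its fields.
text_classification_fields = [
    FieldSpec(0, "text"),
    FieldSpec(1, "true_label"),
]
samples = TSVFileLoaderSketch(text_classification_fields).load(
    io.StringIO("great movie\tpositive\nboring plot\tnegative\n")
)
```

With this shape, supporting separate dataset and prediction files would amount to giving a task loader two field lists instead of one.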

    Please let me know what you think! Thanks!

    opened by lyuyangh 12
  • Potential issue with spearman R bootstrapping

    We observed the following test failure when integrating another PR:

    ======================================================================
    FAIL: test_sample_level_spearmanr_bootstrap (integration_tests.meta_eval_wmt_da_test.MetaEvalNLGCITest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/runner/work/ExplainaBoard/ExplainaBoard/integration_tests/meta_eval_wmt_da_test.py", line 191, in test_sample_level_spearmanr_bootstrap
        self.assertAlmostEqual(ci[0], 0.6488, 2)
    AssertionError: 0.7325904563487001 != 0.6488 within 2 places (0.08379045634870008 difference)
    
    ----------------------------------------------------------------------
    

    We are not sure whether this is an issue with the test or the underlying code, but as a temporary measure we reduced the sensitivity of the test. We should go back and check to make sure whether this is just due to bootstrapping variance or whether it's due to a bug in the test itself.
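One way to separate bootstrap variance from a real bug is to fix the random seed and compare confidence intervals across seeds. The sketch below uses a plain percentile bootstrap over a pure-Python Spearman correlation; that choice, and all function names, are assumptions for illustration and may differ from how ExplainaBoard computes its intervals.

```python
import random
from typing import List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a constant resample as uncorrelated
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(ranks(x), ranks(y))


def bootstrap_ci(
    x: Sequence[float],
    y: Sequence[float],
    n_resamples: int = 500,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for Spearman's rho with a fixed seed."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(spearman([x[i] for i in idx], [y[i] for i in idx]))
    scores.sort()
    lo = scores[int((1 - confidence) / 2 * n_resamples)]
    hi = scores[min(n_resamples - 1, int((1 + confidence) / 2 * n_resamples))]
    return lo, hi
```

If the lower bound moves by several hundredths between seeds on the test data, the failure is plausibly sampling noise, and seeding the test would be a more targeted fix than loosening its tolerance.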

    opened by neubig 10
  • Implement CalibrationAnalysis

    Calibration is whether a system's confidence is well-correlated with whether the system got the answer right or not. It would be nice if we could do analyses related to calibration, such as calculating expected calibration error: https://arxiv.org/abs/1706.04599

    I think this should probably be implemented as an additional variety of analysis, which would be simple and self-contained: https://github.com/neulab/ExplainaBoard/blob/main/explainaboard/analysis/analyses.py#L45
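For reference, expected calibration error can be computed in a few lines. This is a minimal standalone sketch using equal-width confidence bins; the function name, the default of 10 bins, and the input format (one confidence and one correctness flag per sample) are assumptions, and nothing here reflects an existing ExplainaBoard interface.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        acc_sum[b] += 1.0 if ok else 0.0
    # (n_b / n) * |acc_b - conf_b| simplifies to |acc_sum_b - conf_sum_b| / n,
    # so empty bins contribute zero without special handling.
    return sum(abs(a - c) for a, c in zip(acc_sum, conf_sum)) / n
```

For example, two predictions made with confidence 0.8 of which only one is correct give an ECE of 0.3, since the bin's accuracy is 0.5 against a mean confidence of 0.8.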

    good first issue new-analysis 
    opened by neubig 10
  • Correct training set feature field names

    Previously, calculation of training set features would fail if the datalab dataset used unconventional column names.

    This does the following things:

    1. Makes an option to use Loader to load only datasets without system outputs if output_data is set to None
    2. Changes _statistics_func to simply take in the samples and system info, and return the statistics (in contrast to previously using the datalab aggregating() functionality).
    3. Loads data used in calculating training features through Loader so that appropriate field mapping will be performed

    Fixes https://github.com/neulab/ExplainaBoard/issues/416

    Notably, @pfliu-nlp, "2." may require some discussion, here are the pros and cons of doing it this new way:

    Pros

    • it makes the statistics code self-contained and not rely on an external library. honestly, even though I'm very familiar with explainaboard, I was always a bit confused about what was actually going on here because the aggregating() decorator was a bit mysterious to me
    • statistics_func can now be called on any set of samples, so it could be called on a non-datalab dataset. this may be useful if we want to, for example, calculate training set features with custom datasets

    Cons

    • the datalab aggregating operator may have implemented parallelism so this aggregation of statistics might be able to be done faster? but I actually am not sure if that's actually the case in practice
    • something else I'm missing?
    opened by neubig 9
  • Unsafe en_core_web_sm downloading in setup.py

    Currently setup.py executes an external command, python -m spacy download en_core_web_sm, to install a spaCy model during setup. This approach has several issues regarding system consistency:

    • spaCy models are intentionally not registered on PyPI, and PyPI does not allow libraries to depend on external requirements.
    • The command is just a system command, which could break the system or fail to work correctly.

    Since there is no recommended way to add spaCy models to install_requires, we need to take one of the following approaches:

    • Download the model programmatically when spacy.load() fails.
    • Bundle the model file into this repository.
    • Ask users to download appropriate models additionally.
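The first option could look roughly like the sketch below. The function name `load_model_with_fallback` and its injectable `load` and `download` parameters are hypothetical, added so the logic can be exercised without spaCy installed; `spacy.load` and `spacy.cli.download` are the real calls it falls back to by default.

```python
from typing import Any, Callable, Optional


def load_model_with_fallback(
    name: str,
    load: Optional[Callable[[str], Any]] = None,
    download: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Load a spaCy model, downloading it once if it is not installed."""
    if load is None or download is None:
        # Imported lazily so that importing this module has no side effects.
        import spacy
        from spacy.cli import download as spacy_download

        load = load or spacy.load
        download = download or spacy_download
    try:
        return load(name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found.
        download(name)
        return load(name)
```

This moves the download from install time to first use. Whether a freshly downloaded model is importable within the same process can depend on the environment, so the retry may still fail and should not be treated as guaranteed.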
    opened by odashi 9
  • How to name metrics when registering them

    There are two ways to name metrics

    (1)

    
    @dataclass
    @metric_config_registry.register("AccuracyConfig")
    class AccuracyConfig(MetricConfig):
        def to_metric(self):
            return Accuracy(self)
    
    

    (2)

    @dataclass
    @metric_config_registry.register("Accuracy")
    class AccuracyConfig(MetricConfig):
        def to_metric(self):
            return Accuracy(self)
    
    

    Currently, we are using (1), which, however, is inconsistent with how the Processor names them. For example:

    https://github.com/neulab/ExplainaBoard/blob/cd54c1b61e490295db8c1cfee8460aff4cce1880/explainaboard/processors/text_classification.py#L132

    Which one do you prefer?

    If we go with (2), this code should be modified to avoid a naming bug: https://github.com/neulab/ExplainaBoard/blob/cd54c1b61e490295db8c1cfee8460aff4cce1880/explainaboard/metrics/registry.py#L11

    config_cls = metric_config_registry.get_type(dikt["name"]) # instead of type
    

    I could send a PR for this.
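To show how option (2) plays out end to end, here is a self-contained sketch of a registry keyed by metric name. The `Registry` class, the `config_from_dict` helper, and the single `name` field on `MetricConfig` are simplified stand-ins written for illustration; the real metric_config_registry may behave differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class Registry:
    """Minimal name-to-class registry with a decorator interface."""

    def __init__(self) -> None:
        self._types: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            if name in self._types:
                raise ValueError(f"{name!r} is already registered")
            self._types[name] = cls
            return cls

        return decorator

    def get_type(self, name: str) -> type:
        return self._types[name]


metric_config_registry = Registry()


@dataclass
class MetricConfig:
    name: str


@dataclass
@metric_config_registry.register("Accuracy")  # option (2): register by metric name
class AccuracyConfig(MetricConfig):
    pass


def config_from_dict(dikt: dict) -> MetricConfig:
    # Look up by the "name" entry, so serialized configs and processors
    # refer to the metric by the same string.
    config_cls = metric_config_registry.get_type(dikt["name"])
    return config_cls(**dikt)


config = config_from_dict({"name": "Accuracy"})
```

With one string serving as both the registry key and the serialized name, there is no separate type field to keep in sync.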

    opened by pfliu-nlp 8
  • add tests for meval to replicate paper results

    Overview

    This PR adds tests to verify whether our implemented meta-evaluation processor is able to replicate reported results from existing published papers.

    Relevant issue: https://github.com/inspired-co/taskboard/issues/180

    Details

    • Collect system outputs for two metrics (rouge1 and bartscore) from this repo
    • Use ExplainaBoard to process these outputs and compare the results with those reported in the above repo.

    References

    • Paper: BARTSCORE: Evaluating Generated Text as Text Generation
    • Code: https://github.com/neulab/BARTScore
    opened by pfliu-nlp 0
  • `TypeError: 'type' object is not subscriptable` when attempt to import or use CLI

    How I installed it:

    pip install explainaboard
    or
    pip install -U --force-reinstall explainaboard
    

    Both cause the same problem.

    Version : 0.12.3

    When I try to import explainaboard or run explainaboard from the CLI, I get the same error:

    Python 3.8.15 (default, Nov 24 2022, 15:19:38) 
    [GCC 11.2.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import explainaboard
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/__init__.py", line 6, in <module>
        from explainaboard.loaders import DatalabLoaderOption, get_loader_class
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/loaders/__init__.py", line 5, in <module>
        from explainaboard.loaders import file_loader, loader_factory
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/loaders/file_loader.py", line 18, in <module>
        from explainaboard.analysis.analyses import Analysis
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/analysis/analyses.py", line 14, in <module>
        from explainaboard.analysis.bucketing import get_bucketing_method
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/analysis/bucketing.py", line 13, in <module>
        from explainaboard.serialization.types import SerializableData
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/serialization/__init__.py", line 8, in <module>
        from explainaboard.serialization.types import Serializable
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/serialization/types.py", line 21, in <module>
        list["PrimitiveData"],  # type: ignore
    TypeError: 'type' object is not subscriptable
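A likely cause, given the Python 3.8.15 interpreter shown in the traceback, is that built-in generics such as list[...] only became subscriptable at runtime in Python 3.9 (PEP 585). The sketch below shows one possible workaround using the typing module's aliases; the alias definition is reconstructed from the type names visible in this page and is an assumption, not the actual contents of serialization/types.py.

```python
from typing import Dict, List, Tuple, Union

# typing.List, typing.Tuple and typing.Dict can be subscripted on Python 3.8,
# whereas list[...], tuple[...] and dict[...] need Python 3.9 or newer.
PrimitiveData = Union[
    None,
    bool,
    int,
    float,
    str,
    List["PrimitiveData"],
    Tuple["PrimitiveData", ...],
    Dict[str, "PrimitiveData"],
]
```

The other way out is to run the package on Python 3.9 or later, where the original list[...] spelling works unchanged.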
    
    
    opened by ttpro1995 0
  • Bump mypy version to 0.990

    Since mypy 0.990 was released yesterday (blog post), it would be better to bump the mypy version to 0.990 to take advantage of the new features and bug fixes. Running mypy 0.990 on the explainaboard codebase suggests that some effort is needed to adopt this version. Below is the output of pre-commit run mypy --color=never --all-files

    mypy.....................................................................Failed
    - hook id: mypy
    - exit code: 1
    
    explainaboard/utils/spacy_loader.py:5: error: Cannot find implementation or library stub for module named "spacy"  [import]
    explainaboard/utils/spacy_loader.py:6: error: Cannot find implementation or library stub for module named "spacy.language"  [import]
    explainaboard/utils/agreement.py:5: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/sum_attribute.py:8: error: Cannot find implementation or library stub for module named "nltk"  [import]
    explainaboard/analysis/sum_attribute.py:10: error: Cannot find implementation or library stub for module named "nltk.util"  [import]
    explainaboard/utils/async_eaas.py:10: error: Cannot find implementation or library stub for module named "eaas"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:7: error: Cannot find implementation or library stub for module named "sqlparse"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:8: error: Cannot find implementation or library stub for module named "sqlparse.sql"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:9: error: Cannot find implementation or library stub for module named "sqlparse.tokens"  [import]
    setup.py:3: error: Skipping analyzing "setuptools": module is installed, but missing library stubs or py.typed marker  [import]
    explainaboard/metrics/auxiliary/qa_table_text_hybrid_auxiliary.py:16: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/auxiliary/qa_table_text_hybrid_auxiliary.py:17: error: Cannot find implementation or library stub for module named "scipy.optimize"  [import]
    explainaboard/utils/logging.py:9: error: Library stubs not installed for "tqdm"  [import]
    explainaboard/utils/logging.py:9: note: Hint: "python3 -m pip install types-tqdm"
    explainaboard/utils/logging.py:9: note: (or run "mypy --install-types" to install all missing stub packages)
    explainaboard/utils/logging.py:16: error: Incompatible default for argument "desc" (default has type "None", argument has type "str")  [assignment]
    explainaboard/utils/logging.py:16: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/utils/logging.py:16: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/visualizers/bar_chart.py:8: error: Cannot find implementation or library stub for module named "matplotlib"  [import]
    explainaboard/visualizers/bar_chart.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/bucketing.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/feature.py:239: error: Incompatible types in assignment (expression has type "Dict[str, FeatureType]", target has type "SerializableData")  [assignment]
    explainaboard/utils/agreement_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/utils/typing_utils_test.py:10: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/serialization/serializers.py:53: error: Incompatible return value type (got "Union[List[Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]], Tuple[Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable], ...]]", expected "Union[None, bool, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]")  [return-value]
    explainaboard/serialization/serializers.py:53: error: Generator has incompatible item type "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"; expected "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [misc]
    explainaboard/serialization/serializers.py:89: error: Incompatible return value type (got "Union[List[Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]], Tuple[Union[None, bool, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]], ...]]", expected "Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]")  [return-value]
    explainaboard/serialization/serializers.py:89: error: Generator has incompatible item type "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [misc]
    explainaboard/utils/tensor_analysis.py:12: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric.py:11: error: Cannot find implementation or library stub for module named "scipy.stats"  [import]
    explainaboard/metrics/metric.py:178: error: Dict entry 0 has incompatible type "str": "Dict[str, MetricValue]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/metrics/metric.py:196: error: Argument 1 to "MetricResult" has incompatible type "Dict[str, Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]]"; expected "Dict[str, MetricValue]"  [arg-type]
    explainaboard/third_party/text_to_sql_test_suit_eval/process_sql.py:30: error: Cannot find implementation or library stub for module named "nltk"  [import]
    explainaboard/utils/tokenizer.py:15: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers"  [import]
    explainaboard/utils/tokenizer.py:16: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_intl"  [import]
    explainaboard/utils/tokenizer.py:17: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_ja_mecab"  [import]
    explainaboard/utils/tokenizer.py:18: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_zh"  [import]
    explainaboard/metrics/continuous.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric_test.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/external_eval.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/meta_evaluation.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/meta_evaluation.py:9: error: Cannot find implementation or library stub for module named "scipy"  [import]
    explainaboard/analysis/feature_test.py:69: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_test.py:134: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_test.py:205: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:230: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:231: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:232: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:233: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:234: error: List item 0 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:234: error: List item 1 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:234: error: List item 2 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:235: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Tuple[Dict[str, object], Dict[str, object], Dict[str, object]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 0 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 1 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 2 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:240: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:241: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:242: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:243: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:244: error: List item 0 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:244: error: List item 1 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:244: error: List item 2 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:245: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Tuple[Dict[str, object], Dict[str, object], Dict[str, object]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 0 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 1 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 2 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/metrics/eaas.py:9: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    explainaboard/metrics/eaas.py:10: error: Cannot find implementation or library stub for module named "eaas.config"  [import]
    explainaboard/metrics/eaas.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/eaas.py:12: error: Cannot find implementation or library stub for module named "sacrebleu"  [import]
    explainaboard/metrics/eaas.py:13: error: Cannot find implementation or library stub for module named "sacrebleu.metrics.base"  [import]
    explainaboard/metrics/eaas.py:13: error: Cannot find implementation or library stub for module named "sacrebleu.metrics"  [import]
    explainaboard/metrics/ranking.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/performance.py:51: error: Dict entry 1 has incompatible type "str": "List[int]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/performance.py:52: error: Dict entry 2 has incompatible type "str": "Dict[str, MetricResult]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/performance.py:72: error: Argument 1 to "float" has incompatible type "Union[str, None, int, float, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[SupportsFloat, SupportsIndex, str, bytes, bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]"  [arg-type]
    explainaboard/analysis/performance.py:73: error: Argument 1 to "float" has incompatible type "Union[str, None, int, float, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[SupportsFloat, SupportsIndex, str, bytes, bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]"  [arg-type]
    explainaboard/metrics/log_prob.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/accuracy.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/external_eval_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/performance_test.py:219: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/performance_test.py:241: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/metrics/qa_table_text_hybrid.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    integration_tests/meta_eval_nlg_test.py:5: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/accuracy_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses.py:12: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses.py:245: error: Dict entry 0 has incompatible type "str": "List[BucketPerformance]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:446: error: Dict entry 0 has incompatible type "str": "List[BucketPerformance]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:563: error: Argument "bucket_setting" to "__call__" of "BucketingFn" has incompatible type "List[Tuple[float, float]]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses.py:563: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses.py:563: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses.py:658: error: Dict entry 2 has incompatible type "str": "List[int]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:722: error: Dict entry 1 has incompatible type "str": "List[ComboOccurence]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:841: error: Dict entry 1 has incompatible type "str": "Dict[str, FeatureType]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:842: error: Dict entry 2 has incompatible type "str": "Dict[str, MetricConfig]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/metrics/extractive_qa.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses_test.py:90: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:237: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[BucketPerformance]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:237: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:237: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:266: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:280: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:321: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[ComboOccurence]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:321: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:321: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:328: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[Sequence[str], None, int, float, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:350: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:477: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[BucketPerformance]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:477: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:477: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:507: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:518: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, FeatureType]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:518: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:518: note: Consider using "Mapping" instead, which is covariant in the value type
    explainaboard/analysis/analyses_test.py:519: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, MetricConfig]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:519: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:519: note: Consider using "Mapping" instead, which is covariant in the value type
    explainaboard/analysis/result.py:33: error: Dict entry 0 has incompatible type "str": "Dict[str, Dict[str, MetricResult]]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/result.py:34: error: Dict entry 1 has incompatible type "str": "List[AnalysisResult]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/loaders/file_loader.py:15: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    explainaboard/loaders/file_loader.py:16: error: Cannot find implementation or library stub for module named "datalabs.features.features"  [import]
    explainaboard/loaders/file_loader.py:212: error: Incompatible default for argument "fields" (default has type "None", argument has type "List[FileLoaderField]")  [assignment]
    explainaboard/loaders/file_loader.py:212: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/loaders/file_loader.py:212: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/loaders/file_loader.py:475: error: Incompatible default for argument "fields" (default has type "None", argument has type "List[FileLoaderField]")  [assignment]
    explainaboard/loaders/file_loader.py:475: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/loaders/file_loader.py:475: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/loaders/file_loader.py:522: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/analysis/result_test.py:35: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Dict[str, MetricResult]]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/result_test.py:36: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[AnalysisResult]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/result_test.py:36: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/result_test.py:36: note: Consider using "Sequence" instead, which is covariant
    explainaboard/third_party/text_to_sql_test_suit_eval/exec_eval.py:11: error: Library stubs not installed for "tqdm"  [import]
    explainaboard/info.py:186: error: Dict entry 11 has incompatible type "str": "List[AnalysisLevel]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/info.py:187: error: Dict entry 12 has incompatible type "str": "List[Analysis]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/info.py:260: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_funcs.py:8: error: Cannot find implementation or library stub for module named "lexicalrichness"  [import]
    explainaboard/analysis/feature_funcs.py:8: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
    explainaboard/analysis/feature_funcs.py:9: error: Cannot find implementation or library stub for module named "sacrebleu"  [import]
    explainaboard/meta_analyses/ranking.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/meta_analyses/ranking.py:9: error: Cannot find implementation or library stub for module named "pandas"  [import]
    explainaboard/metrics/f1_score.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/processor.py:9: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    explainaboard/processors/processor.py:10: error: Cannot find implementation or library stub for module named "eaas.config"  [import]
    explainaboard/processors/sequence_labeling.py:43: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/processors/argument_pair_extraction.py:34: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/processors/qa_tat.py:7: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    explainaboard/processors/language_modeling.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/conditional_generation.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/cloze_generative.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/summarization.py:8: error: Cannot find implementation or library stub for module named "datalabs.operations.featurize.plugins.summarization.sum_attribute"  [import]
    integration_tests/summarization_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    integration_tests/meta_eval_wmt_da_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/text_to_sql.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/f1_score_test.py:7: error: Cannot find implementation or library stub for module named "sklearn.metrics"  [import]
    explainaboard/visualizers/draw_charts.py:24: error: Cannot find implementation or library stub for module named "matplotlib"  [import]
    explainaboard/visualizers/draw_charts.py:25: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/info_test.py:116: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[AnalysisLevel]"; expected "SerializableData"  [arg-type]
    explainaboard/info_test.py:116: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/info_test.py:116: note: Consider using "Sequence" instead, which is covariant
    explainaboard/info_test.py:117: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[Analysis]"; expected "SerializableData"  [arg-type]
    explainaboard/info_test.py:117: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/info_test.py:117: note: Consider using "Sequence" instead, which is covariant
    explainaboard/info_test.py:160: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[Collection[str], None, int, float, List[PrimitiveData], Tuple[PrimitiveData, ...]]]"; expected "PrimitiveData"  [arg-type]
    integration_tests/metric_test.py:6: error: Cannot find implementation or library stub for module named "eaas"  [import]
    integration_tests/metric_test.py:7: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    integration_tests/metric_test.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/explainaboard_main.py:10: error: Cannot find implementation or library stub for module named "eaas.endpoint"  [import]
    explainaboard/explainaboard_main.py:10: error: Cannot find implementation or library stub for module named "eaas"  [import]
    explainaboard/explainaboard_main.py:89: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:90: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:91: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:92: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:93: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:94: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:364: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:365: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:367: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:368: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:369: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:370: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:371: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:390: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:401: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:402: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:403: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:404: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:405: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:406: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:407: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:408: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:499: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    integration_tests/cli_test.py:10: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    Found 141 errors in 59 files (checked 231 source files)
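
Many of the `[arg-type]` and `[assignment]` entries in the log above fall into two patterns that mypy itself names in its notes: `List`/`Dict` invariance and implicit `Optional` defaults. The following is a minimal sketch of both patterns, not code from ExplainaBoard; `Data`, `count_list`, `count_seq`, and `load_fields` are hypothetical names introduced only for illustration.

```python
from typing import List, Optional, Sequence, Union

# Hypothetical stand-in for a union-typed alias such as SerializableData.
Data = Union[None, int, float, str]


def count_list(items: List[Data]) -> int:
    # List is invariant: mypy rejects a List[int] argument here,
    # even though every int is a valid Data value.
    return len(items)


def count_seq(items: Sequence[Data]) -> int:
    # Sequence is covariant in its element type, so List[int] is accepted.
    return len(items)


def load_fields(fields: Optional[List[str]] = None) -> List[str]:
    # Spelling out Optional avoids the "Incompatible default for argument"
    # error that mypy reports when a List[str] parameter defaults to None.
    return [] if fields is None else list(fields)


values: List[int] = [1, 2, 3]
count_seq(values)  # type-checks
# count_list(values) runs at runtime, but mypy reports [arg-type] for it.
```

Switching parameter annotations from `List` to `Sequence` (or `Dict` to `Mapping`) only helps where the callee does not mutate the argument, so it is a judgment call per call site rather than a blanket fix.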
    
    opened by tetsuok 0
  • add_tasks.md is out of date

    It seems add_tasks.md (docs/add_new_tasks.md) is out of date. It mentions tasks.py in the three places below:

    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L6
    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L12
    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L133

    but the Python script was removed in #373. add_tasks.md needs to be updated properly.

    opened by tetsuok 0
  • Add system metadata class

    Processor.process() takes metadata, which is used to directly initialize SysOutputInfo. However, these are essentially different data (in particular, "metadata" $\subset$ SysOutputInfo, but not $=$), and the current implementation causes some confusion around this:

    The most significant abuse of this behavior is that FileLoaderMetadata is implicitly converted into SysOutputInfo. This shouldn't work without an explicit conversion: https://github.com/neulab/ExplainaBoard/blob/4cec0a01cbe2617e9a67a440be25ee4252f792b2/integration_tests/ner_test.py#L148-L154

    To this end, we need:

    • A struct defining the system metadata.
    • Change the behavior of Processor to take the system metadata, not a dict.
    • Either:
      • A conversion method between system metadata and FileLoaderReturn/SysOutputInfo
      • Include system metadata as a direct member of FileLoaderReturn/SysOutputInfo
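
The proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual ExplainaBoard API: `SystemMetadata`, `SysOutputInfoSketch`, `from_metadata`, and the free function `process` are hypothetical names, and the fields are placeholders. The sketch takes the second option (metadata as a direct member) and also shows an explicit conversion method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SystemMetadata:
    """Hypothetical struct holding only user-supplied system metadata."""

    task_name: str
    system_name: Optional[str] = None
    dataset_name: Optional[str] = None
    metric_names: List[str] = field(default_factory=list)


@dataclass
class SysOutputInfoSketch:
    """Hypothetical stand-in for SysOutputInfo: metadata plus computed results."""

    metadata: SystemMetadata
    results: Dict[str, float] = field(default_factory=dict)

    @classmethod
    def from_metadata(cls, metadata: SystemMetadata) -> "SysOutputInfoSketch":
        # Explicit conversion: nothing is copied implicitly from a dict.
        return cls(metadata=metadata)


def process(metadata: SystemMetadata) -> SysOutputInfoSketch:
    # Stand-in for Processor.process(): accept the struct, reject a raw dict.
    if not isinstance(metadata, SystemMetadata):
        raise TypeError("metadata must be a SystemMetadata instance")
    return SysOutputInfoSketch.from_metadata(metadata)


info = process(SystemMetadata(task_name="ner", system_name="my-system"))
```

Keeping the metadata as its own member makes the subset relationship explicit: everything in `SystemMetadata` is reachable from the info object, while computed fields live only on the latter.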
    opened by odashi 3
  • Reconsider default number of buckets

    Currently the default number of buckets is 4: https://github.com/neulab/ExplainaBoard/blob/38db95801cbd15e2e9b2db7b60c40bd7173e1deb/explainaboard/analysis/analyses.py#L117

    But this is probably too few when we're doing discrete bucketing. It'd probably be better to have the default be 4 for continuous and more (maybe 10) for discrete bucketing.
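
One way to express the suggestion is a per-kind default that an explicit setting can still override. The sketch below is hypothetical and does not mirror the code in analyses.py: `BucketingKind`, `DEFAULT_NUM_BUCKETS`, and `resolve_num_buckets` are assumed names, and the value 10 is the tentative figure from the issue text.

```python
from enum import Enum
from typing import Dict, Optional


class BucketingKind(Enum):
    # Hypothetical labels for the two bucketing styles discussed above.
    CONTINUOUS = "continuous"
    DISCRETE = "discrete"


# Defaults proposed in the issue: 4 for continuous, maybe 10 for discrete.
DEFAULT_NUM_BUCKETS: Dict[BucketingKind, int] = {
    BucketingKind.CONTINUOUS: 4,
    BucketingKind.DISCRETE: 10,
}


def resolve_num_buckets(kind: BucketingKind, num_buckets: Optional[int] = None) -> int:
    """Return the user-supplied bucket count, or the default for this kind."""
    if num_buckets is not None:
        if num_buckets <= 0:
            raise ValueError("num_buckets must be positive")
        return num_buckets
    return DEFAULT_NUM_BUCKETS[kind]
```

With this shape, callers that already pass a bucket count keep their behavior, and only the unset case changes.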

    opened by neubig 0
Releases(v0.8.5)
  • v0.8.5(Apr 2, 2022)

    This release:

    • Refactors the metrics class and the report structure.
    • Adds significance tests to all metrics.
    • Makes major code style improvements and adds type checking.
    • Fixes several bugs.
Owner
NeuLab
Graham Neubig's Lab at LTI/CMU