Client library to download and publish models and other files on the huggingface.co hub

Overview

huggingface_hub

Do you have an open source ML library? We're looking to partner with a small number of other cool open source ML libraries to provide model hosting + versioning. https://twitter.com/julien_c/status/1336374565157679104 https://twitter.com/mnlpariente/status/1336277058062852096

Advantages are:

  • versioning is built-in (as hosting is built around git and git-lfs), no lock-in, you can just git clone away.
  • anyone can upload a new model for your library; they just need to add the corresponding tag for the model to be discoverable – no more need for a hardcoded list in your code
  • Fast downloads! We use Cloudfront (a CDN) to geo-replicate downloads so they're blazing fast from anywhere on the globe
  • Usage stats and more features to come

Ping us if interested 😎


♻️ Partial list of implementations in third party libraries:


Download files from the huggingface.co hub

Integration inside a library is super simple. We expose two functions, hf_hub_url() and cached_download().

hf_hub_url

hf_hub_url() takes:

  • a model id (like julien-c/EsperBERTo-small, i.e. a user or organization name and a repo name, separated by /),
  • a filename (like pytorch_model.bin),
  • and an optional git revision id (can be a branch name, a tag, or a commit hash)

and returns the url we'll use to download the actual files: https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin

If you check out this URL's headers with a HEAD http request (which you can do from the command line with curl -I) for a few different files, you'll see that:

  • small files are returned directly
  • large files (i.e. the ones stored through git-lfs) are returned via a redirect to a Cloudfront URL. Cloudfront is a Content Delivery Network, or CDN, that ensures that downloads are as fast as possible from anywhere on the globe.
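As a rough illustration of the URL scheme described above, the sketch below rebuilds the example URL using only the standard library. build_resolve_url and head_status are hypothetical helper names, not part of huggingface_hub; in real code you would call hf_hub_url() itself. The "main" default for the revision is inferred from the example URL.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import quote, urlsplit

HUB_ENDPOINT = "https://huggingface.co"  # taken from the example URL above


def build_resolve_url(model_id: str, filename: str, revision: str = "main") -> str:
    """Hypothetical stand-in for hf_hub_url(): compose a download URL."""
    # Quote the revision with safe="" so a branch name containing "/"
    # stays a single path segment.
    return f"{HUB_ENDPOINT}/{model_id}/resolve/{quote(revision, safe='')}/{filename}"


def head_status(url: str) -> Tuple[int, Optional[str]]:
    """Send a HEAD request without following redirects (similar to curl -I)."""
    parts = urlsplit(url)
    conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path)
        resp = conn.getresponse()
        # A 3xx status together with a Location header indicates a redirect.
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


url = build_resolve_url("julien-c/EsperBERTo-small", "pytorch_model.bin")
print(url)
```

Calling head_status(url) on a few files should let you tell direct responses apart from redirects; it needs network access, so it is not invoked here.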

cached_download

cached_download() takes the following parameters, downloads the remote file, stores it to disk (in a versioning-aware way) and returns its local file path.

Parameters:

  • a remote url
  • your library's name and version (library_name and library_version), which will be added to the HTTP requests' user-agent so that we can provide some usage stats.
  • a cache_dir which you can specify if you want to control where on disk the files are cached.
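To make "versioning-aware" a bit more concrete, here is a minimal sketch of one way a cache could key files on both the URL and the remote ETag, and of how a library name and version might be folded into a user-agent string. Both helpers (cache_filename, user_agent) are hypothetical and only illustrate the idea; check the source code for what cached_download() actually does.

```python
import hashlib
from typing import Optional


def cache_filename(url: str, etag: Optional[str] = None) -> str:
    """Derive a cache file name that changes whenever the remote file changes."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # A new ETag (i.e. a new version of the file) yields a new cache entry,
        # so older versions are not overwritten.
        name += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return name


def user_agent(library_name: str, library_version: str) -> str:
    """Build a user-agent fragment identifying the calling library."""
    return f"{library_name}/{library_version}"


example_url = "https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin"
print(cache_filename(example_url, etag='"abc"'))
print(user_agent("asteroid", "0.4.0"))  # illustrative name and version
```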

Check out the source code for all possible params (we'll create a real doc page in the future).


Publish models to the huggingface.co hub

Uploading a model to the hub is super simple too:

  • create a model repo directly from the website, at huggingface.co/new (models can be public or private, and are namespaced under either a user or an organization)
  • clone it with git
  • download and install git lfs if you don't already have it on your machine (you can check by running a simple git lfs)
  • add, commit and push your files, from git, as you usually do.

We are intentionally not wrapping git too much, so that you can go on with the workflow you’re used to and the tools you already know.
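For readers who prefer to script the steps, the sketch below only assembles the plain git commands for the workflow above and prints them; it does not run anything. The repo id, file name and commit message are placeholders, and publish_commands is a hypothetical helper, not something this package provides.

```python
import shlex
from typing import List


def publish_commands(repo_id: str, files: List[str], message: str) -> List[List[str]]:
    """Return the git commands for the workflow above, as argv lists."""
    # git clone creates a directory named after the last URL path component.
    local_dir = repo_id.split("/")[-1]
    return [
        ["git", "lfs", "install"],  # one-time git-lfs setup on this machine
        ["git", "clone", f"https://huggingface.co/{repo_id}"],
        ["git", "-C", local_dir, "add", *files],
        ["git", "-C", local_dir, "commit", "-m", message],
        ["git", "-C", local_dir, "push"],
    ]


commands = publish_commands("username/my-model", ["pytorch_model.bin"], "Add model")
for cmd in commands:
    print(shlex.join(cmd))
```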

👀 To see an example of how we document the model sharing process in transformers, check out https://huggingface.co/transformers/model_sharing.html

Users add tags into their README.md model cards (e.g. your library_name, a domain tag like audio, etc.) to make sure their models are discoverable.
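As an illustration, the snippet below renders a minimal metadata header of the kind that sits at the top of a README.md model card, assuming tags are declared under a tags key in a YAML block. model_card_front_matter is a hypothetical helper and the two tags are examples.

```python
from typing import List


def model_card_front_matter(tags: List[str]) -> str:
    """Render a minimal YAML metadata header listing the given tags."""
    lines = ["---", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines) + "\n"


header = model_card_front_matter(["asteroid", "audio"])
print(header)
```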

Documentation about the model hub itself is at https://huggingface.co/docs

API utilities in hf_api.py

You don't need them for the standard publishing workflow. However, if you need a programmatic way of creating a repo, deleting it (⚠️ caution), or listing models from the hub, you'll find helpers in hf_api.py.

We also have an API to query models by specific tags (e.g. if you want to list models compatible with your library).
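A minimal sketch of such a tag query, assuming the hub exposes an /api/models route that accepts a filter query parameter (the helpers in hf_api.py are the supported way to do this). models_by_tag_url and list_models_by_tag are hypothetical names, and "asteroid" is just an example tag.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def models_by_tag_url(tag: str, endpoint: str = "https://huggingface.co") -> str:
    """Build the query URL for listing models carrying a given tag."""
    return f"{endpoint}/api/models?{urlencode({'filter': tag})}"


def list_models_by_tag(tag: str) -> list:
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(models_by_tag_url(tag), timeout=10) as resp:
        return json.load(resp)


query_url = models_by_tag_url("asteroid")
print(query_url)
```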

huggingface-cli

Those API utilities are also exposed through a CLI:

huggingface-cli login
huggingface-cli logout
huggingface-cli whoami
huggingface-cli repo create

Need to upload large (>5GB) files?

To upload large files (>5GB 🔥), you need to install the custom transfer agent for git-lfs, bundled in this package.

To install, just run:

$ huggingface-cli lfs-enable-largefiles

This should be executed once for each model repo that contains a model file >5GB. If you just try to push a file bigger than 5GB without running that command, you will get an error with a message reminding you to run it.
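If you are unsure whether a repo needs the command, a small check like the one below can list the files above the threshold before you push. This is only a sketch: files_over_limit is a hypothetical helper, and treating 5GB as 5 * 1024**3 bytes is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List

FIVE_GB = 5 * 1024**3  # assumption: the >5GB threshold counted in binary gigabytes


def files_over_limit(folder: str, limit_bytes: int = FIVE_GB) -> List[Path]:
    """List files under `folder` whose size exceeds `limit_bytes`."""
    found = []
    for path in Path(folder).rglob("*"):
        if ".git" in path.parts:
            continue  # skip git's own object store
        if path.is_file() and path.stat().st_size > limit_bytes:
            found.append(path)
    return sorted(found)


# Demo with a tiny limit so no multi-gigabyte file is needed.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "small.bin").write_bytes(b"x" * 4)
    Path(tmp, "big.bin").write_bytes(b"x" * 64)
    demo_hits = [p.name for p in files_over_limit(tmp, limit_bytes=16)]
print(demo_hits)
```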

Finally, there's a huggingface-cli lfs-multipart-upload command but that one is internal (called by lfs directly) and is not meant to be called by the user.


Visual integration into the huggingface.co hub

Finally, we'll implement a few tweaks to improve the UX for your models on the website – let's use Asteroid as an example:

asteroid-model

Model authors add an asteroid tag to their model card and they get the advantages of model versioning built-in

use-in-asteroid

We add a custom "Use in Asteroid" button.

asteroid-code-sample

When clicked, you get a library-specific code sample that you'll be able to specify. 🔥


Feedback (feature requests, bugs, etc.) is super welcome 💙 💚 💛 💜 ♥️ 🧡

Comments
  • :triangular_flag_on_post: Scan cache tool: ability to free up space

    Originally from @stas00 in slack (internal link):

    for me the main query / need is usually to free up some disk space and so I'd look at the top entries of these 3 groups:

    1. nuking the largest entries (obvious)
    2. nuking the long not accessed entries (sorted by access time)
    3. may be also nuking the oldest entries (probably not using them anymore) (i.e. sorted by file create time) - but most likely 2. would have already caught this category

    In general, this is a feature request already discussed in https://github.com/huggingface/huggingface_hub/pull/990. The "pruning" approach we are aiming for is to give the user information and a tool to delete a specific revision, and then make it as easy as possible for users to define their own strategy.
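The three orderings mentioned in the request can be sketched as plain sorts over cache entries. CacheEntry and the three helper functions are hypothetical, chosen for illustration; they are not the scan-cache tool's data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CacheEntry:
    name: str
    size_bytes: int
    last_accessed: float  # timestamp of last access
    created: float        # timestamp of creation


def largest(entries: List[CacheEntry], n: int = 3) -> List[CacheEntry]:
    """Entries taking the most disk space first."""
    return sorted(entries, key=lambda e: e.size_bytes, reverse=True)[:n]


def least_recently_accessed(entries: List[CacheEntry], n: int = 3) -> List[CacheEntry]:
    """Entries that have gone unused the longest first."""
    return sorted(entries, key=lambda e: e.last_accessed)[:n]


def oldest(entries: List[CacheEntry], n: int = 3) -> List[CacheEntry]:
    """Entries created the longest ago first."""
    return sorted(entries, key=lambda e: e.created)[:n]


entries = [
    CacheEntry("a", size_bytes=500, last_accessed=300.0, created=10.0),
    CacheEntry("b", size_bytes=9000, last_accessed=100.0, created=20.0),
    CacheEntry("c", size_bytes=40, last_accessed=200.0, created=5.0),
]
print([e.name for e in largest(entries, 1)])
print([e.name for e in least_recently_accessed(entries, 1)])
print([e.name for e in oldest(entries, 1)])
```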

    enhancement 
    opened by Wauplin 32
  • Developer mode requirement on Windows

    The snapshot_download and hf_hub_download methods currently use symlinks for efficient storage management. However, symlinks are not properly supported on Windows, where administrator privileges or Developer Mode must be enabled in order to use them.

    We chose to take this approach so that it mirrors the linux/osx behavior.
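One way to find out whether the current environment can create symlinks at all is simply to try it in a temporary directory. The sketch below does that with the standard library; symlinks_supported is a hypothetical helper, not a function of huggingface_hub.

```python
import os
import tempfile
from typing import Optional


def symlinks_supported(directory: Optional[str] = None) -> bool:
    """Return True if a symlink can be created inside a temporary directory."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        with open(target, "w") as f:
            f.write("x")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError, AttributeError):
            # Typically raised on Windows without admin rights or Developer Mode.
            return False
        return os.path.islink(link)


can_symlink = symlinks_supported()
print(can_symlink)
```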

    Opening an issue here to track issues encountered by users in the ecosystem:

    • https://github.com/huggingface/transformers/issues/19048
    opened by LysandreJik 27
  • Add autocompletion to huggingface-cli (fix #1197)

    $ huggingface-cli repo create -<TAB>
    Name for your repo. Will be namespaced under your username to build the repo id.
    option
    --help           show this help message and exit
    -h               show this help message and exit
    --organization   Optional: organization namespace.
    --space_sdk      Optional: Hugging Face Spaces SDK type. Required when --type is set to "space".
    --type           Optional: repo_type: set to "dataset" or "space" if creating a dataset or space, default is model.
    --yes            Optional: answer Yes to the prompt
    -y               Optional: answer Yes to the prompt
    
    on-hold 
    opened by Freed-Wu 25
  • Git: find a "better" way to handle tokens than git credential store

    Mentioned in https://github.com/huggingface/huggingface_hub/issues/1043#issuecomment-1246009544.

    Currently we store the user token for git commands in the git-credential-store. This is the default git storage, which stores credentials in plain text in a file. huggingface_hub warns the user to use it by default to avoid problems (by running git config --global credential.helper store). In a perfect world, it would be good to use the user's default credential helper. In particular, macOS users have an osxkeychain tool by default to securely handle credentials.

    Another possibility is to not store the credential in git and automatically fill the values (from python) when git requires them (in the Repository module).
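To illustrate what "filling the values from python" could look like, here is a minimal sketch of the request/response text a git credential helper exchanges: git sends key=value lines ended by a blank line, and the helper answers with username and password lines. Both functions are hypothetical, the "user" username is an assumption, and nothing here reflects how the Repository module is implemented.

```python
from typing import Dict


def parse_credential_request(text: str) -> Dict[str, str]:
    """Parse the key=value lines git writes to a credential helper's stdin."""
    fields = {}
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the request
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


def answer_credential_request(text: str, token: str, host: str = "huggingface.co") -> str:
    """Answer requests for `host` with the token; stay silent otherwise."""
    fields = parse_credential_request(text)
    if fields.get("protocol") != "https" or fields.get("host") != host:
        return ""  # not ours: let git fall back to other helpers or a prompt
    return f"username=user\npassword={token}\n"


request = "protocol=https\nhost=huggingface.co\n\n"
print(answer_credential_request(request, token="hf_example_token"))
```

With this design the token never has to be written to git's plain-text store; it only lives wherever the Python side keeps it.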

    Note: I am no expert on that topic so any addition is welcomed here :)

    Useful links:

    (Edit: also worth mentioning that when a user runs huggingface-cli login or notebook_login(), the token is also stored locally in plain text in the home directory at ~/.huggingface/token to be reused in API calls. Changing this is out of scope for this issue)

    opened by Wauplin 22
  • 413 Client Error: Payload Too Large when using upload_folder on a lot of files

    Describe the bug

    When trying to commit a folder with many CSV files, I got the following error:

    HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main

    I assume there is a limit to total payload size when uploading a folder that I am going over here. I confirmed it has nothing to do with the number of files, but rather the total size of the files that are being uploaded. It would be great in the short term if we could document what this limit is clearly in the upload_folder fn.
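Until the limit is documented, one possible workaround is to split the files into batches whose total size stays under some budget and to upload each batch separately. The sketch below shows only the batching step; batch_by_size is a hypothetical helper and the 100-byte budget is a placeholder, since the actual limit is not stated in this report.

```python
from typing import Iterable, Iterator, List, Tuple


def batch_by_size(files: Iterable[Tuple[str, int]], max_bytes: int) -> Iterator[List[str]]:
    """Group (path, size) pairs into batches whose total size fits max_bytes."""
    batch: List[str] = []
    batch_size = 0
    for path, size in files:
        # Start a new batch when adding this file would exceed the budget.
        if batch and batch_size + size > max_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(path)
        batch_size += size
    if batch:
        yield batch


files = [("a.csv", 40), ("b.csv", 50), ("c.csv", 30), ("d.csv", 120)]
batches = list(batch_by_size(files, max_bytes=100))
print(batches)
```

Note that a single file larger than the budget still ends up in a batch of its own, so the caller has to handle that case.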

    Reproduction

    The following fails on the last line. I wrote it so you can run it yourself without updating the repo ID or anything...so if you're logged in, the below should work (assuming you have torchvision installed).

    import os
    
    from torchvision.datasets.utils import download_and_extract_archive
    from huggingface_hub import upload_folder, whoami, create_repo
    
    user = whoami()['name']
    repo_id = f'{user}/test-upload-folder-bug'
    create_repo(repo_id, exist_ok=True, repo_type='dataset')
    
    os.mkdir('./data')
    download_and_extract_archive(
        url='https://zenodo.org/api/files/f7f7377b-8405-4d4f-b814-f021df5593b1/hyperbard_data.zip',
        download_root='./data',
        remove_finished=True
    )
    upload_folder(
        folder_path='./data',
        path_in_repo="",
        repo_id=repo_id,
        repo_type='dataset'
    )
    

    Logs

    ---------------------------------------------------------------------------
    HTTPError                                 Traceback (most recent call last)
    <ipython-input-2-91516b1ea47f> in <module>()
         18     path_in_repo="",
         19     repo_id=repo_id,
    ---> 20     repo_type='dataset'
         21 )
    
    3 frames
    /usr/local/lib/python3.7/dist-packages/huggingface_hub/hf_api.py in upload_folder(self, repo_id, folder_path, path_in_repo, commit_message, commit_description, token, repo_type, revision, create_pr)
       2115             token=token,
       2116             revision=revision,
    -> 2117             create_pr=create_pr,
       2118         )
       2119 
    
    /usr/local/lib/python3.7/dist-packages/huggingface_hub/hf_api.py in create_commit(self, repo_id, operations, commit_message, commit_description, token, repo_type, revision, create_pr, num_threads)
       1813             token=token,
       1814             revision=revision,
    -> 1815             endpoint=self.endpoint,
       1816         )
       1817         upload_lfs_files(
    
    /usr/local/lib/python3.7/dist-packages/huggingface_hub/_commit_api.py in fetch_upload_modes(additions, repo_type, repo_id, token, revision, endpoint)
        380         headers=headers,
        381     )
    --> 382     resp.raise_for_status()
        383 
        384     preupload_info = validate_preupload_info(resp.json())
    
    /usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
        939 
        940         if http_error_msg:
    --> 941             raise HTTPError(http_error_msg, response=self)
        942 
        943     def close(self):
    
    HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main
    
    
    
    System Info
    
    Colab
    
    bug 
    opened by nateraw 22
  • Add fastai upstream and downstream capacities for fastai>=2.4 and fastcore>=1.3.27 versions

    Sorry, I created a new PR (previous one: PR huggingface/huggingface_hub#416). I made a couple of mistakes with git, and it seemed easier to create a new one. Thanks @muellerzr and @osanseviero for your comments!

    What

    • Add fastai upstream and downstream capabilities for versions fastai>=2.4 (link) and fastcore>=1.3.27 (link).
    • Inform users that lower versions of fastai are not supported yet (TODO -- Examine whether it is worth implementing previous versions).

    Why

    • Supporting version fastai1 might not be worth it. Loading and pushing Learners involves several changes and the fastai1 library has not been updated for over a year.
    • Supporting versions 2.0.6 <= fastai < 2.4 involves complex changes. Fastai was updated due to what appear to be modifications to pytorch/serialization.py. For the sake of agility in our first release, I suggest releasing this version without supporting these previous versions.
    • fastai>=2.4 versions work with fastcore>=1.3.27 version which is installed automatically when fastai>=2.4 is installed.

    How?

    • Following huggingface_hub practices, file_download.py checks for fastcore and fastai availability and versions.
    • The fastai and fastcore versions used to train the Learner are automatically stored in a config.json when pushed to hub.
    • A README.md is automatically generated, if none, with the fastai tag.

    Testing?

    • Tested that all fastai>=2.4 versions work with the present code and with the fastcore==1.3.27 version.

    Next?

    • Examine whether it is worth implementing fastai <2.4 versions. I will ask on the fastai forums to assess how many users would require this support.
    • Update description on how to load fastai models in Libraries.ts.
    enhancement 
    opened by omarespejel 22
  • FIX Avoid creating repository when it exists on remote

    This PR partially fixes https://github.com/huggingface/huggingface_hub/issues/672.

    I observed that the following error pops up when I create a repository using the huggingface API create_repo and then clone it into a local repository from the URL using a token (here):

    ValueError: No space_sdk provided. create_repo expects space_sdk to be one of ['gradio', 'streamlit', 'static'] when repo_type is 'space'

    I noticed that the following block was causing the problem:

    if token is not None:
        whoami_info = self.client.whoami(token)
        user = whoami_info["name"]
        valid_organisations = [org["name"] for org in whoami_info["orgs"]]
    
        if namespace is not None:
            repo_id = f"{namespace}/{repo_id}"
        repo_url += repo_id
    
        scheme = urlparse(repo_url).scheme
        repo_url = repo_url.replace(f"{scheme}://", f"{scheme}://user:{token}@")
    
        if namespace == user or namespace in valid_organisations:
            self.client.create_repo(
                repo_id=repo_id,
                token=token,
                repo_type=self.repo_type,
                exist_ok=True,
                private=self.private,
            )
    else:
        if namespace is not None:
            repo_url += f"{namespace}/"
        repo_url += repo_id
    

    A valid namespace alone shouldn't be enough to create a repository; we have to check whether the repository exists on the remote, so I added a small check for that (thanks @muellerzr for pointing out the particular HfApi function that does it).

    With this, the space_sdk error should be gone. I'll add a test once @LysandreJik approves my logic here.

    Also, this is separate, but IDK if it's good to have both an attribute clone_from and the clone_from() function. I can also refactor a couple of other things I saw there that don't look good to me.

    opened by merveenoyan 21
  • Add text classification for spaCy

    This requires adding a text-classification script here that can be based on the token-classification implementation.

    There is useful spaCy documentation in https://spacy.io/api/textcategorizer, but I think this should be straightforward to implement. Here is an example repo to use for testing - https://huggingface.co/edichief/en_textcat_goemotions

    good first issue 
    opened by osanseviero 20
  • [RFC] Proposal for a way to cache files in downstream libraries

    This is a proposal following discussions started with the datasets team (@lhoestq @albertvillanova).

    The goal is to have a proper way to cache any kind of files from a downstream library and manage them (e.g.: scan and delete) from huggingface_hub. From hfh's perspective, there is not much work to do. We should have a canonical procedure to generate cache paths for a library. Then within a cache folder, the downstream library handles its files as it wants. Once this helper starts to be used, we can adapt the scan-cache and delete-cache commands.

    I tried to document the cached_assets_path() helper to describe the way I see it. Any feedback is welcome, this is really just a proposal. All the examples are very datasets-focused but I think this could benefit other libraries such as transformers (@sgugger @LysandreJik ), diffusers (@apolinario @patrickvonplaten) or skops (@adrinjalali @merveenoyan) to store any kind of intermediate files. IMO the difficulty mainly resides in getting the feature used :smile:.

    EDIT: see generated documentation here. EDIT 2: assets/ might be a better naming here (common naming in dev)

    WDYT ?

    (cc @julien-c @osanseviero as well)

    Example:

    >>> from huggingface_hub import cached_assets_path
    
    >>> cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="download")
    PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/SQuAD/download')
    
    >>> cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="extracted")
    PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/SQuAD/extracted')
    
    >>> cached_assets_path(library_name="datasets", namespace="SQuAD")
    PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/SQuAD/default')
    
    >>> cached_assets_path(library_name="datasets", subfolder="modules")
    PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/default/modules')
    
    >>> cached_assets_path(library_name="datasets", cache_dir="/tmp/tmp123456")
    PosixPath('/tmp/tmp123456/datasets/default/default')
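The examples above suggest a simple composition rule: cache_dir / library_name / namespace / subfolder, with "default" filling in for missing parts. A minimal sketch of that rule follows; sketch_assets_path is a hypothetical stand-in, not the proposed helper itself, and replacing "/" with "--" in the namespace is an assumption based on the Helsinki-NLP--tatoeba_mt folder that appears in this proposal's tree.

```python
from pathlib import PurePosixPath


def sketch_assets_path(
    library_name: str,
    cache_dir: str,
    namespace: str = "default",
    subfolder: str = "default",
) -> PurePosixPath:
    """Compose cache_dir / library_name / namespace / subfolder."""
    # Assumption: a repo-like namespace is flattened so it stays one folder.
    safe_namespace = namespace.replace("/", "--")
    return PurePosixPath(cache_dir) / library_name / safe_namespace / subfolder


print(sketch_assets_path("datasets", cache_dir="/tmp/tmp123456"))
print(sketch_assets_path("datasets", "/tmp/tmp123456", namespace="SQuAD", subfolder="extracted"))
```

PurePosixPath is used so the output looks the same on every platform; a real helper would more likely return a concrete Path and create the directory.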
    

    And the generated tree:

        assets/
        ├── datasets/
        │   ├── default/
        │   │   ├── modules/
        │   ├── SQuAD/
        │   │   ├── downloaded/
        │   │   ├── extracted/
        │   │   └── processed/
        │   ├── Helsinki-NLP--tatoeba_mt/
        │       ├── downloaded/
        │       ├── extracted/
        │       └── processed/
        └── transformers/
            ├── default/
            │   ├── something/
            ├── bert-base-cased/
            │   ├── default/
            │   └── training/
        hub/
        └── models--julien-c--EsperBERTo-small/
            ├── blobs/
            │   ├── (...)
            │   ├── (...)
            ├── refs/
            │   └── (...)
            └── [ 128]  snapshots/
                ├── 2439f60ef33a0d46d85da5001d52aeda5b00ce9f/
                │   ├── (...)
                └── bbc77c8132af1cc5cf678da3f1ddf2de43606d48/
                    └── (...)
    
    opened by Wauplin 19
  • API deprecate positional args in file_download and hf_api

    Fixes #732

    This PR deprecates passing positional args to most functions and methods in file_download.py and hf_api.py.

    Things to discuss:

    • whether we want to make all parameters kwarg only or leave some as positional
    • make more parts of the API kwarg only
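For context, a deprecation of this kind is often implemented with a decorator that still accepts the old positional call but emits a warning. The sketch below shows one such approach; deprecate_positional_args and the download function are hypothetical and are not taken from this PR.

```python
import functools
import inspect
import warnings


def deprecate_positional_args(func):
    """Warn when keyword-only parameters are still passed positionally."""
    params = inspect.signature(func).parameters
    positional = [n for n, p in params.items() if p.kind == p.POSITIONAL_OR_KEYWORD]
    kwonly = [n for n, p in params.items() if p.kind == p.KEYWORD_ONLY]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = len(args) - len(positional)
        if extra <= 0:
            return func(*args, **kwargs)
        if extra > len(kwonly):
            raise TypeError(f"{func.__name__}() got too many positional arguments")
        # Map surplus positional values onto the keyword-only names, in order.
        names = kwonly[:extra]
        warnings.warn(
            f"Pass {', '.join(names)} as keyword args; positional use is deprecated.",
            FutureWarning,
        )
        kwargs.update(zip(names, args[len(positional):]))
        return func(*args[: len(positional)], **kwargs)

    return wrapper


@deprecate_positional_args
def download(repo_id, *, filename=None, revision=None):
    """Hypothetical function whose trailing parameters became keyword-only."""
    return repo_id, filename, revision


print(download("user/repo", filename="config.json"))
```

Which parameters stay positional is then decided purely by where the bare * sits in the signature, which matches the first discussion point above.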

    cc @julien-c @LysandreJik @osanseviero

    Question: do we have a place to put changelog/release logs? How do we handle those now?

    opened by adrinjalali 19
  • Logging with organization token is successful and leads to side effects

    It was reported that users get a KeyError when push_to_hub_keras() is used to push to an organization.

    Example:

    from huggingface_hub import push_to_hub_keras
    push_to_hub_keras(model=vqvae_trainer, repo_url='https://huggingface.co/keras-io/vq_vae', organization='keras-io')
    

    When the organization token is passed explicitly, the problem goes away. Example:

    push_to_hub_keras(model=forest_model, .....
                      repo_path_or_name='.', repo_url = "https://huggingface.co/keras-io/deep-neural-decision-forests", 
                      use_auth_token=keras_io_hub_token)
    

    Is this intended behavior? @nateraw Also cc: @osanseviero

    opened by merveenoyan 19
  • Repository does not work on HF spaces

    Hi,

    I have recently started getting an error when trying to use my HF space together with the Repository class from HF hub (see the error message below).

    I would assume this is a bug since it worked nicely before. On my local machine, the entrypoint.py below also works. I am open to any suggestions 😃

    Traceback (most recent call last):
      File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/repository.py", line 742, in clone_from
        run_subprocess("git lfs install", self.local_dir)
      File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_subprocess.py", line 61, in run_subprocess
        return subprocess.run(
      File "/usr/local/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['git', 'lfs', 'install']' returned non-zero exit status 2.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "entrypoint.py", line 6, in <module>
        Repository("repos/hand-ki-model", f"https://oauth2:{os.getenv('HANDKIGIT5')}@git5.cs.fau.de/folle/hand-ki-model.git", use_auth_token=os.getenv(""))
      File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
        return f(*args, **kwargs)
      File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
        return fn(*args, **kwargs)
      File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/repository.py", line 528, in __init__
        self.clone_from(repo_url=clone_from)
      File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
        return fn(*args, **kwargs)
      File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/repository.py", line 797, in clone_from
        raise EnvironmentError(exc.stderr)
    OSError: Hook already exists: pre-push
    
    	#!/bin/sh
    	command -v git-lfs >/dev/null 2>&1 || { echo >&2 "\nThis repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting '.git/hooks/pre-push'.\n"; exit 2; }
    	git lfs pre-push "$@"
    
    To resolve this, either:
      1: run `git lfs update --manual` for instructions on how to merge hooks.
      2: run `git lfs update --force` to overwrite your hook.
    

    This is my entrypoint.py:

    import os
    import sys
    import subprocess
    from huggingface_hub import Repository
    
    Repository("repos/hand-ki-model", f"https://oauth2:{os.getenv('HANDKIGIT5')}@git5.cs.fau.de/folle/hand-ki-model.git", use_auth_token=os.getenv(""))
    subprocess.check_call([sys.executable, "-m", "pip", "install", "repos/hand-ki-model/"])
    import app
    
    opened by lukasfolle 0
  • [Dataset | Model card] When pushing to template repos, work on actual raw contents

    Question: do we actually want this feature or not?

    Internal Slack convo cc @Wauplin

    Details

    Generated commit for model cards (would need to do the same for datasets): https://huggingface.co/templates/model-card-example/commit/901deccf5acead553f9b082aca480d966e61f355

    opened by julien-c 5
  • KeyError: 'multilinguality' when calling DatasetSearchArguments()

    Describe the bug

    KeyError: 'multilinguality' when calling DatasetSearchArguments()

    Reproduction

    from huggingface_hub import DatasetSearchArguments
    dataset_args = DatasetSearchArguments()
    

    Logs

    from huggingface_hub import DatasetSearchArguments
    dataset_args = DatasetSearchArguments()
    
    
    Error:
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /var/folders/1h/lqt86wdn4nq9z9h_c4q7s4gr0000gn/T/ipykernel_43840/980484241.py in <module>
          1 from huggingface_hub import DatasetSearchArguments
    ----> 2 dataset_args = DatasetSearchArguments()
    
    /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/hf_api.py in __init__(self, api)
        548     def __init__(self, api: Optional["HfApi"] = None):
        549         self._api = api if api is not None else HfApi()
    --> 550         tags = self._api.get_dataset_tags()
        551         super().__init__(tags)
        552         self._process_models()
    
    /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/hf_api.py in get_dataset_tags(self)
        669         hf_raise_for_status(r)
        670         d = r.json()
    --> 671         return DatasetTags(d)
        672 
        673     @_deprecate_list_output(version="0.14")
    
    /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/utils/endpoint_helpers.py in __init__(self, dataset_tag_dictionary)
        365             "license",
        366         ]
    --> 367         super().__init__(dataset_tag_dictionary, keys)
    
    /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/utils/endpoint_helpers.py in __init__(self, tag_dictionary, keys)
        298             keys = list(self._tag_dictionary.keys())
        299         for key in keys:
    --> 300             self._unpack_and_assign_dictionary(key)
        301 
        302     def _unpack_and_assign_dictionary(self, key: str):
    
    /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/utils/endpoint_helpers.py in _unpack_and_assign_dictionary(self, key)
        303         "Assignes nested attributes to `self.key` containing information as an `AttributeDictionary`"
        304         setattr(self, key, AttributeDictionary())
    --> 305         for item in self._tag_dictionary[key]:
        306             ref = getattr(self, key)
        307             item["label"] = (
    
    KeyError: 'multilinguality'
    

    System info

    - huggingface_hub version: 0.11.1
    - Platform: macOS-10.16-x86_64-i386-64bit
    - Python version: 3.9.6
    - Running in iPython ?: Yes
    - iPython shell: ZMQInteractiveShell
    - Running in notebook ?: Yes
    - Running in Google Colab ?: No
    - Token path ?: /Users/x/.huggingface/token
    - Has saved token ?: False
    - Configured git credential helpers: osxkeychain
    - FastAI: N/A
    - Tensorflow: 2.9.1
    - Torch: N/A
    - Jinja2: 3.0.1
    - Graphviz: N/A
    - Pydot: N/A
    
    bug 
    opened by animator 0
  • Refacto Repository tests

    TL;DR:

    1. Sorry for huge diff :cry:
    2. Repository tests work on Windows (related to https://github.com/huggingface/huggingface_hub/pull/1112)
    3. Repository tests run in parallel
    4. Repository tests use /tmp dir
    5. Less redundancy

    First of all, sorry for the huge diff in the PR :cry: . I started it as a preliminary work for https://github.com/huggingface/huggingface_hub/pull/1112 (Windows CI):

    1. I started to refactor how paths are generated => no more "directory/whatever_file.txt" as it breaks on Windows
    2. While doing that, I removed the WORKING_DIR_FIXTURE and replaced it with a proper /tmp folder => no need to clean it up after tests (when tests were failing, I often ended up with untracked folders in my huggingface_hub folder that I had to delete)
    3. Then I read through the tests in more detail and realized we were doing a lot of redundant work (creating 1 new repo on the Hub for each test even though we do not modify it)
      1. I've split the repository tests between TestRepositoryShared and TestRepositoryUniqueRepos. In the shared tests, a single repo is created on the Hub and cloned multiple times. No files are pushed to this repo during the tests. => Saves quite some setup/teardown time. In the unique tests, a repo is created per test (same as before) so that they are independent from each other.
      2. I also added attributes like self.repo_id, self.repo_path and self.repo_url and a helper self.clone_from(...) to avoid redundancy (e.g. recomputing repo_id=f"{USER}/{self.REPO_NAME}", local_dir="{WORKING_DIR}/{REPO_NAME}" all the time).
    4. And finally I removed some tests that were duplicated (*).

    => Now we have a Repository test suite that should be "iso" compared to before but running in parallel (~1min instead of 8) and on Windows.

    (*) For example in dataset tests, we are testing that cloning from a different repo type works. But also testing that we can commit, pull, push,... which is quite redundant because what we really want is to check if a repo can be cloned with a repo_type. All subsequent actions are already tested separately ("test only 1 feature at a time").

    (EDIT: "non repository" tests are failing but it's not due to this PR :confused:)

    opened by Wauplin 1
  • hf_hub_download call does not increase the download counter

    Describe the bug

    I have downloaded this model more than 100 times in the last week, but the counter shows 0 downloads: https://huggingface.co/fcakyon/yolov5s-v7.0

    Can you tell me which request triggers the download counter? Does hf_hub_download function trigger this counter?

    Reproduction

    Using yolov5 pypi package for downloading the model.

    Here is the download code: https://github.com/fcakyon/yolov5-pip/blob/main/yolov5/utils/downloads.py#L131-L145

    Logs

    No response

    System info

    I have downloaded this model from Windows 10 and Ubuntu 18.04 with huggingface_hub==0.11.1.
    
    bug 
    opened by fcakyon 7
Releases(v0.11.1)
  • v0.11.1(Nov 28, 2022)

    Hot-fix to fix permission issues when downloading with hf_hub_download or snapshot_download. For more details, see https://github.com/huggingface/huggingface_hub/pull/1220, https://github.com/huggingface/huggingface_hub/issues/1141 and https://github.com/huggingface/huggingface_hub/issues/1215.

    Full changelog: https://github.com/huggingface/huggingface_hub/compare/v0.11.0...v0.11.1

  • v0.11.0(Nov 14, 2022)

    New features and improvements for HfApi

    HfApi is the central point to interact with the Hub API (manage repos, create commits,...). The goal is to offer more and more git-related features through HTTP endpoints so that users can interact with the Hub without cloning a repo locally.

    Create/delete tags and branches

    from huggingface_hub import create_branch, create_tag, delete_branch, delete_tag
    
    create_tag(repo_id, tag="v0.11", tag_message="Release v0.11")
    delete_tag(repo_id, tag="something") # If you created a tag by mistake
    
    create_branch(repo_id, branch="experiment-154")
    delete_branch(repo_id, branch="experiment-1") # Clean some old branches
    
    • Add a create_tag method to create tags from the HTTP endpoint by @Wauplin in #1089
    • Add delete_tag method to HfApi by @Wauplin in #1128
    • Create tag twice doesn't work by @Wauplin in #1149
    • Add "create_branch" and "delete_branch" endpoints by @Wauplin #1181

    Upload lots of files in a single commit

    Making a very large commit was previously tedious. Files are now processed in chunks, which makes it possible to upload 25k files in a single commit (with a 1GB payload limit if uploading only non-LFS files). This should make it easier to upload large datasets.

    • Create commit by streaming a ndjson payload (allow lots of file in single commit) by @Wauplin in #1117
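
    As a rough illustration of the idea (not the library's actual wire format), the sketch below serializes a commit as newline-delimited JSON and splits a file list into chunks. The `header`/`file` record shapes and both helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_ndjson_lines(items: Iterable[Dict]) -> Iterator[bytes]:
    """Yield one newline-terminated JSON document per item (ndjson)."""
    for item in items:
        yield json.dumps(item).encode("utf-8") + b"\n"


def iter_chunks(items: List[Dict], chunk_size: int) -> Iterator[List[Dict]]:
    """Split a list into consecutive chunks of at most `chunk_size` items."""
    for start in range(0, len(items), chunk_size):
        yield items[start : start + chunk_size]


# Hypothetical record shapes: one header line, then one line per file.
header = {"key": "header", "value": {"summary": "Upload dataset"}}
files = [{"key": "file", "value": {"path": f"data/{i}.txt"}} for i in range(5)]

# Each record is its own line, so a server can parse the payload as a stream.
payload = b"".join(iter_ndjson_lines([header, *files]))
chunks = list(iter_chunks(files, chunk_size=2))
```

    Because every record is a standalone line, neither side needs to hold the full payload in memory, which is presumably what makes very large commits practical.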

    Delete an entire folder

    from huggingface_hub import CommitOperationDelete, create_commit, delete_folder
    
    # Delete a single folder
    delete_folder(repo_id=repo_id, path_in_repo="logs/")
    
    # Alternatively, use the low-level `create_commit`
    create_commit(
        repo_id,
        operations=[
            CommitOperationDelete(path_in_repo="old_config.json"),  # Delete a file
            CommitOperationDelete(path_in_repo="logs/"),  # Delete a folder
        ],
        commit_message=...,
    )
    
    • Delete folder with commit endpoint by @Wauplin in #1163

    Support pagination when listing repos

    In the future, listing models, datasets and spaces will be paginated on the Hub by default. To avoid breaking changes, huggingface_hub already follows pagination. The output type is currently a list (deprecated) and will become a generator in v0.14.

    • Add support for pagination in list_models list_datasets and list_spaces by @Wauplin #1176
    • Deprecate output in list_models by @Wauplin in #1143
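
    Calling code can prepare for the list-to-generator change by only iterating over the result, never indexing it or calling len() on it. A minimal sketch, where `fake_listing_as_generator` is a hypothetical stand-in for a paginated listing call:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def first_n(results: Iterable[str], n: int) -> List[str]:
    """Take the first n items without assuming `results` is a list."""
    return list(islice(results, n))


def fake_listing_as_generator() -> Iterator[str]:
    # Hypothetical stand-in: yields items page by page, like a paginated API.
    for page in (["model-a", "model-b"], ["model-c"]):
        yield from page


as_list = ["model-a", "model-b", "model-c"]

# The same helper works whether the listing is a list or a generator.
same_result = first_n(as_list, 2) == first_n(fake_listing_as_generator(), 2)
```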

    Misc

    • Allow create PR against non-main branch by @Wauplin in #1168
    • 1162 Reorder operations correctly in commit endpoint by @Wauplin in #1175

    Login, tokens and authentication

    Authentication has been revisited to make it as easy as possible for the users.

    Unified login and logout methods

    from huggingface_hub import login, logout
    
    # `login` detects automatically if you are running in a notebook or a script
    # Launch widgets or TUI accordingly
    login()
    
    # Now possible to login with a hardcoded token (non-blocking)
    login(token="hf_***")
    
    # If you want to bypass the auto-detection of `login`
    notebook_login()  # still available
    interpreter_login()  # to login from a script
    
    # Logout programmatically
    logout()
    
    # Still possible to login from CLI
    huggingface-cli login
    
    • Unified login/logout methods by @Wauplin in #1111

    Set token only for a HfApi session

    from huggingface_hub import HfApi
    
    # Token will be sent in every request but not stored on machine
    api = HfApi(token="hf_***")
    
    • Add token attribute to HfApi by @Wauplin in #1116

    Stop using use_auth_token in favor of token, everywhere

    The token parameter can now be passed to every method in huggingface_hub. use_auth_token is still accepted where it previously existed, but the mid-term goal (~6 months) is to deprecate and remove it.

    • Replace use_auth_token arg by token everywhere by @Wauplin in #1122
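
    Downstream libraries that expose both arguments during the transition could resolve them with a small shim like the one below. This is only a sketch: `resolve_token` is a hypothetical helper, not part of huggingface_hub, and it assumes both arguments are plain strings.

```python
import warnings
from typing import Optional


def resolve_token(
    token: Optional[str] = None, use_auth_token: Optional[str] = None
) -> Optional[str]:
    """Prefer `token`; fall back to the legacy `use_auth_token` with a warning."""
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated, pass `token` instead.",
            FutureWarning,
            stacklevel=2,
        )
        if token is None:
            return use_auth_token
    return token
```

    When both are given, the new `token` argument wins, so callers can migrate one call site at a time.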

    Respect git credential helper from the user

    Previously, the token was stored in the git credential store. It can now be stored in any helper configured by the user (keychain, cache,...).

    • Refactor git credential handling in login workflow by @Wauplin in #1138

    Better error handling

    Helper to dump machine information

    # Dump all relevant information. To be used when reporting an issue.
    ➜ huggingface-cli env
    
    Copy-and-paste the text below in your GitHub issue.
    
    - huggingface_hub version: 0.11.0.dev0
    - Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
    - Python version: 3.10.6
    ...
    
    • 1173 Add dump env helper by @Wauplin in #1177

    Misc

    • Cache not found is not an error by @singingwolfboy in #1101
    • Propagate error messages when multiple on BadRequest by @Wauplin in #1115
    • Add error message from x-error-message header if exists by @Wauplin in #1121

    Modelcards

    A few improvements/fixes in the modelcard module:

    • :art: make repocard content a property by @nateraw in #1147
    • :white_check_mark: fix content string in repocard tests by @nateraw in #1155
    • Add Hub verification token to evaluation metadata by @lewtun in #1142
    • Use default model_name in metadata_update by @lvwerra in #1157
    • Refer to modelcard creator app from doc by @Wauplin in #1184
    • Parent Model --> Finetuned from model by @meg-huggingface #1191
    • FIX overwriting metadata when both verified and unverified reported values by @Wauplin in #1186

    Cache assets

    New feature to provide a path in the cache where any downstream library can store assets (processed data, files from the web, extracted data, rendered images,...)

    • [RFC] Proposal for a way to cache files in downstream libraries by @Wauplin in #1088

    Documentation updates

    • Fixing a typo in the doc. by @Narsil in #1113
    • Fix docstring of list_datasets by @albertvillanova in #1125
    • Add repo_type=dataset possibility to guide by @Wauplin in #1134
    • Fix PyTorch & Keras mixin doc by @lewtun in #1139
    • Update how-to-manage.mdx by @severo in #1150
    • Typo fix by @meg-huggingface in #1166
    • Adds link to model card metadata spec by @meg-huggingface in #1171
    • Removing "Related Models" & just asking for "Parent Model" by @meg-huggingface in #1178

    Breaking changes

    • Cannot provide an organization to create_repo
    • identical_ok removed in upload_file
    • Breaking changes in arguments for validate_preupload_info, prepare_commit_payload, _upload_lfs_object (internal helpers for the commit API)
    • huggingface_hub.snapshot_download is not exposed as a public module anymore

    Deprecations

    • Remove deprecated code from v0.9, v0.10 and v0.11 by @Wauplin in #1092
    • Rename languages to language + remove duplicate code in tests by @Wauplin in #1169
    • Deprecate output in list_models by @Wauplin in #1143
    • Set back feature to create a repo when using clone_from by @Wauplin in #1187

    Internal

    • Configure pytest to run on staging by default + flags in config by @Wauplin in #1093
    • fix search models test by @Wauplin in #1106
    • Add mypy in the CI (and fix existing type issues) by @Wauplin in #1097
    • Fix deprecation warnings for assertEquals in tests by @Wauplin in #1135
    • Skip failing test in ci by @Wauplin in #1148
    • :green_heart: fix mypy ci by @nateraw in #1167
    • Update pr docs actions by @mishig25 in #1170
    • Revert "Update pr docs actions" by @mishig25 #1192

    Bugfixes & small improvements

    • Expose list_spaces by @osanseviero in #1132
    • respect NO_COLOR env var by @singingwolfboy in #1103
    • Fix list_models bool parameters by @Wauplin in #1152
    • FIX url encoding in hf_hub_url by @Wauplin in #1164
    • Fix cannot create pr on foreign repo by @Wauplin #1183
    • Fix HfApi.move_repo(...) and complete tests by @Wauplin in #1136
    • Commit empty files as regular and warn user by @Wauplin in #1180
    • Parse file size in get_hf_file_metadata by @Wauplin #1179
    • Fix get file size on lfs by @Wauplin #1188
    • More robust create relative symlink in cache by @Wauplin in #1109
    • Test running CI on Python 3.11 #1189
  • v0.10.1(Oct 11, 2022)

    Hot-fix to force utf-8 encoding in modelcards. See https://github.com/huggingface/huggingface_hub/pull/1102 and https://github.com/skops-dev/skops/pull/162#issuecomment-1263516507 for context.

    Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.10.0...v0.10.1

  • v0.10.0(Sep 28, 2022)

    Modelcards

    Contribution from @nateraw to integrate the work done on Modelcards and DatasetCards (from nateraw/modelcards) directly in huggingface_hub.

    >>> from huggingface_hub import ModelCard
    
    >>> card = ModelCard.load('nateraw/vit-base-beans')
    >>> card.data.to_dict()
    {'language': 'en', 'license': 'apache-2.0', 'tags': ['generated_from_trainer', 'image-classification'],...}
    

    Related commits

    • Add additional repo card utils from modelcards repo by @nateraw in #940
    • Add regression test for empty modelcard update by @Wauplin in #1060
    • Add template variables to dataset card template by @nateraw in #1068
    • Further clarifying Model Card sections by @meg-huggingface in #1052
    • Create modelcard if doesn't exist on update_metadata by @Wauplin in #1061

    Related documentation

    Cache management (huggingface-cli scan-cache and huggingface-cli delete-cache)

    New commands in huggingface-cli to scan and delete parts of the cache. Goal is to manage the cache-system the same way for any dependent library that uses huggingface_hub. Only the new cache-system format is supported.

    ➜ huggingface-cli scan-cache
    REPO ID                     REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS                LOCAL PATH
    --------------------------- --------- ------------ -------- ------------- ------------- ------------------- -------------------------------------------------------------------------
    glue                        dataset         116.3K       15 4 days ago    4 days ago    2.4.0, main, 1.17.0 /home/wauplin/.cache/huggingface/hub/datasets--glue
    google/fleurs               dataset          64.9M        6 1 week ago    1 week ago    refs/pr/1, main     /home/wauplin/.cache/
    (...)
    
    Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
    Got 1 warning(s) while scanning. Use -vvv to print details.
    

    Related commits

    • Feature: add an utility to scan cache by @Wauplin in #990
    • Utility to delete revisions by @Wauplin in #1035
    • 1025 add time details to scan cache by @Wauplin in #1045
    • Fix scan cache failure when cached snapshot is empty by @Wauplin in #1054
    • 1025 huggingface-cli delete-cache command by @Wauplin in #1046
    • Sort repos/revisions by age in delete-cache by @Wauplin in #1063

    Related documentation

    Better error handling (and http-related stuff)

    HTTP calls to the Hub have been harmonized to behave the same across the library.

    Major differences are:

    • Unified way to handle HTTP errors using hf_raise_for_status (more informative error message)
    • Auth token is always sent by default when a user is logged in (see documentation).
    • Package versions are sent in the user-agent header for telemetry (python, huggingface_hub, tensorflow, torch,...). This was already the case for hf_hub_download.

    Related commits

    • Always send the cached token when user is logged in by @Wauplin in #1064
    • Add user agent to all requests with huggingface_hub version (and other) by @Wauplin in #1075
    • [Repository] Add better error message by @patrickvonplaten in #993
    • Clearer HTTP error messages in huggingface_hub by @Wauplin in #1019
    • Handle backoff on HTTP 503 error when pushing repeatedly by @Wauplin in #1038

    Breaking changes

    1. For consistency, the return type of create_commit has been modified. This is a breaking change, but we hope the return type of this method was never used (quite recent and niche output type).
    • Return more information in create_commit output by @Wauplin in #1066
    2. Since repo_id is now validated using @validate_hf_hub_args (see below), a breaking change can be caused if repo_id was previously misused. A HFValidationError is now raised if repo_id is not valid.

    Miscellaneous improvements

    Add support for autocomplete

    • Add autocomplete + tests + type checking by @Wauplin in #1041

    http-based push_to_hub_fastai

    • Add changes for push_to_hub_fastai to use the new http-based approach. by @nandwalritik in #1040

    Check if a file is cached

    • try_to_load_from_cache returns cached non-existence by @sgugger in #1039

    Get file metadata (commit hash, etag, location) without downloading

    • Add get_hf_file_metadata to fetch metadata from the Hub by @Wauplin in #1058

    Validate arguments using @validate_hf_hub_args

    • Add validator for repo id + decorator to validate arguments in huggingface_hub by @Wauplin in #1029
    • Remove repo_id validation in hf_hub_url and hf_hub_download by @Wauplin in #1031

    :warning: This is a breaking change if repo_id was previously misused :warning:
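
    The general pattern of validating an argument through a decorator can be sketched as follows. The regular expression, the `RepoIdError` exception and the `validate_repo_id_arg` decorator are all hypothetical simplifications; the library's real rules and error type (HFValidationError) may differ.

```python
import functools
import inspect
import re
from typing import Any, Callable

# Simplified, hypothetical rule: "name" or "namespace/name" made of word
# characters, ".", and "-". The real library rules may differ.
REPO_ID_PATTERN = re.compile(r"[\w.\-]+(/[\w.\-]+)?")


class RepoIdError(ValueError):
    """Raised by this sketch when a repo_id looks malformed."""


def validate_repo_id_arg(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that checks the `repo_id` argument before calling `fn`."""
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Bind so that repo_id is found whether passed positionally or by name.
        bound = signature.bind(*args, **kwargs)
        repo_id = bound.arguments.get("repo_id")
        if repo_id is not None and not REPO_ID_PATTERN.fullmatch(str(repo_id)):
            raise RepoIdError(f"Invalid repo_id: {repo_id!r}")
        return fn(*args, **kwargs)

    return wrapper


@validate_repo_id_arg
def describe(repo_id: str, revision: str = "main") -> str:
    return f"{repo_id}@{revision}"
```

    Failing early with a dedicated exception is what turns a previously tolerated misuse (for example a path with several slashes) into a visible error.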

    Related documentation:

    Documentation updates

    • Fix raise syntax: remove markdown bullet point by @mishig25 in #1034
    • docs render tree correctly by @mishig25 in #1070

    Deprecations

    • ENH Deprecate clone_from behavior by @merveenoyan in #952
    • 🗑 Deprecate token in read-only methods of HfApi in favor of use_auth_token by @SBrandeis in #928
    • Remove legacy helper 'install_lfs_in_userspace' by @Wauplin in #1059
    • 1055 deprecate private and repo type in repository class by @Wauplin in #1057

    Bugfixes & small improvements

    • Consider empty subfolder as None in hf_hub_url and hf_hub_download by @Wauplin in #1021
    • enable http request retry under proxy by @MrZhengXin in #1022
    • Add securityStatus to ModelInfo object with default value None. by @Wauplin in #1026
    • 👽️ Add size parameter for lfsFiles when committing on the hub by @coyotte508 in #1048
    • Use /models/ path for api call to update settings by @Wauplin in #1049
    • Globally set git credential.helper to store in google colab by @Wauplin in #1053
    • FIX notebook login by @Wauplin in #1073

    Windows-specific bug fixes

    • Fix default cache on windows by @thomwolf in #1069
    • Degraded but fully working cache-system when symlinks are not supported by @Wauplin in #1067
    • Check symlinks support per directory instead of globally by @Wauplin in #1077
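
    Checking symlink support for a specific directory, rather than once for the whole machine, can be approximated with a probe like this one. It is a sketch under the assumption that creating a throwaway symlink is an acceptable test; `supports_symlinks` is a hypothetical name and not the library's helper.

```python
import os
import tempfile


def supports_symlinks(directory: str) -> bool:
    """Return True if a symlink can be created inside `directory`."""
    os.makedirs(directory, exist_ok=True)
    # Probe inside a temporary sub-directory so nothing is left behind.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        with open(target, "w") as f:
            f.write("x")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError, AttributeError):
            # E.g. Windows without developer mode, or an unsupported filesystem.
            return False
        return os.path.islink(link)


probe_result = supports_symlinks(tempfile.mkdtemp())
```

    A cache can then fall back to copying files in directories where the probe fails, which matches the "degraded but fully working" behavior described above.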
  • v0.9.1(Aug 25, 2022)

    Hot-fix error message on gated repositories (https://github.com/huggingface/huggingface_hub/pull/1015).

    Context: https://huggingface.co/CompVis/stable-diffusion-v1-4 has been widely shared in recent days, but since it's a gated repo, many users were confused by the authentication error they received. The error message is now more detailed.

    Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.9.0...v0.9.1

  • v0.9.0(Aug 23, 2022)

    Community API

    Huge work to programmatically interact with the community tab, thanks to @SBrandeis ! It is now possible to:

    • Manage discussions (create_discussion, create_pull_request, merge_pull_request, change_discussion_status, rename_discussion)
    • Comment on them (comment_discussion, edit_discussion_comment)
    • List them (get_repo_discussions, get_discussion_details)

    See full documentation for more details.

    • ✨ Programmatic API for the community tab by @SBrandeis in #930

    HTTP-based push_to_hub mixins

    The push_to_hub mixin and push_to_hub_keras have been refactored to leverage the HTTP endpoint. This means pushing to the Hub no longer requires first downloading the repo locally. The previous git-based version is planned to be supported until v0.12.

    • Push to hub mixins that do not leverage git by @LysandreJik in #847

    Miscellaneous API improvements

    • ✨ parent_commit argument for create_commit and related functions by @SBrandeis in #916
    • Add a helpful error message when commit_message is empty in create_commit by @sgugger in #962
    • ✨ create_commit: more user-friendly errors on HTTP 400 by @SBrandeis in #963
    • ✨ Add files_metadata option to repo_info by @SBrandeis in #951
    • Add list_spaces to HfApi by @cakiki in #889

    Miscellaneous helpers (advanced)

    Filter which files to upload in upload_folder

    • Allowlist and denylist when uploading a folder by @Wauplin in #994

    Non-existence of files in a repo is now cached

    • Cache non-existence of files or completeness of repo by @sgugger in #986

    Progress bars can be globally disabled via the HF_HUB_DISABLE_PROGRESS_BARS env variable or using disable_progress_bars/enable_progress_bars helpers.

    • Add helpers to disable progress bars globally + tests by @Wauplin in #987

    Use try_to_load_from_cache to check if a file is locally cached

    • Add utility to load files from cache by @sgugger in #980

    Documentation updates

    • [Doc] Update "Download files from the Hub" doc by @julien-c in #948
    • Docs: Fix some missing images and broken links by @NimaBoscarino in #936
    • Replace upload_file with upload_folder in upload_folder docstring by @mariosasko in #927
    • Clarify upload docs by @stevhliu in #944

    Bugfixes & small improvements

    • Handle redirections in hf_hub_download for a renamed repo by @Wauplin in #983
    • PR Make path_in_repo optional in upload folder by @Wauplin in #988
    • Use a finer exception when local_files_only=True and a file is missing in cache by @Wauplin in #985
    • use fixes JSONDecodeError by @Wauplin in #974
    • 🐛 Fix PR creation for a repo the user does not own by @SBrandeis in #922
    • login: tiny messaging tweak by @julien-c in #964
    • Display endpoint URL in whoami command by @juliensimon in #895
    • Small orphaned tweaks from #947 by @julien-c in #958
    • FIX LFS track fix for Hub Mixin by @merveenoyan in #919
    • :bug: fix multilinguality test and example by @nateraw in #941
    • Fix custom handling of refined HTTPError by @osanseviero in #924
    • Followup to #901: Tweak repocard_types.py by @julien-c in #931
    • [Keras Mixin] - Flattening out nested configurations for better table parsing. by @ariG23498 in #914
    • [Keras Mixin] Rendering the Hyperparameter table vertically by @ariG23498 in #917

    Internal

    • Disable codecov + configure pytest FutureWarnings by @Wauplin in #976
    • Enable coverage in CI by @Wauplin in #992
    • Enable flake8 on W605 by @Wauplin in #975
    • Enable flake8-bugbear + adapt existing codebase by @Wauplin in #967
    • Test that TensorFlow is not imported on startup by @lhoestq in #904
    • Pin black to 22.3.0 to benefit from a stable --preview flag by @LysandreJik in #934
    • Update dev version by @gante in #921
  • v0.8.1(Jun 15, 2022)

    Git-aware cache file layout

    v0.8.1 introduces a new way of caching files from the Hugging Face Hub, used by two methods: snapshot_download and hf_hub_download. The new approach is extensively documented in the Documenting files guide and we recommend checking it out to get a better understanding of how caching works.

    • New git-aware cache file layout by @julien-c in #801
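
    The scan-cache output in the v0.10.0 notes above shows a cached repo stored under a folder named datasets--glue. Assuming that naming pattern generalizes (repo type in plural, then the parts of the repo id, joined by "--"), it can be sketched as follows; `repo_cache_folder` is a hypothetical helper for illustration only.

```python
from pathlib import PurePosixPath


def repo_cache_folder(repo_id: str, repo_type: str = "model") -> str:
    """Build a flat folder name such as 'datasets--glue' from a repo id."""
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


cache_root = PurePosixPath("~/.cache/huggingface/hub")
print(cache_root / repo_cache_folder("glue", repo_type="dataset"))
print(cache_root / repo_cache_folder("julien-c/EsperBERTo-small"))
```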

    New create_commit API

    A new create_commit API allows users to upload and delete several files at once using HTTP-based methods. You can read more about it in this guide. The following convenience methods were also introduced:

    • upload_folder: allows uploading a local directory to a repo.
    • delete_file: allows deleting a single file from a repo.

    upload_file now uses create_commit under the hood.

    create_commit also allows creating pull requests with a create_pr=True flag.

    None of the methods rely on Git locally.

    • New create_commit API by @SBrandeis in #888

    Lazy loading

    All modules will now be lazy-loaded. This should drastically reduce the time it takes to import huggingface_hub as it will no longer load all soft dependencies.

    • ENH lazy load modules in the root init by @adrinjalali in #874
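
    One common way to get this effect in Python is to resolve attributes on first access through `__getattr__`. The sketch below shows the technique on a toy module object; the `_LAZY_ATTRS` mapping and `LazyModule` class are hypothetical and are not claimed to be how huggingface_hub implements it.

```python
import importlib
import types

# Hypothetical mapping: public attribute name -> module that defines it.
_LAZY_ATTRS = {"dumps": "json", "sqrt": "math"}


class LazyModule(types.ModuleType):
    """Module whose attributes are imported on first access."""

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails.
        if name not in _LAZY_ATTRS:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        value = getattr(importlib.import_module(_LAZY_ATTRS[name]), name)
        setattr(self, name, value)  # cache so __getattr__ is not hit again
        return value


lazy = LazyModule("lazy_demo")
```

    Nothing is imported when `lazy` is created; the underlying module is only loaded the first time one of its attributes is requested, which is where the import-time savings come from.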

    Improvements and bugfixes

    • Add request ID to all requests by @LysandreJik in #909
    • Remove deprecations by @LysandreJik in #910
    • FIX Avoid creating repository when it exists on remote by @merveenoyan in #900
    • 🏗 Use hub-ci for tests by @SBrandeis in #898
    • Refine 404 errors by @LysandreJik in #878
    • Fix typo by @lsb in #902
    • FIX metadata_update: work on a copy of the upstream file, to not mess up the cache by @julien-c in #891
    • ENH Removed history writing in Keras model card by @merveenoyan in #876
    • CI enable codecov by @adrinjalali in #893
    • MNT deprecate imports from snapshot_download by @adrinjalali in #880
    • Pushback deprecation for v0.7 release by @LysandreJik in #882
    • FIX make import machinary private by @adrinjalali in #879
    • ENH Keras Use table instead of dictionary for hyperparameters in model card by @merveenoyan in #877
    • Invert deprecation for create_repo in #912
    • Constant was accidentally removed during deprecation transition in #913
  • v0.7.0(May 30, 2022)

    Repocard metadata

    This PR adds a metadata_update function that allows the user to update the metadata in a repository on the hub. The function accepts a dict with metadata (following the same pattern as the YAML in the README) and behaves as follows for all top level fields except model-index.

    Examples:

    Starting from

    existing_results = [{
        'dataset': {'name': 'IMDb', 'type': 'imdb'},
        'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
         'task': {'name': 'Text Classification', 'type': 'text-classification'}
    }]
    

    1. Overwrite existing metric value in existing result

    from copy import deepcopy
    
    new_results = deepcopy(existing_results)
    new_results[0]["metrics"][0]["value"] = 0.999
    _update_metadata_model_index(existing_results, new_results, overwrite=True)
    
    [{'dataset': {'name': 'IMDb', 'type': 'imdb'},
      'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.999}],
      'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
    

    2. Add new metric to existing result

    new_results = deepcopy(existing_results)
    new_results[0]["metrics"][0]["name"] = "Recall"
    new_results[0]["metrics"][0]["type"] = "recall"
    
    [{'dataset': {'name': 'IMDb', 'type': 'imdb'},
      'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995},
                  {'name': 'Recall', 'type': 'recall', 'value': 0.995}],
      'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
    

    3. Add new result

    new_results = deepcopy(existing_results)
    new_results[0]["dataset"] = {'name': 'IMDb-2', 'type': 'imdb_2'}
    
    [{'dataset': {'name': 'IMDb', 'type': 'imdb'},
      'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
      'task': {'name': 'Text Classification', 'type': 'text-classification'}},
     {'dataset': {'name': 'IMDb-2', 'type': 'imdb_2'},
      'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
      'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
    
    • ENH Add update metadata to repocard by @lvwerra in #844
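
    The three cases above can be reproduced with a small merge routine. This is a sketch of the behavior shown in the examples, not the library's implementation: `merge_results` is a hypothetical function, and raising ValueError when a value conflicts and overwrite is False is an assumption.

```python
from copy import deepcopy
from typing import Any, Dict, List

Result = Dict[str, Any]


def merge_results(
    existing: List[Result], new: List[Result], overwrite: bool = False
) -> List[Result]:
    """Merge `new` results into a copy of `existing`.

    Results match when dataset and task are equal; metrics match when
    name and type are equal.
    """
    merged = deepcopy(existing)
    for new_result in new:
        match = next(
            (r for r in merged
             if r["dataset"] == new_result["dataset"] and r["task"] == new_result["task"]),
            None,
        )
        if match is None:  # case 3: unknown dataset/task, append a new result
            merged.append(deepcopy(new_result))
            continue
        for new_metric in new_result["metrics"]:
            same = next(
                (m for m in match["metrics"]
                 if m["name"] == new_metric["name"] and m["type"] == new_metric["type"]),
                None,
            )
            if same is None:  # case 2: new metric on an existing result
                match["metrics"].append(deepcopy(new_metric))
            elif same["value"] != new_metric["value"]:  # case 1: changed value
                if not overwrite:
                    raise ValueError(f"Metric {new_metric['name']!r} already has a value")
                same["value"] = new_metric["value"]
    return merged


existing_results = [{
    "dataset": {"name": "IMDb", "type": "imdb"},
    "metrics": [{"name": "Accuracy", "type": "accuracy", "value": 0.995}],
    "task": {"name": "Text Classification", "type": "text-classification"},
}]
new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["value"] = 0.999
updated = merge_results(existing_results, new_results, overwrite=True)
```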

    Improvements and bug fixes

    • Keras: Saving history in a JSON file by @merveenoyan in #861
    • space after uri by @leondz in #866
  • v0.6.0(May 9, 2022)

    Disclaimer: This release was initially announced with support for #844. That change was not included in this release and will ship in v0.7.

    fastai support

    v0.6.0 introduces downstream (download) and upstream (upload) support for the fastai library. It supports fastai versions 2.4 and above. The integration is detailed in the following blog.

    • Add fastai upstream and downstream capacities for fastai>=2.4 and fastcore>=1.3.27 versions by @omarespejel in #678

    Automatic binary file tracking in Repository

    Binary files are now rejected by default by the Hub. v0.6.0 introduces automatic binary file tracking through the auto_lfs_track argument of the Repository.git_add method. It also introduces the Repository.auto_track_binary_files method which can be used independently of other methods.

    • ENH Auto track binary files in Repository by @LysandreJik in #828
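
    Automatic tracking first requires deciding which files are binary. One simple heuristic, shown here as an assumption rather than the library's actual check, treats a file as binary if its first bytes contain a NUL byte or are not valid UTF-8; `looks_binary` is a hypothetical helper.

```python
import codecs
import os
import tempfile


def looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Heuristic: a NUL byte or invalid UTF-8 in the first bytes means binary."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if b"\x00" in sample:
        return True
    try:
        # Incremental decoding tolerates a multi-byte character cut at the end.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


demo_dir = tempfile.mkdtemp()
text_path = os.path.join(demo_dir, "README.md")
binary_path = os.path.join(demo_dir, "weights.bin")
with open(text_path, "w", encoding="utf-8") as f:
    f.write("héllo")
with open(binary_path, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n\x00")

text_is_binary = looks_binary(text_path)
weights_is_binary = looks_binary(binary_path)
```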

    skip_lfs_files is now added to mixins

    The parameter skip_lfs_files is now added to the different mixins. This enables pushing files to the Hub without first downloading the files above 10MB. This should dramatically reduce the time needed when updating a modelcard, a configuration file, and others.

    • :sparkles: add skip_lfs_files to mixins' push_to_hub by @nateraw in #858

    Keras support improvement

    The support for Keras model is greatly improved through several additions:

    • The save_pretrained_keras method now accepts a list of tags that will automatically be added to the repository.
    • Download statistics are now available on Keras models
    • Introducing list of tags to Keras model card by @merveenoyan in #806
    • Enable keras download stats by @merveenoyan in #860

    Bugfixes and improvements

    • FIX don't raise if name/organizaiton are passed postionally by @adrinjalali in #822
    • ENH Use provided token from HUGGING_FACE_HUB_TOKEN env variable if available by @FrancescoSaverioZuppichini in #794
    • tests(hf_api): remove infectionTypes field by @McPatate in #834
    • Remove docs, tasks and inference API from huggingface_hub by @osanseviero in #833
    • FEAT Uniformize hf_api a bit and add support for Spaces by @julien-c in #792
    • Add a bug report template by @osanseviero in #832
    • clean up formatting by @stevhliu in #839
    • Release guide by @LysandreJik in #820
    • Fix keras test by @osanseviero in #855
    • DOC Add quick start guide by @stevhliu in #850
    • MNT refactor: subprocess.run -> run_subprocess by @LysandreJik in #352
    • MNT enable preview on black by @adrinjalali in #849
    • Update how to guides by @stevhliu in #840
    • Update contribution guide for merging PRs by @stevhliu in #856
    • DOC Update landing page by @stevhliu in #854
    • space after uri by @leondz in #866
  • v0.5.1(Apr 7, 2022)

    This is a patch release fixing a breaking backward compatibility issue.

    Linked PR: https://github.com/huggingface/huggingface_hub/pull/822

  • v0.5.0(Apr 7, 2022)

    Documentation

    Version v0.5.0 is the first version which features an API reference. It is still a work in progress with features lacking, some images not rendering, and a documentation reorg coming up, but should already provide significantly simpler access to the huggingface_hub API.

    The documentation is visible here.

    • API reference documentation by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/782
    • [API Reference docs] Remove git references from GitHub Action templates by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/813
    • DOC API docstring improvements by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/731

    Model & datasets list improvements

    The list_models and list_datasets methods have been improved in several ways.

    List private models

    These two methods now accept the token keyword to specify your token. Specifying the token will include your private models and datasets in the returned list.

    • Support list_models and list_datasets with token arg by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/638

    Modelcard metadata

    These two methods now accept the cardData boolean argument. If set to True, the modelcard metadata will also be returned when using these two methods.

    • Include cardData in list_models and list_datasets by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/639

    Filtering by carbon emissions

    The list_models method now also accepts an emissions_thresholds parameter to filter by carbon emissions.

    • Enable filtering by carbon emission by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/668

    Keras improvements

    The Keras serialization and upload methods have been worked on to provide better support for models:

    • All parameters are now included in the saved model when using push_to_hub_keras
    • log_dir parameter for TensorBoard logs, which will automatically spawn a TensorBoard instance on the Hub.
    • Automatic model card
    • Introduce include_optimizer parameter to push_to_hub_keras() by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/616
    • Add TensorBoard for Keras models by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/651
    • Create Automatic Keras model card by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/679
    • Allow TensorBoard Override for same Repository by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/709
    • Add tempfile for tensorboard logs in tensorboard tests in test_keras_integration.py by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/761

    Contributing guide

    A contributing guide is now available for the huggingface_hub repository. For any and all information related to contributing to the repository, please check it out!

    Read more about it here: CONTRIBUTING.md.

    Pre-commit hooks

    The huggingface_hub GitHub repository has several checks to ensure that the code respects code quality standards. Opt-in pre-commit hooks have been added in order to make it simpler for contributors to leverage them.

    Read more about it in the aforementioned CONTRIBUTING guide.

    • MNT Add pre-commit hooks by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/807

    Renaming and transferring repositories

    Repositories can now be renamed and transferred programmatically using move_repo.

    • Allow renaming and transferring repos programmatically by @osanseviero in https://github.com/huggingface/huggingface_hub/pull/704

    Breaking changes & deprecation

    ⛔ The following methods have now been removed following a deprecation cycle

    list_repos_objs

    The list_repos_objs and the accompanying CLI utility huggingface-cli repo ls-files have been removed. The same can be done using the model_info and dataset_info methods.

    • Remove deprecated list_repos_objs and huggingface-cli repo ls-files by @julien-c in https://github.com/huggingface/huggingface_hub/pull/702

    Python 3.6

    Python 3.6 support has been dropped because it reached end of life. Installing huggingface_hub on Python 3.6 will result in version v0.4.0 being installed.

    • CI support python 3.7-3.10 - remove 3.6 support by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/790

    ⚠️ Items below are now deprecated and will be removed in a future version

    • API deprecate positional args in file_download and hf_api by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/745
    • MNT deprecate name and organization in favor of repo_id by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/733

    What's Changed

    • Include "model" in repo_type to keep consistency by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/620
    • Hotfix for repo_type by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/623
    • fix: typo in docstring by @ariG23498 in https://github.com/huggingface/huggingface_hub/pull/647
    • {upload|delete}_file: Remove client-side filename validation by @SBrandeis in https://github.com/huggingface/huggingface_hub/pull/669
    • Ensure post_method is only executed once by @sgugger in https://github.com/huggingface/huggingface_hub/pull/676
    • Remove paying subscription mention from docstring by @cakiki in https://github.com/huggingface/huggingface_hub/pull/653
    • Improve tests and logging by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/682
    • docs(links): Update settings/token to settings/tokens by @ronvoluted in https://github.com/huggingface/huggingface_hub/pull/699
    • Add support for private hub by @juliensimon in https://github.com/huggingface/huggingface_hub/pull/703
    • Add retry_endpoint for test stability by @osanseviero in https://github.com/huggingface/huggingface_hub/pull/719
    • FIX fix a bug in _filter_emissions to accept numbers w/o decimal and dict emissions by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/753
    • Logging fix for hf_api, logging documentation by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/748
    • Contributing guide & code of conduct by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/692
    • Fix pytorch and tensorflow python matrix by @osanseviero in https://github.com/huggingface/huggingface_hub/pull/760
    • MNT add links to related projects and the forum on issue template by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/773
    • Note on the README by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/772
    • Remove autoreviewers by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/793
    • CI Error on FutureWarning by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/787
    • MNT more informative message on error in Hf.Api.delete_repo by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/783
    • Add security status by @McPatate in https://github.com/huggingface/huggingface_hub/pull/654
    • Remove redundant part of security test by @osanseviero in https://github.com/huggingface/huggingface_hub/pull/802
    • Changed test repository names to fix tests by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/803
    • TST calling delete_repo under tempfile for fixing the test by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/804
    • Disable logging in with organization token by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/780
    • MNT change dev version to 0.5, 0.4 is already released by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/810
    • 👨‍💻 Configure HF Hub URL with environment variable by @SBrandeis in https://github.com/huggingface/huggingface_hub/pull/815
    • MNT support older requests versions by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/817
    • Rename the env variable HF_ENDPOINT. by @Narsil in https://github.com/huggingface/huggingface_hub/pull/819

    New Contributors

    • @McPatate made their first contribution in https://github.com/huggingface/huggingface_hub/pull/583
    • @FremyCompany made their first contribution in https://github.com/huggingface/huggingface_hub/pull/606
    • @simoninithomas made their first contribution in https://github.com/huggingface/huggingface_hub/pull/633
    • @mlonaws made their first contribution in https://github.com/huggingface/huggingface_hub/pull/630
    • @ariG23498 made their first contribution in https://github.com/huggingface/huggingface_hub/pull/647
    • @J-Petiot made their first contribution in https://github.com/huggingface/huggingface_hub/pull/660
    • @ronvoluted made their first contribution in https://github.com/huggingface/huggingface_hub/pull/699
    • @juliensimon made their first contribution in https://github.com/huggingface/huggingface_hub/pull/703
    • @allendorf made their first contribution in https://github.com/huggingface/huggingface_hub/pull/742
    • @frgfm made their first contribution in https://github.com/huggingface/huggingface_hub/pull/747
    • @hbredin made their first contribution in https://github.com/huggingface/huggingface_hub/pull/688

    Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.4.0...v0.5.0

  • v0.4.0(Jan 26, 2022)

    Tag listing

    • Introduce Tag Listing by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/537

    This PR introduces the ability to fetch all available tags for models or datasets and returns them as a nested namespace object, for example:

    >>> from huggingface_hub import HfApi
    
    >>> api = HfApi() 
    >>> tags = api.get_model_tags()
    >>> print(tags)
    Available Attributes:
     * benchmark
     * language_creators
     * languages
     * licenses
     * multilinguality
     * size_categories
     * task_categories
     * task_ids
    
    >>> print(tags.benchmark)
    Available Attributes:
     * raft
     * superb
     * test
    

    Namespace objects

    • Namespace Objects for Search Parameters by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/556

    With a goal of adding more tab-completion to the library, this PR introduces two objects:

    • DatasetSearchArguments
    • ModelSearchArguments

    These two AttributeDictionary objects contain all the valid information we can extract from a model or dataset, exposed as tab-completable parameters. Through careful string splitting, they also include the author_or_organization and the dataset_name (or model_name).
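To illustrate the idea (this is a simplified stand-in, not the library's `AttributeDictionary`), a dictionary can expose its keys as attributes so that interactive shells offer them for tab completion. The class name `AttrDict` and the sample values are made up for the example.

```python
class AttrDict(dict):
    """Dictionary whose keys are also reachable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Tab completion relies on dir(); add the keys to the usual entries.
        return sorted(set(super().__dir__()) | set(self.keys()))


tags = AttrDict(benchmark=AttrDict(raft="benchmark:raft", superb="benchmark:superb"))
print(tags.benchmark.raft)
```

Because `__dir__` includes the keys, typing `tags.` followed by the tab key in an interactive session would list `benchmark` alongside the regular dictionary methods.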

    Model Filter

    • Implement a Model Filter class by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/553

    This PR introduces a new way to search the hub: the ModelFilter class.

    To the user it first appears as a simple Enum, allowing them to specify what they want to search for, such as:

    f = ModelFilter(author="microsoft", model_name="wavlm-base-sd", framework="pytorch")
    

    From there, they can pass this filter to the list_models method of HfApi to search the hub:

    models = api.list_models(filter=f)
    

    The API may then be used for complex queries:

    args = ModelSearchArguments()
    f = ModelFilter(framework=[args.library.pytorch, args.library.TensorFlow], model_name="bert", tasks=[args.pipeline_tag.Summarization, args.pipeline_tag.TokenClassification])
    
    api.list_models(filter=f)
    

    Ignoring filenames in snapshot_download

    This PR introduces a way to limit the files fetched by snapshot_download. This is useful when you want to download and cache an entire repository without using git, while skipping some files based on their filenames.

    • [Snapshot download] allow some filenames to be ignored by @patrickvonplaten in https://github.com/huggingface/huggingface_hub/pull/566
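The filtering idea can be sketched in plain Python. This is not the library's code: `keep_files` and its glob-style `ignore` patterns are hypothetical, and the actual parameter name should be checked in the `snapshot_download` signature of your installed version.

```python
from fnmatch import fnmatch


def keep_files(filenames, ignore):
    """Drop every filename that matches at least one ignore pattern."""
    return [
        name
        for name in filenames
        if not any(fnmatch(name, pattern) for pattern in ignore)
    ]


repo_files = ["config.json", "pytorch_model.bin", "flax_model.msgpack", "tf_model.h5"]
print(keep_files(repo_files, ignore=["*.msgpack", "*.h5"]))
```

In this example only the configuration file and the PyTorch weights remain, which is the kind of selection a PyTorch-only library would want.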

    What's Changed

    • [Hotfix][API] card_data => cardData on /api/datasets by @julien-c in https://github.com/huggingface/huggingface_hub/pull/530
    • Fix the progress bars when cloning a repository by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/517
    • Update Hugging Face Hub documentation README and Endpoints by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/527
    • Convert string functions to f-string by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/536
    • Fixing FS for espnet. by @Narsil in https://github.com/huggingface/huggingface_hub/pull/542
    • [snapshot_download] upgrade to canonical separator by @julien-c in https://github.com/huggingface/huggingface_hub/pull/545
    • Add test directions by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/547
    • [HOTFIX] Change test for missing_input to reflect back-end redirect changes by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/552
    • Bring consistency to download and upload APIs by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/574
    • Search by authors and string by @FrancescoSaverioZuppichini in https://github.com/huggingface/huggingface_hub/pull/531
    • Quick typo by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/575

    New Contributors

    • @kahne made their first contribution in https://github.com/huggingface/huggingface_hub/pull/569
    • @FrancescoSaverioZuppichini made their first contribution in https://github.com/huggingface/huggingface_hub/pull/531

    Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.2.1...v0.4.0

  • v0.2.1(Jan 26, 2022)

    This is a patch release fixing an issue with the notebook login.

    https://github.com/huggingface/huggingface_hub/commit/5e2da9bae95ed4c99683e9572ecc32c9e0da5e15#diff-fb1696cbcf008dd89dde5e8c1da9d4be5a8f7d809bc32f07d4453caba40df15f

  • v0.2.0(Jan 26, 2022)

    Access tokens

    Version v0.2.0 introduces compatibility with the hub's access tokens. Access tokens are now the main login mechanism; it is still possible to log in with a username and password by pressing [Ctrl/CMD]+C at the login prompt.

    The notebook login is adapted to work with the access tokens.

    Skipping large files

    The Repository class now has an additional parameter, skip_lfs_files, which allows cloning the repository while skipping the large file download.

    https://github.com/huggingface/huggingface_hub/pull/472

    Local files only for snapshot_download

    The snapshot_download method can now take local_files_only as a parameter to enable leveraging previously downloaded files.

    https://github.com/huggingface/huggingface_hub/pull/505
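One way this could be used is as an offline fallback: try the network first, then reuse the cache. In the sketch below, `download` stands for `snapshot_download` (passed in as an argument so the snippet runs without the library), `fetch_snapshot` is a hypothetical helper, and catching `OSError` is an assumption about how connection failures surface.

```python
def fetch_snapshot(repo_id, download):
    """Download a repo snapshot, falling back to cached files when offline."""
    try:
        return download(repo_id)
    except OSError:
        # Assumption: network failures are raised as OSError subclasses.
        return download(repo_id, local_files_only=True)


def fake_download(repo_id, local_files_only=False):
    """Stand-in that behaves like a machine without network access."""
    if not local_files_only:
        raise ConnectionError("no network")
    return f"/cache/{repo_id}"


print(fetch_snapshot("julien-c/EsperBERTo-small", fake_download))
```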

  • v0.1.2(Nov 9, 2021)

    What's Changed

    • clean_ok should be True by default by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/462

    Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.1.1...v0.1.2

  • v0.1.1(Nov 5, 2021)

    What's Changed

    • Fix typing-extensions minimum version by @lhoestq in https://github.com/huggingface/huggingface_hub/pull/453
    • Fix argument order in create_repo for Repository.clone_from by @sgugger in https://github.com/huggingface/huggingface_hub/pull/459

    Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.1.0...v0.1.1

  • v0.1.0(Nov 2, 2021)

    What's Changed

    Version v0.1.0 is the first minor release of the huggingface_hub package, which promises better stability for upcoming versions. This update comes with big quality-of-life improvements.

    Make token optional in all HfApi methods. by @sgugger in https://github.com/huggingface/huggingface_hub/pull/379

    Previously, most methods of the HfApi class required the token to be explicitly passed. This is changed in this version, where it defaults to the token stored in the cache. This results in a re-ordering of arguments, but backward compatibility is preserved in most cases. Where it is not preserved, an explicit error is thrown.

    Root methods instead of HfApi by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/388

    The HfApi class now exposes its methods through the hf_api file, reducing the friction to access these helpers. See the example below:

    # Previously
    from huggingface_hub import HfApi
    
    api = HfApi()
    user = api.whoami()
    
    # Now
    from huggingface_hub.hf_api import whoami
    
    user = whoami()
    

    The HfApi can still be imported and works as before for backward compatibility.

    Add list_repo_files util by @sgugger in https://github.com/huggingface/huggingface_hub/pull/395

    Offers a list_repo_files to ... list the repo files! Supports both model repositories and dataset repositories

    Add helper to generate an eval result model-index, with proper typing by @julien-c in https://github.com/huggingface/huggingface_hub/pull/382

    Offers a metadata_eval_result in order to generate a YAML block to put in model cards according to evaluation results.
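For orientation, the metadata in question is a nested `model-index` mapping. The sketch below builds one by hand to show the general shape; `build_model_index` and all values are illustrative, and the exact keys emitted by `metadata_eval_result` should be checked against the library.

```python
def build_model_index(model_name, task_id, dataset_id, dataset_name, metric_id, metric_value):
    """Assemble a model-index mapping for a single evaluation result."""
    result = {
        "task": {"type": task_id},
        "dataset": {"type": dataset_id, "name": dataset_name},
        "metrics": [{"type": metric_id, "value": metric_value}],
    }
    return {"model-index": [{"name": model_name, "results": [result]}]}


card = build_model_index(
    "RoBERTa fine-tuned on IMDB",  # illustrative values only
    "text-classification",
    "imdb",
    "IMDB",
    "accuracy",
    0.93,
)
print(card["model-index"][0]["results"][0]["metrics"])
```

Serialising such a mapping to YAML gives the block that goes at the top of a model card.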

    Add metrics to API by @mariosasko in https://github.com/huggingface/huggingface_hub/pull/429

    Adds a list_metrics method to HfApi!

    Git prune by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/450

    Adds a git_prune method to the Repository class. It prunes local files that are no longer needed because they have already been pushed to a remote. It also adds the auto_lfs_prune argument to git_push and to the commit context manager for simpler handling.

    Bug fixes

    • Fix HfApi.create_repo when repo_type is 'space' by @nateraw in https://github.com/huggingface/huggingface_hub/pull/394
    • Last fixes for datasets' push_to_hub method by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/415

    Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.0.19...v0.1.0

  • v0.0.18(Oct 4, 2021)

    v0.0.18: Repo metadata, git tags, Keras mixin

    Repository metadata (@julien-c)

    The version v0.0.18 of the huggingface_hub includes tools to manage repository metadata. The following example reads metadata from a repository:

    from huggingface_hub import Repository
    
    repo = Repository("xxx", clone_from="yyy")
    data = repo.repocard_metadata_load()
    

    The following example completes that metadata before writing it to the repository locally.

    data["license"] = "apache-2.0"
    repo.repocard_metadata_save(data)
    
    • Repo metadata load and save #339 (@julien-c)

    Git tags (@AngledLuffa)

    Tag management is now available! Add, check, delete tags locally or remotely directly from the Repository utility.

    • Tags #323 (@AngledLuffa)

    Revisited Keras support (@nateraw)

    The Keras mixin has been revisited:

    • It now saves models as SavedModel objects rather than .h5 files.
    • It now offers methods that can be leveraged simply as a functional API, instead of having to use the Mixin as an actual mixin.

    Improvements and bug fixes

    • Better error message for bad token. #362 (@sgugger)
    • Add utility to get repo name #364 (@sgugger)
    • Improve save and load repocard metadata #355 (@elishowk)
    • Update Keras Mixin #284 (@nateraw)
    • Add timeout to dataset_info #373 (@lhoestq)
  • v0.0.17(Oct 4, 2021)

    v0.0.17: Non-blocking git push, notebook login

    Non-blocking git-push

    The pushing methods now have access to a blocking boolean parameter to indicate whether the push should happen asynchronously.

    In order to see if the push has finished or its status code (to spot a failure), one should use the command_queue property on the Repository object.

    For example:

    from huggingface_hub import Repository
    
    repo = Repository("<local_folder>", clone_from="<user>/<model_name>")
    
    with repo.commit("Commit message", blocking=False):
        ...  # Save data here
    
    last_command = repo.command_queue[-1]
    
    # Status of the push command
    last_command.status  
    # Will return the status code
    #     -> -1 will indicate the push is still ongoing
    #     -> 0 will indicate the push has completed successfully
    #     -> non-zero code indicates the error code if there was an error
    
    # if there was an error, the stderr may be inspected
    last_command.stderr
    
    # Whether the command finished or if it is still ongoing
    last_command.is_done
    
    # Whether the command errored-out.
    last_command.failed
    

    When using blocking=False, the commands will be tracked and your script will exit only when all pushes are done, even if other errors happen in your script (a failed push counts as done).

    • Non blocking git push #315 (@LysandreJik)
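A polling loop over the attributes listed above might look like the following sketch. `wait_for_push` is a hypothetical helper; it relies only on `is_done`, `failed` and `stderr`, so a stand-in object is used here to keep the snippet runnable without a repository.

```python
import time
from types import SimpleNamespace


def wait_for_push(command, timeout=60.0, poll_interval=0.5):
    """Block until the push finishes; raise if it fails or times out."""
    deadline = time.monotonic() + timeout
    while not command.is_done:
        if time.monotonic() > deadline:
            raise TimeoutError("push did not finish in time")
        time.sleep(poll_interval)
    if command.failed:
        raise RuntimeError(f"push failed: {command.stderr}")


# Stand-in for `repo.command_queue[-1]` after a successful push.
finished = SimpleNamespace(is_done=True, failed=False, stderr="")
wait_for_push(finished)
print("push completed")
```

Raising on failure keeps a rejected push from going unnoticed, which is easy to miss when the push runs in the background.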

    Notebook login (@sgugger)

    The huggingface_hub library now has a notebook_login method which can be used to log in from notebooks that have no access to a shell. In a notebook, log in with the following:

    from huggingface_hub import notebook_login
    
    notebook_login()
    
    • Add a widget to login in notebook #329 (@sgugger)

    Improvements and bugfixes

    • added option to create private repo #319 (@philschmid)
    • display git push warnings #326 (@elishowk)
    • Allow specifying data with the Inference API wrapper #271 (@osanseviero)
    • Add auth to snapshot download #340 (@lewtun)
  • v0.0.16(Aug 27, 2021)

    v0.0.16: Progress bars, git credentials

    The huggingface_hub version v0.0.16 introduces several quality of life improvements.

    Progress bars in Repository

    Progress bars are now visible with many git operations, such as pulling, cloning and pushing:

    >>> from huggingface_hub import Repository
    >>> repo = Repository("local_folder", clone_from="huggingface/CodeBERTa-small-v1")
    
    Cloning https://huggingface.co/huggingface/CodeBERTa-small-v1 into local empty directory.
    Download file pytorch_model.bin:  45%|████████████████████████████▋                                   | 144M/321M [00:13<00:12, 14.7MB/s]
    Download file flax_model.msgpack:  42%|██████████████████████████▌                                    | 134M/319M [00:13<00:13, 14.4MB/s]
    

    Branching support

    There is now branching support in Repository. The example below clones the xxx repository and checks out the new-branch revision. If new-branch is an existing branch on the remote, that branch is checked out. If it is another kind of revision, such as a commit or a tag, that revision is checked out as well.

    If the revision does not exist, it will create a branch from the latest commit on the main branch.

    >>> from huggingface_hub import Repository
    >>> repo = Repository("local", clone_from="xxx", revision="new-branch")
    

    Once the repository is instantiated, it is possible to manually checkout revisions using the git_checkout method. If the revision already exists:

    >>> repo.git_checkout("main")
    

    If a branch should be created from the current head in the case that it does not exist:

    >>> repo.git_checkout("brand-new-branch", create_branch_ok=True)
    
    Revision `brand-new-branch` does not exist. Created and checked out branch `brand-new-branch`
    

    Finally, the commit context manager has a new branch parameter to specify to which branch the utility should push:

    >>> with repo.commit("New commit on branch brand-new-branch", branch="brand-new-branch"):
    ...     # Save any file or model here, it will be committed to that branch.
    ...     torch.save(model.state_dict(), "model.pt")
    

    Git credentials

    The login system has been redesigned to leverage git-credential instead of a token-based authentication system. It leverages the git-credential store helper. If you're unaware of what this is, you may see the following when logging in with huggingface_hub:

            _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
            _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
            _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
            _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
            _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
            
    Username: 
    Password: 
    Login successful
    Your token has been saved to /root/.huggingface/token
    Authenticated through git-crendential store but this isn't the helper defined on your machine.
    You will have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal to set it as the default
    
    git config --global credential.helper store
    

    Running the command git config --global credential.helper store will set this as the default way to handle credentials for git authentication. All repositories instantiated with the Repository utility will have this helper set by default, so no action is required on your part when leveraging it.

    Improved logging

    The logging system is now similar to the existing logging system in transformers and datasets, based on a logging module that controls the entire library's logging level:

    >>> from huggingface_hub import logging
    >>> logging.set_verbosity_error()
    >>> logging.set_verbosity_info()
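
This style of API typically wraps the standard `logging` module. The sketch below shows a roughly equivalent effect with the standard library only; it assumes the library's loggers live under the `huggingface_hub` name, which is worth verifying before relying on it.

```python
import logging

# Assumption: the library's loggers are children of "huggingface_hub".
hub_logger = logging.getLogger("huggingface_hub")

hub_logger.setLevel(logging.ERROR)  # roughly what set_verbosity_error() does
print(hub_logger.isEnabledFor(logging.INFO))

hub_logger.setLevel(logging.INFO)  # roughly what set_verbosity_info() does
print(hub_logger.isEnabledFor(logging.INFO))
```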
    

    Bug fixes and improvements

    • Add documentation to GitHub and the Hub docs about the Inference client wrapper #253 (@osanseviero)
    • Have large files enabled by default when using Repository #219 (@LysandreJik)
    • Clarify/specify/document model card metadata, model-index, and pipeline/task types #265 (@julien-c)
    • [model_card][metadata] Actually, lets make dataset.name required #267 (@julien-c)
    • Progress bars #261 (@LysandreJik)
    • Add keras mixin #230 (@nateraw)
    • Open source code related to the repo type (tag icon, display order, snippets) #273 (@osanseviero)
    • Branch push to hub #276 (@LysandreJik)
    • Git credentials #277 (@LysandreJik)
    • Push to hub/commit with branches #282 (@LysandreJik)
    • Better logging #262 (@LysandreJik)
    • Remove custom language pack behavior #291 (@LysandreJik)
    • Update Hub and huggingface_hub docs #293 (@osanseviero)
    • Adding a handler #292 (@LysandreJik)
  • v0.0.15(Jul 28, 2021)

    v0.0.15: Documentation, bug fixes and misc improvements

    Improvements and bugfixes

    • [Docs] Update link to Gradio documentation #206 (@abidlabs)
    • Fix title typo (Cliet -> Client) #207 (@cakiki)
    • add _from_pretrained hook #159 (@nateraw)
    • Add filename option to lfs_track #212 (@LysandreJik)
    • Repository fixes #213 (@LysandreJik)
    • Repository documentation #214 (@LysandreJik)
    • Add datasets filtering and sorting #194 (@lhoestq)
    • doc: sync github to spaces #221 (@borisdayma)
    • added batch transform documentation & model archive documentation #224 (@philschmid)
    • Sync with hf internal #228 (@mishig25)
    • Adding batching support for superb #215 (@Narsil)
    • Adding SD for superb (speech-classification). #225 (@Narsil)
    • Use Hugging Face fork for s3prl #229 (@lewtun)
    • Mv interfaces -> widgets/lib/interfaces #227 (@mishig25)
    • Tweak to prevent accidental sharing of token #226 (@julien-c)
    • Fix CLI-based repo creation #234 (@osanseviero)
    • Add proxify util function #235 (@mishig25)
  • v0.0.14(Jul 18, 2021)

    v0.0.14: LFS Auto tracking, dataset_info and list_datasets, documentation

    Datasets

    Datasets repositories get better support, by first enabling full usage of the Repository class for datasets repositories:

    from huggingface_hub import Repository
    
    repo = Repository("local_directory", clone_from="<user>/<dataset_id>", repo_type="dataset")
    

    Datasets can now be retrieved from the Python runtime using the list_datasets method from the HfApi class:

    from huggingface_hub import HfApi
    
    api = HfApi()
    datasets = api.list_datasets()
    
    len(datasets)
    # 1048 publicly available dataset repositories at the time of writing
    

    Information can be retrieved on specific datasets using the dataset_info method from the HfApi class:

    from huggingface_hub import HfApi
    
    api = HfApi()
    api.dataset_info("squad")
    # DatasetInfo: {
    #     id: squad
    #     lastModified: 2021-07-07T13:18:53.595Z
    #     tags: ['pretty_name:SQuAD', 'annotations_creators:crowdsourced', 'language_creators:crowdsourced', 'language_creators:found', 
    # [...]
    
    • Add dataset_info and list_datasets #164 (@lhoestq)
    • Enable dataset repositories #151 (@LysandreJik)

    Inference API wrapper client

    Version v0.0.14 introduces a wrapper client for the Inference API. No need to use custom-made requests anymore. See below for an example.

    from huggingface_hub import InferenceApi
    
    api = InferenceApi("bert-base-uncased")
    api(inputs="The [MASK] is great")
    # [
    #    {'sequence': 'the music is great', 'score': 0.03599703311920166, 'token': 2189, 'token_str': 'music'}, 
    #    {'sequence': 'the price is great', 'score': 0.02146693877875805, 'token': 3976, 'token_str': 'price'}, 
    #    {'sequence': 'the money is great', 'score': 0.01866752654314041, 'token': 2769, 'token_str': 'money'}, 
    #    {'sequence': 'the fun is great', 'score': 0.01654735580086708, 'token': 4569, 'token_str': 'fun'}, 
    #    {'sequence': 'the effect is great', 'score': 0.015102624893188477, 'token': 3466, 'token_str': 'effect'}
    # ]
    
    • Inference API wrapper client #65 (@osanseviero)

    Auto-track with LFS

    Version v0.0.14 introduces an auto-tracking mechanism with git-lfs for large files. Files that are larger than 10MB can be automatically tracked by using the auto_track_large_files method:

    from huggingface_hub import Repository
    
    repo = Repository("local_directory", clone_from="<user>/<model_id>")
    
    # save large files in `local_directory`
    repo.git_add()
    repo.auto_track_large_files()
    repo.git_commit("Add large files")
    repo.git_push()
    # No push rejected error anymore!
    

    It is automatically used when leveraging the commit context manager:

    from huggingface_hub import Repository
    
    repo = Repository("local_directory", clone_from="<user>/<model_id>")
    with repo.commit("Add large files"):
        ...  # add large files here
    
    # No push rejected error anymore!
    
    • Auto track with LFS #177 (@LysandreJik)
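The selection step can be pictured as a scan for files above the size threshold. The following is an illustrative sketch, not the library's implementation; `find_large_files` is a hypothetical helper, and the 10MB limit is taken from the description above.

```python
import os
import tempfile

TEN_MB = 10 * 1024 * 1024  # threshold quoted in the release notes


def find_large_files(folder, threshold=TEN_MB):
    """Return paths (relative to folder) of files larger than threshold bytes."""
    large = []
    for root, dirs, files in os.walk(folder):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip git internals
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > threshold:
                large.append(os.path.relpath(path, folder))
    return sorted(large)


# Demonstration with a tiny threshold so no large file has to be written.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "small.txt"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(folder, "big.bin"), "wb") as f:
        f.write(b"x" * 100)
    print(find_large_files(folder, threshold=50))
```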

    Documentation

    • Update docs structure #145 (@Pierrci)
    • Update links to docs #147 (@LysandreJik)
    • Add new repo guide #153 (@osanseviero)
    • Add documentation for endpoints #155 (@osanseviero)
    • Document hf.co webhook publicly #156 (@julien-c)
    • docs: ✏️ mention the Training metrics tab #193 (@severo)
    • doc for Spaces #189 (@julien-c)

    Breaking changes

    Reminder: the huggingface_hub library follows semantic versioning and is undergoing active development. While the first major version is not out (v1.0.0), you should expect breaking changes and we strongly recommend pinning the library to a specific version.

    Two breaking changes are introduced with version v0.0.14.

    The whoami return changes from a tuple to a dictionary

    • Allow obtaining Inference API tokens with whoami #157 (@osanseviero)

    The whoami method changes its returned value from a tuple of (<user>, [<organisations>]) to a dictionary containing a lot more information.

    In versions v0.0.13 and below, the whoami method of the HfApi class behaved as follows:

    from huggingface_hub import HfFolder, HfApi
    api = HfApi()
    api.whoami(HfFolder.get_token())
    # ('<user>', ['<org_0>', '<org_1>'])
    

    In version v0.0.14, this is updated to the following:

    from huggingface_hub import HfFolder, HfApi
    api = HfApi()
    api.whoami(HfFolder.get_token())
    # {
    #     'type': str, 
    #     'name': str, 
    #     'fullname': str, 
    #     'email': str,
    #     'emailVerified': bool, 
    #     'apiToken': str,
    #     'plan': str, 
    #     'avatarUrl': str,
    #     'orgs': List[str]
    # }
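
Code that has to run against both return shapes could normalise them first. `user_and_orgs` below is a hypothetical helper that relies only on the `name` and `orgs` keys listed above, and it assumes `orgs` is a list of names as shown there.

```python
def user_and_orgs(whoami_result):
    """Return (user, orgs) from either the old tuple or the new dict shape."""
    if isinstance(whoami_result, dict):
        # v0.0.14 and later: dictionary with many fields.
        return whoami_result["name"], list(whoami_result.get("orgs", []))
    # v0.0.13 and earlier: (user, [orgs]) tuple.
    user, orgs = whoami_result
    return user, list(orgs)


print(user_and_orgs(("alice", ["org_0"])))
print(user_and_orgs({"name": "alice", "orgs": ["org_0"]}))
```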
    

    The Repository's use_auth_token initialization parameter now defaults to True.

    The use_auth_token initialization parameter of the Repository class now defaults to True. The behavior is unchanged if users are not logged in, in which case Repository remains agnostic to huggingface_hub.

    • Set use_auth_token to True by default #204 (@LysandreJik)

    Improvements and bugfixes

    • Add sklearn code snippet #133 (@osanseviero)
    • Allow passing only model ID to clone when authenticated #150 (@LysandreJik)
    • More robust endpoint with toggled staging endpoint #148 (@LysandreJik)
    • Add config to list_models #152 (@osanseviero)
    • Fix audio-to-audio widget and add icon #142 (@osanseviero)
    • Upgrade spaCy to api 0.0.12 and remove allowlist #161 (@osanseviero)
    • docs: fix webhook response format #162 (@severo)
    • Update link in README.md #163 (@nateraw)
    • Revert "docs: fix webhook response format (#162)" #165 (@severo)
    • Add Keras docker image #117 (@osanseviero)
    • Allow multiple models when testing a pipeline #124 (@osanseviero)
    • scikit rebased #170 (@Narsil)
    • Upgrading community frameworks to audio-to-audio. #94 (@Narsil)
    • Add sagemaker docs #173 (@philschmid)
    • Add Structured Data Classification as task #172 (@osanseviero)
    • Fixing keras outputs (widgets was ignoring because of type mismatch, now testing for it) #176 (@Narsil)
    • Updating spacy. #179 (@Narsil)
    • Create initial superb docker image structure #181 (@osanseviero)
    • Upgrading asteroid image. #175 (@Narsil)
    • Removing tests on huggingface_hub for unrelated changes in api-inference-community #180 (@Narsil)
    • Fixing audio-to-audio validation. #184 (@Narsil)
    • rmdir api-inference-community/src/sentence-transformers #188 (@Pierrci)
    • Allow generic inference for ASR for superb #185 (@osanseviero)
    • Add timestamp to snapshot download tests #201 (@LysandreJik)
    • No need for token to understand HF urls #203 (@LysandreJik)
    • Remove --no_renames argument to list deleted files. #205 (@LysandreJik)
  • v0.0.13(Jun 28, 2021)

    v0.0.13: Context Manager

    Version 0.0.13 introduces a context manager to save files directly to the Hub. See below for some examples.

    Example with a single file

    import json
    from huggingface_hub import Repository
    
    repo = Repository("text-files", clone_from="<user>/text-files", use_auth_token=True)
    
    with repo.commit("My first file."):
        with open("file.txt", "w+") as f:
            f.write(json.dumps({"key": "value"}))
    

    Example with a torch.save statement:

    import torch
    from huggingface_hub import Repository
    
    model = torch.nn.Transformer()
    
    repo = Repository("torch-files", clone_from="<user>/torch-files", use_auth_token=True)
    
    with repo.commit("Adding my cool model!"):
        torch.save(model.state_dict(), "model.pt")
    

    Example with a Flax/JAX serialization statement:

    from flax import serialization
    from jax import random
    from flax import linen as nn
    from huggingface_hub import Repository
    
    model = nn.Dense(features=5)
    
    key1, key2 = random.split(random.PRNGKey(0))
    x = random.normal(key1, (10,))
    params = model.init(key2, x)
    
    bytes_output = serialization.to_bytes(params)
    
    repo = Repository("flax-model", clone_from="<user>/flax-model", use_auth_token=True)
    
    with repo.commit("Adding my cool Flax model!"):
        with open("flax_model.msgpack", "wb") as f:
            f.write(bytes_output)
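
The examples above write to relative paths inside the `with` block, which suggests the context manager works from inside the local clone and commits on exit. The following is a minimal sketch of that pattern only, not the library's implementation: `sketch_commit` and its `log` argument are hypothetical names, and the commit/push step is merely recorded so the sketch runs without git or network access.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def sketch_commit(repo_dir, message, log):
    """Hypothetical stand-in for the pattern behind `repo.commit(...)`.

    Assumption: files written inside the `with` block land in the local
    clone, and the commit happens when the block exits without error.
    Here the commit is only recorded in `log`.
    """
    previous_dir = os.getcwd()
    before = set(os.listdir(repo_dir))
    os.chdir(repo_dir)  # relative paths now resolve inside the clone
    try:
        yield
    finally:
        os.chdir(previous_dir)  # always restore the working directory
    # Reached only when the block raised no exception.
    new_files = sorted(set(os.listdir(repo_dir)) - before)
    log.append((message, new_files))


with tempfile.TemporaryDirectory() as repo_dir:
    log = []
    with sketch_commit(repo_dir, "My first file.", log):
        with open("file.txt", "w") as f:
            f.write("hello")

print(log)
```

A real implementation would replace the `log.append` call with staging, committing and pushing the new files.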
    
  • v0.0.12(Jun 23, 2021)

  • v0.0.11(Jun 23, 2021)

    v0.0.11: Improved documentation, hf_hub_download and Repository power-up

    Improved documentation

    The huggingface_hub documentation is now available on hf.co/docs! Additionally, a new step-by-step guide to adding libraries is available.

    • New documentation for 🤗 Hub #71 (@osanseviero)
    • Step by step guide on adding Model Hub support to libraries #86 (@LysandreJik)

    New method: hf_hub_download

    A new method is introduced: hf_hub_download. It is the equivalent of doing cached_download(hf_hub_url()), in a single method.

    • HF Hub download #137 (@LysandreJik)
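
To illustrate what "equivalent to cached_download(hf_hub_url())" means, here is a rough offline sketch of the composition. Every `sketch_*` name is hypothetical, the URL layout is copied from the example URL earlier in this README, and the `"main"` default revision, the hash-based cache key and the injected `fetch` callable are assumptions made so the code runs without network access. It does not reproduce the library's actual caching logic.

```python
import hashlib
import os
import tempfile


def sketch_hub_url(repo_id, filename, revision="main"):
    # URL layout taken from the README example; the default revision is an assumption.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def sketch_cached_download(url, cache_dir, fetch):
    """Return a local path for `url`, calling `fetch(url)` only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hypothetical cache key: a hash of the URL.
    path = os.path.join(cache_dir, hashlib.sha256(url.encode("utf-8")).hexdigest())
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(fetch(url))
    return path


def sketch_hub_download(repo_id, filename, cache_dir, fetch, revision="main"):
    # The single-call form simply chains the two steps.
    url = sketch_hub_url(repo_id, filename, revision)
    return sketch_cached_download(url, cache_dir, fetch)


requested = []


def fake_fetch(url):
    # Stand-in for an HTTP GET; records which URLs were actually requested.
    requested.append(url)
    return b"fake weights"


with tempfile.TemporaryDirectory() as cache_dir:
    first = sketch_hub_download("julien-c/EsperBERTo-small", "pytorch_model.bin", cache_dir, fake_fetch)
    second = sketch_hub_download("julien-c/EsperBERTo-small", "pytorch_model.bin", cache_dir, fake_fetch)
    same_path = first == second

print(requested)
```

The second call hits the cache, so only one URL is requested.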

    Repository power-up

    The Repository class is updated to behave more similarly to git. It is now impossible to clone a repository in a folder that already contains files.

    The PyTorch Mixin contributed by @vasudevgupta7 is slightly updated to have the push_to_hub method manage a repository as one would from the command line.

    • Repository power-up #132 (@LysandreJik)
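
The "folder that already contains files" rule can be pictured as a check made before cloning. The sketch below is an assumption about the general idea, not the `Repository` code itself: `check_clone_target` is a hypothetical helper, and raising `OSError` is a choice made for this example, since the exception the library raises is not stated here.

```python
import os
import tempfile


def check_clone_target(local_dir):
    """Hypothetical pre-clone check: refuse a folder that already has files."""
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        raise OSError(f"{local_dir} already contains files; refusing to clone into it.")
    os.makedirs(local_dir, exist_ok=True)
    return local_dir


with tempfile.TemporaryDirectory() as workdir:
    # A fresh (non-existent or empty) folder is accepted.
    fresh = os.path.join(workdir, "fresh")
    check_clone_target(fresh)
    accepted = os.path.isdir(fresh)

    # A folder that already holds a file is rejected.
    used = os.path.join(workdir, "used")
    os.makedirs(used)
    with open(os.path.join(used, "notes.txt"), "w") as f:
        f.write("existing content")
    try:
        check_clone_target(used)
        rejected = False
    except OSError:
        rejected = True

print(accepted, rejected)
```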

    Improvement & Fixes

    • Adding audio-to-audio task. #93 (@Narsil)
    • When pipelines fail to load in framework code, for whatever reason #96 (@Narsil)
    • Solve rmtree issue on windows #105 (@SBrandeis)
    • Add identical_ok option to HfApi.upload_file method #102 (@SBrandeis)
    • Solve compatibility issues when calling subprocess.run #104 (@SBrandeis)
    • Open source Inference widgets + optimize for community contributions #87 (@julien-c)
    • model tags can be undefined #107 (@Pierrci)
    • Doc tweaks #109 (@julien-c)
    • [huggingface_hub] Support for spaces #108 (@julien-c)
    • speechbrain library tag + code snippet #73 (@osanseviero)
    • Allow batching for feature-extraction #106 (@osanseviero)
    • adding audio-to-audio widget. #95 (@Narsil)
    • Add image to text (for image captioning) #114 (@osanseviero)
    • Add formatting and upgrade Sentence Transformers api version for better error messages #119 (@osanseviero)
    • Change videos in docs so they are played directly in our site #120 (@osanseviero)
    • Fix inference API GitHub actions #125 (@osanseviero)
    • Fixing sentence-transformers CACHE value for docker + functools (docker needs Py3.8) #123 (@Narsil)
    • Load errors with flair should now be generating proper API errors. #121 (@Narsil)
    • Simplify manage to autodetect task+framework if possible. #122 (@Narsil)
    • Change sentence transformers source to original repo #128 (@osanseviero)
    • Allow Python versions with letters in the minor version suffix #82 (@ulf1)
    • Update upload_file docs #136 (@LysandreJik)
    • Reformat repo README #130 (@osanseviero)
    • Add config to model info #135 (@osanseviero)
    • Add input validation for structured-data-classification #97 (@osanseviero)
  • v0.0.10(Jun 8, 2021)

    v0.0.10: Merging huggingface_hub with api-inference-community and hub interfaces

    v0.0.10 marks the merging of three components of the Hugging Face stack: the huggingface_hub repository is now the central platform for contributing new libraries to be supported on the hub.

    It brings together three previously separate components:

    • The huggingface_hub Python library, as the Python library to download, upload, and retrieve information from the hub.
    • The api-inference-community, as the platform where libraries wishing for hub support may be added.
    • The interfaces, as the definition for pipeline types as well as default widget inputs and definitions/UI elements for third-party libraries.

    Future efforts will focus on making it easier to contribute third-party libraries to the Hugging Face Hub.

    Improvement & Fixes

    • Add typing extensions to conda yaml file #49 (@LysandreJik)
    • Alignment on modelcard metadata specification #39 (@LysandreJik)
    • Bring interfaces from widgets-server #50 (@julien-c)
    • Sentence similarity default widget and pipeline type #52 (@osanseviero)
    • [interfaces] Expose configuration options for external libraries #51 (@julien-c)
    • Adding api-inference-community to huggingface_hub. #48 (@Narsil)
    • Add TensorFlowTTS as library + code snippet #55 (@osanseviero)
    • Add protobuf as a dependency to handle tokenizers that require it: #58 (@Narsil)
    • Update validation for NLP tasks #59 (@osanseviero)
    • spaCy code snippet and language tag #57 (@osanseviero)
    • SpaCy fixes #60 (@osanseviero)
    • Allow changing repo visibility programmatically #61 (@osanseviero)
    • Add Adapter Transformers snippet #62 (@osanseviero)
    • Change order in spaCy snippet #66 (@osanseviero)
    • Add validation to check all rows in table question answering have same length #67 (@osanseviero)
    • added question-answering part for Bengali language #68 (@sagorbrur)
    • Add spaCy to inference API #63 (@osanseviero)
    • AllenNLP library tag + code snippet #72 (@osanseviero)
    • Fix AllenNLP QA example #80 (@epwalsh)
    • do not crash even if this config isn't set #81 (@julien-c)
    • Mark model config as optional #83 (@Pierrci)
    • Add repr() to ModelFile and RepoObj #75 (@lewtun)
    • Refactor create_repo #84 (@SBrandeis)
  • v0.0.9(May 20, 2021)

    v0.0.9: HTTP file uploads, multiple filter model selection

    Support for large file uploads

    Implementation of an endpoint to programmatically upload (large) files to any repo on the hub, without the need for git, using HTTP POST requests.

    • [API] Support for the file upload endpoint #42 (@SBrandeis)

    The HfApi.list_models method now allows multiple filters

    Models may now be filtered using several filters:

    Example usage:

    >>> from huggingface_hub import HfApi
    >>> api = HfApi()

    >>> # List all models
    >>> api.list_models()

    >>> # List only the text classification models
    >>> api.list_models(filter="text-classification")

    >>> # List only the Russian models compatible with PyTorch
    >>> api.list_models(filter=("ru", "pytorch"))

    >>> # List only the models trained on the "common_voice" dataset
    >>> api.list_models(filter="dataset:common_voice")

    >>> # List only the models from the AllenNLP library
    >>> api.list_models(filter="allennlp")

    
    • Document the filter argument #41 (@LysandreJik)

    ModelInfo now has a readable representation

    Improvement of the ModelInfo class so that it displays information about the object.

    • Include a readable repr for ModelInfo #32 (@muellerzr)
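
As a rough illustration of what a readable representation buys, the sketch below defines a tiny stand-in class with a `__repr__`. `SketchModelInfo` is hypothetical and much simpler than the real `ModelInfo`: it keeps only the `modelId` and `siblings` attributes shown in the v0.0.8 example on this page, stores siblings as plain filename strings, and uses an output format invented for this example.

```python
class SketchModelInfo:
    """Hypothetical, simplified stand-in for a model-info object."""

    def __init__(self, modelId, siblings=None):
        self.modelId = modelId
        # Assumption: siblings are plain filenames here, not file objects.
        self.siblings = list(siblings or [])

    def __repr__(self):
        # One line for the model id, then one indented line per file.
        lines = [f"Model ID: {self.modelId}"]
        lines += [f"  file: {name}" for name in self.siblings]
        return "\n".join(lines)


info = SketchModelInfo("lysandre/dummy-hf-hub", [".gitattributes", "README.md"])
print(info)
```

Without `__repr__`, printing the object would only show its class name and memory address.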

    Improvements and bugfixes

    • Fix conda by specifying python version + add tests to main branch #28 (@LysandreJik)
    • Improve Mixin #34 (@LysandreJik)
    • Enable library_name and library_version in snapshot_download #38 (@LysandreJik)
    • [Windows support] Very long filenames #40 (@LysandreJik)
    • Make error message more verbose when creating a repo #44 (@osanseviero)
    • Open-source /docs #46 (@julien-c)
  • v0.0.8(Apr 7, 2021)

    • Addition of the HfApi.model_info method to retrieve information about a repo given a revision.
    • The accompanying snapshot_download utility to download to cache all files stored in that repo at that given revision.

    Example usage of HfApi.model_info:

    from huggingface_hub import HfApi
    
    hf_api = HfApi()
    model_info = hf_api.model_info("lysandre/dummy-hf-hub")
    
    print("Model ID:", model_info.modelId)
    
    for file in model_info.siblings:
        print("file:", file.rfilename)
    

    outputs:

    Model ID: lysandre/dummy-hf-hub
    file: .gitattributes
    file: README.md
    

    Example usage of snapshot_download:

    from huggingface_hub import snapshot_download
    import os
    
    repo_path = snapshot_download("lysandre/dummy-hf-hub")
    print(os.listdir(repo_path))
    

    outputs:

    ['.gitattributes', 'README.md']
    
  • v0.0.7(Mar 18, 2021)

    • Networking improvements by @Pierrci and @lhoestq (#21 and #22)

    • Adds a mixin class that makes it easier to save, upload, and download a PyTorch model. See PR #11 by @vasudevgupta7

    Example usage:

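    import torch.nn as nn
    from huggingface_hub import ModelHubMixin

    class MyModel(nn.Module, ModelHubMixin):
        def __init__(self, **kwargs):
            super().__init__()
            self.config = kwargs.pop("config", None)
            self.layer = ...  # placeholder: define your layers here

        def forward(self, *args, **kwargs):
            return ...  # placeholder: compute and return the model output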
    
    model = MyModel()
    
    # saving model to local directory & pushing to hub
    model.save_pretrained("mymodel", push_to_hub=True, config={"act": "gelu"})
    
    # initializing the model and loading its trained weights from the hub
    model = MyModel.from_pretrained("username/mymodel@main")
    

    Thanks a ton for your contributions ♥️

Owner
Hugging Face
Solving NLP, one commit at a time!