The fastai deep learning library

Overview

Welcome to fastai

fastai simplifies training fast and accurate neural nets using modern best practices.

Important: This documentation covers fastai v2, which is a from-scratch rewrite of fastai. The v1 documentation has moved to fastai1.fast.ai. To stop fastai from updating to v2, run the following in your terminal (if you use conda): echo 'fastai 1.*' >> $CONDA_PREFIX/conda-meta/pinned

Installing

You can use fastai without any installation by using Google Colab. In fact, every page of this documentation is also available as an interactive notebook: click "Open in colab" at the top of any page to open it (be sure to change the Colab runtime to "GPU" to have it run fast!). See the fast.ai documentation on Using Colab for more information.

You can install fastai on your own machines with conda (highly recommended). If you're using Anaconda then run:

conda install -c fastai -c pytorch -c anaconda fastai gh anaconda

...or if you're using Miniconda, then run:

conda install -c fastai -c pytorch fastai

To install with pip, use: pip install fastai. If you install with pip, you should install PyTorch first by following the PyTorch installation instructions.
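
After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and not part of fastai.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages the pip instructions above mention.
for name in ("torch", "fastai"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If torch is reported as not installed, follow the PyTorch installation instructions before installing fastai with pip.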

If you plan to develop fastai yourself, or want to be on the cutting edge, you can use an editable install. If you do this, you should also use an editable install of fastcore to go with it:

git clone https://github.com/fastai/fastai
pip install -e "fastai[dev]"

Learning fastai

The best way to get started with fastai (and deep learning) is to read the book, and complete the free course.

To see what's possible with fastai, take a look at the Quick Start, which shows how to use around 5 lines of code to build an image classifier, an image segmentation model, a text sentiment model, a recommendation system, and a tabular model. For each of the applications, the code is much the same.

Read through the Tutorials to learn how to train your own models on your own datasets. Use the navigation sidebar to look through the fastai documentation. Every class, function, and method is documented here.

To learn about the design and motivation of the library, read the peer-reviewed paper.

About fastai

fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes:

  • A new type dispatch system for Python along with a semantic type hierarchy for tensors
  • A GPU-optimized computer vision library which can be extended in pure Python
  • An optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4–5 lines of code
  • A novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training
  • A new data block API
  • And much more...
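
To make the optimizer bullet above more concrete, here is a minimal pure-Python sketch of the idea of splitting an optimizer into two kinds of pieces: functions that update per-parameter statistics, and functions that use those statistics to step the parameter. This is an illustration under assumptions, not fastai's actual implementation; the names SketchOptimizer, average_grad and momentum_step are hypothetical, and parameters are plain floats rather than tensors.

```python
def average_grad(state, grad, mom=0.9):
    """Statistic: keep a running (momentum) sum of the gradients seen so far."""
    state["grad_avg"] = mom * state.get("grad_avg", 0.0) + grad
    return state


def momentum_step(param, state, lr):
    """Stepper: move the parameter against the accumulated gradient."""
    return param - lr * state["grad_avg"]


class SketchOptimizer:
    """Hypothetical optimizer assembled from statistic and stepper functions."""

    def __init__(self, params, stats, steppers, lr):
        self.params = list(params)
        self.stats, self.steppers, self.lr = stats, steppers, lr
        self.state = [{} for _ in self.params]  # one state dict per parameter

    def step(self, grads):
        for i, grad in enumerate(grads):
            for stat in self.stats:
                self.state[i] = stat(self.state[i], grad)
            for stepper in self.steppers:
                self.params[i] = stepper(self.params[i], self.state[i], self.lr)


# Minimize f(x) = x**2, whose gradient is 2*x, with SGD plus momentum.
opt = SketchOptimizer([1.0], stats=[average_grad], steppers=[momentum_step], lr=0.1)
for _ in range(200):
    opt.step([2 * p for p in opt.params])
print(opt.params[0])  # close to the minimum at 0
```

With this split, a different algorithm is obtained by swapping in other statistic or stepper functions while the loop in step stays unchanged.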

fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable. It is built on top of a hierarchy of lower-level APIs which provide composable building blocks. This way, a user wanting to rewrite part of the high-level API or add particular behavior to suit their needs does not have to learn how to use the lowest level.

Layered API

Migrating from other libraries

It's very easy to migrate from plain PyTorch, Ignite, or any other PyTorch-based library, or even to use fastai in conjunction with other libraries. Generally, you'll be able to use all your existing data processing code, but will be able to reduce the amount of code you require for training, and more easily take advantage of modern best practices. Here are migration guides from some popular libraries to help you on your way:

Tests

To run the tests in parallel, launch:

nbdev_test_nbs or make test

For all the tests to pass, you'll need to install the following optional dependencies:

pip install "sentencepiece<0.1.90" wandb tensorboard albumentations pydicom opencv-python scikit-image pyarrow kornia \
    catalyst captum neptune-cli

Tests are written using nbdev; for example, see the documentation for test_eq.

Contributing

After you clone this repository, please run nbdev_install_git_hooks in your terminal. This sets up git hooks that strip extraneous metadata from the notebooks (e.g. which cells you ran), which would otherwise cause unnecessary merge conflicts.

Before submitting a PR, check that the local library and notebooks match. The script nbdev_diff_nbs can let you know if there is a difference between the local library and the notebooks.

  • If you made a change to the notebooks in one of the exported cells, you can export it to the library with nbdev_build_lib or make fastai.
  • If you made a change to the library, you can export it back to the notebooks with nbdev_update_lib.

Docker Containers

Official Docker containers for this project can be found here.

Comments
  • System halted when calling `model.fit`

    I tried the lesson1 and lesson4-imdb Jupyter notebooks. Whenever I tried to train a model (by calling the fit method), the system halted and then rebooted.

    I tried to debug this myself: I checked all system logs and searched for any suspicious logs via Everything, but none of them seems to record the error details.

    I noticed that Anaconda installed cudnn-7.1.4-cuda9.0_0; perhaps it conflicts with the existing CUDA install?

    The error cell in lesson1

    arch=resnet34
    data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
    learn = ConvLearner.pretrained(arch, data, precompute=True)
    learn.fit(0.01, 2)
    

    The error cell in lesson4-imdb

    learner.fit(3e-3, 4, wds=1e-6, cycle_len=1, cycle_mult=2)
    

    Running the error cells causes the system to halt and reboot.

    os: windows 10
    jupyter lab: 0.32.1
    notebook: 5.5.0
    cuda: 9.0.176
    
    cudnn1(installed before): cudnn-9.0-windows10-x64-v7.2.1.38
    cudnn2(installed by anaconda): cudnn-7.1.4-cuda9.0_0
    - E:\Anaconda3\pkgs\cudnn-7.1.4-cuda9.0_0
    
    
    packages:
    
    alabaster==0.7.11
    appdirs==1.4.3
    asn1crypto==0.24.0
    astroid==2.0.4
    atomicwrites==1.2.1
    attrs==18.2.0
    Automat==0.7.0
    Babel==2.6.0
    backcall==0.1.0
    bcolz==1.2.1
    beautifulsoup4==4.6.3
    bleach==2.1.4
    bokeh==0.13.0
    certifi==2018.8.24
    cffi==1.11.5
    chardet==3.0.4
    click==6.7
    click-plugins==1.0.3
    cliff==2.8.2
    cligj==0.4.0
    cloudpickle==0.5.5
    cmd2==0.9.4
    colorama==0.3.9
    configparser==3.5.0
    constantly==15.1.0
    cryptography==2.3.1
    cryptography-vectors==2.3.1
    cssselect==1.0.3
    cycler==0.10.0
    cymem==1.31.2
    cytoolz==0.9.0.1
    dask==0.19.0
    decorator==4.3.0
    descartes==1.1.0
    dill==0.2.8.2
    distributed==1.23.0
    docutils==0.14
    en-core-web-sm==2.0.0
    entrypoints==0.2.3
    feather-format==0.4.0
    feedparser==5.2.1
    Fiona==1.7.10
    GDAL==2.2.2
    geopandas==0.4.0
    graphviz==0.9
    h5py==2.8.0rc1
    heapdict==1.0.0
    html5lib==1.0.1
    hyperlink==17.3.1
    idna==2.7
    imagesize==1.1.0
    incremental==17.5.0
    ipykernel==4.9.0
    ipython==6.5.0
    ipython-genutils==0.2.0
    ipywidgets==7.4.1
    isort==4.3.4
    isoweek==1.3.3
    jedi==0.12.1
    Jinja2==2.10
    jsonschema==2.6.0
    jupyter==1.0.0
    jupyter-client==5.2.3
    jupyter-console==5.2.0
    jupyter-contrib-core==0.3.3
    jupyter-contrib-nbextensions==0.5.0
    jupyter-core==4.4.0
    jupyter-highlight-selected-word==0.2.0
    jupyter-latex-envs==1.4.4
    jupyter-nbextensions-configurator==0.4.0
    kaggle-cli==0.12.13
    keyring==13.2.1
    kiwisolver==1.0.1
    lazy-object-proxy==1.3.1
    locket==0.2.0
    lxml==4.0.0
    MarkupSafe==1.0
    matplotlib==2.2.3
    mccabe==0.6.1
    MechanicalSoup==0.8.0
    mistune==0.8.3
    mizani==0.4.6
    mkl-fft==1.0.6
    mkl-random==1.0.1
    more-itertools==4.3.0
    msgpack==0.5.6
    msgpack-numpy==0.4.3.1
    munch==2.3.2
    murmurhash==0.28.0
    nbconvert==5.3.1
    nbformat==4.4.0
    notebook==5.6.0
    numexpr==2.6.6
    numpy==1.15.1
    numpydoc==0.8.0
    olefile==0.45.1
    opencv-python==3.4.2.17
    packaging==17.1
    palettable==3.1.1
    pandas==0.23.4
    pandas-summary==0.0.5
    pandocfilters==1.4.2
    parso==0.3.1
    partd==0.3.8
    path.py==11.0.1
    patsy==0.5.0
    pbr==4.2.0
    pexpect==4.6.0
    pickleshare==0.7.4
    Pillow==5.2.0
    plac==0.9.6
    plotnine==0.4.0
    pluggy==0.7.1
    preshed==1.0.0
    prettytable==0.7.2
    progressbar2==3.34.3
    prometheus-client==0.3.0
    prompt-toolkit==1.0.15
    psutil==5.4.7
    py==1.6.0
    pyarrow==0.10.0
    pyasn1==0.4.4
    pyasn1-modules==0.2.1
    pycodestyle==2.4.0
    pycparser==2.18
    pyflakes==2.0.0
    Pygments==2.2.0
    PyHamcrest==1.9.0
    pylint==2.1.1
    pyOpenSSL==18.0.0
    pyparsing==2.2.0
    pyperclip==1.6.4
    pyproj==1.9.5.1
    pyreadline==2.1
    PySocks==1.6.8
    pytest==3.7.4
    python-dateutil==2.7.3
    python-utils==2.3.0
    pytz==2018.5
    pywin32==223
    pywinpty==0.5.4
    PyYAML==3.13
    pyzmq==17.1.2
    QtAwesome==0.4.4
    qtconsole==4.4.1
    QtPy==1.5.0
    regex==2017.11.9
    requests==2.19.1
    rope==0.11.0
    scikit-learn==0.19.2
    scipy==1.1.0
    seaborn==0.9.0
    Send2Trash==1.5.0
    service-identity==17.0.0
    Shapely==1.6.4.post2
    simplegeneric==0.8.1
    six==1.11.0
    sklearn-pandas==1.7.0
    snowballstemmer==1.2.1
    sortedcontainers==2.0.4
    spacy==2.0.12
    Sphinx==1.7.8
    sphinxcontrib-websupport==1.1.0
    spyder==3.3.1
    spyder-kernels==0.2.6
    statsmodels==0.9.0
    stevedore==1.29.0
    tables==3.4.4
    tblib==1.3.2
    termcolor==1.1.0
    terminado==0.8.1
    testfixtures==6.3.0
    testpath==0.3.1
    thinc==6.10.3
    toolz==0.9.0
    torch==0.4.1
    torchtext==0.2.3
    torchvision==0.2.1
    tornado==4.5.3
    tqdm==4.24.0
    traitlets==4.3.2
    Twisted==18.7.0
    typed-ast==1.1.0
    ujson==1.35
    urllib3==1.23
    wcwidth==0.1.7
    webencodings==0.5.1
    widgetsnbextension==3.4.1
    win-inet-pton==1.0.1
    wincertstore==0.2
    wrapt==1.10.11
    zict==0.1.3
    zope.interface==4.5.0
    
    opened by geekan 68
  • RuntimeError: received 0 items of ancdata

    I'm running into an issue when trying to predict with the dn models. From what I've researched, it may be related to the PyTorch issue https://github.com/pytorch/pytorch/issues/973, where the workaround was to set the number of workers to 0. I tried setting num_workers on ImageClassifierData to 0, but that didn't solve the issue for me; if anybody else has encountered this or knows the right way to set the number of workers to 0, please share. I don't know if anything can be done on the fastai side, since it appears to be a PyTorch problem, but I figured it's at least worth documenting so that anybody with ideas can look into it.

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-13-c94a818ff72b> in <module>()
         10     learn[i].fit(0.01, 3, cycle_len=1, cycle_mult=4)
         11 
    ---> 12     test_predictions = learn[i].predict(is_test=True)
         13 
         14     #tmp_log_preds,tmp_y = learn[i].TTA(is_test=True, n_aug=50)
    
    ~/fastaip1v2/fastai/courses/dl1/fastai/learner.py in predict(self, is_test)
        136         self.load('tmp')
        137 
    --> 138     def predict(self, is_test=False): return self.predict_with_targs(is_test)[0]
        139 
        140     def predict_with_targs(self, is_test=False):
    
    ~/fastaip1v2/fastai/courses/dl1/fastai/learner.py in predict_with_targs(self, is_test)
        140     def predict_with_targs(self, is_test=False):
        141         dl = self.data.test_dl if is_test else self.data.val_dl
    --> 142         return predict_with_targs(self.model, dl)
        143 
        144     def predict_dl(self, dl): return predict_with_targs(self.model, dl)[0]
    
    ~/fastaip1v2/fastai/courses/dl1/fastai/model.py in predict_with_targs(m, dl)
        115     if hasattr(m, 'reset'): m.reset()
        116     preda,targa = zip(*[(get_prediction(m(*VV(x))),y)
    --> 117                         for *x,y in iter(dl)])
        118     return to_np(torch.cat(preda)), to_np(torch.cat(targa))
        119 
    
    ~/fastaip1v2/fastai/courses/dl1/fastai/model.py in <listcomp>(.0)
        114     m.eval()
        115     if hasattr(m, 'reset'): m.reset()
    --> 116     preda,targa = zip(*[(get_prediction(m(*VV(x))),y)
        117                         for *x,y in iter(dl)])
        118     return to_np(torch.cat(preda)), to_np(torch.cat(targa))
    
    ~/fastaip1v2/fastai/courses/dl1/fastai/dataset.py in __next__(self)
        226         if self.i>=len(self.dl): raise StopIteration
        227         self.i+=1
    --> 228         return next(self.it)
        229 
        230     @property
    
    ~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
        193         while True:
        194             assert (not self.shutdown and self.batches_outstanding > 0)
    --> 195             idx, batch = self.data_queue.get()
        196             self.batches_outstanding -= 1
        197             if idx != self.rcvd_idx:
    
    ~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/queues.py in get(self)
        335             res = self._reader.recv_bytes()
        336         # unserialize the data after having released the lock
    --> 337         return _ForkingPickler.loads(res)
        338 
        339     def put(self, obj):
    
    ~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/multiprocessing/reductions.py in rebuild_storage_fd(cls, df, size)
         68         fd = multiprocessing.reduction.rebuild_handle(df)
         69     else:
    ---> 70         fd = df.detach()
         71     try:
         72         storage = storage_from_cache(cls, fd_id(fd))
    
    ~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/resource_sharer.py in detach(self)
         56             '''Get the fd.  This should only be called once.'''
         57             with _resource_sharer.get_connection(self._id) as conn:
    ---> 58                 return reduction.recv_handle(conn)
         59 
         60 
    
    ~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/reduction.py in recv_handle(conn)
        180         '''Receive a handle over a local connection.'''
        181         with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
    --> 182             return recvfds(s, 1)[0]
        183 
        184     def DupFd(fd):
    
    ~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/reduction.py in recvfds(sock, size)
        159             if len(ancdata) != 1:
        160                 raise RuntimeError('received %d items of ancdata' %
    --> 161                                    len(ancdata))
        162             cmsg_level, cmsg_type, cmsg_data = ancdata[0]
        163             if (cmsg_level == socket.SOL_SOCKET and
    
    RuntimeError: received 0 items of ancdata
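
A cause often reported for this error is the process running out of file descriptors while DataLoader workers pass tensors between processes; that is an assumption here, not something this report confirms. Under that assumption, the sketch below uses the standard-library resource module (Unix only) to inspect and, where permitted, raise the soft limit on open files. The helper name raise_open_file_limit and the target of 4096 are illustrative choices.

```python
import resource


def raise_open_file_limit(wanted=4096):
    """Try to raise the soft open-file limit to `wanted`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit can never exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            pass  # the platform refused the change; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


print("soft open-file limit:", raise_open_file_limit())
```

This only affects the current process and the children it starts afterwards, so it would need to run before the data loaders are created.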
    
    opened by kevinbird15 46
  • cannot instantiate 'WindowsPath' on your system

    Describe the bug

    Installed fastai on the python:3.7 image but it fails to load the model:

    empty_data = ImageDataBunch.load_empty(modelPath)
      File "/usr/local/lib/python3.7/site-packages/fastai/data_block.py", line 649, in _databunch_load_empty
        sd = LabelLists.load_empty(path, fn=fname)
      File "/usr/local/lib/python3.7/site-packages/fastai/data_block.py", line 513, in load_empty
        state = pickle.load(open(path/fn, 'rb'))
      File "/usr/local/lib/python3.7/pathlib.py", line 997, in __new__
        % (cls.__name__,))
    NotImplementedError: cannot instantiate 'WindowsPath' on your system

    Provide your installation details

    === Software ===
    python       : 3.7.2
    fastai       : 1.0.40
    fastprogress : 0.1.18
    torch        : 1.0.0
    torch cuda   : 9.0.176 / is **Not available**
    
    === Hardware ===
    No GPUs available
    
    === Environment ===
    platform     : Linux-4.9.125-linuxkit-x86_64-with-debian-9.6
    distro       : #1 SMP Fri Sep 7 08:20:28 UTC 2018
    conda env    : Unknown
    python       : /usr/local/bin/python
    sys.path     :
    /usr/local/lib/python37.zip
    /usr/local/lib/python3.7
    /usr/local/lib/python3.7/lib-dynload
    /usr/local/lib/python3.7/site-packages
    no supported gpus found on this system
    

    To Reproduce: Build this Dockerfile and then run it:

    FROM python:3.7
    
    WORKDIR /app
    
    RUN pip3 install flask flask-cors gunicorn
    RUN pip3 install torch torchvision
    RUN pip3 install fastai
    
    COPY ./src /app/src
    COPY ./dist /app/dist
    
    CMD gunicorn --bind 0.0.0.0:$PORT src.app:app
    

    Expected behavior: The model should load and predict correctly.
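
The traceback suggests the pickled state contains a WindowsPath object (presumably because it was exported on Windows), which pathlib refuses to instantiate on Linux. As a general illustration of the mechanism, and not a fastai API, the sketch below shows a custom unpickler that maps OS-specific path classes to pathlib.Path when loading. The class and function names are hypothetical, and the pickle bytes are hand-written so the example runs on any OS.

```python
import io
import pathlib
import pickle


class PortablePathUnpickler(pickle.Unpickler):
    """Unpickler that maps OS-specific concrete path classes to pathlib.Path."""

    def find_class(self, module, name):
        if module.split(".")[0] == "pathlib" and name in ("WindowsPath", "PosixPath"):
            return pathlib.Path  # resolves to the concrete class for the current OS
        return super().find_class(module, name)


def load_portable(data):
    """Unpickle `data` (bytes), tolerating path objects pickled on another OS."""
    return PortablePathUnpickler(io.BytesIO(data)).load()


# Hand-written protocol-0 pickle that rebuilds WindowsPath('models', 'export.pkl').
windows_pickle = b"cpathlib\nWindowsPath\n(Vmodels\nVexport.pkl\ntR."
path = load_portable(windows_pickle)
print(type(path).__name__, path)
```

This only helps when you control the unpickling call, and absolute Windows paths with drive letters will not translate into meaningful POSIX paths. Use a custom unpickler only on files you trust, as with any pickle.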

    opened by PsidomPC 35
  • Serialization / Deserialization of Fastai objects to byte streams

    Added options to save/export/load using BytesIO streams to the following functions: Learn.save, Learn.export, load_learner, DataBunch.save, load_data.

    Following a discussion with @sgugger here.

    opened by bachsh 31
  • ImageDataLoaders num_workers >0 → RuntimeError: Cannot pickle CUDA storage; try pickling a CUDA tensor instead

    Please confirm you have the latest versions of fastai, fastcore, fastscript, and nbdev prior to reporting a bug (delete one): YES

    Describe the bug: When using DataLoaders with num_workers>0, training raises RuntimeError: Cannot pickle CUDA storage; try pickling a CUDA tensor instead

    To Reproduce Steps to reproduce the behavior:

    from fastai.vision.data import ImageDataLoaders
    from fastai.vision.learner import cnn_learner
    from fastai.vision.augment import aug_transforms
    import pandas as pd
    from fastai import vision
    
    df = pd.read_csv("/data/cats/labels.csv")
    
    data = ImageDataLoaders.from_df(df=df, path="/", label_col=1, bs=100, batch_tfms=[
        *aug_transforms(size=224)], valid_pct=0.2, num_workers=1)
    learn = cnn_learner(data, getattr(vision.models, "resnet18"))
    learn.fit_one_cycle(10)
    

    Expected behavior: There shouldn't be an exception, as there is none when using num_workers=0.

    Error with full stack trace

    Traceback (most recent call last):
      File "/home/df/git/mitl/mitlmodels/model.py", line 426, in train
        pass  # This comment shows up if we ran into a callback error
      File "/home/df/git/mitl/mitlmodels/ml_utils.py", line 63, in __exit__
        raise exc_type(exc_val).with_traceback(exc_tb) from None
      File "/home/df/git/mitl/mitlmodels/model.py", line 401, in train
        learn.fit_one_cycle(max_epochs, slice(lr_init, lr_init * 30), wd=wd,
      File "/home/df/.local/lib/python3.8/site-packages/fastcore/logargs.py", line 56, in _f
        return inst if to_return else f(*args, **kwargs)
      File "/home/df/.local/lib/python3.8/site-packages/fastai/callback/schedule.py", line 113, in fit_one_cycle
        self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
      File "/home/df/.local/lib/python3.8/site-packages/fastcore/logargs.py", line 56, in _f
        return inst if to_return else f(*args, **kwargs)
      File "/home/df/.local/lib/python3.8/site-packages/fastai/learner.py", line 207, in fit
        self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
      File "/home/df/.local/lib/python3.8/site-packages/fastai/learner.py", line 155, in _with_events
        try:       self(f'before_{event_type}')       ;f()
      File "/home/df/.local/lib/python3.8/site-packages/fastai/learner.py", line 197, in _do_fit
        self._with_events(self._do_epoch, 'epoch', CancelEpochException)
      File "/home/df/.local/lib/python3.8/site-packages/fastai/learner.py", line 155, in _with_events
        try:       self(f'before_{event_type}')       ;f()
      File "/home/df/.local/lib/python3.8/site-packages/fastai/learner.py", line 191, in _do_epoch
        self._do_epoch_train()
      File "/home/df/.local/lib/python3.8/site-packages/fastai/learner.py", line 183, in _do_epoch_train
        self._with_events(self.all_batches, 'train', CancelTrainException)
      File "/home/df/.local/lib/python3.8/site-packages/fastai/learner.py", line 155, in _with_events
        try:       self(f'before_{event_type}')       ;f()
      File "/home/df/.local/lib/python3.8/site-packages/fastai/learner.py", line 161, in all_batches
        for o in enumerate(self.dl): self.one_batch(*o)
      File "/home/df/.local/lib/python3.8/site-packages/fastai/data/load.py", line 102, in __iter__
        for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
      File "/home/df/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 737, in __init__
        w.start()
      File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
        self._popen = self._Popen(self)
      File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
        return Popen(process_obj)
      File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    
    opened by dreamflasher 26
  • Added heatmap boolean variable to plot_top_losses. By default this va…

    Added heatmap boolean variable to plot_top_losses. By default this variable is True.

    When true, plot_top_losses will overlay heat-maps on the top of images. Otherwise, plot_top_losses will display only images associated with top losses.

    I am not sure how to write a test case, but here are two scenarios in which the test worked well with and without my code. (I assumed that passing the test is equivalent to displaying the images associated with the top losses.) I am looking forward to learning more about it.

    ##### with my code
    path = untar_data(URLs.PETS)
    path_anno = path/'annotations'
    path_img = path/'images'
    np.random.seed(2)
    pat = re.compile(r'/([^/]+)_\d+.jpg$')
    data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
    learn = create_cnn(data, models.resnet34, metrics=error_rate)
    interp = ClassificationInterpretation.from_learner(learn)
    losses,idxs = interp.top_losses()
    len(data.valid_ds)==len(losses)==len(idxs)
    interp.plot_top_losses(9, figsize=(15,11),heatmap=True)

    ### without my code
    path = untar_data(URLs.PETS)
    path_anno = path/'annotations'
    path_img = path/'images'
    np.random.seed(2)
    pat = re.compile(r'/([^/]+)_\d+.jpg$')
    data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
    learn = create_cnn(data, models.resnet34, metrics=error_rate)
    interp = ClassificationInterpretation.from_learner(learn)
    losses,idxs = interp.top_losses()
    len(data.valid_ds)==len(losses)==len(idxs)
    interp.plot_top_losses(9, figsize=(15,11))

    opened by at110 25
  • ImageCleaner.next_batch() and/or .render() broken in JupyterLab

    Describe the bug: After successfully running notebooks on a GCP instance set up by following the instructions, I cannot get the ImageCleaner widget to work properly. First, the widget does not even appear in JupyterLab unless I install the ipywidgets JupyterLab extension; only the object is returned. Second, even after installing this extension, the "Next Batch" button does not work. The CSV is properly created and updated, but the next batch of images is not rendered. This leads me to believe that ImageCleaner.render() is broken.

    Provide your installation details

    === Software === 
    python        : 3.7.1
    fastai        : 1.0.42
    fastprogress  : 0.1.18
    torch         : 1.0.0
    nvidia driver : 410.72
    torch cuda    : 10.0.130 / is available
    torch cudnn   : 7401 / is enabled
    
    === Hardware === 
    nvidia gpus   : 1
    torch devices : 1
      - gpu0      : 7611MB | Tesla P4
    
    === Environment === 
    platform      : Linux-4.9.0-8-amd64-x86_64-with-debian-9.7
    distro        : #1 SMP Debian 4.9.130-2 (2018-10-27)
    conda env     : base
    python        : /opt/anaconda3/bin/python
    sys.path      : /home/jupyter/tutorials/fastai/course-v3/nbs/dl1
    /opt/anaconda3/lib/python37.zip
    /opt/anaconda3/lib/python3.7
    /opt/anaconda3/lib/python3.7/lib-dynload
    
    /opt/anaconda3/lib/python3.7/site-packages
    /opt/anaconda3/lib/python3.7/site-packages/IPython/extensions
    /home/jupyter/.ipython
    

    To Reproduce

    1. Create a new instance via GCP instructions
    2. gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080
    3. Point browser to localhost:8080
    4. Open lesson2-download.ipynb
    5. Run all cells as instructed in Lesson 2 (careful to create dirs and download images properly)
    6. Attempt to instantiate ImageCleaner(ds, idxs, path); note that the current version of lesson2-download.ipynb is missing the necessary path argument
    7. See that the output of the cell is merely the object
    8. Install ipywidgets JupyterLab extension via jupyter labextension install @jupyter-widgets/jupyterlab-manager
    9. Refresh notebook browser window
    10. Attempt to instantiate ImageCleaner(ds, idxs, path) again
    11. See that the widget appears
    12. Interact with the widget and click "Next Batch"
    13. See that the next batch of images does not render, but that cleaned.csv is created

    Expected behavior: The next batch of images should appear.

    Thanks.

    opened by amqdn 24
  • Tokenization is time and space inefficient

    Going through the code in transform.py, I cannot help but notice several opportunities to optimize parallel execution. In its current form it would take more than 4 days to tokenize a 12 GB corpus on a 16-core/32-thread CPU (if it didn't run out of memory first, as 36 GB of RAM wasn't enough). Writing a custom implementation reduced the time to a little more than 4 hours and memory use by 2-3x. The code is mission-specific and write-once, so I'm reluctant to share it as is, but I'll gladly share the gist of it below.

    1. In its current implementation the tokenization process is parallelized using the very inefficient concurrent.futures.ProcessPoolExecutor's map function which creates Future objects where there is no good reason to. These are good for fine-grained control like progress reporting, cancelling etc, but are fairly heavy. In this case we are actually only interested in the returned tokens. multiprocessing.Pool's map should perform considerably better. See this SO post for more details.

    2. A number of new processes are created to tokenize each batch of text. This means that every few seconds a new batch of processes must be forked, each initializing a fresh instance of the spaCy Tokenizer and receiving a fairly large chunk of text via IPC. This seems very inefficient. For small enough batches, more time will be spent on fork-IPC-join overhead than on the actual work. Instead, a number of long-lived tokenizer worker processes should be initialized at the beginning with a workload to process, and each should process a stream of text with spaCy's more efficient Tokenizer.pipe function.

    Also, for large batches it is by leaps and bounds more efficient to have each process read its own batch from disk than to have some producer process provide it by IPC. Python's performance for reading large objects through IPC is atrocious (I don't know if this is Python-specific). See this SO post for more context.

    3. The current implementation requires enormous amounts of RAM for relatively small corpora (a corpus of ~1 GB requires more than 24 GB of RAM). Serializing the tokens and word counts from each tokenizer worker to disk, merging them after tokenization, and (if needed) truncating the vocabulary and replacing deleted instances with UNK in the tokenized text files is vastly more memory-efficient and can scale to much larger texts.
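
The merge-then-truncate step described in the last point can be sketched as follows. This is a minimal illustration, not the reporter's code: the function names are hypothetical, whitespace splitting stands in for a real tokenizer, and the per-worker counts are kept in memory here where the proposal would write them to disk.

```python
from collections import Counter

UNK = "UNK"  # placeholder token for words dropped from the vocabulary


def merge_counts(shard_counts):
    """Combine the word counts produced independently by each worker."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)
    return total


def build_vocab(total, max_vocab):
    """Keep only the `max_vocab` most frequent tokens."""
    return {tok for tok, _ in total.most_common(max_vocab)}


def replace_rare(tokens, vocab):
    """Replace every token that fell out of the vocabulary with UNK."""
    return [tok if tok in vocab else UNK for tok in tokens]


# Two "workers", each tokenizing and counting its own shard of text.
shards = [["the cat sat", "the dog"], ["the cat ran"]]
shard_counts = [Counter(tok for line in shard for tok in line.split()) for shard in shards]

total = merge_counts(shard_counts)
vocab = build_vocab(total, max_vocab=2)
print(replace_rare("the dog sat".split(), vocab))
```

Because each worker only ever holds its own shard and a Counter, peak memory is bounded by the shard size rather than the corpus size.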

    I don't really know whether the goal of the current code is to make it easier to "bring your own tokenizer" or just to keep it small and understandable. It is definitely nice to have functions like .from_csv or .from_files that "magically" do everything in one go for demonstration purposes, but for more serious datasets, maybe breaking the process into more manageable pieces would be a better approach?

    [EDIT: Some demonstration]

    Here is what processor utilization looks like with the current implementation:

    [screenshot: impl1]

    Here is what it should look like (running the example script from here)

    [screenshot: impl2]

    opened by kliron 20
  • AttributeError: 'Learner' object has no attribute 'min_grad_lr'

    Describe the bug

    Intermittently, I am getting the error AttributeError: 'Learner' object has no attribute 'min_grad_lr' when attempting to do:

    learn.lr_find()
    fig = learn.recorder.plot(suggestion=True, return_fig=True);
    lr = learn.recorder.min_grad_lr 
    

    Provide your installation details

    === Software === 
    python        : 3.6.5
    fastai        : 1.0.46
    fastprogress  : 0.1.20
    torch         : 1.0.0
    nvidia driver : 410.79
    torch cuda    : 10.0.130 / is available
    torch cudnn   : 7401 / is enabled
    
    === Hardware === 
    nvidia gpus   : 1
    torch devices : 1
      - gpu0      : 11441MB | Tesla K80
    
    === Environment === 
    platform      : Linux-4.14.97-74.72.amzn1.x86_64-x86_64-with-glibc2.9
    distro        : #1 SMP Tue Feb 5 20:59:30 UTC 2019
    conda env     : pytorch_p36
    python        : /home/ec2-user/anaconda3/envs/pytorch_p36/bin/python
    sys.path      : 
    /home/ec2-user/src/cntk/bindings/python
    /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python36.zip
    /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6
    /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/lib-dynload
    /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages
    /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/IPython/extensions
    /home/ec2-user/.ipython
    

    To Reproduce: This error happens while training inside a Docker container (local-notebook SageMaker training). I can't seem to reproduce it directly inside a notebook environment. I've also tried lr_find(learn), but I assume that's the same thing.
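
Since the error is intermittent, one possible explanation (an assumption, not confirmed in this report) is that the suggestion is sometimes never computed, so the attribute is never set. A defensive pattern is to read it with a fallback instead of assuming it exists. The sketch below uses a stand-in object in place of a real recorder; the helper name suggested_lr and the fallback value are hypothetical.

```python
from types import SimpleNamespace


def suggested_lr(recorder, fallback=1e-3):
    """Return recorder.min_grad_lr if it was set, otherwise `fallback`."""
    return getattr(recorder, "min_grad_lr", fallback)


# Stand-ins for a recorder with and without a computed suggestion.
print(suggested_lr(SimpleNamespace(min_grad_lr=0.01)))
print(suggested_lr(SimpleNamespace()))
```

This does not fix the underlying cause, but it lets an unattended training job continue with a known learning rate instead of crashing.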

    opened by austinmw 20
  • Save load

    Save load

    The export and load_learner methods of Learner only worked when a GPU with CUDA was available, so there was no way to export a model and then load it on a CPU-only device. You can now do that by specifying device='cpu' when calling load_learner.

    opened by pouannes 20
  • Support PyTorch 1.8, TorchVision 0.9.0 and TorchAudio 0.8.0

    Support PyTorch 1.8, TorchVision 0.9.0 and TorchAudio 0.8.0

    I know there is probably some testing that needs to happen, and that you devs are probably already aware of it, but PyTorch 1.8, TorchVision 0.9.0 and TorchAudio 0.8.0 were released two days ago, so support for these in the next FastAI release would be nice.

    https://github.com/pytorch/pytorch/releases/tag/v1.8.0

    opened by DavidSpek 19
  • AttributeError: module 'sklearn.metrics._dist_metrics' has no attribute 'DistanceMetric32'

    AttributeError: module 'sklearn.metrics._dist_metrics' has no attribute 'DistanceMetric32'

    This was removed post scikit-learn version 1.1.0 I believe.

    Installing scikit-learn 1.1.0 fixed this issue for me when trying the first line import of the vision tutorial.

    from fastai.vision.all import *
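To check whether an environment is affected, the installed scikit-learn version can be compared against 1.1.0, the version the reporter says worked. This is a minimal sketch using only the standard library; `version_tuple` and `is_newer_than` are hypothetical helpers, and the 1.1.0 threshold comes from the report above rather than from any verified changelog.

```python
def version_tuple(version):
    """Turn '1.1.0' into (1, 1, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than(installed, reference="1.1.0"):
    """True when the installed version sorts after the reference version."""
    return version_tuple(installed) > version_tuple(reference)


for candidate in ("1.0.2", "1.1.0", "1.2.0"):
    print(candidate, is_newer_than(candidate))
```

In a real environment the installed version string could be obtained with `importlib.metadata.version("scikit-learn")`.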

    opened by talentoscope 0
  • Gradio unable to render output properly in Jupyter Notebook (fastai uses np.int but Gradio does not)

    Gradio unable to render output properly in Jupyter Notebook (fastai uses np.int but Gradio does not)

    Describe the Bug

    In creating a simple image classifier, there appears to be a bug when trying to render the output onto a Jupyter Notebook. Specifically, here's the output issue that I experience:

    AttributeError: module 'numpy' has no attribute 'int'

    The code deploys properly on Hugging Face Spaces, but I get an error when the output is rendered in a Jupyter Notebook. How can this be resolved so that I can create and test the output locally in a Jupyter Notebook before deploying it more broadly on Hugging Face Spaces? I've posted the same question under issues in the Gradio repo, but it was suggested that I post in the fastai repo.

    Reproduction

    Here's the full source code on GitHub: https://github.com/emptytank/invoice_classifier/blob/main/invoice_classifier.ipynb

    Here's the Hugging face spaces: https://huggingface.co/spaces/emptytank/invoice_classifier

    Here's the link to the issue posted in the Gradio GitHub repo: https://github.com/gradio-app/gradio/issues/2908

    Logs

    Traceback (most recent call last):
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\gradio\routes.py", line 321, in run_predict
        output = await app.blocks.process_api(
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\gradio\blocks.py", line 1015, in process_api
        result = await self.call_function(fn_index, inputs, iterator, request)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\gradio\blocks.py", line 856, in call_function
        prediction = await anyio.to_thread.run_sync(
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
        return await get_asynclib().run_sync_in_worker_thread(
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
        return await future
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
        result = context.run(func, *args)
      File "C:\Users\tangmi2\AppData\Local\Temp\ipykernel_22992\830469006.py", line 6, in predict
        pred, pred_idx, probs = learn.predict(img)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\learner.py", line 313, in predict
        inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\learner.py", line 300, in get_preds
        self._do_epoch_validate(dl=dl)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\learner.py", line 236, in _do_epoch_validate
        with torch.no_grad(): self._with_events(self.all_batches, 'validate', CancelValidException)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\learner.py", line 193, in with_events
        try: self(f'before{event_type}'); f()
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\learner.py", line 199, in all_batches
        for o in enumerate(self.dl): self.one_batch(*o)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\data\load.py", line 127, in iter
        for b in _loadersself.fake_l.num_workers==0:
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\torch\utils\data\dataloader.py", line 628, in next
        data = self._next_data()
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\torch\utils\data\dataloader.py", line 671, in _next_data
        data = self._dataset_fetcher.fetch(index) # may raise StopIteration
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\torch\utils\data_utils\fetch.py", line 43, in fetch
        data = next(self.dataset_iter)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\data\load.py", line 138, in create_batches
        yield from map(self.do_batch, self.chunkify(res))
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\basics.py", line 230, in chunked
        res = list(itertools.islice(it, chunk_sz))
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\data\load.py", line 153, in do_item
        try: return self.after_item(self.create_item(s))
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\transform.py", line 208, in call
        def call(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\transform.py", line 158, in compose_tfms
        x = f(x, **kwargs)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\transform.py", line 81, in call
        def call(self, x, **kwargs): return self._call('encodes', x, **kwargs)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\transform.py", line 91, in _call
        return self.do_call(getattr(self, fn), x, **kwargs)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\transform.py", line 98, in do_call
        res = tuple(self.do_call(f, x, **kwargs) for x in x)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\transform.py", line 98, in
        res = tuple(self.do_call(f, x, **kwargs) for x in x)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\transform.py", line 97, in _do_call
        return retain_type(f(x, **kwargs), x, ret)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastcore\dispatch.py", line 120, in call
        return f(*args, **kwargs)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\vision\core.py", line 236, in encodes
        def encodes(self, o:PILBase): return o._tensor_cls(image2tensor(o))
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\vision\core.py", line 106, in image2tensor
        res = tensor(img)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\torch_core.py", line 154, in tensor
        else _array2tensor(array(x), **kwargs))
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\fastai\torch_core.py", line 136, in array2tensor
        if sys.platform == "win32" and x.dtype==np.int: x = x.astype(np.int64)
      File "c:\Users\tangmi2\GitHub\invoice_classifier\venv-invoice\lib\site-packages\numpy_init.py", line 284, in getattr
        raise AttributeError("module {!r} has no attribute "

    System Info

    Gradio Version: gradio==3.15.0
    Operating System: Windows 10 Enterprise 64-bit
    Browser: Microsoft Edge
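The traceback ends at a comparison against `np.int`, an alias for the builtin `int` that NumPy deprecated in 1.20 and removed in 1.24. A minimal sketch of a compatibility shim follows; `legacy_int_alias` is a hypothetical helper, and `SimpleNamespace` objects stand in for old and new NumPy modules so the snippet runs without NumPy installed. It is not the fix adopted by fastai.

```python
from types import SimpleNamespace


def legacy_int_alias(np_module):
    """Return np_module.int if the alias still exists, else the builtin int."""
    return getattr(np_module, "int", int)


# Stand-ins for the numpy module (hypothetical, for illustration only).
old_numpy = SimpleNamespace(int=int)  # older releases: np.int was the builtin int
new_numpy = SimpleNamespace()         # newer releases: the alias is gone

print(legacy_int_alias(old_numpy) is int)
print(legacy_int_alias(new_numpy) is int)
```

With the real library this would be called as `legacy_int_alias(numpy)`. Since the alias only ever pointed at the builtin, comparing `x.dtype == int` directly avoids the alias altogether; pinning NumPy below 1.24 is another option until the calling code is updated.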

    opened by emptytank 0
  • Add option to (optionally) save confusion matrix plot

    Add option to (optionally) save confusion matrix plot

    This PR adds an optional parameter save_plot to the plot_confusion_matrix function, which allows offline analysis of models and their confusion matrices across several tuning runs or iterations.

    opened by aspiringastro 1
  • Fastai docs not available as Colab notebooks any more?

    Fastai docs not available as Colab notebooks any more?

    The https://docs.fast.ai/ website says that every page of the docs is available as a Colab notebook. But I couldn't find the Colab link on any of the pages. Are the Colab notebooks not available any more?

    opened by amoghvaishampayan 0
  • Multi-GPU training CNN hangs when using TensorboardCallback

    Multi-GPU training CNN hangs when using TensorboardCallback

    Please confirm you have the latest versions of fastai, fastcore, and nbdev prior to reporting a bug (delete one): YES

    Describe the bug

    Hi! I have recently been experimenting with TensorBoard and fastai, especially as a means of tracking metrics in real-time with ClearML.

    I've noticed that the steps to train a CNN across GPUs using Accelerate work fine, but the moment you introduce a TensorBoardCallback in training, it hangs indefinitely without any errors. Training the CNN on a multi-GPU instance but without distributed training (i.e. only using one of the GPUs and without including Accelerate) works perfectly fine too.

    I mentioned this issue on the Accelerate repo: https://github.com/huggingface/accelerate/issues/900 and @muellerzr hypothesised that it's because TensorBoard can only run on the main process, which fastai doesn't guard against. (Thanks, Zachary!)

    To Reproduce

    Steps to reproduce the behavior:

    1. Spin up a notebook session with multiple GPUs. Here is the information on all of my settings:
      Accelerate version: 0.15.0
      OS: CentOS 7 (running JupyterLab through Docker with a CUDA-configured container)
      Python version: 3.9.12
      numpy version: 1.23.5
      ClearML version: 1.8.2
      torch version:
      * torch==1.12.1+cu113
      * torchaudio==0.12.1+cu113
      * torchvision==0.13.1+cu113
      fastai version: 2.7.10
      protobuf version: 3.19.6 (because of tensorboard issues)
      accelerate configuration:
        * command_file: null
        * commands: null
        * compute_environment: LOCAL_MACHINE
        * deepspeed_config: {}
        * distributed_type: MULTI_GPU
        * downcast_bf16: 'no'
        * dynamo_backend: 'NO'
        * fsdp_config: {}
        * gpu_ids: all
        * machine_rank: 0
        * main_process_ip: null
        * main_process_port: null
        * main_training_function: main
        * megatron_lm_config: {}
        * mixed_precision: 'no'
        * num_machines: 1
        * num_processes: 4
        * rdzv_backend: static
        * same_network: true
        * tpu_name: null
        * tpu_zone: null
        * use_cpu: false
      CUDA version: 11.3
      EC2 instance type: p3.8xlarge
      
    2. Take the following base script:
      from fastai.vision.all import *
      
      from accelerate import notebook_launcher
      from fastai.distributed import *
      from clearml import Task, Logger
      from fastai.callback.tensorboard import TensorBoardCallback
      
      path = untar_data(URLs.PETS)/'images'
      
      # Not included - the credentials and host information for ClearML set as environment variables
      task = Task.init(project_name='Test Project', task_name='clearml-fastai-integration-demo-4')
      logger = Logger.current_logger()
      
      path = untar_data(URLs.PETS)/'images'
      task = Task.init(project_name='Listing Image Tagger', task_name='clearml-fastai-integration-demo-4')
      logger = Logger.current_logger()
      
      def train():
          print('Creating DataLoader')
          dls = ImageDataLoaders.from_name_func(
              path, get_image_files(path), valid_pct=0.2,
              label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
          print('Creating learner')
          learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
          print('Outside learn.distrib_ctx')
          with learn.distrib_ctx(in_notebook=True, sync_bn=False):
              print('Inside learn.distrib_ctx')
              # learn.fine_tune(2, cbs=[TensorBoardCallback()])
              learn.fine_tune(2)
      
      notebook_launcher(train, num_processes=4)
      
    3. Paste it into a cell in a Jupyter Lab session
    4. Uncomment one of the learn.fine_tune lines and comment the other (eg try first without any callbacks)
    5. Run the cell
    6. Swap the uncommented and commented learn.fine_tune lines (eg now try with TensorBoardCallback)

    Expected behavior

    I expect the model to train across GPUs with a TensorBoardCallback enabled.

    Error with full stack trace

    Here's the output when there is no callback:

    Launching training on 4 GPUs.
    Creating DataLoader
    Creating DataLoader
    Creating DataLoader
    Creating DataLoader
    Creating learner
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning:
    
    The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
    
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning:
    
    Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
    
    Creating learner
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning:
    
    The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
    
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning:
    
    Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
    
    2022-12-02 04:02:37,332 - clearml.model - INFO - Selected model id: b212faeef29d4a54861d19d9bd2a3bde
    Creating learner
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning:
    
    The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
    
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning:
    
    Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
    
    Creating learner
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning:
    
    The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
    
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning:
    
    Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
    
    2022-12-02 04:02:37,632 - clearml.model - INFO - Selected model id: b212faeef29d4a54861d19d9bd2a3bde
    2022-12-02 04:02:38,745 - clearml.model - INFO - Selected model id: b212faeef29d4a54861d19d9bd2a3bde
    Outside learn.distrib_ctx
    2022-12-02 04:02:39,055 - clearml.model - INFO - Selected model id: b212faeef29d4a54861d19d9bd2a3bde
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    Outside learn.distrib_ctx
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    Outside learn.distrib_ctx
    Outside learn.distrib_ctx
    [W socket.cpp:401] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    Training Learner...
    Inside learn.distrib_ctxInside learn.distrib_ctxInside learn.distrib_ctxInside learn.distrib_ctx
    
    
    
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    
     0.00% [0/1 00:00<?]
    epoch	train_loss	valid_loss	error_rate	time
    
     39.13% [9/23 00:06<00:09 0.0596]
    

    Here's the output when using TensorBoardCallback as a callback:

    Launching training on 4 GPUs.
    Creating DataLoader
    Creating DataLoader
    Creating DataLoader
    Creating DataLoader
    Creating learner
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning:
    
    The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
    
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning:
    
    Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
    
    Creating learner
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning:
    
    The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
    
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning:
    
    Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
    
    Creating learner
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning:
    
    The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
    
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning:
    
    Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
    
    Creating learner
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning:
    
    The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
    
    /root/miniconda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning:
    
    Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
    
    2022-12-02 03:58:35,509 - clearml.model - INFO - Selected model id: b212faeef29d4a54861d19d9bd2a3bde
    2022-12-02 03:58:35,608 - clearml.model - INFO - Selected model id: b212faeef29d4a54861d19d9bd2a3bde
    2022-12-02 03:58:35,646 - clearml.model - INFO - Selected model id: b212faeef29d4a54861d19d9bd2a3bde
    2022-12-02 03:58:35,712 - clearml.model - INFO - Selected model id: b212faeef29d4a54861d19d9bd2a3bde
    Outside learn.distrib_ctx
    Outside learn.distrib_ctx
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    Outside learn.distrib_ctx
    Outside learn.distrib_ctx
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:401] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    Training Learner...
    Inside learn.distrib_ctx
    Inside learn.distrib_ctxInside learn.distrib_ctx
    
    Inside learn.distrib_ctx
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    [W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
    

    The output does not progress from here.

    Additional context

    To summarise:

    | Using Accelerate to multi-process training? | Callbacks enabled? | Result |
    |:---|:--:|---:|
    | No | No | Runs successfully |
    | No | Yes | Runs successfully |
    | Yes | No | Runs successfully |
    | Yes | Yes | Hangs indefinitely |
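The hypothesis quoted in this issue is that the logging callback should run on the main process only. One way to express such a guard is sketched below. It assumes the launcher exports a `RANK` environment variable for each worker (an assumption, not something the issue confirms), and `is_main_process` and `callbacks_for_process` are hypothetical helpers rather than fastai or Accelerate APIs.

```python
import os


def is_main_process(env=None):
    """True when this process has rank 0 (or no RANK is set at all)."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")) == 0


def callbacks_for_process(make_callbacks, env=None):
    """Build logging callbacks on the main process only; others get none."""
    return make_callbacks() if is_main_process(env) else []


# Simulated environments for two workers (plain dicts stand in for os.environ).
print(callbacks_for_process(lambda: ["tensorboard"], {"RANK": "0"}))
print(callbacks_for_process(lambda: ["tensorboard"], {"RANK": "2"}))
```

In the script above this might look like `learn.fine_tune(2, cbs=callbacks_for_process(lambda: [TensorBoardCallback()]))`. Whether this resolves the hang is untested here.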

    opened by ntdesilv 0
Releases(2.7.10)
  • 2.7.10(Nov 2, 2022)

    New Features

    • Add torch save and load kwargs (#3831), thanks to @JonathanGrant
      • This lets us do nice things like set pickle_module to cloudpickle
    • PyTorch 1.13 Compatibility (#3828), thanks to @warner-benjamin
    • Recursive copying of attribute dictionaries for TensorImage subclass (#3822), thanks to @restlessronin
    • OptimWrapper sets same param groups as Optimizer (#3821), thanks to @warner-benjamin
      • This PR harmonizes the default parameter group setting between OptimWrapper and Optimizer by modifying OptimWrapper to match Optimizer's logic.
    • Support normalization of 1-channel images in unet (#3820), thanks to @marib00
    • Add img_cls param to ImageDataLoaders (#3808), thanks to @tcapelle
      • This is particularly useful for passing PILImageBW for MNIST.
    • Add support for kwargs to tensor() when arg is an ndarray (#3797), thanks to @SaadAhmedGit
    • Add latest TorchVision models on fastai (#3791), thanks to @datumbox
    • Option to preserve filenames in download_images (#2983), thanks to @mess-lelouch

    Bugs Squashed

    • get_text_classifier fails with custom AWS_LSTM (#3817)
    • revert auto-enable of mac mps due to pytorch limitations (#3769)
    • Workaround for performance bug in PyTorch with subclassed tensors (#3683), thanks to @warner-benjamin
  • 2.7.8(Aug 2, 2022)

  • 2.7.6(Jul 7, 2022)

  • 2.7.5(Jul 4, 2022)

  • 2.7.4(Jun 28, 2022)

  • 2.7.2(Jun 19, 2022)

  • 2.7.1(Jun 19, 2022)

  • 2.7.0(Jun 19, 2022)

    Breaking changes

    • Distributed training now uses Hugging Face Accelerate, rather than fastai's launcher. Distributed training is now supported in a notebook -- see this tutorial for details

    New Features

    • resize_images creates folder structure at dest when recurse=True (#3692)
    • Integrate nested callable and getcallable (#3691), thanks to @muellerzr
    • workaround pytorch subclass performance bug (#3682)
    • Torch 1.12.0 compatibility (#3659), thanks to @josiahls
    • Integrate Accelerate into fastai (#3646), thanks to @muellerzr
    • New Callback event, before and after backward (#3644), thanks to @muellerzr
    • Let optimizer use built torch opt (#3642), thanks to @muellerzr
    • Support PyTorch Dataloaders with DistributedDL (#3637), thanks to @tmabraham
    • Add channels_last cb (#3634), thanks to @tcapelle
    • support all timm kwargs (#3631)
    • send self.loss_func to device if it is an instance of nn.Module (#3395), thanks to @arampacha

    Bugs Squashed

    • Solve hanging load_model and let LRFind be run in a distributed setup (#3689), thanks to @muellerzr
    • pytorch subclass functions fail if no positional args (#3687)
    • Workaround for performance bug in PyTorch with subclassed tensors (#3683), thanks to @warner-benjamin
    • Fix Tokenizer.get_lengths (#3667), thanks to @karotchykau
    • load_learner with cpu=False doesn't respect the current cuda device if model exported on another; fixes #3656 (#3657), thanks to @ohmeow
    • [Bugfix] Fix smoothloss on distributed (#3643), thanks to @muellerzr
    • WandbCallback Error: "Tensors must be CUDA and dense" on distributed training (#3291)
    • vision tutorial failed at learner.fine_tune(1) (#3283)
  • 2.6.3(May 1, 2022)

  • 2.6.2(Apr 30, 2022)

  • 2.6.1(Apr 30, 2022)

  • 2.6.0(Apr 24, 2022)

  • 2.5.6(Apr 2, 2022)

  • 2.5.5(Mar 25, 2022)

  • 2.5.4(Mar 25, 2022)

    New Features

    • Support py3.10 annotations (#3601)

    Bugs Squashed

    • Fix pin_memory=True breaking (batch) Transforms (#3606), thanks to @johan12345
    • Add Python 3.9 to setup.py for PyPI (#3604), thanks to @nzw0301
    • removes add_vert from get_grid calls (#3593), thanks to @kevinbird15
    • Making loss_not_reduced work with DiceLoss (#3583), thanks to @hiromis
    • Fix bug in URLs.path() in 04_data.external (#3582), thanks to @malligaraj
    • Custom name for metrics (#3573), thanks to @bdsaglam
    • Update import for show_install (#3568), thanks to @fr1ll
    • Fix Classification Interpretation (#3563), thanks to @warner-benjamin
    • Updates Interpretation class to be memory efficient (#3558), thanks to @warner-benjamin
    • Learner.show_results uses passed dataloader via dl_idx or dl arguments (#3554), thanks to @warner-benjamin
    • Fix learn.export pickle error with MixedPrecision Callback (#3544), thanks to @warner-benjamin
    • Fix concurrent LRFinder instances overwriting each other by using tempfile (#3528), thanks to @warner-benjamin
    • Fix _get_shapes to work with dictionaries (#3520), thanks to @ohmeow
    • Fix torch version checks, remove clip_grad_norm check (#3518), thanks to @warner-benjamin
    • Fix nested tensors predictions compatibility with fp16 (#3516), thanks to @tcapelle
    • Learning rate passed via OptimWrapper not updated in Learner (#3337)
    • Different results after running lr_find() at different times (#3295)
    • lr_find() may fail if run in parallel from the same directory (#3240)
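The tempfile-based fix listed above (#3528) addresses runs that overwrite each other's saved state. As a generic illustration of the idea, not fastai's actual code, the sketch below gives each run its own temporary directory; `save_state_isolated` and the `state.txt` file name are hypothetical.

```python
import tempfile
from pathlib import Path


def save_state_isolated(payload):
    """Write payload to a file inside a fresh, uniquely named temp directory."""
    # mkdtemp creates a new directory on every call, so runs cannot collide.
    run_dir = Path(tempfile.mkdtemp(prefix="lr_find_"))
    state_file = run_dir / "state.txt"
    state_file.write_text(payload)
    return state_file


first = save_state_isolated("weights from run A")
second = save_state_isolated("weights from run B")
print(first != second)
```

Directories created with `mkdtemp` are not removed automatically, so a real implementation would clean them up once the saved state has been restored.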
  • 2.5.3(Oct 23, 2021)

  • 2.5.1(Aug 11, 2021)

  • 2.5.0(Aug 6, 2021)

    Breaking changes

    • config.yml has been renamed to config.ini, and is now in ConfigParser format instead of YAML
    • The _path suffixes in config.ini have been removed
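For readers unfamiliar with the ConfigParser format mentioned above, here is a minimal sketch of reading such a file with Python's standard `configparser` module. The section and key names in `SAMPLE_INI` are hypothetical; the real keys in fastai's config.ini may differ.

```python
import configparser

# Hypothetical file contents, used only to illustrate the INI format.
SAMPLE_INI = """
[DEFAULT]
data = data
model = models
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_INI)

# Sections behave like mappings of string keys to string values.
settings = dict(parser["DEFAULT"])
print(settings)
```

To load a file from disk instead, `parser.read(path)` takes a path in place of `read_string`.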

    Bugs Squashed

    • Training with learn.to_fp16() fails with PyTorch 1.9 / Cuda 11.4 (#3438)
    • pandas 1.3.0 breaks add_elapsed_times (#3431)
  • 2.4.1(Jul 14, 2021)

  • 2.4(Jun 16, 2021)

    Breaking changes

    • QRNN module removed, due to incompatibility with PyTorch 1.9, and lack of utilization of QRNN in the deep learning community. QRNN was our only module that wasn't pure Python, so with this change fastai is now a pure Python package.

    New Features

    • Support for PyTorch 1.9
    • Improved LR Suggestions (#3377), thanks to @muellerzr
    • SaveModelCallback every nth epoch (#3375), thanks to @KeremTurgutlu
    • Send self.loss_func to device if it is an instance of nn.Module (#3395), thanks to @arampacha
    • Batch support for more than one image (#3339)
    • Changeable tfmdlists for TransformBlock, Datasets, DataBlock (#3327)

    Bugs Squashed

  • 2.3.2(Jun 16, 2021)

    New Features

    • send self.loss_func to device if it is an instance of nn.Module (#3395), thanks to @arampacha
    • Improved LR Suggestions (#3377), thanks to @muellerzr
    • SaveModelCallback every nth epoch (#3375), thanks to @KeremTurgutlu
    • Batch support for more than one image (#3339)
    • Changeable tfmdlists for TransformBlock, Datasets, DataBlock (#3327)

    Bugs Squashed

  • 2.3.1(May 4, 2021)

    New Features

    • Add support for pytorch 1.8 (#3349)
    • Add support for spacy3 (#3348)
    • Add support for Windows. Big thanks to Microsoft for many contributions to get this working
    • Timedistributed layer and Image Sequence Tutorial (#3124), thanks to @tcapelle
    • Add interactive run logging to AzureMLCallback (#3341), thanks to @yijinlee
    • Batch support for more than one image (#3339)
    • Have interp use ds_idx, add tests (#3332), thanks to @muellerzr
    • Automatically have fastai determine the right device, even with torch DataLoaders (#3330), thanks to @muellerzr
    • Add at_end feature to SaveModelCallback (#3296), thanks to @tmabraham
    • Improve inplace params in Tabular's new and allow for new and test_dl to be in place (#3292), thanks to @muellerzr
    • Update VSCode & Codespaces dev container (#3280), thanks to @bamurtaugh
    • Add max_scale param to RandomResizedCrop(GPU) (#3252), thanks to @kai-tub
    • Increase testing granularity for speedup (#3242), thanks to @ddobrinskiy

    Bugs Squashed

    • Make TTA turn shuffle and drop_last off when using ds_idx (#3347), thanks to @muellerzr
    • Add order to TrackerCallback derived classes (#3346), thanks to @muellerzr
    • Prevent schedule from crashing close to the end of training (#3335), thanks to @Lewington-pitsos
    • Fix ability to use raw pytorch DataLoaders (#3328), thanks to @hamelsmu
    • Fix PixelShuffle_icnr weight (#3322), thanks to @pratX
    • Creation of new DataLoader in Learner.get_preds has wrong keyword (#3316), thanks to @tcapelle
    • Correct layers order in tabular learner (#3314), thanks to @gradientsky
    • Fix vmin parameter default (#3305), thanks to @tcapelle
    • Ensure call to one_batch places data on the right device (#3298), thanks to @tcapelle
    • Fix Cutmix Augmentation (#3259), thanks to @MrRobot2211
    • Fix custom tokenizers for DataLoaders (#3256), thanks to @iskode
    • Fix error setting 'tok_tfm' parameter in TextDataLoaders.from_folder
    • Fix lighting augmentation (#3255), thanks to @kai-tub
    • Fix CUDA variable serialization (#3253), thanks to @mszhanyi
    • Change batch tfms to have the correct dimensionality (#3251), thanks to @trdvangraft
    • Ensure add_datepart adds elapsed as numeric column (#3230), thanks to @aberres
  • 2.3.0 (Mar 31, 2021)

    Breaking Changes

    • Fix OptimWrapper to work with param_groups (#3241), thanks to @tmabraham
      • OptimWrapper now has a different constructor signature, which makes it easier to wrap PyTorch optimizers

    New Features

    • Support discriminative learning with OptimWrapper (#2829)

    Bugs Squashed

    • Updated to support adding transforms to multiple dataloaders (#3268), thanks to @marii-moe
      • This fixes an issue in 2.2.7 which resulted in incorrect validation metrics when using Normalization
  • 2.2.7 (Feb 22, 2021)

  • 2.2.6 (Feb 21, 2021)

  • 2.2.5 (Feb 8, 2021)

    New Features

    • Enhancement: Let TextDataLoaders take in a custom tok_text_col (#3208), thanks to @muellerzr
    • Changed dataloaders arguments to have consistent overrides (#3178), thanks to @marii-moe
    • Better support for iterable datasets (#3173), thanks to @jcaw

    Bugs Squashed

    • BrokenProcessPool in download_images() on Windows (#3196)
    • Error on predict() or when using interp with resnet and MixUp (#3180)
    • Fix 'cat' attribute with pandas dataframe: AttributeError: Can only use .cat accessor with a 'category' dtype (#3165), thanks to @dreamflasher
    • cont_cat_split does not support pandas types (#3156)
    • DataBlock.dataloaders does not support the advertised "shuffle" argument (#3133)
  • 2.2.3 (Jan 12, 2021)

  • 2.2.2 (Jan 7, 2021)

  • 2.2.0 (Jan 6, 2021)

    Breaking Changes

    • Promote NativeMixedPrecision to default MixedPrecision (and similar for Learner.to_fp16); old MixedPrecision is now called NonNativeMixedPrecision (#3127)
      • Use the new GradientClip callback instead of the clip parameter to use gradient clipping
    • Adding a Callback which has the same name as an attribute no longer raises an exception (#3109)
    • RNN training now requires RNNCallback, but does not require RNNRegularizer; out and raw_out have moved to RNNRegularizer (#3108)
      • Call rnn_cbs to get all callbacks needed for RNN training, optionally with regularization
    • Replace callback run_after with order; do not run after cbs on exception (#3101)
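
To illustrate what norm-based gradient clipping does (the operation the GradientClip callback above is named for), here is a minimal plain-Python sketch. It is not fastai's implementation: the function name clip_grad_norm, its parameters, and the list-of-lists gradient layout are assumptions made only for this example.

```python
import math

def clip_grad_norm(grads, max_norm=1.0, norm_type=2.0):
    """Rescale `grads` in place so their global norm is at most `max_norm`.

    `grads` is a list of parameter groups, each a list of floats
    (a hypothetical stand-in for real gradient tensors).
    Returns the global norm measured before clipping.
    """
    total = sum(abs(g) ** norm_type for group in grads for g in group) ** (1.0 / norm_type)
    if total > max_norm:
        scale = max_norm / total  # shrink every gradient by the same factor
        for group in grads:
            for i, g in enumerate(group):
                group[i] = g * scale
    return total

# Two parameter groups whose combined L2 norm is 5.0
grads = [[3.0, 0.0], [0.0, 4.0]]
norm_before = clip_grad_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for group in grads for g in group))
print(norm_before, norm_after)
```

Because every gradient is scaled by the same factor, the direction of the update is preserved and only its length is capped. In fastai itself, the callback would typically be passed to a Learner through its cbs argument; check the callback documentation for the exact parameters.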

    New Features

    • Add GradientClip callback (#3107)
    • Make Flatten cast to TensorBase to simplify type compatibility (#3106)
    • Make flattened metrics compatible with all tensor subclasses (#3105)
    • New class method TensorBase.register_func to register types for __torch_function__ (#3097)
    • New dynamic flag for controlling dynamic loss scaling in NativeMixedPrecision (#3096)
    • Remove need to call to_native_fp32 before predict; set skipped in NativeMixedPrecision after NaN from dynamic loss scaling (#3095)
    • Make native fp16 extensible with callbacks (#3094)
    • Calculate correct nf in create_head based on concat_pool (#3115), thanks to @muellerzr
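
The dynamic flag and the skipped step after a NaN mentioned above both relate to dynamic loss scaling. As a rough illustration of the general technique (not fastai's or PyTorch's actual code), the sketch below lowers the scale and skips the step when a gradient is NaN or infinite, and raises the scale again after a run of clean steps. The class name DynamicLossScaler and all parameter names and defaults are hypothetical.

```python
import math

class DynamicLossScaler:
    """Toy dynamic loss scaler (hypothetical names and defaults)."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, grads):
        """Return True if the optimizer step should run, False if it is skipped."""
        if any(math.isnan(g) or math.isinf(g) for g in grads):
            # Overflow: back off the scale and skip this step
            self.scale *= self.backoff_factor
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            # Enough clean steps in a row: try a larger scale again
            self.scale *= self.growth_factor
            self.good_steps = 0
        return True

scaler = DynamicLossScaler(init_scale=1024.0, growth_interval=2)
batches = [[0.1], [float("nan")], [0.2], [0.3]]
history = [scaler.update(g) for g in batches]
print(history, scaler.scale)
```

In this toy run the second batch is skipped and the scale drops to 512, then two clean steps bring it back to 1024.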
  • 2.1.10 (Dec 22, 2020)