Plato: A New Framework for Federated Learning Research

Welcome to Plato, a new software framework to facilitate scalable federated learning research.

Installing Plato with PyTorch

To install Plato, first clone this repository to the desired directory.

The Plato developers recommend using Miniconda to manage Python packages. Before using Plato, first install Miniconda, update your conda environment, and then create a new conda environment with Python 3.8 using the command:

$ conda update conda
$ conda create -n federated python=3.8
$ conda activate federated

where federated is the preferred name of your new environment.

If conda asks to update any packages, type y to proceed.

The next step is to install the required Python packages. PyTorch should be installed following the advice of its getting started website. The typical command in Linux with CUDA GPU support, for example, would be:

$ conda install pytorch torchvision cudatoolkit=11.1 -c pytorch

The CUDA version, used in the command above, can be obtained on Ubuntu Linux systems by using the command:

nvidia-smi

In macOS (without GPU support), the typical command would be:

$ conda install pytorch torchvision -c pytorch

We will need to install several packages using pip as well:

$ pip install -r requirements.txt

If you use Visual Studio Code, you can have yapf reformat the code every time it is saved by adding the following settings to .vscode/settings.json:

"python.formatting.provider": "yapf", 
"editor.formatOnSave": true

In general, the following is the recommended starting point for .vscode/settings.json:

"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.provider": "yapf", 
"editor.formatOnSave": true,
"python.linting.pylintArgs": [
    "--init-hook",
    "import sys; sys.path.append('/absolute/path/to/project/home/directory')"
],
"workbench.editor.enablePreview": false

Replace /absolute/path/to/project/home/directory with the actual path in your development environment.

Tip: If you use Visual Studio Code as your development environment, one of the project developer's favourite colour themes is Bluloco; both its light and dark variants are excellent and thoughtfully designed. The Pylance extension, Microsoft's modern language server for Python, is also strongly recommended.

Running Plato in a Docker container

Most of the codebase in Plato is designed to be framework-agnostic, so it is relatively straightforward to use Plato with deep learning frameworks other than PyTorch, its default framework. One such framework that Plato currently supports is MindSpore. Because running Plato without Docker requires many setup steps to be followed correctly, it is strongly recommended to run Plato in a Docker container, on either a CPU-only or a GPU-enabled server.

To build such a Docker image, use the provided Dockerfile for PyTorch and Dockerfile_MindSpore for MindSpore:

docker build -t plato -f Dockerfile .

or:

docker build -t plato -f Dockerfile_MindSpore .

To run the docker image that was just built, use the command:

./dockerrun.sh

Or if GPUs are available, use the command:

./dockerrun_gpu.sh

To remove all the containers after they are run, use the command:

docker rm $(docker ps -a -q)

To remove the plato Docker image, use the command:

docker rmi plato

On Ubuntu Linux, you may need to add sudo before these docker commands.

The provided Dockerfile helps to build a Docker image running Ubuntu 20.04, with a virtual environment called federated pre-configured to support PyTorch 1.8.1 and Python 3.8. If MindSpore support is needed, the provided Dockerfile_MindSpore contains a pre-configured environment, also called federated, that supports MindSpore 1.1.1 and Python 3.7.5 (which is the Python version that MindSpore requires). Both Dockerfiles have GPU support enabled. Once an image is built and a Docker container is running, one can use Visual Studio Code to connect to it and start development within the container.

Running Plato

To start a federated learning training workload, run the run script from the repository's root directory. For example:

./run --config=configs/MNIST/fedavg_lenet5.yml
  • --config (-c): the path to the configuration file to be used. The default is config.yml in the project's home directory.
  • --log (-l): the level of logging information to be written to the console. Possible values are critical, error, warn, info, and debug, and the default is info.
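
The two options above can be mirrored with Python's argparse module. The following is only a sketch of an equivalent command-line interface, not Plato's actual parser: the function name parse_options is hypothetical, while the option names, allowed values, and defaults are taken from the list above.

```python
import argparse

LOG_LEVELS = ["critical", "error", "warn", "info", "debug"]


def parse_options(argv=None):
    """Parse --config/-c and --log/-l as described above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Start a federated learning workload.")
    parser.add_argument("-c", "--config", default="config.yml",
                        help="path to the configuration file")
    parser.add_argument("-l", "--log", default="info", choices=LOG_LEVELS,
                        help="level of logging information written to the console")
    return parser.parse_args(argv)


if __name__ == "__main__":
    options = parse_options(["--config=configs/MNIST/fedavg_lenet5.yml", "-l", "debug"])
    print(options.config, options.log)
```

Restricting --log with choices makes argparse reject an unknown level before any training starts.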

Plato uses the YAML format for its configuration files to manage the runtime configuration parameters. Example configuration files have been provided in the configs directory.

Plato uses wandb to produce and collect logs in the cloud. If this is not needed, run the command wandb offline before running Plato.

If there are issues in the code that prevented it from running to completion, there could be running processes from previous runs. Use the command pkill python to terminate them so that there will not be CUDA errors in the upcoming run.

Installing YOLOv5 as a Python package

If object detection using the YOLOv5 model and any of the COCO datasets is needed, it is required to install YOLOv5 as a Python package first:

cd packages/yolov5
pip install .

Plotting Runtime Results

If the configuration file contains a results section, the selected performance metrics, such as accuracy, will be saved in a .csv file in the results/ directory. By default, the results/ directory is under the path to the used configuration file, but it can be easily changed by modifying Config.result_dir in config.py.

As .csv files, these results can be used however one wishes; an example Python program, called plot.py, plots the necessary figures and saves them as PDF files. To run this program:

python plot.py --config=config.yml
  • --config (-c): the path to the configuration file to be used. The default is config.yml in the project's home directory.

Running Unit Tests

All unit tests are in the tests/ directory. These tests are designed to be standalone and executed separately. For example, the command python lr_schedule_tests.py runs the unit tests for learning rate schedules.

Installing Plato with MindSpore

Though we provide a Dockerfile for building a Docker container that supports MindSpore 1.1, in rare cases it may still be necessary to install Plato with MindSpore on a GPU server running Ubuntu Linux 18.04 (which MindSpore requires). Similar to a PyTorch installation, we first need to create and activate a new environment with Python 3.7.5 (which MindSpore 1.1 requires), and then install the required packages:

conda create -n mindspore python=3.7.5
conda activate mindspore
pip install -r requirements.txt

We should now install MindSpore 1.1 with the following command:

pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.1.1/MindSpore/gpu/ubuntu_x86/cuda-10.1/mindspore_gpu-1.1.1-cp37-cp37m-linux_x86_64.whl

MindSpore may need additional packages that need to be installed if they do not exist:

sudo apt-get install libssl-dev
sudo apt-get install build-essential

If CuDNN has not yet been installed, it needs to be installed with the following commands:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get install libcudnn8=8.0.5.39-1+cuda10.1

To check the current CuDNN version, the following commands are helpful:

function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcudnn

To check if MindSpore is correctly installed on the GPU server, try to import mindspore with a Python interpreter.
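
A small script can perform the same check without an interactive session. This is a minimal sketch that uses only the standard library; the helper name is_importable is hypothetical. It confirms only that the package can be found and imported, not that the GPU backend works.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the named top-level module can be found by the interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("mindspore"):
        import mindspore
        # Fall back to "unknown" in case the version attribute is missing.
        print("MindSpore version:", getattr(mindspore, "__version__", "unknown"))
    else:
        print("MindSpore is not installed in this environment.")
```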

Finally, to use trainers and servers based on MindSpore, assign true to use_mindspore in the trainer section of the configuration file. This variable is unassigned by default, and Plato would use PyTorch as its default framework.
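
The default described above can be pictured as follows. This is a sketch, not Plato's code: the configuration is represented as a plain dictionary standing in for the parsed YAML file, and select_framework is a hypothetical helper.

```python
def select_framework(config):
    """Return "mindspore" only when trainer.use_mindspore is truthy, else "pytorch"."""
    trainer = config.get("trainer") or {}
    return "mindspore" if trainer.get("use_mindspore") else "pytorch"


print(select_framework({"trainer": {"use_mindspore": True}}))
print(select_framework({"trainer": {}}))
```

Because a missing key is treated the same as false, leaving use_mindspore unassigned selects PyTorch.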

Uninstalling Plato

Remove the conda environment used to run Plato first, and then remove the directory containing Plato's git repository.

conda env remove -n federated
rm -rf plato/

where federated (or mindspore) is the name of the conda environment that Plato runs in.

For more specific documentation on how Plato can be run on GPU cluster environments such as Lambda Labs' GPU cloud or Compute Canada, refer to docs/Running.md.

Technical support

Technical support questions should be directed to the maintainer of this software framework: Baochun Li ([email protected]).

Comments
  • Unifying data transfer with numpy array

    All data transfers now use NumPy arrays.

    Description

    For model weights, the transfer type is an OrderedDict{name: numpy.ndarray}. For features, the transfer type is a list[(numpy.ndarray, numpy.ndarray)], where the first value is the feature and the second value is the target.

    How has this been tested?

    Tested with the following config:

    'configs/MNIST/fedavg_lenet5_noniid'
    'configs/MNIST/fedavg_lenet5'
    'configs/MNIST/fedprox_lenet5'
    'configs/MNIST/mistnet_lenet5'
    'configs/MNIST/mistnet_pretrain_lenet5'
    

    Please help test with MindSpore and TensorFlow. I don't have a suitable machine for testing them at the moment.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by hcngac 19
  • Update to support simulation of different client's speed in async mode

    Description

    In the async mode, most of the clients have a relatively fast speed, so it is sometimes quite hard to test Plato in a scenario where clients have various different speeds. This change allows users to simulate clients' speeds by providing a distribution in the configuration file. The user can also choose to only enable the simulation without providing a specific distribution, and the code would just use a default one.

    Currently, the simulation supports only the Zipf and Normal distributions. More distributions can be added in the future.

    How has this been tested?

    • Test 1 with default distribution: Run ./run -c ./configs/MNIST/fedavg_async_lenet5.yml
    • Test 2 with Normal distribution: First update the client configuration in configs/MNIST/fedavg_async_lenet5.yml as below:
    clients:
        type: simple
    
        total_clients: 2
    
        per_round: 2
    
        do_test: false
    
        simulation: true
    
        simulation_distribution:
            distribution: normal
            mean: 2
            sd: 1
    

    Then run ./run -c ./configs/MNIST/fedavg_async_lenet5.yml

    • Test 3 with Zipf distribution: First update the client configuration in configs/MNIST/fedavg_async_lenet5.yml as below:
    clients:
        type: simple
    
        total_clients: 2
    
        per_round: 2
    
        do_test: false
    
        simulation: true
    
        simulation_distribution:
            distribution: zipf
            s: 2
    

    Then run ./run -c ./configs/MNIST/fedavg_async_lenet5.yml
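
The simulation_distribution entries used in the tests above could be turned into per-client delays along the following lines. This is a sketch under stated assumptions, not the code in this change: the function name sample_delays, the max_rank bound, and the interpretation of the samples as seconds are all hypothetical. Since the standard library has no Zipf sampler, a bounded Zipf variant over ranks 1..max_rank is used instead.

```python
import random


def sample_delays(simulation_distribution, num_clients, rng=None, max_rank=10):
    """Draw one simulated delay per client from the configured distribution."""
    rng = rng or random.Random()
    kind = simulation_distribution["distribution"]
    if kind == "normal":
        mean = simulation_distribution["mean"]
        sd = simulation_distribution["sd"]
        # A delay cannot be negative, so clip the Gaussian draw at zero.
        return [max(0.0, rng.gauss(mean, sd)) for _ in range(num_clients)]
    if kind == "zipf":
        s = simulation_distribution["s"]
        ranks = list(range(1, max_rank + 1))
        # Bounded Zipf: P(rank = k) is proportional to 1 / k**s for k in 1..max_rank.
        weights = [1.0 / k ** s for k in ranks]
        return [float(k) for k in rng.choices(ranks, weights=weights, k=num_clients)]
    raise ValueError(f"unsupported distribution: {kind}")


print(sample_delays({"distribution": "normal", "mean": 2, "sd": 1}, 2, random.Random(0)))
print(sample_delays({"distribution": "zipf", "s": 2}, 2, random.Random(0)))
```

Passing a seeded random.Random instance keeps a simulated run reproducible.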

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.

    Additional information

    The current implementation deviates a bit from the original intent, which was to put clients to sleep at the end of each epoch. To accomplish that, the await asyncio.sleep() call would have to be inserted in the method Trainer.train_process() in plato/trainers/basic.py, which would require changing Trainer.train_process() to async, along with every function that calls train_process(). I was afraid that might break the code, so I kept the current implementation, where clients are put to sleep after they finish model training.

    UPDATE: please ignore the information above, the code is now implemented in a way such that clients are put to sleep at the end of each epoch. Please refer to the conversation below for details.
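
The per-epoch variant can be sketched as follows. This is an illustration only, not the code in plato/trainers/basic.py: train_with_delay and its train_one_epoch callback are hypothetical names. It shows why the change propagates to callers, since a coroutine that awaits asyncio.sleep() must itself be awaited or run by an event loop.

```python
import asyncio


async def train_with_delay(epochs, delay_seconds, train_one_epoch):
    """Run train_one_epoch(epoch) for each epoch, sleeping after every epoch."""
    completed = []
    for epoch in range(1, epochs + 1):
        train_one_epoch(epoch)
        # Yield control to the event loop while simulating a slower client.
        await asyncio.sleep(delay_seconds)
        completed.append(epoch)
    return completed


if __name__ == "__main__":
    print(asyncio.run(train_with_delay(3, 0.01, lambda epoch: None)))
```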

    opened by cuiboyuan 16
  • Add Support for FEMNIST

    Add support for the FEMNIST dataset by referring to an open-sourced project LEAF.

    Description

    Main changes:

    1. Added a new datasource femnist at ~/plato/datasources/femnist.py and modified the ~/plato/datasources/registry.py accordingly.
    2. Added a new sampler empty at ~/plato/samplers/empty.py and modified the ~/plato/samplers/registry.py accordingly.
    3. Made minor changes at ~/plato/trainers/basic.py and ~/plato/models/lenet5.py for further adaptation.

    Remark: while the implementation is mainly borrowed from LEAF, it does not need to plug in the LEAF project and can work independently.

    Motivation

    Apart from label distribution skew (which can be implemented with LDA), non-IID scenarios also include (1) feature distribution skew, (2) the same label with different features, and (3) the same feature with different labels (see this survey for more details). It would therefore be useful if Plato could support FL research with more realistic datasets. The FEMNIST dataset is one celebrated example, and it is inherently partitioned by the clients' identities. We thus added support for it, and hopefully our design is compatible with other realistic datasets that are also partitioned by client.

    p.s. One may want to refer to an external tutorial in a forked version of Plato for more context.

    How has this been tested?

    At the root directory,

    conda activate federated
    python run --config=./examples/async/data_hetero/data_hetero_femnist_lenet5.yml > out.txt 2>&1 &
    

    p.s. Please expect hours for the first test in your environment due to the data preprocessing overhead.

    In our test, we observed the generated out.txt and confirmed that the training can go on smoothly.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by SamuelGong 14
  • [FR] Local Differential Privacy Methods

    Is your feature request related to a problem? Please describe.

    Currently there is only one implementation of local differential privacy (LDP): RAPPOR [1], implemented in https://github.com/TL-System/plato/blob/main/plato/utils/unary_encoding.py, and it is not decoupled from the algorithm implementations:

    https://github.com/TL-System/plato/blob/fac44a6bdbe64d3060ae290e4633b316b02a1474/plato/algorithms/mistnet.py#L52-L64

    https://github.com/TL-System/plato/blob/fac44a6bdbe64d3060ae290e4633b316b02a1474/plato/algorithms/mindspore/mistnet.py#L44-L48

    https://github.com/TL-System/plato/blob/fac44a6bdbe64d3060ae290e4633b316b02a1474/examples/nnrt/nnrt_algorithms/mistnet.py#L60-L65

    This feature request calls for a modular LDP plugin interface and implementations of several other LDP methods, e.g. [2][3].

    Describe the solution you'd like

    • [x] ~~Unified data exchange format between clients and server.~~
    • [x] A modular interface for plugging in data processing modules into the server-client data exchange.
    • [x] A config entry for enabling specific data processing modules.
    • [ ] LDP modules implementation.
    • [ ] Test on the theoretical property of modules i.e. ε-LDP
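
As a concrete picture of what one such LDP module could look like, here is a minimal sketch of symmetric unary encoding, the basic one-shot randomized-response step on a one-hot bit vector. It is not the code in plato/utils/unary_encoding.py, and the function name and signature are hypothetical. Each bit is kept with probability p = e^(ε/2) / (e^(ε/2) + 1); two one-hot inputs differ in two bits, so the output likelihood ratio is at most (p / (1 - p))^2 = e^ε.

```python
import math
import random


def symmetric_unary_encoding(bits, epsilon, rng=None):
    """Perturb a one-hot bit vector so that the output satisfies epsilon-LDP."""
    rng = rng or random.Random()
    # Each bit is kept with probability p and flipped with probability 1 - p.
    p = math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)
    return [b if rng.random() < p else 1 - b for b in bits]


one_hot = [0, 0, 1, 0]
print(symmetric_unary_encoding(one_hot, epsilon=1.0, rng=random.Random(0)))
```

The guarantee holds for one-hot inputs only; vectors with several set bits would need a different analysis.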

    Describe alternatives you've considered

    To be filled.

    Additional context

    [1] Ú. Erlingsson, V. Pihur, and A. Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067. ACM, 2014.
    [2] Differential Privacy Team, Apple. Learning with privacy at scale. 2017.
    [3] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30, December 2017.

    enhancement 
    opened by hcngac 12
  • [RFC]Android Clients

    Development of an Android FL client to further enhance the simulation of FL on mobile devices.

    Approach

    • Use of chaquo to adapt the current Python code base to Android.
      • Chaquo is not open source, but it provides free license for open source projects.
      • Chaquo is the only Python to Android tool that has PyTorch packaged.
      • Building PyTorch for Android with other tools requires a significant amount of work.
    • Use of redroid to support multiple instances of Android devices.
      • Redroid is Android in a container, using the same kernel as the host.
      • The performance of Redroid is close to the host, making multiple Android instances possible.
    • A separate log server to receive log entries from Android clients.
      • There is no good way to directly extract log contents from an Android app.
      • Using an HTTP log server and modifying the logging handler in clients can handle the logs nicely.
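
The logging approach in the last point could look roughly like the sketch below. The class ForwardingLogHandler, the helper make_http_sender, and the JSON payload layout are assumptions for illustration, not part of the proposal. The standard library also ships logging.handlers.HTTPHandler, which may be sufficient if form-encoded records are acceptable.

```python
import json
import logging
import urllib.request


def make_http_sender(url):
    """Return a function that POSTs a JSON string to url (the URL is a placeholder)."""
    def send(payload):
        request = urllib.request.Request(
            url, data=payload.encode("utf-8"),
            headers={"Content-Type": "application/json"}, method="POST")
        urllib.request.urlopen(request, timeout=5).close()
    return send


class ForwardingLogHandler(logging.Handler):
    """Logging handler that forwards each record through a send(payload) callable."""

    def __init__(self, send):
        super().__init__()
        self.send = send

    def emit(self, record):
        try:
            payload = json.dumps({"logger": record.name,
                                  "level": record.levelname,
                                  "message": record.getMessage()})
            self.send(payload)
        except Exception:
            # Never let a logging failure crash the client.
            self.handleError(record)
```

On a client, the handler would be attached with logger.addHandler(ForwardingLogHandler(make_http_sender(url))), where url points at the log server. Injecting the send callable keeps the handler testable without a network.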
    enhancement 
    opened by hcngac 10
  • Added General Support for Asynchronous Training

    Add support for asynchronous FL where the central server can eagerly start the training of idle clients before receiving sufficient model updates.

    Motivation

    The FL practice is currently dominated by the synchronous mode, wherein each round the server needs to wait until receiving a sufficient number of clients' updates prior to deriving an aggregated model update (an example in Plato can be found in the method Server.client_payload_done() of ~/plato/servers/base.py). On the other hand, asynchronous mode has been extensively studied in traditional distributed learning (where the data distribution across clients is IID). In asynchronous training, the server eagerly starts the training of idle clients before receiving sufficient model updates sent by previously selected clients (an example is illustrated in the following figure). Out of curiosity, we want to explore the spectrum of the system performance (instead of theoretical convergence rate as in existing work) of asynchronous mode in the context of FL under varying degrees of client heterogeneity.

    Description

    1. Added a module ~/servers/async_timer.py, which acts as a virtual client at the server-side and plays the role of sending heartbeats for periodically triggered client selection.
    2. Added a module ~/servers/async_base.py, the base class of the respective server, which mainly implements the workflow of an asynchronous step:
    3. Added a module ~/servers/async_fedavg.py, where the aggregation logic is simply performing FedAvg on unaggregated weights. It implies that we can have other implementations of aggregation even in asynchronous mode.
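
The interplay between the heartbeat and the aggregation can be sketched as below. This is not the code in async_base.py or async_fedavg.py: the AsyncBuffer class, its method names, and the representation of weights as plain lists of floats are assumptions for illustration. The aggregation itself is the standard FedAvg average weighted by each client's number of samples.

```python
def fedavg(updates):
    """Average weights over (weights, num_samples) pairs, weighted by num_samples."""
    total = sum(num_samples for _, num_samples in updates)
    names = updates[0][0].keys()
    return {
        name: [
            sum(weights[name][i] * num_samples for weights, num_samples in updates) / total
            for i in range(len(updates[0][0][name]))
        ]
        for name in names
    }


class AsyncBuffer:
    """Collects client updates and aggregates whatever has arrived on each heartbeat."""

    def __init__(self):
        self.unaggregated = []

    def receive(self, weights, num_samples):
        self.unaggregated.append((weights, num_samples))

    def on_heartbeat(self):
        if not self.unaggregated:
            return None  # nothing arrived since the last heartbeat
        aggregated = fedavg(self.unaggregated)
        self.unaggregated = []
        return aggregated
```

Swapping fedavg for another function is all that a different asynchronous aggregation rule would need in this sketch.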

    How has this been tested?

    We test it in a fresh clone like

    git clone git@github.com:SamuelGong/plato.git
    cd plato
    [with all necessary installation steps]
    conda activate federated
    python run --config=./examples/async/async_train/async_train_mnist_lenet5.yml > log.txt 2>&1 &
    

    Example results

    Time-to-accuracy performance w.r.t. the provided configuration can be seen as follows,

    while the corresponding time sequence diagram is also depicted for providing more insights.

    Context

    This is our preliminary attempt and we would like to hear from the authors in an agile manner. Thus, we still anticipate necessary changes on the code, let alone the coding styles, comments, and documentation (though they should be already easy to read at the moment). More context can be found in an external tutorial.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by SamuelGong 10
  • Feature/client sim

    Some details:

    On the server side (in plato/servers/base.py):

    • The (actual / launched) client_id is paired with sid
    • Each launched client has an attribute virtual_id:
      • Not in simulation: equal to the actual client_id
      • In simulation: updated every round after the server selects clients from self.client_pool
    • The server selects clients from self.client_pool instead of self.clients:
      • Not in simulation: a list of connected clients' ids updated with self.clients
      • In simulation: a list of all possible clients' ids according to config parameter total_clients

    On the client side (in plato/clients/base.py and plato/clients/simple.py):

    • The client_id is the virtual one designated by the server updated each round
    • The actual_client_id is paired with sid used for connection
    • The client will update the trainer and algorithm used in this round whenever it receives a response from the server with new designated virtual_id
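
The pairing of launched clients with virtual ids can be pictured with a short sketch. It is not the implementation in plato/servers/base.py: assign_virtual_ids is a hypothetical helper, and uniform random selection stands in for whatever selection rule the server applies.

```python
import random


def assign_virtual_ids(launched_client_ids, client_pool, rng=None):
    """Map each launched client to a distinct virtual id drawn from client_pool."""
    rng = rng or random.Random()
    selected = rng.sample(client_pool, len(launched_client_ids))
    return dict(zip(launched_client_ids, selected))


# Two launched processes stand in for a pool of ten possible clients.
total_clients = 10
client_pool = list(range(1, total_clients + 1))
print(assign_virtual_ids([1, 2], client_pool, random.Random(0)))
```

Calling the function once per round gives each launched process a fresh virtual id, so a few processes can represent a much larger client population over time.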

    Status:

    • Tests regarding client simulation passed for several examples (FedAvg, FedAtt, FedAdp, AFL).
    • Conflicts in example server or client were solved as much as possible.
    • README.md or other documents haven't been updated with this new feature yet.

    Potential Concerns:

    • Logging info might be confusing: In client simulation, the id of a new contact sent to the server is still the actual (launched) client id even though the client may represent a virtual one with a different virtual id in the last round.
    • One should be careful with self.selected_clients, self.clients_pool, self.client_id, self.virtual_id when designing example servers and self.client_id, self.actual_client_id when designing example clients.
    opened by silviafeiwang 8
  • Enable Oort working in the async mode.

    Description

    Previously, the implementation of Oort could not work correctly in asynchronous mode, because the server updated client utilities according to the 'self.explored_client' list, which contains delayed clients that have not yet sent their updates. To address this, the server now updates utilities based on the list of received updates.

    How has this been tested?

    Ran 'oort_MNIST_lenet5.yml' in asynchronous environment.

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by Yufei-Kang 7
  • Added the facility to record local testing accuracies in .csv files

    Description

    When running a job, if the configuration file has the attribute "do_test" set to true and if there also exists a "results" attribute, then the test accuracies of each client will be computed locally and stored in a csv file with the round number, client ID and test accuracy as headers.
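
The resulting file layout can be illustrated with the csv module. This is a sketch, not the code added by this change: the function name and the exact header spellings (round, client_id, accuracy) are assumptions based on the description above.

```python
import csv
import io


def write_client_accuracies(stream, rows):
    """Write (round, client_id, accuracy) tuples to stream as CSV with a header row."""
    writer = csv.writer(stream)
    writer.writerow(["round", "client_id", "accuracy"])
    writer.writerows(rows)


buffer = io.StringIO()
write_client_accuracies(buffer, [(1, 1, 0.91), (1, 2, 0.88)])
print(buffer.getvalue())
```

Writing to any file-like stream means the same function works with open(path, "w", newline="") for a real results file.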

    How has this been tested?

    Tested on a local machine using the "fedavg_async_lenet5" configuration file with 1-3 clients and 1-3 rounds each, once with "do_test" set to True and once with it set to False.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    opened by kevinsun203 7
  • RLFL: A Reinforcement Learning Framework for Active Federated Learning

    This implements a reinforcement learning framework for learning and controlling federated learning tasks

    Description

    The added directory plato/utils/rlfl is the framework base; the added directory examples/fei is an instance of a DRL agent that learns the global aggregation strategy.

    How has this been tested?

    Tests of the examples/fei instance pass in the latest Plato environment.

    To learn how to customize another DRL agent and run the training/testing, please refer to plato/utils/rlfl/README.md.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    opened by silviafeiwang 7
  • Fixed the reported bug of the config test in #190

    After noticing the bug reported in #190, I fixed all the issues in tests/config_tests.py.

    Description

    I made three changes to the code. First, I moved the configuration files, including Pipelines and Models, into the Kinetics directory. This fixed the "FileNotFoundError" issue. Second, all PyLint errors were addressed, bringing the PyLint score to 10.00/10. Finally, I added more comments to further describe the objective of each unit test function.

    How has this been tested?

    1. config_tests: I ran the tests with python tests/config_tests.py and checked the formatting with pylint tests/config_tests.py.

    2. data_tests: I ran the tests with python tests/data_tests.py and checked the formatting with pylint tests/data_tests.py.

    3. sampler_tests: I ran the tests with python tests/sampler_tests.py and checked the formatting with pylint tests/sampler_tests.py.

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by CSJDeveloper 6
  • Added the search space MobileNetV3 into example code of PerFedRLNAS

    Description

    This PR has two contributions. First, the nasvit search space is moved to plato/models, as it has been tested through experiments and no major changes to it are expected. Other search spaces can also inherit part of their code from nasvit.

    Second, I added another search space, mobilenetv3, on the basis of the previous code. Because nasvit already provides the basic units of a NAS supernet (linear layers, convolution layers, and residual blocks), it is easy to build this search space on top of the nasvit code. The implementation of the mobilenetv3 search space is under examples/pfedrlnas/MobileNetV3/model. The design of this search space follows the paper Searching for MobileNetV3.

    How has this been tested?

    To test the mobilenetv3 search space, run the command:

    python3 ./examples/pfedrlnas/MobileNetV3/fednas.py -c ./examples/pfedrlnas/configs/FedNAS_CIFAR10_Mobilenet_NonIID03_Scratch.yml
    

    To test that nasvit has been moved to plato/models correctly, and to test the NASVIT search space, run the command:

    python3 ./examples/pfedrlnas/VIT/fednas.py -c ./examples/pfedrlnas/configs/FedNAS_CIFAR10_NASVIT_NonIID01_Scratch.yml
    

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code has been formatted using Black and checked using PyLint.
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    opened by dixiyao 1
  • Added a new fedavg algorithm supporting aggregating partial sub-modules of one model

    This PR implements a new, perhaps enhanced, FedAvg algorithm for Plato to support extracting and aggregating partial sub-modules of one defined model.

    Description

    In many learning setups, only part of the model is used as the global model to be exchanged between the server and the clients. For instance, after defining a ResNet model, its fully convolutional part is used as the global model, while the fully-connected part remains local.

    To achieve this, this PR inherits from Plato's conventional FedAvg algorithm and extends the extract_weights function. It also adds some functions needed to support a wider range of applications.

    With the new FedAvg, which parts of the model are utilized as the global model can be set by the hyper-parameter named global_submodules_name whose format should be: {submodule1_prefix}__{submodule2_prefix}__{submodule3_prefix}__... where names for different submodules are separated by two consecutive underscores.
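
The naming format can be illustrated with a small sketch. This is not the extract_weights implementation from this PR: extract_submodule_weights is a hypothetical helper, model weights are represented as a plain dictionary of parameter names, and matching by name prefix is an assumption based on the {submodule_prefix} wording above.

```python
def extract_submodule_weights(weights, global_submodules_name):
    """Keep only entries whose parameter name starts with one of the given prefixes."""
    # Prefixes are separated by two consecutive underscores.
    prefixes = tuple(p for p in global_submodules_name.split("__") if p)
    return {name: value for name, value in weights.items() if name.startswith(prefixes)}


weights = {"encoder.conv1.weight": [0.1], "encoder.conv1.bias": [0.0], "fc.weight": [0.3]}
print(sorted(extract_submodule_weights(weights, "encoder")))
```

One consequence of this format is that a submodule prefix cannot itself contain two consecutive underscores.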

    How has this been tested?

    This PR can be tested through the unit test called fedavg_tests.py under the folder tests/.

    To run the test, first switch to Plato's root folder, and then run:

    $ python tests/fedavg_tests.py
    

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code has been formatted using Black and checked using PyLint.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by CSJDeveloper 2
  • Supported a more general way of checkpoint operations

    This PR implements several checkpoint operations that can be used directly.

    Description

    Plato should provide a sufficient set of checkpoint operations so that checkpoints can be saved, loaded, or otherwise manipulated as needed.

    Therefore, this PR mainly includes three types of operations:

    1. Saving
    2. Loading
    3. Checkpoint searching, such as searching for the latest checkpoint

    Additionally, a helper that generates consistent filenames is implemented so that all filenames in Plato share the same format.
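
The filename and search operations could look roughly like the sketch below. The naming scheme (<model>_round<N>.pth) and the function names are hypothetical, chosen only for illustration; the PR's real format and API may differ. Saving and loading are omitted, and empty placeholder files stand in for real checkpoints.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming scheme for this sketch: <model>_round<N>.pth
FILENAME_PATTERN = re.compile(r"^(?P<model>.+)_round(?P<round>\d+)\.pth$")


def build_checkpoint_filename(model_name, round_number):
    """Generate a filename in one consistent format."""
    return f"{model_name}_round{round_number}.pth"


def find_latest_checkpoint(directory, model_name):
    """Return the checkpoint path with the highest round number, or None."""
    latest_path, latest_round = None, -1
    for path in Path(directory).iterdir():
        match = FILENAME_PATTERN.match(path.name)
        if match and match.group("model") == model_name:
            # Compare rounds numerically so that round 10 ranks above round 2.
            round_number = int(match.group("round"))
            if round_number > latest_round:
                latest_path, latest_round = path, round_number
    return latest_path


with tempfile.TemporaryDirectory() as checkpoint_dir:
    # Create empty placeholder files in place of real checkpoints.
    for round_number in (1, 2, 10):
        filename = build_checkpoint_filename("lenet5", round_number)
        (Path(checkpoint_dir) / filename).write_bytes(b"")

    latest = find_latest_checkpoint(checkpoint_dir, "lenet5")
    latest_name = latest.name
    print(latest_name)  # lenet5_round10.pth
```

Parsing the round number out of the filename, rather than sorting filenames as strings, is what keeps the search correct once round numbers reach two digits.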

    How has this been tested?

    As this code does not affect Plato's existing examples, the implementation is tested only through the unit test checkpoint_tests.py in Plato's tests/ folder.

    To run the test, first switch to Plato's root folder, then run:

    $ python tests/checkpoint_tests.py
    

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code has been formatted using Black and checked using PyLint.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by CSJDeveloper 1
  • Added more visual data augmentations

    This PR introduces more data augmentations for image data.

    Description

    When implementing other methods, such as self-supervised learning (SSL), with Plato's components, the datasource generally requires additional and more complex augmentations. For example, when BYOL, a typical SSL method, is used to train a model in Plato, each input image must be processed to generate multi-view samples, each corresponding to one specific data augmentation.

    Currently, Plato's simple data augmentation method does not support this.

    To fill this gap, this PR (1) adds a more general way to create different data augmentations; (2) implements multiple visual transforms used in SSL; and (3) collects the normalizations for different datasets in one place for clarity.
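
The multi-view idea can be sketched as a wrapper that applies one augmentation pipeline per view to the same input. The class and helper names below are hypothetical and are not the PR's API. Plain nested lists stand in for images so the example runs without extra packages; in practice the per-view callables would be image transforms such as those from torchvision.

```python
import random


class MultiViewTransform:
    """Apply several augmentation pipelines to one sample, one view each."""

    def __init__(self, view_transforms):
        self.view_transforms = list(view_transforms)

    def __call__(self, sample):
        return [transform(sample) for transform in self.view_transforms]


def random_horizontal_flip(probability, rng):
    """Build a transform that mirrors each row with the given probability."""

    def flip(image):
        if rng.random() < probability:
            return [row[::-1] for row in image]
        return [row[:] for row in image]

    return flip


def add_brightness(offset):
    """Build a transform that adds a constant offset to every pixel."""

    def brighten(image):
        return [[pixel + offset for pixel in row] for row in image]

    return brighten


rng = random.Random(0)
# Two views, as in two-view SSL methods: one flipped, one brightened.
two_views = MultiViewTransform(
    [random_horizontal_flip(1.0, rng), add_brightness(10)]
)

image = [[1, 2], [3, 4]]
views = two_views(image)
print(views)  # [[[2, 1], [4, 3]], [[11, 12], [13, 14]]]
```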

    How has this been tested?

    No test is needed because (1) the correctness of the code has been demonstrated by work under the 'contrastive_adaptation' branch; and (2) links are included in the code comments to document the source and support of the implementation.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue) Fixes #
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [x] My code has been formatted using Black and checked using PyLint.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    opened by CSJDeveloper 1
Releases(v0.4.6)
  • v0.4.6(Dec 5, 2022)

  • v0.4.5(Oct 27, 2022)

    Improved client and server APIs; made client-side processors more customizable; added several examples showcasing how the APIs are to be used; various bug fixes.

  • v0.4.4(Aug 20, 2022)

    Redesigned the API for the server, trainer, and algorithm; added new documentation with automated deployment; redesigned some examples based on the new API.

  • v0.4.3(Jul 20, 2022)

    Added more learning rate schedules from PyTorch; added approximate simulations of communication times; revised quantization processors; revised the way of using custom models, datasources, trainers, and algorithms; many bug fixes.

  • v0.4.2(Jun 1, 2022)

  • v0.4.1(May 20, 2022)

    Fixed several important issues related to client-side samplers, loading custom algorithms, federated unlearning, and added default values for configurations.

  • v0.4.0(May 14, 2022)

    Supported running an FL session on multiple GPUs, and further improved scalability in memory usage by always launching a constant number of client processes regardless of the number of clients selected per round. Made client simulation mode the default and only mode of operation.

  • v0.3.9(May 2, 2022)

  • v0.3.8(Apr 30, 2022)

  • v0.3.7(Feb 20, 2022)

    Added support for HuggingFace language modelling models and datasets, reinforcement learning servers, simulating client/server communication, and measuring communication time; added more examples using the asynchronous mode; removed wandb usage.

  • v0.3.6(Feb 4, 2022)

  • v0.3.5(Jan 28, 2022)

  • v0.3.4(Jan 23, 2022)

    Added several multi-modal data sources, and supported simulating the wall clock time in asynchronous mode, when the clients on the same physical machine are training in small batches (controlled by trainer -> max_concurrency) due to insufficient GPU memory.

  • v0.3.3(Dec 30, 2021)

    Added support for differentially private training on the client side, fixed issues related to cross-silo training, and added basic support for asynchronous training with bounded staleness.

  • v0.3.2(Dec 9, 2021)

Owner
System [email protected] Lab