A forwarding MPI implementation that can use any other MPI implementation via an MPI ABI

Overview

MPItrampoline

MPI is the de-facto standard for inter-node communication on HPC systems, and has been for the past 25 years. While highly successful, MPI is a standard for source code (it defines an API), not a standard for binary compatibility (it does not define an ABI). This means that applications running on HPC systems need to be compiled anew on every system, which is tedious, since the software available on each HPC system differs slightly.

This project attempts to remedy this. It defines an ABI for MPI and provides an MPI implementation based on this ABI. That is, MPItrampoline does not implement any MPI functions itself; it only forwards them to a "real" implementation via this ABI. The advantage is that one can produce "portable" applications that can use any given MPI implementation. For example, this will make it possible to build external packages for Julia via Yggdrasil that run efficiently on almost any HPC system.

A small and simple MPIwrapper library is used to provide this ABI for any given MPI installation. MPIwrapper needs to be compiled for each MPI installation that is to be used with MPItrampoline, but this is quick and easy.

Successfully Tested

  • Debian 11.0 via Docker (MPICH; arm32v5, arm32v7, arm64v8, mips64le, ppc64le, riscv64; C/C++ only)
  • Debian 11.0 via Docker (MPICH; i386, x86-64)
  • macOS laptop (MPICH, OpenMPI; x86-64)
  • macOS via GitHub Actions (OpenMPI; x86-64)
  • Ubuntu 20.04 via Docker (MPICH; x86-64)
  • Ubuntu 20.04 via GitHub Actions (MPICH, OpenMPI; x86-64)
  • Blue Waters, HPC system at the NCSA (Cray MPICH; x86-64)
  • Graham, HPC system at Compute Canada (Intel MPI; x86-64)
  • Marconi A3, HPC system at Cineca (Intel MPI; x86-64)
  • Niagara, HPC system at Compute Canada (OpenMPI; x86-64)
  • Summit, HPC system at ORNL (Spectrum MPI; IBM POWER 9)
  • Symmetry, in-house HPC system at the Perimeter Institute (MPICH, OpenMPI; x86-64)

Workflow

Preparing an HPC system

Install MPIwrapper, wrapping the MPI installation you want to use there. You can install MPIwrapper multiple times if you want to wrap more than one MPI implementation.

This is possibly as simple as

cmake -S . -B build -DMPIEXEC_EXECUTABLE=mpiexec -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=$HOME/mpiwrapper
cmake --build build
cmake --install build

but nothing is ever simple on an HPC system. It might be necessary to load certain modules, or to specify additional CMake options for the MPI configuration.

The MPIwrapper libraries remain on the HPC system; they are installed independently of any application.

Building an application

Build your application as usual, using MPItrampoline as the MPI library.
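
For example, a minimal test program (illustrative only) can be compiled with the mpicc wrapper that MPItrampoline installs; the install prefix used below is an assumption:

/* hello_mpi.c -- minimal test program (illustrative).
   Compile with MPItrampoline's compiler wrapper, e.g.
     $HOME/mpitrampoline/bin/mpicc -o hello_mpi hello_mpi.c
   (the install prefix is an assumption). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  printf("Hello from rank %d of %d\n", rank, size);
  MPI_Finalize();
  return 0;
}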

Running an application

At startup time, MPItrampoline needs to be told which MPIwrapper library to use. This is done via the environment variable MPITRAMPOLINE_LIB. You also need to point MPItrampoline's mpiexec to the respective mpiexec wrapper created by MPIwrapper, using the environment variable MPITRAMPOLINE_MPIEXEC.

For example:

env MPITRAMPOLINE_MPIEXEC=$HOME/mpiwrapper/bin/mpiwrapper-mpiexec MPITRAMPOLINE_LIB=$HOME/mpiwrapper/lib/libmpiwrapper.so mpiexec -n 4 ./your-application

The mpiexec you run here needs to be the one provided by MPItrampoline.

Current state

MPItrampoline uses the C preprocessor to create wrapper functions for each MPI function. This is how MPI_Send is wrapped:

FUNCTION(int, Send,
         (const void *buf, int count, MT(Datatype) datatype, int dest, int tag,
          MT(Comm) comm),
         (buf, count, (MP(Datatype))datatype, dest, tag, (MP(Comm))comm))
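
The MT and MP macros map between MPItrampoline's handle types and those of the wrapped ABI. Conceptually, such a wrapper is a plain forwarding function; the following hand-written sketch only illustrates that shape (the mpiwrapper_Send pointer and the MPIwrapper_* types are made-up names, not MPItrampoline's actual generated code):

#include <stdint.h>
#include <mpi.h>

/* Illustrative ABI handle representation (an assumption, not the real ABI). */
typedef uintptr_t MPIwrapper_Datatype;
typedef uintptr_t MPIwrapper_Comm;

/* Function pointer resolved from the MPIwrapper library (MPITRAMPOLINE_LIB) at startup. */
extern int (*mpiwrapper_Send)(const void *buf, int count, MPIwrapper_Datatype datatype,
                              int dest, int tag, MPIwrapper_Comm comm);

int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag,
             MPI_Comm comm) {
  /* Map the trampoline's handle types to the wrapper ABI's types and forward the call. */
  return mpiwrapper_Send(buf, count, (MPIwrapper_Datatype)datatype, dest, tag,
                         (MPIwrapper_Comm)comm);
}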

Unfortunately, MPItrampoline does not yet wrap the Fortran API. Your help is welcome.

Certain MPI types, constants, and functions are difficult to wrap. Theoretically, there could be MPI libraries where it is not possible to implement the current MPI ABI. If you encounter this, please let me know -- maybe there is a work-around.

Comments
  • Add support for MPI profiling interface

    Each standard MPI function can be called with an MPI_ or PMPI_ prefix (quoting from https://www.open-mpi.org/faq/?category=perftools#PMPI). I think MPItrampoline doesn't currently support the PMPI_ calls.
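
    (For context, a minimal sketch of how a profiling tool uses the PMPI interface; this is illustrative and not MPItrampoline code. The tool provides its own MPI_Send and forwards to PMPI_Send, which only works if the MPI library also exports the PMPI_ entry points.)

    #include <mpi.h>

    /* A profiling layer intercepts MPI_Send ... */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest,
                 int tag, MPI_Comm comm) {
      /* ... record timings, message sizes, etc. here ... */
      return PMPI_Send(buf, count, datatype, dest, tag, comm);  /* forward to the real implementation */
    }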

    opened by ocaisa 6
  • Supporting `MPIX_Query_cuda_support()`

    While not part of the standard, MPIX_Query_cuda_support() is available in a number of MPI implementations (see https://github.com/pmodels/mpich/pull/4741). It's also being used by a number of applications (and hopefully that will grow, see my issue at https://github.com/lammps/lammps/issues/3140 which links back to the support in GROMACS). This would be a valuable inclusion in MPItrampoline since it would then be able to handle the runtime detection of CUDA support in the MPI implementation.
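
    (As a sketch of the usage pattern, following Open MPI's conventions; the guard macro and header differ between implementations and are assumptions here, not anything MPItrampoline provides today:)

    #include <mpi.h>
    #ifdef OPEN_MPI
    #include <mpi-ext.h>   /* Open MPI declares its MPIX_ extensions here */
    #endif

    /* Returns 1 if the underlying MPI library reports CUDA awareness, 0 otherwise. */
    static int cuda_aware_mpi(void) {
    #if defined(MPIX_CUDA_AWARE_SUPPORT)
      return MPIX_Query_cuda_support();
    #else
      return 0;   /* query not available: conservatively assume no CUDA-aware MPI */
    #endif
    }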

    opened by ocaisa 6
  • Allow overriding default compilation options

    Currently, default compilation options are set during the build of MPItrampoline; it would probably be useful to be able to fully override these, e.g.,

    exec ${MPITRAMPOLINE_CC:-@CMAKE_C_COMPILER@} ${CFLAGS:-"@CMAKE_C_FLAGS@"} -I@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@ @LINK_FLAGS@ -L@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@ -Wl,-rpath,@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@ "$@" -lmpi -ldl
    
    opened by ocaisa 5
  • Allow use of a fallback/default value for `MPITRAMPOLINE_MPIEXEC`

    It's possible to configure a default MPI library with -DMPITRAMPOLINE_DEFAULT_LIB=XXX; it would be good to also be able to configure a default mpiexec (-DMPITRAMPOLINE_DEFAULT_MPIEXEC=XXX), so that one can have a fully functional fallback in place at build time.

    opened by ocaisa 3
  • Incomplete installation

    Hello! I am looking at installing MPItrampoline, and after following the steps outlined in the README.md, I cannot access the files that are referenced later.

    For instance, on Summit at ORNL, using the following script:

    INSTALLDIR=$HOME/mpiwrapper
    
    module load cmake/3.18
    module load gcc
    
    cmake -S . -B build -DMPIEXEC_EXECUTABLE=mpiexec \
                                   -DCMAKE_BUILD_TYPE=RelWithDebInfo \
                                   -DCMAKE_INSTALL_PREFIX=$INSTALLDIR
    cmake --build build
    cmake --install build
    

    The following installation is generated:

    $ tree mpiwrapper/
    mpiwrapper/
    |-- bin
    |   |-- mpicc
    |   |-- mpicxx
    |   |-- mpiexec
    |   |-- mpifc
    |   `-- mpifort
    |-- include
    |   |-- mpi.h
    |   |-- mpi.mod
    |   |-- mpi_declarations.h
    |   |-- mpi_declarations_fortran.h
    |   |-- mpi_declarations_fortran90.h
    |   |-- mpi_defaults.h
    |   |-- mpi_f08.mod
    |   |-- mpi_version.h
    |   |-- mpiabi.h
    |   |-- mpiabif.h
    |   |-- mpif.h
    |   `-- mpio.h
    |-- lib
    |   |-- cmake
    |   |   `-- MPItrampoline
    |   |       |-- MPItrampolineConfig.cmake
    |   |       |-- MPItrampolineConfigVersion.cmake
    |   |       |-- MPItrampolineTargets-relwithdebinfo.cmake
    |   |       `-- MPItrampolineTargets.cmake
    |   `-- pkgconfig
    |       `-- MPItrampoline.pc
    `-- lib64
        |-- libmpi.a
        `-- libmpifort.a
    
    7 directories, 24 files
    

    The README.md instructs one to use the wrapped libraries, but no shared libraries are available here. I looked through the options of the CMakeLists.txt, but the only one defined in the project is the Fortran flag.

    Am I missing something?

    opened by spoutn1k 2
  • Issues building Global Arrays

    I'm trying to build the Global Arrays library with MPItrampoline and have run into some issues. This package is a dependency for Molpro and several other computational chemistry applications. Any assistance that you can provide to get it working would be greatly appreciated.

    The compilation fails at https://github.com/GlobalArrays/ga/blob/f4016b869dfd1a2b2856f74a73dd9452dbbc8ae4/comex/src-mpi-pr/comex.c#L4968 with the error message:

    libtool: compile:  mpicc -DHAVE_CONFIG_H -I. -I./src-common -I./src-mpi-pr -g -O2 -MT src-mpi-pr/comex.lo -MD -MP -MF src-mpi-pr/.deps/comex.Tpo -c src-mpi-pr/comex.c -o src-mpi-pr/comex.o
    src-mpi-pr/comex.c: In function ‘str_mpi_retval’:
    src-mpi-pr/comex.c:4932:9: error: case label does not reduce to an integer constant
             case MPI_SUCCESS       : msg = "MPI_SUCCESS"; break;
             ^
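
    (A likely explanation, sketched here as an assumption: MPItrampoline presumably defines error codes such as MPI_SUCCESS as run-time values rather than integer constant expressions, so they cannot appear as C case labels. A portable rewrite would compare at run time, e.g.:)

    #include <mpi.h>

    /* Illustrative rewrite of str_mpi_retval: an if/else chain does not require
       the MPI error codes to be compile-time constants. */
    static const char *str_mpi_retval(int retval) {
      if (retval == MPI_SUCCESS)    return "MPI_SUCCESS";
      if (retval == MPI_ERR_BUFFER) return "MPI_ERR_BUFFER";
      if (retval == MPI_ERR_COUNT)  return "MPI_ERR_COUNT";
      return "unknown MPI return value";
    }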
    

    Steps to reproduce:

    git clone -b develop https://github.com/GlobalArrays/ga
    cd ga
    ./autogen.sh
    ./configure --with-mpi-pr --with-blas=no --with-lapack=no --with-scalapack=no --disable-f77
    make
    

    I've tried both the develop and master branches.

    There are lots of configurations which can be used; we're most interested in the recommended port, which uses MPI-1 with progress ranks (--with-mpi-pr), but I tried several other configurations without success.

    I'm not sure whether it's an MPItrampoline issue or whether it's non-standard use of MPI within Global Arrays.

    I've attached the full output from the build: build-ga.txt

    I've tried on

    • RHEL 7.9 with gcc 4.8.5
    • Ubuntu 22.04 with gcc 11.3.0
    opened by nick-wilson 9
  • Issues building CP2K

    I thought I would give this a full test with Fortran, and CP2K is a good benchmark for that. The build (v8.2) is failing with:

    /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exts/dbcsr/src/mpi/dbcsr_mpiwrap.F:1669:21:
    
     1669 |       CALL mpi_bcast(msg, msglen, MPI_LOGICAL, source, gid, ierr)
          |                     1
    ......
     3160 |       CALL mpi_bcast(msg, msglen, ${mpi_type1}$, source, gid, ierr)
          |                     2
    Error: Type mismatch between actual argument at (1) and actual argument at (2) (LOGICAL(4)/COMPLEX(4)).
    
    opened by ocaisa 24
  • Building shared and static libraries at once

    Currently the default behaviour is to only build static libraries. It might be good to build both static and shared libraries, since then a libmpi.so in the default search path is not selected over the library from MPItrampoline (MPItrampoline would shadow both libmpi.so and libmpi.a).

    opened by ocaisa 4
Releases

  • v5.2.0

Owner

  • Erik Schnetter