signac-flow - manage workflows with signac

Overview

The signac framework helps users manage and scale file-based workflows, facilitating data reuse, sharing, and reproducibility.

The signac-flow tool provides the basic components to set up simple to complex workflows for projects managed by the signac framework. That includes the definition of data pipelines, the execution of data space operations, and the submission of operations to high-performance supercomputers.

Resources

Installation

The recommended installation method for signac-flow is through conda or pip. The software is tested for Python versions 3.8+ and is built for all major platforms.

To install signac-flow via the conda-forge channel, execute:

conda install -c conda-forge signac-flow

To install signac-flow via pip, execute:

pip install signac-flow

Detailed information about alternative installation methods can be found in the documentation.

Testing

You can test this package by executing

$ python -m pytest tests/

within the repository root directory.

Acknowledgment

When using signac as part of your work towards a publication, we would really appreciate it if you acknowledged signac appropriately. We have prepared examples on how to do that here. Thank you very much!

The signac framework is a NumFOCUS Affiliated Project.

Comments
  • Implementing a grouping feature to organize flow operations

    Description

    The goal of this branch is to create a grouping feature that will allow operations in a FlowProject to be organized into "metaoperations." Currently, implementing this behavior requires either a new FlowProject that operates on the original set of operations, or the user has to comment out code to prevent the grouped operations from running individually.

    An example snippet of the expected functionality shows a potential API and use of the feature in more detail:

    a_group = Project.make_group('a')

    @Project.operation
    @a_group
    def foo(job):
        pass

    @Project.operation
    @a_group
    def bar(job):
        pass
    

    Then, to use the grouping, a command like submit --group a_group would output a job for the scheduler whose final command is something akin to python project.py run -j abc123 -o foo bar.
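
    The idea can be illustrated with a minimal, self-contained sketch that does not use signac-flow itself. The names GroupRegistry and build_run_command are hypothetical and exist only for this illustration; the sketch shows how a decorator can tag functions as members of a group and how a single run command covering the whole group could be assembled.

```python
from collections import defaultdict


class GroupRegistry:
    """Toy stand-in for a project class that tracks operation groups."""

    def __init__(self):
        # Maps a group name to the operation names registered under it.
        self._groups = defaultdict(list)

    def make_group(self, name):
        """Return a decorator that adds the decorated function to `name`."""

        def decorator(func):
            self._groups[name].append(func.__name__)
            return func  # the function itself is left unchanged

        return decorator

    def build_run_command(self, group, job_id):
        """Assemble one run command covering every operation in `group`."""
        operations = " ".join(self._groups[group])
        return f"python project.py run -j {job_id} -o {operations}"


registry = GroupRegistry()
a_group = registry.make_group("a")


@a_group
def foo(job):
    pass


@a_group
def bar(job):
    pass


print(registry.build_run_command("a", "abc123"))
# prints: python project.py run -j abc123 -o foo bar
```

    Operations are listed in the order in which they were registered, which in this sketch is simply the order of definition in the module.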

    Motivation and Context

    This change is a feature change that seeks to give the user more control over the workflow of their project. The group concept will allow for operations to be tagged to be run at once in a single job which can work even if the pre and post conditions are not met initially but will be once other operations have run.

    This abstraction of a workflow is useful in multiple scenarios, such as automatically restricting operations to particular environments, running a job repeatedly in a single submit (#4), and moving workflow logic into the FlowProject rather than the individual operations for more idiomatic code. Another use case would be to mark operations as submit-only (#33). A tangentially related issue is #41, as groups could be considered to create graphs themselves, or groups might make up higher-level graphs.

    Types of Changes

    • [x] Documentation update
    • [ ] Bug fix
    • [x] New feature
    • [x] Breaking change [1]

    [1] The change breaks (or has the potential to break) existing functionality.

    Checklist:

    If necessary:

    • [ ] I have updated the API documentation as part of the package doc-strings.
    • [x] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
    • [ ] I have updated the changelog.
    groups 
    opened by b-butler 68
  • Feature/enable aggregate operations

    This pull request refactors the current job-operation model into jobs-operations, that means each operation is a function of one or more jobs.
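
    As a rough illustration of the jobs-operations idea, the following sketch treats an operation as a function of one or more jobs and splits a list of jobs into fixed-size aggregates. It does not use signac-flow; groups_of and average_temperature are hypothetical names, and plain dictionaries stand in for signac jobs.

```python
from itertools import islice


def groups_of(jobs, size):
    """Yield tuples of up to `size` jobs; the last tuple may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(jobs)
    while True:
        chunk = tuple(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def average_temperature(*jobs):
    """An operation that acts on one or more jobs at once."""
    return sum(job["temperature"] for job in jobs) / len(jobs)


# Plain dictionaries stand in for signac jobs in this sketch.
jobs = [{"temperature": t} for t in (1.0, 2.0, 3.0, 4.0, 5.0)]
results = [average_temperature(*aggregate) for aggregate in groups_of(jobs, 2)]
print(results)  # prints: [1.5, 3.5, 5.0]
```

    A single-job operation is then just the special case of an aggregate of size one.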

    Prior to merging we need to tackle the following items:

    • [ ] Implement parallelized status update
    • [ ] Update changelog
    • [ ] Deprecate JobOperation
    • [ ] Ensure that the scheduler status is updated prior to submission.
    GSoC aggregation 
    opened by csadorf 25
  • Adopt pytest as testing framework

    Description

    Adopting pytest as the testing framework. This version of my code outputs an assertion error.

    Motivation and Context

    After this change, we'll be using pytest instead of unittest to perform the unit tests.

    Types of Changes

    • [ ] Documentation update
    • [ ] Bug fix
    • [x] New feature
    • [ ] Breaking change [1]

    [1] The change breaks (or has the potential to break) existing functionality.

    Checklist:

    If necessary:

    • [ ] I have updated the API documentation as part of the package doc-strings.
    • [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
    • [x] I have updated the changelog.
    enhancement 
    opened by kidrahahjo 23
  • Deprecation Warning for --cmd option in script

    Description

    Added an eligibility check for operations generated when cmd is provided in main_script.

    Motivation and Context

    Fixes #218

    Types of Changes

    • [ ] Documentation update
    • [ ] Bug fix
    • [ ] New feature
    • [x] Breaking change [1]

    [1] The change breaks (or has the potential to break) existing functionality.

    Checklist:

    If necessary:

    • [ ] I have updated the API documentation as part of the package doc-strings.
    • [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
    • [ ] I have updated the changelog.
    opened by vishav1771 22
  • issue #113, create markdown with jinja template in status view

    Description

    Motivation and Context

    see issue #113

    Types of Changes

    • [ ] Documentation update
    • [ ] Bug fix
    • [x] New feature
    • [ ] Breaking change [1]

    [1] The change breaks (or has the potential to break) existing functionality.

    Checklist:

    If necessary:

    • [x] I have updated the API documentation as part of the package doc-strings.
    • [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
    • [ ] I have updated the changelog.
    enhancement 
    opened by zhou-pj 22
  • Implement execution hooks framework

    Description

    This pull request implements an execution hooks framework that enables users to execute certain functions on start, on finish, on success, and on fail of the execution of an operation.

    Resolves #28, resolves #14.

    Motivation and Context

    The motivation for this framework is to enable users to automatically execute certain functions with operations, for example for logging and tracking purposes. To log when a specific operation is executed, a user can provide the following function:

    @Project.operation
    @Project.hooks.on_start(lambda op: logger.info(f"Executing {op}..."))
    def foo(job):
        pass
    

    The framework comes with a selection of pre-implemented hook systems to cover all the currently expected use cases, roughly in order of expensiveness:

    1. LogOperations - Log basic information about the execution of operations to a log file.
    2. TrackOperations - Keep detailed metadata records about the state of the project root directory and the operation, including directives, to a log file, optionally in conjunction with git.
    3. SnapshotProject - Create a snapshot of the project root directory to keep track of the code used for the execution of an operation.
    4. TrackWorkspaceWithGit - The workspace is treated as a git repository and automatically committed to before and after the execution of an operation. This can be done either on a per-job basis or globally for the workspace.

    A hook system is meant to describe a collection of hook functions that together achieve a specific purpose. The shipped hook systems are implemented as classes that can be installed project-wide via the respective install_hooks function.

    High-level API

    Hooks can be installed either on a per-operation or a per-project basis. In this way users have the option to execute hook functions either with specific operations or with all operations of a project.

    Furthermore, there are two ways that hooks can be installed. One way is directly in Python, for example within the __main__ clause of the project.py file:

    # ...
    
    if __name__ == '__main__':
        from flow.hooks import LogOperations
        LogOperations().install_hooks(Project()).main()
    

    Alternatively, hooks can also be installed through the (project) configuration file:

    project = my_project
    [flow]
    hooks=flow.hooks.LogOperations,flow.hooks.SnapshotProject(compress=True)
    

    It is assumed that the entities provided here are callable, i.e., are either a function or a functor. The first argument must be the instance of FlowProject.

    Low-level API

    The FlowProject class has a hooks attribute with four lists: on_start, on_finish, on_success, and on_fail, which can be appended to. Hook functions installed in this way are executed for all operations.
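
    The four hook lists and the order in which they fire can be sketched without signac-flow as follows. This is only an illustration under stated assumptions: the Hooks container, the run_with_hooks helper, and the hook signatures (operation name, job, and for on_fail also the exception) are hypothetical and are not taken from the pull request.

```python
class Hooks:
    """Container for the four hook lists described above."""

    def __init__(self):
        self.on_start = []
        self.on_finish = []
        self.on_success = []
        self.on_fail = []


def run_with_hooks(hooks, operation, job):
    """Run `operation(job)` and call the matching hooks around it."""
    name = operation.__name__
    for hook in hooks.on_start:
        hook(name, job)
    try:
        operation(job)
    except Exception as error:
        for hook in hooks.on_fail:
            hook(name, job, error)
        raise  # the original error still reaches the caller
    else:
        for hook in hooks.on_success:
            hook(name, job)
    finally:
        # Runs after both successful and failed executions.
        for hook in hooks.on_finish:
            hook(name, job)


events = []
hooks = Hooks()
hooks.on_start.append(lambda name, job: events.append(f"start:{name}"))
hooks.on_success.append(lambda name, job: events.append(f"success:{name}"))
hooks.on_fail.append(lambda name, job, error: events.append(f"fail:{name}"))
hooks.on_finish.append(lambda name, job: events.append(f"finish:{name}"))


def good(job):
    pass


def bad(job):
    raise RuntimeError("boom")


run_with_hooks(hooks, good, job={})
try:
    run_with_hooks(hooks, bad, job={})
except RuntimeError:
    pass

print(events)
```

    In this sketch on_finish always runs last, after either on_success or on_fail, so it is a natural place for cleanup that must happen regardless of the outcome.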

    Furthermore, hook functions can be installed for individual operations either with a decorator:

    @Project.operation
    @Project.hooks.on_start(my_hook_function)
    def op(job):
        pass
    

    or by passing a dictionary to the add_operation() function: project.add_operation(..., hooks=dict(on_start=my_hook_function)).

    Custom hook systems

    Users can easily implement their own hook systems and install them in a similar manner. For example:

    # myhooks.py
    def log_op(operation):
        with open(operation.job.fn('operations.log'), 'a') as logfile:
            logfile.write(f"Executed operation '{operation}'.\n")  # newline keeps one entry per line
    
    def install(project):
        project.hooks.on_success.append(log_op)
        return project
    

    This could then be installed either in the project module:

    # ...
    
    if __name__ == '__main__':
        import myhooks
        myhooks.install(Project()).main()
    

    or via the configuration file:

    project = myproject
    [flow]
    hooks=myhooks.install
    

    Types of Changes

    • [ ] Documentation update
    • [ ] Bug fix
    • [x] New feature
    • [ ] Breaking change [1]

    [1] The change breaks (or has the potential to break) existing functionality.

    Checklist:

    If necessary:

    • [x] I have updated the API documentation as part of the package doc-strings.
    • [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
    • [x] I have updated the changelog.
    enhancement 
    opened by csadorf 21
  • show_traceback = on by default

    Feature description

    Related to #61 and #144, I think it would be good if show_traceback = on by default.

    Proposed solution

    Change default behavior to enable traceback.

    @glotzerlab/signac-developers What are your thoughts about this potential change? The tracebacks are sometimes a little complicated, but I think it's reasonable to expect that Python scripts will show a traceback on error.

    enhancement 
    opened by bdice 19
  • Add default aggregate support to flow

    Description

    This pull request adds support for the aggregator class in project.py.

    Motivation and Context

    Types of Changes

    • [ ] Documentation update
    • [ ] Bug fix
    • [x] New feature
    • [ ] Breaking change [1]

    [1] The change breaks (or has the potential to break) existing functionality.

    Checklist:

    If necessary:

    • [ ] I have updated the API documentation as part of the package doc-strings.
    • [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
    • [ ] I have updated the changelog.
    GSoC aggregation 
    opened by kidrahahjo 18
  • Conditional use of Pool and ThreadPool

    We can now use Pool or ThreadPool conditionally.
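
    The description does not state which condition selects the pool type. One plausible rule, shown here purely as an assumption, is to use a process pool when the task can be serialized and to fall back to a thread pool otherwise. The helper choose_pool_class is hypothetical, and the sketch uses the standard library pickle module rather than whatever serializer the pull request relies on.

```python
import pickle
from multiprocessing.pool import Pool, ThreadPool


def choose_pool_class(task):
    """Return Pool if `task` can be pickled, otherwise ThreadPool."""
    try:
        pickle.dumps(task)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Process pools must send the task to workers, so an
        # unpicklable task can only run in threads.
        return ThreadPool
    return Pool


tasks = {"builtin": abs, "lambda": lambda x: x * x}
for label, task in tasks.items():
    print(label, choose_pool_class(task).__name__)

# The lambda cannot be pickled, so it runs in a thread pool.
with ThreadPool(2) as pool:
    print(pool.map(tasks["lambda"], [1, 2, 3]))
```

    Threads share memory with the parent, so nothing needs to be serialized, at the cost of being limited by the interpreter lock for CPU-bound Python code.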

    Description

    This pull request is a consequence of #269, which was accidentally closed by me.

    Motivation and Context

    Fixes #264

    Types of Changes

    • [ ] Documentation update
    • [ ] Bug fix
    • [x] New feature
    • [ ] Breaking change [1]

    [1] The change breaks (or has the potential to break) existing functionality.

    Checklist:

    If necessary:

    • [ ] I have updated the API documentation as part of the package doc-strings.
    • [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
    • [ ] I have updated the changelog.
    opened by kidrahahjo 18
  • CUDA initialized before forking

    Description

    I am trying to integrate fbpic, a well-known CUDA code (based on Python + Numba) for laser-plasma simulation with signac. The integration repo is signac-driven-fbpic.

    I managed to successfully run on a single GPU, via python3 src/project.py run from inside the signac folder, but if I add --parallel I get

    numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking
    

    The goal is to get 8 (independent) copies of fbpic (with different input params) running in parallel on the 8 NVIDIA P100 GPUs that are on the same machine.

    To reproduce

    Clone the signac-driven-fbpic repo and follow the install instructions. Then go to the signac subfolder, and do

    conda activate signac-driven-fbpic
    python3 src/init.py
    python3 src/project.py run --parallel
    

    Error output

    (signac-driven-fbpic) [email protected]:~/Development/signac-driven-fbpic/signac$ python3 src/project.py run --parallel --show-traceback
    Using environment configuration: UnknownEnvironment
    Serialize tasks|##############################################################################################|100%
    ERROR: Encountered error during program execution: 'CUDA initialized before forking'
    Execute with '--show-traceback' or '--debug' to get more information.
    multiprocessing.pool.RemoteTraceback: 
    """
    Traceback (most recent call last):
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2727, in _fork_with_serialization
        project._fork(project._loads_op(operation))
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1467, in _fork
        self._operation_functions[operation.name](operation.job)
      File "src/project.py", line 172, in run_fbpic
        verbose_level=2,
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/fbpic/main.py", line 232, in __init__
        n_guard, n_damp, None, exchange_period, use_all_mpi_ranks )
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/fbpic/boundaries/boundary_communicator.py", line 267, in __init__
        self.d_left_damp = cuda.to_device( self.left_damp )
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devices.py", line 212, in _require_cuda_context
        return fn(*args, **kws)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/api.py", line 103, in to_device
        to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 683, in auto_device
        devobj = from_array_like(obj, stream=stream)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 621, in from_array_like
        writeback=ary, stream=stream, gpu_data=gpu_data)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 102, in __init__
        gpu_data = devices.get_context().memalloc(self.alloc_size)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 697, in memalloc
        self._attempt_allocation(allocator)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 680, in _attempt_allocation
        allocator()
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 695, in allocator
        driver.cuMemAlloc(byref(ptr), bytesize)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 290, in safe_cuda_api_call
        self._check_error(fname, retcode)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 324, in _check_error
        raise CudaDriverError("CUDA initialized before forking")
    numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking
    """
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "src/project.py", line 238, in <module>
        Project().main()
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2721, in main
        _exit_or_raise()
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2689, in main
        args.func(args)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2414, in _main_run
        run()
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/legacy.py", line 193, in wrapper
        return func(self, jobs=jobs, names=names, *args, **kwargs)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1597, in run
        np=np, timeout=timeout, progress=progress)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1421, in run_operations
        pool, cloudpickle, operations, progress, timeout)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1458, in _run_operations_in_parallel
        result.get(timeout=timeout)
      File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
    numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking
    

    Relevant numba link.
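
    Numba raises this error when CUDA was already initialized in a process that is then forked. A commonly suggested workaround, shown here as an untested assumption for this particular project, is to start workers with the "spawn" start method so that each worker begins with a fresh interpreter, and to pin each worker to one GPU through CUDA_VISIBLE_DEVICES before any CUDA library is imported. The functions assign_devices, run_simulation, and run_all are hypothetical, and the worker only returns its assigned device instead of running a simulation.

```python
import multiprocessing
import os


def assign_devices(n_tasks, n_gpus):
    """Assign GPU indices to tasks round-robin."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return [index % n_gpus for index in range(n_tasks)]


def run_simulation(args):
    """Hypothetical worker: pin this process to one GPU, then do the work."""
    task_id, device = args
    # Must be set before any CUDA library is imported in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device)
    # A real worker would import and run the GPU code here.
    return task_id, os.environ["CUDA_VISIBLE_DEVICES"]


def run_all(n_tasks, n_gpus):
    """Run the tasks in freshly spawned (not forked) worker processes."""
    context = multiprocessing.get_context("spawn")
    tasks = list(enumerate(assign_devices(n_tasks, n_gpus)))
    # maxtasksperchild=1 gives every task its own new process, so the
    # device choice is never made after CUDA has already been initialized.
    with context.Pool(processes=n_gpus, maxtasksperchild=1) as pool:
        return pool.map(run_simulation, tasks)


print(assign_devices(8, 8))  # prints: [0, 1, 2, 3, 4, 5, 6, 7]
```

    With the spawn start method, run_all(8, 8) has to be called from inside an if __name__ == "__main__": guard in a script, because each worker re-imports the main module.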

    System configuration

    • Operating System: Ubuntu 16.04
    • Version of Python: 3.6.8
    • Version of signac: 1.1.0
    • Version of signac-flow: 0.7.1
    • NVIDIA Driver Version: 410.72
    enhancement expertise needed 
    opened by berceanu 18
  • Add a class for Directives

    By - @b-butler

    Adds two classes Directives and DirectivesItem that serve as a smart mapping for the environment and user-specified directives and a specification for environmental directives respectively. FlowProject and FlowGroup have been changed accordingly.
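
    What a "smart mapping" with per-directive specifications might look like can be sketched as follows. This is not the implementation from the pull request: ToyDirectivesItem, ToyDirectives, at_least, and the user_keys attribute are hypothetical names, and the validation rules are assumptions chosen for illustration.

```python
from collections.abc import MutableMapping


class ToyDirectivesItem:
    """Specification for one directive: a name, a default, a validator."""

    def __init__(self, name, default, validator):
        self.name = name
        self.default = default
        self.validator = validator


class ToyDirectives(MutableMapping):
    """Mapping that validates known directives and tracks user-set keys."""

    def __init__(self, items):
        self._items = {item.name: item for item in items}
        self._values = {item.name: item.default for item in items}
        self.user_keys = set()

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        if key in self._items:
            value = self._items[key].validator(value)
        # Unknown keys are stored as given, without validation.
        self._values[key] = value
        self.user_keys.add(key)

    def __delitem__(self, key):
        if key in self._items:
            # Deleting a known directive restores its default.
            self._values[key] = self._items[key].default
        else:
            del self._values[key]
        self.user_keys.discard(key)

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def at_least(minimum):
    """Build a validator that accepts integers >= `minimum`."""

    def validator(value):
        if not isinstance(value, int) or value < minimum:
            raise ValueError(f"expected an integer >= {minimum}, got {value!r}")
        return value

    return validator


directives = ToyDirectives(
    [ToyDirectivesItem("np", 1, at_least(1)), ToyDirectivesItem("ngpu", 0, at_least(0))]
)
directives["np"] = 4
directives["walltime"] = 2.5
print(dict(directives), sorted(directives.user_keys))
```

    Keeping defaults and validation in one place means that callers can treat the object like an ordinary dictionary while invalid values are rejected at assignment time.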

    Description

    Motivation and Context

    This resolves #265 and helps to centralize logic for directives. This pull request is a necessary follow-up for #282. Also resolves #240.

    Types of Changes

    • [ ] Documentation update
    • [ ] Bug fix
    • [x] New feature
    • [ ] Breaking change [1]

    [1] The change breaks (or has the potential to break) existing functionality.

    Tasks to accomplish:

    • [x] Test the individual DirectivesItems.
    • [x] Test the Directives class.
    • [x] Determine and add environment specific DirectivesItem and correct their get_default_directives function (this at least at the decision level will likely involve a discussion between multiple people).
    • [x] Some probably large degree of code refactoring (I was initially just trying to get the outline working).
    • [ ] Opening a PR in signac-docs to update documentation.
    • [x] Going through docstrings and ensuring they are complete, grammatically correct, and helpful. Multiple methods will need these as well.
    • [x] Add tracking of user specified directives. Previously TrackGetItemDict found in flow/util/misc.py was used. We can use this internally or create our own fix.
    • [x] Fix code so tests pass

    Checklist:

    If necessary:

    • [ ] I have updated the API documentation as part of the package doc-strings.
    • [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
    • [x] I have updated the changelog.
    enhancement GSoC directives 
    opened by kidrahahjo 17
  • Move to a fully pyproject.toml based build

    Description

    This PR removes setup.py and setup.cfg entirely, migrating all project and build configuration information into pyproject.toml. In the process, all linter configs have also been moved into pyproject.toml. The exception is flake8, which does not (and will not) support pyproject.toml, so the flake8 configuration is now stored in the .flake8 file, which is specific to this linter. Additionally, bump2version also does not support pyproject.toml (although unlike flake8 the proposal has not been entirely rejected, so it may eventually), so that configuration has also been moved to a project-specific .bumpversion file.

    Motivation and Context

    Various changes to Python packaging over the last 6 or 7 years have moved towards more static packaging and towards storing data in a backend-agnostic format. These changes allow the use of setuptools alternatives (like flit) as well as more reproducible builds based on build isolation into virtual environments that provide all necessary build dependencies. Direct invocation of setup.py has been deprecated in the process. The changes in this PR modernize flow's build system for compatibility with these new approaches.

    Checklist:

    opened by vyasr 1
  • Allow setting account via flag when submitting

    Feature description

    Discussed with @joaander today.

    I would like to be able to set the account on the command line when submitting jobs through flow.

    I'm setting it via environment variable, so signac-flow gives me this warning. I would like to be able to silence this warning.

    Environment 'DeltaEnvironment' allows the specification of an account name that will be charged for jobs' compute time.
    Set the account name with the command:
    
      $ signac config --global set 'flow.DeltaEnvironment.account' ACCOUNT_NAME
    
    

    Proposed solution

    This may be a good first issue.

    Additional context

    Other ways to set account:

    • environment variable
    • flow config
    • custom template
    opened by cbkerr 0
  • fix: Multi-node GPU submissions for greatlakes and picotte.

    Description

    Fixes logic where the --ntasks-per-node value was not normalized by the number of nodes for GPU submissions, where the number of tasks is often the number of GPUs.
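
    The normalization can be illustrated with a small calculation. The exact formula used in the templates is not given here, so the helper tasks_per_node below is a hypothetical sketch that assumes the total task count is divided by the number of nodes and rounded up.

```python
import math


def tasks_per_node(total_tasks, nodes):
    """Spread `total_tasks` over `nodes`, rounding up to whole tasks."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return math.ceil(total_tasks / nodes)


# Eight GPU tasks over two nodes: four tasks per node rather than eight.
print(tasks_per_node(8, 2))  # prints: 4
```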

    Motivation and Context

    Resolved: #566

    Checklist:

    opened by b-butler 3
  • Change Conditions Execution Order

    Description

    Currently we execute operation decorators from the outside in, i.e. top to bottom. This is confusing when using decorators as functions directly.

    @FlowProject.operation
    def op(job):
        pass
    
    
    FlowProject.pre(expensive_cond)(FlowProject.pre.true("foo")(op))
    

    This actually runs the expensive computation first. This is to make this

    @FlowProject.pre.true("foo")
    @FlowProject.pre(expensive_cond)
    @FlowProject.operation
    def op(job):
        pass
    

    more intuitive. However, I disagree that this is more intuitive. As someone learns Python, this in fact becomes less and less intuitive, to the point that, without our documentation suggesting the correct ordering, a Python expert would write,

    @FlowProject.pre(expensive_cond)
    @FlowProject.pre.true("foo")
    @FlowProject.operation
    def op(job):
        pass
    

    Given our recent decorator ordering requirements we are already making users come to understand decorators apply bottom to top.

    Suggestion

    I think we should apply conditions in the order they come in the execution (i.e. bottom to top). As an irrelevant side note this would make the project definition run faster.
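
    The bottom-to-top application order of stacked decorators can be checked in plain Python, independent of signac-flow. In this sketch, record is a hypothetical helper that notes the moment each decorator is applied to the function.

```python
def record(label, log):
    """Return a decorator that notes when it is applied to a function."""

    def decorator(func):
        log.append(label)
        return func

    return decorator


applied = []


@record("top", applied)
@record("middle", applied)
@record("bottom", applied)
def op(job):
    pass


print(applied)  # prints: ['bottom', 'middle', 'top']
```

    The decorator closest to the function is applied first, which is the order the suggestion above proposes to follow when evaluating conditions.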

    opened by b-butler 1
  • Add GitHub Actions.

    Description

    Migrates signac-flow's CI to use GitHub Actions.

    This PR drops coverage for a few things that I think are non-essential:

    • Nightly/weekly testing of pip install signac signac-flow and conda install signac signac-flow
    • Testing against the latest commit of signac (which is currently disabled because it's broken until we release 2.0)
    • Checking Zenodo metadata on release/.* branches.

    If another contributor wishes to add these to GitHub Actions, that would be fine.

    Before merging, a repo administrator will need to update the CI checks used for branch protections.

    Checklist:

    opened by bdice 1
  • Added the no-progress flag for project.py status to hide the progress…

    Description

    Added the no-progress flag for FlowProject.print_status. This hides the progress bar output generated when FlowProject.print_status is called, giving users the option to hide the progress bar when desired.

    Motivation and Context

    The progress bar displays a significant amount of the output for small workspaces and does not give any benefit. Additionally, using FlowProject.print_status in Jupyter notebooks requires special configurations or else it does not work. This addresses issue #602.

    Checklist:

    opened by iblanco11981870 1
Releases(v0.23.0)
  • v0.23.0(Dec 9, 2022)

    Version 0.23.0

    2022-12-09

    Added


    • Official Python 3.11 support (#697).
    • The flow.FlowProject.operation decorator now has an aggregator keyword argument: @FlowProject.operation(aggregator=aggregator.groupsof(2)) (#681).
    • The FlowGroupEntry class can be called with a directives keyword argument: FlowGroupEntry(directives={...}) (#696).

    Changed


    • Deprecated using flow.aggregate.aggregator as a decorator.
    • Deprecated placing @FlowProject.pre and @FlowProject.post before the FlowProject.operation decorator (#690).
    • Require signac version 1.8.0 (#693).
    • Deprecated alias CLI argument to flow init (#693).
    • Algorithm for computing cluster job ids (#695).
    • Deprecated FlowGroupEntry.with_directives in favor of a directives keyword argument in FlowGroupEntry()(#696).

    Fixed


    • Detecting correct environment on Delta GPU nodes (#682).
    • Identical aggregates are used only once in submission and running (#694, #688).

    Removed


    • show_traceback from CLI and config (#690).
    • Formatting the output of a FlowCmdOperation (#686).
    • @flow.cmd and flow.with_job (#686, #669).
    • @FlowProject.operation.with_directives (#686).
    • The flow.testing module (#691, #692).
  • v0.22.0(Oct 14, 2022)

    Version 0.22

    [0.22.0] -- 2022-10-14

    Added

    • Support for formatting with operation function arguments for FlowCmdOperation (#666, #630).
    • The CLI status command can show document parameters by using flag -p doc.PARAM (#667).
    • FlowProject.operation now has cmd, with_job, and directives keyword only arguments (#679, #655, #669).

    Changed

    • Deprecated formatting the output of a FlowCmdOperation (#666, #630).
    • @flow.cmd and flow.with_job are deprecated (#679, #669, #665).
    • @FlowProject.operation.with_directives is deprecated (#679, #665).
    • Deprecated the --show-traceback option for flow's CLI run and submit commands (#674, #169).
    • flow CLI run and submit show tracebacks by default (#674, #169).
    • Broke TestBidict and TestTemplateFilters into smaller and simpler functions (#626).
  • v0.21.0(Aug 18, 2022)

    Version 0.21

    [0.21.0] -- 2022-08-18

    Added

    • XSEDE Delta environment and template (#658).

    Changed

    • Changed the get_config_value template filter to error without default on missing key (#649).
    • The get_config_value template filter now takes a FlowProject as its first argument (#649).

    Removed

    • Removed require_config_value template filter (#649).
    • Removed configuration key 'flow.import_packaged_environments' (#653).
    • Removed configuration key 'flow.environment_modules' (#651).
  • v0.20.0(Jun 23, 2022)

    Version 0.20

    [0.20.0] -- 2022-06-23

    Added

    • Added support to run aggregate operations in parallel (#642, #644).
    • Added an argument, run_options, to FlowProject.make_group which allows passing options to exec for operations running in a different process (#650).

    Changed

    • Deprecated configuration key 'flow.environment_modules' (#651).
    • Deprecated configuration key 'flow.import_packaged_environments' (#651).
    • Changed argument options to submit_options in FlowProject.make_group (#650).

    Removed

    • Dropped support for cloudpickle versions below 1.6.0 (#644).
    • Removed upper bound on python_requires (#654).
  • v0.19.0(Apr 8, 2022)

    Version 0.19

    This release changes the names of hook triggers and improves the behavior of progress bars in Jupyter notebooks. The minimum supported Python version is now 3.8. We also welcome the first released contributions from @rohanbabbar04!

    [0.19.0] -- 2022-04-07

    Changed

    • Dropped support for tqdm versions older than 4.60.0 (#614).
    • Renamed hook triggers on_finish to on_exit and on_fail to on_exception (#612, #627).
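
To illustrate when the renamed triggers fire, here is a minimal pure-Python sketch. It does not use signac-flow's hook API: `run_with_hooks` and its callback signatures are hypothetical names chosen for illustration. It assumes that on_exception fires when the operation raises and that on_exit fires whenever the operation ends, with or without an error.

```python
def run_with_hooks(operation, *args, on_exception=None, on_exit=None):
    """Call ``operation`` and fire the (hypothetical) hooks around it."""
    try:
        return operation(*args)
    except Exception as error:
        # Fires only if the operation raised.
        if on_exception is not None:
            on_exception(operation.__name__, error)
        raise
    finally:
        # Fires whether or not the operation raised.
        if on_exit is not None:
            on_exit(operation.__name__)


events = []


def failing_operation():
    raise RuntimeError("simulated failure")


try:
    run_with_hooks(
        failing_operation,
        on_exception=lambda name, error: events.append(("on_exception", name)),
        on_exit=lambda name: events.append(("on_exit", name)),
    )
except RuntimeError:
    pass

print(events)
```

In this sketch the on_exception callback runs before on_exit because the `finally` clause executes last; the order used by signac-flow itself should be checked against its documentation.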

    Fixed

    • Progress bars shown in notebooks fall back to text-based output if ipywidgets is not available (#602, #614).

    Removed

    • Internal utility functions have been removed from the public API (#615).
    • Dropped support for Python 3.6 and Python 3.7 following the recommended support schedules of NEP 29 (#628).
    • Dropped support for Jinja2 versions below 3.0.0 (#628).
  • v0.18.1(Feb 14, 2022)

    Version 0.18.1

    Hey signac users, we fixed some bugs to make signac-flow work better for you. Happy Valentine's Day from the signac team! 🌹

    [0.18.1] -- 2022-02-14

    Fixed

    • Fixed bug in project status output when no operations are eligible (#607, #609).
    • Improved traceback handling for errors in signac-flow (#608).
  • v0.18.0(Feb 4, 2022)

    Version 0.18

    [0.18.0] -- 2022-02-03

    Added

    • Feature to install execution hooks for the automated execution of functions with operations (#28, #189, #508).

    Changed

    • Add user defined FlowGroups to status output (#547, #593).
    • Raise UserOperationError for failed execution of FlowProject operations (#571, #601).

    Fixed

    • Fix issue with GPU submission on Bridges-2 (#588, #589).
  • v0.17.0(Nov 16, 2021)

    This release adds Python 3.10 support and addresses a couple of bugs. Thanks to all the contributors in this release! :art:

    Version 0.17

    [0.17.0] -- 2021-11-15

    Added

    • Add official support for Python version 3.10 (#578).

    Fixed

    • Scripts are now generated correctly when project path contains spaces and special characters (#572).
    • XSEDE Expanse template has been fixed to remove leading spaces (#575, #576).

    Changed

    • FlowProject configuration is now validated independently of signac Project configuration (#573).
    • jsonschema is now a dependency (#573).
  • v0.16.0(Aug 20, 2021)

    This release adds support for the XSEDE Expanse Cluster, simplifies user customization of templates, improves documentation, and fixes minor bugs. Thanks to everyone who contributed :art:

    Added

    • The --job-output command line flag for submission can be set for SLURM, PBS, and LSF schedulers (#550).
    • Added a custom_content block to templates for user customization (#553, #520).
    • Added official support for XSEDE Expanse cluster (#537).
    • Added FlowProjectDefinitionError for exceptions involving FlowProject definitions (#549).

    Changed

    • Improved documentation of directives (#554).
    • Raise FlowProjectDefinitionError error for inaccurate FlowProject definitions (#549).

    Removed

    • Removed deprecated environment classes (#556).
    • Removed support for signac < 1.3.0 (#558).
    • Removed support for decommissioned XSEDE Comet cluster (#537).
    • Removed --memory option from University of Michigan Great Lakes cluster submission. Use directives instead (#563).
  • v0.15.0(Jun 24, 2021)

    This release adds the ability for operations to operate on subsets of jobs known as aggregates and improves submissions to HPC clusters. Thanks to @klywang as a first-time contributor for this release. :art:

    Added

    • Add support for aggregation (operations acting on multiple jobs) via flow.aggregator (#464, #516, #542).
    • Add official support for Andes cluster (#500).
    • Decorator for setting directives while registering operation function FlowProject.operation.with_directives (#309, #502).
    • Add new flow command flow template create for automatic creation of custom templates (#520, #534).

    Changed

    • Jinja templates are indented for easier reading (#461, #495).
    • flow.directives is deprecated in favor of flow.FlowProject.operation.with_directives (#309, #502).
    • All environments require a scheduler in order to submit, even in pretend mode (#533).
    • Submitting in pretend mode will show additional scheduler command information (#533).
    • Support fractional timeout value in Python and command line interface (#541).

    Fixed

    • Errors raised during submission were not being shown to users (#517, #518).
    • Fixed dependency flag for SLURM submissions (#530, #531).
  • v0.14.0(Apr 27, 2021)

    This release includes improvements to status output, documentation on directives, and fixes for regressions in version 0.13. Thanks to @Charlottez112 as a first-time contributor and the other 4 people who contributed code for this release. :art:

    Added

    • Documentation for all directives (#480).
    • Defined validators for the fork directive (#480).
    • Submission summary now appears in FlowProject status output, showing the number of queued, running, and unknown statuses (#472, #488).
    • Status overview now shows the number of jobs with incomplete operations and totals for the label overviews (#481, #501).

    Changed

    • Renamed TorqueEnvironment and related classes to PBSEnvironment (#388, #491).
    • LSF and SLURM schedulers will appear to be present if the respective commands bjobs -V or sbatch --version exit with a non-zero error code (#498).
    • Only known JobStatus values will be written to the project document, to save space and writing time (#503).

    Fixed

    • Strictly enforce that operation functions cannot be used as condition functions (and vice-versa) and prevent the registration of two operations with the same name (#496).
    • Changed default value of status_parallelization to none, to avoid bugs in user code caused by thread parallelism and overhead in process parallelism (#486).
    • Memory directives are converted to an integer number of gigabytes or megabytes in submission scripts (#482, #484).
    • Fixed behavior of --only-incomplete-operations (#481, #501).
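
The memory conversion mentioned in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the code from signac-flow: the function name `format_memory_sketch`, the "G" and "M" suffixes, the factor of 1024 megabytes per gigabyte, and truncation of fractional megabytes are all assumptions.

```python
def format_memory_sketch(memory_gb):
    """Return a memory request string with an integer value.

    Whole numbers of gigabytes are written in gigabytes; anything else
    is converted to megabytes and truncated to an integer.
    """
    if float(memory_gb).is_integer():
        return f"{int(memory_gb)}G"
    # Assumption: 1 GB = 1024 MB, and fractional megabytes are dropped.
    return f"{int(memory_gb * 1024)}M"


print(format_memory_sketch(4))
print(format_memory_sketch(0.5))
```

Using an integer in both branches avoids emitting values such as 0.5G, which the fix suggests should not appear in submission scripts.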

    Removed

    • Removed FlowProject.add_operation (#479, #487).
    • Removed deprecated --walltime argument (#478).
    • Removed deprecated flow.run interface (#478).
    • Removed deprecated FlowProject.export_job_statuses (#478).
    • Removed deprecated script feature, use submit --pretend instead (#478).
    • Removed deprecated CPUEnvironment, GPUEnvironment classes (#478).
  • v0.13.0(Mar 17, 2021)

    This release adds support for the new Bridges-2 cluster, expands the use of directives to include memory and walltime requests, removes deprecated features, and fixes bugs. Thanks to the 7 people who contributed code to this release, including first-time contributors @berceanu and @adgnabro!

    Added

    • Add official support for Bridges-2 cluster (#441).
    • Add support for memory requests via directives (#258, #466).
    • Add support for walltime requests via directives and deprecate the --walltime argument to submit (#240, #476).

    Fixed

    • Support for multi-line @flow.cmd operations (#451, #453).
    • FlowProject status shows labels and correct number of jobs for projects with zero operations (#454, #460).

    Removed

    • Removed public API of deprecated class JobOperation (#445).
    • Removed public API of deprecated methods eligible and complete of BaseFlowOperation and FlowGroup (#445).
    • Removed configuration option use_buffered_mode (#445).
    • Removed public API of script, next_operations and submit_operations of FlowProject (#445).
    • Removed support for decommissioned Bridges cluster (#441).
    • Removed support for memory command line argument in submit (#466).
  • v0.12.0(Jan 30, 2021)

    This release includes a wide range of performance improvements and internal refactoring that will enable the addition of an "aggregation" feature in subsequent releases (not yet available). Performance of a sample workflow that checks status, runs, and submits a FlowProject with 1000 jobs, 3 operations, and 2 label functions has improved roughly 4x compared to the 0.11.0 release.

    Added

    • Code is formatted with black and isort pre-commit hooks (#365).
    • Add official support for Python version 3.9 (#365).
    • Documentation has been added for all public classes and methods (#387, #389).
    • Added internal support for aggregates of jobs (#334, #348, #351, #364, #383, #390, #415, #422, #430).
    • Added code coverage to continuous integration (#405).

    Changed

    • Command line interface always uses --job-id instead of --jobid (#363, #386).
    • CPUEnvironment and GPUEnvironment classes are deprecated (#381).
    • Docstrings are now written in numpydoc style (#392).
    • Default environment for the University of Minnesota Mangi cluster changed from SLURM to Torque (#393).
    • Run commands are evaluated lazily (#70, #396).
    • Deprecated method export_job_statuses (#402).
    • Improved internal caching of scheduler status (#410).
    • Refactored status fetching code (#368, #417).
    • Optimization: Directives are no longer deep-copied (#420, #421).
    • The use_buffered_mode config option is deprecated. Buffering is always internally enabled (#425).
    • Evaluate directives when called instead of when defined (#398, #402).
    • Various internal refactorings and optimizations (#371, #373, #374, #375, #376, #377, #378, #379, #380, #400, #410, #416, #423, #426).
    • Scheduler is now an abstract base class (#426).
    • flow.scheduling.fakescheduler has been renamed to flow.scheduling.fake_scheduler (#426).
    • Arguments to submit have been changed for all scheduler classes (#426).
    • Python 3.6 is only tested with oldest dependencies (#436).
    • Drop support for tqdm versions older than 4.48.1 (#436, #440).
    • Drop support for Jinja2 versions older than 2.10.0 (#436).

    Fixed

    • Ensure that directives are always evaluated before running or submitting (#408, #409).
    • Cache the fully qualified domain name during environment detection to fix a performance issue on macOS (#339, #394).
    • Ensure that next CLI command displays eligible jobs for the exact operation name provided (#443).

    Removed

    • Removed the deprecated method flow.util.misc.write_human_readable_statepoints (#397).
    • Removed the deprecated argument --no-parallelize (#424).
    • Removed the deprecated env argument from submission methods (#424).
    • flow.render_status.Renderer class has been removed. FlowProject.print_status no longer returns the renderer (#426).
    • Removed deprecated status.py module (#426).
    • Removed the --test argument from FlowProject.submit (#439).
  • v0.11.0(Oct 9, 2020)

    Added

    • Added classes _Directives and _Directive that serve as a smart mapping for directives specified by the environment or user (#265, #283).
    • Added support for pre-commit hooks (#333).
    • Add environment profile for University of Minnesota, Minnesota Supercomputing Institute, Mangi supercomputer (#353).

    Changed

    • Make FlowCondition class private (#307, #315).
    • Deprecate JobOperation class, make SubmissionJobOperation a private class and deprecate the following methods of FlowProject: script, run_operations, submit_operations, next_operations. (#313)
    • Deprecate the following methods: FlowGroup.eligible, FlowGroup.complete, BaseFlowOperation.eligible, BaseFlowOperation.complete (#337).

    Fixed

    • Serial execution on Summit correctly counts total node requirements (#342).
    • Fixed performance regression in job submission in large workspaces (#354).

    Removed

    • Drop support for Python 3.5 (#305). The signac project will follow the NEP 29 deprecation policy going forward.
    • Remove the deprecated methods always, make_bundles, and JobOperation.get_id (#312).
  • v0.10.1(Aug 21, 2020)

    Fixed

    • Fix issue with the submission of bundled operations on cluster environments that do not allow slashes ('/') in cluster scheduler job names (#343).
  • v0.10.0(Jun 27, 2020)

    Added

    • Add FlowGroup (one or more operations can be grouped within an execution environment) (#114).
    • Add official support for University of Michigan Great Lakes cluster (#185).
    • Add official support for Bridges AI cluster (#222).
    • Add IgnoreConditions option for submit(), run() and script() (#38).
    • Add pytest support for Testing Framework (#227, #232).
    • Add markdown and html format support for print_status() (#113, #163).
    • Add memory flag option for default Slurm scheduler (#256).
    • Add optional environment variable to specify submission script separator (#262).
    • Add status_parallelization configuration to specify the parallelization used for fetching status (#264, #271).

    Changed

    • Raises ValueError when an operation function is passed to FlowProject.pre() and FlowProject.post(), or a non-operation function passed to FlowProject.pre.after() (#248, #249).
    • The option to provide the env argument to submit and submit_operations has been deprecated (#245).
    • The command line option --cmd for script has been deprecated and will trigger a DeprecationWarning upon use until removed (#243, #218).
    • Raises ValueError when --job-name is passed by the user because that interferes with status checking (#164, #241).
    • Submitting with --memory no longer assumes a unit of gigabytes on Bridges and Comet clusters (#257).
    • Buffering is enabled by default, improving the performance of status checks (#273).
    • Deprecate the use of no_parallelize argument while printing status (#264, #271).
    • Submission via the command-line interface now calls the FlowProject.submit function instead of bypassing it for FlowProject.submit_operations (#238, #286).
    • Updated Great Lakes GPU request syntax (#299).

    Fixed

    • Ensure that label names are used when displaying status (#263).
    • Fix node counting for large resource sets on Summit (#294).

    Removed

    • Removed ENVIRONMENT global variable in the flow.environment module (#245).
    • Removed vendored tqdm module and replaced it with a requirement (#247).
  • v0.9.0(Jan 9, 2020)

    Added

    • Add official support for Python version 3.8 (#190, #210).
    • Add descriptive error message when tag is not set and cannot be autogenerated for conditions (#195).
    • Add "fork" directive to enforce the execution of operations within a subprocess (#159).
    • Operation graph detection based on function comparison (#178).
    • Exceptions raised during operations always show tracebacks of user code (#169, #171).

    Changed

    • Raise a warning when a condition's tag is not set and raise an error if this occurs during graph detection (#195).
    • Raise errors if a forked process or @cmd operation returns a non-zero exit code. (#170, #172).

    Removed

    • Drop support for Python version 2.7 (#157, #158, #201).
    • The "always" condition has been deprecated and will trigger a DeprecationWarning upon use until removed (#179).
    • Removed deprecated UnknownEnvironment in favor of StandardEnvironment (#204).
    • Removed support for decommissioned INCITE Titan and Eos computers (#204).
    • Removed support for the legacy Python-based submission script generation (#200).
    • Removed legacy compatibility layers for Python 2, signac < 1.0, and soft dependencies (#205).
    • Removed deprecated support for implied operation names with the run command (#205).
  • v0.8.0(Sep 1, 2019)

    Added

    • Add feature for integrated profiling of status updates (status --profile) to aid with the optimization of a FlowProject implementation (#107, #110).
    • The status view is generated with Jinja2 templates and thus more easily customizable (#67, #111).
    • Automatically show an overview of the number of eligible jobs for each operation in status view (#134).
    • Allow the provision of multiple operation-functions to the pre.after and *.copy_from conditions (#120).
    • Add option to specify the operation execution order (#121).
    • Add a testing module to easily initialize a test project (#130).
    • Enable option to always show the full traceback with show_traceback = on within the [flow] section of the signac configuration (#61, #144).
    • Add full launcher support for job submission on XSEDE Stampede2 for large parallel single processor jobs (#85, #91).

    Fixes

    • Both the nranks and omp_num_threads directives properly support callables (#118).
    • Show submission error messages in combination with a TORQUE scheduler (#103, #104).
    • Fix issue that caused the "Fetching operation status" progress bar to be inaccurate (#108).
    • Fix erroneous line in the torque submission template (#126).
    • Ensure default parameter range detection in status printing succeeds for nested state points (#154).
    • Fix issue with the resource set calculation on INCITE Summit (#101).

    Changed

    • Packaged environments are now available by default. Set import_packaged_environments = off within the [flow] section of the signac configuration to revert to previous behavior.

    • The following methods of the FlowProject class have been deprecated and will trigger a DeprecationWarning upon use until their removal:

      • classify (use labels() instead)
      • next_operation (use next_operations() instead)
      • export_job_stati (replaced by export_job_statuses)
      • eligible_for_submission (removed without replacement)
      • update_aliases (removed without replacement)
    • The support for Python version 2.7 is deprecated.

    Removed

    • The support for Python version 3.4 has been dropped.
    • Support for signac version 0.9 has been dropped.
  • v0.7.1(Mar 25, 2019)

    Added

    • Add function to automatically print all varying state point parameters in the detailed status view triggered by providing option -p/--parameters without arguments (#19, #87).
    • Add clear environment notification when submitting job scripts (#43, #88).

    Fixes

    • Fix issue where the scheduler status of job-operations would not be properly updated for ineligible operations (#96).

    Fixes (compute environments)

    • Fix issue with the TORQUE scheduler that occurred when there was no job scheduled at all on the system (for any user) (#92, #93).

    Changed

    • The performance of status updates has been significantly improved (up to a factor of 1,000 for large data spaces) by applying a more efficient caching strategy (#94).
  • v0.7.0(Mar 14, 2019)

    Added

    • Add legend explaining the scheduler-related symbols to the detailed status view (#68).
    • Allow the specification of the number of tasks per resource set and additional jsrun arguments for Summit scripts.

    Fixes (general)

    • Fixes issue where callable cmd-directives were not evaluated (#47).
    • Fixes issue where the source file of wrapped functions was not determined correctly (#55).
    • Fix a Python 2.7 incompatibility and another unrelated issue with the TORQUE scheduler driver (#54, #81).
    • Fixes issue where providing the wrong argument type to Project.submit() would go undetected and lead to unexpected behavior (#58).
    • Fixes issue where using the buffered mode would lead to confusing error messages when condition-functions would raise an AttributeError exception.
    • Fixes issue with erroneous unused-directive-keys-warning.

    Fixes (compute environments)

    • Fixes issues with the Summit environment resource set calculation for parallel operations under specific conditions (#63).
    • Fix the node size specified in the template for the ORNL Eos system (#77).
    • Fixes issue with a missing --gres directive when using the GPU-shared partition on the XSEDE Bridges system (#59).
    • Fixed University of Michigan Flux hostname pattern to ignore the Flux Hadoop cluster (#82).
    • Remove the Ascent environment (host decommissioned).

    Note: The official support for Python 3.4 will be dropped beginning with version 0.8.0.

Owner
Glotzer Group
We develop molecular simulation tools to study the self-assembly of complex materials and explore matter at the nanoscale.