This tool parses log data and allows defining analysis pipelines for anomaly detection.

Overview

logdata-anomaly-miner

This tool parses log data and allows defining analysis pipelines for anomaly detection. It was designed to run the analysis with limited resources and the lowest possible permissions, making it suitable for production server use.

AECID Demo – Anomaly Detection with aminer and Reporting to IBM QRadar

Requirements

In order to install logdata-anomaly-miner, a Linux system with Python >= 3.6 is required. Debian-based distributions are currently recommended.

See requirements.txt for further module dependencies.

Installation

Debian

There are Debian packages for logdata-anomaly-miner in the official Debian/Ubuntu repositories.

apt-get update && apt-get install logdata-anomaly-miner

From source

The following commands will install the latest stable release:

cd $HOME
wget https://raw.githubusercontent.com/ait-aecid/logdata-anomaly-miner/main/scripts/aminer_install.sh
chmod +x aminer_install.sh
./aminer_install.sh

Docker

For installation with Docker see: Deployment with Docker

Getting started

Here are some resources to read in order to get started with configurations:

Publications

Publications and talks:

A complete list of publications can be found at https://aecid.ait.ac.at/further-information/.

Contribution

We're happily taking patches and other contributions. Please see the following links for how to get started:

Bugs

If you encounter any bugs, please create an issue on GitHub.

Security

If you discover any security-related issues, read SECURITY.md first and then report them.

License

GPL-3.0

Comments
  • Multiline support

    Since issue 372 was closed, I open a new issue for multiline support. See https://github.com/ait-aecid/logdata-anomaly-miner/issues/372

    As I mentioned in the issue, it would be good to have an optional EOL parameter in the config to support simple multiline logs that are clearly separable, e.g., by \n\n if that sequence otherwise does not occur. We could also think about supporting more advanced multiline logs, in particular JSON-formatted logs where each JSON object spans several lines rather than a single line. This could be solved by counting brackets, i.e., the ByteStreamAtomizer increases a counter (initially set to 0) for every "{" and decreases it for every "}" (or any other user-defined characters), and passes a log_atom to the parser every time this counter reaches 0.
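
    A minimal sketch of the proposed bracket-counting approach (the function name and interface are made up for illustration; this is not the ByteStreamAtomizer itself):

```python
def split_json_atoms(stream: bytes, open_ch=ord("{"), close_ch=ord("}")):
    """Yield one atom per balanced {...} block, as proposed for multiline JSON.

    Note: a production version would also have to ignore braces inside quoted
    strings; this sketch only demonstrates the counter idea.
    """
    atoms = []
    depth = 0
    start = None
    for i, byte in enumerate(stream):
        if byte == open_ch:
            if depth == 0:
                start = i
            depth += 1
        elif byte == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                atoms.append(stream[start:i + 1])
                start = None
    return atoms

# Two JSON objects, each spanning several lines, become two log atoms.
atoms = split_json_atoms(b'{\n  "a": {"b": 1}\n}\n{\n  "a": {"b": 2}\n}\n')
```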

    enhancement 
    opened by landauermax 15
  • Allowlist and blocklist for detector path lists

    allowlisted_paths in the ECD should be named blocklisted_paths, since these paths are not considered for detection.

    allowlisted_paths should also exist, but do the opposite: analysis should only be carried out when the log atom's match dictionary contains one of the allowlisted_paths.

    The attribute paths should overrule these lists.

    This feature should be available for all detectors that may be analyzing all available parser matches, such as the VTD.
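
    A possible reading of these semantics, sketched as a plain filter function (the function and parameter names are hypothetical, taken from the issue text rather than any existing aminer API):

```python
def should_analyze(match_paths, paths=None,
                   allowlisted_paths=None, blocklisted_paths=None):
    """Decide whether a log atom's match paths should be analyzed."""
    if paths:  # an explicit paths attribute overrules both lists
        return any(p in match_paths for p in paths)
    if blocklisted_paths:  # blocklisted paths are not considered for detection
        match_paths = [p for p in match_paths if p not in blocklisted_paths]
    if allowlisted_paths is not None:  # analyze only if an allowlisted path occurs
        return any(p in allowlisted_paths for p in match_paths)
    return bool(match_paths)
```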

    enhancement 
    opened by landauermax 15
  • Fix import warnings

    /usr/lib/python3.6/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from spec or package, falling back on name and path

    return f(*args, **kwds)

    should not occur when running the aminer.

    bug 
    opened by 4cti0nfi9ure 15
  • %z makes parsing way too slow

    When using the %z in the parsing model (see slow.txt), I get around 50 lines per second. Without it I get around 1000 lines per second (see fast.txt). There is something wrong with parsing %z in the DateTimeModelElement.

    fast.txt slow.txt train.log config.py.txt
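
    One plausible explanation is the per-call cost of handling the timezone directive. Below is a sketch of a workaround, assuming the offset in the log data is constant: parse the offset once and reuse the tzinfo object. This is illustrative only and not the DateTimeModelElement internals.

```python
from datetime import datetime, timedelta, timezone

line = "2021-03-01 12:00:00 +0100"

# Baseline: let strptime evaluate the %z directive on every call.
full = datetime.strptime(line, "%Y-%m-%d %H:%M:%S %z")

# Workaround: parse the numeric offset once, reuse the tzinfo afterwards.
offset = line[-5:]                       # "+0100"
sign = -1 if offset[0] == "-" else 1
cached_tz = timezone(sign * timedelta(hours=int(offset[1:3]),
                                      minutes=int(offset[3:5])))

def parse_cached(text: str) -> datetime:
    # Parse only the date/time part and attach the precomputed offset.
    naive = datetime.strptime(text[:19], "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=cached_tz)

assert parse_cached(line) == full  # same instant, without re-parsing %z
```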

    bug high 
    opened by landauermax 12
  • added nullable functionality to JsonModelElements.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guide lines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Requests merges into the "development"-branch

    Fixes #1061 Fixes #1074

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 11
  • Create backups of persistency

    There should be a command line parameter that backs up the persistency at regular intervals. Also, there should be a remote control command that saves the persistency when executed.

    The persistency should be copied into a directory /var/lib/aminer/backup/yyyy-mm-dd-hh-mm-ss/...

    There should also be the possibility to restore backups, e.g., via remote control or config settings.
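
    A rough sketch of what such a backup step could look like (the function is hypothetical; neither the parameter nor the remote control command exists in aminer yet):

```python
import shutil
import time
from pathlib import Path

def backup_persistency(src="/var/lib/aminer",
                       dst_root="/var/lib/aminer/backup") -> Path:
    # Copy into a timestamped directory, e.g. .../backup/2022-05-17-12-00-00/
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S")
    target = Path(dst_root) / stamp
    # Skip the backup directory itself so backups do not nest recursively.
    shutil.copytree(src, target, ignore=shutil.ignore_patterns("backup"))
    return target
```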

    enhancement 
    opened by landauermax 11
  • Tabs in logs

    My log file contains tabulators (e.g. System name:\tTESTNAME). However, the byte strings in the parsing models cannot interpret these tabulators (\t): FixedDataModelElement('fixed1', b'System name:\t'),

    How can I make it possible for the tabs to be interpreted correctly?
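
    For what it's worth, a tab only matches if it reaches the parser as a real 0x09 byte. The stand-in below mimics a fixed-string byte match (it is not aminer's FixedDataModelElement) to show the difference between an actual tab and a literal backslash-t, which can sneak in via escaping:

```python
FIXED = b"System name:\t"   # \t inside a bytes literal is the tab byte 0x09

def matches(log_line: bytes) -> bool:
    # A fixed element matches when the data begins with the exact byte string.
    return log_line.startswith(FIXED)

ok = matches(b"System name:\tTESTNAME")    # real tab byte: matches
bad = matches(rb"System name:\tTESTNAME")  # raw literal backslash + t: no match
```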

    opened by tschohanna 10
  • Add overall output for aminer

    There should be a way to write everything that the aminer outputs to a file. For example, at the beginning of the config, a parameter StandardOutput: "/etc/aminer/output.txt" can be set, where all the output (anomalies, errors, etc.) is written in addition to the usual output components. By default, it should be None and not write anything.
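
    A sketch of how such a parameter could behave (class and parameter names are assumptions taken from the issue text, not existing aminer API):

```python
class TeeOutput:
    """Append every emitted message to a file in addition to normal output."""

    def __init__(self, standard_output=None):
        # None (the proposed default) disables the extra file copy.
        self.standard_output = standard_output

    def emit(self, message: str) -> str:
        if self.standard_output is not None:
            with open(self.standard_output, "a") as out:
                out.write(message + "\n")
        return message  # still handed on to the usual output components
```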

    enhancement 
    opened by landauermax 10
  • Warning if two detectors persist on same file

    It is possible to define two detectors of the same type that end up persisting to the same file. This can happen by accident, especially when the "Default" name is used. We should not prevent it completely, but at least print a warning when two or more detectors persist to the same file.
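
    The check could be as simple as a registry of claimed persistence files (the registry and function names below are hypothetical):

```python
import warnings

_claimed = {}  # persistence file -> name of the detector that claimed it first

def register_persistence(detector_name: str, persistence_file: str):
    """Warn when a second detector registers an already-claimed file."""
    if persistence_file in _claimed:
        warnings.warn(
            f"{detector_name} and {_claimed[persistence_file]} both persist "
            f"to {persistence_file}; their learned state will collide.")
    else:
        _claimed[persistence_file] = detector_name
```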

    enhancement 
    opened by landauermax 9
  • AtomFilterMatchAction YAML support

    There should be a way to use a MatchRule so that only logs that match are forwarded to a specific detector, using the AtomFilterMatchAction. This can be done in Python configs, but not in YAML configs. Also, tests and documentation are missing.

    enhancement high 
    opened by landauermax 8
  • Paths to JSON list elements

    I have this sample data:

    [email protected]:/home/ubuntu# cat file3.log 
    {"a": ["success", "a.png"]}
    {"a": ["success", "b.png"]}
    {"a": ["fail", "c.png"]}
    {"a": ["success", "c.png"]}
    

    The values in the list should be detected with a value detector. They should not be mixed, i.e., the first and second element in the list are independent.

    I use the following config to parse the file:

    LearnMode: True
    
    LogResourceList:
      - "file:///home/ubuntu/file3.log"
    
    Parser:  
           - id: x
             type: VariableByteDataModelElement
             name: 'x'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - x
    
    Input:
            timestamp_paths: None
            verbose: True
            json_format: True
    
    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/x'
              learn_mode: true
              persistence_id: test
    
    EventHandlers:
            - id: stpe
              json: true
              type: StreamPrinterEventHandler
    

    Note that I use a value detector on the list. The result is as follows:

    [email protected]:/home/ubuntu# cat /var/lib/aminer/NewMatchPathValueDetector/test 
    ["bytes:a.png", "bytes:c.png", "bytes:b.png"]
    

    Only the last value has been learned, but I also want to learn the first element in the array.

    I propose to model all elements of the lists as their own elements, so that the parser looks like this:

    Parser:
           - id: y
             type: FixedWordlistDataModelElement
             name: 'y'
             args:
               - 'success'
               - 'fail'
                 
           - id: x
             type: VariableByteDataModelElement
             name: 'x'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - y
                 - x
    

    and the analysis could look like this, where each element can be addressed individually by an analysis component:

    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/x'
              learn_mode: true
              persistence_id: test
    
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/y'
              learn_mode: true
              persistence_id: test
    

    The current implementation uses a single element to model all elements of the list. This can also be convenient and should be possible by introducing a new element called ListOfElements. It should parse any number of elements in the list with the specified parsing model element. For example, the list of elements here is a list of variable byte elements:

    Parser:
           - id: loe
             type: ListOfElements
             name: 'loe'
             args: z
                 
           - id: z
             type: VariableByteDataModelElement
             name: 'z'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - loe
    

    The ListOfElements element should then assign the index of the element in the JSON list at the end of the path. For example, the following paths can be used in the analysis section:

    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/loe/0'
              learn_mode: true
              persistence_id: test
    
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/loe/1'
              learn_mode: true
              persistence_id: test
    
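    The per-index bookkeeping this proposal implies can be sketched independently of aminer (the /model/loe path prefix follows the example above; the code is illustrative only, not a real ModelElement):

```python
import json
from collections import defaultdict

learned = defaultdict(set)  # parser path -> set of values seen at that path

def feed(log_line: bytes, base_path="/model/loe"):
    # Append the list index to the path, so /model/loe/0 and /model/loe/1
    # collect values independently instead of being mixed together.
    record = json.loads(log_line)
    for index, value in enumerate(record["a"]):
        learned[f"{base_path}/{index}"].add(value)

for line in [b'{"a": ["success", "a.png"]}',
             b'{"a": ["success", "b.png"]}',
             b'{"a": ["fail", "c.png"]}']:
    feed(line)
```
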
    enhancement medium 
    opened by landauermax 8
  • extended FrequencyDetector wiki tests.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guide lines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Requests merges into the "development"-branch

    Fixes #1008 Fixes #1009

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 0
  • fixed test26 so no fix definition number has to be added.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guide lines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Requests merges into the "development"-branch

    Fixes #1181

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 0
  • Random test fails when new detector is added

    When adding a new detector and running the tests, they usually fail at test26_filter_config_errors in YamlConfigTest.py because there is an integer that needs to be incremented. For example, see PR #1180, where this had to be fixed when adding a new detector. It is hard to spot why this test fails, as it has nothing to do with the added detector and is not an indicator of something that needs to be fixed. I therefore suggest modifying this test case so that it passes no matter what integer comes after the "definition" keyword. Adding new detectors in the future should then no longer require updating this test.
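
    One way to make the comparison tolerant (the message format here is assumed from the issue description, not copied from the test):

```python
import re

def normalize(message: str) -> str:
    # Replace the integer that follows the "definition" keyword with a
    # placeholder, so expected and observed messages compare equal no
    # matter how many detectors are defined.
    return re.sub(r'(definition\s*)\d+', r'\1<N>', message)

# Messages differing only in the definition count now compare equal.
same = normalize("Config-Error in definition 47") == \
       normalize("Config-Error in definition 48")
```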

    test medium 
    opened by landauermax 0
  • Add possibility to run some LogResources as json input and some as normal text input.

    LogResourceList:
    
       - url: "file:///var/log/apache2/access.log"
       - url: "unix:///var/lib/akafka/aminer.sock"
     type: json  # configures the ByteStream
     parser_id: kafka_audit_logs  # configures the associated parser
    
    
    Parser:
       - id: kafka_audit_logs
         type: AuditDingsParser
    
       - id: ApacheAccessModel
         start: true
    
    opened by ernstleierzopf 0
  • Shorten the build-time for docker builds

    Currently the complete Docker image is built at once, which takes a lot of time for each build. We could shorten the build time by inheriting from a pre-built image.

    enhancement 
    opened by whotwagner 0
Releases (V2.5.1)
  • V2.5.1 (May 17, 2022)

    Bugfixes:

    • EFD: Fixed problem that appears with empty windows
    • Fixed index out of range if matches are empty in JsonModelElement array.
    • EFD: Enabled immediate detection without training, if both limits are set
    • EFD: Fixed bug related to auto_include_flag
    • Remove spaces in aminer logo
    • ParserCounter: Fixed do_timer
    • Fixed code to allow the usage of AtomFilterMatchAction in yaml configs
    • Fixed JsonModelElement when json object is null
    • Fix incorrect message of charset detector
    • Fix match list handling for json objects

    Changes:

    • Added nullable functionality to JsonModelElements
    • Added include-directive to supervisord.conf
    • ETD: Output warning when count first exceeds range
    • EFD: Added option to output anomaly when the count first exceeds the range
    • VTD: Added variable type 'range'
    • EFD: Added the function reset_counter
    • EFD: Added option to set the lower and upper limit of the range interval
    • Enhance EFD to consider multiple time windows
    • VTD: Changed the value of the parameter num_updates_until_var_reduction that tracks all variables from False to 0.
    • PAD: Used the binom_test of the scipy package to check whether the model should be reinitialized when fewer anomalies occur than expected
    • Add ParsedLogAtom to aminer parser to ensure compatibility with lower versions
    • Added script to add build-id to the version-string
    • Support for installations from source in install-script
    • Fixed and standardized the persistence time of various detectors
    • Refactoring
    • Improve performance
    • Improve output handling
    • Improved testing
  • V2.5.0 (Dec 6, 2021)

    Bugfixes:

    • Fixed bug in YamlConfig

    Changes:

    • Added supervisord to docker
    • Moved unparsed atom handlers to analysis (yamlconfig)
    • Moved new_match_path_detector to analysis (yamlconfig)
    • Refactor: merged all UnparsedHandlers into one python-file
    • Added remotecontrol-command for reopening eventhandlers
    • Added config-parameters for logrotation
    • Improved testing
  • V2.4.2 (Nov 24, 2021)

    Bugfixes:

    • PVTID: Fixed output format of previously appeared times
    • VTD: Fixed bugs (static -> discrete)
    • VTD: Fixed persistency-bugs
    • Fixed %z performance issues
    • Fixed error where optional keys with an array type are not parsed when being null
    • Fixed issues with JsonModelElement
    • Fixed persistence handling for ValueRangeDetector
    • PTSAD: Fixed a bug that occurs when the ETD stops saving the values of one analyzed path
    • ETD: Fixed the problem when entries of the match_dictionary are not of type MatchElement
    • Fixed error where json data instead of array was parsed successfully.

    Changes:

    • Added multiple parameters to VariableCorrelationDetector
    • Improved VTD
    • PVTID: Renamed parameter time_window_length to time_period_length
    • PVTID: Added check if atom time is None
    • Enhanced output of MTTD and PVTID
    • Improved docker-compose-configuration
    • Improved testing
    • Enhanced PathArimaDetector
    • Improved documentation
    • Improved KernelMsgParsingModel
    • Added pretty print for json output
    • Added the PathArimaDetector
    • TSA: Added functionality to discard arima models with too few log lines per time step
    • TSA: improved confidence calculation
    • TSA: Added the option to force the period length
    • TSA: Automatic selection of the pause area of the ACF
    • Extended EximGenericParsingModel
    • Extended AudispdParsingModel
  • V2.4.1 (Jul 23, 2021)

    Bugfixes:

    • Fixed issues with array of arrays in JsonParser
    • Fixed problems with invalid json-output
    • Fixed ValueError in DTME
    • Fixed error with parsing floats in scientific notation with the JsonModelElement.
    • Fixed issue with paths in JsonModelElement
    • Fixed error with \x encoded json
    • Fixed error where EMPTY_ARRAY and EMPTY_OBJECT could not be parsed from the yaml config
    • Fixed a bug in the TSA when encountering a new event type
    • Fixed systemd script
    • Fixed encoding errors when reading yaml configs

    Changes:

    • Add entropy detector
    • Add charset detector
    • Add value range detector
    • Improved ApacheAccessModel, AudispdParsingModel
    • Refactoring
    • Improved documentation
    • Improved testing
    • Improved schema for yaml-config
    • Added EMPTY_STRING option to the JsonModelElement
    • Implemented check to report unparsed atom if ALLOW_ALL is used with data with a type other than list or dict
  • V2.4.0 (Jun 10, 2021)

    Bugfixes:

    • Fixed error in JsonModelElement
    • Fixed problems with umlauts in JsonParser
    • Fixed problems with the start element of the ElementValueBranchModelElement
    • Fixed issues with the stat and debug command line parameters
    • Fixed issues if POSIX ACLs are not supported by the filesystem
    • Fixed issues with output for non-ASCII characters
    • Modified kafka-version

    Changes:

    • Improved command-line options of the install script
    • Added documentation
    • Improved VTD CM-Test
    • Improved unit-tests
    • Refactoring
    • Added TSAArimaDetector
    • Improved ParserCount
    • Added the PathValueTimeIntervalDetector
    • Implemented offline mode
    • Added PCA detector
    • Added timeout-parameter to ESD
  • V2.3.1 (Apr 8, 2021)

  • V2.3.0 (Mar 31, 2021)

    Bugfixes:

    • Changed pyyaml-version to 5.4
    • NewMatchIdValueComboDetector: Fix to allow multiple values per id path
    • ByteStreamLineAtomizer: fixed encoding error
    • Fixed too many open directory-handles
    • Added close() function to LogStream

    Changes:

    • Added EventFrequencyDetector
    • Added EventSequenceDetector
    • Added JsonModelElement
    • Added tests for Json-Handling
    • Added command line parameter for update checks
    • Improved testing
    • Split yaml-schemas into multiple files
    • Improved support for yaml-config
    • YamlConfig: set verbose default to true
    • Various refactoring
  • V2.2.3 (Feb 5, 2021)

  • V2.2.2 (Jan 29, 2021)

  • V2.2.1 (Jan 26, 2021)

    Bugfixes:

    • Fixed warnings due to files in the persistency directory
    • Fixed ACL-problems in dockerfile and autocreate /var/lib/aminer/log

    Changes:

    • Added simple test for dockercontainer
    • Negated the result of the timeout command: 1 is okay, 0 must be an error
    • Added bullseye-tests
    • Make tmp-dir in debian-bullseye-test and debian-buster-test unique
  • V2.2.0 (Dec 23, 2020)

    Changes:

    • Added Dockerfile
    • Added checks for ACLs of the persistency directory
    • Added VariableCorrelationDetector
    • Added tool for managing multiple persistency files
    • Added suppress-list for output
    • Added suspend-mode to remote-control
    • Added requirements.txt
    • Extended documentation
    • Extended yaml-configuration-support
    • Standardize command line parameters
    • Removed --Foreground CLI parameter
    • Fixed security warnings by removing functions that allow race conditions
    • Refactoring
    • Ethically correct naming of variables
    • Enhanced testing
    • Added statistic outputs
    • Enhanced status info output
    • Changed global learn_mode behavior
    • Added RemoteControlSocket to yaml-config
    • Reimplemented the default mailnotificationhandler

    Bugfixes:

    • Fixed typos in documentation
    • Fixed issue with the AtomFilter in the yaml-config
    • Fixed order of ETD in yaml-config
    • Fixed various issues in persistency
  • V2.1.0 (Nov 5, 2020)

    • Changes:
      • Added VariableTypeDetector, EventTypeDetector and EventCorrelationDetector
      • Added support for unclean format strings in the DateTimeModelElement
      • Added timezones to the DateTimeModelElement
      • Enhanced ApacheAccessModel
      • Yamlconfig: added support for kafka stream
      • Removed cpu limit configuration
      • Various refactoring
      • Yamlconfig: added support for more detectors
      • Added new command-line-parameters
      • Renamed executables to aminer.py and aminerremotecontrol.py
      • Run aminer in foreground mode by default
      • Added various unit-tests
      • Improved yamlconfig and checks
      • Added start-config for parser to yamlconfig
      • Renamed config templates
      • Removed imports from __init__.py for better modularity
      • Created AnalysisComponentsPerformanceTests for the EventTypeDetector
      • Extended demo-config
      • Renamed whitelist to allowlist
      • Added warnings for non-existent resources
      • Changed default of auto_include_flag to false
    • Bugfixes:
      • Fixed some exit() in forks
      • Fixed debian files
      • Fixed JSON output of the AffectedLogAtomValues in all detectors
      • Fixed normal output of the NewMatchPathValueDetector
      • Fixed recurring alerting in MissingMatchPathValueDetector
  • V2.0.2 (Jul 17, 2020)

    • Changes:
      • Added help parameters
      • Added help-screen
      • Added version parameter
      • Added path and value filter
      • Change time model of ApacheAccessModel for arbitrary time zones
      • Update link to documentation
      • Added SECURITY.md
      • Refactoring
      • Updated man-page
      • Added unit-tests for loadYamlconfig
    • Bugfixes:
      • Fixed header comment type in schema file
      • Fix debian files
  • V2.0.1 (Jun 24, 2020)

    • Changes:
      • Updated documentation
      • Updated testcases
      • Updated demos
      • Updated debian files
      • Added copyright headers
      • Added executable bit to AMiner
  • V2.0.0 (May 29, 2020)

    • Changes:
      • Updated documentation
      • Added functions getNameByComponent and getIdByComponent to AnalysisChild.py
      • Update DefaultMailNotificationEventHandler.py to python3
      • Extended AMinerRemoteControl
      • Added support for configuration in yaml format
      • Refactoring
      • Added KafkaEventHandler
      • Added JsonConverterHandler
      • Added NewMatchIdValueComboDetector
      • Enabled multiple default timestamp paths
      • Added debug feature ParserCount
      • Added unit and integration tests
      • Added installer script
      • Added VerboseUnparsedHandler
    • Bugfixes including:
      • Fixed dependencies in Debian packaging
      • Fixed typo in various analysis components
      • Fixed import of ModelElementInterface in various parsing components
      • Fixed issues with byte/string comparison
      • Fixed issue in DecimalIntegerValueModelElement when parsing integers including sign and padding character
      • Fixed unnecessarily long blocking time in SimpleMultisourceAtomSync
      • Changed minimum matchLen in DelimitedDataModelElement to 1 byte
      • Fixed timezone offset in ModuloTimeMatchRule
      • Minor bugfixes
Owner
AECID
Automatic Event Correlation for Incident Detection