SIEM Logstash parsing for more than hundred technologies

Overview

LogIndexer Pipeline

Logstash Parsing Configurations for Elastisearch SIEM and OpenDistro for Elasticsearch SIEM

Why this project exists

The overhead of implementing Logstash parsing and applying Elastic Common Schema (ECS) across audit, security, and system logs can be a large drawback when using Elasticsearch as a SIEM (Security Incident and Event Management). The Cargill SIEM team has spent significant time on developing quality Logstash parsing processors for many well-known log vendors and wants to share this work with the community. In addition to Logstash processors, we have also included log collection programs for API-based log collection, as well as the setup scripts used to generate our pipeline-to-pipeline architecture.

Quick start Instructions

"Quick start" mostly depends on how your Logstash configuration is set up. If you have your own setup already established, it might be best to use the processors that apply to your organization's log collection (found in the "config" directory). If you are seeking to use the architecture in this repo, consult the README found in the build_scripts directory. We will be adding an elaborate setup guide soon.

Contributions

We welcome and encourage individual contributions to this repo. Please see the Contribution.md guide in the root of the repo. Please note that we reserve the right to close pull requests or issues that appear to be out of scope for our project, or for other reasons not specified.

Questions, Comments & Expected Level of Attention

Please create an issue and someone will try to respond to your issue within 5 business days. However, it should be noted that while we will try revisit the repository semi-regularly, we are not held beholden to this response time (life happens). We welcome other individuals' answers and input as well.

Licensing

Apache-2.0

Comments
  • improved cisco ACI processor

    improved cisco ACI processor

    Improved the cisco aci processor with the following changes:

    1. simplified grok parsing
    2. removed complex logic used to detected event and error messages
    3. fixed broken parsing of the device hostname sending logs
    4. tmp.rule does NOT rapresent an username , it's instead the even.reason as described by cisco, - The action or condition that caused the event, such as a component failure or a threshold crossing.

    sample messages used for testing

    <186>Dec 08 21:20:20.614 ABC-DCA-NPRD-ACILEF-104 %LOG_LOCAL7-2-SYSTEM_MSG [F0532][raised][interface-physical-down][critical][sys/phys-[eth1/47]/phys/fault-F0532] Port is down, reason being suspended(no LACP PDUs)(connected), used by EPG on node 104 of fabric ACI Fabric1 with hostname ABC-DCA-NPRD-ACILEF-104
    
    <190>Nov 24 18:20:53.237 ABC-DCB-ACIAPC-003 %LOG_LOCAL7-6-SYSTEM_MSG [E4206143][transition][info][fwrepo/fw-aci-apic-dk9.5.2.6e] Firmware aci-apic-dk9.5.2.6e created
    
    opened by anubisg1 3
  • [Help / Documentation] - how to classify incoming syslog messages

    [Help / Documentation] - how to classify incoming syslog messages

    As per title, how would we classify incoming syslog messages so that they end up in the proper process pipeline?

    Let's take a common use case where in the network we have Cisco IOS router and switches , Cisco ACI , Cisco WLC and ISE, then Checkpoint Firewalls , F5 load balancers etc ...

    generally those devices would all be sending logs to the syslog server IP port 514. but how would we classify from where each message is coming from in order to send it to the specific processor ?

    are we supposed to setup a different input queu for each processor (for example, different port ofn the syslog server so that for example, ACI goes to 192.168.10.10 port 5514 whole Checkpoint on port 5515? )

    or is there an ip filter that says, if source IP is X send to ACI processor if Y send to checkpoint ..

    or what other options are there?

    question 
    opened by anubisg1 2
  • host split enrichment error

    host split enrichment error

    For certain hostnames the host split enrichment is causing the pipeline to be blocked until grok timesout.

    [2022-06-10T15:54:58,563][WARN ][org.logstash.plugins.pipeline.PipelineBus][processor] Attempted to send event to 'enrichments' but that address was unavailable. Maybe the destination pipeline is down or stopping? Will Retry. [2022-06-10T15:57:22,451][WARN ][logstash.filters.grok ][enrichments] Timeout executing grok '^(?<[host][tmp]>.?).(?<[host][domain]>.?)$' against field '[host][hostname]' with value 'abc-name123-xyz.domain.com'!

    opened by nnovaes 2
  • Fix deprecation warnings

    Fix deprecation warnings

    User Story - details

    For translate we should use source, target instead of field, destination. On boot logstash 15 shows these warnings:

    [2021-11-09T16:53:33,518][WARN ][logstash.filters.translate] You are using a deprecated config setting "destination" set in translate. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. Use `target` option instead. If you have any questions about this, please visit the #logstash channel on freenode irc.
    [2021-11-09T16:53:33,519][WARN ][logstash.filters.translate] You are using a deprecated config setting "field" set in translate. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. Use `source` option instead. If you have any questions about this, please visit the #logstash channel on freenode irc.
    

    Atleast upto Logstash 13 new fields are not supported so let's make this change when we upgrade.

    Tasks

    • [ ]
    • [ ]

    X-Reference Issues

    Related Code

    << Any related code here... >>
    
    opened by KrishnanandSingh 2
  • native vlan mismatch and other improvements

    native vlan mismatch and other improvements

    Description

    • Parsing for Native VLAN mismatch error messages 2021-10-14T13:28:06.497Z {name=abc.com} <188>132685: Oct 14 21:28:07.975 GMT: %CDP-4-NATIVE_VLAN_MISMATCH: Native VLAN mismatch discovered on FastEthernet0/1 (1), with xyz GigabitEthernet1/0/1 (36).
    • Lowercase [actual_msg] field
    • fix typo on timestamp
    • add the timezone to [tmp][devicetimestamp]
    • removed the old parser code for native vlan mismatch
    • removed a catch all condition in the old parser
    • lowercase [rule.category]
    opened by nnovaes 2
  • Feature Request: Add known applications + risk score field based off destination.port fields

    Feature Request: Add known applications + risk score field based off destination.port fields

    User Story - details

    As a SIEM engineer I want to know port numbers associated with the destination.port field. This will allow me to quickly identify potential applications communicating on the session and also the risk of the traffic Im observing

    Tasks

    • Create a port lookup translation.
    • Add risk category score to application (scale of 1-10 or severity name).

    Examples:

    3389 -> Remote Desktop Protocol (high risk)
    22 - Secure Shell (high risk)
    3306 - MySQL (medium risk)
    6881-6889 - Bit Torrent (high risk)
    
    opened by ryanpodonnell1 2
  • Cisco IOS (cisco.router and cisco.switch) new rules

    Cisco IOS (cisco.router and cisco.switch) new rules

    Description

    new parsing rules for cisco.router and cisco.switch. The old version of this processor needs some rework. However, there are functioning bits of it that i have preserved, since they kind of work. the new rules provide some good foundation for future "full" parsing and also covers bgp and interface up/down msgs. the lookup database for translate filters is static.

    opened by nnovaes 2
  • Update syslog_log_security_sdwan.app.conf

    Update syslog_log_security_sdwan.app.conf

    Description

    These updates correct assignment of versa fields to the ECS model. It also adds back versa specific fields that do not map to ECS into a separate [labels][all] field that works like tags. I couldn't find clean way to implement it without using the add_tag command, so i have saved the event tags to another field and then restored back

    @Akhila-Y please review as well.

    opened by nnovaes 1
  • added space, testing new IDE

    added space, testing new IDE

    Description

    Please provide a description of your proposed changes - providing obfuscated log/code examples is highly encouraged.

    Related Issues

    Are there any Issues to this PR?

    Todos

    Are there any additional items that must be completed before this PR gets merged in?

    • [ ]
    • [ ]
    opened by MehaSal 1
  • added new ECS fields to .csv file

    added new ECS fields to .csv file

    Description

    Please provide a description of your proposed changes - providing obfuscated log/code examples is highly encouraged.

    Related Issues

    Are there any Issues to this PR?

    Todos

    Are there any additional items that must be completed before this PR gets merged in?

    • [ ]
    • [ ]
    opened by MehaSal 1
  • added missing fields for coverge reporting to aws cloudtrail

    added missing fields for coverge reporting to aws cloudtrail

    Description

    Please provide a description of your proposed changes - providing obfuscated log/code examples is highly encouraged.

    Related Issues

    Are there any Issues to this PR?

    Todos

    Are there any additional items that must be completed before this PR gets merged in?

    • [ ]
    • [ ]
    opened by MehaSal 1
  • [[enrichments]>worker22] ruby - Ruby exception occurred: no implicit conversion of nil into String

    [[enrichments]>worker22] ruby - Ruby exception occurred: no implicit conversion of nil into String

    Describe the bug

    [ERROR] 2022-11-26 07:54:05.540 [[enrichments]>worker14] ruby - Ruby exception occurred: no implicit conversion of nil into String {:class=>"TypeError", :backtrace=>["(ruby filter code):68:in `block in filter_method'", "org/jruby/RubyArray.java:1865:in `each'", "(ruby filter code):67:in `block in filter_method'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-ruby-3.1.8/lib/logstash/filters/ruby.rb:96:in `inline_script'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-ruby-3.1.8/lib/logstash/filters/ruby.rb:89:in `filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159:in `do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:178:in `block in multi_filter'", "org/jruby/RubyArray.java:1865:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:175:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:134:in `multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:301:in `block in start_workers'"]}
    

    X-Reference issues

    (Cross reference any user stories that this bug might be affecting)

    Steps To Reproduce

    start the enrichment pipeline. I'm using logstash 8.5.2

    Expected behavior

    no error should be seen

    Additional context

    The following components in enrichment make use of ruby filer, but i don't understand what is the culprit

    ./02_ecs_data_type.conf
    ./04_timestamp.conf
    ./11_related_hosts.conf
    ./12_related_user.conf
    ./13_related_ip.conf
    ./14_related_hash.conf
    ./16_related_mac.conf
    ./93_mitre.conf
    ./94_remove_empty_n_truncate.conf
    
    bug wontfix 
    opened by anubisg1 2
  • cisco processor fails because of missing hostname and lowercase date

    cisco processor fails because of missing hostname and lowercase date

    I'm working with syslog_audit_cisco.switch.conf and i found the following issues:

    1. the syslog message is assumed here https://github.com/Cargill/OpenSIEM-Logstash-Parsing/blob/1.0/config/processors/syslog_audit_cisco.switch.conf#L52 as
      # {timesdtamp} {facility} {severity} {mnemonic} {description}
      # seq no:timestamp: %facility-severity-MNEMONIC:description
    

    in reality most people would configure "logging origin-id hostname" which will change the log format into

      # {hostname} {timesdtamp} {facility} {severity} {mnemonic} {description}
      # seq no: hostname: timestamp: %facility-severity-MNEMONIC:description
    
    1. the parser at line https://github.com/Cargill/OpenSIEM-Logstash-Parsing/blob/1.0/config/processors/syslog_audit_cisco.switch.conf#L33 is modifying the hostname field before that field is parsed (maybe this is assumed from kafka, instead of being taken from the logs?

    2. in line https://github.com/Cargill/OpenSIEM-Logstash-Parsing/blob/1.0/config/processors/syslog_audit_cisco.switch.conf#L48 the message is converted to lower case, but that causes date parse failures later on, becuase of case missmatch .

    Nov 17 11:44:46.490 UTC matches, but when i have nov 17 11:44:46.490 utc it fails on the date parsing here: https://github.com/Cargill/OpenSIEM-Logstash-Parsing/blob/1.0/config/processors/syslog_audit_cisco.switch.conf#L77

    Sample log entry for reference:

    <14>4643: Switch-core01: Nov 17 11:44:46.490 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/27, changed state to up

    opened by anubisg1 0
  • GeoLitePrivate2-City.mmdb doesn't exist

    GeoLitePrivate2-City.mmdb doesn't exist

    to use the geoip enrichment, you need to files, specifically

              database => "/mnt/s3fs_geoip/GeoLite2-City.mmdb"
              database => "/mnt/s3fs_geoip/GeoLitePrivate2-City.mmdb"
    

    unfortunately seems like GeoLitePrivate2-City.mmdb doesn't exist anywhere in the internet and maxmind only provides

    • GeoLite2-ASN.mmdb
    • GeoLite2-City.mmdb
    • GeoLite2-Country.mmdb

    i'd expect that either more information on where to find GeoLitePrivate2-City.mmdb is added to the documentation or the enrichment pipeline is updated to function without that file

    documentation question 
    opened by anubisg1 3
  • Validate ECS fields

    Validate ECS fields

    User Story - details

    There should be an enrichment checking that only permitted values are stored in ECS fields that have a predefined set of values, so those fields can be compliant with ECS. See https://www.elastic.co/guide/en/ecs/1.9/ecs-event.html for more info. I believe event.xyz are the only fields that have their values defined. If that's true the sample code below should take care of doing this validation.

    Tasks

    • [ ]
    • [ ]

    X-Reference Issues

    Related Code

    the sample configuration below picks the event.type value that came from the processors and populates ecs_status with valid or event.type-invalid_field_value. therefore, if the ecs_status is not valid, it will add a tag that will have event.type-invalid_field_value. i.e. if event.type is "process", because "process" is not among the allowed values for event.type, a event.type-invalid_field_value: process will be added.

     translate {
                field => "event.type"
                dictionary => [
                "access", "valid", 
                "admin", "valid", 
                "allowed", "valid", 
                "change", "valid", 
                "connection", "valid", 
                "creation", "valid", 
                "deletion", "valid", 
                "denied", "valid", 
                "end", "valid", 
                "error", "valid", 
                "group", "valid", 
                "info", "valid", 
                "installation", "valid", 
                "protocol", "valid", 
                "start", "valid", 
                "user", "valid"
                ]
                exact => true
                # [field]-[error]
                fallback => "event.type-invalid_field_value"
                destination => "ecs_status"
            }
        if [ecs_status] !~ "valid" {
            mutate {
                add_tag => [ "%{ecs_status}: %{event.type}" ]
                remove_field => [ "ecs_status", "event.type"]
            }
        }
    
        #EVENT.CATEGORY
        translate {
                field => "event.category"
                dictionary => [
                "authentication", "valid", 
                "configuration", "valid", 
                "driver", "valid", 
                "database", "valid", 
                "file", "valid", 
                "host", "valid", 
                "iam", "valid", 
                "intrusion_detection", "valid", 
                "malware", "valid", 
                "network", "valid", 
                "package", "valid", 
                "process", "valid", 
                "web", "valid"
                ]
                exact => true
                # [field]-[error]
                fallback => "event.category-invalid_field_value"
                destination => "ecs_status"
            }
        if [ecs_status] !~ "valid" {
            mutate {
                add_tag => [ "%{ecs_status}: %{event.category}" ]
                remove_field => [ "ecs_status", "event.category"]
    
            }
        }
    
        # event.kind
         translate {
                field => "event.kind"
                dictionary => [
                "alert", "valid", 
                "event", "valid", 
                "metric", "valid", 
                "state", "valid", 
                "pipeline_error", "valid", 
                "signal", "valid"
                ]
                exact => true
                # [field]-[error]
                fallback => "event.kind-invalid_field_value"
                destination => "ecs_status"
            }
        if [ecs_status] !~ "valid" {
            mutate {
                add_tag => [ "%{ecs_status}: %{event.kind}" ]
                remove_field => [ "ecs_status", "event.kind"]
    
            }
        }
    
    
        # event.outcome
         translate {
                field => "event.outcome"
                dictionary => [
                "failure", "valid", 
                "success", "valid", 
                "unknown", "valid"
                ]
                exact => true
                # [field]-[error]
                fallback => "event.outcome-invalid_field_value"
                destination => "ecs_status"
            }
        if [ecs_status] !~ "valid" {
            mutate {
                add_tag => [ "%{ecs_status}: %{event.outcome}" ]
                remove_field => [ "ecs_status", "event.outcome"]
    
            }
        }
    
    opened by nnovaes 0
Releases(v0.1-beta)
  • v0.1-beta(May 19, 2021)

    This release lack an elaborate usage documentation so marking this as beta. Users can still work with it by going through the python script. Soon documentation would be added.

    Source code(tar.gz)
    Source code(zip)
Owner
Working to nourish the world. Committed to helping the world thrive
We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.

Multi-Modal Self-Supervision using GDT and StiCa This is an official pytorch implementation of papers: Multi-modal Self-Supervision from Generalized D

Facebook Research 42 Dec 09, 2022
Winning solution of the Indoor Location & Navigation Kaggle competition

This repository contains the code to generate the winning solution of the Kaggle competition on indoor location and navigation organized by Microsoft

Tom Van de Wiele 62 Dec 28, 2022
Streamlit App For Product Analysis - Streamlit App For Product Analysis

Streamlit_App_For_Product_Analysis Здравствуйте! Перед вами дашборд, позволяющий

Grigory Sirotkin 1 Jan 10, 2022
PyTorch implementation of SwAV (Swapping Assignments between Views)

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments This code provides a PyTorch implementation and pretrained models for SwAV

Meta Research 1.7k Jan 04, 2023
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

Auto-Seg-Loss By Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai This is the official implementation of the ICLR 2021 paper Auto

61 Dec 21, 2022
catch-22: CAnonical Time-series CHaracteristics

catch22 - CAnonical Time-series CHaracteristics About catch22 is a collection of 22 time-series features coded in C that can be run from Python, R, Ma

Carl H Lubba 229 Oct 21, 2022
TensorFlow2 Classification Model Zoo playing with TensorFlow2 on the CIFAR-10 dataset.

Training CIFAR-10 with TensorFlow2(TF2) TensorFlow2 Classification Model Zoo. I'm playing with TensorFlow2 on the CIFAR-10 dataset. Architectures LeNe

Chia-Hung Yuan 16 Sep 27, 2022
Unofficial PyTorch Implementation of AHDRNet (CVPR 2019)

AHDRNet-PyTorch This is the PyTorch implementation of Attention-guided Network for Ghost-free High Dynamic Range Imaging (CVPR 2019). The official cod

Yutong Zhang 4 Sep 08, 2022
Resilient projection-based consensus actor-critic (RPBCAC) algorithm

Resilient projection-based consensus actor-critic (RPBCAC) algorithm We implement the RPBCAC algorithm with nonlinear approximation from [1] and focus

Martin Figura 5 Jul 12, 2022
Pun Detection and Location

Pun Detection and Location “The Boating Store Had Its Best Sail Ever”: Pronunciation-attentive Contextualized Pun Recognition Yichao Zhou, Jyun-yu Jia

lawson 3 May 13, 2022
A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Note: This is an alpha (preview) version which is still under refining. nn-Meter is a novel and efficient system to accurately predict the inference l

Microsoft 244 Jan 06, 2023
The codes and models in 'Gaze Estimation using Transformer'.

GazeTR We provide the code of GazeTR-Hybrid in "Gaze Estimation using Transformer". We recommend you to use data processing codes provided in GazeHub.

65 Dec 27, 2022
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

WECHSEL Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. arXiv: https://arx

Institute of Computational Perception 45 Dec 29, 2022
This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by Divam Gupta, Wei Pu, Trenton Tabor, Jeff Schneider

SBEVNet: End-to-End Deep Stereo Layout Estimation This repository contains the code for "SBEVNet: End-to-End Deep Stereo Layout Estimation" paper by D

Divam Gupta 19 Dec 17, 2022
[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars Fangzhou Hong1*  Mingyuan Zhang1*  Liang Pan1  Zhongang Cai1,2,3  Lei Yang2 

Fangzhou Hong 749 Jan 04, 2023
DLFlow is a deep learning framework.

DLFlow是一套深度学习pipeline,它结合了Spark的大规模特征处理能力和Tensorflow模型构建能力。利用DLFlow可以快速处理原始特征、训练模型并进行大规模分布式预测,十分适合离线环境下的生产任务。利用DLFlow,用户只需专注于模型开发,而无需关心原始特征处理、pipeline构建、生产部署等工作。

DiDi 152 Oct 27, 2022
code for Multi-scale Matching Networks for Semantic Correspondence, ICCV

MMNet This repo is the official implementation of ICCV 2021 paper "Multi-scale Matching Networks for Semantic Correspondence.". Pre-requisite conda cr

joey zhao 25 Dec 12, 2022
Weakly Supervised Learning of Rigid 3D Scene Flow

Weakly Supervised Learning of Rigid 3D Scene Flow This repository provides code and data to train and evaluate a weakly supervised method for rigid 3D

Zan Gojcic 124 Dec 27, 2022
2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

Aigege 8 Mar 31, 2022
This repository is an implementation of paper : Improving the Training of Graph Neural Networks with Consistency Regularization

CRGNN Paper : Improving the Training of Graph Neural Networks with Consistency Regularization Environments Implementing environment: GeForce RTX™ 3090

THUDM 28 Dec 09, 2022