Kestrel Threat Hunting Language

What is Kestrel? Why do we need it? How do you hunt with XDR support? What is the science behind it?

You can find all the answers at the Kestrel documentation hub. A quick primer follows.

Overview

The Kestrel threat hunting language provides an abstraction that lets threat hunters focus on what to hunt instead of how to hunt. The abstraction makes it possible to codify reusable hunting knowledge in a composable and sharable manner, while the Kestrel runtime figures out how to hunt, making cyber threat hunting less tedious and more efficient.

Kestrel overview.

  • Kestrel language: a threat hunting language for a human to express what to hunt.
    • expressing the knowledge of what in patterns, analytics, and hunt flows.
    • composing reusable hunting flows from individual hunting steps.
    • reasoning with human-friendly entity-based data representation abstraction.
    • thinking across heterogeneous data and threat intelligence sources.
    • applying existing public and proprietary detection logic as analytics.
    • reusing and sharing individual hunting steps and entire hunt books.
  • Kestrel runtime: a machine interpreter that deals with how to hunt.
    • compiling the what against specific hunting platform instructions.
    • executing the compiled code locally and remotely.
    • assembling raw logs and records into entities for entity-based reasoning.
    • caching intermediate data and related records for fast response.
    • prefetching related logs and records for link construction between entities.
    • defining extensible interfaces for data sources and analytics execution.

Installation

Kestrel requires Python 3.x to run. Check the Python installation guide if you do not have Python. We recommend installing the Kestrel runtime with pip, inside a Python virtual environment.

  1. Update Python installer.
$ pip install --upgrade pip setuptools wheel
  2. Install Kestrel runtime.
$ pip install kestrel-lang
  3. Install Kestrel Jupyter kernel if you use Jupyter Notebook to hunt.
$ pip install kestrel-jupyter
$ python -m kestrel_jupyter_kernel.setup
  4. (Optional) download Kestrel analytics examples for the APPLY hunt steps.
$ git clone https://github.com/IBM/kestrel-analytics.git

Hello World Hunt

  1. Copy the following 3-step hunt flow into your favorite text editor:
# create four process entities in Kestrel and store them in the variable `proclist`
proclist = NEW process [ {"name": "cmd.exe", "pid": "123"}
                       , {"name": "explorer.exe", "pid": "99"}
                       , {"name": "firefox.exe", "pid": "201"}
                       , {"name": "chrome.exe", "pid": "205"}
                       ]

# match a pattern of browser processes, and put the matched entities in variable `browsers`
browsers = GET process FROM proclist WHERE [process:name IN ('firefox.exe', 'chrome.exe')]

# display the information (attributes name, pid) of the entities in variable `browsers`
DISP browsers ATTR name, pid
  2. Save to a file helloworld.hf.
  3. Execute the hunt flow in a terminal (in Python venv if virtual environment is used):
$ kestrel helloworld.hf

You have now captured the browser processes, out of all processes created, in the Kestrel variable browsers:

       name pid
 chrome.exe 205
firefox.exe 201

[SUMMARY] block executed in 1 seconds
VARIABLE    TYPE  #(ENTITIES)  #(RECORDS)  process*
proclist process            4           4         0
browsers process            2           2         0
*Number of related records cached.
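
For intuition only, the three steps of the hunt flow can be mirrored in plain Python. This sketch does not use the Kestrel runtime, and the list-of-dicts representation is an assumption made for illustration, not how Kestrel stores entities.

```python
# Step 1 (NEW): four process "entities", modeled here as plain dicts.
proclist = [
    {"name": "cmd.exe", "pid": "123"},
    {"name": "explorer.exe", "pid": "99"},
    {"name": "firefox.exe", "pid": "201"},
    {"name": "chrome.exe", "pid": "205"},
]

# Step 2 (GET ... WHERE name IN (...)): keep only the browser processes.
browser_names = {"firefox.exe", "chrome.exe"}
browsers = [proc for proc in proclist if proc["name"] in browser_names]

# Step 3 (DISP ... ATTR name, pid): show the selected attributes.
for proc in browsers:
    print(proc["name"], proc["pid"])
```

The Kestrel version expresses the same intent declaratively and leaves storage, caching, and execution to the runtime.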

Hunting In The Real World

  1. How to develop hunts interactively in Jupyter Notebook?
  2. How to connect to one or more real-world data sources?
  3. How to write and match a TTP pattern?
  4. How to find child processes of a process?
  5. How to find network traffic from a process?
  6. How to apply pre-built analytics?
  7. How to fork and merge hunt flows?

Find more at the Kestrel documentation hub.

Connecting With The Community

Quick questions? Want to meet other users? Want to contribute? Join our Kestrel Slack workspace.

Comments
  • Syntax simplification

    Syntax simplification

    Is your feature request related to a problem? Please describe. Discussion and planning for syntax revision. Some ideas under discussion:

    1. redundant entity type in GET
    x = GET process FROM datasource WHERE [process:pid = 123]
    

    Simplified

    x = GET process FROM datasource WHERE pid = 123
    
    2. redundant entity type in GET from variable
    w = GET process FROM z WHERE [process:pid = 123]
    

    Simplified

    w = z WHERE pid = 123
    
    3. expression
    <var> [FILTER] [ AGG [ FILTER]] [SORT] [OFFSET] [LIMIT]
    

    This expression may be used in DISP and COPY.

    enhancement 
    opened by subbyte 4
  • Unable to query data from elasticsearch

    Unable to query data from elasticsearch

    Describe the bug Hi, I am trying to follow the tutorial from the documentation hub using an ELK stack. However, I am getting a KestrelSyntaxError when querying. I tried it with Python 3.6 and 3.9; both have the same error results.

    Details of the bug

    • What is the hunt flow/script you are executing? Hunt flow from the tutorial.
    • What is the command that failed?
    var = GET process FROM stixshifter://host101
    
    • What is the error message?
    [ERROR] KestrelSyntaxError: invalid token "" at line 1 column 24. rewrite the failed statement.
    

    To Reproduce Steps to reproduce the behavior:

    1. Set up Sysmon & Elasticsearch
    2. Create API key on Elasticsearch for access
    3. Test Elasticsearch access using API key
    4. Configure environment variables
    $ export STIXSHIFTER_HOST101_CONNECTOR=elastic_ecs
    $ export STIXSHIFTER_HOST101_CONNECTION='{"host":"REDACTED.elastic-cloud.com", "port":9243, "indices":"winlogbeat-7.14.0-2021.08.04-000001"}'
    $ export STIXSHIFTER_HOST101_CONFIG='{"auth":{"id":"REDACTED", "api_key":"REDACTED"}}'
    
    5. Test using stix-shifter:
    $ stix-shifter transmit elastic_ecs '{"host":"REDACTED.elastic-cloud.com", "port":9243, "indices":"winlogbeat-7.14.0-2021.08.04-000001"}' '{"auth":{"id":"REDACTED", "api_key":"REDACTED"}}' ping
    
    {
        "success": true,
        "data": "{\n  \"cluster_name\" : \"66a63ad60eae4e2b9fb38f524b8defcc\",\n  \"status\" : \"green\",\n  \"timed_out\" : false,\n  \"number_of_nodes\" : 3,\n  \"number_of_data_nodes\" : 2,\n  \"active_primary_shards\" : 86,\n  \"active_shards\" : 172,\n  \"relocating_shards\" : 0,\n  \"initializing_shards\" : 0,\n  \"unassigned_shards\" : 0,\n  \"delayed_unassigned_shards\" : 0,\n  \"number_of_pending_tasks\" : 0,\n  \"number_of_in_flight_fetch\" : 0,\n  \"task_max_waiting_in_queue_millis\" : 0,\n  \"active_shards_percent_as_number\" : 100.0\n}\n"
    }
    
    6. Run Jupyter Notebook with the command
    var = GET process FROM stixshifter://host101
    [ERROR] KestrelSyntaxError: invalid token "" at line 1 column 24. rewrite the failed statement.
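
The environment variables in step 4 follow a naming pattern keyed on the data source name. As a stand-alone sanity check, the sketch below gathers the three variables for one data source and verifies that the JSON values parse. `load_profile` is a hypothetical helper (not part of Kestrel or stix-shifter), and the upper/lower-case mapping of the name is an assumption based on the `HOST101` / `host101` pairing above.

```python
import json


def load_profile(name, environ):
    """Collect the STIXSHIFTER_<NAME>_* variables for one data source.

    Raises KeyError if a variable is missing and json.JSONDecodeError
    if CONNECTION or CONFIG is not valid JSON.
    """
    prefix = f"STIXSHIFTER_{name.upper()}_"
    profile = {"connector": environ[prefix + "CONNECTOR"]}
    for key in ("CONNECTION", "CONFIG"):
        profile[key.lower()] = json.loads(environ[prefix + key])
    return profile


# Example environment mirroring the exports above (values redacted).
example_env = {
    "STIXSHIFTER_HOST101_CONNECTOR": "elastic_ecs",
    "STIXSHIFTER_HOST101_CONNECTION": '{"host":"REDACTED.elastic-cloud.com", "port":9243, "indices":"winlogbeat-7.14.0-2021.08.04-000001"}',
    "STIXSHIFTER_HOST101_CONFIG": '{"auth":{"id":"REDACTED", "api_key":"REDACTED"}}',
}
profile = load_profile("host101", example_env)
print(profile["connector"], profile["connection"]["port"])
```

In a real shell session, `os.environ` could be passed instead of `example_env`. A check like this only rules out malformed variables; it says nothing about the syntax error reported in this issue.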
    

    Expected behavior Results from query

    Environment (please complete the following information):

    • OS: Ubuntu 20.04
    • Python version: Python 3.9.5, Python 3.6.9
    • Python install environment: Python virtual environment
    • STIX-Shifter version: 3.5.0
    bug 
    opened by kinzhong 4
  • ValueError: Unrecognised argument(s): force

    ValueError: Unrecognised argument(s): force

    Describe the bug ValueError when running Hello World Hunt using Python 3.6.9. Installed using pip install kestrel-lang.

    Details of the bug

    • What is the hunt flow/script you are executing? Hello World Hunt from readme
    • What is the command that failed?
    $ kestrel helloworld.hf
    
    • What is the error message?
    $ kestrel helloworld.hf
    Traceback (most recent call last):
      File "/usr/local/bin/kestrel", line 8, in <module>
        runpy.run_module('kestrel', run_name='__main__')
      File "/usr/lib/python3.6/runpy.py", line 208, in run_module
        return _run_code(code, {}, init_globals, run_name, mod_spec)
      File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.6/dist-packages/kestrel/__main__.py", line 49, in <module>
        logging_setup(None, args.verbose, args.debug)
      File "/usr/local/lib/python3.6/dist-packages/kestrel/__main__.py", line 33, in logging_setup
        force=True,
      File "/usr/lib/python3.6/logging/__init__.py", line 1829, in basicConfig
        raise ValueError('Unrecognised argument(s): %s' % keys)
    ValueError: Unrecognised argument(s): force
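
The `force` keyword of `logging.basicConfig` was added in Python 3.8, which is why the call fails on Python 3.6. A minimal sketch of a version-tolerant setup follows. `logging_setup` here is an illustrative stand-in with an assumed signature, not the actual function in `kestrel/__main__.py`.

```python
import logging
import sys


def logging_setup(level=logging.INFO):
    """Reconfigure the root logger on both old and new Python versions."""
    kwargs = {
        "level": level,
        "format": "%(asctime)s %(levelname)s %(name)s %(message)s",
    }
    if sys.version_info >= (3, 8):
        # Python 3.8+ can drop existing root handlers by itself.
        kwargs["force"] = True
    else:
        # Older versions: remove existing root handlers manually,
        # otherwise basicConfig() silently does nothing.
        root = logging.getLogger()
        for handler in list(root.handlers):
            root.removeHandler(handler)
            handler.close()
    logging.basicConfig(**kwargs)


logging_setup(logging.DEBUG)
logging.getLogger("example").debug("logging configured")
```

Calling `logging_setup` a second time replaces the previous configuration instead of stacking handlers.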
    

    To Reproduce

    1. pip install --upgrade pip setuptools wheel
    2. pip install kestrel-lang
    3. kestrel helloworld.hf

    Expected behavior Output from hunt flow.

    Environment (please complete the following information):

    • OS: Ubuntu 18.04
    • Python version: Python 3.6.9
    • Python install environment: Python virtual environment
    • STIX-Shifter version: 3.5.0
    bug 
    opened by kinzhong 4
  • In-STIX pattern variable auto complete does not work

    In-STIX pattern variable auto complete does not work

    Describe the bug If one tries to auto-complete a variable name in STIX pattern for a parameterized pattern, it does not work.

    Details of the bug This is a limitation of the STIX pattern parser we currently use. We need to upgrade the parser.

    bug 
    opened by subbyte 2
  • File paths can't have spaces

    File paths can't have spaces

    Describe the bug A parsing error is thrown when a file path has a space in it

    Details of the bug GET process FROM file:///a/path/with/a space/in_the_name/bundle.json

    Results in:

    lark.exceptions.UnexpectedCharacters: No terminal matches 's' in the current parser context, .....
    /a/path/with/a space/in_the_name/bundle.json
                            ^
    Expected one of:
                    * WHERE
    

    To Reproduce Try to run GET on a file:// bundle with a space in the name
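
For background, a literal space is not valid inside a URI; the standard representation percent-encodes it as `%20`. The sketch below shows the encoding and decoding round trip using only the Python standard library. Whether Kestrel decodes percent-escapes in a `file://` data source is not stated in this report, so treat this as an illustration of URI encoding rather than a confirmed workaround.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# The path from the bug report, containing a space.
path = PurePosixPath("/a/path/with/a space/in_the_name/bundle.json")

# as_uri() percent-encodes characters that are not allowed in a URI.
uri = path.as_uri()
print(uri)  # file:///a/path/with/a%20space/in_the_name/bundle.json

# A consumer of the URI can recover the original path by decoding it.
decoded = unquote(urlparse(uri).path)
print(decoded)
```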

    Environment (please complete the following information):

    • OS: macOS 11.6
    • Python version: 3.7.7
    • Python install environment:
    • STIX-Shifter version: latest github develop branch.

    bug 
    opened by imolloy 2
  • Error reporting from analytics

    Error reporting from analytics

    Some analytics (regardless of which interface they use) may call third party APIs, particularly those doing threat intel enrichment. Sometimes those APIs may fail, either due to authentication issues, temporary network problems, etc. There is currently no way for the user to be notified of such problems.

    There should be some way for analytics to capture such error information and report it back up. The implementation may differ per analytics interface (e.g., a native Python interface, where the analytics run under the same Python interpreter as the core, can probably just raise an exception, while the Docker interface may need to write the information to a file).
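
The two reporting paths described above could look roughly like the sketch below. Everything in it is hypothetical: `AnalyticsError`, `write_error`, `collect_error`, and the `error.json` file name are illustrative names, not part of any Kestrel analytics interface.

```python
import json
import tempfile
from pathlib import Path


class AnalyticsError(Exception):
    """Hypothetical exception surfaced to the hunter when an analytics fails."""


def run_native(analytics, entities):
    # Native Python interface: the analytics runs in the same interpreter,
    # so an exception it raises simply propagates to the caller.
    return analytics(entities)


def write_error(output_dir, message):
    # Container side: persist the failure where the runtime can find it.
    Path(output_dir, "error.json").write_text(json.dumps({"error": message}))


def collect_error(output_dir):
    # Runtime side: turn a reported failure into an exception.
    error_file = Path(output_dir, "error.json")
    if error_file.exists():
        raise AnalyticsError(json.loads(error_file.read_text())["error"])


with tempfile.TemporaryDirectory() as shared_dir:
    write_error(shared_dir, "threat intel API returned 401")
    try:
        collect_error(shared_dir)
    except AnalyticsError as err:
        print(f"analytics failed: {err}")
```

A shared directory is used here because a container and the runtime do not share an interpreter; any other channel both sides can reach would serve the same purpose.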

    enhancement 
    opened by pcoccoli 2
  • sqlite3.OperationalError: near "ON": syntax error

    sqlite3.OperationalError: near "ON": syntax error

    When applying helloworld.hf file as a parameter to kestrel via cli, the following error message appears:

    [[email protected] ~]$ kestrel helloworld.hf --debug
    16:19:00 DEBUG kestrel.session Establish session with session_id: None, runtime_dir: None, store_path:None, debug_mode:True
    16:19:00 DEBUG kestrel.session Configuration file /kestrel/kestrel.toml does not exist.
    16:19:00 DEBUG kestrel.session Configuration file etc/kestrel/kestrel.toml does not exist.
    16:19:00 DEBUG kestrel.session Configuration file /home/docker/.local/etc/kestrel/kestrel.toml loaded successfully.
    16:19:00 DEBUG kestrel.session Configuration file /home/docker/.config/kestrel/kestrel.toml does not exist.
    16:19:00 DEBUG kestrel.session Configuration loaded: {'session': {'local_database_path': 'local.db', 'debug_env_var_name': 'KESTREL_DEBUG'}, 'language': {'default_variable': '_', 'default_sort_order': 'desc'}, 'stixquery': {'timerange_start_offset': -300, 'timerange_stop_offset': 300, 'support_id': False}, 'prefetch': {'get': True, 'find': True, 'process_name_change_timerange_start_offset': -5, 'process_name_change_timerange_stop_offset': 5, 'process_lifespan_start_offset': -10800, 'process_lifespan_stop_offset': 10800}}
    16:19:00 DEBUG kestrel.session create new session runtime_directory: /tmp/kestrel-session-212ddaa5-c492-41c7-8c1c-0639a1eb82cd.
    16:19:00 DEBUG firepit.sqlitestorage Connection to SQLite DB /tmp/kestrel-session-212ddaa5-c492-41c7-8c1c-0639a1eb82cd/local.db successful
    16:19:00 DEBUG firepit.sqlitestorage Executing query: CREATE TABLE IF NOT EXISTS "__symtable" (name TEXT, type TEXT, appdata TEXT);
    16:19:00 DEBUG firepit.sqlitestorage Executing query: CREATE TABLE IF NOT EXISTS "__membership" (sco_id TEXT, var TEXT);
    16:19:00 DEBUG firepit.sqlitestorage Executing query: CREATE TABLE IF NOT EXISTS "__queries" (sco_id TEXT, query_id TEXT);
    16:19:01 DEBUG kestrel.codegen.commands Executing 'new' with statement: {'command': 'new', 'type': 'process', 'data': '[ {"name": "cmd.exe", "pid": "123"}\n , {"name": "explorer.exe", "pid": "99"}\n , {"name": "firefox.exe", "pid": "201"}\n , {"name": "chrome.exe", "pid": "205"}\n ]', 'output': 'proclist'}
    16:19:01 DEBUG firepit.splitter _create_table: "CREATE TABLE "process" ("name" TEXT,"pid" TEXT,"type" TEXT,"id" TEXT UNIQUE);"
    16:19:01 DEBUG firepit.sqlitestorage Executing query: CREATE TABLE "process" ("name" TEXT,"pid" TEXT,"type" TEXT,"id" TEXT UNIQUE);
    16:19:01 DEBUG firepit.sqlitestorage Executing query: CREATE INDEX "process_id" ON "process" ("id");
    16:19:01 DEBUG firepit.sqlstorage _upsert: "INSERT INTO "process" ("name", "pid", "type", "id") VALUES (?, ?, ?, ?)
    ON CONFLICT (id) DO UPDATE SET "name" = EXCLUDED."name", "pid" = EXCLUDED."pid", "type" = EXCLUDED."type";"
    Traceback (most recent call last):
      File "/home/docker/.local/bin/kestrel", line 8, in <module>
        runpy.run_module('kestrel', run_name='__main__')
      File "/usr/lib64/python3.6/runpy.py", line 208, in run_module
        return _run_code(code, {}, init_globals, run_name, mod_spec)
      File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/docker/.local/lib/python3.6/site-packages/kestrel/__main__.py", line 49, in <module>
        outputs = session.execute(huntflow)
      File "/home/docker/.local/lib/python3.6/site-packages/kestrel/session.py", line 262, in execute
        return self._execute_ast(ast)
      File "/home/docker/.local/lib/python3.6/site-packages/kestrel/session.py", line 437, in _execute_ast
        output_var_struct, display = execute_cmd(stmt, self)
      File "/home/docker/.local/lib/python3.6/site-packages/kestrel/codegen/commands.py", line 92, in wrapper
        return func(stmt, session)
      File "/home/docker/.local/lib/python3.6/site-packages/kestrel/codegen/commands.py", line 60, in wrapper
        ret = func(stmt, session)
      File "/home/docker/.local/lib/python3.6/site-packages/kestrel/codegen/commands.py", line 123, in new
        stmt["type"] = load_data(session.store, stmt["output"], stmt["data"], stmt["type"])
      File "/home/docker/.local/lib/python3.6/site-packages/kestrel/codegen/data.py", line 30, in load_data
        store.load(output_entity_table, data, entity_type, query_id)
      File "/home/docker/.local/lib/python3.6/site-packages/firepit/sqlstorage.py", line 294, in load
        splitter.close()
      File "/home/docker/.local/lib/python3.6/site-packages/firepit/splitter.py", line 228, in close
        self.writer.write_records(obj_type, recs, self.schemas[obj_type], self.replace, self.query_id)
      File "/home/docker/.local/lib/python3.6/site-packages/firepit/splitter.py", line 153, in write_records
        self.store.upsert(cursor, tablename, obj, query_id)
      File "/home/docker/.local/lib/python3.6/site-packages/firepit/sqlstorage.py", line 224, in upsert
        cursor.execute(stmt, values)
    sqlite3.OperationalError: near "ON": syntax error
    16:19:01 DEBUG firepit.sqlitestorage Closing SQLite DB connection
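
SQLite added the `ON CONFLICT ... DO UPDATE` upsert clause in version 3.24.0, so a Python build linked against an older SQLite library rejects the statement in the log with this kind of syntax error. The sketch below checks the linked SQLite version and falls back to `INSERT OR REPLACE` when upsert is unavailable. `upsert_process` is a hypothetical helper, not firepit's implementation, and the fallback is a swapped-in technique that replaces the whole row rather than updating columns in place.

```python
import sqlite3

UPSERT_MIN_VERSION = (3, 24, 0)  # first SQLite release with ON CONFLICT ... DO UPDATE


def upsert_process(conn, record):
    """Insert a process row, or update it if the id already exists."""
    values = (record["name"], record["pid"], record["type"], record["id"])
    if sqlite3.sqlite_version_info >= UPSERT_MIN_VERSION:
        stmt = (
            'INSERT INTO "process" ("name", "pid", "type", "id") VALUES (?, ?, ?, ?) '
            'ON CONFLICT (id) DO UPDATE SET "name" = EXCLUDED."name", '
            '"pid" = EXCLUDED."pid", "type" = EXCLUDED."type"'
        )
    else:
        # Older SQLite: rely on the UNIQUE constraint on "id" to replace the row.
        stmt = 'INSERT OR REPLACE INTO "process" ("name", "pid", "type", "id") VALUES (?, ?, ?, ?)'
    conn.execute(stmt, values)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "process" ("name" TEXT,"pid" TEXT,"type" TEXT,"id" TEXT UNIQUE)')
upsert_process(conn, {"name": "cmd.exe", "pid": "123", "type": "process", "id": "process--1"})
upsert_process(conn, {"name": "cmd.exe", "pid": "456", "type": "process", "id": "process--1"})
rows = conn.execute('SELECT "name", "pid" FROM "process"').fetchall()
print(sqlite3.sqlite_version, rows)
```

Printing `sqlite3.sqlite_version` in the affected environment is a quick way to confirm whether the linked library predates 3.24.0.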

    bug question 
    opened by RukhsarRiazKhan 2
  • implement entity id attr pick-up mech and fix #31

    implement entity id attr pick-up mech and fix #31

    The PR fully addresses #31 and partially addresses #32.

    1. implement a new function get_entity_id_attribute() in src/kestrel/codegen/relations.py to compute the appropriate attribute used as identifier attribute for entities.
    2. update src/kestrel/codegen/commands.py to use get_entity_id_attribute().
    3. replace the or_pattern() for post-prefetch merge in src/kestrel/codegen/commands.py with firepit.merge() (partially address #32).
    4. update get_variable_entity_count() in src/kestrel/codegen/summary.py to use get_entity_id_attribute().
    5. update _get_variable_query_ids() in src/kestrel/codegen/summary.py since merged variables in firepit do not have __membership records.
    6. update gen_variable_summary() in src/kestrel/codegen/summary.py to only give cached records when there is a data source query.
    opened by subbyte 2
  • github pypi CI/CD workflow

    github pypi CI/CD workflow

    define GitHub Action for automatic package release to pypi https://packaging.python.org/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/

    enhancement 
    opened by subbyte 2
  • fix version issue for automatic connector install

    fix version issue for automatic connector install

    This is a patch on the Kestrel side. An upstream patch to stix-shifter would be a better way to prevent the issue: https://github.com/opencybersecurityalliance/stix-shifter/issues/1087

    opened by subbyte 1
  • Don't require schemes for `FROM` and `APPLY`

    Don't require schemes for `FROM` and `APPLY`

    Is your feature request related to a problem? Please describe. The hunter doesn't care if an analytic is using docker or python.

    Describe the solution you'd like APPLY my_analytic should work whether it's a python- or docker-based analytic. Similarly FROM my_datasource should work without having to specify stixshifter://my_datasource.

    Maybe we have something in config that lists interface preference order? E.g. python,docker so it checks python first, then docker. Stop at first match.

    You could still supply the scheme, to force the one you want in case of name collisions.
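
The preference-order lookup proposed above might be sketched as follows. `resolve_analytic`, the `available` mapping, and the default preference tuple are all hypothetical; they only illustrate "check python first, then docker, stop at first match" together with the explicit-scheme override.

```python
def resolve_analytic(name, available, preference=("python", "docker")):
    """Return a fully qualified 'scheme://name' for an analytic.

    `available` maps each interface scheme to the analytics it provides.
    """
    if "://" in name:
        # An explicit scheme forces that interface (useful on name collisions).
        scheme, _, bare = name.partition("://")
        if bare in available.get(scheme, ()):
            return name
        raise LookupError(f"analytic {bare!r} not found in interface {scheme!r}")
    for scheme in preference:
        # Stop at the first interface that knows the analytic.
        if name in available.get(scheme, ()):
            return f"{scheme}://{name}"
    raise LookupError(f"analytic {name!r} not found in any interface")


available = {
    "python": {"attribute-plot"},
    "docker": {"attribute-plot", "enrich-ip"},
}
print(resolve_analytic("attribute-plot", available))           # python://attribute-plot
print(resolve_analytic("enrich-ip", available))                # docker://enrich-ip
print(resolve_analytic("docker://attribute-plot", available))  # docker://attribute-plot
```

The same lookup would apply to data sources in `FROM`, with the preference order read from configuration.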

    Describe alternatives you've considered Leave it like it is now.

    Additional context N/A

    enhancement 
    opened by pcoccoli 0
  • relax single quote requirement for attribute with dash

    relax single quote requirement for attribute with dash

    Is your feature request related to a problem? Please describe. In a STIX pattern, a property or partial property that contains a dash (-) needs to be wrapped in single quotes, such as [file:hashes.'SHA-256' = 'xxxxxxxxx...']. This means that in Kestrel, one needs to write GET file WHERE hashes.'SHA-256' = 'xxxxxx...'. Most users may not expect this rule. We are considering relaxing it so users can write GET file WHERE hashes.SHA-256 = 'xxxxxx...' and Kestrel will assemble the STIX pattern with single quotes if needed.

    Note that Kestrel is STIX compatible, so if we implement this, it will still allow users to have single quotes like hashes.'SHA-256', in which case Kestrel will not modify the string when assembling the STIX pattern.

    Describe the solution you'd like firepit also needs the single quotes, so the parser (transformer) could add single quotes around any attribute substring that contains a dash and is not already quoted.
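
A minimal sketch of that transformation is shown below. `quote_dashed` is a hypothetical helper, not code from the Kestrel parser, and it assumes attribute components are separated by dots and that quoted components contain no embedded single quotes.

```python
import re

# A component is either an already quoted chunk or a run of characters
# that contains neither a dot nor a single quote.
_COMPONENT = re.compile(r"'[^']*'|[^.']+")


def quote_dashed(attribute):
    """Wrap dash-containing components in single quotes unless already quoted."""
    parts = _COMPONENT.findall(attribute)
    return ".".join(
        f"'{part}'" if "-" in part and not part.startswith("'") else part
        for part in parts
    )


print(quote_dashed("hashes.SHA-256"))    # hashes.'SHA-256'
print(quote_dashed("hashes.'SHA-256'"))  # hashes.'SHA-256'
print(quote_dashed("parent_ref.name"))   # parent_ref.name
```

Because already quoted components pass through unchanged, the function is idempotent, which matches the requirement that STIX-style input stays valid.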

    Describe alternatives you've considered Do the modification in to_stix() and to_firepit() in ECGP.

    Additional context An additional consideration is whether this difference from STIX creates extra confusion for users who are familiar with STIX. However, since the planned solution supports both forms (it only relaxes the strict single-quote requirement), this could be fine.

    enhancement 
    opened by subbyte 0
  • autocomplete function doesn't behave correctly

    autocomplete function doesn't behave correctly

    Description: The autocompletion function doesn't handle partially completed fields correctly. Fields with the same starting characters as commands also return commands matching last_word as suggestions (which they should not). No error messages are output, but the behavior does not match what is expected. Examples are included below for clarity. One possible solution is to completely revamp the logic of the do_complete() function, looking at the parsing portion of the existing code in particular.

    Environment: I verified that the variable autocompletion error appears in the Tutorial Huntbook environment, so I believe any Kestrel runtime environment through Jupyter Notebook should display this issue. For attribute autocompletion, I was running my Kestrel environment from Jupyter Notebook locally hosted through a Python 3 virtual environment on Windows 11 WSL Ubuntu 20.04.5 LTS. (I have no idea if the grammar there was correct, sorry...)

    Details & How to Reproduce: Huntflow taken directly from the Kestrel tutorial (0. Hello World Hunt). In the code block below, <tab> represents hitting the tab button, which calls the autocomplete function (do_complete()) linked above.

    proclist = NEW process [ {"name": "cmd.exe", "pid": "123"}
                           , {"name": "explorer.exe", "pid": "99"}
                           , {"name": "firefox.exe", "pid": "201"}
                           , {"name": "chrome.exe", "pid": "205"}
                           ]
    browsers = GET process FROM proclist WHERE name IN ('firefox.exe', 'chrome.exe')
    
    # --  scenario 1
    DISP <tab>                       # case 1
    DISP b<tab>                      # case 2
    DISP browsers<tab>               # case 3
    DISP browsers <tab>              # case 4
    
    # -- add and run this as a new block before calling DISP
    abc = browsers
    
    # --  scenario 2
    DISP a<tab>                      # case 2
    

    Expected Behavior: For scenario 1, all cases behave as expected for autocompletion. (suggestions is the list returned by the do_complete() function)

    1. suggestions = ['TIMESTAMPED', '_', 'browsers', 'proclist']
    2. suggestions = ['browsers']
    3. suggestions = ['']
    4. suggestions = ['APPLY', 'ATTR', 'DISP', 'FIND', 'GET', 'GROUP', 'INFO', 'JOIN', 'LIMIT', 'LOAD', 'NEW', 'OFFSET', 'SAVE', 'SORT', 'TIMESTAMPED', 'WHERE', '_', 'browsers', 'proclist']

    (My question: Why are browsers and proclist considered valid suggestions for case 4?)

    Scenario 2 can be generalized to all scenarios where a variable shares the starter characters for autocompletion as a command. Most cases behave as expected, EXCEPT Case 2 which returns suggestions = ['bc', 'pply', 'ttr']. The expected behavior would be suggestions = ['bc'].
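
The expected behavior for scenario 2 can be sketched as prefix matching restricted to the tokens that are valid at the cursor position. The function and candidate lists below are hypothetical and are not the logic of do_complete(); the sketch returns only the remaining characters of each match, as in the scenario 2 report, and assumes that only TIMESTAMPED and variable names may follow DISP.

```python
def complete(last_word, candidates):
    """Return the remaining characters of every candidate starting with last_word."""
    return sorted(c[len(last_word):] for c in candidates if c.startswith(last_word))


VARIABLES = ["_", "abc", "browsers", "proclist"]

# Assumption: right after DISP, only TIMESTAMPED and variable names are valid,
# so commands such as APPLY or ATTR are never offered at this position.
AFTER_DISP = ["TIMESTAMPED"] + VARIABLES

print(complete("a", AFTER_DISP))  # ['bc']
print(complete("b", AFTER_DISP))  # ['rowsers']
```

Restricting the candidate list by grammar position, rather than filtering a global keyword list afterwards, is what keeps 'pply' and 'ttr' out of the result.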


    The following details are in regards to how this issue relates to another open issue (https://github.com/opencybersecurityalliance/kestrel-lang/issues/79), which details expanding the autocompletion feature to support attributes.

    # attribute autocompletion
    DISP browsers ATTR <tab>         # case 1
    DISP browsers ATTR n<tab>        # case 2
    DISP browsers ATTR name<tab>     # case 3
    DISP browsers ATTR name <tab>    # case 4
    

    My implementation of the attribute autocompletion feature can be found here. For case 2, the parser treats 'n' as a completed attribute field, but also as the value of last_word when searching for suggestions for the next field. As such, we end up with suggestions = ['ew'], which is wrong (and confusingly weird). The other cases behave as expected, though it might just be coincidental for case 3 in particular (the same applies for variable autocompletion). It seems that this behavior is the same as variable partial completion, so this issue must be addressed before progress can be made on the other.

    bug 
    opened by vereimyst 0
  • attribute may not be variable

    attribute may not be variable

    Describe the bug

    procs = GET process
            FROM file:///tmp/lab101.json
            WHERE parent_ref.name = 'svchost.exe'
            START 2021-04-03T00:00:00Z STOP 2021-04-03T02:00:00Z
            
    procs_grps = GROUP procs BY binary_ref.name WITH COUNT(pid) AS number_of_procs
    
    APPLY python://attribute-plot ON procs_grps WITH XPARAM=binary_ref.name, YPARAM=number_of_procs
    

    error:

    [ERROR] KestrelSyntaxError: invalid token "'binary_ref.name'" at line 6 column 29, expects one of ['BIN', 'ATTRIBUTE']
    rewrite the failed statement.
    

    Kestrel version: v1.5.1

    bug documentation 
    opened by subbyte 1
  • Explore/Test Kestrel deployment on MS Windows

    Explore/Test Kestrel deployment on MS Windows

    Is your feature request related to a problem? Please describe. Currently Kestrel is supported on Linux and macOS. It could be useful to explore deployment on Microsoft Windows, writing doc on how to set it up (if special instruction is needed similar to the macOS requirement), and fixing issues in code if needed.

    Describe the solution you'd like A first step is to test/support Kestrel running in Windows Subsystem for Linux. The second step is to test/support Kestrel running as a native Windows application (with Python environment installed).

    documentation enhancement Hacktoberfest 
    opened by subbyte 0
Releases(v1.5.3)
  • v1.5.3(Nov 24, 2022)

    1.5.3 (2022-11-23)

    Added

    • Multiple test cases for escaped string parsed with main/ECGP parsers

    Fixed

    • Escaped string in value for both ECGP and argument
    • Token prefix not handled in

    Changed

    • Use firepit time function for timestamp parsing
    • Update Lark rule transform to vtrans to avoid Lark special function misfire

    Removed

    • Explicit dependency python-dateutil
  • v1.5.2(Oct 26, 2022)

    Added

    • Relative path support for environment variable starting with KESTREL #248
    • Relative path support for path in LOAD/SAVE
    • Relative path support for local uri, i.e., file://xxx or file://./xxx in GET
    • Unit test on relative path in environment variable
    • Unit test on relative path in LOAD
    • Unit test on relative path in data source in GET
  • v1.5.1(Oct 25, 2022)

    Added

    • Type checking in kestrel.semantics.reference
    • New exception MissingDataSource
    • Unit test on variable reference in GET
    • Unit test on last data source reuse

    Fixed

    • Missing data source if not specified #257
    • SymbolTable type error in code generation

    Removed

    • Obsoleted exception UnsupportedStixSyntax
  • v1.5.0(Oct 24, 2022)

    To be more friendly in the WHERE clause than strict STIX pattern, we introduce Extended Centered Graph Pattern (ECGP) in v1.5.0, plus complete Kestrel parser upgrade with multiple fixes (closing all issues in the Parser Upgrade milestone).

    • ECGP is STIX compatible, which means one can use STIX in WHERE clause as before.

    • The example of ECGP in WHERE (note that the host/endpoint is specified in a datasource, e.g., Elastic index, to avoid unnecessary data to retrieve by user or system generated queries):

    (Figure: example of ECGP in a WHERE clause; image not included.)
    • Documentation on ECGP will come in v1.5.1

    • Full changelog:

    Added

    • Introduce ExtendedCenteredGraphPattern (ECGP) for WHERE clause

      • Support optional SCO/entity type for centered graph (STIX compatible)
      • Support optional square brackets (STIX compatible)
      • Support Single or double quotes (STIX compatible)
      • Support nested list as value (STIX compatible)
      • Support Kestrel variable as reference
      • Support escaped characters in quoted value
      • Support ECGP to string/STIX/firepit transformation
      • Support ECGP pruning (centered or extended components)
      • Support ECGP merge/extend with another ECGP
      • Parse into STIX (now ECGP) #14
      • Normalize WHERE clause between GET and expression
      • Add WHERE clause to command FIND
    • Upgrade arguments (in APPLY command)

      • Support quoted string in arguments #170
      • Dereference variables in arguments
    • Upgrade path (in GET/APPLY/LOAD/SAVE command)

      • Support escaped characters in quoted datasrc/analytics/path
    • Upgrade JSON parser for command NEW

    • Upgrade operators in syntax to be case insensitive

    • Upgrade timespan

      • absolute timespan without t and quotes
      • relative timespan for FIND
    • Upgrade prefetch with WHERE clause to eliminate unnecessary query

    • Multiple test cases for new syntax and features

    • Add macOS (arm64) install requirement to documentation

    Changed

    • Limit STIXPATH to ATTRIBUTE

      • command: SORT, GROUP, JOIN
      • expression clause: sort, attr
    • Use explicit list like (1,2,3) or [1,2,3] for multi-value argument

    • Formalize semantics processor in parser-semantics-codegen procedure

      • variable dereferencing in semantics processor
      • variable timerange extraction in semantics processor
  • v1.4.2(Sep 26, 2022)

    Added

    • links to Black Hat 2022 website, recording, and demo/lab
    • Kestrel logo in PNG
    • link to the Kestrel binder service blog post

    Fixed

    • consistent stix-shifter and connector versions

    Changed

    • lowercase grammar strings
  • v1.4.1(Jul 28, 2022)

    Added

    • multi-user cache folder support in debug mode #236
    • ppid used in process identification (post-prefetch) #238
    • process identification upgraded to a two-step approach
    • fine-grained process identification time offsets
    • per entity type prefetch config support #241
    • support for automatically converting input files to STIX in stixbundle interface

    Fixed

    • prefetch when parent_ref not in process table
    • false positives in generic relation resolution
    • second execution of a failed query should raise exception
    • master runtime directory test case fix
    • ~ support in config file path (env var)
  • v1.4.0(Jun 16, 2022)

    This release adds two new language features: relative timespans in place of exact timestamps in STIX patterns, and the ability to "bin" (aka "bucket") grouping attributes. "Binning" aggregates multiple entities into a single group using a range of values (e.g., 5-minute intervals instead of grouping by exact timestamps).

    Fixed

    • Fix NameError: name 'DataSourceError' is not defined
    • Pass stix-shifter profile options into translation #230

    Added

    • Relative timespans instead of START/STOP #181
      • e.g. LAST 5 MINUTES
    • Group by "binned" (or "bucketed") attributes
      • e.g. GROUP foo BY BIN(first_observed, 5m)
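
To illustrate the idea behind binning, the sketch below floors timestamps to 5-minute boundaries and counts how many fall into each bin. It is a plain-Python analogue for intuition, not the Kestrel or firepit implementation; `bin_timestamp` and the sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime, timezone


def bin_timestamp(ts, width_seconds=300):
    """Floor a timezone-aware datetime to the start of its bin (default 5 minutes)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % width_seconds, tz=timezone.utc)


first_observed = [
    datetime(2021, 4, 3, 0, 1, 10, tzinfo=timezone.utc),
    datetime(2021, 4, 3, 0, 4, 59, tzinfo=timezone.utc),
    datetime(2021, 4, 3, 0, 5, 0, tzinfo=timezone.utc),
]

# Grouping by the binned value merges entities that fall in the same interval.
counts = Counter(bin_timestamp(ts) for ts in first_observed)
for bin_start, count in sorted(counts.items()):
    print(bin_start.isoformat(), count)
```

Grouping by exact timestamps would produce three groups here; binning by 5 minutes produces two.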

    Changed

    • bump min Python version to 3.7
    • update OCA slack invitation link
  • v1.3.4(May 16, 2022)

    Kestrel binder service now supports dynamically adding data sources.

    Fixed

    • broken /tmp/kestrel symbolic link will crash a new session
    • double close (double release resources) with context manager and aexit
    • AttributeError with timestamped grouped variable #224
    • subsequent GET would return no results #228

    Added

    • documentation on macOS debug folder path
    • interface figure updated with new planned interfaces
    • dynamically load stix-shifter YAML profiles #227
    • new exception: MissingEntityAttribute
    • unit test: disp timestamped group by

    Changed

    • codecov GitHub App enabled instead of codecov-bot
    • stixshifter interface module connector split from interface.
  • v1.3.3(Apr 29, 2022)

  • v1.3.2(Apr 22, 2022)

    Summary

    Stabilize v1.3 with many bug fixes; improve auto-completion; add code coverage.

    Details

    See CHANGELOG.rst for complete info.

    Added

    • runtime warning generation for invalid entity type #200
    • auto-complete relation in FIND
    • auto-complete BY and variable in FIND
    • add logo to readthedocs
    • upgrade auto-complete keywords to be case sensitive #213
    • add testing coverage into github workflows
    • add codecov badge to README
    • 31 unit tests for auto-completion
    • the first unit test for JOIN
    • two unit tests for ASSIGN
    • five unit tests for EXPRESSION
    • use tmp dir for generated testing data
    • auto-deref with mixed ipv4/ipv6 in network-traffic

    Fixed

    • missing _refs handling for 2 cases out of 4 #205
    • incorrectly dereferencing attributes after GROUP BY
    • incorrectly yielding variable when auto-completing relation in FIND
    • pylint errors about undefined-variables

    Changed

    • update grammar to separate commands yielding (or not) a variable
    • change FUNCNAME from a terminal to an inlined rule
    • differentiate the terminal "by"i between FIND and SORT/GROUP
  • v1.3.1(Apr 17, 2022)

    Fix PyPI releasing issues, and update GitHub Action scripts to Python 3.10.

    Changed

    Fixed

    • The description failed to render when uploading to PyPI.
    • README.rst is missing images when rendered on non-GitHub sites, e.g., PyPI.
  • v1.3.0(Apr 15, 2022)

    Added

    • internal data model upgraded to firepit 2.0.0 with full graph-like database schema:

      • new firepit data normalized schema: https://firepit.readthedocs.io/en/latest/database.html
      • the normalized schema extracts/recognizes entities/SCOs from STIX observations and stores them and their relations.
      • the normalized schema fully enables a Kestrel variable to refer to a list of homogeneous entities as a view in a relational-DB table.
      • older hunts will need to be re-executed.
    • syntax upgrade: introducing the language construct expression to process a variable, e.g., adding a WHERE clause, and the processed variable can be

      • assigned to another variable, so one does not need another GET command with a STIX pattern to do filtering.
      • passed to DISP, so DISP is naturally upgraded to support many clauses such as SORT, LIMIT, etc.
    • new syntax for initial events handling besides entities:

      • entities in a variable do not have timestamps anymore; previously all observations of the entities were listed in a variable with timestamps.
      • use the function TIMESTAMPED() to wrap a variable into an expression when the user needs timestamps of the observations/events in which the entities appeared. This is useful for analyzing and visualizing events of entities through time, e.g., time series analysis of visited ipv4-addr entities in a variable.
    • unit tests:

      • 5 more unit tests for command FIND.
      • 2 more unit tests for command SAVE.
      • 2 unit tests for expression TIMESTAMPED().
    • new syntax added to language reference documentation

      • TIMESTAMPED
      • DISP
      • assign
    • repo updates:

      • Kestrel logo created.
      • GOVERNANCE.rst including versioning, release procedure, vulnerability disclosure, and more.

    Removed

    • the copy command is removed (replaced by the more generic assign command).

    Changed

    • repo front page restructured to be shorter while providing more information/links.
    • the overview page of Kestrel doc is turned into a directory of sections. The URL of the page is changed from overview.html to overview.
  • v1.2.3(Mar 23, 2022)

    Added

    • error message improvement: suggestion when a Python analytics is not found
    • performance improvement: cache STIX bundle for any downloaded bundle in the stix-bundle data source interface
    • performance improvement: pre-compile STIX pattern before matching in the stix-bundle data source interface
    • performance improvement: skip prefetch when the generated prefetch STIX pattern is the same as the user-specified pattern
    • documentation improvement: add building instructions for documentation
    • documentation improvement: add data source setup under Installation And Setup
    • documentation improvement: add analytics setup under Installation And Setup

    Fixed

    • STIX bundle downloaded without Last-Modified field in response header #187
    • case sensitive support for Python analytics profile name #189
  • v1.2.2(Mar 2, 2022)

    Added

    • remote data store support
    • unit test: Python analytics: APPLY after GET
    • unit test: Python analytics: APPLY on multiple variables

    Fixed

    • bump firepit version to fix transaction errors
    • bug fix: verify_package_origin() takes 1 argument

    Removed

    • unit test: Python 3.6 EOL and removed from GitHub Actions
  • v1.2.1(Feb 24, 2022)

  • v1.2.0(Feb 10, 2022)

    We are delighted to grow Kestrel with a Python analytics interface in this release.

    Important New Features

    1. Python analytics interface, which supports all existing Kestrel analytics in the kestrel-analytics repo.
    2. Automatic STIX-shifter connector install, which verifies and installs STIX-shifter connectors when needed.
    3. New documentation on Python analytics and Kestrel debug mode.

    Detailed Changelog

    • Added
      • Kestrel main package
        • matplotlib figure support in Kestrel Display Objects
        • analytics interface upgraded with config shared to Kestrel
      • Python analytics interface
        • minimal requirement design for writing a Python analytics
        • analytics function environment setup and destroy
        • support for a variety of display object outputs
        • parameters support
        • stack tracing for exception inside a Python analytics
      • STIX-shifter data source interface
        • automatic STIX-shifter connector install
          • connector name guess
          • connector origin verification
          • comprehensive error and suggestion if automatic install failed
        • pretty print for exception inside a Docker analytics
      • documentation
        • Python analytics interface
        • Kestrel debug page
        • flag to disable certificate verification in STIX-shifter profile example
    • Changed
      • abstract interface manager between datasource/analytics for code reuse
    • Fixed
      • auto-complete with data source #163
      • exception for empty STIX-shifter profile
      • STIX-shifter profile name should be case insensitive
      • exception inappropriately caught when dereferencing vars with no time range
    • Removed
      • documentation about STIX-shifter connector install
  • v1.1.7(Jan 27, 2022)

    This release focuses on upgrading Kestrel configuration management, solving #116 and #160 and paving the way for #138.

    Added

    • standalone Kestrel config module to support modular and simplified Kestrel config loading flow
    • shareable-state of config between Kestrel session and any Kestrel data source interfaces
    • stix-shifter interface upgraded with shareable-state of config support
    • stix-shifter DEBUG level env var KESTREL_STIXSHIFTER_DEBUG
    • stix-shifter config/profile loading from disk ~/.config/kestrel/stixshifter.yaml
    • debug message logging in kestrel_datasource_stixshifter
    • documentation for Kestrel main config with default config linked/shown

    Changed

    • default Kestrel config not managed by pip any more
    • turn main Kestrel config from TOML into YAML ~/.config/kestrel/kestrel.yaml
    • upgrade Kestrel data source interfaces API with new config parameter
    • default stix-shifter debug level to INFO
    • documentation upgrade for kestrel_datasource_stixshifter

    Fixed

    • Kestrel config upgrade inconsistency #116
  • v1.1.6(Dec 15, 2021)

    Detect Log4Shell with Kestrel, see README for details

    Added

    • advanced code auto-completion with parser support

    Fixed

    • dollar sign incorrectly displayed in Jupyter Notebook (dataframe to html)

    Changed

    • installation documentation upgrade
  • v1.1.4(Oct 27, 2021)

  • v1.1.3(Oct 9, 2021)

    We introduce comprehensive GROUP BY syntax, implementation, test, and documentation in this release, together with firepit upgrades.

    • GROUP BY multiple attributes
    • Aggregation function in GROUP BY
    • Support alias in GROUP BY
    • New test cases for GROUP BY
    • Documentation update for GROUP BY
  • v1.1.2(Sep 13, 2021)

  • v1.1.1(Sep 3, 2021)

    Added

    • Minimal dependent package versions #67
    • Configuration option to disable execution summary display #86
    • Auto-removal of obsolete session caches #34
    • SQLite requirement in installation documentation

    Fixed

    • Python 3.6 support on command line utility #97

    Changed

    • Adjusting logging message levels to avoid confusion
  • v1.1.0(Aug 18, 2021)

    Composability Upgrade

    Now GROUP and SORT are like other commands and can be followed by any other commands such as GET and APPLY.

    Parser Upgrade

    Integers and floats are now supported as values in the JSON given to the NEW command.

  • v1.0.14(Aug 18, 2021)

  • v1.0.13(Aug 14, 2021)

    Fixed

    • Single quotes support in STIX patterns to fix #95
    • Variable summary deduplication

    Added

    • Expected components in syntax error messages
  • v1.0.12(Aug 3, 2021)

  • v1.0.11(Aug 3, 2021)

  • v1.0.10(Jul 19, 2021)

    Fixed

    • Missing log in command line mode #84
    • Typo in documentation
    • Incorrect config file path

    Added

    • Select config file via environment variable #82
  • v1.0.9(Jul 7, 2021)

Owner
Open Cybersecurity Alliance
The Open Cybersecurity Alliance (OCA) fosters a cybersecurity ecosystem for exchanging information, orchestrated responses, etc. OCA is an OASIS Open Project.