LynxKite: a complete graph data science platform for very large graphs and other datasets.

Overview

LynxKite

LynxKite is a complete graph data science platform for very large graphs and other datasets. It seamlessly combines the benefits of a friendly graphical interface and a powerful Python API.

  • Hundreds of scalable graph operations, including graph metrics like PageRank, embeddedness, and centrality, machine learning methods including GCNs, graph segmentations like modular clustering, and various transformation tools like aggregations on neighborhoods.
  • The two main data types are graphs and relational tables. Switch back and forth between the two as needed to describe complex logical flows. Run SQL on both.
  • A friendly web UI for building powerful pipelines of operation boxes. Define your own custom boxes to structure your logic.
  • Tight integration with Python lets you implement custom transformations or create whole workflows through a simple API. (See the short sketch after this list.)
  • Integrates with the Hadoop ecosystem. Import and export from CSV, JSON, Parquet, ORC, JDBC, Hive, or Neo4j.
  • Fully documented.
  • Proven in production on large clusters and real datasets.
  • Fully configurable graph visualizations and statistical plots. Experimental 3D and ray-traced graph renderings.
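
A minimal sketch of driving LynxKite from that Python API, assuming an instance is already running. The box methods follow the API's camelCase mapping of box names; the sql() and df() helpers shown here are assumptions to check against the API documentation:

import lynx.kite

lk = lynx.kite.LynxKite()          # connects to a running instance, e.g. http://localhost:2200
graph = lk.createExampleGraph()    # any import box can be used the same way
graph = graph.computePageRank()    # one of the built-in graph operations
print(graph.sql('select name, page_rank from vertices order by page_rank desc').df())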

LynxKite is under active development. Check out our Roadmap to see what we have planned for future releases.

Getting started

Quick try:

docker run --rm -p2200:2200 lynxkite/lynxkite

Setup with persistent data:

docker run \
  -p 2200:2200 \
  -v ~/lynxkite/meta:/metadata -v ~/lynxkite/data:/data \
  -e KITE_MASTER_MEMORY_MB=1024 \
  --name lynxkite lynxkite/lynxkite

Contributing

If you find any bugs, or have questions, feature requests, or comments, please file an issue or email us at [email protected].

You can install LynxKite's dependencies (Scala, Node.js, Go) with Conda.

Before the first build:

tools/git/setup.sh # Sets up pre-commit hooks.
conda env create --name lk --file conda-env.yml
conda activate lk
cp conf/kiterc_template ~/.kiterc

We use make for building the whole project.

make
target/universal/stage/bin/lynxkite interactive

Tests

We have test suites for the different parts of the system:

  • Backend tests are unit tests for the Scala code. They can also be executed with Sphynx as the backend. If you run make backend-test it will do both. Or you can start sbt and run testOnly *SomethingTest to run just one test. Run ./test_backend.sh -si to start sbt with Sphynx as the backend.

  • Frontend tests use Protractor to simulate a user's actions on the UI. make frontend-test will build everything, start a temporary LynxKite instance and run the tests against that. Use xvfb-run for headless execution. If you already have a running LynxKite instance and you don't mind erasing all data from it, run npx gulp test in the web directory. You can start up a dev proxy that watches the frontend source code for changes with npx gulp serve. Run the test suite against the dev proxy with npx gulp test:serve.

  • Python API tests are started with make remote_api-test. If you already have a running LynxKite that is okay to test on, run python/remote_api/test.sh. This script can also run a subset of the test suite: python/remote_api/test.sh -p *something*

License

Comments
  • R in LynxKite

    R in LynxKite

    It's working!

    (screenshot)

    TODO:

    • [x] The same for edges.
    • [x] Add "derive table" and "create graph".
    • [x] Docs.
    • [x] Tests.
    • [x] Better type support. In the screenshot as.numeric() is needed because Sphynx only supports int64 and float64, but nchar() returns int32. I don't think I want to add more types to Sphynx. Rather I think we can automatically cast to the declared type.
    • [x] Make the type declarations more idiomatic. float, str, etc. are from Python.
    • [x] Try some fancy R package, like https://github.com/digitalcytometry/ecotyper.
    • [ ] Check whether the Docker image needs any changes for this.
    • [ ] Add test for Long. (Python too.)
    opened by darabos 11
  • Upgrade to Spark 3.1.1, Scala 2.12, and Play 2.8.7

    Upgrade to Spark 3.1.1, Scala 2.12, and Play 2.8.7

    Major highlights so far:

    • Removed Vegas.
    • Removed Ammonite.
    • Play switched to dependency injection. Controllers are classes instead of objects now. It was not obvious how to convert the one test that was affected so I just deleted it.
    • Scalatest renamed org.scalatest.FunSuite to org.scalatest.funsuite.AnyFunSuite. (Funnily this didn't happen in 3.0.0 but in 3.1.0.) This affected 100+ files.
    • The Play JSON API changed a bit. It's not very exciting but affected a lot of files.
    • Looks like HADOOP_HOME must be set now even in single-node usage. I'll come back to look at it a bit more later but for now I just set it to an empty directory and it's fine.
    • A lot of other API changes and version conflicts, but nothing terribly interesting I think.

    LynxKite appears to be working now! I computed stuff on the example graph, looked at histograms, and used SQL.

    Next step is to fix the failing tests:

    [error] Failed: Total 724, Failed 217, Errors 0, Passed 507, Ignored 4
    
    opened by darabos 9
  • NetworKit integration

    NetworKit integration

    Super early state, but I can finally call NetworKit from Go. It's similar to Jano's solution from a year ago, but doesn't require hand-crafted wrappers. SWIG generates them just fine!

    For now I only communicate "scalars" between the two systems. Passing arrays was another hurdle in Jano's PR. We will see.

    (Internal link for his PR: https://github.com/biggraph/biggraph/pull/8676)

    opened by darabos 9
  • GitHub actions for testing

    GitHub actions for testing

    For #8. It's hard to test this locally. I'm using https://github.com/nektos/act but I'm getting weird errors and caching doesn't work, so each attempt takes ages. Will this PR trigger a run, I wonder? If not, I may merge this and try to see if I can trigger it that way.

    opened by darabos 8
  • Zero copy import when the schema is known

    Zero copy import when the schema is known

    Resolves #258.

    (screenshot)

    No import button! The corresponding Python code is:

    lk.importParquet(eager='no', filename='/home/darabos/eg.parquet', schema='name: String, age: Double')
    

    Outstanding issues:

    • Currently you can only "import" a file this way once. LynxKite assumes it will never change. This could be avoided with a version parameter, the same as is done for export operations.
    • Add the three parameters: imported_columns, limit, and sql.
    • Tests, documentation.
    opened by darabos 7
  • Neo4j export

    Neo4j export

    This is part 1: exporting attributes for existing nodes.

    There's an option to set node.keys and let the connector build the query. But if I use that, a label is mandatory. If I write the same query manually, I can leave it off. (http://5.9.211.195:8000/neo4j-spark-docs/1.0.0/writing.html#bookmark-write-node)

    Open tasks:

    • [x] Make sure this works if the keys are not defined everywhere.
    • [x] Attribute export for edges.
    • [ ] Edge export for existing nodes. (I don't think this is important.)
    • [x] Export whole graph as new stuff.
    • [x] Documentation.
    • [x] Tests. (Maybe when the final Neo4j Spark Connector is released.)
    opened by darabos 7
  • Ditch ordered mapping

    Ditch ordered mapping

    The idea (from @xandrew-lynx) is that MappingToOrdered takes up a lot of memory. The tests seem to be passing locally. I haven't measured the impact on memory use yet. I also haven't thought backward compatibility entirely through.

    opened by darabos 5
  • Upgrade to Spark 3.0

    Upgrade to Spark 3.0

    It seems despite the new major version, "No major code changes are required to adopt this version of Apache Spark."

    It seems to have quite a few improvements. It would also allow for GPU acceleration, as pointed out by Gyorgy Mezo.

    opened by xandrew-lynx 5
  • Allow starting and stopping LynxKite from Scala

    Allow starting and stopping LynxKite from Scala

    The idea is that you have a JVM which already has a Spark session. You want to run LynxKite in this session. And you want to use it from Python too while it's running. This is a common situation in a Databricks notebook, which allows mixing Scala and Python cells.

    Instead of a kiterc you can set environment variables or provide overrides like this:

    com.lynxanalytics.lynxkite.Environment.set(
      "KITE_ENABLE_CUDA" -> "yes",
      "KITE_CONFIGURE_SPARK" -> "no",
      "KITE_META_DIR" -> "/home/darabos/kite/meta",
      "KITE_DATA_DIR" -> "file:/home/darabos/kite/data",
      "KITE_ALLOW_PYTHON" -> "yes",
      "KITE_ALLOW_NON_PREFIXED_PATHS" -> "true",
      "SPHYNX_HOST" -> "localhost",
      "SPHYNX_PORT" -> "5551",
      "ORDERED_SPHYNX_DATA_DIR" -> "/home/darabos/kite/sphynx/ordered",
      "UNORDERED_SPHYNX_DATA_DIR" -> "/home/darabos/kite/sphynx/unordered",
    )
    com.lynxanalytics.lynxkite.Main.start()
    // ...
    com.lynxanalytics.lynxkite.Main.stop()
    

    All this wouldn't be too bad. But it's the first time we're really exposing the LynxKite package name. I wanted it to be com.lynxanalytics.lynxkite rather than com.lynxanalytics.biggraph. So there are a few more diffs than strictly necessary.

    But we should have renamed it already anyway! What's a "biggraph"? Nobody knows.

    opened by darabos 4
  • sphynx: bump dependency versions

    sphynx: bump dependency versions

    Hi! This change just bumps Sphynx's dependency versions. After ./build.sh, there is an error (with or without this change) though:

    networkit_wrap.cxx: In function ‘std::vector<double>* _wrap_Centrality_TLX_DEPRECATED_networkit_77eaa497b00f90e1(NetworKit::Centrality*, void*)’:
    networkit_wrap.cxx:2001:12: error: ‘arg2’ was not declared in this scope; did you mean ‘arg1’?
    

    I don't know how to fix it. :) Cheers.

    opened by jfcg 4
  • Segmentation metrics from NetworKit

    Segmentation metrics from NetworKit

    There are 7 more per-segment metrics like this one. One of them takes two segmentations as input. I think I'll skip that one and just add the 6 that have the same interface.

    There are also 5 segmentation metrics that are just a single scalar for a whole segmentation. An example is modularity. (I originally missed these because they don't derive from the Algorithm class.) I'll add these too.

    I would be fine putting these all into the new "Segmentation attributes" box category. Or do you have a better idea for organization?

    I'm also not sure about separate boxes vs. one box with a dropdown. But I like separate boxes: they leave more room for documentation, they are easier to find in the box search, and they save the user from picking from a dropdown. So I'll go that way if you don't stop me.

    opened by darabos 4
  • Bump fast-json-patch from 3.0.0-1 to 3.1.1 in /web

    Bump fast-json-patch from 3.0.0-1 to 3.1.1 in /web

    Bumps fast-json-patch from 3.0.0-1 to 3.1.1.

    Release notes

    Sourced from fast-json-patch's releases.

    3.1.1

    Security Fix for Prototype Pollution - huntr.dev #262

    Bug fixes and ES6 modules

    Use ES6 Modules

    • package now exports non-bundled ES module Starcounter-Jack/JSON-Patch#232
    • main still points to CommonJS module for backward compatibility
    • README recommends use of named ES imports

    List of changes https://github.com/Starcounter-Jack/JSON-Patch/compare/v2.2.1...3.0.0-0

    Maintainer changes

    This version was pushed to npm by mountain-jack, a new releaser for fast-json-patch since your current version.


    dependencies javascript 
    opened by dependabot[bot] 0
  • Pass DataFrames to/from managed LynxKite

    Pass DataFrames to/from managed LynxKite

    When LynxKite is running in a user-provided SparkSession, it should be possible to pass Spark DataFrames between the user's Python code and LynxKite. This would be very efficient and very powerful.

    opened by darabos 0
  • Better errors if edge src/dst indexing is wrong

    Better errors if edge src/dst indexing is wrong

    I "Create graph in R" (and maybe in Python too) if you set an out of bounds edge src/dst then Sphynx will just crash. You get "UNAVAILABLE: Network closed for unknown reason". Let's add a better error.

    good first issue 
    opened by darabos 0
  • Clicking a box doesn't open its popup until it's saved

    Clicking a box doesn't open its popup until it's saved

    This came up in https://github.com/lynxkite/lynxkite/pull/307#discussion_r1032239653 but I think I've also experienced it when using a LynxKite instance on a different continent. Maybe we could fix it?

    bug 
    opened by darabos 0
Releases
  • 5.2.0 (Dec 1, 2022)

    LynxKite 5.2.0 brings a large number of cool new features! In addition to Python, Scala, and SQL, we now have boxes for running R in LynxKite. We've made it possible to output custom plots from these new R boxes and also from the existing Python boxes. You can output static plots (as with Matplotlib) or even dynamic visualizations (as with Deck.gl).

    On the other hand, if you're running LynxKite as part of an automated workflow, our Python API can now start and stop LynxKite automatically to avoid wasting resources when LynxKite is idle.

    The changes in detail:

    • The Python API can now be used without a running LynxKite instance. If you pass a SparkSession to LynxKite (lk = lynx.kite.LynxKite(spark=spark)), LynxKite will run in that SparkSession. #294 This is useful if you want to run LynxKite as part of a pipeline rather than as a permanent fixture. (See the sketch after this list.)
    • The LynxKite() constructor in the Python API now defaults to connecting to http://localhost:2200. #291
    • Added "Compute in R" and "Create graph in R" boxes that behave the same as their Python counterparts, but let you use R. #292
    • Set up an Earthly build. #296 This should make builds very reliable for everyone.
    • "Compute in Python" boxes can now output plots. Just set the output to matplotlib, or html. #297
    lynxkite-5.2.0.jar (213.53 MB)
  • 5.1.0 (Sep 28, 2022)

    LynxKite 5.1.0 brings a major change in how LynxKite is started. It also includes a high-performance Neo4j import box, support for Google's BigQuery, and several other improvements.

    Changes to how LynxKite is started

    Until now, the script generated by Play Framework was in charge of starting LynxKite. We added a significant amount of code to it with tools/call_spark_submit.sh. You would run this script as lynxkite/bin/lynxkite interactive, and it started spark-submit with parameters based on .kiterc.

    All that is gone now. LynxKite is distributed as a single jar file. You can run it with spark-submit lynxkite-5.1.0.jar. Most of the settings from your .kiterc still apply, but you now have to load these into the environment.

    . ~/.kiterc
    spark-3.3.0/bin/spark-submit lynxkite-5.1.0.jar
    

    The benefit of this change is that LynxKite is now started like any other Spark application. Any environment that is set up to run Spark applications will be able to run LynxKite too.

    Our Docker images have been updated with this change. If you are running LynxKite in Docker, you don't have to change anything.

    Detailed changelist

    • Upgraded to Apache Spark 3.3.0. #272
    • LynxKite is now started more simply, with spark-submit. #269 This makes deployment much simpler in Hadoop environments.
    • The new box "Import from Neo4j files" can be used to import Neo4j data directly from files instead of reading from a running Neo4j instance. This can reduce the memory requirements from terabytes to gigabytes on large datasets. #268
    • Added two new "Import from BigQuery" boxes. #245
    • Changed the font styling on legends to make them more readable over maps. #267
    • The "Import from Parquet" box now has an option for using the source files directly instead of pulling the data into LynxKite. #261 This avoids an unnecessary copy and is more convenient to use through the Python API.
    • The "Weighted aggregate on neighbors" box now supports weighting by edge attributes. #257
    • The "Add rank attribute" box now supports ranking edges by edge attributes. #255

    Congratulations to @tuckging and @lacca0 for their first LynxKite commits in this release! 🎉

    lynxkite-5.1.0.jar (220.53 MB)
  • 5.0.0 (Jun 13, 2022)

    LynxKite 5.0.0 is a big release giving us fast GPU-accelerated algorithms, a new internal storage format, and other improvements.

    Download the attached release file or follow the instructions for running our Docker image.

    • Added GPU implementations of several algorithms using RAPIDS cuGraph. #241 Enable GPU usage by setting KITE_ENABLE_CUDA=yes in .kiterc. The list of algorithms includes PageRank, connected components, betweenness and Katz centrality, the Louvain method, k-core decomposition, and ForceAtlas2, a new option in the "Place vertices with edge lengths" box.
    • Switched the internal storage of graph entities from custom SequenceFiles to Parquet. #237 This is an incompatible change, but the migration is simple: delete $KITE_DATA/partitioned. Everything will be recomputed when accessed, and will be stored in the new format.
    • Added methods in the Python API for conversion between PySpark DataFrames and LynxKite tables. #240 (See the sketch after this list.)
    • Domain preference is now configurable. #236 This is useful if you want the distributed Spark backend to take precedence over the local Sphynx backend.
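
    A hypothetical sketch of how that DataFrame conversion might be used; the from_spark helper and the box parameters below are illustrative placeholders rather than confirmed API names, so check the Python API reference for the exact calls:

    df = spark.read.parquet('/tmp/people.parquet')        # an existing PySpark DataFrame
    table = lk.from_spark(df)                             # assumed name: DataFrame -> LynxKite table
    graph = table.useTableAsGraph(src='src', dst='dst')   # build a graph from the imported table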

    Migration from LynxKite 4.x

    #237 changed the data format for graph data. You will have to delete your $KITE_DATA/partitioned directory. The data will be regenerated in the new format.

    lynxkite-5.0.0.tgz (169.99 MB)
  • 4.4.0 (May 24, 2022)

    LynxKite 4.4.0 is a maintenance release with optimizations, bug fixes, and dependency upgrades.

    • Upgraded to PyTorch Geometric (PyG) 2.0.1. #206
    • Upgraded to NetworKit 10.0. #234
    • The workspace interface is much faster now. #220
    • Now using Conda for managing all dependencies. #209
    • Fixed an issue with Python boxes returning errors unnecessarily. #225
    • Fixed an issue with GCS. #224
    • Fixed CUDA issues with GCN and Node2vec boxes. #234
    lynxkite-4.4.0.tgz (170.03 MB)
  • 4.3.0 (Sep 10, 2021)

    LynxKite 4.3.0 is a massive maintenance release. We have long wanted to upgrade to Spark 3.x, but this required upgrading to Scala 2.12, which in turn required upgrading Play Framework and other things. And now it's all done!

    We found the time to include some user-visible improvements too. Check out the full list of changes below:

    • Upgraded to Apache Spark 3.1.2. This also brought us up to Scala 2.12, Java 11, Play Framework 2.8.7, and new versions of some other dependencies. #178 #184
    • The "Custom plot" box now lets you use the latest version of Vega-Lite by directly writing JSON instead of going through the Vegas Scala DSL.
    • Logistic regression models can now be configured to use elastic net regularization.
    • Boxes used as steps in a wizard are highlighted in the workspace view by a faint glow. #183
    • "Compute in Python" boxes can be used on tables. #160
    • Added a "Draw ROC curve" built-in custom box. #197
    • Performance and compatibility improvements. #188 #194
    lynxkite-4.3.0.tgz (173.04 MB)
  • 4.2.2 (Apr 30, 2021)

  • 4.2.1 (Apr 15, 2021)

  • 4.2.0 (Jan 29, 2021)

    LynxKite 4.2.0 comes with a series of minor bugfixes and a much expanded collection of graph algorithms.

    • 42 algorithms from NetworKit have been integrated into LynxKite. They include new centrality measures, random graph generators, community detection methods, graph metrics (diameter, effective diameter, assortativity), optimal spanning trees and more. (#102, #106, #111, #123)
    • Users can now opt in to sharing anonymous usage statistics with the LynxKite team. (#128)
    • Environment variables can be used to override .kiterc settings. (#110)
    • Added a built-in for parametric parameters (workspaceName) that can be used to force recomputation in wizards. (#131)
    lynxkite-4.2.0.tgz (248.60 MB)
  • 4.1.0 (Oct 5, 2020)

    LynxKite 4.1.0 comes with a big update for our Neo4j support. This has been the most frequently raised point by our new users. Thanks for all the feedback!

    • Neo4j 4.x support.
    • Revamped Neo4j import. Instead of importing tables, you can now import a whole graph. (#90)
    • Added Neo4j export. You can export vertex or edge attributes, or the whole graph. (#91)
    • AVRO and Delta Lake import and export. (#63, #86)
    • Added the "Filter with SQL" box as a more flexible alternative to "Filter by attributes".
    • Added a visualization option to not display edges. Great for large geographic datasets.
    • "Use table as vertex/edge attributes" boxes are more friendly and handle name conflicts better now.
    • Added aggregation support for Vector attributes. (Elementwise average, sum, etc.)
    • Added an option to disable generated suffixes for aggregated variables.
    • Fix for edge coloring. (#84)
    lynxkite-4.1.0.tgz (245.43 MB)
  • 4.0.1 (Jul 3, 2020)

    • Fixed issue with interactive tutorials. (#30)
    • Fixed issue with graph attributes in “Create graph in Python”. (#25)
    • Fixed issue with non-String attributes in “Use table as graph”. (#26)
    • Replaced trademarked box icons (it was an accident!) with free ones. Also switched to FontAwesome 5 everywhere to get a better selection of icons. (#37)
    • Improved the User Guide. (#38, #39)
    lynxkite-4.0.1.tgz (221.39 MB)
  • 4.0.0 (Jun 22, 2020)

    We've open-sourced LynxKite!

    We took this opportunity to make many changes that break compatibility with the LynxKite 3.x series. We can help migrate existing workspaces to LynxKite 4.0 if necessary.

    • Replaced the separate Long, Int, Double attribute types with number.
    • Instead of the (Double, Double) attribute type, 2D positions are now represented as Vector[number]. This type is widely supported and more flexible. Use "Bundle vertex attributes into a Vector" instead of "Convert vertex attributes to position", which is now gone.
    • Renamed "scalars" to "graph attributes". Renamed "projects" to "graphs". These mysterious names were largely used for historical reasons.
    • Removed "Predict with a graph neural network" operation. (It was an early prototype, long since succeeded by the "Predict with GCN" box.)
    • Removed "Predict attribute by viral modeling" box. It is more flexible to do the same thing through a series of more elemental boxes. A built-in box ("Predict from communities") has been added to serve as a starting point.
    • Made it easier to use graph convolutional boxes: added "Bundle vertex attributes into a Vector" and "One-hot encode attribute" boxes.
    • Replaced the "Reduce vertex attributes to two dimensions" and "Embed with t-SNE" boxes with the new "Reduce attribute dimensions" box which offers both PCA and t-SNE.
    • "Compute in Python" boxes now support Vector[Double] attributes.
    • "Create Graph in Python" box added.
    • Inputs and outputs for "Compute in Python" can now be inferred from the code.

    See our changelog for release notes for older releases.

    lynxkite-4.0.0.tgz (220.81 MB)