LynxKite: a complete graph data science platform for very large graphs and other datasets.

Overview

LynxKite

LynxKite is a complete graph data science platform for very large graphs and other datasets. It seamlessly combines the benefits of a friendly graphical interface and a powerful Python API.

  • Hundreds of scalable graph operations, including graph metrics like PageRank, embeddedness, and centrality, machine learning methods including GCNs, graph segmentations like modular clustering, and various transformation tools like aggregations on neighborhoods.
  • The two main data types are graphs and relational tables. Switch back and forth between the two as needed to describe complex logical flows. Run SQL on both.
  • A friendly web UI for building powerful pipelines of operation boxes. Define your own custom boxes to structure your logic.
  • Tight integration with Python lets you implement custom transformations or create whole workflows through a simple API.
  • Integrates with the Hadoop ecosystem. Import and export from CSV, JSON, Parquet, ORC, JDBC, Hive, or Neo4j.
  • Fully documented.
  • Proven in production on large clusters and real datasets.
  • Fully configurable graph visualizations and statistical plots. Experimental 3D and ray-traced graph renderings.
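PageRank is one of the graph metrics listed above. The sketch below shows what such an operation computes. It is a minimal single-machine power iteration over a directed edge list, not LynxKite's distributed implementation. Two common conventions are assumed here: a damping factor of 0.85, and an even redistribution of the rank held by vertices without outgoing edges.

```python
def pagerank(edges, num_vertices, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed edge list of (src, dst) pairs."""
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        new_rank = [base] * num_vertices
        # Each vertex passes a damped share of its rank along every outgoing edge.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank
```

On a directed 3-cycle every vertex ends up with rank 1/3. On a star where three vertices point at vertex 0, vertex 0 gets the highest rank. The ranks always sum to 1.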

LynxKite is under active development. Check out our Roadmap to see what we have planned for future releases.

Getting started

Quick try:

docker run --rm -p2200:2200 lynxkite/lynxkite

Setup with persistent data:

docker run \
  -p 2200:2200 \
  -v ~/lynxkite/meta:/metadata -v ~/lynxkite/data:/data \
  -e KITE_MASTER_MEMORY_MB=1024 \
  --name lynxkite lynxkite/lynxkite
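If you launch the persistent setup from a script, you may want to build the argument list programmatically. The helper below is hypothetical and not part of LynxKite. It reproduces the flags of the persistent command above. It assumes the container always listens on port 2200, as the -p 2200:2200 mapping suggests, so only the host side of the port mapping is configurable.

```python
import os


def lynxkite_docker_args(meta_dir, data_dir, port=2200, master_memory_mb=1024,
                         name="lynxkite", image="lynxkite/lynxkite"):
    """Hypothetical helper: build the persistent 'docker run' command as a list."""
    # subprocess does not expand "~", so expand it here for the bind mounts.
    meta_dir = os.path.expanduser(meta_dir)
    data_dir = os.path.expanduser(data_dir)
    return [
        "docker", "run",
        "-p", f"{port}:2200",            # host port -> LynxKite web UI in the container
        "-v", f"{meta_dir}:/metadata",   # persistent metadata directory
        "-v", f"{data_dir}:/data",       # persistent data directory
        "-e", f"KITE_MASTER_MEMORY_MB={master_memory_mb}",
        "--name", name,
        image,
    ]
```

For example, subprocess.run(lynxkite_docker_args("~/lynxkite/meta", "~/lynxkite/data"), check=True) starts the same container as the command above.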

Contributing

If you find any bugs, have any questions, feature requests or comments, please file an issue or email us at [email protected].

You can install LynxKite's dependencies (Scala, Node.js, Go) with Conda.

Before the first build:

tools/git/setup.sh # Sets up pre-commit hooks.
conda env create --name lk --file conda-env.yml
conda activate lk
cp conf/kiterc_template ~/.kiterc

We use make for building the whole project.

make
target/universal/stage/bin/lynxkite interactive
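The setup steps copy conf/kiterc_template to ~/.kiterc, a shell file that sets LynxKite's configuration variables. The 4.2.0 release notes on this page say environment variables can override .kiterc settings. The sketch below models that precedence in Python. The function names are hypothetical. It assumes the file consists of simple KEY=VALUE or export KEY=VALUE lines. It ignores any other shell logic and keeps variable references such as $HOME verbatim, so it illustrates the precedence rather than replacing sourcing the file.

```python
import os
import shlex


def parse_kiterc(text):
    """Collect simple 'KEY=VALUE' / 'export KEY=VALUE' lines from a .kiterc-style file."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep and key.isidentifier():
            # shlex removes quotes and trailing comments: KEY="a b" # x -> "a b".
            parts = shlex.split(value, comments=True)
            settings[key] = parts[0] if parts else ""
    return settings


def effective_settings(kiterc_text, environ=None):
    """Environment variables take precedence over values from the .kiterc text."""
    environ = os.environ if environ is None else environ
    settings = parse_kiterc(kiterc_text)
    return {key: environ.get(key, value) for key, value in settings.items()}
```

For example, effective_settings(open(os.path.expanduser("~/.kiterc")).read()) returns the merged settings as a dictionary.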

Tests

We have test suites for the different parts of the system:

  • Backend tests are unit tests for the Scala code. They can also be executed with Sphynx as the backend. If you run make backend-test it will do both. Or you can start sbt and run testOnly *SomethingTest to run just one test. Run ./test_backend.sh -si to start sbt with Sphynx as the backend.

  • Frontend tests use Protractor to simulate a user's actions on the UI. make frontend-test will build everything, start a temporary LynxKite instance and run the tests against that. Use xvfb-run for headless execution. If you already have a running LynxKite instance and you don't mind erasing all data from it, run npx gulp test in the web directory. You can start up a dev proxy that watches the frontend source code for changes with npx gulp serve. Run the test suite against the dev proxy with npx gulp test:serve.

  • Python API tests are started with make remote_api-test. If you already have a running LynxKite that is okay to test on, run python/remote_api/test.sh. This script can also run a subset of the test suite: python/remote_api/test.sh -p *something*

License

Comments
  • R in LynxKite

    R in LynxKite

    It's working!


    TODO:

    • [x] The same for edges.
    • [x] Add "derive table" and "create graph".
    • [x] Docs.
    • [x] Tests.
    • [x] Better type support. In the screenshot as.numeric() is needed because Sphynx only supports int64 and float64, but nchar() returns int32. I don't think I want to add more types to Sphynx. Rather I think we can automatically cast to the declared type.
    • [x] Make the type declarations more idiomatic. float, str, etc. are from Python.
    • [x] Try some fancy R package, like https://github.com/digitalcytometry/ecotyper.
    • [ ] Check whether the Docker image needs any changes for this.
    • [ ] Add test for Long. (Python too.)
    opened by darabos 11
  • Upgrade to Spark 3.1.1, Scala 2.12, and Play 2.8.7

    Upgrade to Spark 3.1.1, Scala 2.12, and Play 2.8.7

    Major highlights so far:

    • Removed Vegas.
    • Removed Ammonite.
    • Play switched to dependency injection. Controllers are classes instead of objects now. It was not obvious how to convert the one test that was affected so I just deleted it.
    • Scalatest renamed org.scalatest.FunSuite to org.scalatest.funsuite.AnyFunSuite. (Funnily, this didn't happen in 3.0.0 but in 3.1.0.) This affected 100+ files.
    • The Play JSON API changed a bit. It's not very exciting but affected a lot of files.
    • Looks like HADOOP_HOME must be set now even in single-node usage. I'll come back to look at it a bit more later but for now I just set it to an empty directory and it's fine.
    • A lot of other API changes and version conflicts, but nothing terribly interesting I think.

    LynxKite appears to be working now! I computed stuff on the example graph, looked at histograms, and used SQL.

    Next step is to fix the failing tests:

    [error] Failed: Total 724, Failed 217, Errors 0, Passed 507, Ignored 4
    
    opened by darabos 9
  • NetworKit integration

    NetworKit integration

    Super early state, but I can finally call NetworKit from Go. It's similar to Jano's solution from a year ago, but doesn't require hand-crafted wrappers. SWIG generates them just fine!

    For now I only communicate "scalars" between the two systems. Passing arrays was another hurdle in Jano's PR. We will see.

    (Internal link for his PR: https://github.com/biggraph/biggraph/pull/8676)

    opened by darabos 9
  • GitHub actions for testing

    GitHub actions for testing

    For #8. It's hard to test this locally. I'm using https://github.com/nektos/act but I'm getting weird errors and caching doesn't work, so each attempt takes ages. Will this PR trigger a run, I wonder? If not, I may merge this and try to see if I can trigger it that way.

    opened by darabos 8
  • Zero copy import when the schema is known

    Zero copy import when the schema is known

    Resolves #258.


    No import button! The corresponding Python code is:

    lk.importParquet(eager='no', filename='/home/darabos/eg.parquet', schema='name: String, age: Double')
    

    Outstanding issues:

    • Currently you can only "import" a file this way once: LynxKite assumes it will never change. This could be avoided with a version parameter, as is done with export operations.
    • Add the three parameters: imported_columns, limit, and sql.
    • Tests, documentation.
    opened by darabos 7
  • Neo4j export

    Neo4j export

    This is part 1: exporting attributes for existing nodes.

    There's an option to set node.keys and let them build the query. But if I use that, the label is a must. If I write the same query manually, I can leave it off. (http://5.9.211.195:8000/neo4j-spark-docs/1.0.0/writing.html#bookmark-write-node)

    Open tasks:

    • [x] Make sure this works if the keys are not defined everywhere.
    • [x] Attribute export for edges.
    • [ ] Edge export for existing nodes. (I don't think this is important.)
    • [x] Export whole graph as new stuff.
    • [x] Documentation.
    • [x] Tests. (Maybe when the final Neo4j Spark Connector is released.)
    opened by darabos 7
  • Ditch ordered mapping

    Ditch ordered mapping

    The idea (from @xandrew-lynx) is that MappingToOrdered takes up a lot of memory. The tests seem to pass locally. I haven't measured the impact on memory use yet, and I haven't entirely thought backward compatibility through either.

    opened by darabos 5
  • Upgrade to Spark 3.0

    Upgrade to Spark 3.0

    It seems that despite the new major version, "No major code changes are required to adopt this version of Apache Spark."

    It seems to have quite a few improvements. It would also allow for GPU acceleration, as pointed out by Gyorgy Mezo.

    opened by xandrew-lynx 5
  • Allow starting and stopping LynxKite from Scala

    Allow starting and stopping LynxKite from Scala

    The idea is that you have a JVM which already has a Spark session. You want to run LynxKite in this session. And you want to use it from Python too while it's running. This is a common situation in a Databricks notebook, which allows mixing Scala and Python cells.

    Instead of a kiterc you can set environment variables or provide overrides like this:

    com.lynxanalytics.lynxkite.Environment.set(
      "KITE_ENABLE_CUDA" -> "yes",
      "KITE_CONFIGURE_SPARK" -> "no",
      "KITE_META_DIR" -> "/home/darabos/kite/meta",
      "KITE_DATA_DIR" -> "file:/home/darabos/kite/data",
      "KITE_ALLOW_PYTHON" -> "yes",
      "KITE_ALLOW_NON_PREFIXED_PATHS" -> "true",
      "SPHYNX_HOST" -> "localhost",
      "SPHYNX_PORT" -> "5551",
      "ORDERED_SPHYNX_DATA_DIR" -> "/home/darabos/kite/sphynx/ordered",
      "UNORDERED_SPHYNX_DATA_DIR" -> "/home/darabos/kite/sphynx/unordered",
    )
    com.lynxanalytics.lynxkite.Main.start()
    // ...
    com.lynxanalytics.lynxkite.Main.stop()
    

    All this wouldn't be too bad. But it's the first time we're really exposing the LynxKite package name. I wanted it to be com.lynxanalytics.lynxkite rather than com.lynxanalytics.biggraph. So there are a bit more diffs than strictly necessary.

    But we should have renamed it already anyway! What's a "biggraph"? Nobody knows.

    opened by darabos 4
  • sphynx: bump dependency versions

    sphynx: bump dependency versions

    Hi, this change just bumps the dependency versions of sphynx. After ./build.sh, there is an error (with or without this change), though:

    networkit_wrap.cxx: In function ‘std::vector<double>* _wrap_Centrality_TLX_DEPRECATED_networkit_77eaa497b00f90e1(NetworKit::Centrality*, void*)’:
    networkit_wrap.cxx:2001:12: error: ‘arg2’ was not declared in this scope; did you mean ‘arg1’?
    

    I don't know how to fix it :) Cheers..

    opened by jfcg 4
  • Segmentation metrics from NetworKit

    Segmentation metrics from NetworKit

    There are 7 more per-segment metrics like this one. One of them takes two segmentations as input. I think I'll skip that one and just add the 6 that have the same interface.

    There are also 5 segmentation metrics that are just a single scalar for a whole segmentation. An example is modularity. (I originally missed these because they don't derive from the Algorithm class.) I'll add these too.

    I would be fine putting these all into the new "Segmentation attributes" box category. Or do you have a better idea for organization?

    I'm also not sure about separate boxes vs. one box with a dropdown, but I like separate boxes. They leave more room for documentation, are easier to find in the box search, and save the user from picking from a dropdown. So I'll go that way if you don't stop me.

    opened by darabos 4
  • Bump fast-json-patch from 3.0.0-1 to 3.1.1 in /web

    Bump fast-json-patch from 3.0.0-1 to 3.1.1 in /web

    Bumps fast-json-patch from 3.0.0-1 to 3.1.1.

    Release notes

    Sourced from fast-json-patch's releases.

    3.1.1

    Security Fix for Prototype Pollution - huntr.dev #262

    Bug fixes and ES6 modules

    Use ES6 Modules

    • package now exports non-bundled ES module Starcounter-Jack/JSON-Patch#232
    • main still points to CommonJS module for backward compatibility
    • README recommends use of named ES imports

    List of changes https://github.com/Starcounter-Jack/JSON-Patch/compare/v2.2.1...3.0.0-0

    Maintainer changes

    This version was pushed to npm by mountain-jack, a new releaser for fast-json-patch since your current version.



    dependencies javascript 
    opened by dependabot[bot] 0
  • Pass DataFrames to/from managed LynxKite

    Pass DataFrames to/from managed LynxKite

    When LynxKite is running in a user-provided SparkSession, it should be possible to pass Spark DataFrames between the user's Python code and LynxKite. This would be very efficient and very powerful.

    opened by darabos 0
  • Better errors if edge src/dst indexing is wrong

    Better errors if edge src/dst indexing is wrong

    In "Create graph in R" (and maybe in Python too), if you set an out-of-bounds edge src/dst, Sphynx just crashes. You get "UNAVAILABLE: Network closed for unknown reason". Let's add a better error.

    good first issue 
    opened by darabos 0
  • Clicking a box doesn't open its popup until it's saved

    Clicking a box doesn't open its popup until it's saved

    This came up in https://github.com/lynxkite/lynxkite/pull/307#discussion_r1032239653 but I think I've also experienced it when using a LynxKite instance on a different continent. Maybe we could fix it?

    bug 
    opened by darabos 0
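The "Better errors if edge src/dst indexing is wrong" issue describes Sphynx crashing with "UNAVAILABLE: Network closed for unknown reason" when "Create graph in R" receives an out-of-bounds edge endpoint. One way to produce a better error is to validate the edge lists before they are handed to the backend. The function below is a hypothetical sketch of such a check. It is written in Python because the issue suggests "Create graph in Python" may be affected too. It assumes 0-based vertex indices, which may not match how the R box numbers vertices.

```python
def validate_edges(src, dst, num_vertices):
    """Hypothetical check: raise a readable error for out-of-range edge endpoints."""
    if len(src) != len(dst):
        raise ValueError(f"src has {len(src)} entries but dst has {len(dst)}")
    for i, (s, d) in enumerate(zip(src, dst)):
        for role, v in (("src", s), ("dst", d)):
            # Assumes vertices are numbered 0 .. num_vertices - 1.
            if not 0 <= v < num_vertices:
                raise IndexError(
                    f"edge {i}: {role}={v} is out of bounds for a graph "
                    f"with {num_vertices} vertices (valid range 0..{num_vertices - 1})"
                )
```

Failing early like this names the offending edge, instead of surfacing as a lost network connection.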
Releases (5.2.0)
  • 5.2.0 (Dec 1, 2022)

    LynxKite 5.2.0 brings a large number of cool new features! In addition to Python, Scala, and SQL, we now have boxes for running R in LynxKite. We've made it possible to output custom plots from these new R boxes and also from the existing Python boxes. You can output static plots (as with Matplotlib) or even dynamic visualizations (as with Deck.gl).

    On the other hand, if you're running LynxKite as part of an automated workflow, our Python API can now start and stop LynxKite automatically to avoid wasting resources when LynxKite is idle.

    The changes in detail:

    • The Python API can now be used without a running LynxKite instance. If you pass in a SparkSession to LynxKite (lk = lynx.kite.LynxKite(spark=spark)), LynxKite will run in that SparkSession. #294 This is useful if you want to run LynxKite as part of a pipeline rather than as a permanent fixture.
    • The LynxKite() constructor in the Python API now defaults to connecting to http://localhost:2200. #291
    • Added "Compute in R" and "Create graph in R" boxes that behave the same as their Python counterparts, but let you use R. #292
    • Set up an Earthly build. #296 This should make builds very reliable for everyone.
    • "Compute in Python" boxes can now output plots. Just set the output to matplotlib or html. #297
    Source code (tar.gz)
    Source code (zip)
    lynxkite-5.2.0.jar (213.53 MB)
  • 5.1.0 (Sep 28, 2022)

    LynxKite 5.1.0 brings a major change in how LynxKite is started. It also includes a high-performance Neo4j import box, support for Google's BigQuery, and several other improvements.

    Changes to how LynxKite is started

    Until now, the script generated by Play Framework was in charge of starting LynxKite. We added a significant amount of code to it with tools/call_spark_submit.sh. You would run this script as lynxkite/bin/lynxkite interactive. And this script started spark-submit with parameters based on .kiterc.

    All that is gone now. LynxKite is distributed as a single jar file. You can run it with spark-submit lynxkite-5.1.0.jar. Most of the settings from your .kiterc still apply, but you now have to load these into the environment.

    . ~/.kiterc
    spark-3.3.0/bin/spark-submit lynxkite-5.1.0.jar
    

    The benefit of this change is that LynxKite is now started like any other Spark application. Any environment that is set up to run Spark applications will be able to run LynxKite too.

    Our Docker images have been updated with this change. If you are running LynxKite in Docker, you don't have to change anything.

    Detailed changelist

    • Upgraded to Apache Spark 3.3.0. #272
    • LynxKite is now started more simply, with spark-submit. #269 This makes deployment much simpler in Hadoop environments.
    • The new box "Import from Neo4j files" can be used to import Neo4j data directly from files instead of reading from a running Neo4j instance. This can reduce the memory requirements from terabytes to gigabytes on large datasets. #268
    • Added two new "Import from BigQuery" boxes. #245
    • Changed the font styling on legends to make them more readable over maps. #267
    • The "Import from Parquet" box now has an option for using the source files directly instead of pulling the data into LynxKite. #261 This avoids an unnecessary copy and is more convenient to use through the Python API.
    • The "Weighted aggregate on neighbors" box now supports weighting by edge attributes. #257
    • The "Add rank attribute" box now supports ranking edges by edge attributes. #255

    Congratulations to @tuckging and @lacca0 for their first LynxKite commits in this release! 🎉

    Source code (tar.gz)
    Source code (zip)
    lynxkite-5.1.0.jar (220.53 MB)
  • 5.0.0 (Jun 13, 2022)

    LynxKite 5.0.0 is a big release giving us fast GPU-accelerated algorithms, a new internal storage format, and other improvements.

    Download the attached release file or follow the instructions for running our Docker image.

    • Added GPU implementations of several algorithms using RAPIDS cuGraph. #241 Enable GPU usage by setting KITE_ENABLE_CUDA=yes in .kiterc. The list of algorithms includes PageRank, connected components, betweenness and Katz centrality, the Louvain method, k-core decomposition, and ForceAtlas2, a new option in "Place vertices with edge lengths".
    • Switched the internal storage of graph entities from custom SequenceFiles to Parquet. #237 This is an incompatible change, but the migration is simple: delete $KITE_DATA/partitioned. Everything will be recomputed when accessed, and will be stored in the new format.
    • Added methods in the Python API for conversion between PySpark DataFrames and LynxKite tables. #240
    • Domain preference is now configurable. #236 This is useful if you want the distributed Spark backend to take precedence over the local Sphynx backend.

    Migration from LynxKite 4.x

    #237 changed the data format for graph data. You will have to delete your $KITE_DATA/partitioned directory. The data will be regenerated in the new format.

    Source code (tar.gz)
    Source code (zip)
    lynxkite-5.0.0.tgz (169.99 MB)
  • 4.4.0 (May 24, 2022)

    LynxKite 4.4.0 is a maintenance release with optimizations, bug fixes, and dependency upgrades.

    • Upgraded to PyTorch Geometric (PyG) 2.0.1. #206
    • Upgraded to NetworKit 10.0. #234
    • The workspace interface is much faster now. #220
    • Now using Conda for managing all dependencies. #209
    • Fixed an issue with Python boxes returning errors unnecessarily. #225
    • Fixed an issue with GCS. #224
    • Fixed CUDA issues with GCN and Node2vec boxes. #234
    Source code (tar.gz)
    Source code (zip)
    lynxkite-4.4.0.tgz (170.03 MB)
  • 4.3.0 (Sep 10, 2021)

    LynxKite 4.3.0 is a massive maintenance release. We have long wanted to upgrade to Spark 3.x, but this required upgrading to Scala 2.12, which in turn required upgrading Play Framework and other things. And now it's all done!

    We found the time to include some user-visible improvements too. Check out the full list of changes below:

    • Upgraded to Apache Spark 3.1.2. This also brought us up to Scala 2.12, Java 11, Play Framework 2.8.7, and new versions of some other dependencies. #178 #184
    • The "Custom plot" box now lets you use the latest version of Vega-Lite by directly writing JSON instead of going through the Vegas Scala DSL.
    • Logistic regression models can now be configured to use elastic net regularization.
    • Boxes used as steps in a wizard are highlighted in the workspace view by a faint glow. #183
    • "Compute in Python" boxes can be used on tables. #160
    • Added a "Draw ROC curve" built-in custom box. #197
    • Performance and compatibility improvements. #188 #194
    Source code (tar.gz)
    Source code (zip)
    lynxkite-4.3.0.tgz (173.04 MB)
  • 4.2.2 (Apr 30, 2021)

  • 4.2.1 (Apr 15, 2021)

  • 4.2.0 (Jan 29, 2021)

    LynxKite 4.2.0 comes with a series of minor bugfixes and a much expanded collection of graph algorithms.

    • 42 algorithms from NetworKit have been integrated into LynxKite. They include new centrality measures, random graph generators, community detection methods, graph metrics (diameter, effective diameter, assortativity), optimal spanning trees and more. (#102, #106, #111, #123)
    • Users can now opt in to sharing anonymous usage statistics with the LynxKite team. (#128)
    • Environment variables can be used to override .kiterc settings. (#110)
    • Added a built-in for parametric parameters (workspaceName) that can be used to force recomputation in wizards. (#131)
    Source code (tar.gz)
    Source code (zip)
    lynxkite-4.2.0.tgz (248.60 MB)
  • 4.1.0 (Oct 5, 2020)

    LynxKite 4.1.0 comes with a big update for our Neo4j support. This has been the most frequently raised point by our new users. Thanks for all the feedback!

    • Neo4j 4.x support.
    • Revamped Neo4j import. Instead of importing tables, you can now import a whole graph. (#90)
    • Added Neo4j export. You can export vertex or edge attributes, or the whole graph. (#91)
    • AVRO and Delta Lake import and export. (#63, #86)
    • Added the "Filter with SQL" box as a more flexible alternative to "Filter by attributes".
    • Visualization option to not display edges. Great in large geographic datasets.
    • "Use table as vertex/edge attributes" boxes are more friendly and handle name conflicts better now.
    • Added aggregation support for Vector attributes. (Elementwise average, sum, etc.)
    • Added an option to disable generated suffixes for aggregated variables.
    • Fix for edge coloring. (#84)
    Source code (tar.gz)
    Source code (zip)
    lynxkite-4.1.0.tgz (245.43 MB)
  • 4.0.1 (Jul 3, 2020)

    • Fixed issue with interactive tutorials. (#30)
    • Fixed issue with graph attributes in “Create graph in Python”. (#25)
    • Fixed issue with non-String attributes in “Use table as graph”. (#26)
    • Replaced trademarked box icons (it was an accident!) with free ones. Also switched to FontAwesome 5 everywhere to get a better selection of icons. (#37)
    • Improved the User Guide. (#38, #39)
    Source code (tar.gz)
    Source code (zip)
    lynxkite-4.0.1.tgz (221.39 MB)
  • 4.0.0 (Jun 22, 2020)

    We've open-sourced LynxKite!

    We took this opportunity to make many changes that break compatibility with the LynxKite 3.x series. We can help migrate existing workspaces to LynxKite 4.0 if necessary.

    • Replaced the separate Long, Int, Double attribute types with number.
    • Instead of the (Double, Double) attribute type, 2D positions are now represented as Vector[number]. This type is widely supported and more flexible. Use "Bundle vertex attributes into a Vector" instead of "Convert vertex attributes to position", which is now gone.
    • Renamed "scalars" to "graph attributes". Renamed "projects" to "graphs". These mysterious names were largely used for historical reasons.
    • Removed "Predict with a graph neural network" operation. (It was an early prototype, long since succeeded by the "Predict with GCN" box.)
    • Removed "Predict attribute by viral modeling" box. It is more flexible to do the same thing through a series of more elemental boxes. A built-in box ("Predict from communities") has been added to serve as a starting point.
    • Made it easier to use graph convolutional boxes: added "Bundle vertex attributes into a Vector" and "One-hot encode attribute" boxes.
    • Replaced the "Reduce vertex attributes to two dimensions" and "Embed with t-SNE" boxes with the new "Reduce attribute dimensions" box which offers both PCA and t-SNE.
    • "Compute in Python" boxes now support Vector[Double] attributes.
    • "Create Graph in Python" box added.
    • Inputs and outputs for "Compute in Python" can now be inferred from the code.

    See our changelog for release notes for older releases.

    Source code (tar.gz)
    Source code (zip)
    lynxkite-4.0.0.tgz (220.81 MB)
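Release 4.0.0 added "One-hot encode attribute" and "Bundle vertex attributes into a Vector" to make the graph convolutional boxes easier to use. It also represents 2D positions as Vector[number]. The sketch below illustrates what these two transformations produce, using plain Python lists as stand-ins for per-vertex attributes. The function names are hypothetical and this is not LynxKite's implementation. Categories are ordered alphabetically by assumption, and every vertex is assumed to have a defined value.

```python
def one_hot(values):
    """Hypothetical one-hot encoding of a categorical per-vertex attribute."""
    categories = sorted(set(values))  # assumed ordering: alphabetical
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0.0] * len(categories)
        vec[index[v]] = 1.0
        vectors.append(vec)
    return categories, vectors


def bundle(*attributes):
    """Hypothetical bundling of number or vector attributes into one vector per vertex."""
    bundled = []
    for per_vertex in zip(*attributes):
        vec = []
        for value in per_vertex:
            if isinstance(value, (list, tuple)):
                vec.extend(float(x) for x in value)  # flatten vector attributes
            else:
                vec.append(float(value))             # append number attributes
        bundled.append(vec)
    return bundled
```

For instance, bundle(x, y) turns two number attributes into the Vector[number] position form. bundle(degree, vectors) appends a one-hot encoding to a numeric feature, producing the kind of input a graph convolutional model expects.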