LynxKite: a complete graph data science platform for very large graphs and other datasets.

Last update: Dec 14, 2022

Overview

LynxKite

LynxKite is a complete graph data science platform for very large graphs and other datasets. It seamlessly combines the benefits of a friendly graphical interface and a powerful Python API.

Hundreds of scalable graph operations, including graph metrics like PageRank, embeddedness, and centrality, machine learning methods including GCNs, graph segmentations like modular clustering, and various transformation tools like aggregations on neighborhoods.
The two main data types are graphs and relational tables. Switch back and forth between the two as needed to describe complex logical flows. Run SQL on both.
A friendly web UI for building powerful pipelines of operation boxes. Define your own custom boxes to structure your logic.
Tight integration with Python lets you implement custom transformations or create whole workflows through a simple API.
Integrates with the Hadoop ecosystem. Import and export from CSV, JSON, Parquet, ORC, JDBC, Hive, or Neo4j.
Fully documented.
Proven in production on large clusters and real datasets.
Fully configurable graph visualizations and statistical plots. Experimental 3D and ray-traced graph renderings.

LynxKite is under active development. Check out our Roadmap to see what we have planned for future releases.

Getting started

Quick try:

docker run --rm -p2200:2200 lynxkite/lynxkite

Setup with persistent data:

docker run \
  -p 2200:2200 \
  -v ~/lynxkite/meta:/metadata -v ~/lynxkite/data:/data \
  -e KITE_MASTER_MEMORY_MB=1024 \
  --name lynxkite lynxkite/lynxkite
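
For scripted setups, the persistent-data invocation above can be assembled programmatically. The following is a minimal sketch that only builds the argument list from the flags already shown; the helper name `docker_run_args` and its parameters are hypothetical and not part of LynxKite.

```python
import os
import shlex


def docker_run_args(meta_dir, data_dir, port=2200, master_memory_mb=1024,
                    name="lynxkite", image="lynxkite/lynxkite"):
    """Build the `docker run` argument list for a persistent LynxKite container.

    Hypothetical helper: it mirrors the flags of the command above and
    nothing more.
    """
    meta_dir = os.path.expanduser(meta_dir)
    data_dir = os.path.expanduser(data_dir)
    return [
        "docker", "run",
        "-p", f"{port}:{port}",
        "-v", f"{meta_dir}:/metadata",
        "-v", f"{data_dir}:/data",
        "-e", f"KITE_MASTER_MEMORY_MB={master_memory_mb}",
        "--name", name,
        image,
    ]


if __name__ == "__main__":
    args = docker_run_args("~/lynxkite/meta", "~/lynxkite/data")
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(args))
```

The returned list can be passed to `subprocess.run(args, check=True)` once the printed command looks right.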

Contributing

If you find any bugs, have any questions, feature requests or comments, please file an issue or email us at [email protected].

You can install LynxKite's dependencies (Scala, Node.js, Go) with Conda.

Before the first build:

tools/git/setup.sh # Sets up pre-commit hooks.
conda env create --name lk --file conda-env.yml
conda activate lk
cp conf/kiterc_template ~/.kiterc

We use make for building the whole project.

make
target/universal/stage/bin/lynxkite interactive

Tests

We have test suites for the different parts of the system:

Backend tests are unit tests for the Scala code. They can also be executed with Sphynx as the backend. If you run make backend-test it will do both. Or you can start sbt and run testOnly *SomethingTest to run just one test. Run ./test_backend.sh -si to start sbt with Sphynx as the backend.
Frontend tests use Protractor to simulate a user's actions on the UI. make frontend-test will build everything, start a temporary LynxKite instance and run the tests against that. Use xvfb-run for headless execution. If you already have a running LynxKite instance and you don't mind erasing all data from it, run npx gulp test in the web directory. You can start up a dev proxy that watches the frontend source code for changes with npx gulp serve. Run the test suite against the dev proxy with npx gulp test:serve.
Python API tests are started with make remote_api-test. If you already have a running LynxKite that is okay to test on, run python/remote_api/test.sh. This script can also run a subset of the test suite: python/remote_api/test.sh -p *something*

License

GNU Affero General Public License v3.0

Comments

R in LynxKite
It's working!

TODO:

[x] The same for edges.

[x] Add "derive table" and "create graph".

[x] Docs.

[x] Tests.

[x] Better type support. In the screenshot as.numeric() is needed because Sphynx only supports int64 and float64, but nchar() returns int32. I don't think I want to add more types to Sphynx. Rather I think we can automatically cast to the declared type.

[x] Make the type declarations more idiomatic. float, str, etc are from Python.

[x] Try some fancy R package, like https://github.com/digitalcytometry/ecotyper.

[ ] Check whether the Docker image needs any changes for this.

[ ] Add test for Long. (Python too.)
opened by darabos 11
Upgrade to Spark 3.1.1, Scala 2.12, and Play 2.8.7
Major highlights so far:

Removed Vegas.

Removed Ammonite.

Play switched to dependency injection. Controllers are classes instead of objects now. It was not obvious how to convert the one test that was affected so I just deleted it.

ScalaTest renamed org.scalatest.FunSuite to org.scalatest.funsuite.AnyFunSuite. (Funnily this didn't happen in 3.0.0 but in 3.1.0.) This affected 100+ files.

The Play JSON API changed a bit. It's not very exciting but affected a lot of files.

Looks like HADOOP_HOME must be set now even in single-node usage. I'll come back to look at it a bit more later but for now I just set it to an empty directory and it's fine.

A lot of other API changes and version conflicts, but nothing terribly interesting I think.

LynxKite appears to be working now! I computed stuff on the example graph, looked at histograms, and used SQL.

Next step is to fix the failing tests:

[error] Failed: Total 724, Failed 217, Errors 0, Passed 507, Ignored 4
opened by darabos 9
NetworKit integration

Super early state, but I can finally call NetworKit from Go. It's similar to Jano's solution from a year ago, but doesn't require hand-crafted wrappers. SWIG generates them just fine!

For now I only communicate "scalars" between the two systems. Passing arrays was another hurdle in Jano's PR. We will see.

(Internal link for his PR: https://github.com/biggraph/biggraph/pull/8676)

opened by darabos 9
GitHub actions for testing

For #8. It's hard to test this locally. I'm using https://github.com/nektos/act but I'm getting weird errors and caching doesn't work, so each attempt takes ages. Will this PR trigger a run, I wonder? If not, I may merge this and try to see if I can trigger it that way.

opened by darabos 8
Zero copy import when the schema is known
Resolves #258.

No import button! The corresponding Python code is:

lk.importParquet(eager='no', filename='/home/darabos/eg.parquet', schema='name: String, age: Double')
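
The schema argument in the call above is a comma-separated list of `name: Type` pairs. When the column types are already known in code, that string can be generated instead of typed by hand. This is a small sketch; `schema_string` is a hypothetical helper, not part of the LynxKite Python API.

```python
def schema_string(columns):
    """Render a {column name: type name} mapping as 'name: Type, name: Type'."""
    return ", ".join(f"{name}: {type_name}" for name, type_name in columns.items())


schema = schema_string({"name": "String", "age": "Double"})
print(schema)  # name: String, age: Double

# The result would then be passed as in the call above:
#   lk.importParquet(eager='no', filename='/home/darabos/eg.parquet', schema=schema)
```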

Outstanding issues:

Currently you can only "import" a file this way once. LynxKite assumes it will never change. This could be avoided with a version parameter, same as it's done with export operations.

Add the three parameters: imported_columns, limit, and sql.

Tests, documentation.
opened by darabos 7
Neo4j export
This is part 1: exporting attributes for existing nodes.

There's an option to set node.keys and let them build the query. But if I use that, the label is a must. If I write the same query manually, I can leave it off. (http://5.9.211.195:8000/neo4j-spark-docs/1.0.0/writing.html#bookmark-write-node)

Open tasks:

[x] Make sure this works if the keys are not defined everywhere.

[x] Attribute export for edges.

[ ] Edge export for existing nodes. (I don't think this is important.)

[x] Export whole graph as new stuff.

[x] Documentation.

[x] Tests. (Maybe when the final Neo4j Spark Connector is released.)
opened by darabos 7
Ditch ordered mapping

The idea (from @xandrew-lynx) being that MappingToOrdered takes up a lot of memory. The tests seem to be passing locally. I haven't measured the impact on memory use yet. I also haven't thought backward compatibility entirely through.

opened by darabos 5
Upgrade to Spark 3.0

It seems despite the new major version, "No major code changes are required to adopt this version of Apache Spark."

It seems to have quite a few improvements. It would also allow for GPU acceleration, as pointed out by Gyorgy Mezo.

opened by xandrew-lynx 5
Allow starting and stopping LynxKite from Scala
The idea is that you have a JVM which already has a Spark session. You want to run LynxKite in this session. And you want to use it from Python too while it's running. This is a common situation in a Databricks notebook, which allows mixing Scala and Python cells.

Instead of a kiterc you can set environment variables or provide overrides like this:

com.lynxanalytics.lynxkite.Environment.set(
  "KITE_ENABLE_CUDA" -> "yes",
  "KITE_CONFIGURE_SPARK" -> "no",
  "KITE_META_DIR" -> "/home/darabos/kite/meta",
  "KITE_DATA_DIR" -> "file:/home/darabos/kite/data",
  "KITE_ALLOW_PYTHON" -> "yes",
  "KITE_ALLOW_NON_PREFIXED_PATHS" -> "true",
  "SPHYNX_HOST" -> "localhost",
  "SPHYNX_PORT" -> "5551",
  "ORDERED_SPHYNX_DATA_DIR" -> "/home/darabos/kite/sphynx/ordered",
  "UNORDERED_SPHYNX_DATA_DIR" -> "/home/darabos/kite/sphynx/unordered",
)
com.lynxanalytics.lynxkite.Main.start()
// ...
com.lynxanalytics.lynxkite.Main.stop()

All this wouldn't be too bad. But it's the first time we're really exposing the LynxKite package name. I wanted it to be com.lynxanalytics.lynxkite rather than com.lynxanalytics.biggraph. So there are a bit more diffs than strictly necessary.

But we should have renamed it already anyway! What's a "biggraph"? Nobody knows.
opened by darabos 4

sphynx: bump dependency versions

Hi, This change just bumps dependency versions of sphynx. After ./build.sh, there is an error (with/without this change) though:

networkit_wrap.cxx: In function ‘std::vector<double>* _wrap_Centrality_TLX_DEPRECATED_networkit_77eaa497b00f90e1(NetworKit::Centrality*, void*)’:
networkit_wrap.cxx:2001:12: error: ‘arg2’ was not declared in this scope; did you mean ‘arg1’?

I don't know how to fix it :) Cheers.

opened by jfcg 4

Segmentation metrics from NetworKit

There are 7 more per-segment metrics like this one. One of them takes two segmentations as input. I think I'll skip that one and just add the 6 that have the same interface.

There are also 5 segmentation metrics that are just a single scalar for a whole segmentation. An example is modularity. (I originally missed these because they don't derive from the Algorithm class.) I'll add these too.
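
For readers unfamiliar with modularity, here is a minimal sketch of the textbook definition for an undirected, unweighted graph: for each segment, the fraction of edges inside it minus the squared fraction of edge endpoints it holds. This is not NetworKit's implementation and not the code of the LynxKite box; the function name and input format are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to its segment id.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    internal = defaultdict(int)  # edges with both endpoints in the segment
    degree = defaultdict(int)    # sum of vertex degrees in the segment
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge, one segment per triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
segments = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, segments))  # 5/14, about 0.357
```

The result is a single number for the whole segmentation, which is why such metrics fit a graph attribute rather than a per-segment attribute.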

I would be fine putting these all into the new "Segmentation attributes" box category. Or do you have a better idea for organization?

Also not sure about separate boxes vs one box with a dropdown. But I like separate boxes. It leaves more room for documentation, you more easily find them in the box search, saves the user from picking from a dropdown. So I'll go that way if you don't stop me.

opened by darabos 4
Bump fast-json-patch from 3.0.0-1 to 3.1.1 in /web
Bumps fast-json-patch from 3.0.0-1 to 3.1.1.

Release notes

Sourced from fast-json-patch's releases.

3.1.1

Security Fix for Prototype Pollution - huntr.dev #262

Bug fixes and ES6 modules

Use ES6 Modules

package now exports non-bundled ES module Starcounter-Jack/JSON-Patch#232

main still points to CommonJS module for backward compatibility

README recommends use of named ES imports

List of changes https://github.com/Starcounter-Jack/JSON-Patch/compare/v2.2.1...3.0.0-0

Commits

9d313ac fix(tests): Updated tests to reflect new error message

e4f4eb3 3.1.1

d7903fb fix: typescript codegen changes

5f04488 Bumping version number

7e9fe13 Typescript provided

097864a Documentation updated

51964ed feat: Cleaned up vars vs consts

8a6a360 New build

adeb422 Update .gitignore

59336fe Merge pull request #292 from Starcounter-Jack/dependabot/npm_and_yarn/ajv-6.12.6

Additional commits viewable in compare view

Maintainer changes

This version was pushed to npm by mountain-jack, a new releaser for fast-json-patch since your current version.


dependencies javascript
opened by dependabot[bot] 0
Pass DataFrames to/from managed LynxKite

When LynxKite is running in a user-provided SparkSession, it should be possible to pass Spark DataFrames between the user's Python code and LynxKite. This would be very efficient and very powerful.

opened by darabos 0
Better errors if edge src/dst indexing is wrong

In "Create graph in R" (and maybe in Python too), if you set an out-of-bounds edge src/dst then Sphynx will just crash. You get "UNAVAILABLE: Network closed for unknown reason". Let's add a better error.
good first issue
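
One possible shape for such a check, sketched in Python: validate the edge endpoints before handing the graph to the backend and report the first offending edge. The function name is hypothetical, and the sketch assumes zero-based vertex indices, which may not match what the R box expects.

```python
def check_edge_indices(src, dst, num_vertices):
    """Raise a descriptive error if any edge endpoint is outside [0, num_vertices)."""
    if len(src) != len(dst):
        raise ValueError(f"src has {len(src)} entries but dst has {len(dst)}")
    for name, column in (("src", src), ("dst", dst)):
        for edge, vertex in enumerate(column):
            if not 0 <= vertex < num_vertices:
                raise ValueError(
                    f"edge {edge}: {name}={vertex} is out of bounds "
                    f"for a graph with {num_vertices} vertices")


check_edge_indices([0, 1], [1, 2], num_vertices=3)  # passes silently
```

A message of this kind names the edge and the endpoint at fault, which the generic "Network closed" error cannot do.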

opened by darabos 0
Clicking a box doesn't open its popup until it's saved

This came up in https://github.com/lynxkite/lynxkite/pull/307#discussion_r1032239653 but I think I've also experienced it when using a LynxKite instance on a different continent. Maybe we could fix it?
bug

opened by darabos 0

Releases(5.2.0)

5.2.0(Dec 1, 2022)
LynxKite 5.2.0 brings a large number of cool new features! In addition to Python, Scala, and SQL, we now have boxes for running R in LynxKite. We've made it possible to output custom plots from these new R boxes and also from the existing Python boxes. You can output static plots (as with Matplotlib) or even dynamic visualizations (as with Deck.gl).

On the other hand, if you're running LynxKite as part of an automated workflow, our Python API can now start and stop LynxKite automatically to avoid wasting resources when LynxKite is idle.

The changes in detail:

The Python API can now be used without a running LynxKite instance. If you pass in a SparkSession to LynxKite (lk = lynx.kite.LynxKite(spark=spark)), LynxKite will run in that SparkSession. This is useful if you want to run LynxKite as part of a pipeline, rather than as a permanent fixture. #294

The LynxKite() constructor in the Python API now defaults to connecting to http://localhost:2200. #291

Added "Compute in R" and "Create graph in R" boxes that behave the same as their Python counterparts, but let you use R. #292

Set up an Earthly build. #296 This should make builds very reliable for everyone.

"Compute in Python" boxes can now output plots. Just set the output to matplotlib, or html. #297

Source code(tar.gz)
Source code(zip)
lynxkite-5.2.0.jar(213.53 MB)
5.1.0(Sep 28, 2022)
LynxKite 5.1.0 brings a major change in how LynxKite is started. It also includes a high-performance Neo4j import box, support for Google's BigQuery, and several other improvements.

Changes to how LynxKite is started

Until now, the script generated by Play Framework was in charge of starting LynxKite. We added a significant amount of code to it with tools/call_spark_submit.sh. You would run this script as lynxkite/bin/lynxkite interactive. And this script started spark-submit with parameters based on .kiterc.

All that is gone now. LynxKite is distributed as a single jar file. You can run it with spark-submit lynxkite-5.1.0.jar. Most of the settings from your .kiterc still apply, but you now have to load these into the environment.

. ~/.kiterc
spark-3.3.0/bin/spark-submit lynxkite-5.1.0.jar
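
When LynxKite is launched from a Python script rather than an interactive shell, the same step of loading .kiterc into the environment can be delegated to a POSIX shell. The sketch below assumes that .kiterc consists of `export NAME=value` lines and that `sh` and an `env` supporting `-0` are available; `env_from_shell_file` is a hypothetical helper name.

```python
import os
import subprocess


def env_from_shell_file(path):
    """Return the environment that results from sourcing `path` in a POSIX shell."""
    # Use an absolute path so that `.` does not search PATH for the file.
    path = os.path.abspath(os.path.expanduser(path))
    # `.` sources the file; `env -0` prints NUL-separated NAME=value pairs,
    # so values that contain newlines survive intact.
    script = '. "$1" >/dev/null && env -0'
    out = subprocess.run(
        ["sh", "-c", script, "sh", path],
        check=True, stdout=subprocess.PIPE,
    ).stdout
    env = {}
    for entry in out.split(b"\0"):
        if entry:
            name, _, value = entry.partition(b"=")
            env[os.fsdecode(name)] = os.fsdecode(value)
    return env


# Intended use, mirroring the two commands above (not executed here):
#   subprocess.run(["spark-3.3.0/bin/spark-submit", "lynxkite-5.1.0.jar"],
#                  env=env_from_shell_file("~/.kiterc"), check=True)
```

Letting the shell do the sourcing keeps variable references inside the file (such as `$HOME`) working, which a hand-written parser would have to reimplement.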

The benefit of this change is that LynxKite is now started like any other Spark application. Any environment that is set up to run Spark applications will be able to run LynxKite too.

Our Docker images have been updated with this change. If you are running LynxKite in Docker, you don't have to change anything.

Detailed changelist

Upgraded to Apache Spark 3.3.0. #272

LynxKite is now started more simply, with spark-submit. #269 This makes deployment much simpler in Hadoop environments.

The new box "Import from Neo4j files" can be used to import Neo4j data directly from files instead of reading from a running Neo4j instance. This can reduce the memory requirements from terabytes to gigabytes on large datasets. #268

Added two new "Import from BigQuery" boxes. #245

Changed the font styling on legends to make them more readable over maps. #267

The "Import from Parquet" box now has an option for using the source files directly instead of pulling the data into LynxKite. #261 This avoids an unnecessary copy and is more convenient to use through the Python API.

The "Weighted aggregate on neighbors" box now supports weighting by edge attributes. #257

The "Add rank attribute" box now supports ranking edges by edge attributes. #255

Congratulations to @tuckging and @lacca0 for their first LynxKite commits in this release! 🎉
Source code(tar.gz)
Source code(zip)
lynxkite-5.1.0.jar(220.53 MB)
5.0.0(Jun 13, 2022)
LynxKite 5.0.0 is a big release giving us fast GPU-accelerated algorithms, a new internal storage format, and other improvements.

Download the attached release file or follow the instructions for running our Docker image.

Added GPU implementations of several algorithms using RAPIDS cuGraph. #241 Enable GPU usage by setting KITE_ENABLE_CUDA=yes in .kiterc. The list of algorithms includes PageRank, connected components, betweenness and Katz centrality, the Louvain method, k-core decomposition, and ForceAtlas2, a new option in the "Place vertices with edge lengths" box.

Switched the internal storage of graph entities from custom SequenceFiles to Parquet. #237 This is an incompatible change, but the migration is simple: delete $KITE_DATA/partitioned. Everything will be recomputed when accessed, and will be stored in the new format.

Added methods in the Python API for conversion between PySpark DataFrames and LynxKite tables. #240

Domain preference is now configurable. #236 This is useful if you want the distributed Spark backend to take precedence over the local Sphynx backend.

Migration from LynxKite 4.x

#237 changed the data format for graph data. You will have to delete your $KITE_DATA/partitioned directory. The data will be regenerated in the new format.
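
If the data directory lives on a local filesystem, the migration step can be scripted. This is a sketch only: `drop_partitioned` is a hypothetical helper, it takes the data directory as an explicit argument rather than reading any LynxKite setting, and it does not cover data directories on HDFS or object storage, which need their own tools.

```python
import shutil
from pathlib import Path


def drop_partitioned(kite_data_dir):
    """Delete <kite_data_dir>/partitioned if it exists; return True if removed."""
    target = Path(kite_data_dir).expanduser() / "partitioned"
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

Calling it a second time is harmless: the function returns False when there is nothing left to delete.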
Source code(tar.gz)
Source code(zip)
lynxkite-5.0.0.tgz(169.99 MB)
4.4.0(May 24, 2022)
LynxKite 4.4.0 is a maintenance release with optimizations, bug fixes, and dependency upgrades.

Upgraded to PyTorch Geometric (PyG) 2.0.1. #206

Upgraded to NetworKit 10.0. #234

The workspace interface is much faster now. #220

Now using Conda for managing all dependencies. #209

Fixed an issue with Python boxes returning errors unnecessarily. #225

Fixed an issue with GCS. #224

Fixed CUDA issues with GCN and Node2vec boxes. #234

Source code(tar.gz)
Source code(zip)
lynxkite-4.4.0.tgz(170.03 MB)
4.3.0(Sep 10, 2021)
LynxKite 4.3.0 is a massive maintenance release. We have long wanted to upgrade to Spark 3.x, but this required upgrading to Scala 2.12, which in turn required upgrading Play Framework and other things. And now it's all done!

We found the time to include some user-visible improvements too. Check out the full list of changes below:

Upgraded to Apache Spark 3.1.2. This also brought us up to Scala 2.12, Java 11, Play Framework 2.8.7, and new versions of some other dependencies. #178 #184

The "Custom plot" box now lets you use the latest version of Vega-Lite by directly writing JSON instead of going through the Vegas Scala DSL.

Logistic regression models can now be configured to use elastic net regularization.

Boxes used as steps in a wizard are highlighted in the workspace view by a faint glow. #183

"Compute in Python" boxes can be used on tables. #160

Added a "Draw ROC curve" built-in custom box. #197

Performance and compatibility improvements. #188 #194

Source code(tar.gz)
Source code(zip)
lynxkite-4.3.0.tgz(173.04 MB)
4.2.2(Apr 30, 2021)
Fixes a regression introduced in LynxKite 4.2.1. If you encountered the bug, please hit "Reimport" on your import boxes after upgrading to ensure the corrupted data gets recomputed.

Fix for attributes becoming undefined. #176

Source code(tar.gz)
Source code(zip)
lynxkite-4.2.2.tgz(252.72 MB)
4.2.1(Apr 15, 2021)
LynxKite 4.2.1 is a minor release to fix a breaking issue with the newest Google Chrome release. A few other small fixes are included.

Fix for Chrome 90. #162

Fixed a few other UI bugs. #164

Reduced memory use in Sphynx. #141

Source code(tar.gz)
Source code(zip)
lynxkite-4.2.1.tgz(249.09 MB)
4.2.0(Jan 29, 2021)
LynxKite 4.2.0 comes with a series of minor bugfixes and a much expanded collection of graph algorithms.

42 algorithms from NetworKit have been integrated into LynxKite. They include new centrality measures, random graph generators, community detection methods, graph metrics (diameter, effective diameter, assortativity), optimal spanning trees and more. (#102, #106, #111, #123)

Users can now opt in to sharing anonymous usage statistics with the LynxKite team. (#128)

Environment variables can be used to override .kiterc settings. (#110)

Added a built-in for parametric parameters (workspaceName) that can be used to force recomputation in wizards. (#131)

Source code(tar.gz)
Source code(zip)
lynxkite-4.2.0.tgz(248.60 MB)
4.1.0(Oct 5, 2020)
LynxKite 4.1.0 comes with a big update for our Neo4j support. This has been the most frequently raised point by our new users. Thanks for all the feedback!

Neo4j 4.x support.

Revamped Neo4j import. Instead of importing tables, you can now import a whole graph. (#90)

Added Neo4j export. You can export vertex or edge attributes or the whole graph. (#91)

AVRO and Delta Lake import and export. (#63, #86)

Added the "Filter with SQL" box as a more flexible alternative to "Filter by attributes".

Visualization option to not display edges. Great in large geographic datasets.

"Use table as vertex/edge attributes" boxes are more friendly and handle name conflicts better now.

Added aggregation support for Vector attributes. (Elementwise average, sum, etc.)

Added an option to disable generated suffixes for aggregated variables.

Fix for edge coloring. (#84)

Source code(tar.gz)
Source code(zip)
lynxkite-4.1.0.tgz(245.43 MB)
4.0.1(Jul 3, 2020)
Fixed issue with interactive tutorials. (#30)

Fixed issue with graph attributes in “Create graph in Python”. (#25)

Fixed issue with non-String attributes in “Use table as graph”. (#26)

Replaced trademarked box icons (it was an accident!) with free ones. Also switched to FontAwesome 5 everywhere to get a better selection of icons. (#37)

Improved the User Guide. (#38, #39)

Source code(tar.gz)
Source code(zip)
lynxkite-4.0.1.tgz(221.39 MB)
4.0.0(Jun 22, 2020)
We've open-sourced LynxKite!

We took this opportunity to make many changes that break compatibility with the LynxKite 3.x series. We can help migrate existing workspaces to LynxKite 4.0 if necessary.

Replaced the separate Long, Int, Double attribute types with number.

Instead of the (Double, Double) attribute type, 2D positions are now represented as Vector[number]. This type is widely supported and more flexible. Use "Bundle vertex attributes into a Vector" instead of "Convert vertex attributes to position", which is now gone.

Renamed "scalars" to "graph attributes". Renamed "projects" to "graphs". These mysterious names were largely used for historical reasons.

Removed "Predict with a graph neural network" operation. (It was an early prototype, long since succeeded by the "Predict with GCN" box.)

Removed "Predict attribute by viral modeling" box. It is more flexible to do the same thing through a series of more elemental boxes. A built-in box ("Predict from communities") has been added to serve as a starting point.

Made it easier to use graph convolutional boxes: added "Bundle vertex attributes into a Vector" and "One-hot encode attribute" boxes.

Replaced the "Reduce vertex attributes to two dimensions" and "Embed with t-SNE" boxes with the new "Reduce attribute dimensions" box which offers both PCA and t-SNE.

"Compute in Python" boxes now support Vector[Double] attributes.

"Create Graph in Python" box added.

Inputs and outputs for "Compute in Python" can now be inferred from the code.

See our changelog for release notes for older releases.
Source code(tar.gz)
Source code(zip)
lynxkite-4.0.0.tgz(220.81 MB)

Owner

GitHub Repository https://lynxkite.com/
