Book on Julia for Data Science

Last update: Dec 25, 2022

Overview

Julia Data Science

Open source and open access book for data science in Julia.

You can read the full book on https://juliadatascience.io.

This book is also published at Amazon.com.

LICENSE

This book is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Comments

Overview of Plots.jl Chapter
Here is an opinionated version. Feel free to criticize:

Subsections for Plots

Brief overview of the JuliaPlots ecosystem

JuliaPlots Organization

Plots.jl (what is it good for and what are its limitations)

Makie.jl (what is it good for and what are its limitations)

AlgebraOfGraphics.jl (what is it good for and what are its limitations) (version 2.0)

What is Plots.jl?

plot vs plot!

input data

Series types and functions (e.g. line, line!, heatmap, heatmap!)

How to save a plot
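The `plot` vs `plot!` distinction and the saving step from this outline could be illustrated with something like the following. This is only a sketch, assuming Plots.jl is installed with its default backend; the data and the file name `sine_cosine.png` are made up for illustration.

```julia
using Plots

x = range(0, 2π; length=100)

# `plot` creates a new plot object.
p = plot(x, sin.(x); label="sin(x)")

# `plot!` mutates an existing plot by adding another series to it.
plot!(p, x, cos.(x); label="cos(x)")

# Series-type functions follow the same pattern, e.g. `scatter` vs `scatter!`.
scatter!(p, x[1:10:end], sin.(x[1:10:end]); label="samples")

# Save into a temporary directory; the file extension selects the output format.
outfile = joinpath(mktempdir(), "sine_cosine.png")
savefig(p, outfile)
```

Passing the plot explicitly as the first argument to `plot!` and `savefig` avoids relying on the implicit "current plot".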

Attributes

Overview (What are attributes, and the whole symbol e.g. :xticks system)

Series attributes

Plot Attributes

Footnote about the extra_kwargs stuff

Examples using the most common things that you want to do in a visualization. This could be inserted right after you introduce a specific attribute.

Color and Palettes

I think we should cover ColorBrewer and the palettes from matplotlib (inferno, viridis, magma). We should cover some material from Claus Wilke's Fundamentals of Data Visualization book (Chapters 4 and 19). We should also cover the three types of color usage:

sequential: continuous stuff, e.g. :blues (only blue)

diverging: continuous stuff, e.g. :RdBu (from red to blue)

distinguishable: discrete stuff, e.g. :Set1_5

I have a very strong positive bias towards colorbrewer Sets (e.g. palette=:Set1_5).

We should also mention that the reader should use a colorblind-friendly palette or colors. Maybe we should include an official statistic on the prevalence of colorblindness or other color vision difficulties in the population. I remember seeing somewhere that it was around 5% of people.
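The three kinds of color usage listed above might be shown along these lines. This is a sketch, assuming Plots.jl is installed; the random data is purely illustrative, and the attribute choices (`color` for gradients, `palette` for discrete series) are one possible way to present them.

```julia
using Plots

# Sequential: a continuous quantity mapped to a single-hue gradient.
p_seq = heatmap(rand(10, 10); color=:blues)

# Diverging: a continuous quantity centered on a midpoint (here zero).
p_div = heatmap(randn(10, 10); color=:RdBu, clims=(-3, 3))

# Distinguishable: one color per discrete series.
p_cat = plot(cumsum(randn(50, 5); dims=1); palette=:Set1_5)

# A matplotlib-style gradient such as viridis.
p_vir = heatmap(rand(10, 10); color=:viridis)
```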

Layouts

Overview on several ways to do layouts

the layout argument, also cover the grid

the @layout macro

specific measures with the Plots.PlotMeasures submodule

adding subplots incrementally. Define p1, p2, p3; then do a plot(p1, p2, p3; layout=l)
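The layout options in this list could be demonstrated roughly as follows. This is a sketch, assuming Plots.jl is installed; the plot contents, the height fractions, and the 5 mm margin are arbitrary illustration values.

```julia
using Plots
using Plots.PlotMeasures  # exports length units such as `mm`

# The `layout` argument with `grid`: three stacked subplots with relative heights.
stacked = plot(rand(10, 3); layout=grid(3, 1, heights=[0.2, 0.3, 0.5]))

# Incremental approach: build the subplots first ...
p1 = plot(1:10, rand(10); title="p1")
p2 = scatter(1:10, rand(10); title="p2")
p3 = histogram(randn(200); title="p3")

# ... then combine them with the @layout macro: two plots on top, one wide plot below.
l = @layout [a b; c]
combined = plot(p1, p2, p3; layout=l, left_margin=5mm)
```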

writing
opened by storopoli 13
Add full cover

This PR adds code to generate the full cover. See https://github.com/JuliaDataScience/JuliaDataScience/issues/17#issuecomment-927185934 for more information about the dimensions.

I don't consider the cover done and perfect now. It is meant as a place where we can start discussing the appearance.

Preview

EDIT: This wasn't working in Amazon. It was turned into

So, instead, using the Amazon cover editor:

which is turned into

opened by rikhuijzer 10

makie.jl: ERROR: LoadError: UndefVarError: Downloads not defined

Hi,

I get an

ERROR: LoadError: UndefVarError: Downloads not defined
Stacktrace:
 [1] top-level scope
   @ o:\Julia\makie.jl:604
in expression starting at o:\Julia\makie.jl:604

Commented out.

And on execution for all demo functions:

ERROR: UndefVarError: Options not defined
Stacktrace:
 [1] custom_plot()
   @ Main o:\Julia\makie.jl:16
 [2] top-level scope
   @ REPL[12]:1

Commented out.

On execution,

custom_plot()

no error, no plot appears.

opened by bardo84 10

Chapter 7 Link broken to Makie Docs

There is a broken link in Chapter 7 (datavisMakie.md):

The link in "See Makie’s documentation for more." redirects to http://makie.juliaplots.org/stable/backends_and_output.html#Backends-and-Output, which is broken.

cc @lazarusA
bug

opened by storopoli 9
Notation discussion points
Notation discussion points from #20:

Always using : before the start of a code block.

Mentioning functions like DataFrame as DataFrame() or DataFrame(...).

My suggestion: only Julia objects between backticks and filenames and extension names between quotation marks (like Julia's strings).
opened by rikhuijzer 9
[dataframes_select] not the same selection

The lines

https://github.com/JuliaDataScience/JuliaDataScience/blob/b5582d29a9afa300e0a124ad2820389c386c04cc/contents/dataframes_select.md#L48-L56

don't give the same result as the previous example ... (where :id is not shown)

I think you need to rephrase the text.

opened by Mo-Gul 7
Fix pipe alignment in front cover

This should fix some alignment issues regarding the pipes. Additionally, I printed the previous version, and the lack of grid lines makes the whole cover a little bit dull.

opened by lazarusA 7
Julia cannot reproduce the rand

I have put an issue on Stack Overflow about an example of this book. Could you please explain? https://stackoverflow.com/questions/70321085/julia-cannot-reproduce-the-rand

Many thanks.

Shixiang

opened by ShixiangWang 6
Improve section numbering
See issue #221 for a discussion of why it is beneficial to leave a section unnumbered when there are no other sections on that level, i.e. when there is no other "section x.2".

There is only one instance,

https://github.com/JuliaDataScience/JuliaDataScience/blob/2c750e092b7aa932cf10d7fc12f1d0f7ba7ae909/contents/julia_basics.md#L671

which cannot be made unnumbered, because it is referenced at

https://github.com/JuliaDataScience/JuliaDataScience/blob/2c750e092b7aa932cf10d7fc12f1d0f7ba7ae909/contents/dataframes_performance.md#L9

My suggestion is to bring this one level up, i.e.

- #### Functions with a bang `!` {#sec:function_bang}
+ ### Functions with a bang `!` {#sec:function_bang}

I think that also fits better, because I (with my current n00bie understanding) don't see the "bang operator" belonging under the previous section header.

Do you agree? If yes, I would prepare another commit. Otherwise you can merge the PR directly.
opened by Mo-Gul 6
Logo for Julia Data Science Organization
We need a logo. I will talk to someone at UNINOVE who can do that for me. Any thoughts, @rikhuijzer? We should avoid anything stats/Bayesian so as not to cause confusion with future endeavors.

Maybe something with Tabular Data or Line Plots. We should definitely use Julia colors.

[x] Update JuliaDataScience GitHub Organization Logo

[x] Update JuliaDataScience Book favicon site icon

enhancement
opened by storopoli 6
improve typographical stuff

As I already stated in e.g. https://github.com/JuliaDataScience/JuliaDataScience/pull/215/commits/e8e1d11ed6386f8bcd552abbfd9ab058c3176b51, it would be nice to make some sections unnumbered when there is no second section on that level.

If I have understood this correctly, it should be possible by appending {-} to the section heading, as can be seen e.g. in

https://github.com/JuliaDataScience/JuliaDataScience/blob/56151f0f6d69ad4c62945aa1deb3169af90ef9ad/contents/index.md#L1

So if you think this would be nice to have, I'll redo the suggestions in a new PR, especially if it is that easy to do.

PS:
I am not sure if my newest comments in #217 have been noticed by one of you, since I have added them after the PR was merged. Thank you for your comments!

opened by Mo-Gul 5
cheatSheet_cairo.jl improvements
Some suggested changes to the CairoMakie cheatsheet, some for consistency and some to make it easier to understand what each function actually does. I think that is the main use of this figure: once the purpose of a plotting function is clear, the user can always check the documentation for the different ways to call it. The biggest change is for linesegments: it now uses the linesegments(x, y) signature with the same data as the previous plots, to help readers understand what's going on.

Full list of changes:

change range of first plots to have even number of points (for linesegments)

change linesegments to use same data as previous plots

uniformize parameter names in plot titles

use variable heights in crossbar

fix title of violin plot

more explicit title for mesh
opened by knuesel 0
4.1.2 Excel - failure

Hi,

Using Julia 1.8.1, VS Code notebooks

Entering the code from 4.1.2 Excel, I tried running:

    path = write_grades_xlsx()
    xf = readxlsx(path)

which gave:

MethodError: objects of type Vector{String} are not callable
Use square brackets [] for indexing an Array.

Stacktrace:
 [1] write_xlsx(name::String, df::DataFrame)
   @ Main ~/julia-test/juliadatascience-dataframes.ipynb:4
 [2] write_grades_xlsx()
   @ Main ~/julia-test/juliadatascience-dataframes.ipynb:3
 [3] top-level scope
   @ ~/julia-test/juliadatascience-dataframes.ipynb:1

Here's the function:

    function write_xlsx(name, df::DataFrame)
        path = "$name.xlsx"
        data = collect(eachcol(df))
        cols = names(df)
        writetable(path, data, cols)
    end

I found that because you had defined a "names" variable earlier in the chapter, this clobbered the "names()" function. When I changed this to use Base.names(), everything worked properly. (Somewhat ironically, you mentioned the global variable problem just after defining "names". ;-))

I'd recommend just renaming "names" to something less ambiguous, and then it won't break the code below.
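The shadowing described above can be reproduced without DataFrames.jl. The following is a minimal sketch using only Base; the vector contents are a hypothetical stand-in for the variable defined earlier in the chapter, and `Base.names(Base)` is used only because it needs no extra packages.

```julia
# Hypothetical stand-in for the `names` variable defined earlier in the chapter.
names = ["Sally", "Bob", "Alice", "Hank"]

# `names` now refers to the Vector above, so calling it throws an error,
# which is captured here instead of aborting the script.
shadowed_error = try
    names(Base)
catch err
    err
end

# Qualifying the call with the module still reaches the original function.
exported = Base.names(Base)
```

Renaming the variable, as suggested, avoids the need for the `Base.` prefix in later code.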

Thanks for the great work! Ari

opened by arimeyer 0
new book format

After this first experience of producing and printing the book, I still feel that the margins [text at the edges] and the overall book size are not an appropriate layout. Thoughts?
version-2

opened by lazarusA 2

Releases(edition-1)

edition-1(Oct 31, 2021)

First edition published as paperback on Amazon
Source code(tar.gz)
Source code(zip)
juliadatascience.pdf(6.59 MB)

Owner

Julia Data Science

Julia Data Science Book

GitHub Repository https://juliadatascience.io

