Deploy a simple Multi-Node Clickhouse Cluster with docker-compose in minutes.

Overview

Simple Multi Node Clickhouse Cluster

I hate those single-node clickhouse clusters and manually installation, I mean, why should we:

this is just weird!

So this repo tries to solve these problem.

Note

  • This is a simplified model of Multi Node Clickhouse Cluster, which lacks: LoadBalancer config/Automated Failover/MultiShard Config generation.
  • All clickhouse data is persisted under event-data, if you need to move clickhouse to some other directory, you'll just need to move the directory(that contains docker-compose.yml) and docker-compose up -d to fire it up again.
  • Host network mode is used to simplify the whole deploy procedure, so you might need to create addition firewall rules if you are running this on a public accessible machine.

Prerequisites

To use this, we need docker and docker-compose installed, recommended OS is ubuntu, and it's recommended to install clickhouse-client on machine, so on a typical ubuntu server, doing the following should be sufficient.

apt update
curl -fsSL https://get.docker.com -o get-docker.sh && sh get-docker.sh && rm -f get-docker.sh
apt install docker-compose clickhouse-client -y

Usage

  1. Clone this repo
  2. Edit the necessary server info in topo.yml
  3. Run python3 generate.py
  4. Your cluster info should be in the cluster directory now
  5. Sync those files to related nodes and run docker-compose up -d on them
  6. Your cluster is ready to go

If you still cannot understand what I'm saying above, see the example below.

Example Usage

Edit information

I've Clone the repo and would like to set a 3-master clickhouse cluster and has the following specs

  • 3 replica(one replica on each node)
  • 1 Shard only

So I need to edit the topo.yml as follows:

global:
  clickhouse_image: "yandex/clickhouse-server:21.3.2.5"
  zookeeper_image: "bitnami/zookeeper:3.6.1"

zookeeper_servers:
  - host: 192.168.33.101
  - host: 192.168.33.102
  - host: 192.168.33.103

clickhouse_servers:
  - host: 192.168.33.101
  - host: 192.168.33.102
  - host: 192.168.33.103

clickhouse_topology:
  - clusters:
      - name: "novakwok_cluster"
        shards:
          - name: "novakwok_shard"
            servers:
              - host: 192.168.33.101
              - host: 192.168.33.102
              - host: 192.168.33.103

Generate Config

After python3 generate.py, a structure has been generated under cluster directory, looks like this:

➜  simple-multinode-clickhouse-cluster git:(master) ✗ python3 generate.py 
Write clickhouse-config.xml to cluster/192.168.33.101/clickhouse-config.xml
Write clickhouse-config.xml to cluster/192.168.33.102/clickhouse-config.xml
Write clickhouse-config.xml to cluster/192.168.33.103/clickhouse-config.xml

➜  simple-multinode-clickhouse-cluster git:(master) ✗ tree cluster 
cluster
├── 192.168.33.101
│   ├── clickhouse-config.xml
│   ├── clickhouse-user-config.xml
│   └── docker-compose.yml
├── 192.168.33.102
│   ├── clickhouse-config.xml
│   ├── clickhouse-user-config.xml
│   └── docker-compose.yml
└── 192.168.33.103
    ├── clickhouse-config.xml
    ├── clickhouse-user-config.xml
    └── docker-compose.yml

3 directories, 9 files

Sync Config

Now we need to sync those files to related hosts(of course you can use ansible here):

rsync -aP ./cluster/192.168.33.101/ [email protected]:/root/ch/
rsync -aP ./cluster/192.168.33.102/ [email protected]:/root/ch/
rsync -aP ./cluster/192.168.33.103/ [email protected]:/root/ch/

Start Cluster

Now run docker-compose up -d on every hosts' /root/ch/ directory.

Validation

On 192.168.33.101, use clickhouse-client to connect to local instance and check if cluster is there.

[email protected]:~/ch# clickhouse-client 
ClickHouse client version 18.16.1.
Connecting to localhost:9000.
Connected to ClickHouse server version 21.3.2 revision 54447.

192-168-33-101 :) SELECT * FROM system.clusters;

SELECT *
FROM system.clusters 

┌─cluster──────────────────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ novakwok_cluster                             │         1 │            1 │           1 │ 192.168.33.101 │ 192.168.33.101 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ novakwok_cluster                             │         1 │            1 │           2 │ 192.168.33.102 │ 192.168.33.102 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ novakwok_cluster                             │         1 │            1 │           3 │ 192.168.33.103 │ 192.168.33.103 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards                      │         1 │            1 │           1 │ 127.0.0.1      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards                      │         2 │            1 │           1 │ 127.0.0.2      │ 127.0.0.2      │ 9000 │        0 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_internal_replication │         1 │            1 │           1 │ 127.0.0.1      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_internal_replication │         2 │            1 │           1 │ 127.0.0.2      │ 127.0.0.2      │ 9000 │        0 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_localhost            │         1 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_cluster_two_shards_localhost            │         2 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_shard_localhost                         │         1 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_shard_localhost_secure                  │         1 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9440 │        0 │ default │                  │            0 │                       0 │
│ test_unavailable_shard                       │         1 │            1 │           1 │ localhost      │ 127.0.0.1      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ test_unavailable_shard                       │         2 │            1 │           1 │ localhost      │ 127.0.0.1      │    1 │        0 │ default │                  │            0 │                       0 │
└──────────────────────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘
↘ Progress: 13.00 rows, 1.58 KB (4.39 thousand rows/s., 532.47 KB/s.) 
13 rows in set. Elapsed: 0.003 sec. 

Let's create a DB with replica:

192-168-33-101 :) create database novakwok_test on cluster novakwok_cluster;

CREATE DATABASE novakwok_test ON CLUSTER novakwok_cluster

┌─host───────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ 192.168.33.103 │ 9000 │      0 │       │                   2 │                0 │
│ 192.168.33.101 │ 9000 │      0 │       │                   1 │                0 │
│ 192.168.33.102 │ 9000 │      0 │       │                   0 │                0 │
└────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
← Progress: 3.00 rows, 174.00 B (16.07 rows/s., 931.99 B/s.)  99%
3 rows in set. Elapsed: 0.187 sec. 

192-168-33-101 :) show databases;

SHOW DATABASES

┌─name──────────┐
│ default       │
│ novakwok_test │
│ system        │
└───────────────┘
↑ Progress: 3.00 rows, 479.00 B (855.61 rows/s., 136.61 KB/s.) 
3 rows in set. Elapsed: 0.004 sec. 

Connect to another host to see if it's really working.

[email protected]:~/ch# clickhouse-client -h 192.168.33.102
ClickHouse client version 18.16.1.
Connecting to 192.168.33.102:9000.
Connected to ClickHouse server version 21.3.2 revision 54447.

192-168-33-102 :) show databases;

SHOW DATABASES

┌─name──────────┐
│ default       │
│ novakwok_test │
│ system        │
└───────────────┘
↘ Progress: 3.00 rows, 479.00 B (623.17 rows/s., 99.50 KB/s.) 
3 rows in set. Elapsed: 0.005 sec. 

License

GPL

Owner
Nova Kwok
43EC 6073 0BFF A16C 34BB 9EF2 8D42 A0E6 99E5 0639
Nova Kwok
Wiremind Kubernetes helper

Wiremind Kubernetes helper This Python library is a high-level set of Kubernetes Helpers allowing either to manage individual standard Kubernetes cont

Wiremind 3 Oct 09, 2021
Python job scheduling for humans.

schedule Python job scheduling for humans. Run Python functions (or any other callable) periodically using a friendly syntax. A simple to use API for

Dan Bader 10.4k Jan 02, 2023
Docker Container wallstreetbets-sentiment-analysis

Docker Container wallstreetbets-sentiment-analysis A docker container using restful endpoints exposed on port 5000 "/analyze" to gather sentiment anal

145 Nov 22, 2022
🎡 Build Python wheels for all the platforms on CI with minimal configuration.

cibuildwheel Documentation Python wheels are great. Building them across Mac, Linux, Windows, on multiple versions of Python, is not. cibuildwheel is

Python Packaging Authority 1.3k Jan 02, 2023
Rundeck / Grafana / Prometheus / Rundeck Exporter integration demo

Rundeck / Prometheus / Grafana integration demo via Rundeck Exporter This is a demo environment that shows how to monitor a Rundeck instance using Run

Reiner 4 Oct 14, 2022
HXVM - Check Host compatibility with the Virtual Machines

HXVM - Check Host compatibility with the Virtual Machines. Features | Installation | Usage Features Takes input from user to compare how many VMs they

Aman Srivastava 4 Oct 15, 2022
This project shows how to serve an TF based image classification model as a web service with TFServing, Docker, and Kubernetes(GKE).

Deploying ML models with CPU based TFServing, Docker, and Kubernetes By: Chansung Park and Sayak Paul This project shows how to serve a TensorFlow ima

Chansung Park 104 Dec 28, 2022
a CLI that provides a generic automation layer for assessing the security of ML models

Counterfit About | Getting Started | Learn More | Acknowledgments | Contributing | Trademarks | Contact Us -------------------------------------------

Microsoft Azure 575 Jan 02, 2023
Organizing ssh servers in one shell.

NeZha (哪吒) NeZha is a famous chinese deity who can have three heads and six arms if he wants. And my NeZha tool is hoping to bring developer such mult

Zilin Zhu 8 Dec 20, 2021
Define and run multi-container applications with Docker

Docker Compose Docker Compose is a tool for running multi-container applications on Docker defined using the Compose file format. A Compose file is us

Docker 28.2k Jan 08, 2023
Bitnami Docker Image for Python using snapshots for the system packages repositories

Python Snapshot packaged by Bitnami What is Python Snapshot? Python is a programming language that lets you work quickly and integrate systems more ef

Bitnami 1 Jan 13, 2022
Bugbane - Application security tools for CI/CD pipeline

BugBane Набор утилит для аудита безопасности приложений. Основные принципы и осо

GardaTech 20 Dec 09, 2022
Automatically capture your Ookla Speedtest metrics and display them in a Grafana dashboard

Speedtest All-In-One Automatically capture your Ookla Speedtest metrics and display them in a Grafana dashboard. Getting Started About This Code This

Aaron Melton 2 Feb 22, 2022
Bash-based Python-venv convenience wrapper

venvrc Bash-based Python-venv convenience wrapper. Demo Install Copy venvrc file to ~/.venvrc, and add the following line to your ~/.bashrc file: # so

1 Dec 29, 2022
Changelog CI is a GitHub Action that enables a project to automatically generate changelogs

What is Changelog CI? Changelog CI is a GitHub Action that enables a project to automatically generate changelogs. Changelog CI can be triggered on pu

Maksudul Haque 106 Dec 25, 2022
Flexible and scalable monitoring framework

Presentation of the Shinken project Welcome to the Shinken project. Shinken is a modern, Nagios compatible monitoring framework, written in Python. It

Gabès Jean 1.1k Dec 18, 2022
Push Container Image To Docker Registry In Python

push-container-image-to-docker-registry 概要 push-container-image-to-docker-registry は、エッジコンピューティング環境において、特定のエッジ端末上の Private Docker Registry に特定のコンテナイメー

Latona, Inc. 3 Nov 04, 2021
This Docker container is build to run on a server an provide an easy to use interface for every student to vote for their councilors

This Docker container is build to run on a server and provide an easy to use interface for every student to vote for their councilors.

Robin Adelwarth 7 Nov 23, 2022
A collection of beginner-friendly DevOps content

mansion Mansion is just a testing repo for learners to commit into open source project. These are the steps you need to learn: Please do not edit thes

Bryan Lim 62 Nov 30, 2022
A curated list of awesome DataOps tools

Awesome DataOps A curated list of awesome DataOps tools. Awesome DataOps Data Catalog Data Exploration Data Ingestion Data Lake Data Processing Data Q

Kelvin S. do Prado 40 Dec 23, 2022