MongoDB utility to inflate the contents of small collection to a new larger collection

Overview

MongoDB Data Inflater ("data-inflater")

The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using data sourced from an existing smaller database collection.

By default, the utility will use the Atlas 'sample data set' database collection sample_mflix.movies as the source collection. However, most users will provide parameters to the utility to specify the use of their own database and source collection. If you do want to use the Atlas sample data set, see the sample data manual page for more information.

The data-inflater utility issues multiple concurrent aggregation processes, each copying batches of records in parallel for increased performance. The resulting collection will contain documents with duplicated data but with new unique _id field values. The variance ratio of data in the new collection will approximately reflect the variance ratio of the source collection. Therefore, you should ensure you have supplied at least a few different documents (if not a few hundred or thousand) in the source collection.

If you are running a sharded cluster, the utility will ensure the target collection is sharded with a shard key, and where it can, it will pre-split the chunks to avoid subsequent needless balancer overhead. For example, if you specify the --shardkey parameter for this utility to reference a field (e.g. product_name) as the range based shard key, before creating the target collection, the utility will introspect the spread of values for the shard key field (e.g. product_name). The utility will then create pre-split chunks in the new empty target collection before any data is copied to it, to maximise performance.

How To Run

In a running MongoDB cluster (self-managed or running in Atlas), ensure you have created and populated a source collection with at least one sample record in it (ideally more with varying values for the fields across the different documents to reflect the shape and variance you desire).

Ensure Python3 (version 3.8 or greater) and the MongoDB Python Driver (PyMongo) are already installed on your workstation. Example to install PyMongo:

pip3 install --user pymongo

Ensure the .py script is executable and then execute the following to view the utility's help instructions and the full list of parameters that you can provide:

./data-inflater.py -h

Execute the following to connect to a locally running single server database (default port) to copy and expand the data from an existing source collection, mydb.mySrcColl, to an a new collection, mydb.myDestColl, which will contain 1 million records:

./data-inflater.py --url 'mongodb://localhost:27017' -d 'mydb' -c 'mySrcColl' -t 'myDestColl' -s 1000000

Execute the following to connect to an Atlas cluster (ensure you've already loaded the Atlas sample data set), to inflate the data from the source movies collection to the new movies_big collection, which will contain 100 million records (note, first change the URL username, password and hostname shown, to match the URL of your Atlas cluster):

./data-inflater.py --url 'mongodb+srv://usr:[email protected]/'
Owner
Paul Done
Paul Done
Report Bobcat Status to Google Sheets

bobcat-status-reporter Report Bobcat Status to Google Sheets Why? I recently relocated my miner from my root into the attic. Bobcat recommends operati

Jasmit Tarang 3 Sep 22, 2021
A simple gpsd client and python library.

gpsdclient A small and simple gpsd client and library Installation Needs Python 3 (no other dependencies). If you want to use the library, use pip: pi

Thomas Feldmann 33 Nov 24, 2022
EthTx - Ethereum transactions decoder

EthTx - Ethereum transactions decoder Installation pip install ethtx Requirements The package needs a few external resources, defined in EthTxConfig o

398 Dec 25, 2022
Modest utility collection for development with AIOHTTP framework.

aiohttp-things Modest utility collection for development with AIOHTTP framework. Documentation https://aiohttp-things.readthedocs.io Installation Inst

Ruslan Ilyasovich Gilfanov 0 Dec 11, 2022
Hide new MacBook Pro notch with black wallpaper.

Hide new MacBook Pro notch with black wallpaper.

Wang Chao 1 Oct 27, 2021
convert a dict-list object from / to a typed object(class instance with type annotation)

objtyping 带类型定义的对象转换器 由来 Python不是强类型语言,开发人员没有给数据定义类型的习惯。这样虽然灵活,但处理复杂业务逻辑的时候却不够方便——缺乏类型检查可能导致很难发现错误,在IDE里编码时也没

Song Hui 15 Dec 22, 2022
Extends the pyranges module with operations on joined genomic intervals

tiedpyranges Extends the pyranges module with operations on joined genomic intervals (e.g. exons of same transcript) Install with: pip install tiedpyr

Marco Mariotti 4 Aug 05, 2022
Edit SRT files to delay subtitle time-stamps.

subtitle-delay A program written in Python that directly edits SRT file to delay the subtitles. Features: Will throw an error if delaying with negativ

8 Jul 17, 2022
'ToolBurnt' A Set Of Tools In One Place =}

'ToolBurnt' A Set Of Tools In One Place =}

MasterBurnt 5 Sep 10, 2022
A simple and easy to use collection of random python functions.

A simple and easy to use collection of random python functions.

Diwan Mohamed Faheer 1 Nov 17, 2021
Manage your exceptions in Python like a PRO

A linter to manage all your python exceptions and try/except blocks (limited only for those who like dinosaurs).

Guilherme Latrova 353 Dec 31, 2022
Exports the local variables into a global dictionary for later debugging.

PyExfiltrator Julia’s @exfiltrate for Python; Exports the local variables into a global dictionary for later debugging. Installation pip install pyexf

6 Nov 07, 2022
A Python script that parses and checks public proxies. Multithreading is supported.

A Python script that parses and checks public proxies. Multithreading is supported.

LevPrav 7 Nov 25, 2022
Homebase Name Changer for Fortnite: Save the World.

Homebase Name Changer This program allows you to change the Homebase name in Fortnite: Save the World. How to use it? After starting the HomebaseNameC

PRO100KatYT 7 May 21, 2022
MicroMIUI - Script to optimize miui and not only

MicroMIUI - Script to optimize miui and not only

Groiznyi-Studio 1 Nov 02, 2021
The Black shade analyser and comparison tool.

diff-shades The Black shade analyser and comparison tool. AKA Richard's personal take at a better black-primer (by stealing ideas from mypy-primer) :p

Richard Si 10 Apr 29, 2022
Abby's Left Hand Modifiers Dictionary

Abby's Left Hand Modifiers Dictionary Design This dictionary is inspired by and

12 Dec 08, 2022
✨ Voici un code en Python par moi, et en français qui permet de générer du texte Lorem.

Lorem Gen ❗ Voici un code en Python par moi, et en français qui permet de générer du texte Lorem. Dépendences : pip install lorem_text 💖 Enjoy 🎫 Mon

MrGabin 3 Jun 07, 2021
✨ Un bot Twitter totalement fait en Python par moi, et en français.

Twitter Bot ❗ Un bot Twitter totalement fait en Python par moi, et en français. Il faut remplacer auth = tweepy.OAuthHandler(consumer_key, consumer_se

MrGabin 3 Jun 06, 2021
Similar looking domain detection using python fuzzywuzzy

Major cause of phishing and BEC incident is similar looking domain, if you detect it early, you can prevent incidents early, python fuzzywuzzy module let you do that

2 Nov 07, 2021