A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
Bot Telegram per creare e gestire un Babbo Natale Segreto con amici ecc

Babbo Natale Segreto: Telegram Bot Bot Telegram per creare e gestire un Babbo Natale Segreto con amici ecc. Che cos'è? Il Babbo Natale Segreto è un gi

Francesco Ciociola 2 Jul 18, 2022
Program that automates the bump of the Disboard Bot. Done 100% in Python with PyAutoGUI library

Auto-Discord-Bump Program that automates the bump of the Disboard Bot done 100% in python with PyAutoGUI How to configue You will need 3 things before

Mateus 1 Dec 19, 2021
Twitter Analysis of MIUUL CEO

Twitter Analysis of MIUUL CEO Business Problem I got last @mvahitkeskin 184 twee

Çağrı Karadeniz 6 Mar 12, 2022
ANKIT-OS/STYLISH-TEXT is a special repository. Its Is A Telegram Bot Which Can Translate Your Text Into 100+ Language

🔥 ᴳᴼᴼᴳᴸᴱ⁻ᵀᴿᴬᴺᔆᴸᴬᵀᴱᴿ 🔥 The owner would not be responsible for any kind of bans due to the bot. • ⚡ INSTALLING ⚡ • • 🛠️ Lᴀɴɢᴜᴀɢᴇs Aɴᴅ Tᴏᴏʟs 🔰 • If

ANKIT KUMAR 1 Dec 23, 2021
A simple Discord bot that notifies users of new Abitti versions

A simple Discord bot that notifies users of new Abitti versions. New features might be added later on. If you have good ideas, feel free to do a PR.

1 Feb 11, 2022
Ditch Xiaomi's cloud and use a Telegram bot instead

Yi-Home_Telegram_Bot_Interface Ditch Xiaomi's cloud and use a Telegram bot instead Features Motion detection Works by monitoring a tmp file that is cr

Erli 10 Aug 18, 2022
A simple python discord bot which give you a yogurt brand name, basing on a large database often updated.

YaourtBot A discord simple bot by Lopinosaurus Before using this code : ・Move env file to .env ・Change the channel ID on line 38 of bot.py to your #pi

The only one bunny who can dev. 0 May 09, 2022
This bot automaticaly access to giveaway ! You can won free NFT !

This bot automaticaly access to giveaway ! You can won free NFT !

2s.py 28 Oct 20, 2022
This is a crypto trading bot that scans the Binance Annoucements page for new coins, and places trades on Gateio

gateio-trading-bot-binance-announcements This Gateio x Binance cryptocurrency trading bot scans the Binance Announcements page and picks up on new coi

Andrei 1.2k Jan 01, 2023
Notification Reminder Application For Python

Notification-Reminder-Application No matter how well you set up your to-do list and calendar, you aren’t going to get things done unless you have a re

1 Nov 26, 2021
Unofficial Medium Python Flask API and SDK

PyMedium - Unofficial Medium API PyMedium is an unofficial Medium API written in python flask. It provides developers to access to user, post list and

Engine Bai 157 Nov 11, 2022
send sms via grafana alert webhook

notifier fire alarm What does this project do: the aim of this project is to send alarm notification from grafana alert manager via kavenegar api. sta

Ali Soltani 4 Oct 20, 2021
阿里云盘上传脚本

阿里云盘上传脚本 Author:李小恩 Github:https://github.com/Hidove/aliyundrive-uploader 如有侵权,请联系我删除 禁止用于非法用途,违者后果自负 环境要求 python3 使用方法 安装 git clone https://github.co

Hidove 301 Jan 01, 2023
Fastest Tiktok Username checker on site.

Tiktok Username Checker Fastest Tiktok Username checker on site

sql 3 Jun 19, 2021
A Python Client to View F1TV Content the right way

F1Hub is a terminal application running directly on your computer -- no connection to the website needed* *In theory. As of now, the F1TV website is needed for some content

kodos 3 Jun 14, 2022
Using twitter lists as your feed

Twitlists A while ago, Twitter changed their timeline to be algorithmically-fed rather than a simple reverse-chronological feed. In particular, they p

Peyton Walters 5 Nov 21, 2022
IBD Style Relative Strength Percentile Ranking of Stocks (i.e. 0-100 Score).

relative-strength IBD Style Relative Strength Percentile Ranking of Stocks (i.e. 0-100 Score). I also made a TradingView indicator, but it cannot give

57 Jan 06, 2023
Hostapd-mac-monitor - Setup a hostapd AP to conntrol the connections of specific MACs

A brief explanation This script provides way to setup a monitoring service of sp

2 Feb 03, 2022
Pydf: A modular Telegram Bot which provides Pdf Tools using PyPdf2

pyDF-Bot 🌍 Pydf - Pyrogram Document File Bot, a modular Telegram Bot which prov

HyDrix 2 Feb 18, 2022
Convenient script for trading with python.

Convenient script for trading with python.

VladKochetov007 66 Dec 07, 2022