Python Markov Chain chatbot running on Telegram

Overview

Hanasubot

Hanasubot (Japanese 話すボット, talking bot) is a Python chatbot running on Telegram. The bot is based on Markov Chains so it can learn your word instantly, unlike neural network chatbots which require training. It uses a modified version of markovify library for that purporse. However, the output may not make sense at all, though it can sometimes generate hilarious replies.

In theory, the bot can learn in any languages, but for some languages word segmentation is required. The bot currently supports Chinese and Japanese word segmentation, with pkuseg, CkipTagger and mecab. Language detection relies on pycld2.

Hanasubot has a permission system so you can easily stop the bot learning from naughty kids in your group, while still reply them. Users with admin right can erase lines from bot corpus as well.

The bot is designed for Chinese Telegram groups so there are a lot of messages written in Chinese. I18n will happen in future and any help is welcome.

Installation

Python 3.6+ is required.

VENV_PATH=/path/to/your/venv  # Change this
python3 -m venv $VENV_PATH
source $VENV_PATH/bin/activate

pip3 install -r requirements.txt

If you are using Python 3.6, dataclasses 0.8 is required as well:

pip3 install dataclasses==0.8

For Python 3.7 and up, dataclasses is included so no need to install it.

To use CkipTagger for Traditional Chinese tokenization, you have to download the model file (see CkipTagger readme for a detailed guide):

python3 -c "from ckiptagger import data_utils; data_utils.download_data_gdown('./')"

Then unzip to a folder named ckipdata, in the same directory as the Python scripts.

Optionally, you can initialize the user dict for pkuseg and CkipTagger, before start running the bot:

touch ./pkuseg_dict.txt
touch ./ckip_dict.json

Configuration

Copy config.example.py and fill it out. Please check the comments in config file.

cp config.example.py config.py

After that, simply start the bot:

python3 tgbot.py

Bot commands and usage

Simply reply to the bot and it will say some random words if you have collected enough corpus. The bot will also learn from your message instantly. Special commands are as follows.

Require root

  • /reload_config - Reload config file without restarting the bot. Some entries cannot be dynamically reloaded though, see config.example.py for details.

Require admin

  • /erase - Remove lines from corpus. (Non-admins can only erase lines sent by themselves.)
  • /userweight - Set user weight.
  • /ban - Set user right to -1.
  • /restrict - Set user right to 1.
  • /grantnormal - Set user right to 2.
  • /granttrusted -Set user right to 3.
  • /grantadmin - Set user right to 4. Admins are able to add/remove other admins with above commands. See also the user right levels section.

Require trusted

  • /addword_cn - Add a word into pkuseg user dictionary.
  • /addword_tw - Add a word into CkipTagger user dictionary.
  • /rmword_cn - Remove a word from pkuseg user dictionary.
  • /rmword_tw - Remove a word from CkipTagger user dictionary.

Other commands

  • /clddbg - Test language detection of some texts.
  • /cutdbg - Test tokenization of some texts.
  • /policy - See what data is collected by the bot and so on.
  • /reload - Claim your admin rights after you get Telegram group admin.
  • /source - See the source code.
  • /start - Start chatting, useful when you can't find the bot messages to reply.

Database

Initialize

CREATE TABLE IF NOT EXISTS chat(
    chat_id integer PRIMARY KEY,
    chat_tgid integer NOT NULL UNIQUE,
    chat_name text
);
CREATE TABLE IF NOT EXISTS user(
    user_id integer PRIMARY KEY,
    user_tgid integer NOT NULL UNIQUE,
    user_name text,
    user_right integer DEFAULT 2,
    user_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS corpus(
    corpus_id integer PRIMARY KEY,
    corpus_time integer,
    corpus_line text NOT NULL UNIQUE,
    corpus_raw integer REFERENCES raw,
    corpus_chat integer REFERENCES chat,
    corpus_user integer REFERENCES user,
    corpus_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS raw(
    raw_id integer PRIMARY KEY,
    raw_text text UNIQUE
);

User right levels

  • 5 - root.
  • 4 - admin, can change user rights (except root users), can erase a line from corpus, and can set user_weight and corpus_weight (WIP).
  • 3 - trusted user, can feed the bot via private messages, and can add words into dictionary (for tokenization purposes).
  • 2 - normal user.
  • 1 - restricted user, bot will not write their messages into database.
  • -1 - banned user, bot will not reply to their messages.

TODOs

  • Let admins set corpus_weight
  • Batch /erase

License

MIT

Dodo - A graphical, hackable email client based on notmuch

Dodo Dodo is a graphical email client written in Python/PyQt5, based on the comm

Aleks Kissinger 44 Nov 12, 2022
Palo Alto Networks PAN-OS SDK for Python

Palo Alto Networks PAN-OS SDK for Python The PAN-OS SDK for Python (pan-os-python) is a package to help interact with Palo Alto Networks devices (incl

Palo Alto Networks 281 Dec 09, 2022
This Mirror Bot is a multipurpose Telegram Bot writen in Python for mirroring files on the Internet to our beloved Google Drive.

MIRROR HUNTER This Mirror Bot is a multipurpose Telegram Bot writen in Python for mirroring files on the Internet to our beloved Google Drive. Repo la

anime republic 130 May 28, 2022
Python library to download market data via Bloomberg, Eikon, Quandl, Yahoo etc.

findatapy findatapy creates an easy to use Python API to download market data from many sources including Quandl, Bloomberg, Yahoo, Google etc. using

Cuemacro 1.3k Jan 04, 2023
Find songs by lyrics.

LyricSearch Hi, welcome to LyricSearch - a simple (Yes), fast (Maybe), and powerful (Approach) lyric search engine. We support Three search methods to

Dicer_ 1 Dec 13, 2021
Astro Bot With Golang

Astro-Bot Features: Astronomy Picture of the day Horoscope People In Space How we built it Our team was broken, one person didn't do anything the othe

Vaarun Sinha 1 Nov 21, 2021
Automatically download any NFT collection from OpenSea.

OpenSea NFT Stealer The sole purpose of this script is to download any NFT collection from OpenSea. How does it work? Basically, the OpenSea website a

Dan 111 Dec 29, 2022
Neko is An Anime themed advance Telegram group management bot.

NekoRobot A modular telegram Python bot running on python3 with an sqlalchemy, mongodb database. ╒═══「 Status 」 Maintained Support Group Included Free

Lovely Prince 11 Oct 11, 2022
A pixeldrain python package using pixeldrain official api

Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github.com/FayasNoushad/Pixeldrain/blob/main/LICENSE In

Fayas Noushad 6 Jan 26, 2022
Forward Propagation, Backward Regression and Pose Association for Hand Tracking in the Wild (CVPR 2022)

HandLer This repository contains the code and data for the following paper: Forward Propagation, Backward Regression, and Pose Association for Hand Tr

<a href=[email protected]"> 17 Oct 02, 2022
A simple language translator with python and google translate api

Language translator with python A simple language translator with python and google translate api Install pip and python 3.9. All the required depende

0 Nov 11, 2021
Script for polybar to display and control media(not only Spotify) using DBus.

polybar-now-playing Script for polybar to display and control media(not only Spotify) using DBus Python script to display and control current playing

Dope Wizard 48 Dec 31, 2022
AWS Auto Inventory allows you to quickly and easily generate inventory reports of your AWS resources.

Photo by Denny Müller on Unsplash AWS Automated Inventory ( aws-auto-inventory ) Automates creation of detailed inventories from AWS resources. Table

AWS Samples 123 Dec 26, 2022
A Telegram bot to download from Youtube server.

IDN-YoutubeDL-Bot A Telegram bot to download from Youtube server. Configs 📖 API_ID - Your APP ID. Get it from my.telegram.org API_HASH - Your API_HAS

IDNCoderX 4 Dec 02, 2022
The official source code for Ghost Discord selfbot.

👻 Ghost Selfbot The official code for Ghost which was recently discontinued and released to the public. Feel free to use any of the code found in thi

Ghost 121 Nov 09, 2022
Telegram Bot for updating ongoing matches of Fotmob.com in channel by @AbirHasan2005

Fotmob-Bot A very simple Telegram Bot which will update ongoing matches of Fotmob in a channel. Demo Channel Configs API_ID - Get this from @TeleORG_B

Abir Hasan 22 Oct 21, 2022
A Python module for communicating with the Twilio API and generating TwiML.

twilio-python The default branch name for this repository has been changed to main as of 07/27/2020. Documentation The documentation for the Twilio AP

Twilio 1.6k Jan 05, 2023
A python script that automatically farms the Discord bot 'Dank Memer'.

Dank Farmer A python script that automatically farms the Discord bot 'Dank Memer'. Requirements pynput Disclaimer DO NOT use if you are not willing to

2 Dec 30, 2021
An Telegram Bot By @AsmSafone To Stream Videos in Telegram Voice Chat. This is Also The Source Code of The Bot Which is Being Used In @SafoTheBot Group! ❤️

Telegram Video Player Bot (Beta) An Telegram Bot By @AsmSafone To Stream Videos in Telegram Voice Chat. Special Features Supports Live Streaming From

SAF ONE 206 Jan 03, 2023
Cookiecutter templates for Serverless applications using AWS SAM and the Rust programming language.

Cookiecutter SAM template for Lambda functions in Rust This is a Cookiecutter template to create a serverless application based on the Serverless Appl

AWS Samples 24 Nov 11, 2022