Python Markov Chain chatbot running on Telegram

Overview

Hanasubot

Hanasubot (Japanese 話すボット, "talking bot") is a Python chatbot running on Telegram. The bot is based on Markov chains, so it learns from your messages instantly, unlike neural network chatbots, which require training. It uses a modified version of the markovify library for that purpose. The output may not make sense at all, though it sometimes generates hilarious replies.
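To illustrate why a Markov chain can learn "instantly", here is a minimal standard-library sketch. It is not the bot's code and does not use markovify; the class name TinyMarkov, the BEGIN/END markers, and the method names are hypothetical. Learning a line only appends to a lookup table, so no training pass is needed.

```python
import random
from collections import defaultdict

# Hypothetical sentinel tokens marking the start and end of a line.
BEGIN, END = "<BEGIN>", "<END>"


class TinyMarkov:
    """Toy word-level Markov chain (illustrative only, not the bot's model)."""

    def __init__(self, state_size=1):
        self.state_size = state_size
        # Maps a state (tuple of words) to every word seen right after it.
        self.transitions = defaultdict(list)

    def learn(self, line):
        """Record the word transitions of one line; takes effect immediately."""
        words = [BEGIN] * self.state_size + line.split() + [END]
        for i in range(len(words) - self.state_size):
            state = tuple(words[i:i + self.state_size])
            self.transitions[state].append(words[i + self.state_size])

    def generate(self, rng=random, max_words=30):
        """Walk the chain from the start state until END or max_words."""
        state = (BEGIN,) * self.state_size
        output = []
        while len(output) < max_words:
            followers = self.transitions.get(state)
            if not followers:
                break
            word = rng.choice(followers)
            if word == END:
                break
            output.append(word)
            state = state[1:] + (word,)
        return " ".join(output)


bot = TinyMarkov()
bot.learn("the cat sat")
bot.learn("the dog sat")
print(bot.generate())  # either "the cat sat" or "the dog sat"
```

With a larger corpus, the same walk can splice fragments of different messages together, which is where both the nonsense and the funny replies come from.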

In theory, the bot can learn any language, but some languages require word segmentation first. The bot currently supports Chinese and Japanese word segmentation with pkuseg, CkipTagger, and mecab. Language detection relies on pycld2.
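The need for segmentation can be seen with a small standard-library sketch. The function name whitespace_tokens and the sample sentences are made up for illustration, and this does not call pkuseg, CkipTagger, or mecab. A word-level Markov chain needs a list of words, and splitting on whitespace only produces one for languages that put spaces between words.

```python
def whitespace_tokens(text):
    """Split text on whitespace, which works for space-delimited languages."""
    return text.split()


english = "the bot learns words"
chinese = "今天天气很好"  # "The weather is nice today", written without spaces

print(whitespace_tokens(english))  # ['the', 'bot', 'learns', 'words']
print(whitespace_tokens(chinese))  # ['今天天气很好']
```

Because the whole Chinese sentence comes back as a single token, a chain built from it could only repeat complete sentences. A dedicated segmenter is what splits such text into words before the bot learns from it.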

Hanasubot has a permission system, so you can easily stop the bot from learning from naughty kids in your group while still replying to them. Users with admin rights can also erase lines from the bot's corpus.

The bot is designed for Chinese Telegram groups, so many of its messages are written in Chinese. I18n is planned for the future, and any help is welcome.

Installation

Python 3.6+ is required.

VENV_PATH=/path/to/your/venv  # Change this
python3 -m venv $VENV_PATH
source $VENV_PATH/bin/activate

pip3 install -r requirements.txt

If you are using Python 3.6, dataclasses 0.8 is required as well:

pip3 install dataclasses==0.8

For Python 3.7 and up, dataclasses is part of the standard library, so there is no need to install it.

To use CkipTagger for Traditional Chinese tokenization, you have to download the model file (see CkipTagger readme for a detailed guide):

python3 -c "from ckiptagger import data_utils; data_utils.download_data_gdown('./')"

Then unzip the downloaded file into a folder named ckipdata, in the same directory as the Python scripts.

Optionally, you can initialize the user dictionaries for pkuseg and CkipTagger before starting the bot:

touch ./pkuseg_dict.txt
touch ./ckip_dict.json

Configuration

Copy config.example.py to config.py and fill it out. Please check the comments in the config file.

cp config.example.py config.py

After that, simply start the bot:

python3 tgbot.py

Bot commands and usage

Simply reply to the bot and it will say some random words, provided it has collected enough corpus. The bot will also learn from your message instantly. Special commands are as follows.

Require root

  • /reload_config - Reload config file without restarting the bot. Some entries cannot be dynamically reloaded though, see config.example.py for details.

Require admin

  • /erase - Remove lines from corpus. (Non-admins can only erase lines sent by themselves.)
  • /userweight - Set user weight.
  • /ban - Set user right to -1.
  • /restrict - Set user right to 1.
  • /grantnormal - Set user right to 2.
  • /granttrusted - Set user right to 3.
  • /grantadmin - Set user right to 4. Admins are able to add/remove other admins with the above commands. See also the user right levels section.

Require trusted

  • /addword_cn - Add a word into pkuseg user dictionary.
  • /addword_tw - Add a word into CkipTagger user dictionary.
  • /rmword_cn - Remove a word from pkuseg user dictionary.
  • /rmword_tw - Remove a word from CkipTagger user dictionary.

Other commands

  • /clddbg - Test language detection of some texts.
  • /cutdbg - Test tokenization of some texts.
  • /policy - See what data is collected by the bot and so on.
  • /reload - Claim your admin rights after you become a Telegram group admin.
  • /source - See the source code.
  • /start - Start chatting, useful when you can't find a bot message to reply to.

Database

Initialize

CREATE TABLE IF NOT EXISTS chat(
    chat_id integer PRIMARY KEY,
    chat_tgid integer NOT NULL UNIQUE,
    chat_name text
);
CREATE TABLE IF NOT EXISTS user(
    user_id integer PRIMARY KEY,
    user_tgid integer NOT NULL UNIQUE,
    user_name text,
    user_right integer DEFAULT 2,
    user_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS corpus(
    corpus_id integer PRIMARY KEY,
    corpus_time integer,
    corpus_line text NOT NULL UNIQUE,
    corpus_raw integer REFERENCES raw,
    corpus_chat integer REFERENCES chat,
    corpus_user integer REFERENCES user,
    corpus_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS raw(
    raw_id integer PRIMARY KEY,
    raw_text text UNIQUE
);
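The schema above can be tried out with Python's built-in sqlite3 module. This is only a sketch: it assumes the database is SQLite (the column types suggest so, but the bot's actual database setup may differ), uses an in-memory database, and the sample user and corpus line are made up.

```python
import sqlite3
import time

# Same statements as in the Initialize section above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chat(
    chat_id integer PRIMARY KEY,
    chat_tgid integer NOT NULL UNIQUE,
    chat_name text
);
CREATE TABLE IF NOT EXISTS user(
    user_id integer PRIMARY KEY,
    user_tgid integer NOT NULL UNIQUE,
    user_name text,
    user_right integer DEFAULT 2,
    user_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS corpus(
    corpus_id integer PRIMARY KEY,
    corpus_time integer,
    corpus_line text NOT NULL UNIQUE,
    corpus_raw integer REFERENCES raw,
    corpus_chat integer REFERENCES chat,
    corpus_user integer REFERENCES user,
    corpus_weight real DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS raw(
    raw_id integer PRIMARY KEY,
    raw_text text UNIQUE
);
"""

# In-memory database for experimenting; an assumption, not the bot's setup.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# A new user gets the defaults: right 2 (normal user) and weight 1.0.
cur = conn.execute(
    "INSERT INTO user(user_tgid, user_name) VALUES (?, ?)", (12345, "alice")
)
user_id = cur.lastrowid
right, weight = conn.execute(
    "SELECT user_right, user_weight FROM user WHERE user_id = ?", (user_id,)
).fetchone()
print(right, weight)  # 2 1.0

# corpus_line is UNIQUE, so inserting the same line twice keeps one row.
for _ in range(2):
    conn.execute(
        "INSERT OR IGNORE INTO corpus(corpus_time, corpus_line, corpus_user) "
        "VALUES (?, ?, ?)",
        (int(time.time()), "hello world", user_id),
    )
line_count = conn.execute("SELECT COUNT(*) FROM corpus").fetchone()[0]
print(line_count)  # 1
```

The defaults in the user table line up with the user right levels: a user who has never been granted or restricted anything starts as a normal user.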

User right levels

  • 5 - root.
  • 4 - admin, can change user rights (except root users), can erase a line from corpus, and can set user_weight and corpus_weight (WIP).
  • 3 - trusted user, can feed the bot via private messages, and can add words into dictionary (for tokenization purposes).
  • 2 - normal user.
  • 1 - restricted user, the bot will not write their messages into the database.
  • -1 - banned user, the bot will not reply to their messages.
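The levels above can be modelled as an ordered enum. The following is a sketch, not the bot's implementation: the enum and function names are hypothetical, and it assumes that each level includes the abilities of the levels below it, which the list implies but does not state outright.

```python
from enum import IntEnum


class UserRight(IntEnum):
    """Right levels as listed above; member names are hypothetical."""
    BANNED = -1
    RESTRICTED = 1
    NORMAL = 2
    TRUSTED = 3
    ADMIN = 4
    ROOT = 5


def bot_replies_to(right):
    # Banned users get no replies.
    return right > UserRight.BANNED


def bot_learns_from(right):
    # Messages from restricted (and banned) users are not written to the database.
    return right >= UserRight.NORMAL


def can_add_words(right):
    # Trusted users may add words to the tokenizer dictionaries.
    return right >= UserRight.TRUSTED


def can_change_rights(actor, target):
    # Admins may change user rights, except those of root users.
    return actor >= UserRight.ADMIN and target != UserRight.ROOT


print(bot_replies_to(UserRight.RESTRICTED))   # True
print(bot_learns_from(UserRight.RESTRICTED))  # False
```

Using an IntEnum keeps the stored integer (the user_right column) and the comparisons in one place, so a check such as "at least trusted" is a single >= test.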

TODOs

  • Let admins set corpus_weight
  • Batch /erase

License

MIT
