Searches a document for hash tags. Support multiple natural languages. Works in various contexts.

Overview

ht-getter

Searches a document for hash tags. Supports multiple natural languages. Works in various contexts.

This package uses a non-regex approach and supports both halfwidth and fullwidth alphanumeric characters as well as various writing systems.

Installation

pip install ht-getter

Function

def get_hash_tags(source, mode = "strings")

Arguments:

source -> The source text to be searched. Must be passed as a str type.

mode -> Specifies the mode of the results. The default value is “strings”

mode = “strings” -> The results are returned as a list of strings

mode = “indices” -> The results are returned as a list of lists of the start and end indices of the hash tags.

Code Sample

from ht_getter.getter import get_hash_tags

source_text = '''This simple package helps find you find #hash_tags in various types of #documents#. It also works with other languages like #日本語 or #한국어.
It supports #fullwidth #alpha-numeric characters. You can get a #list of the #hash_tags or a list of their #indices in the #####source_text."
'''

hash_tags = get_hash_tags(source_text)
hash_tag_indices = get_hash_tags(source_text, mode = "indices")

print(hash_tags)
print(hash_tag_indices)

Things to Keep in Mind:

  1. This package can be used in various contexts. (Social media, news articles, etc.)
  2. This package looks for substrings that have the structure of a hash tag but does not check that the substring is a valid hash tag on any platform.
You might also like...
Soccerdata - Efficiently scrape soccer data from various sources
Soccerdata - Efficiently scrape soccer data from various sources

SoccerData is a collection of wrappers over soccer data from Club Elo, ESPN, FBr

Python Tool to Easily Generate Multiple Documents

Python Tool to Easily Generate Multiple Documents Running the script doesn't require internet Max Generation is set to 10k to avoid lagging/crashing R

Build documentation in multiple repos into one site.

mkdocs-multirepo-plugin Build documentation in multiple repos into one site. Setup Install plugin using pip: pip install git+https://github.com/jdoiro

Quick tutorial on orchest.io that shows how to build multiple deep learning models on your data with a single line of code using python
Quick tutorial on orchest.io that shows how to build multiple deep learning models on your data with a single line of code using python

Deep AutoViML Pipeline for orchest.io Quickstart Build Deep Learning models with a single line of code: deep_autoviml Deep AutoViML helps you build te

Type hints support for the Sphinx autodoc extension

sphinx-autodoc-typehints This extension allows you to use Python 3 annotations for documenting acceptable argument types and return value types of fun

AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database.

AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database. This database is completely free* 💸

Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.
Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.

What is Craxk ? Craxk is a UNIQUE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reprod

Find target hash collisions for Apple's NeuralHash perceptual hash function.💣
Find target hash collisions for Apple's NeuralHash perceptual hash function.💣

neural-hash-collider Find target hash collisions for Apple's NeuralHash perceptual hash function. For example, starting from a picture of this cat, we

That Hash will name that hash type! Identify MD5, SHA256 and 300+ other hashes Comes with
That Hash will name that hash type! Identify MD5, SHA256 and 300+ other hashes Comes with

Call for translators! We're looking for translators to help translate this spec for everyone! Read this documentation in the following languages 한국어 中

A Pytorch implementation of
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Splitter ⠀⠀ A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019). Abstract Recent inte

A Pytorch implementation of
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Splitter ⠀⠀ A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019). Abstract Recent inte

Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

document organizer with tags and full-text-search, in a simple and clean sqlite3 schema
document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

mtomo Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation.

Transparent proxy server that works as a poor man's VPN. Forwards over ssh. Doesn't require admin. Works with Linux and MacOS. Supports DNS tunneling.

sshuttle: where transparent proxy meets VPN meets ssh As far as I know, sshuttle is the only program that solves the following common case: Your clien

A Regex based linter tool that works for any language and works exclusively with custom linting rules.

renag Documentation Available Here Short for Regex (re) Nag (like "one who complains"). Now also PEGs (Parsing Expression Grammars) compatible with py

Code for CVPR 2021 oral paper
Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts The rapid progress in 3D scene understanding has come with growing dem

Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!

ToxiChat Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Install depen

Releases(v0.0.1)
Owner
Rairye
NLP (mostly tokenization, normalization, and topics concerning CJK languages) Most projects currently available in JS and Python.
Rairye
Generates, filters, parses, and cleans data regarding the financial disclosures of judges in the American Judicial System

This repository contains code that gets data regarding financial disclosures from the Court Listener API main.py: contains driver code that interacts

Ali Rastegar 2 Aug 06, 2022
Easy OpenAPI specs and Swagger UI for your Flask API

Flasgger Easy Swagger UI for your Flask API Flasgger is a Flask extension to extract OpenAPI-Specification from all Flask views registered in your API

Flasgger 3.1k Dec 24, 2022
Create Python API documentation in Markdown format.

Pydoc-Markdown Pydoc-Markdown is a tool and library to create Python API documentation in Markdown format based on lib2to3, allowing it to parse your

Niklas Rosenstein 375 Jan 05, 2023
In this Github repository I will share my freqtrade files with you. I want to help people with this repository who don't know Freqtrade so much yet.

My Freqtrade stuff In this Github repository I will share my freqtrade files with you. I want to help people with this repository who don't know Freqt

Simon Kebekus 104 Dec 31, 2022
Mozilla Campus Club CCEW is a student committee working to spread awareness on Open Source software.

Mozilla Campus Club CCEW is a student committee working to spread awareness on Open Source software. We organize webinars and workshops on different technical topics and making Open Source contributi

Mozilla-Campus-Club-Cummins 8 Jun 15, 2022
Leetcode Practice

LeetCode Practice Description This is my LeetCode Practice. Visit LeetCode Website for detailed question description. The code in this repository has

Leo Hsieh 75 Dec 27, 2022
This programm checks your knowlege about the capital of Japan

Introduction This programm checks your knowlege about the capital of Japan. Now, what does it actually do? After you run the programm you get asked wh

1 Dec 16, 2021
Data-science-on-gcp - Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

data-science-on-gcp Source code accompanying book: Data Science on the Google Cloud Platform, 2nd Edition Valliappa Lakshmanan O'Reilly, Jan 2022 Bran

Google Cloud Platform 1.2k Dec 28, 2022
ReStructuredText and Sphinx bridge to Doxygen

Breathe Packagers: PGP signing key changes for Breathe = v4.23.0. https://github.com/michaeljones/breathe/issues/591 This is an extension to reStruct

Michael Jones 643 Dec 31, 2022
DocumentPy is a Python application that runs in a command-line interface environment, made for creating HTML documents.

DocumentPy DocumentPy is a Python application that runs in a command-line interface environment, made for creating HTML documents. Usage DocumentPy, a

Lotus 0 Jul 15, 2021
The OpenAPI Specification Repository

The OpenAPI Specification The OpenAPI Specification is a community-driven open specification within the OpenAPI Initiative, a Linux Foundation Collabo

OpenAPI Initiative 25.5k Dec 29, 2022
PyPresent - create slide presentations from notes

PyPresent Create slide presentations from notes Add some formatting to text file

1 Jan 06, 2022
Material for the ros2 crash course

Material for the ros2 crash course

Emmanuel Dean 1 Jan 22, 2022
Run `black` on python code blocks in documentation files

blacken-docs Run black on python code blocks in documentation files. install pip install blacken-docs usage blacken-docs provides a single executable

Anthony Sottile 460 Dec 23, 2022
Hasköy is an open-source variable sans-serif typeface family

Hasköy Hasköy is an open-source variable sans-serif typeface family. Designed with powerful opentype features and each weight includes latin-extended

67 Jan 04, 2023
A Python package develop for transportation spatio-temporal big data processing, analysis and visualization.

English 中文版 TransBigData Introduction TransBigData is a Python package developed for transportation spatio-temporal big data processing, analysis and

Qing Yu 251 Jan 03, 2023
Read write method - Read files in various types of formats

一个关于所有格式文件读取的方法 1。 问题描述: 各种各样的文件格式,读写操作非常的麻烦,能够有一种方法,可以整合所有格式的文件,方便用户进行读取和写入。 2

2 Jan 26, 2022
A repository of links with advice related to grad school applications, research, phd etc

A repository of links with advice related to grad school applications, research, phd etc

Shaily Bhatt 946 Dec 30, 2022
Tips for Writing a Research Paper using LaTeX

Tips for Writing a Research Paper using LaTeX

Guanying Chen 727 Dec 26, 2022
This repository outlines deploying a local Kubeflow v1.3 instance on microk8s and deploying a simple MNIST classifier using KFServing.

Zero to Inference with Kubeflow Getting Started This repository houses all of the tools, utilities, and example pipeline implementations for exploring

Ed Henry 3 May 18, 2022