Hspell, the free Hebrew spellchecker and morphology engine.

Overview
This is version 1.4 of Hspell, the free Hebrew spellchecker and morphology
engine.

You can get Hspell from:
	http://hspell.ivrix.org.il/

Hspell was written by Nadav Har'El and Dan Kenigsberg:
	nyh    @ math.technion.ac.il
	danken @   cs.technion.ac.il

Hspell is free software, released under the GNU Affero General Public License
(AGPL) version 3. Note that not only the programs in the distribution, but
also the dictionary files and the generated word lists, are licensed under
the AGPL.
There is no warranty of any kind for the contents of this distribution.
See the LICENSE file for more information and the exact license terms.

The rest of this README file explains Hspell's spelling standard (niqqud-less),
a bit about the technology behind Hspell, how to use the "hspell" program
(but see the manual page for more current information), and lists a few future
directions. See the separate INSTALL file for instructions on how to install
Hspell.


About Hspell's spelling standard
--------------------------------

Hspell was designed to be 100% and strictly compliant with the official
niqqud-less spelling rules ("Ha-ktiv Khasar Ha-niqqud", colloquially known as
"Ktiv Male", or "plene spelling" in English), published by the Academy of
the Hebrew Language. This is both an advantage and a disadvantage, depending
on your viewpoint. It's an advantage because it encourages a *correct* and
consistent spelling style throughout your writing. It is a disadvantage,
because a few of the Academia's official spelling decisions are relatively
unknown to the general public.

Users of Hspell (and all Hebrew writers, for that matter) are encouraged to
read the Academia's official niqqud-less spelling rules (which are printed at
the end of most modern Hebrew dictionaries), and to refer to Hebrew
dictionaries which use the niqqud-less spelling (such as Millon Ha-hove or
Rav Milim). We also provide in docs/niqqudless.odt a document (in Hebrew)
which describes in detail Hspell's spelling standard, and why certain words
are spelled the way they are.


The technology behind Hspell
----------------------------

The "hspell" program itself is mostly a simple (but efficient) program
that checks input words against a long list of valid words. The real "brains"
behind it are the word lists (lexicon) provided by the Hspell project.

In order for it to be completely free of other people's copyright restrictions,
the Hspell project is a clean-room implementation, not based on other
companies' word lists, on other companies' spell checkers, or on copying of
printed dictionaries. The word list is also not based on automatic scanning
of available Hebrew documents (such as online newspapers), because there is
no way to guarantee that such a list will be correct, complete, or consistent
with regard to spelling rules.

Instead, our idea was to write programs which know how to correctly inflect
Hebrew nouns and conjugate Hebrew verbs. The inputs to these programs are
lists of noun stems and of verb roots, plus hints needed for the correct
inflection when these cannot be figured out automatically. These input files
are obviously an important part of the Hspell project. The "word list
generators" (written in Perl, and are also part of the Hspell project) then
create the complete word-list for use by the spellchecking program, hspell.
The generated lists are useful for much more than spellchecking, by the
way - see more on that below ("the future").

Although we wrote all of Hspell's code ourselves, we are truly indebted to
the old-style "open source" pioneers - people who wrote books about the
knowledge they developed, instead of hiding it in proprietary software.
For the correct noun inflections, Dr. Shaul Barkali's "The Complete Noun Book"
has been a great help. Prof. Uzzi Ornan's booklet "Verb Conjugation in Flow
Charts" has been instrumental in the implementation of verb conjugation,
and Barkali's "The Complete Verb Book" was used too.

During our work we have extensively used a number of Hebrew dictionaries,
including Even Shoshan, Millon Ha-hove and Rav-Milim, to ensure the correctness
of certain words. Various Hebrew newspapers and books, both printed and online,
were used for inspiration and for finding words we still do not recognize.
We wish to thank Cilla Tuviana and Dr. Zvi Har'El for their assistance with
some grammatical questions.


Using hspell
------------

After unpacking the distribution and running "configure", "make" and
"make install" (see the INSTALL file for more information), the hspell
executable is installed (by default) in /usr/local/bin, and the dictionary
files are in /usr/local/share/hspell.

The "hspell" program can be used on any sort of text file containing Hebrew
and potentially non-Hebrew characters which it ignores. For example, it
works well on Hebrew text files, TeX/LaTeX files, and HTML. Running

	hspell filename

Will check the spelling in filename and will output the list of incorrect
words (just like the old-fashioned UNIX "spell" program did). If run without
a file parameter, hspell reads from its standard input.

In the current release, hspell expects ISO-8859-8-encoded files. If files
using a different encoding (e.g., UTF8) are to be checked, they must be
converted first to ISO-8859-8 (e.g., see iconv(1), recode(1)).

If the "-c" option is given, hspell will suggest corrections for misspelled
words, whenever it can find such corrections. The correction mechanism in this
release is especially good at finding corrections for incorrect niqqud-less
spellings, with missing or extra 'immot-qri'a.

The "-l" (verbose) option will explain for each correct word why it was
recognized, if Hspell was built with the "linginfo" optional feature enabled
(a morphological analysis is shown, i.e., fully describe all possible ways to
read the given word as an inflected word with optional prefixes).

Because hspell's output (naturally) is "logical-order", it is normally
useful to pipe it to bidiv or rev before viewing. For example

	hspell -c filename | bidiv | less

Another convenient alternative is to run hspell on a BiDi-enabled terminal.

Instead of using the hspell program described above, users can also use
Hspell's lexicon through one of the popular multi-lingual spell-checkers,
aspell and hunspell. See the INSTALL file for more information on building
these dictionaries.


How *you* can help
------------------

By now, Hspell is fairly mature, and its lexicon of over 24,000 base words
is fairly comprehensive, similar in breadth to some printed dictionaries.
Careful attention has also been given to its accuracy, and its conformance
with the spelling rules of the Academy of the Hebrew Language.

Nevertheless, Hspell does not, and probably never will, cover all of modern
Hebrew language. Also, undoubtedly, it may contain some errors as well.
If you find such omissions or errors, please let us know.

Before reporting such omissions or errors, please try to verify that the word
you are proposing is indeed correctly spelled: Please refer to modern
dictionaries. Please also look at doc/niqqudless.odt - the word you are
proposing might actually be a known mispelling which we discuss in that
document.
Word and phrase lists in CSV

Word Lists Word and phrase lists in CSV, collected from different sources. Oxford Word Lists: oxford-5k.csv - Oxford 3000 and 5000 oxford-opal.csv - O

Anton Zhiyanov 14 Oct 14, 2022
A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

Selwin Ong 1.3k Dec 22, 2022
Parse Any Text With Python

ParseAnyText A small package to parse strings. What is the work of it? Well It's a module to creates parser that helps to parse a text easily with les

Sayam Goswami 1 Jan 11, 2022
Phone Number formatting for PlaySMS Platform - BulkSMS Platform

BulkSMS-Number-Formatting Phone Number formatting for PlaySMS Platform - BulkSMS Platform. Phone Number Formatting for PlaySMS Phonebook Service This

Edwin Senunyeme 1 Nov 08, 2021
Python character encoding detector

Chardet: The Universal Character Encoding Detector Detects ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants) Big5, GB2312, EUC-TW, HZ-GB-2312, IS

Character Encoding Detector 1.8k Jan 08, 2023
Wordle strategy: Find frequency of letters appearing in 5-letter words in the English language

Find frequency of letters appearing in 5-letter words in the English language In

Gabriel Apolinário 1 Jan 17, 2022
Making simplex testing clean and simple

Making Simplex Project Testing - Clean and Simple What does this repo do? It organizes the python stack for the coding project What do I need to do in

Mohit Mahajan 1 Jan 30, 2022
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 3k Jan 02, 2023
Hamming code generation, error detection & correction.

Hamming code generation, error detection & correction.

Farhan Bin Amin 2 Jun 30, 2022
Maiden & Spell community player ranking based on tournament data.

MnSRank Maiden & Spell community player ranking based on tournament data. Why? 2021 just ended and this seemed like a cool idea. Elo doesn't work well

Jonathan Lee 1 Apr 20, 2022
A Python3 script that simulates the user typing a text on their keyboard.

A Python3 script that simulates the user typing a text on their keyboard. (control the speed, randomness, rate of typos and more!)

Jose Gracia Berenguer 3 Feb 22, 2022
A Python package to facilitate research on building and evaluating automated scoring models.

Rater Scoring Modeling Tool Introduction Automated scoring of written and spoken test responses is a growing field in educational natural language pro

ETS 59 Oct 10, 2022
Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you print(fix_encoding("(ง'⌣')ง")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li

Luminoso Technologies, Inc. 3.4k Jan 08, 2023
LazyText is inspired b the idea of lazypredict, a library which helps build a lot of basic models without much code.

LazyText is inspired b the idea of lazypredict, a library which helps build a lot of basic models without much code. LazyText is for text what lazypredict is for numeric data.

Jay Vala 13 Nov 04, 2022
A pipeline for making highlighted text stand-alone.

title emoji colorFrom colorTo sdk app_file pinned decontextualizer 📤 green gray streamlit main.py false Decontextualizer As a second step in improvin

Paul Bricman 26 Dec 17, 2022
Simple python program to auto credit your code, text, book, whatever!

Credit Simple python program to auto credit your code, text, book, whatever! Setup First change credit_text to whatever text you would like to credit

Hashm 1 Jan 29, 2022
RSS Reader application for the Emacs Application Framework.

EAF RSS Reader RSS Reader application for the Emacs Application Framework. Load application (add-to-list 'load-path "~/.emacs.d/site-lisp/eaf-rss-read

EAF 15 Dec 07, 2022
CowExcept - Spice up those exceptions with cowexcept!

CowExcept - Spice up those exceptions with cowexcept!

James Ansley 41 Jun 30, 2022
Vector space based Information Retrieval System for Text Processing - Information retrieval

Information Retrieval: Text Processing Group 13 Sequence of operations Install Requirements Add given wikipedia files to the corpus directory. Downloa

1 Jan 01, 2022
Athens: a great tool for taking notes and organising knowldge

AthensSyncer Athens is a great tool for taking notes and organising knowldge. But it is a bummer that you cannot use it accross multiple devices. Well

6 Dec 14, 2022