Internationalized Domain Names for Python (IDNA 2008 and UTS #46)

Overview

Internationalized Domain Names in Applications (IDNA)

Support for the Internationalised Domain Names in Applications (IDNA) protocol as specified in RFC 5891. This is the latest version of the protocol and is sometimes referred to as “IDNA 2008”.

This library also provides support for Unicode Technical Standard 46, Unicode IDNA Compatibility Processing.

This acts as a suitable replacement for the “encodings.idna” module that comes with the Python standard library, but which only supports the older superseded IDNA specification (RFC 3490).

Basic functions are simply executed:

>>> import idna
>>> idna.encode('ドメイン.テスト')
b'xn--eckwd4c7c.xn--zckzah'
>>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
ドメイン.テスト

Installation

To install this library, you can use pip:

$ pip install idna

Alternatively, you can install the package using the bundled setup script:

$ python setup.py install

Usage

For typical usage, the encode and decode functions will take a domain name argument and perform a conversion to A-labels or U-labels respectively.

>>> import idna
>>> idna.encode('ドメイン.テスト')
b'xn--eckwd4c7c.xn--zckzah'
>>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
ドメイン.テスト

You may use the codec encoding and decoding methods using the idna.codec module:

>>> import idna.codec
>>> print('домен.испытание'.encode('idna'))
b'xn--d1acufc.xn--80akhbyknj4f'
>>> print(b'xn--d1acufc.xn--80akhbyknj4f'.decode('idna'))
домен.испытание

Conversions can be applied at a per-label basis using the ulabel or alabel functions if necessary:

>>> idna.alabel('测试')
b'xn--0zwm56d'

Compatibility Mapping (UTS #46)

As described in RFC 5895, the IDNA specification does not normalize input from different potential ways a user may input a domain name. This functionality, known as a “mapping”, is considered by the specification to be a local user-interface issue distinct from IDNA conversion functionality.

This library provides one such mapping, that was developed by the Unicode Consortium. Known as Unicode IDNA Compatibility Processing, it provides for both a regular mapping for typical applications, as well as a transitional mapping to help migrate from older IDNA 2003 applications.

For example, “Königsgäßchen” is not a permissible label as LATIN CAPITAL LETTER K is not allowed (nor are capital letters in general). UTS 46 will convert this into lower case prior to applying the IDNA conversion.

>>> import idna
>>> idna.encode('Königsgäßchen')
...
idna.core.InvalidCodepoint: Codepoint U+004B at position 1 of 'Königsgäßchen' not allowed
>>> idna.encode('Königsgäßchen', uts46=True)
b'xn--knigsgchen-b4a3dun'
>>> print(idna.decode('xn--knigsgchen-b4a3dun'))
königsgäßchen

Transitional processing provides conversions to help transition from the older 2003 standard to the current standard. For example, in the original IDNA specification, the LATIN SMALL LETTER SHARP S (ß) was converted into two LATIN SMALL LETTER S (ss), whereas in the current IDNA specification this conversion is not performed.

>>> idna.encode('Königsgäßchen', uts46=True, transitional=True)
'xn--knigsgsschen-lcb0w'

Implementors should use transitional processing with caution, only in rare cases where conversion from legacy labels to current labels must be performed (i.e. IDNA implementations that pre-date 2008). For typical applications that just need to convert labels, transitional processing is unlikely to be beneficial and could produce unexpected incompatible results.

encodings.idna Compatibility

Function calls from the Python built-in encodings.idna module are mapped to their IDNA 2008 equivalents using the idna.compat module. Simply substitute the import clause in your code to refer to the new module name.

Exceptions

All errors raised during the conversion following the specification should raise an exception derived from the idna.IDNAError base class.

More specific exceptions that may be generated as idna.IDNABidiError when the error reflects an illegal combination of left-to-right and right-to-left characters in a label; idna.InvalidCodepoint when a specific codepoint is an illegal character in an IDN label (i.e. INVALID); and idna.InvalidCodepointContext when the codepoint is illegal based on its positional context (i.e. it is CONTEXTO or CONTEXTJ but the contextual requirements are not satisfied.)

Building and Diagnostics

The IDNA and UTS 46 functionality relies upon pre-calculated lookup tables for performance. These tables are derived from computing against eligibility criteria in the respective standards. These tables are computed using the command-line script tools/idna-data.

This tool will fetch relevant codepoint data from the Unicode repository and perform the required calculations to identify eligibility. There are three main modes:

  • idna-data make-libdata. Generates idnadata.py and uts46data.py, the pre-calculated lookup tables using for IDNA and UTS 46 conversions. Implementors who wish to track this library against a different Unicode version may use this tool to manually generate a different version of the idnadata.py and uts46data.py files.
  • idna-data make-table. Generate a table of the IDNA disposition (e.g. PVALID, CONTEXTJ, CONTEXTO) in the format found in Appendix B.1 of RFC 5892 and the pre-computed tables published by IANA.
  • idna-data U+0061. Prints debugging output on the various properties associated with an individual Unicode codepoint (in this case, U+0061), that are used to assess the IDNA and UTS 46 status of a codepoint. This is helpful in debugging or analysis.

The tool accepts a number of arguments, described using idna-data -h. Most notably, the --version argument allows the specification of the version of Unicode to use in computing the table data. For example, idna-data --version 9.0.0 make-libdata will generate library data against Unicode 9.0.0.

Additional Notes

  • Packages. The latest tagged release version is published in the Python Package Index.
  • Version support. This library supports Python 3.5 and higher. As this library serves as a low-level toolkit for a variety of applications, many of which strive for broad compatibility with older Python versions, there is no rush to remove older intepreter support. Removing support for older versions should be well justified in that the maintenance burden has become too high.
  • Python 2. Python 2 is supported by version 2.x of this library. While active development of the version 2.x series has ended, notable issues being corrected may be backported to 2.x. Use "idna<3" in your requirements file if you need this library for a Python 2 application.
  • Testing. The library has a test suite based on each rule of the IDNA specification, as well as tests that are provided as part of the Unicode Technical Standard 46, Unicode IDNA Compatibility Processing.
  • Emoji. It is an occasional request to support emoji domains in this library. Encoding of symbols like emoji is expressly prohibited by the technical standard IDNA 2008 and emoji domains are broadly phased out across the domain industry due to associated security risks. For now, applications that wish need to support these non-compliant labels may wish to consider trying the encode/decode operation in this library first, and then falling back to using encodings.idna. See the Github project for more discussion.
Owner
Kim Davies
Kim Davies
Natural Language Processing - Sommer Semester 2022

Natural Language Processing (DIS25a/NLP) This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Progra

Classrooms of IR Group at Technische Hochschule Köln 19 Sep 07, 2022
Dome - Subdomain Enumeration Tool. Fast and reliable python script that makes active and/or passive scan to obtain subdomains and search for open ports.

DOME - A subdomain enumeration tool Check the Spanish Version Dome is a fast and reliable python script that makes active and/or passive scan to obtai

Vadi 329 Jan 01, 2023
Script checks provided domains for log4j vulnerability

log4j Script checks provided domains for log4j vulnerability. A token is created with canarytokens.org and passed as header at request for a single do

Matthias Nehls 2 Dec 12, 2021
Proof of Concept Exploit for ManageEngine ServiceDesk Plus CVE-2021-44077

CVE-2021-44077 Proof of Concept Exploit for CVE-2021-44077: PreAuth RCE in ManageEngine ServiceDesk Plus 11306 Based on: https://xz.aliyun.com/t/106

Horizon 3 AI Inc 25 Nov 09, 2022
#whois it? Let's find out!

whois_bot #whois it? Let's find out! Currently in development: a gatekeeper bot for a community (https://t.me/IT_antalya) of 250+ expat IT pros of Ant

Kirill Nikolaev 14 Jun 24, 2022
The disassembler parses evm bytecode from the command line or from a file.

EVM Bytecode Disassembler The disassembler parses evm bytecode from the command line or from a file. It does not matter whether the bytecode is prefix

alpharush 22 Dec 27, 2022
Client script for the fisherman phishing tool

Client script for the fisherman phishing tool

Pushkar Raj 1 Feb 23, 2022
TCP/UDP port scanner on python, usong scapy and multiprocessin

Port Scanner TCP/UDP port scanner on python, usong scapy and multiprocessing. Usage python3 scanner.py [OPTIONS] IP_ADDRESS [{tcp|udp}[/[PORT|PORT-POR

Egor Krokhin 1 Dec 05, 2021
Dark-Fb No Login 100% safe

Dark-Fb No Login 100% safe TERMUX • pkg install python2 && git -y • pip2 install requests mechanize tqdm • git clone https://github.com/BOT-033/Sensei

Bukan Hamkel 1 Dec 04, 2021
A Modified version of TCC's Osprey poc framework......

fierce-fish fierce-fish是由TCC(斗象能力中心)出品并维护的开源漏洞检测框架osprey的改写,去掉臃肿功能的精简版本poc框架 PS:真的用不惯其它臃肿的功能,不过作为一个收集漏洞poc && exp的框架还是非常不错的!!! osprey For beginners fr

lUc1f3r11 10 Dec 30, 2022
An automated, reliable scanner for the Log4Shell (CVE-2021-44228) vulnerability.

Log4JHunt An automated, reliable scanner for the Log4Shell CVE-2021-44228 vulnerability. Video demo: Usage Here the help usage: $ python3 log4jhunt.py

RedHunt Labs 39 Nov 21, 2022
Crypto Meta Extractor

Crypto Meta Extractor This repository contains the code which extracts some metadata of all the cryptocurrencies listed (9K) on CoinMarketCap. Coding

Samyak Jain 3 Jul 03, 2022
Notebooks, slides and dataset of the CorrelAid Machine Learning Winter School

CorrelAid Machine Learning Spring School Welcome to the CorrelAid ML Spring School! In this repository you can find the slides and other files for the

CorrelAid 12 Nov 23, 2022
Security System using OpenCV

Security-System Security System using OpenCV Files in this Repository: email_send.py - This file contains python code to send an email when something

Mehul Patwari 1 Oct 28, 2021
CVE-2021-26084 Remote Code Execution on Confluence Servers

CVE-2021-26084 CVE-2021-26084 Remote Code Execution on Confluence Servers. Dork Fofa: app="ATLASSIAN-Confluence" Usage Show help information. python P

FQ Hsu 63 Dec 30, 2022
HTTP Protocol Stack Remote Code Execution Vulnerability CVE-2022-21907

CVE-2022-21907 Description POC for CVE-2022-21907: HTTP Protocol Stack Remote Code Execution Vulnerability. create by antx at 2022-01-17. Detail HTTP

赛欧思网络安全研究实验室 365 Nov 30, 2022
Tool To generate Stable Undetected Payload

windowsPayload Tool To generate Stable Undetected Payload Don t Upload to Virus Total :) Follow on Social Media Platforms ScreenShots How to install +

youhacker55 117 Dec 30, 2022
This is a multi-password‌ cracking tool that can help you hack facebook accounts very quickly

Pro_Crack Facebook Fast Cracking Tool This is a multi-password‌ cracking tool that can help you hack facebook accounts very quickly Installation On Te

•JINN• 1 Jan 16, 2022
Mass scan for .git repository and .env file exposure

Mass .Git repository and .Env file Scan by Scarmandef Scanner to find .env file and .git repository exposure on multiple hosts Because of the response

8 Jun 23, 2022