当前位置:网站首页>Jincang database kingbasees SQL language reference manual (3.1.1.10. text search type, 3.1.1.11. UUID type)
Jincang database kingbasees SQL language reference manual (3.1.1.10. text search type, 3.1.1.11. UUID type)
2022-07-18 13:16:00 【Thousands of sails passed by the side of the sunken boat_】
3.1.1.10. Text search type
KingbaseES Two data types are provided , They are designed to support full-text search , Full text search is a natural language file Search the collection to locate those that best match Inquire about Activities for documents .tsvector Type represents a document in a form optimized for text search ,tsquery Type represents a text query . Full text search Provides a detailed explanation of this function , also Text search functions and operators Summarizes the related functions and operators .
3.1.1.10.1. tsvector
One tsvector Value is a sorted distinguishable Lexeme A list of , Lexeme Be being regularization A word that combines different variants of the same word . Sorting and de duplication are done automatically during input , As shown in the following example :
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
tsvector
----------------------------------------------------
'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'
To indicate a word position that contains white space or punctuation , Enclose them in quotation marks :
SELECT $$the lexeme ' ' contains spaces$$::tsvector;
tsvector
-------------------------------------------
' ' 'contains' 'lexeme' 'spaces' 'the'
( In this example, we use string text surrounded by dollar symbols and the next one is to avoid confusion caused by including double quotation marks in the text ). Embedded quotation marks and backslashes must be double written :
SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector;
tsvector
------------------------------------------------
'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the'
Optional , Integers Location Can be attached to the word position :
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'
::tsvector;
tsvector
-----------------------------------------------------------------------------
'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4
A location usually indicates the location of the source word in the document . Location information can be used Proximity ranking . The position value can be from 1 To 16383, Larger numbers will be 16383. For the same word position, the repeated position will be discarded .
The word position with position can be further marked with a The weight , It can be A、 B、C or D. D Is the default value and therefore does not appear in the output :
SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
tsvector
----------------------------
'a':1A 'cat':5 'fat':2B,4C
Weights are often used to reflect document structure , For example, mark the subject words as different from the text words . The text search ranking function can assign different priorities to different weight markers .
understand tsvector It is important that the type itself does not perform any word normalization , It assumes that the words given to it have been properly normalized for application . for example ,
SELECT 'The Fat Rats'::tsvector;
tsvector
--------------------
'Fat' 'Rats' 'The'
For most English text search applications , The above words will be considered informal , however tsvector I don't care about that . The original document text should usually go through to_tsvector Normalize the words for the search appropriately :
SELECT to_tsvector('english', 'The Fat Rats');
to_tsvector
-----------------
'fat':2 'rat':3
3.1.1.10.2. tsquery
One tsquery The value stores the word bits to be used for the search , And use the Boolean operator &(AND)、|(OR) and !(NOT) To combine them , And phrase search operators <->(FOLLOWED BY). There is also a FOLLOWED BY A variant of the operator <N>, among ``N`` Is an integer constant , It specifies the distance between the two word bits to search .<-> Equivalent to <1>.
Parentheses can be used to force grouping of operators . If there are no parentheses ,!(NOT) The highest priority , The second is <->(FOLLOWED BY), And then there was &(AND), And finally |(OR).
Here are some examples :
SELECT 'fat & rat'::tsquery;
tsquery
---------------
'fat' & 'rat'
SELECT 'fat & (rat | cat)'::tsquery;
tsquery
---------------------------
'fat' & ( 'rat' | 'cat' )
SELECT 'fat & rat & ! cat'::tsquery;
tsquery
------------------------
'fat' & 'rat' & !'cat'
Optionally , One tsquery The word position in can be marked with one or more weight letters , This will limit them to only those with one of those weights tsvector Word phase matching :
SELECT 'fat:ab & cat'::tsquery;
tsquery
------------------
'fat':AB & 'cat'
Besides , One tsquery The morphemes in can be marked as * To specify prefix matching :
SELECT 'super:*'::tsquery; tsquery ----------- 'super':*
This query will match a tsvector China and Israel “super” Any word at the beginning .
The quotation mark rule of word position is the same as that described before tsvector The word position in is the same ; also , just as tsvector, Any request for word normalization must be converted to tsquery Type before .to_tsquery Functions can easily perform this normalization :
SELECT to_tsquery('Fat:ab & Cats');
to_tsquery
------------------
'fat':AB & 'cat'
Be careful to_tsquery Prefixes will be handled in the same way as other words , This also means that the following comparison will return true :
SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'kingbase:*' ); ?column? ---------- t
because kingbase Will be treated as postgr:
SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'kingbase:*' ); to_tsvector | to_tsquery ---------------+------------ 'postgradu':1 | 'postgr':*
This will match postgraduate Processed form .
3.1.1.11. UUID type
data type uuid Storage by RFC 4122、ISO/IEC 9834-8:2005 And the universal unique identifier defined by relevant standards (UUID)( Some systems refer to this data type as a globally unique identifier GUID). This identifier is a 128 The amount of bits , It is produced by a carefully selected algorithm , This algorithm can ensure that any other person using the same algorithm in the known space can produce the same identifier, which is very, very unlikely . therefore , For distributed systems , These identifiers provide a good guarantee of uniqueness compared with sequence generators , Sequence generators can only be guaranteed to be unique in one database .
One UUID Written as a sequence of lowercase hexadecimal digits , The sequence is divided into groups by hyphens : The first is a 8 Bit group , Next are three 4 Bit group , And finally a 12 Bit group . Total 32 position ( Hexadecimal digits ) According to the 128 Binary bits . A standard form UUID Be similar to :
a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
KingbaseES Also accept another form of input : Use upper case bits 、 The standard format is surrounded by curly braces 、 Ignore some or all hyphens 、 In any 4 Add a hyphen after the bit group . for example :
A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11
a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}
Output is always in standard form .
KingbaseES by UUID Provides storage and comparison functions , But the core database does not contain anything for generating UUID Function of , Because no single algorithm can be well adapted to every application . Besides ,UUID It can be generated by the client application , Or generated by other libraries called through server-side functions .
边栏推荐
猜你喜欢

MGRE/OSPF综合实验

Chapter 5: NoSQL database

This article enables you to understand IIC, SPI and UART protocols

Still reading logs on the command line? Use kibana, visual log analysis yyds~

【剑指 Offer】从尾到头打印链表(栈思想)+ 数组中重复的数字(哈希映射)

Paper reading: u-net: revolutionary networks for biomedical image segmentation

19th week homework

【RT-Thread】nxp rt10xx 设备驱动框架之--uart搭建和使用

Reversingkr WP (7)

SSH远程端口转发
随机推荐
内存管理页面属性
为什么 UDP 头只有 8 个字节
【RT-Thread】nxp rt10xx 设备驱动框架之--uart搭建和使用
Hcip day 5 notes
Word -- set tab width
JS small effect jump according to the corresponding item name
A survey on dialogue systems: recent advances and new frontiers
The development of digital collection system helps enterprises' meta universe scene marketing
(笔记)ideavim和ide冲突解决方法
mysql中一些功能相似的函数详解
How to systematize the introduction to data science for free?
【HBZ分享】TCP协议讲解
HCIP第三天学习笔记
Grid layout
ReversingKr-wp(5)
今日份工作感悟
hcip第六天笔记
Source insight 4 what about Chinese garbled code?
解决pycharm无法输入中文的方法:
JS regular advanced code to understand