当前位置：网站首页>Jincang database kingbasees SQL language reference manual (3.1.1.10. text search type, 3.1.1.11. UUID type)

Jincang database kingbasees SQL language reference manual (3.1.1.10. text search type, 3.1.1.11. UUID type)

2022-07-18 13:16:00 【Thousands of sails passed by the side of the sunken boat_】

3.1.1.10. Text search type

KingbaseES Two data types are provided , They are designed to support full-text search , Full text search is a natural language file Search the collection to locate those that best match Inquire about Activities for documents .tsvector Type represents a document in a form optimized for text search ,tsquery Type represents a text query . Full text search Provides a detailed explanation of this function , also Text search functions and operators Summarizes the related functions and operators .

3.1.1.10.1. tsvector

One tsvector Value is a sorted distinguishable Lexeme A list of , Lexeme Be being regularization A word that combines different variants of the same word . Sorting and de duplication are done automatically during input , As shown in the following example ：

SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
                      tsvector
----------------------------------------------------
 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'

To indicate a word position that contains white space or punctuation , Enclose them in quotation marks ：

SELECT $$the lexeme '    ' contains spaces$$::tsvector;
                 tsvector
-------------------------------------------
 '    ' 'contains' 'lexeme' 'spaces' 'the'

（ In this example, we use string text surrounded by dollar symbols and the next one is to avoid confusion caused by including double quotation marks in the text ）. Embedded quotation marks and backslashes must be double written ：

SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector;
                    tsvector
------------------------------------------------
 'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the'

Optional , Integers Location Can be attached to the word position ：

SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'
::tsvector;
                                  tsvector
-----------------------------------------------------------------------------
 'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4

A location usually indicates the location of the source word in the document . Location information can be used Proximity ranking . The position value can be from 1 To 16383, Larger numbers will be 16383. For the same word position, the repeated position will be discarded .

The word position with position can be further marked with a The weight , It can be A、 B、C or D. D Is the default value and therefore does not appear in the output ：

SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
          tsvector
----------------------------
 'a':1A 'cat':5 'fat':2B,4C

Weights are often used to reflect document structure , For example, mark the subject words as different from the text words . The text search ranking function can assign different priorities to different weight markers .

understand tsvector It is important that the type itself does not perform any word normalization , It assumes that the words given to it have been properly normalized for application . for example ,

SELECT 'The Fat Rats'::tsvector;
      tsvector
--------------------
 'Fat' 'Rats' 'The'

For most English text search applications , The above words will be considered informal , however tsvector I don't care about that . The original document text should usually go through to_tsvector Normalize the words for the search appropriately ：

SELECT to_tsvector('english', 'The Fat Rats');
   to_tsvector
-----------------
 'fat':2 'rat':3

3.1.1.10.2. tsquery

One tsquery The value stores the word bits to be used for the search , And use the Boolean operator &（AND）、|（OR） and !（NOT） To combine them , And phrase search operators <->（FOLLOWED BY）. There is also a FOLLOWED BY A variant of the operator <N>, among ``N`` Is an integer constant , It specifies the distance between the two word bits to search .<-> Equivalent to <1>.

Parentheses can be used to force grouping of operators . If there are no parentheses ,!（NOT） The highest priority , The second is <->（FOLLOWED BY）, And then there was &（AND）, And finally |（OR）.

Here are some examples ：

SELECT 'fat & rat'::tsquery;
    tsquery
---------------
 'fat' & 'rat'

SELECT 'fat & (rat | cat)'::tsquery;
          tsquery
---------------------------
 'fat' & ( 'rat' | 'cat' )

SELECT 'fat & rat & ! cat'::tsquery;
        tsquery
------------------------
 'fat' & 'rat' & !'cat'

Optionally , One tsquery The word position in can be marked with one or more weight letters , This will limit them to only those with one of those weights tsvector Word phase matching ：

SELECT 'fat:ab & cat'::tsquery;
    tsquery
------------------
 'fat':AB & 'cat'

Besides , One tsquery The morphemes in can be marked as * To specify prefix matching ：

SELECT 'super:*'::tsquery;
  tsquery
-----------
 'super':*

This query will match a tsvector China and Israel “super” Any word at the beginning .

The quotation mark rule of word position is the same as that described before tsvector The word position in is the same ; also , just as tsvector, Any request for word normalization must be converted to tsquery Type before .to_tsquery Functions can easily perform this normalization ：

SELECT to_tsquery('Fat:ab & Cats');
    to_tsquery
------------------
 'fat':AB & 'cat'

Be careful to_tsquery Prefixes will be handled in the same way as other words , This also means that the following comparison will return true ：

SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'kingbase:*' );
 ?column?
----------
 t

because kingbase Will be treated as postgr：

SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'kingbase:*' );
  to_tsvector  | to_tsquery
---------------+------------
 'postgradu':1 | 'postgr':*

This will match postgraduate Processed form .

3.1.1.11. UUID type

data type uuid Storage by RFC 4122、ISO/IEC 9834-8:2005 And the universal unique identifier defined by relevant standards （UUID）（ Some systems refer to this data type as a globally unique identifier GUID）. This identifier is a 128 The amount of bits , It is produced by a carefully selected algorithm , This algorithm can ensure that any other person using the same algorithm in the known space can produce the same identifier, which is very, very unlikely . therefore , For distributed systems , These identifiers provide a good guarantee of uniqueness compared with sequence generators , Sequence generators can only be guaranteed to be unique in one database .

One UUID Written as a sequence of lowercase hexadecimal digits , The sequence is divided into groups by hyphens ： The first is a 8 Bit group , Next are three 4 Bit group , And finally a 12 Bit group . Total 32 position （ Hexadecimal digits ） According to the 128 Binary bits . A standard form UUID Be similar to ：

a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11

KingbaseES Also accept another form of input ： Use upper case bits 、 The standard format is surrounded by curly braces 、 Ignore some or all hyphens 、 In any 4 Add a hyphen after the bit group . for example ：

A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11
a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}

Output is always in standard form .

KingbaseES by UUID Provides storage and comparison functions , But the core database does not contain anything for generating UUID Function of , Because no single algorithm can be well adapted to every application . Besides ,UUID It can be generated by the client application , Or generated by other libraries called through server-side functions .

原网站

版权声明
本文为[Thousands of sails passed by the side of the sunken boat_]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/199/202207160304394731.html

当前位置：网站首页>Jincang database kingbasees SQL language reference manual (3.1.1.10. text search type, 3.1.1.11. UUID type)

Jincang database kingbasees SQL language reference manual (3.1.1.10. text search type, 3.1.1.11. UUID type)

边栏推荐

猜你喜欢

随机推荐