20.2. Creating Free Text Indexes

20.2.1. The CREATE TEXT INDEX statement

Define and optionally initialize a text index on a column.

create_freetext_index
	: CREATE TEXT [XML] INDEX ON q_table_name '(' column ')'
	[WITH KEY column]
	[NOT INSERT]
	[CLUSTERED WITH '(' column_commalist ')' ]
	[USING FUNCTION]
	[LANGUAGE STRING]
  [ENCODING STRING]
	;

XML - The XML keyword specifies that the data is to be indexed as XML, hence element names and attributes will be processed separately for use with the XCONTAINS predicate.

The q_table_name is a qualified table name on which the index is created. The generated procedures and auxiliary tables will be with the owner and qualifier of this table.

The column must be a column of the above table, of the VARCHAR, NVARCHAR, LONG VARCHAR, LONG NVARCHAR, LONG XML or XMLTYPE data type. The column may additionally have the IDENTIFIED BY option if the content is XML. This will be used to provide a base URI for traversing relative references. The XML option for the index has to be specified if the content is LONG XML or XMLTYPE.

WITH KEY column - This allows optionally specifying a uniquely identifying column which will be used as a foreign key for referencing the table from the text index. If this is not specified and there is a single part integer primary key, this primary key is used as the key. If there is no suitable primary key and the option is not present, an integer column is added and a sequence object is used to supply distinct values.

When specified, the column must be non-NULL and its run time value must either be an integer or a composite (See composite data type). The length of the values of this column is crucially important since it is repeated for each distinct word of each row. This column is called the free text document id column in the rest of this documentation.

NOT INSERT - If present, specifies that the index is not filled when created. The VT_INDEX_<table> function is still created but not run. You can run this manually or scheduled at an appropriate time. If batching is enabled then the index will be filled up at that time.

CLUSTERED WITH - The column list must consist of columns of the table. Their values are stored in the text index so that the values can be more efficiently located for filtering than if they had to be retrieved from the table itself. The combined length of the columns should be relatively small, not much over 200 bytes for this to be effective. There is no hard upper limit but long blobs are not advisable.

USING FUNCTION - This allows specifying a hook function for indexing an unindexing a document. The index hook is called before processing the words of the column to be indexed. This can be used to index extra text in addition to the column value or to modify the text before indexing. If the hook function returns 1 the column is not additionally processed. If hook returns 0 the column is processed normally, in addition to the words the hook may have inserted. See the section on hook functions. The hook functions are always named <table>_<column>_INDEX_HOOK and <table>_<column>_UNINDEX_HOOK, in the owner and qualifier of the table, where <table> is the unqualified name of the table.

LANGUAGE - The string literal following this keyword may be a language name. See Internationalization for details.

ENCODING - The string literal following this keyword may be an encoding name.

Creating the text index may take a long time. All this time is in 'atomic' mode, so no other database activity is allowed during it and no log is generated. If there is an error, e.g. out of disk the created index is dropped and the error is reported. A checkpoint should be made after the index is complete. If the index should be created under a different qualifier or the generated tables altered after the index is created the NOT INSERT option should be used to delay filling the index.

The CREATE TEXT INDEX statement will automatically make a unique index based on the free text document id, if this is not the PK of the table. If one wishes to modify this index, it can be found and dropped with DROP INDEX and reconstructed, but make sure that the reference in SYS_VT_INDEX matches and that the new index has the same name as the previous version.

A freshly created text index is in synchronous mode. This means that that changes to the table are immediately reflected on the index as they occur. This is done through a set of automatically generated triggers. If large changes to data will be performed, the batch mode is far more efficient.

	See Also:
	Although it is recommended to use the methods described above, the vt_create_text_index function can also be used

Prefix	IRI
n3	http://docs.openlinksw.com/virtuoso/creatingtxtidxs/
schema	http://schema.org/
n5	http://creativecommons.org/licenses/by/4.0/
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#
n4	http://www.openlinksw.com/#
xsdh	http://www.w3.org/2001/XMLSchema#

Prefix	URI
xmlns:n3	http://docs.openlinksw.com/virtuoso/creatingtxtidxs/
xmlns:schema	http://schema.org/
xmlns:n5	http://creativecommons.org/licenses/by/4.0/
xmlns:rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#
xmlns:n4	http://www.openlinksw.com/#
xmlns:xsdh	http://www.w3.org/2001/XMLSchema#

Prefix

URI

xmlns:n3

http://docs.openlinksw.com/virtuoso/creatingtxtidxs/

xmlns:schema

http://schema.org/

xmlns:n5

http://creativecommons.org/licenses/by/4.0/

xmlns:rdf

http://www.w3.org/1999/02/22-rdf-syntax-ns#

xmlns:n4

http://www.openlinksw.com/#

xmlns:xsdh

http://www.w3.org/2001/XMLSchema#

Subject	Predicate	Object
n3:	rdf:type	schema:TechArticle
n3:	schema:name	20.2.ÃÂ Creating Free Text Indexes
n3:	schema:copyrightHolder	_:vb78822
n3:	schema:datePublished	2016-09-09 16:16:54
n3:	schema:headline	20.2.ÃÂ Creating Free Text Indexes
n3:	schema:keywords	OpenLink,Virtuoso,database,RDBMS,relational,SQL,RDF,triple store,linked data,linked open data,Big Data
n3:	schema:license	n5:deed.en_US
n3:	schema:publisher	_:vb78821
n3:	schema:url	n3:
_:vb78821	rdf:type	schema:Organization
_:vb78821	schema:name	OpenLink Software
_:vb78821	schema:url	n4:this
_:vb78822	rdf:type	schema:Organization
_:vb78822	schema:name	OpenLink Software
_:vb78822	schema:url	n4:this

Subject

Predicate

Object

n3:

rdf:type

schema:TechArticle

n3:

schema:name

20.2.ÃÂ Creating Free Text Indexes

n3:

schema:copyrightHolder

_:vb78822

n3:

schema:datePublished

2016-09-09 16:16:54

n3:

schema:headline

20.2.ÃÂ Creating Free Text Indexes

n3:

schema:keywords

OpenLink,Virtuoso,database,RDBMS,relational,SQL,RDF,triple store,linked data,linked open data,Big Data

n3:

schema:license

n5:deed.en_US

n3:

schema:publisher

_:vb78821

n3:

schema:url

n3:

_:vb78821

rdf:type

schema:Organization

_:vb78821

schema:name

OpenLink Software

_:vb78821

schema:url

n4:this

_:vb78822

rdf:type

schema:Organization

_:vb78822

schema:name

OpenLink Software

_:vb78822

schema:url

n4:this

Prev	Up	Next
Chapter 20. Free Text Search	Home	20.2.2. Choosing An Application Specific Document ID

20.2. Creating Free Text Indexes

20.2.1. The CREATE TEXT INDEX statement

Namespace Prefixes

Statements

Namespace Prefixes

Statements