18.5.10. Adding New Languages And Encodings Into Virtuoso

There are too many languages for Virtuoso to support them all by default, so it is user-extensible in this respect. The built-in 'x-any' language supports most languages to a degree, but it is not the optimal solution for some specific languages, or when you want to normalize words to make text search more effective. To make Virtuoso extensible, language-specific functions are organized into language handlers, and handlers are organized into hierarchical trees. Every handler contains pointers to functions such as "count words in a given string" or "call a given callback once for every word in the string".
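The idea of a handler as a bundle of function pointers can be sketched in C roughly as follows. This is only an illustration: the type sketch_handler_t, its field names, and the simplistic ASCII word rule are assumptions made for this example and are not the structure declared in the Virtuoso headers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once for every word found. */
typedef void (*word_cb_t)(const char *word, size_t len, void *userdata);

/* Hypothetical handler layout; NOT the real Virtuoso structure. */
typedef struct sketch_handler_s {
  const char *iso639_name;                 /* short language name */
  const char *rfc1766_name;                /* longer language name */
  const struct sketch_handler_s *generic;  /* parent in the handler tree */
  size_t (*count_words)(const char *text, size_t len);
  void (*iterate_words)(const char *text, size_t len,
                        word_cb_t cb, void *userdata);
} sketch_handler_t;

/* Simplistic rule: a word is a maximal run of ASCII alphanumerics. */
static void any_iterate_words(const char *text, size_t len,
                              word_cb_t cb, void *userdata)
{
  size_t i = 0;
  assert(text != NULL && cb != NULL);
  while (i < len) {
    size_t start;
    while (i < len && !isalnum((unsigned char)text[i])) i++;
    start = i;
    while (i < len && isalnum((unsigned char)text[i])) i++;
    if (i > start) cb(text + start, i - start, userdata);
  }
}

/* Callback that increments the counter passed through userdata. */
static void bump_counter(const char *word, size_t len, void *userdata)
{
  (void)word;
  (void)len;
  ++*(size_t *)userdata;
}

/* "Count words" is built on top of "call a callback for every word". */
static size_t any_count_words(const char *text, size_t len)
{
  size_t n = 0;
  any_iterate_words(text, len, bump_counter, &n);
  return n;
}

/* Root of the sketch tree: it has no more generic handler above it. */
static const sketch_handler_t sketch_x_any = {
  "x-any", "x-any", NULL, any_count_words, any_iterate_words
};
```

In this sketch a caller never invokes the tokenizer directly; it goes through the pointers stored in the handler, for example sketch_x_any.count_words (text, len).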

XML documents and SQL procedures may identify languages by name, for example through the value of an xml:lang attribute, the content_language argument of built-in functions, or the __lang option. Every language handler defines up to two names for the language it supports: one matching ISO 639 (e.g. 'en') and one matching RFC 1766 (e.g. 'en-UK'). To find the handler for a given language name, Virtuoso searches an internal hash table. If the name is unknown, the 'x-any' handler is returned as the default.
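The lookup-with-fallback behaviour can be pictured with the following minimal sketch. A small linear table stands in for the internal hash table, and lang_entry_t, registry and find_handler are hypothetical names chosen for this example; the pairing of 'en' with 'en-UK' in one entry is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: the two names a handler may define. */
typedef struct lang_entry_s {
  const char *iso639_name;   /* e.g. "en" */
  const char *rfc1766_name;  /* e.g. "en-UK" */
} lang_entry_t;

/* A linear table stands in for the hash table in this sketch. */
static const lang_entry_t registry[] = {
  { "x-any", "x-any" },  /* index 0: the default handler */
  { "en", "en-UK" }
};

/* Return the entry matching either name, or the 'x-any' default. */
static const lang_entry_t *find_handler(const char *name)
{
  size_t i;
  assert(name != NULL);
  for (i = 0; i < sizeof registry / sizeof registry[0]; i++) {
    if (strcmp(name, registry[i].iso639_name) == 0 ||
        strcmp(name, registry[i].rfc1766_name) == 0)
      return &registry[i];
  }
  return &registry[0];  /* unknown name: fall back to 'x-any' */
}
```

The sketch compares names exactly; whether the server matches language names case-insensitively is not stated here.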

Custom language handlers should contain a pointer to a more generic handler. A handler may have NULLs stored in place of pointers to required functions; these NULLs are replaced automatically with pointers to the generic handler's functions when the custom handler is activated.
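The NULL-filling step might look like the sketch below. The structure, the two sample function slots, and activate_handler are all hypothetical and only illustrate the inheritance rule described above; the real activation code in Virtuoso may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler with two function slots. */
typedef struct sketch_handler_s {
  const char *name;
  struct sketch_handler_s *generic;  /* more generic handler, or NULL */
  size_t (*count_words)(const char *text);
  int (*is_noise_word)(const char *word, size_t len);
} sketch_handler_t;

/* Generic implementation: words are runs of non-space characters. */
static size_t generic_count_words(const char *text)
{
  size_t n = 0;
  int in_word = 0;
  for (; *text != '\0'; text++) {
    int w = (*text != ' ');
    if (w && !in_word) n++;
    in_word = w;
  }
  return n;
}

/* Generic rule: 1-character words are noise. */
static int generic_is_noise_word(const char *word, size_t len)
{
  (void)word;
  return len <= 1;
}

/* Custom rule for the example: words of up to 2 characters are noise. */
static int custom_is_noise_word(const char *word, size_t len)
{
  (void)word;
  return len <= 2;
}

/* Replace every NULL slot with the generic handler's function. */
static void activate_handler(sketch_handler_t *h)
{
  assert(h != NULL);
  if (h->generic == NULL) return;  /* root handler: nothing to inherit */
  activate_handler(h->generic);    /* make sure the parent is complete */
  if (h->count_words == NULL) h->count_words = h->generic->count_words;
  if (h->is_noise_word == NULL) h->is_noise_word = h->generic->is_noise_word;
}
```

A custom handler that only overrides is_noise_word can therefore leave count_words as NULL and still end up with a complete set of functions after activation.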

See Also: lh_get_handler, lh_load_handler

The current version of Virtuoso has two trees of language handlers. The "main" tree starts from the 'x-any' root and contains handlers for languages used in documents; the other tree starts from the 'x-ftq-x-any' root and contains handlers for free-text query ('ftq') languages. The difference lies in the handling of wildcard characters: the query string 'hello, wo*ld' consists of two "words", 'hello' and 'wo*ld', and 'x-ftq-x-any' locates them correctly, but the 'x-any' handler reports three words ('hello', 'wo' and 'ld') because it knows nothing about the special meaning of '*' in query strings. That is why every handler may contain a pointer to the handler of its own query language.
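The difference between the two trees can be reproduced with a toy tokenizer. The function names below are hypothetical, and the ASCII-only word rule is a simplification for this example; the point is only that treating '*' as a word character changes the word count for the same string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A character belongs to a word if it is alphanumeric or listed in extra. */
static int is_word_char(char c, const char *extra)
{
  if (c == '\0') return 0;  /* strchr would otherwise match the terminator */
  return isalnum((unsigned char)c) || strchr(extra, c) != NULL;
}

/* Count maximal runs of word characters in a NUL-terminated string. */
static size_t count_words(const char *text, const char *extra)
{
  size_t n = 0;
  int in_word = 0;
  assert(text != NULL && extra != NULL);
  for (; *text != '\0'; text++) {
    int w = is_word_char(*text, extra);
    if (w && !in_word) n++;
    in_word = w;
  }
  return n;
}

/* Document-style counting: '*' is just a separator. */
static size_t count_document_words(const char *text)
{
  return count_words(text, "");
}

/* Query-style counting: '*' is a wildcard and stays inside the word. */
static size_t count_query_words(const char *text)
{
  return count_words(text, "*");
}
```

For the string 'hello, wo*ld', count_document_words returns 3 and count_query_words returns 2, matching the behaviour described for 'x-any' and 'x-ftq-x-any'.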

In addition to plain language handlers, it is possible to add handlers for an "encoded language". They are useful if you have a large number of documents in one particular encoding and the speed of free-text indexing is critical for your application. Ordinary handlers deal with Unicode data, so text must be decoded before processing; the functions of an "encoded language" handler may accept buffers of encoded text directly, which eliminates the decoding step.
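As a rough illustration of working on encoded bytes without decoding them first, the sketch below counts words directly in a UTF-8 buffer. The function names are hypothetical, UTF-8 is chosen only as an example encoding, and the rule that every non-ASCII byte is part of a word is a deliberate simplification (it would also treat non-ASCII punctuation as word characters).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Bytes >= 0x80 belong to multi-byte UTF-8 sequences; this sketch
   treats all of them as parts of words. */
static int utf8_is_word_byte(unsigned char b)
{
  return b >= 0x80 || isalnum(b);
}

/* Count words in an encoded buffer without building a Unicode copy. */
static size_t utf8_count_words(const char *buf, size_t len)
{
  size_t i, n = 0;
  int in_word = 0;
  assert(buf != NULL);
  for (i = 0; i < len; i++) {
    int w = utf8_is_word_byte((unsigned char)buf[i]);
    if (w && !in_word) n++;
    in_word = w;
  }
  return n;
}
```

The buffer is scanned once, byte by byte, with no intermediate wide-character array, which is the kind of saving an "encoded language" handler aims for.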

See Also: elh_get_handler, elh_load_handler

The OpenLink Virtuoso Server distribution contains the source of a sample language handler for the 'en-UK' language. It differs from the 'x-any' handler in how it handles abbreviations and numbers. The 'en-UK' handler reads the phrase '$3.54 per sq.inch.' as the sequence of words '3.54', 'per' and 'sq.inch', whereas 'x-any' reads it as '54', 'per', 'sq' and 'inch'. The generic 'x-any' handler has no specific rules for the decimal point, because many scripts use a decimal comma instead; it therefore splits '3.54' into the pair of words '3' and '54', and '3' is ignored in many cases as a noise word because it is only one character long.
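The two readings of '$3.54 per sq.inch.' can be imitated with one simplified rule: keep a '.' inside a word only when an alphanumeric character follows it. The sketch below is not the code of the shipped sample handler; split_words, MAX_WORD and the fixed noise-word threshold are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 32  /* longer words are skipped in this sketch */

/* Split text into words; returns the number of words stored in out.
   keep_inner_dots = 0 imitates the generic reading,
   keep_inner_dots = 1 imitates the reading described for 'en-UK'. */
static size_t split_words(const char *text, int keep_inner_dots,
                          char out[][MAX_WORD], size_t max_words)
{
  size_t n = 0, i = 0, len;
  assert(text != NULL && out != NULL);
  len = strlen(text);
  while (i < len && n < max_words) {
    size_t start, wlen;
    /* Skip separators before the next word. */
    while (i < len && !isalnum((unsigned char)text[i])) i++;
    start = i;
    while (i < len) {
      if (isalnum((unsigned char)text[i])) { i++; continue; }
      /* A '.' stays inside the word only if an alphanumeric follows. */
      if (keep_inner_dots && text[i] == '.' && i + 1 < len &&
          isalnum((unsigned char)text[i + 1])) { i++; continue; }
      break;
    }
    wlen = i - start;
    if (wlen >= 2 && wlen < MAX_WORD) {  /* drop 1-character noise words */
      memcpy(out[n], text + start, wlen);
      out[n][wlen] = '\0';
      n++;
    }
  }
  return n;
}
```

With keep_inner_dots set to 0 the phrase yields '54', 'per', 'sq' and 'inch'; with it set to 1 the phrase yields '3.54', 'per' and 'sq.inch'. The trailing '.' is dropped in both cases because nothing alphanumeric follows it.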

In addition to the language extension interface, Virtuoso provides an eh_load_handler function for adding new encodings, but it should be used solely for multi-character encodings that cannot be supported through the usual Virtuoso International Character Support. If an encoding is created with the CHARSET_DEFINE function, Virtuoso can build special lookup tables for very fast translation of text from Unicode to that encoding, so you are unlikely to gain performance by writing your own C code. In addition, some applications will know nothing about an encoding added in C, because they check only the SYS_CHARSETS system table.

Prefix	IRI
schema	http://schema.org/
n5	http://creativecommons.org/licenses/by/4.0/
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#
n4	http://www.openlinksw.com/#
xsdh	http://www.w3.org/2001/XMLSchema#
n2	http://docs.openlinksw.com/virtuoso/langfuncapi/

Subject	Predicate	Object
n2:	rdf:type	schema:APIReference
n2:	rdf:type	schema:TechArticle
n2:	schema:name	18.5.10. Adding New Languages And Encodings Into Virtuoso
n2:	schema:copyrightHolder	_:vb80860
n2:	schema:datePublished	2016-09-09 16:16:54
n2:	schema:headline	18.5.10. Adding New Languages And Encodings Into Virtuoso
n2:	schema:keywords	OpenLink,Virtuoso,database,RDBMS,relational,SQL,RDF,triple store,linked data,linked open data,Big Data
n2:	schema:license	n5:deed.en_US
n2:	schema:publisher	_:vb80859
n2:	schema:url	n2:
_:vb80859	rdf:type	schema:Organization
_:vb80859	schema:name	OpenLink Software
_:vb80859	schema:url	n4:this
_:vb80860	rdf:type	schema:Organization
_:vb80860	schema:name	OpenLink Software
_:vb80860	schema:url	n4:this
