15.4.8. XML Free Text Indexing Rules

XML documents are inserted into the free text index as follows:

The process works on the parsed XML tree; therefore character and local entity references are expanded.

Whole words of text content, bounded by delimiters used for free text, are each assigned an ordinal number. Noise words defined in the noise.txt file used by free text indexing are not counted.

Attribute names and values are not indexed.

Element start and end tags are indexed under their expanded names, that is, the local name prefixed with the namespace URI and a ':'.

An element start tag's ordinal number is one less than the ordinal number of the first whole word in the text value.

A close tag's ordinal number is one greater than that of the last word in the text value.

It follows from these rules that the document:

<html>
  <body>
   <title>Title of Document</title>
   <p>Some <b>bold</b> text </p>
  </body>
</html>

will be indexed as follows:

<html>		0
<body>		0
<title>		0
Title		1
of		- no number, noise word
Document		2
</title>		3
<p>		3
Some		4
<b>		4
bold		5
</b>		6
text		6
</p>		6
</body>		6
</html>		6
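The numbering above can be approximated with a short Python sketch. It is an illustration under stated assumptions, not the indexer's implementation: NOISE_WORDS stands in for noise.txt, WORD_RE is a simplified delimiter rule, and all function names are hypothetical. One extra rule is inferred from the listing rather than stated in the text: when an opening tag directly follows a closing tag, the closing tag's ordinal is used up, which is why "Some" is 4 and not 3. Applied literally, the close tag rule gives 7 for </p>, </body> and </html>, where the listing shows 6; the sketch does not try to settle which value the indexer stores.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a stand-in for noise.txt, whose real contents are not shown here.
NOISE_WORDS = {"of"}
# Assumption: a simplified delimiter rule; the indexer's own rule may differ.
WORD_RE = re.compile(r"\w+")

SAMPLE = """<html>
  <body>
   <title>Title of Document</title>
   <p>Some <b>bold</b> text </p>
  </body>
</html>"""


def expanded_name(tag):
    """Turn ElementTree's '{uri}local' form into 'uri:local'."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri + ":" + local
    return tag


def index_xml(xml_text):
    """Return (token, ordinal) pairs; noise words get None."""
    entries = []
    # pos: highest ordinal used so far; after_close: last tag event was a close.
    state = {"pos": 0, "after_close": False}

    def add_words(text):
        for word in WORD_RE.findall(text or ""):
            if word.lower() in NOISE_WORDS:
                entries.append((word, None))  # noise words are not counted
                continue
            state["pos"] += 1
            state["after_close"] = False
            entries.append((word, state["pos"]))

    def visit(elem):
        name = expanded_name(elem.tag)
        if state["after_close"]:
            # Inferred from the listing: a close tag followed by an open tag
            # uses up one ordinal, so the next word skips it.
            state["pos"] += 1
            state["after_close"] = False
        # Start tag: one less than the next counted word.
        entries.append(("<%s>" % name, state["pos"]))
        add_words(elem.text)
        for child in elem:
            visit(child)
            add_words(child.tail)
        # End tag: one greater than the last counted word.
        entries.append(("</%s>" % name, state["pos"] + 1))
        state["after_close"] = True

    visit(ET.fromstring(xml_text))
    return entries


for token, ordinal in index_xml(SAMPLE):
    print(token, ordinal)
```

Because ElementTree parses the document before the sketch sees it, character references are already expanded, which mirrors the first rule. A namespaced tag such as {http://schema.org/}name would be reported by expanded_name as http://schema.org/:name.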

As a result, the phrase "some bold text" is the string value of the <p> element and matches the free text expression "some bold text" even though it contains mark-up. Conversely, the phrase "Document some bold" does not match. Two words are not considered adjacent if a mix of opening and closing tags lies between them. They are considered adjacent only if everything between them is one or more opening tags, or one or more closing tags. This restriction can be circumvented by using the NEAR connective instead of the phrase construct.
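The adjacency test can be illustrated with the word ordinals from the listing. This is a minimal sketch, not the engine's matcher: the function names are hypothetical, each word is assumed to occur only once, and max_distance is an invented parameter, since the distance that NEAR actually allows is not stated here.

```python
# Word ordinals copied from the listing; "of" is a noise word and has none.
# Assumption: each word occurs once, so a plain dict is enough for the sketch.
ORDINALS = {"title": 1, "document": 2, "some": 4, "bold": 5, "text": 6}


def phrase_matches(phrase):
    """True if the words of the phrase carry consecutive ordinals."""
    ords = [ORDINALS.get(word.lower()) for word in phrase.split()]
    if not ords or None in ords:
        return False
    return all(b - a == 1 for a, b in zip(ords, ords[1:]))


def near_matches(first, second, max_distance=10):
    """True if two words lie within max_distance ordinals of each other.

    max_distance is hypothetical; the real NEAR distance is not given here.
    """
    a = ORDINALS.get(first.lower())
    b = ORDINALS.get(second.lower())
    if a is None or b is None:
        return False
    return abs(a - b) <= max_distance


print(phrase_matches("some bold text"))      # ordinals 4, 5, 6
print(phrase_matches("Document some bold"))  # ordinals 2, 4, 5
print(near_matches("Document", "some"))      # ordinals 2 and 4
```

The first call reports a match and the second does not, because ordinal 3 is taken by the </title> and <p> tags. The third call shows how a proximity test tolerates that gap.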

A free text condition will only be true of an element if all the words needed to satisfy the condition are part of the element's string value. This string value includes text children of descendants.
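To make the string value rule concrete, the sketch below lists the elements whose string value contains every word of a condition. It assumes a condition that is a plain conjunction of words and a simplified word pattern; string_value_words and elements_satisfying are hypothetical helper names, not part of any Virtuoso API.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<html>
  <body>
   <title>Title of Document</title>
   <p>Some <b>bold</b> text </p>
  </body>
</html>"""


def string_value_words(elem):
    """Lower-cased words of the element's string value.

    itertext() walks the element's own text and that of all its descendants.
    """
    text = "".join(elem.itertext())
    return {word.lower() for word in re.findall(r"\w+", text)}


def elements_satisfying(root, words):
    """Tags of the elements whose string value contains every given word."""
    wanted = {word.lower() for word in words}
    return [e.tag for e in root.iter() if wanted <= string_value_words(e)]


root = ET.fromstring(SAMPLE)
print(elements_satisfying(root, ["some", "text"]))
print(elements_satisfying(root, ["document", "bold"]))
```

In the first call <b> is excluded because its string value is only "bold", while <p>, <body> and <html> qualify through their descendants' text. In the second call only <body> and <html> contain both words.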

Prefix	IRI
schema	http://schema.org/
n5	http://creativecommons.org/licenses/by/4.0/
n3	http://docs.openlinksw.com/virtuoso/xmlfreetextrules/
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#
n4	http://www.openlinksw.com/#
xsdh	http://www.w3.org/2001/XMLSchema#

Subject	Predicate	Object
n3:	rdf:type	schema:APIReference
n3:	rdf:type	schema:TechArticle
n3:	schema:name	15.4.8. XML Free Text Indexing Rules
n3:	schema:copyrightHolder	_:vb82702
n3:	schema:datePublished	2016-09-09 16:16:54
n3:	schema:headline	15.4.8. XML Free Text Indexing Rules
n3:	schema:keywords	OpenLink,Virtuoso,database,RDBMS,relational,SQL,RDF,triple store,linked data,linked open data,Big Data
n3:	schema:license	n5:deed.en_US
n3:	schema:publisher	_:vb82701
n3:	schema:url	n3:
_:vb82701	rdf:type	schema:Organization
_:vb82701	schema:name	OpenLink Software
_:vb82701	schema:url	n4:this
_:vb82702	rdf:type	schema:Organization
_:vb82702	schema:name	OpenLink Software
_:vb82702	schema:url	n4:this
