Name
xper_doc — returns an entity object ('XPER entity') created from an XML document
Synopsis
xper_doc
(
|
in document varchar , |
in parser_mode integer , | |
in base_uri varchar , | |
in content_encoding varchar , | |
in content_language varchar , | |
in dtd_validator_config varchar , | |
in
index_attrs
integer
) ; |
Description
This parses the argument, which is expected to be a well formed XML fragment and returns a parse tree as a special object with underlying disk structure, named "persistent XML" or "XPER" While the result of xml_tree is a memory-resident array of vectors, the XPER object consumes only a little amount of memory, and almost all data are disk-resident. XPERs are better then "XML trees" for large documents and for "write once -- read many" stores such as a table with one XML document per row used as a "library" of documents. To be saved in a LONG VARCHAR column, "XML tree" entity will be converted back to plain text of XML syntax; but "XPER" entity will be saved as a ready-to-use disk structure.
Parameters
document
well formed XML or HTML document
parser_mode
0, 1 or 2; 0 - XML parser mode, 1 - HTML parser mode, 2 - 'dirty HTML' mode (with quiet recovery after any syntax error)
base_uri
in HTML parser mode change all absolute references to relative from given base_uri (http://<host>:<port>/<path>)
content_encoding
string with content encoding type of <document>; valid are 'ASCII', 'ISO', 'UTF8', 'ISO8859-1', 'LATIN-1' etc., defaults are 'UTF-8' for XML mode and 'LATIN-1' for HTML mode.
content_language
string with language tag of content of <document>; valid names are listed in IETF RFC 1766, default is 'x-any' (it means 'mix of words from various human languages)
dtd_validator_config
configuration string for DTD validator, default is empty string meaning that DTD validator should be fully disabled. See Configuration Options of the DTD Validator for details.
index_attrs
1 or 0, indicating if additional free-text indexing information must be stored for all attributes of the document. It is 1 by default. If set to '0', it will produce a disk structure compatible with old versions of Virtuoso and will give a small benefit in disk usage but it will disable some important optimizations in free-text search operations.
Return Types
XML entity with underlying parse tree of source document; the tree will be a special sort of BLOB.
Examples
Example 24.530. Xper_Doc
declare tree any; tree := xper_doc (file_to_string ('doc.html'), 1, 'http://localhost.localdomain/', 'ISO'); ... tree := xper_doc (file_to_string ('doc.xml')); ... -- String cannot be longer than 10 megabytes. String session can. tree := xper_doc (file_to_string_output ('huge_doc.xml')); ... -- A special way to read local files. -- Strings started from characters 'file://' -- are treated as local filesystem URIs. tree := xper_doc ('file://doc.xml');