16.17.2. RDF Index Scheme

Starting with version 6.00.3126, the default RDF index scheme consists of two full indices over RDF quads plus three partial indices. This scheme suits most workloads, regardless of whether queries generally specify a graph. It is almost always applicable as is, whether the RDF database holds a very large number of small graphs or just one or a few large graphs. With Virtuoso 7, the indices are column-wise by default, so they usually consume about 1/3 of the space that the equivalent row-wise structures would.

Alternate indexing schemes are possible but are rarely needed. For upgrading old databases that use a different index scheme, see the corresponding documentation.

The index scheme consists of the following indices:

PSOG - primary key.

POGS - bitmap index for lookups on object value.

SP - partial index for cases where only S is specified.

OP - partial index for cases where only O is specified.

GS - partial index for cases where only G is specified.

This index scheme is created by the following statements:

CREATE TABLE DB.DBA.RDF_QUAD (
  G IRI_ID_8,
  S IRI_ID_8,
  P IRI_ID_8,
  O ANY,
  PRIMARY KEY (P, S, O, G)
  );
ALTER INDEX RDF_QUAD ON DB.DBA.RDF_QUAD
  PARTITION (S INT (0hexffff00));

CREATE DISTINCT NO PRIMARY KEY REF BITMAP INDEX RDF_QUAD_SP
  ON RDF_QUAD (S, P)
  PARTITION (S INT (0hexffff00));

CREATE BITMAP INDEX RDF_QUAD_POGS
  ON RDF_QUAD (P, O, G, S)
  PARTITION (O VARCHAR (-1, 0hexffff));

CREATE DISTINCT NO PRIMARY KEY REF BITMAP INDEX RDF_QUAD_GS
  ON RDF_QUAD (G, S)
  PARTITION (S INT (0hexffff00));

CREATE DISTINCT NO PRIMARY KEY REF INDEX RDF_QUAD_OP
  ON RDF_QUAD (O, P)
  PARTITION (O VARCHAR (-1, 0hexffff));

The idea is to favor queries where the predicate is specified in triple patterns. The entire quad can be accessed efficiently when P and at least one of S and O are known. Clustering the data by predicate also improves the working set: a page read from disk holds only entries for the same predicate, so the chances of accessing other entries on that page are higher than if the page held values for arbitrary predicates.

For the less frequent cases where only S is known, as in DESCRIBE, the distinct Ps of the S are found in the SP index. These SP pairs are then used to access the PSOG index to get the O and G. For cases where only the G is known, as when dropping a graph, the distinct Ss of the G are found in the GS index. The Ps of each S are then found in the SP index. After this, the whole quad is found in the PSOG index.
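
As a rough illustration of the S-only case, the following query reads RDF_QUAD directly for a single subject. It is only a sketch, not the statement that DESCRIBE generates: the IRI http://example.org/alice is hypothetical, and the access path described above (SP first, then PSOG) is chosen by the optimizer rather than by anything in the query text.

-- Hypothetical subject IRI; substitute one that exists in your data.
-- The second argument 0 asks iri_to_id not to allocate a new ID
-- when the IRI is unknown, so the lookup has no side effects.
-- Only S is fixed here, so the distinct P values can come from SP
-- and the matching O and G values from PSOG.
SELECT P, O, G
  FROM DB.DBA.RDF_QUAD
  WHERE S = iri_to_id ('http://example.org/alice', 0);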

The SP, OP, and GS indices do not store duplicates. If an S has many values for a given P, the SP index holds only one entry for that pair. Entries are not deleted from SP, OP, or GS. This does not lead to erroneous results, since a full index (that is, either POGS or PSOG) is always consulted to determine whether a quad actually exists. When data is updated, most often a graph is dropped entirely and a substantially similar graph is inserted in its place, so the SP, OP, and GS indices stay relatively unaffected.

Still, over time, especially if there are frequent updates and values do not repeat between consecutive states, the SP, OP, and GS indices accumulate stale entries, which may affect performance. Dropping and recreating the affected index remedies this.
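
A minimal sketch of such a rebuild for the SP index is shown below. The CREATE statement is the same one used in the default scheme; the DROP INDEX form, and the assumption that nothing else is writing to RDF_QUAD while the index is rebuilt, should be checked against your installation before use.

-- Assumption: run during a maintenance window with no concurrent
-- updates to RDF_QUAD, since the index is absent between statements.
DROP INDEX RDF_QUAD_SP;

-- Recreate the partial index exactly as in the default scheme.
CREATE DISTINCT NO PRIMARY KEY REF BITMAP INDEX RDF_QUAD_SP
  ON RDF_QUAD (S, P)
  PARTITION (S INT (0hexffff00));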

In cases where this is not practical, the index scheme should only have full indices; i.e., each key holds all columns of the primary key of the quad. This will be the case if the DISTINCT NO PRIMARY KEY REF options are not specified in the CREATE INDEX statement. In such cases, all indices remain in strict sync across deletes.
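
For instance, the OP index from the default scheme could be declared as a full index by omitting those options, as sketched below. This assumes the existing partial RDF_QUAD_OP index has been dropped first. The same change would apply to SP and GS, though whether BITMAP can be kept for those once the remaining primary key columns are appended is not covered here and should be verified.

-- Full-index variant of OP: same key columns and partitioning as in
-- the default scheme, but without DISTINCT NO PRIMARY KEY REF, so each
-- entry carries all primary key columns and stays in sync on delete.
CREATE INDEX RDF_QUAD_OP
  ON RDF_QUAD (O, P)
  PARTITION (O VARCHAR (-1, 0hexffff));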

Many RDF workloads have bulk-load and read-intensive access patterns with few deletes. The default index scheme is optimized for these. In such situations, this scheme offers significant space savings, resulting in a better working set. Typically, this layout takes 60-70% of the space of a layout with 4 full indices.
