16.17.3.Index Scheme Selection
The indexes in place on the RDF_QUAD
table
can greatly affect the performance of SPARQL queries,
as can be determined by running the STATISTICS
command on the table as follows:
SQL> STATISTICS DB.DBA.RDF_QUAD; Showing SQLStatistics of table(s) 'DB.DBA.RDF_QUAD' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME NON_UNIQUE INDEX_QUALIFIER INDEX_NAME TYPE SEQ_IN_INDEX COLUMN_NAME COLLATION CARDINALITY PAGES FILTER_CONDITION VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR SMALLINT SMALLINT VARCHAR VARCHAR INTEGER INTEGER VARCHAR _______________________________________________________________________________ DB DBA RDF_QUAD NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL DB DBA RDF_QUAD 0 DB RDF_QUAD 3 1 P NULL NULL NULL NULL DB DBA RDF_QUAD 0 DB RDF_QUAD 3 2 S NULL NULL NULL NULL DB DBA RDF_QUAD 0 DB RDF_QUAD 3 3 O NULL NULL NULL NULL DB DBA RDF_QUAD 0 DB RDF_QUAD 3 4 G NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_GS 3 1 G NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_GS 3 2 S NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_OP 3 1 O NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_OP 3 2 P NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 1 P NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 2 O NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 3 G NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_POGS 3 4 S NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_SP 3 1 S NULL NULL NULL NULL DB DBA RDF_QUAD 1 DB RDF_QUAD_SP 3 2 P NULL NULL NULL NULL 15 Rows. -- 24 msec. SQL>
With only one index (OGPS
) created by
default, if the graph is always given, as with one or more
FROM
or FROM
NAMED
clauses, and there are no patterns where only graph
and predicate are given, then the default indices should be
sufficient. If predicate and graph are given but subject is not,
then it is sometimes useful to add:
CREATE BITMAP INDEX RDF_QUAD_PGOS ON DB.DBA.RDF_QUAD (G, P, O, S) PARTITION (O VARCHAR (-1, 0hexffff));
Note: If the server version is pre-5.0.7, leave out the partitioning clause.
Making the PGOS
index can help in some
cases even if it is not readily apparent from the queries that one
is needed. This is so, for example, if the predicate by itself is
selective; i.e., there is a predicate that occurs in only a few
triples.
There is one known application scenario that requires a small alteration to the default index scheme. If the application has a large number of small graphs, e.g. millions of graphs of tens or hundreds of triples each, and it commonly happens that large numbers of graphs contain exactly the same triple, for example the same triple is found in 100000 or one million graphs, then some operations will become inefficient with the default index scheme. In specific, dropping a graph may have to scan through large amounts of data in order to find the right quad to delete from the set of quads that differ only in the graph.
This will affect a graph replace and a graph drop or generally
any deletion that falls on a quad of the described sort. If this is
the situation in the application, then dropping the RDF_QUAD_GS
distinct projection and replacing it with
a covering index that starts with G
is
appropriate:
Drop index RDF_QUAD_GS; Create column index RDF_QUAD_GPSO on RDF_QUAD (G, P, S, O) partition (S int (0hexffff00);
The partition clause only affects cluster settings and is
ignored in the single server case. Partitioning on S
is usually better than on O
since the distribution of S
is generally less skewed than that of O
. That is, there usually are some very common
O
values (e.g. class "thing"). This will
increase space consumption by maybe 25% compared to the default
scheme.