9.28.Virtuoso SQL Optimization

Virtuoso provides a cost based SQL optimizer which performs the following types of query transformation:

Join Order
Loop Invariants
Opening derived tables
Migrating enclosing predicates into derived tables or unions
Dropping unreferenced columns or results
Detection of identically false predicates
Index selection
Grouping of co-located remote tables into single remote statements
Selection of join algorithm

Virtuoso evaluates various permutations of joined tables against its cost model and determines the best fit, from which it generates a query graph. This query graph can be returned as a result set by the explain() SQL function. The cost model is based on table row counts, defined indices and uniqueness constraints, and column cardinalities, i.e. counts of distinct values in columns. Additionally, histograms can be made for value distribution of individual columns.

Virtuoso automatically maintains statistics about tables in the local database. When tables are attached from known types of remote DBMS's, Virtuoso also attempts to retrieve statistics information if available. The sys_stat_analyze or sys_db_stat stored procedures can be used to force an update of statistics, also recompiling all SQL statements or procedures depending on these statistics. Once this is done, this overrides the automatic statistics. The values of automatic statistics can be seen in the SYS_COL_AUTO_STAT table.

The stored procedure:

DB.DBA.SYS_STAT_ANALYZE (in full_table_name varchar, in prec integer, )

constructs the basic table statistics and feeds it into the DB.DBA.SYS_COL_STAT system table. The DB.DBA.SYS_DB_STAT stored procedure performs this operation on the entire database.

The stored procedure:

DB.DBA.SYS_STAT_HISTOGRAM (in full_table_name varchar, in full_column_name varchar, in n_buckets integer, in prec integer, )

constructs table column histogram and feeds it into the DB.DBA.SYS_COL_HIST system table. The default value of prec, in both cases, is 5, which implies that a five percent sample of the table will be used. A percentage of 0 means that the whole table will be read.

Example9.43.Demonstration of the STAT_ANALYSE & STAT_HISTOGRAM Procedures

The following script is intended for use with the ISQL program as the user dba, in the DB qualifier. The foreach statement is a special feature of the ISQL utility.

create table "DB"."DBA"."DTTEST" ("ID" integer primary key);
foreach integer between 1 10 insert into "DB"."DBA"."DTTEST" ("ID") values (?);
sys_stat_analyze ('DB.DBA.DTTEST');
select * from DB.DBA.SYS_COL_STAT;
sys_stat_histogram ('DB.DBA.DTTEST', 'ID', 2);
select * from DB.DBA.SYS_COL_HIST;

That yields:

Resultset 1 (from DB.DBA.SYS_COL_STAT)
----------------------------------------
CS_TABLE         CS_COL           CS_N_DISTINCT CS_MIN  CS_MAX  CS_N_VALUES CS_N_ROWS
VARCHAR NOT NULL VARCHAR NOT NULL INTEGER       VARCHAR VARCHAR INTEGER     INTEGER
_______________________________________________________________________________

DB.DBA.DTTEST    ID               10            1       10      10          10

Resultset 2 (from DB.DBA.SYS_COL_HIST)
---------------------------------------
CH_TABLE          CH_COL            CH_NTH_SAMPLE     CH_VALUE
VARCHAR NOT NULL  VARCHAR NOT NULL  INTEGER NOT NULL  VARCHAR
_______________________________________________________________________________

DB.DBA.DTTEST     id                0                 1
DB.DBA.DTTEST     id                5                 6
DB.DBA.DTTEST     id                10                0