9.28.Virtuoso SQL Optimization
Virtuoso provides a cost based SQL optimizer which performs the following types of query transformation:
Join Order |
Loop Invariants |
Opening derived tables |
Migrating enclosing predicates into derived tables or unions |
Dropping unreferenced columns or results |
Detection of identically false predicates |
Index selection |
Grouping of co-located remote tables into single remote statements |
Selection of join algorithm |
Virtuoso evaluates various permutations of joined tables against
its cost model and determines the best fit, from which it generates
a query graph. This query graph can be returned as a result set by
the explain()
SQL function.
The cost model is based on table row counts, defined indices and
uniqueness constraints, and column cardinalities, i.e. counts of
distinct values in columns. Additionally, histograms can be made
for value distribution of individual columns.
Virtuoso automatically maintains statistics about tables in the local database. When tables are attached from known types of remote DBMS's, Virtuoso also attempts to retrieve statistics information if available. The sys_stat_analyze or sys_db_stat stored procedures can be used to force an update of statistics, also recompiling all SQL statements or procedures depending on these statistics. Once this is done, this overrides the automatic statistics. The values of automatic statistics can be seen in the SYS_COL_AUTO_STAT table.
The stored procedure:
DB.DBA.SYS_STAT_ANALYZE (in
full_table_name varchar, in prec integer, )
constructs the basic table statistics and feeds it into the DB.DBA.SYS_COL_STAT system table. The DB.DBA.SYS_DB_STAT stored procedure performs this operation on the entire database.
The stored procedure:
constructs table column histogram and feeds it into the DB.DBA.SYS_COL_HIST system table. The default value of prec, in both cases, is 5, which implies that a five percent sample of the table will be used. A percentage of 0 means that the whole table will be read.
Example9.43.Demonstration of the STAT_ANALYSE & STAT_HISTOGRAM Procedures
The following script is intended for use with the ISQL program as the user dba, in the DB qualifier. The foreach statement is a special feature of the ISQL utility.
create table "DB"."DBA"."DTTEST" ("ID" integer primary key); foreach integer between 1 10 insert into "DB"."DBA"."DTTEST" ("ID") values (?); sys_stat_analyze ('DB.DBA.DTTEST'); select * from DB.DBA.SYS_COL_STAT; sys_stat_histogram ('DB.DBA.DTTEST', 'ID', 2); select * from DB.DBA.SYS_COL_HIST;
That yields:
Resultset 1 (from DB.DBA.SYS_COL_STAT) ---------------------------------------- CS_TABLE CS_COL CS_N_DISTINCT CS_MIN CS_MAX CS_N_VALUES CS_N_ROWS VARCHAR NOT NULL VARCHAR NOT NULL INTEGER VARCHAR VARCHAR INTEGER INTEGER _______________________________________________________________________________ DB.DBA.DTTEST ID 10 1 10 10 10 Resultset 2 (from DB.DBA.SYS_COL_HIST) --------------------------------------- CH_TABLE CH_COL CH_NTH_SAMPLE CH_VALUE VARCHAR NOT NULL VARCHAR NOT NULL INTEGER NOT NULL VARCHAR _______________________________________________________________________________ DB.DBA.DTTEST id 0 1 DB.DBA.DTTEST id 5 6 DB.DBA.DTTEST id 10 0