11.23. Vectored Execution and Query Parallelization
Note: This feature only applies to Virtuoso 7.0 and later.
Vectored execution means executing queries or stored procedures simultaneously on multiple sets of parameters. Further, when a query contains a JOIN , a single invocation of the query will, with vectored execution, execute every consecutive JOIN step with multiple inputs. When every stage of a query's evaluation is performed on a large number of intermediate result rows at a time, two benefits are obtained:
-
The interpretation overhead disappears and
-
Locality of reference in
JOINs
can be better exploited.
For example, with a JOIN like:
SELECT COUNT(*) FROM part, lineitem WHERE l_partkey = p_partkey AND p_size < 23 OPTION ( LOOP, ORDER );
the outermost LOOP of the query will look for parts with p_size < 23 . The part keys of these are used as lookup keys for an index on l_partkey in lineitem . This index translates these values into values of the primary key, l_orderkey , l_linenumber , which is then used to get the data row with the price and discount. When each of these steps is done with tens of thousands of values at the same time, the SQL interpretation overhead is almost completely eliminated and locality can be exploited when accessing nearby rows. The chance of hitting nearby rows also increases when the size of the intermediate result batch increases.
11.23.1. Automatic Query Parallelization
If a query does not modify data, executes in READ COMMITTED isolation, and contains some form of aggregation or ORDER BY , it can be automatically parallelized. Parallelization typically splits the query's outermost LOOP into approximately equal size chunks, which are independently evaluated each on its own thread. The results are merged together when all are ready, and are combined in an aggregation or ORDER BY . This is entirely transparent to the user.