Note: This feature only applies to Virtuoso 7.0 and later.
As of version 7, Virtuoso offers a column-wise compressed storage format alongside its traditional row-wise storage format.
In the column-wise storage model, each column of a table or index is stored contiguously, so that values of a single column on consecutive rows are physically adjacent. In this way, adjacent values are of the same type, and if the index is sorted on said value, the consecutive values often form an ascending sequence. This organization allows the use of more powerful compression techniques than could be used for rows where consecutive values belong to different columns, and thus are of disparate data types with values in different ranges.
Furthermore, when queries only access a subset of columns from one table, only those columns actually being accessed need to be read from disk, thereby making better use of I/O throughput and memory. Unreferenced columns will not take space in the memory based cache of the database. Further, the traffic between CPU cache and main memory is reduced when data is more compact, leading to better CPU utilization.
The column-wise format is substantially more compact and offers substantially greater sequential-access performance, as well as greater random-access performance in situations where many rows are accessed together in a join. For single-row random-access, a row-wise format offers higher performance as long as the data is in memory. In practice, for large tables, the higher compression achieved with column-wise storage allows a larger portion of the data to be kept in memory, leading to less frequent I/O and consequently higher performance.
One should not use column-wise storage in cases where columns are frequently updated, especially if a single row is updated per statement. This will give performance substantially worse than row-wise storage. However, bulk inserts and deletes are efficient with column-wise storage.
Any index or primary key, i.e., any table, can be declared to be stored column-wise. A single table can have multiple indices, of which some are stored column-wise and some are not. As with tables stored row-wise, the table row itself is stored following the primary key index entry on the index tree leaf corresponding to the entry. This arrangement is sometimes called a clustered index .
One can specify column-wise storage as the default for any new tables or indices by adding ColumnStore = 1 to the [Parameters] section of the virtuoso.ini file. Otherwise, tables and indices are created tow-wise unless the column option is specified, as described below.
The statement below declares the table xx to be stored column-wise:
CREATE TABLE xx ( id INT, data VARCHAR, PRIMARY KEY (id) COLUMN );
This statement adds a column-wise stored index to the table:
CREATE COLUMN INDEX xxi ON xx (data);
The COLUMN keyword can come after the column list of the primary key declaration of a table or anywhere between the CREATE and INDEX keywords of a create index statement.
Note that the BITMAP keyword cannot be used together with the COLUMN keyword. Column-wise indices will automatically use bitmap compression when appropriate without this being specified. A column-wise index is likely to be more space-efficient than a row-wise bitmap index with the same key parts.
The directives for column compression in CREATE TABLE (NO COMPRESS, COMPRESS PREFIX) have no effect on column-wise stored tables. Data is compressed in a manner chosen at run time based on the data itself.
All SQL operations work identically for column- or row-wise tables and indices. The locking behavior is also identical, with row-level locking supported on all isolation levels. The behavior of the READ COMMITTED isolation is non-locking, showing the pre-image of updated data when reading pages with uncommitted inserts or updates.
Recovery is by roll forward, and checkpoints will only store committed states of the database, even if started when there are uncommitted transactions pending.
The system table DB.DBA.sys_col_info holds information about space utilization of column-wise indices.
This table is updated only after the DB.DBA.sys_index_space_stats procedure view has been accessed. Thus, one must first make a selection from DB.DBA.sys_index_space_stats .
The columns of sys_col_info have the following meaning:
- The table in question.
- The index in question.
- The ordinal position of the column in question in the key.
- This indicates the type of compression entry the rest of the row concerns. For each column in the key, there is a row with
set to -1, representing the total of the remaining fields.
- The name of the column concerned.
- This is the number of database pages allocated for storing data of this column.
- The count of compression entries for the column. A compression entry is logically an array of consecutive values that share a common compression format. Different parts of the same column may have different compression.
- This is the count of values that are stored with the compression format in question.
- The is the number of bytes actually occupied by the compression entries concerned. Pages may not always by full, thus this metric can be used to measure the page fill ratio, i.e.:
100 * coi_bytes / (coi_n_pages * 8192.0)
To see which columns take the most space, and how full the pages are, as well as the overall effectiveness of compression, one can do:
SELECT coi_column , coi_pages * 8192 AS total_bytes , coi_bytes / (coi_pages * 8192.0) AS page_fill , coi_bytes , 1.0 * coi_bytes / coi_values AS ce_bytes_per_value , 8192.0 * coi_pages / coi_values AS bytes_per_value FROM sys_col_info WHERE coi_type = -1 ORDER BY coi_pages DESC ;
Note that issuing a query like:
SELECT TOP 20 * FROM sys_index_space_stats ORDER BY iss_pages DESC;
will update the sys_col_info table which is initially empty.
The sys_index_space_stats view shows the number of pages used for the sparse row-wise index tree top for column-wise indices.
The number of rows shown there for column-wise indices is the number of entries of the sparse index, not the row-count of the index. The space utilization here will be under 1% of the total for a column-wise index.
Below we look at space utilization of the O column of the primary key of the RDF_QUAD table.
SELECT * FROM sys_col_info WHERE coi_index = 'DB.DBA.RDF_QUAD' AND coi_column = 'O' ; coi_table coi_index coi_nth coi_type coi_column coi_pages coi_ces coi_values coi_bytes VARCHAR NOT NULL VARCHAR NOT NULL INTEGER NOT NULL INTEGER NOT NULL VARCHAR INTEGER INTEGER INTEGER INTEGER _______________________________________________________________________________ DB.DBA.RDF_QUAD DB.DBA.RDF_QUAD 2 -1 O 654663 0 1252064815 4617808494 DB.DBA.RDF_QUAD DB.DBA.RDF_QUAD 2 1 O 0 229074 97104862 947215 DB.DBA.RDF_QUAD DB.DBA.RDF_QUAD 2 3 O 0 3227395 490806316 3905658370 DB.DBA.RDF_QUAD DB.DBA.RDF_QUAD 2 4 O 0 94038 17227799 8554746 DB.DBA.RDF_QUAD DB.DBA.RDF_QUAD 2 6 O 0 389126 551074747 579191659 DB.DBA.RDF_QUAD DB.DBA.RDF_QUAD 2 8 O 0 160814 48480188 12026273 DB.DBA.RDF_QUAD DB.DBA.RDF_QUAD 2 10 O 0 652817 47370903 111430231
The top line is the overall summary across all the compression types.
The lines below give information per-compression-type. The values of coi_type mean the following:
. The value occurs once, followed by the number of repetitions.
. Values are stored consecutively without compression. The array elements are 4- or 8-byte depending on range. For variable length types, some compression applies because values differing only in their last byte will only have the last byte stored.
. For closely-spaced unique ascending values, the bitmap has a start value in full, and a bitmap with the nth bit set if start + nth occurs in the column.
. For non-ordered, low-cardinality columns, there can be a dictionary with either 4 or 8 bytes per entry, depending on the number of distinct values being encoded. The compression entry is prefixed by an array with the values in full, followed by an array of positions in the dictionary.
run length with small deltas
. For repeating, closely-spaced ascending values, the run-length-delta format stores a start value in full, followed by an array of bytes of which 4 bits are a delta to the previous value, and 4 bits are a run length.
integer delta with large deltas
. This format stores an initial value followed by stretches of non-ordered values within 64K of the base value. There can be multiple such stretches, each prefixed with a 32-bit delta from the base value. This is useful for closely-spaced medium- cardinality values like dates, or for relatively sparse ascending sequences, e.g., ascending sequences with a step of 1000 or more.