5.1.7.Creating A Collation

Virtuoso supports collation orders for CHAR and VARCHAR fields that are different from the binary, as per ANSI SQL92. When comparing strings using a collation, Virtuoso compares the "weights" of the characters instead of their codes. This allows programs to make different characters compare as equal (example: case-insensitive comparisons).

A collation can be created by supplying a collation definition text file to the collation_define() SQL function. The collation definition file contains a list of the exceptions to the binary collation order. An exception consists of <character code> = <collation weight> pairs. For example a case-insensitive collation can be defined by specifying all the lower case letters to have the same collation weights as the corresponding uppercase ones.

Collation Definition File

The collation definition file should follow the following guidelines:

  • Each definition should reside on a separate line.

  • The format of the definition is: <CHAR>=<CODE>, where CHAR and CODE can be either the letters themselves, or their decimal codes. For example: '67=68' is the same as 'C=D' using the ASCII character set. For Unicode collations the codes can exceed the byte boundary.

You can define a new collation using the following function:

collation_define ( COLLATION_NAME FILE_PATH ADD_TYPE )

Collations System Table

The SYS_COLLATIONS system table holds the data for all defined collations. It has the following structure:

CREATE TABLE SYS_COLLATIONS (
    COLL_NAME VARCHAR,
    COLL_TABLE LONG VARBINARY,
    COLL_IS_WIDE INTEGER);

COLL_NAME is the fully qualified name of the collation - its identifier.

COLL_TABLE holds the collation table itself. This is 256 bytes or 65536 wide chars.

COLL_IS_WIDE holds the collation's type: 0 for CHAR and 1 for NCHAR. An 8-bit collation cannot be used by anything that requires NCHAR data and vice versa.

A collation can be deleted by deleting its row from SYS_COLLATIONS.

[Note] Note

The collation will still be available until the server is restarted, as it's definition is cached in memory.

Collations and Column Data

The collation is a property of the column holding the data. This means that all comparisons including that column will use its collation. SQL functions will strip collation data from the column; for example, if a column "CompanyName" has an assigned collation "Spanish" then the SQL call

LEFT (CompanyName, 10);

will use the default collation).

Collations can be defined on a per-column basis, at table creation time, and on a per-database basis as a configuration parameter. There is a special form of the CAST operator that allows casting a column to a collation.

A collation identifier has the same form as any other SQL identifier (<qualifier>.<owner>.<name>) and it can be escaped with the same syntax as other identifiers.

Defining a Collation for a Table Column

You may assign a collation to a column at table creation using the following syntax:

create table TABLE_NAME (
...
COLLATED   VARCHAR(50) COLLATE Spanish,
COLLATED   CHAR(20) COLLATE DB.DBA.Spanish,
....
)

Assigning a collation to a non-character column gives an error.

If the COLLATE is omitted, the default database collation is used.

On database start-up the collation for each table's column is loaded from the SYS_COLLATIONS table and if not found, the COLLATE attribute is ignored until the next restart.

Defining Database-Wide Collations

The database's default collation is defined by the configuration parameter "Collation" in the "Parameters" section of the virtuoso.ini file. This database wide collation is the default system collation used where none other is specified. This setting can only be changed in the virtuoso.ini file and hence requires a Virtuoso server restart. As with all collations, legal values are those contained in the DB.DBA.SYS_COLLATIONS table. The list can be retrieved using charsets_list(1)