5.1.7.Creating A Collation
Virtuoso supports collation orders for CHAR and VARCHAR fields that are different from the binary, as per ANSI SQL92. When comparing strings using a collation, Virtuoso compares the "weights" of the characters instead of their codes. This allows programs to make different characters compare as equal (example: case-insensitive comparisons).
A collation can be created by supplying a collation definition
text file to the collation_define()
SQL function. The collation definition file contains a list of the
exceptions to the binary collation order. An exception consists of
<character code> = <collation weight> pairs. For
example a case-insensitive collation can be defined by specifying
all the lower case letters to have the same collation weights as
the corresponding uppercase ones.
Collation Definition File
The collation definition file should follow the following guidelines:
-
Each definition should reside on a separate line.
-
The format of the definition is: <CHAR>=<CODE>, where
CHAR
andCODE
can be either the letters themselves, or their decimal codes. For example: '67=68' is the same as 'C=D' using the ASCII character set. For Unicode collations the codes can exceed the byte boundary.
You can define a new collation using the following function:
Collations System Table
The SYS_COLLATIONS system table holds the data for all defined collations. It has the following structure:
CREATE TABLE SYS_COLLATIONS ( COLL_NAME VARCHAR, COLL_TABLE LONG VARBINARY, COLL_IS_WIDE INTEGER);
COLL_NAME
is the fully
qualified name of the collation - its identifier.
COLL_TABLE
holds the
collation table itself. This is 256 bytes or 65536 wide chars.
COLL_IS_WIDE
holds the
collation's type: 0 for CHAR and 1 for NCHAR. An 8-bit collation
cannot be used by anything that requires NCHAR data and vice
versa.
A collation can be deleted by deleting its row from SYS_COLLATIONS.
Note | |
---|---|
The collation will still be available until the server is restarted, as it's definition is cached in memory. |
Collations and Column Data
The collation is a property of the column holding the data. This means that all comparisons including that column will use its collation. SQL functions will strip collation data from the column; for example, if a column "CompanyName" has an assigned collation "Spanish" then the SQL call
LEFT (CompanyName, 10);
will use the default collation).
Collations can be defined on a per-column basis, at table creation time, and on a per-database basis as a configuration parameter. There is a special form of the CAST operator that allows casting a column to a collation.
A collation identifier has the same form as any other SQL identifier (<qualifier>.<owner>.<name>) and it can be escaped with the same syntax as other identifiers.
Defining a Collation for a Table Column
You may assign a collation to a column at table creation using the following syntax:
create table TABLE_NAME ( ... COLLATED VARCHAR(50) COLLATE Spanish, COLLATED CHAR(20) COLLATE DB.DBA.Spanish, .... )
Assigning a collation to a non-character column gives an error.
If the COLLATE is omitted, the default database collation is used.
On database start-up the collation for each table's column is loaded from the SYS_COLLATIONS table and if not found, the COLLATE attribute is ignored until the next restart.
Defining Database-Wide Collations
The database's default collation is defined by the configuration
parameter "Collation" in the "Parameters" section of the virtuoso.ini file. This database
wide collation is the default system collation used where none
other is specified. This setting can only be changed in the
virtuoso.ini file and hence requires a Virtuoso server restart. As
with all collations, legal values are those contained in the
DB.DBA.SYS_COLLATIONS table. The list can be retrieved using
charsets_list(1)