13.3.Transactional Replication

The unit of replication is a publication. A publication is an ordered sequence of transaction entries. One database transaction can add data to zero or more publications. The data contributed to a publication by a transaction is appended to the publication at the time of commit. Because commits are serialized database wide, items in a publication have a well defined order.

Each transaction entry in a publication has a unique sequence number within the publication. Each subscriber of a publication has a level of synchronization, which is the serial number of the last transaction from the publication which this subscriber has processed.

Each publication has exactly one publisher and zero or more subscribers. Any multi-master merge replication schemes will be based on this notion, with data to be merged back into the original source regarded as a separate publication and the merge regarded as a process between publications.

In order to publish data for replication by others a server must have a unique name within the group of servers participating in the replication. This server name is assigned to the server in its virtuoso.ini file in the DBName setting.

To publish data the publishing server initializes a publication with the repl_publish function, where it names the publication and assigns a log file name for it. The server can then start adding transactions to the publication, which can happen either under application control or implicitly.

[Tip] Tip

See the repl_text function.

To subscribe to publications a server must also have a distinct DBName. It identifies the publishing server by associating a host name and port number to its logical name with the repl_server function. It can then call repl_subscribe() . Replication feeds from publisher are replayed by 'dba' user by default. The default can be changed (see repl_subscribe() function). for each of the publications it subscribes to. A publication is uniquely identified on the subscriber with the publishing server name and the publication name. Note that several servers in a network may publish like named publications and these will be logically distinct, having each their own distinct publisher.

A subscriber may or may not be connected to the publisher at any point in time. If a subscriber is connected to the publisher it may either be 'in sync' or syncing'. In the syncing state it is receiving transaction entries with numbers consecutive from its sync level up until the last committed serial number committed on the server.

At the start of the sync communication the subscriber indicates the level of the last successfully processed transaction in the publication. The sync exchange terminates when the subscriber reaches the last committed item on the publication. At this point the subscriber is said to be 'in sync'. The connection to the publisher is then maintained by default and is used to send sync information as it becomes available. This means that once an entry is appended to the publication by a committing a transaction it is sent to the 'in sync' subscribers without separate request.

The publisher can terminate the replication feed by unilateral decision. It will do it if the sending of the message times out for too long or if the queue of 'to be sent' replication records exceeds a settable threshold. This essentially happens with communication failures or if the subscriber continuously processes the feed at a speed lower than the feed production speed of the publisher. A disconnected subscriber can reconnect at will, in which case it enters the 'syncing' state and will receive transactions from the point where the feed was cut.

A subscriber can disconnect from the publisher at any time without ill effect.

A table

SYS_REPL_ACCOUNTS (
    SERVER varchar,
    ACCOUNT varchar,

    NTH integer,
    LEVEL integer,
    IS_MANDATORY integer,
    IS_UPDATEABLE integer,
    SYNC_USER varchar,

    P_MONTH integer,
    P_DAY integer,
    P_WDAY integer,
    P_TIME time,

    primary key (SERVER, ACCOUNT))

is used to store information about published accounts and accounts this server is subscribed to.

A table

SYS_REPL_SUBSCRIBERS (
    RS_SERVER varchar,
    RS_ACCOUNT varchar,
    RS_SUBSCRIBER varchar not null,

    RS_LEVEL integer NOT NULL,
    RS_VALID integer NOT NULL,

    primary key (RS_SERVER, RS_ACCOUNT, RS_SUBSCRIBER))

is used to store subscribers' status (pushback accounts for updateable subscriptions are there too). Subscribers for an account are added to this table automatically on each request to sync an account from subscriber or manually from Admin UI.

SYS_REPL_SUBSCRIBERS.RS_VALID column is be used to designate subscribers whose replication account level is valid (lags not more than REPL_MAX_DELTA behind the publisher's level).

RS_VALID state of subscriber is checked and updated on every sync request from subscriber. If subscriber is found to be invalid all further sync requests from it are ignored. Such subscriber need to be reinitialized manually and marked as valid using Admin UI.