16.16.RDF Replication

Tables of RDF storage, such as DB.DBA.RDF_QUAD and DB.DBA.RDF_OBJ, can not be replicated in a usual way, because it's content is cached in memory in special ways and synchronized with values outside these tables, such as current values of special sequence objects.

Moreover, same IRI may have different internal IRI_IDs on different boxes, because the assigned IDs vary if new IRIs appear in data in different order. Similarly, there will be different IDs of RDF literal, datatypes and languages, blocking any attempt of one-to-one replication between RDF storages.

However, a special asynchronous RDF replication makes it possible to configure a "publisher" Virtuoso instance to keep the log of changes in some RDF graphs and subscribe some Virtuoso instances to replay all these changes.

Configuration functions are quite straightforward.

RDF graphs to replicate are all members of <http://www.openlinksw.com/schemas/virtrdf#rdf_repl_graph_group> graph group. That group can be filled in with graphs like any other graph group, but it is better to get the advantage of proper security check made by DB.DBA.RDF_REPL_GRAPH_INS() that inserts a graph to the group and DB.DBA.RDF_REPL_GRAPH_DEL() that removes a graph from the group.

Only publicly readable graphs can be replicated, an error is signalled otherwise, and it is better to know about a security issue as early as possible.

The DB.DBA.RDF_REPL_START() function starts the RDF replication at the publishing side. It creates replication "publication" named '__rdf_repl' and makes a log file '__rdf_repl.log' to record changes in replicated graphs. If the replication has been started before then an error is signalled; passing value 1 for parameter "quiet" elimintaes the error so the incorrect call has no effect at all. If the replication is enabled then the value of registry variable 'DB.DBA.RDF_REPL' indicates the moment of replication start.

The DB.DBA.RDF_REPL_START() function performs a security check before starting the replication to check.

The DB.DBA.RDF_REPL_STOP() stops the RDF replication at the publishing side. It calls repl_unpublish() but does not make empty reates replication "publication" named '__rdf_repl' and makes a log file '__rdf_repl.log' to record changes in replicated graphs.

Replication is asynchronous and the order of insertion and removal operations at the subscriber's side may not match the order at the publisher. As a result, it is not recommended to make few subscriptions that writes changes of few publishers into one common graph. A client-side application can force the synchronuzation by calling DB.DBA.RDF_REPL_SYNC() that acts like repl_sync() but for an RDF subscription. DB.DBA.RDF_REPL_SYNC() will not only initial synchronisation but also wait for the end of subscription to guarantee that the total effect of INSERT and DELETE operations is correct even if these operations were made in an order that differs from the original one.