16.16. RDF Replication
Tables of RDF storage, such as DB.DBA.RDF_QUAD and DB.DBA.RDF_OBJ, cannot be replicated in the usual way, because their content is cached in memory in special ways and synchronized with values outside these tables, such as the current values of special sequence objects.
Moreover, the same IRI may have different internal IRI_IDs on different boxes, because IDs are assigned as new IRIs appear in the data, and that order differs from box to box. Similarly, the IDs of RDF literals, datatypes and languages will differ, which blocks any attempt at one-to-one replication between RDF storages.
However, a special asynchronous RDF replication makes it possible to configure a "publisher" Virtuoso instance to keep a log of changes in some RDF graphs, and to subscribe other Virtuoso instances that replay all these changes.
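To make the ID problem concrete, here is a toy Python model. It is not Virtuoso's implementation: the `Box` class is hypothetical, it assumes each box hands out IRI IDs from a counter in order of first appearance, and it assumes the publisher's change log records IRIs rather than IDs. It shows that copying a raw ID-level quad names entirely different IRIs on the subscriber, while replaying an IRI-level log lets the subscriber assign its own IDs.

```python
from typing import Dict, Set, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object) as IRIs


class Box:
    """Toy model of one instance: IRIs get IDs in order of first appearance."""

    def __init__(self) -> None:
        self.iri_to_id: Dict[str, int] = {}
        self.quads: Set[Tuple[int, int, int, int]] = set()  # stored as IDs only

    def iri_id(self, iri: str) -> int:
        # Hand out the next free ID the first time an IRI is seen.
        if iri not in self.iri_to_id:
            self.iri_to_id[iri] = len(self.iri_to_id) + 1
        return self.iri_to_id[iri]

    def insert(self, quad: Quad) -> None:
        g, s, p, o = (self.iri_id(x) for x in quad)
        self.quads.add((g, s, p, o))

    def quads_as_iris(self) -> Set[Quad]:
        # Translate stored IDs back to IRIs using this box's own dictionary.
        id_to_iri = {i: iri for iri, i in self.iri_to_id.items()}
        return {(id_to_iri[g], id_to_iri[s], id_to_iri[p], id_to_iri[o])
                for g, s, p, o in self.quads}


publisher, subscriber = Box(), Box()
# The subscriber already holds unrelated data, so its ID counter is ahead.
subscriber.insert(("urn:g:local", "urn:x", "urn:y", "urn:z"))

# Assumption: the publisher logs each change as IRIs, not as IDs.
change_log = []
quad = ("urn:g:repl", "urn:alice", "urn:knows", "urn:bob")
publisher.insert(quad)
change_log.append(("INSERT", quad))

# Copying the publisher's raw ID quad would name different IRIs on the subscriber.
raw_copy = next(iter(publisher.quads))  # (1, 2, 3, 4)
id_to_iri_sub = {i: iri for iri, i in subscriber.iri_to_id.items()}
misread = tuple(id_to_iri_sub[i] for i in raw_copy)

# Replaying the IRI-level log lets the subscriber assign its own IDs.
for op, logged in change_log:
    if op == "INSERT":
        subscriber.insert(logged)
```

In this model the raw quad `(1, 2, 3, 4)` reads on the subscriber as its unrelated local quad, while the replayed quad receives IDs 5 to 8 there.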
Configuration functions are quite straightforward.
The RDF graphs to replicate are all members of the
<http://www.openlinksw.com/schemas/virtrdf#rdf_repl_graph_group>
graph group. That group can be filled with graphs like any other
graph group, but it is better to take advantage of the
security check made by DB.DBA.RDF_REPL_GRAPH_INS(),
which inserts a graph into the group, and DB.DBA.RDF_REPL_GRAPH_DEL(),
which removes a graph from the group.
Only publicly readable graphs can be replicated; otherwise an error is signalled, because it is better to learn about a security issue as early as possible.
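The fail-early rule can be pictured with a small Python model. It is only an analogy: `ReplGraphGroup`, `graph_ins`, `graph_del` and the `publicly_readable` set are hypothetical stand-ins for DB.DBA.RDF_REPL_GRAPH_INS(), DB.DBA.RDF_REPL_GRAPH_DEL() and Virtuoso's real graph permissions, and the check is modelled on insertion only.

```python
from typing import Set


class ReplGraphGroup:
    """Hypothetical model of the replicated graph group and its insert-time check."""

    def __init__(self, publicly_readable: Set[str]) -> None:
        self.publicly_readable = publicly_readable  # stand-in for real graph permissions
        self.members: Set[str] = set()

    def graph_ins(self, graph_iri: str) -> None:
        # Fail early: refuse graphs that are not publicly readable.
        if graph_iri not in self.publicly_readable:
            raise PermissionError(f"{graph_iri} is not publicly readable")
        self.members.add(graph_iri)

    def graph_del(self, graph_iri: str) -> None:
        # Removing a graph that is not a member is a no-op in this model.
        self.members.discard(graph_iri)


group = ReplGraphGroup(publicly_readable={"http://example.com/public"})
group.graph_ins("http://example.com/public")

rejected = None
try:
    group.graph_ins("http://example.com/private")
except PermissionError as err:
    rejected = str(err)
```

The private graph never enters the group, so the problem surfaces when the graph is added rather than later, during replication.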
The DB.DBA.RDF_REPL_START()
function starts the
RDF replication at the publishing side. It creates a replication
"publication" named '__rdf_repl' and a log file
'__rdf_repl.log' to record changes in replicated graphs. If the
replication has already been started, an error is signalled;
passing the value 1 for the parameter "quiet" suppresses the error, so the
redundant call has no effect at all. While the replication is enabled,
the value of the registry variable 'DB.DBA.RDF_REPL' indicates the
moment the replication started.
The DB.DBA.RDF_REPL_START()
function also performs a
security check before starting the replication.
The DB.DBA.RDF_REPL_STOP() function
stops the RDF
replication at the publishing side. It calls repl_unpublish()
but does not empty the graph group
<http://www.openlinksw.com/schemas/virtrdf#rdf_repl_graph_group>.
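The start rules can be summarized as a small state machine. This is a hypothetical Python model, not Virtuoso code: `RdfReplPublisher` and its `registry` dict (a stand-in for the server registry) are invented for illustration. The behaviour of `stop()`, which clears the registry entry, is an assumption, because the text only says what the registry value means while replication is enabled.

```python
import time
from typing import Dict, Optional


class RdfReplPublisher:
    """Hypothetical model of the publisher-side start/stop rules (not Virtuoso code)."""

    def __init__(self) -> None:
        self.registry: Dict[str, float] = {}  # stand-in for the server registry
        self.publication: Optional[str] = None
        self.log_file: Optional[str] = None

    def start(self, quiet: int = 0) -> None:
        if self.publication is not None:
            if quiet:
                return  # quiet=1: the repeated call has no effect at all
            raise RuntimeError("RDF replication has already been started")
        self.publication = "__rdf_repl"
        self.log_file = "__rdf_repl.log"
        self.registry["DB.DBA.RDF_REPL"] = time.time()  # moment of start

    def stop(self) -> None:
        # Assumption: stopping withdraws the publication and clears the marker.
        self.publication = None
        self.registry.pop("DB.DBA.RDF_REPL", None)


pub = RdfReplPublisher()
pub.start()
started_at = pub.registry["DB.DBA.RDF_REPL"]
pub.start(quiet=1)  # silently ignored

second_start_failed = False
try:
    pub.start()
except RuntimeError:
    second_start_failed = True

timestamp_kept = pub.registry["DB.DBA.RDF_REPL"] == started_at
pub.stop()
```

A quiet repeated start leaves the recorded start moment unchanged, while a non-quiet one raises.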
Replication is asynchronous, and the order of insertion and
removal operations at the subscriber's side may not match the order
at the publisher. As a result, it is not recommended to set up several
subscriptions that write changes from several publishers into one common
graph. A client-side application can force synchronization by
calling DB.DBA.RDF_REPL_SYNC(),
which acts like
repl_sync()
but for
an RDF subscription. DB.DBA.RDF_REPL_SYNC()
not only performs the initial
synchronization but also waits until that synchronization is complete, to
guarantee that the total effect of INSERT and DELETE operations is
correct even if these operations were made in an order that differs
from the original one.
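The following toy Python model (hypothetical names, not Virtuoso's implementation) shows why several publishers writing into one common graph is risky: INSERT and DELETE on the same triple do not commute, so the same two operations applied in a different order leave a different graph. It does not model how DB.DBA.RDF_REPL_SYNC() restores correctness for a single subscription.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def replay(ops: Iterable[Tuple[str, Triple]]) -> Set[Triple]:
    """Apply INSERT/DELETE operations to an initially empty graph, in the given order."""
    graph: Set[Triple] = set()
    for op, triple in ops:
        if op == "INSERT":
            graph.add(triple)
        elif op == "DELETE":
            graph.discard(triple)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return graph


t = ("urn:s", "urn:p", "urn:o")
# Publisher A inserts the triple; later, publisher B deletes it.
publisher_order = [("INSERT", t), ("DELETE", t)]
# With asynchronous delivery from two publishers, B's change may arrive first.
arrival_order = [("DELETE", t), ("INSERT", t)]

expected = replay(publisher_order)  # empty graph
actual = replay(arrival_order)      # the triple survives
```

Nothing orders the changes of two independent publishers relative to each other, so the common graph can end up in a state that neither publisher intended.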