16.17.16. Fast Approximate RDF Graph Diff and Patch
Two algorithms described below resemble "unified diff" and "patch by unified diff" but they work on RDF graphs, not on plain texts.
They work reasonably for graphs composed from CBDs (concise bounded descriptions) of some subjects, if these subjects are either "named" IRIs or can be identified by values of their inverse functional properties.
Many sorts of commonly used graphs match these restrictions, including all graphs without blank nodes, most of FOAF files, graphs that can be "prettyprinted" in JSON, most of dumps of relational databases etc.
The basic idea is as simple as zipper:

Place one graph at the left and one to the right,

Find a retainer box at the right and a matching pin at the left,

Join them

Pull the slider as long as possible.

Repeat this while there are pins and boxes that can be matched and sliders that can be moved.
An IRI in left graph (say, G1)
matches
to same IRI in right graph (G2)
as pin to
box. The same is true for literals too.
Functional and inverse functional properties are teeth that form chains, algorithm "moves sliders" along these chains, incrementally connecting more and more nodes.
If there is a match of this sort (O1 in G1
matches O2 in G2)
and the matched nodes are values of same
inverse functional property P
(there are
{ S1 P O1 }
in G1
and { S2 P O2 }
in
G2
) then we guess that S1
matches S2
.
If S1
in G1
matches S2
in G2
and the matched nodes are subjects of same
functional property P
( there are
{ S1 P N1 }
in G1
and { S2 P N2 }
in
G2
) then we guess that N1
matches N2
, now it's
possible to try same interaction on triples where N1
and N2
are in subject
position, that's how slides move. A typical example of a long
zipper is closed list with matched heads.
Collect Functional And Inverse Functional Properties
Lists of functional properties can be retrieved from an ontology graph by query like:
SPARQL define output:valmode "LONG" SELECT (<LONG::sql:VECTOR_AGG(?s)) FROM <myontologygraph> WHERE { ?s a owl:functionalProperty }
Inverse functional properties could be retrieved by a similar query, but unfortunately the ontology may mention so called NULL values that can be property values for many subjects. Current implementation of diff and patch does not recognize NULL values so they can cause patch with "false alarm" errors. The workaround is to retrieve only properties that have no NULL values declared:
SPARQL define output:valmode "LONG" SELECT (<LONG::sql:VECTOR_AGG(?s)) FROM <myontologygraph> WHERE { ?s a owl:inverseFunctionalProperty . OPTIONAL { ?s owl:nullIFPValue ?v } FILTER (!Bound(?v)) }
If no ontology is available then appropriate predicates can be obtained from sample graphs using DB.DBA.RDF_GRAPH_COLLECT_FP_LIST .
ImplementationSpecific Extensions of GUO Ontology
Note : This section contains implementation details that are needed only if you want to write your own patch or diff procedure, you don't have to worry about internals if you want to use existing procedures.
Basic GUO ontology is not expressive enough to work with blank nodes, so some custom extensions $ are needed.
In the rest of the description:
@prefix guo: <http://webr3.org/owl/guo#>
is assumed.
The diff contains one node of rdf:type
guo:diff
.
For debugging purpose it has properties guo:graph1
and guo:graph2
that corespond to gfrom
and gto
arguments of DB.DBA.RDF_SUO_DIFF_TTL .
The diff also contains zero or more nodes of rdf:type guo:UpdateInstruction
. These nodes are as
described in basic GUO ontology, but guo:target_graph
is now optional, guo:target_subject
can be a blank node and objects of
predicates "inside" values of guo:insert
and guo:delete
can also be blank nodes.
These blank nodes are "placeholders" for values, calculated
according to the most important GUO extension  rule nodes.
There are eight sorts of rule nodes, four for gfrom
side of diff and four similar for gto
side. Out of four sorts related to one side, two
are for functional properties and two similar are for inverse
functional properties. Thus rdf:types
of
these nodes are:
guo:fromruleFP0, guo:fromruleFP1, guo:fromruleIFP0, guo:fromruleIFP1
and
guo:toruleFP0, guo:toruleFP1, guo:toruleIFP , guo:toruleIFP1 .
Each rule node has property guo:order
that is an nonnegative integer.
These integers enumerate all guo:fromrule
... nodes, starting from zero.
When patch procedure works, these rules are used in this order, the result of each rule is a blank node that either exists in the graph or just created.
All results are remembered for use in the rest of the patch procedure.
Similarly, other sequence of these integers enumerate all
guo:torule
... nodes, also starting
from zero.
Consider a sequence of guo:fromrule
... nodes, because guo:torule
nodes
have identical properties.
A rule node can have zero or more values of guo:dep
property, each value is a bnode that is rule
node that should be calculated before the current one.
Every rule has exactly one predicate guo:path
that is a blank node. Each property of this
blank node describes one possible "move of slider": predicate to
follow is in predicate position and a node to start from is in
object position. An IRI or a literal in object position is used as
is, a blank node in object position should be of type guo:fromrule
... and have smaller guo:order
so it refers to already calculated result
bnode of some preceding rule.
Rule of form:
R a guo:fromruleIFP1 ; guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
searches for a unique blank node _:Rres
that is a common subject of triples:
_:Rres P1 O1 _:Rres P2 O2 . . . _:Rres Pn On
in the gfrom graph.
If subjects differ in these triples or some triples are not
found or the subject is not a blank node then an appropriate error
is logged and rule fails, otherwise _:Rres
is remembered as the result of the rule.
Similarly, rule of form:
R a guo:fromruleFP1 ; guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
searches for a unique blank node _:Rres
that is a common object of triples:
O1 P1 _:Rres O2 P2 _:Rres . . . On Pn _:Rres
in the gfrom graph.
Rule of form:
R a guo:fromruleIFP0 ; guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
ensures that the gfrom
graph does not
contain any triple like:
_:Rres P1 O1 _:Rres P2 O2
or
_:Rres Pn On
It is an error if something exists. If nothing found then the result of the rule is newly created unique blank node. That's how patch procedure creates new blank nodes when it inserts "totally new" data.
Similarly, rule of form:
R a guo:fromruleIFP0 ; guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
ensures that the gfrom
graph does not
contain any triple like:
O1 P1 _:Rres O2 P2 _:Rres
or
On Pn _:Rres
Current version of patch procedure does not use rules
guo:torule
... , however they can be
used by custom procedure of few sorts. First, these rules can be
used to produce a "reversed diff". Next, these rules can be used to
validate the result of the patch  if the patch can not be reverted
then the result is "suspicious".