### 16.17.16. Fast Approximate RDF Graph Diff and Patch

Two algorithms described below resemble "unified diff" and "patch by unified diff" but they work on RDF graphs, not on plain texts.

They work reasonably for graphs composed from CBDs (concise bounded descriptions) of some subjects, if these subjects are either "named" IRIs or can be identified by values of their inverse functional properties.

Many sorts of commonly used graphs match these restrictions, including all graphs without blank nodes, most of FOAF files, graphs that can be "pretty-printed" in JSON, most of dumps of relational databases etc.

The basic idea is as simple as zipper:

• Place one graph at the left and one to the right,

• Find a retainer box at the right and a matching pin at the left,

• Join them

• Pull the slider as long as possible.

• Repeat this while there are pins and boxes that can be matched and sliders that can be moved.

An IRI in left graph `(say, G1)` matches to same IRI in right graph `(G2)` as pin to box. The same is true for literals too.

Functional and inverse functional properties are teeth that form chains, algorithm "moves sliders" along these chains, incrementally connecting more and more nodes.

If there is a match of this sort `(O1 in G1 matches O2 in G2)` and the matched nodes are values of same inverse functional property `P` (there are `{ S1 P O1 }` in `G1` and `{ S2 P O2 }` in `G2` ) then we guess that `S1` matches `S2` .

If `S1` in `G1` matches `S2` in `G2` and the matched nodes are subjects of same functional property `P` ( there are `{ S1 P N1 }` in `G1` and `{ S2 P N2 }` in `G2` ) then we guess that `N1` matches `N2` , now it's possible to try same interaction on triples where `N1` and `N2` are in subject position, that's how slides move. A typical example of a long zipper is closed list with matched heads.

#### Collect Functional And Inverse Functional Properties

Lists of functional properties can be retrieved from an ontology graph by query like:

```SPARQL define output:valmode "LONG"
SELECT (<LONG::sql:VECTOR_AGG(?s))
FROM <my-ontology-graph>
WHERE
{
?s a owl:functionalProperty
}
```

Inverse functional properties could be retrieved by a similar query, but unfortunately the ontology may mention so called NULL values that can be property values for many subjects. Current implementation of diff and patch does not recognize NULL values so they can cause patch with "false alarm" errors. The workaround is to retrieve only properties that have no NULL values declared:

```SPARQL define output:valmode "LONG"
SELECT (<LONG::sql:VECTOR_AGG(?s))
FROM <my-ontology-graph>
WHERE
{
?s a owl:inverseFunctionalProperty .
OPTIONAL { ?s owl:nullIFPValue ?v }
FILTER (!Bound(?v))
}
```

If no ontology is available then appropriate predicates can be obtained from sample graphs using DB.DBA.RDF_GRAPH_COLLECT_FP_LIST .

#### Implementation-Specific Extensions of GUO Ontology

Note : This section contains implementation details that are needed only if you want to write your own patch or diff procedure, you don't have to worry about internals if you want to use existing procedures.

Basic GUO ontology is not expressive enough to work with blank nodes, so some custom extensions \$ are needed.

In the rest of the description:

```@prefix guo: <http://webr3.org/owl/guo#>
```

is assumed.

The diff contains one node of `rdf:type guo:diff` .

For debugging purpose it has properties `guo:graph1` and `guo:graph2` that corespond to `gfrom` and `gto` arguments of DB.DBA.RDF_SUO_DIFF_TTL .

The diff also contains zero or more nodes of `rdf:type guo:UpdateInstruction` . These nodes are as described in basic GUO ontology, but `guo:target_graph` is now optional, `guo:target_subject` can be a blank node and objects of predicates "inside" values of `guo:insert` and `guo:delete` can also be blank nodes. These blank nodes are "placeholders" for values, calculated according to the most important GUO extension - rule nodes.

There are eight sorts of rule nodes, four for `gfrom` side of diff and four similar for `gto` side. Out of four sorts related to one side, two are for functional properties and two similar are for inverse functional properties. Thus `rdf:type-s` of these nodes are:

```guo:from-rule-FP0,
guo:from-rule-FP1,
guo:from-rule-IFP0,
guo:from-rule-IFP1
```

and

```guo:to-rule-FP0,
guo:to-rule-FP1,
guo:to-rule-IFP ,
guo:to-rule-IFP1 .
```

Each rule node has property `guo:order` that is an non-negative integer.

These integers enumerate all `guo:from-rule-` ... nodes, starting from zero.

When patch procedure works, these rules are used in this order, the result of each rule is a blank node that either exists in the graph or just created.

All results are remembered for use in the rest of the patch procedure.

Similarly, other sequence of these integers enumerate all `guo:to-rule-` ... nodes, also starting from zero.

Consider a sequence of `guo:from-rule-` ... nodes, because `guo:to-rule-` nodes have identical properties.

A rule node can have zero or more values of `guo:dep` property, each value is a bnode that is rule node that should be calculated before the current one.

Every rule has exactly one predicate `guo:path` that is a blank node. Each property of this blank node describes one possible "move of slider": predicate to follow is in predicate position and a node to start from is in object position. An IRI or a literal in object position is used as is, a blank node in object position should be of type `guo:from-rule-` ... and have smaller `guo:order` so it refers to already calculated result bnode of some preceding rule.

Rule of form:

```R a guo:from-rule-IFP1 ;
guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
```

searches for a unique blank node `_:Rres` that is a common subject of triples:

``` _:Rres P1 O1
_:Rres P2 O2
. . .
_:Rres Pn On
```

in the gfrom graph.

If subjects differ in these triples or some triples are not found or the subject is not a blank node then an appropriate error is logged and rule fails, otherwise `_:Rres` is remembered as the result of the rule.

Similarly, rule of form:

```R a guo:from-rule-FP1 ;
guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
```

searches for a unique blank node `_:Rres` that is a common object of triples:

``` O1 P1 _:Rres
O2 P2 _:Rres
. . .
On Pn _:Rres
```

in the gfrom graph.

Rule of form:

```R a guo:from-rule-IFP0 ;
guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
```

ensures that the `gfrom` graph does not contain any triple like:

``` _:Rres P1 O1
_:Rres P2 O2
```

or

```_:Rres Pn On
```

It is an error if something exists. If nothing found then the result of the rule is newly created unique blank node. That's how patch procedure creates new blank nodes when it inserts "totally new" data.

Similarly, rule of form:

```R a guo:from-rule-IFP0 ;
guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .
```

ensures that the `gfrom` graph does not contain any triple like:

```O1 P1 _:Rres
O2 P2 _:Rres
```

or

``` On Pn _:Rres
```

Current version of patch procedure does not use rules `guo:to-rule-` ... , however they can be used by custom procedure of few sorts. First, these rules can be used to produce a "reversed diff". Next, these rules can be used to validate the result of the patch - if the patch can not be reverted then the result is "suspicious".