10.7. Cluster and RDF

The RDF tables are partitioned by default on any fresh clustered database. Thus RDF operations are not affected by clustering.

For RDF loading, use the single-threaded load functions DB.DBA.RDF_LOAD_RDFXML and DB.DBA.TTLP. These should essentially always be run in row autocommit mode and without logging. To set this up, call log_enable (2) on the connection before invoking these functions.

Running these functions in the default transactional mode loads everything within the current transaction. This causes widespread locking and runs out of rollback space after some millions of triples. The transactional mode has strict transactional semantics, but these are generally not relevant in RDF applications.

Integrity between all tables and indices is guaranteed once the loading of a file completes, even in non-transactional mode. After all loading is complete, do a single explicit checkpoint with cl_exec ('checkpoint');

This guarantees that the disk-based image is complete. Automatic checkpoints taken during non-transactional file loads may capture partially loaded files, and possibly partial triples, in the checkpointed state.
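The sequence described above can be sketched as a small Python helper that emits the statements for one connection. This is only an illustration: the helper names, file paths and graph IRI are hypothetical, and both the use of file_to_string_output to read the file and the TTLP argument order (text, base IRI, graph IRI, flags) are assumptions to verify against the function reference for the server version in use.

```python
def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def load_session_statements(files, graph, final_checkpoint=True):
    """Return the SQL statements for one load connection, in order.

    files: list of Turtle file paths readable by the server (hypothetical).
    graph: target graph IRI, also used here as the base IRI (assumption).
    """
    # Row autocommit without logging, set before any load call.
    statements = ["log_enable (2);"]
    for path in files:
        # Assumption: file_to_string_output reads the file on the server
        # and flags are left at 0.
        statements.append(
            "DB.DBA.TTLP (file_to_string_output ({p}), {g}, {g}, 0);".format(
                p=sql_quote(path), g=sql_quote(graph)
            )
        )
    if final_checkpoint:
        # One explicit cluster-wide checkpoint after all loading is done.
        statements.append("cl_exec ('checkpoint');")
    return statements


if __name__ == "__main__":
    for stmt in load_session_statements(["/data/part1.ttl"], "http://example.org/g"):
        print(stmt)
```

The resulting statements can be fed to the server through isql or any SQL client. When several streams load in parallel, pass final_checkpoint=False to each and issue the checkpoint once, after every stream has finished.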

For all SPARUL operations, row autocommit mode is likewise recommended.

Logging is not needed if one makes a manual global checkpoint after any bulk import or update operation. Logging is useful if one has a continuous feed of smaller files, even if transactional semantics are not needed.

For best import speed, run one or two parallel streams of load commands on each cluster node. Split the data to be loaded into approximately equal chunks and load each with a call to DB.DBA.RDF_LOAD_RDFXML or DB.DBA.TTLP. There is no point in using the _MT variants of these functions on a cluster.

A single load will process about 10000 triples with only about 5 cluster round trips. Still, the node doing the parsing does more of the work than the other nodes. To make the best use of total throughput, divide the load commands over the cluster nodes. Lock contention will be minimal if the loads are in row autocommit mode. If they are transactional, deadlocks are quite probable due to indeterminate locking order and large transaction size. As a general rule, do not mix transactions and RDF.
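One way to divide the data into approximately equal chunks over the nodes is a greedy assignment by file size, sketched below. The function name, node names and file sizes are hypothetical, and balancing by bytes is an assumption that stands in for balancing by triple count.

```python
import heapq


def plan_streams(file_sizes, nodes, streams_per_node=2):
    """Assign files to load streams so that bytes per stream are roughly equal.

    file_sizes: dict mapping file path -> size in bytes.
    nodes: list of cluster node names.
    Returns a dict mapping (node, stream_index) -> list of file paths.
    """
    if not nodes or streams_per_node < 1:
        raise ValueError("need at least one node and one stream per node")

    # Order streams so that the first files spread over different nodes.
    keys = [(node, i) for i in range(streams_per_node) for node in nodes]
    plan = {key: [] for key in keys}

    # Min-heap of (bytes assigned so far, tie-break order, stream key).
    heap = [(0, order, key) for order, key in enumerate(keys)]
    heapq.heapify(heap)

    # Largest files first, each one to the currently least loaded stream.
    for path, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, order, key = heapq.heappop(heap)
        plan[key].append(path)
        heapq.heappush(heap, (load + size, order, key))
    return plan


if __name__ == "__main__":
    sizes = {"a.ttl": 50, "b.ttl": 40, "c.ttl": 30, "d.ttl": 20}
    for stream, paths in plan_streams(sizes, ["node1", "node2"], 1).items():
        print(stream, paths)
```

Each stream in the plan then corresponds to one connection on the named node, which loads its files in row autocommit mode.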

Prefix	IRI
n3	http://docs.openlinksw.com/virtuoso/clusterprogrammingclandrdf/
schema	http://schema.org/
n5	http://creativecommons.org/licenses/by/4.0/
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#
n4	http://www.openlinksw.com/#
xsdh	http://www.w3.org/2001/XMLSchema#

Subject	Predicate	Object
n3:	rdf:type	schema:TechArticle
n3:	schema:name	10.7. Cluster and RDF
n3:	schema:copyrightHolder	_:vb78714
n3:	schema:datePublished	2016-09-09 16:16:54
n3:	schema:headline	10.7. Cluster and RDF
n3:	schema:keywords	OpenLink,Virtuoso,database,RDBMS,relational,SQL,RDF,triple store,linked data,linked open data,Big Data
n3:	schema:license	n5:deed.en_US
n3:	schema:publisher	_:vb78713
n3:	schema:url	n3:
_:vb78713	rdf:type	schema:Organization
_:vb78713	schema:name	OpenLink Software
_:vb78713	schema:url	n4:this
_:vb78714	rdf:type	schema:Organization
_:vb78714	schema:name	OpenLink Software
_:vb78714	schema:url	n4:this
