<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0">
 <channel>
  <title>Virtuoso Cluster Programming</title>
  <link>http://docs.openlinksw.com/virtuoso/clusterprogramming.html</link>
  <description>OpenLink Virtuoso Universal Server: Documentation</description>
  <managingEditor>virtuoso.docs@openlinksw.com</managingEditor>
  <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
  <generator>OpenLink Software Documentation Team</generator>
  <webMaster>webmaster@openlinksw.com</webMaster>
  <image>
    <title>OpenLink Virtuoso Universal Server: Documentation</title>
    <url>http://docs.openlinksw.com/virtuoso/../images/misc/logo.jpg</url>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogramming.html</link>
    <description>OpenLink Virtuoso Universal Server: Documentation</description>
  </image>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingsqlexmod.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Cluster SQL Execution Model</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingsqlexmod.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Cluster SQL Execution Model</title>
    <description>This section explains the basics of how SQL queries work on clustered Virtuoso.

Query optimization for cluster is similar to query optimization for a single process.
The main issues of optimization have to do with join order, index choice and join type.
    

Still, the performance characteristics of a distributed memory cluster are radically
different from those of a single-process database. Namely, a network round trip between nodes,
even when the nodes are just different processes on a shared memory multiprocessor, costs as much as
5 to 50 single-row lookups from a big table, supposing the row sought is in memory. The 5x
factor applies within the same machine; the 50x factor applies over 1 Gbit Ethernet.
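
As a rough illustration of these figures, the Python sketch below expresses the cost of remote
lookups in units of one in-memory single-row lookup. Only the factors 5 and 50 come from the text.
The function name, the row counts and the idea that several lookups can share one round trip are
assumptions made for the example, not a description of the Virtuoso cost model.

```python
# Cost of one network round trip, expressed in single-row in-memory lookups.
# Only the factors 5 and 50 come from the text; everything else is illustrative.
ROUND_TRIP_SAME_MACHINE = 5
ROUND_TRIP_1GBIT_ETHERNET = 50

def remote_lookup_cost(rows, rows_per_message, round_trip_cost):
    """Total cost, in lookup units, of fetching `rows` rows from another node
    when `rows_per_message` lookups share one round trip (an assumption)."""
    messages = -(-rows // rows_per_message)  # ceiling division
    return rows + messages * round_trip_cost

one_by_one = remote_lookup_cost(1000, 1, ROUND_TRIP_1GBIT_ETHERNET)
batched = remote_lookup_cost(1000, 100, ROUND_TRIP_1GBIT_ETHERNET)
print(one_by_one, batched)
```

In this toy model, 1000 lookups sent one at a time over Ethernet cost 51000 units, against 1500
units when 100 lookups travel in each message.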
    

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingseqidenreg.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Sequences, Identity and Registry</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingseqidenreg.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Sequences, Identity and Registry</title>
    <description>Sequences and identity columns have a cluster-wide scope. Thus, an identity column can be used
as a primary key and partitioning column and the system guarantees that there will be no duplicates.
    

Sequence numbers are signed 64-bit integers.

The sequence numbers are locally ascending on each node. When a cluster node first requests
a sequence number, it is assigned a block of numbers from which it will assign subsequent numbers.
Thus, two nodes will allocate from different ranges. The global order is not necessarily ascending
but numbers stay unique.
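
The block allocation described above can be pictured with a small Python model. This is a sketch
under stated assumptions, not the Virtuoso implementation: BLOCK_SIZE, SequenceMaster, Node and
sequence_next are hypothetical names, and the real block size is not given in the text.

```python
# Toy model of block-wise sequence allocation. BLOCK_SIZE and the class
# names are hypothetical; the real block size is not stated in the text.
BLOCK_SIZE = 100

class SequenceMaster:
    """Hands out disjoint blocks of numbers to the nodes of the cluster."""
    def __init__(self):
        self.next_block_start = 0

    def allocate_block(self):
        start = self.next_block_start
        self.next_block_start += BLOCK_SIZE
        return start, start + BLOCK_SIZE  # half-open range [start, end)

class Node:
    def __init__(self, master):
        self.master = master
        self.next_value = 0
        self.block_end = 0  # empty block, forces an allocation on first use

    def sequence_next(self):
        if self.next_value == self.block_end:
            self.next_value, self.block_end = self.master.allocate_block()
        value = self.next_value
        self.next_value += 1
        return value

master = SequenceMaster()
node_a, node_b = Node(master), Node(master)
issued = [node_a.sequence_next(), node_b.sequence_next(),
          node_a.sequence_next(), node_b.sequence_next()]
print(issued)
```

The numbers issued alternate between the two blocks, so they are ascending on each node and
unique overall, but not ascending in the order in which they were handed out.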
    

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingsqlopt.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>SQL Options</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingsqlopt.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>SQL Options</title>
    <description>For purposes of debugging or writing stored procedures that are specifically meant to
work with local data only, it is useful to disable cluster functionality.
    

This is done with the NO CLUSTER table option, which can be given in the table option
clause of a table in a FROM clause or in an UPDATE or DELETE statement.
    

Especially when writing procedures to be called with DAQ (see below), it is necessary to
ensure that the procedures will not access data outside of the host running them.
    

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingcallproc.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Calling Procedures in Cluster</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingcallproc.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Calling Procedures in Cluster</title>
    <description>Normally, all interprocess communication in the cluster is transparent.
In special cases, the developer may wish to execute a given procedure on a given host
of the cluster. This is typically the case when there is affinity between data and logic.
    

A regular stored procedure or trigger is executed on the host where it is invoked.
With the distributed async queue (DAQ) system one can execute procedures on specified remote hosts.
    

Procedures invoked over DAQ are restricted to dealing with data that is held on the host
where they execute. Generic procedures or triggers may use any data from anywhere.
    

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingpartfunc.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Partition Functions</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingpartfunc.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Partition Functions</title>
    <description>Given a key and a set of values, the partition function can determine which cluster
nodes hold the value.
    

The table name is the case-sensitive full name of a table as it appears in SYS_KEYS.
The key_name is the case-sensitive name of the index. The values are key part values in
index order. If is_update is non-zero and the value is stored in multiple places, all of
them are returned; otherwise just one is picked at random, preferring the local host
if it holds a copy of the partition.
    

The return value is a list of node numbers, corresponding to the Host&lt;n&gt; entries in the cluster.ini file.
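
The effect of is_update can be sketched in Python as follows. This is a toy model and not the
Virtuoso partition function: the hash-based choice of partition, the PARTITION_HOSTS layout and
the names partition_hosts and local_host are all assumptions made for illustration.

```python
import random
import zlib

# Hypothetical layout: each partition is stored on a group of host numbers.
PARTITION_HOSTS = [[1, 2], [3, 4]]

def partition_hosts(key_values, is_update, local_host, rng=random):
    """Return the host numbers for a key, mimicking the is_update rule."""
    # Assumed for illustration: the partition is picked by hashing the key parts.
    digest = zlib.crc32(repr(tuple(key_values)).encode("utf-8"))
    hosts = PARTITION_HOSTS[digest % len(PARTITION_HOSTS)]
    if is_update:
        return list(hosts)        # an update must reach every copy
    if local_host in hosts:
        return [local_host]       # prefer the local copy
    return [rng.choice(hosts)]    # otherwise any one copy will do
```

With is_update set, the caller receives every host that stores the partition. Without it, the
caller receives a single host, which is the local one whenever the local host holds a copy.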
    

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingdpipe.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Distributed Pipe</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingdpipe.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Distributed Pipe</title>
    <description>A distributed pipe is a single construct that can be used for map-reduce and stream
transformation. It is a further development of the DAQ.
    

A dpipe is an object which accepts a series of input rows and generates an equal number
of output rows. It may or may not preserve order and it may or may not be transactional. The input
row of a dpipe is a tuple of values. Each element of the tuple has a corresponding transformation. The
transformation is expressed as a partitioned SQL function, basically a function callable by daq_call,
with arguments specifying the partition where it is to be run. The output row is formed by gathering
together the transformation results of each element of the input tuple.
    

Conceptually, this is a map operation, similar to running several DAQs, one for each column
of the dpipe. A transformation function does not always need to produce a value. It may also produce
a second partitioned function call with new arguments which will be partitioned and scheduled by the
dpipe. Since the second function is independently partitioned, this may be used for implementing a
reduce phase. This phase may then return a value and/or further functions to be called.
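
The control flow described above can be modelled in a single Python process. The sketch below is
only an illustration: run_dpipe, value, call and the sample transformations are hypothetical
names, and partitioning, scheduling and network transport are left out entirely.

```python
# Single-process toy model of a dpipe. All names are hypothetical; partitioning
# and network transport are left out, only the control flow is modelled.

def value(v):
    return ("value", v)

def call(func, *args):
    return ("call", func, args)

def run_dpipe(transformations, rows):
    """Apply one transformation per column and follow chained calls until
    each element has produced a value. One output row per input row."""
    output = []
    for row in rows:
        out_row = []
        for func, element in zip(transformations, row):
            result = func(element)
            while result[0] == "call":   # a further function to schedule
                _, next_func, args = result
                result = next_func(*args)
            out_row.append(result[1])
        output.append(tuple(out_row))
    return output

counts = {}

def count_word(word):
    # Second phase ("reduce"): would run on the partition that owns `word`.
    counts[word] = counts.get(word, 0) + 1
    return value(counts[word])

def normalize(word):
    # First phase ("map"): produces a second call instead of a value.
    return call(count_word, word.lower())

def length(text):
    return value(len(text))

rows = [("Cluster", "abc"), ("cluster", "de")]
print(run_dpipe([normalize, length], rows))
```

Here the first column goes through two phases, a normalizing step that hands over to a counting
step, while the second column produces its value directly. Each input row still yields exactly
one output row.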
    

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingclandrdf.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Cluster and RDF</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingclandrdf.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Cluster and RDF</title>
    <description>The RDF tables are partitioned by default on any fresh clustered database. Thus RDF
operations are not affected by clustering.
    

For RDF loading, use the single-threaded load functions DB.DBA.RDF_LOAD_RDFXML and DB.DBA.TTLP.
These should essentially always be run in row autocommit mode and without logging. To do so, call
log_enable (2) on the connection before invoking these functions.
    

Running these functions in the default transactional mode will load within the current
transaction. This will cause widespread locking and will run out of rollback space after some
millions of triples. This mode has strict transactional semantics, but these are generally not
relevant in RDF applications.
    

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingvirtdbandrepl.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Cluster, Virtual Database and Replication</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingvirtdbandrepl.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Cluster, Virtual Database and Replication</title>
    <description>Clustering has no relation to any virtual database, transactional or snapshot replication
mechanism on Virtuoso.
    

Transactional replication is not supported with clustering. Snapshot replication will work.

Virtual database operations work the same as on single-process Virtuoso databases.
All operations on remote tables are done by the cluster node running the SQL statement. For purposes
of symmetry, it is desirable to have all the remote data sources defined for all server processes so
that they can be used interchangeably.
    

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogramminglimalpha.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Limitations of Alpha 6.0</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogramminglimalpha.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Limitations of Alpha 6.0</title>
    <description>DDL

SQL

</description>
  </item>
  <item>
    <guid>http://docs.openlinksw.com/virtuoso/clusterprogrammingtrbsht.html</guid>
    <author>virtuoso.docs@openlinksw.com</author>
    <category>Troubleshooting</category>
    <link>http://docs.openlinksw.com/virtuoso/clusterprogrammingtrbsht.html</link>
    <pubDate>Mon, 16 Nov 2009 14:26:59 GMT</pubDate>
    <title>Troubleshooting</title>
    <description>If an operation seems to hang, see the output of status ().

Check for the presence of the following conditions:

</description>
  </item>
 </channel>
</rss>
