
15.7. Dereferencable IRIs and RDF Linked Data

There are many cases when RDF data should be retrieved from remote sources only when really needed. For example, a scheduling application may read personal calendars from the personal sites of its users. Calendar data expires quickly, so there is no reason to re-load it frequently in the hope that it is queried before it expires.

Virtuoso extends SPARQL so that it is possible to download an RDF resource from a given IRI, parse it, and store the resulting triples in a graph; all three operations are performed during SPARQL query execution. The IRI of the graph where the triples are stored is usually equal to the IRI from which the resource is downloaded, so the feature is named "IRI dereferencing".

There are two different use cases for this feature. In the simple case, a SPARQL query contains from clauses that enumerate the graphs to process, but there are no triples in DB.DBA.RDF_QUAD that correspond to some of these graphs. Query execution starts by dereferencing these graphs, and the rest runs as usual.

In the more sophisticated case, the query is executed many times in a loop. Every execution produces a partial result. The SPARQL processor checks the result for IRIs of resources that may contain relevant data but are not yet loaded into DB.DBA.RDF_QUAD, and loads them. After some iteration, the partial result is identical to the result of the previous iteration, because there is no more data to retrieve. As the last step, the SPARQL processor builds the final result set.
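The loop in the second use case can be illustrated with a small Python model. This is only a sketch of the idea, not Virtuoso's implementation: the WEB dictionary, the "name" predicate, the example.org IRIs, and the function names are all hypothetical stand-ins, and "dereferencing" is reduced to a dictionary lookup.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical remote resources: IRI -> triples obtained after parsing it.
WEB: Dict[str, List[Triple]] = {
    "http://example.org/a": [
        ("http://example.org/a#me", "name", "Ann"),
        ("http://example.org/a#me", SEE_ALSO, "http://example.org/b"),
    ],
    "http://example.org/b": [
        ("http://example.org/b#me", "name", "Bob"),
        ("http://example.org/b#me", SEE_ALSO, "http://example.org/a"),
    ],
}


def dereference(iri: str, store: Dict[str, Set[Triple]]) -> None:
    """Stand-in for download + parse + store into the graph named by the IRI."""
    store[iri] = set(WEB.get(iri, []))


def run_query(store: Dict[str, Set[Triple]]) -> Tuple[Set[str], Set[str]]:
    """Stand-in for one query execution: names found and seeAlso links seen."""
    names: Set[str] = set()
    links: Set[str] = set()
    for triples in store.values():
        for _subject, predicate, obj in triples:
            if predicate == "name":
                names.add(obj)
            elif predicate == SEE_ALSO:
                links.add(obj)
    return names, links


def grab(start_iri: str, max_iterations: int = 10) -> List[str]:
    """Repeat the query until the partial result stops changing."""
    store: Dict[str, Set[Triple]] = {}
    dereference(start_iri, store)
    previous = None
    for _ in range(max_iterations):
        partial = run_query(store)
        if partial == previous:
            break  # nothing new was retrieved in the last round
        for iri in sorted(partial[1]):
            if iri not in store:  # only load graphs that are not stored yet
                dereference(iri, store)
        previous = partial
    names, _links = run_query(store)  # final result set
    return sorted(names)


print(grab("http://example.org/a"))
```

The max_iterations bound is an assumption added here so that the sketch always terminates; the loop normally ends earlier, as soon as two consecutive partial results are equal.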

15.7.1. IRI Dereferencing For FROM Clauses, "define get:..." Pragmas

Virtuoso extends the SPARQL syntax of the from and from named clauses. It allows an additional list of options at the end of the clause: option ( param1 value1, param2 value2, ... ), where parameter names are QNames that start with the get: prefix and values are "precode" expressions, i.e. expressions that do not contain variables other than external parameters. Names of allowed parameters are listed below.
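As an illustration of the clause shape described above, the following Python sketch assembles a from or from named clause with an option list. The helper name from_clause and the example.org IRI are hypothetical; the sketch only handles plain string values, whereas the syntax allows any "precode" expression, and the get:soft "soft" pair is borrowed from the examples elsewhere in this section.

```python
from typing import Dict


def from_clause(graph_iri: str, options: Dict[str, str], named: bool = False) -> str:
    """Build 'from <iri> option ( name "value", ... )' as plain text."""
    for name, value in options.items():
        # Parameter names must be QNames with the get: prefix.
        if not name.startswith("get:"):
            raise ValueError("option name must start with 'get:': " + name)
        # Keep the sketch simple: refuse values that would need escaping.
        if '"' in value:
            raise ValueError("value must not contain a double quote: " + value)
    clause = ("from named " if named else "from ") + "<" + graph_iri + ">"
    if options:
        pairs = ", ".join('{} "{}"'.format(n, v) for n, v in options.items())
        clause += " option ( " + pairs + " )"
    return clause


print(from_clause("http://example.org/user1.ttl", {"get:soft": "soft"}))
```

Validating the get: prefix up front mirrors the rule stated in the text; which parameter names the server actually accepts is not checked here.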


15.7.2. IRI Dereferencing For Variables, "define input:grab-..." Pragmas

Consider a set of personal data in which one resource can list many persons and point to resources where those persons are described in more detail. For example, the resource about user1 describes that user and also contains statements that user2 and user3 are persons and that more data can be found in user2.ttl and user3.ttl; user3.ttl can in turn contain statements that user4 is also a person and that more data can be found in user4.ttl, and so on. The query should find as many users as possible and return their names and e-mails.

If all data about all users were loaded into the database, the query could be quite simple:

SQL>sparql
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?id ?firstname ?nick
where
  {
    graph ?g
      {
        ?id rdf:type foaf:Person.
        ?id foaf:firstName ?firstname.
        ?id foaf:knows ?fn .
        ?fn foaf:nick ?nick.
      }
   }
limit 10;

id                                                      firstname  nick
VARCHAR                                                 VARCHAR    VARCHAR
_______________________________________________________________________________

http://myopenlink.net/dataspace/person/pmitchell#this   LaRenda    sdmonroe
http://myopenlink.net/dataspace/person/pmitchell#this   LaRenda    kidehen{at}openlinksw.com
http://myopenlink.net/dataspace/person/pmitchell#this   LaRenda    alexmidd
http://myopenlink.net/dataspace/person/abm#this         Alan       kidehen{at}openlinksw.com
http://myopenlink.net/dataspace/person/igods#this       Cameron    kidehen{at}openlinksw.com
http://myopenlink.net/dataspace/person/goern#this       Christoph  captsolo
http://myopenlink.net/dataspace/person/dangrig#this     Dan        rickbruner
http://myopenlink.net/dataspace/person/dangrig#this     Dan        sdmonroe
http://myopenlink.net/dataspace/person/dangrig#this     Dan        lszczepa
http://myopenlink.net/dataspace/person/dangrig#this     Dan        kidehen

10 Rows. -- 80 msec.

It is possible to enable IRI dereferencing in such a way that all appropriate resources are loaded during the query execution even if names of some of them are not known a priori.

SQL>sparql
  define input:grab-var "?more"
  define input:grab-depth 10
  define input:grab-limit 100
  define input:grab-base "http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1300"
  prefix foaf: <http://xmlns.com/foaf/0.1/>
  prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?id ?firstname ?nick
where {
    graph ?g {
               ?id rdf:type foaf:Person.
               ?id foaf:firstName ?firstname.
               ?id foaf:knows ?fn .
               ?fn foaf:nick ?nick.
               optional { ?id rdfs:seeAlso ?more }
            }
}
limit 10;

id                                                         firstname  nick
VARCHAR                                                    VARCHAR    VARCHAR
_______________________________________________________________________________

http://myopenlink.net/dataspace/person/ghard#this          Yrjänä     kidehen
http://inamidst.com/sbp/foaf#Sean                          Sean       d8uv
http://myopenlink.net/dataspace/person/dangrig#this        Dan        rickbruner
http://myopenlink.net/dataspace/person/dangrig#this        Dan        sdmonroe
http://myopenlink.net/dataspace/person/dangrig#this        Dan        lszczepa
http://myopenlink.net/dataspace/person/dangrig#this        Dan        kidehen
http://captsolo.net/semweb/foaf-captsolo.rdf#Uldis_Bojars  Uldis      mortenf
http://captsolo.net/semweb/foaf-captsolo.rdf#Uldis_Bojars  Uldis      danja
http://captsolo.net/semweb/foaf-captsolo.rdf#Uldis_Bojars  Uldis      zool
http://myopenlink.net/dataspace/person/rickbruner#this     Rick       dangrig

10 Rows. -- 530 msec.

The IRI dereferencing is controlled by the following pragmas:

The default resolver procedure is DB.DBA.RDF_GRAB_RESOLVER_DEFAULT(). Note that the procedure produces two absolute URIs, abs_uri and dest_uri. The default procedure returns two equal strings, but other resolvers may return different values, e.g., the primary and permanent location of the resource as dest_uri and the fastest known mirror location as abs_uri, thus saving HTTP retrieval time. A resolver can even signal an error to block the download of an unwanted resource.

DB.DBA.RDF_GRAB_RESOLVER_DEFAULT (
  in base varchar,         -- base IRI as specified by input:grab-base pragma
  in rel_uri varchar,      -- IRI of the resource as it is specified by input:grab-iri or a value of a variable
  out abs_uri varchar,     -- the absolute IRI that should be downloaded
  out dest_uri varchar,    -- the graph IRI where triples should be stored after download
  out get_method varchar ) -- the HTTP method to use, should be "GET" or "MGET".
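The resolver contract can be modelled in Python to show how abs_uri and dest_uri may differ. This is a sketch under stated assumptions, not the Virtuoso procedure: it assumes relative IRIs are resolved against the base in the usual URL-joining manner, and the mirror table, the blocked prefix, and both function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urljoin

# Hypothetical mapping from a primary site to a faster mirror.
MIRRORS = {"http://example.org/": "http://mirror.example.net/"}
# Hypothetical prefixes whose resources should never be downloaded.
BLOCKED_PREFIXES = ("http://ads.example.com/",)


def resolver_default(base: str, rel_uri: str) -> Tuple[str, str, str]:
    """Return (abs_uri, dest_uri, get_method) with two equal absolute URIs."""
    abs_uri = urljoin(base, rel_uri)
    return abs_uri, abs_uri, "GET"


def resolver_with_mirrors(base: str, rel_uri: str) -> Tuple[str, str, str]:
    """Store under the primary IRI (dest_uri) but download from a mirror (abs_uri)."""
    dest_uri = urljoin(base, rel_uri)
    if dest_uri.startswith(BLOCKED_PREFIXES):
        # Signalling an error blocks the download of an unwanted resource.
        raise ValueError("download blocked: " + dest_uri)
    abs_uri = dest_uri
    for primary, mirror in MIRRORS.items():
        if dest_uri.startswith(primary):
            abs_uri = mirror + dest_uri[len(primary):]
            break
    return abs_uri, dest_uri, "GET"


print(resolver_with_mirrors("http://example.org/people/user1.ttl", "user2.ttl"))
```

Keeping dest_uri equal to the primary location means the graph name stays stable even if the mirror table changes later.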

15.7.3. Examples of other Web Resolvers

Example of LSIDs: A scientific name from UBio

SQL>sparql
define get:soft "soft"
select *
from <urn:lsid:ubio.org:namebank:11815>
where { ?s ?p ?o }
limit 5;

s                                 p                                           o
VARCHAR                           VARCHAR                                     VARCHAR
_______________________________________________________________________________

urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/title       Pternistis leucoscepus
urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/subject     Pternistis leucoscepus (Gray, GR) 1867
urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/identifier  urn:lsid:ubio.org:namebank:11815
urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/creator     http://www.ubio.org
urn:lsid:ubio.org:namebank:11815  http://purl.org/dc/elements/1.1/type        Scientific Name

5 Rows. -- 741 msec.

Example of LSIDs: A segment of the human genome from GDB

SQL>sparql
define get:soft "soft"
select *
from <urn:lsid:gdb.org:GenomicSegment:GDB132938>
where { ?s ?p ?o }
limit 5;

s                                          p                                                     o
VARCHAR                                    VARCHAR                                               VARCHAR
_______________________________________________________________________________

urn:lsid:gdb.org:GenomicSegment:GDB132938  urn:lsid:gdb.org:DBObject-predicates:accessionID      GDB:132938
urn:lsid:gdb.org:GenomicSegment:GDB132938  http://www.ibm.com/LSID/2004/RDF/#lsidLink            urn:lsid:gdb.org:DBObject:GDB132938
urn:lsid:gdb.org:GenomicSegment:GDB132938  urn:lsid:gdb.org:DBObject-predicates:objectClass      DBObject
urn:lsid:gdb.org:GenomicSegment:GDB132938  urn:lsid:gdb.org:DBObject-predicates:displayName      D20S95
urn:lsid:gdb.org:GenomicSegment:GDB132938  urn:lsid:gdb.org:GenomicSegment-predicates:variantsQ  nodeID://1000027961

5 Rows. -- 822 msec.

Example of OAI: an institutional / departmental repository.

SQL>sparql
define get:soft "soft"
select *
from <oai:etheses.bham.ac.uk:23>
where { ?s ?p ?o }
limit 5;

s                           p                                           o
VARCHAR                     VARCHAR                                     VARCHAR
_____________________________________________________________________________

oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/title       A study of the role of ATM mutations in the pathogenesis of B-cell chronic lymphocytic leukaemia
oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/date        2007-07
oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/subject     RC0254 Neoplasms. Tumors. Oncology (including Cancer)
oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/identifier  Austen, Belinda (2007) A study of the role of ATM mutations in the pathogenesis of B-cell chronic lymphocytic leukaemia. Ph.D. thesis, University of Birmingham.
oai:etheses.bham.ac.uk:23   http://purl.org/dc/elements/1.1/identifier  http://etheses.bham.ac.uk/23/1/Austen07PhD.pdf

5 Rows. -- 461 msec.

Example of DOI

To execute queries with the DOI resolver correctly, you need to have:

SQL>sparql
define get:soft "soft"
select *
from <doi:10.1045/march99-bunker>
where { ?s ?p ?o } ;

s                                                      p                                                 o
VARCHAR                                                VARCHAR                                           VARCHAR
_______________________________________________________________________________

http://www.dlib.org/dlib/march99/bunker/03bunker.html  http://www.w3.org/1999/02/22-rdf-syntax-ns#type   http://www.openlinksw.com/schemas/XHTML#
http://www.dlib.org/dlib/march99/bunker/03bunker.html  http://www.openlinksw.com/schemas/XHTML#title     Collaboration as a Key to Digital Library Development: High Performance Image Management at the University of Washington

2 Rows. -- 12388 msec.

Other examples

SQL>sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX doap: <http://usefulinc.com/ns/doap#>
SELECT DISTINCT ?name ?mbox ?projectName
WHERE {
 <http://dig.csail.mit.edu/2005/ajar/ajaw/data#Tabulator>
doap:developer ?dev .
 ?dev foaf:name ?name .
 OPTIONAL { ?dev foaf:mbox ?mbox }
 OPTIONAL { ?dev doap:project ?proj .
            ?proj foaf:name ?projectName }
};

name                 mbox     projectName
VARCHAR              VARCHAR  VARCHAR
_______________________________________________________________________________

Adam Lerer           NULL     NULL
Dan Connolly         NULL     NULL
David Li             NULL     NULL
David Sheets         NULL     NULL
James Hollenbach     NULL     NULL
Joe Presbrey         NULL     NULL
Kenny Lu             NULL     NULL
Lydia Chilton        NULL     NULL
Ruth Dhanaraj        NULL     NULL
Sonia Nijhawan       NULL     NULL
Tim Berners-Lee      NULL     NULL
Timothy Berners-Lee  NULL     NULL
Yuhsin Joyce Chen    NULL     NULL

13 Rows. -- 491 msec.
SQL>sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?friendsname ?friendshomepage ?foafsname ?foafshomepage
WHERE
 {
  <http://myopenlink.net/dataspace/person/kidehen#this> foaf:knows ?friend .
  ?friend foaf:mbox_sha1sum ?mbox .
  ?friendsURI foaf:mbox_sha1sum ?mbox .
  ?friendsURI foaf:name ?friendsname .
  ?friendsURI foaf:homepage ?friendshomepage .
  OPTIONAL { ?friendsURI foaf:knows ?foaf .
              ?foaf foaf:name ?foafsname .
              ?foaf foaf:homepage ?foafshomepage .
           }
 };

friendsname       friendshomepage                                       foafsname       foafshomepage
VARCHAR           VARCHAR                                               VARCHAR         VARCHAR
_______________________________________________________________________________

Tim Berners-Lee   http://www.w3.org/People/Berners-Lee/                 Dan Connolly    http://www.w3.org/People/Connolly/
Tim Berners-Lee   http://www.w3.org/People/Berners-Lee/                 Eric Miller     http://purl.org/net/eric/
Dave Beckett      http://www.dajobe.org/                                NULL            NULL
Richard Cyganiak  http://richard.cyganiak.de/                           Dan Connolly    http://www.w3.org/People/Connolly/

...
73 Rows. -- 1452 msec.
SQL>