1.5.73. How to optimize bif:dateadd in SPARQL query using selective index-friendly filter?

What?

Optimizing a date-range filter in a SPARQL query by rewriting it with bif:dateadd into a selective, index-friendly form.

Why?

To get faster results and more predictable query performance.

How?

Assume the following SPARQL query:

SELECT ?wiki
       ?dbp
       (bif:datediff('second', xsd:dateTime(?extracted), now()) AS ?secondsAgo)
 FROM <http://nl.dbpedia.org>
WHERE
  {
    ?wiki foaf:primaryTopic ?dbp .
    ?dbp dcterms:modified ?extracted .
    FILTER ( bif:datediff('minute', xsd:dateTime(?extracted), now()) <= 10 )
  }
ORDER BY DESC(?extracted)
LIMIT 30

Let's take a look at the calculation of:

FILTER ( bif:datediff('minute', xsd:dateTime(?extracted), now()) <= 10 ) .

For each dcterms:modified triple, the engine has to:

Convert ?extracted to xsd:dateTime;
Calculate bif:datediff;
Compare the result with 10 to find out whether the value falls within the 10-minute interval.

Written this way, the filter leads to a potentially long loop over every dcterms:modified triple: even if the optimizer realizes that the filter is selective, it cannot discover why it is so selective, and therefore cannot turn it into an index lookup.
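The following Python sketch models that per-row work outside Virtuoso. The sample triples, the `naive_filter` function, and the fixed `now` value are hypothetical; the sketch only illustrates that every row must be cast, diffed, and compared before the engine knows whether it matches, so the cost grows with the total number of dcterms:modified triples rather than with the number of hits.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for dcterms:modified triples: (subject, value as a string).
# This models only the per-row work; it is not how Virtuoso stores data.
TRIPLES = [
    ("dbp:A", "2016-09-09T16:00:00"),
    ("dbp:B", "2016-09-09T16:12:00"),
    ("dbp:C", "2016-09-09T15:30:00"),
    ("dbp:D", "2016-09-09T16:15:00"),
]


def naive_filter(triples, now, window_minutes=10):
    """Evaluate a datediff-style filter on every row; return (hits, rows_evaluated)."""
    hits, evaluated = [], 0
    for subject, raw in triples:
        evaluated += 1
        extracted = datetime.fromisoformat(raw)  # step 1: cast the literal
        # Step 2: difference in minutes. Fractional here; bif:datediff returns an
        # integer, so behavior exactly at the boundary may differ.
        minutes_ago = (now - extracted) / timedelta(minutes=1)
        if minutes_ago <= window_minutes:  # step 3: compare with the window
            hits.append(subject)
    return hits, evaluated


hits, evaluated = naive_filter(TRIPLES, now=datetime(2016, 9, 9, 16, 20))
print(hits, evaluated)  # ['dbp:B', 'dbp:D'] 4
```

Only two rows match, yet all four were evaluated; with millions of triples the same ratio applies.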

Now let's change the filter to:

FILTER ( ?extracted > bif:dateadd('minute', -10, now())) .

now() can be calculated once, at the very beginning of query execution, because it does not depend on any row of any table. All arguments of bif:dateadd are then known, so the whole expression bif:dateadd('minute', -10, now()) is also calculated only once and yields a constant. Therefore FILTER ( ?extracted > some_known_value ) can be represented as a single search step: look up the index and get the triples with a known P, a known G, an O greater than the given value, and any S. That is a fast and predictable step, good for both the optimizer and the runtime.
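As a rough model of that search step, the Python sketch below treats the index for one known P and one known G as a list sorted by (O, S). The data and the `range_filter` name are hypothetical, and Virtuoso's real index structures are more elaborate; the sketch only shows the two ideas from the paragraph above: the cutoff is computed once, and a single binary search finds where the matching O values start.

```python
import bisect
from datetime import datetime, timedelta

# Hypothetical model of one index: for a fixed P (dcterms:modified) and a fixed
# G (the graph), entries are sorted by (O, S). Virtuoso's real layout differs.
ENTRIES = sorted([
    (datetime(2016, 9, 9, 16, 0), "dbp:A"),
    (datetime(2016, 9, 9, 16, 12), "dbp:B"),
    (datetime(2016, 9, 9, 15, 30), "dbp:C"),
    (datetime(2016, 9, 9, 16, 15), "dbp:D"),
])
KEYS = [o for o, _ in ENTRIES]  # the O column, in index order


def range_filter(now, window_minutes=10):
    """Return subjects whose O is strictly greater than now - window."""
    # Computed once per query, like bif:dateadd('minute', -10, now()).
    cutoff = now - timedelta(minutes=window_minutes)
    # One search step: position of the first O strictly greater than cutoff.
    start = bisect.bisect_right(KEYS, cutoff)
    # Every entry from there on matches; earlier rows are never tested.
    return [s for _, s in ENTRIES[start:]]


print(range_filter(datetime(2016, 9, 9, 16, 20)))  # ['dbp:B', 'dbp:D']
```

`bisect_right` is used because the SPARQL filter uses a strict `>`: a value exactly equal to the cutoff is excluded.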

The query can therefore be rephrased with an index-friendly filter:

SELECT ?wiki
       ?dbp
       (bif:datediff('second', xsd:dateTime(?extracted), now()) AS ?secondsAgo)
 FROM <http://nl.dbpedia.org>
WHERE
  {
    ?wiki foaf:primaryTopic ?dbp .
    ?dbp dcterms:modified ?extracted .
    FILTER ( ?extracted > bif:dateadd('minute', -10, now()) )
  }
ORDER BY DESC(?extracted)
LIMIT 30

In this case the presence or absence of ORDER BY matters much less, because the query is far more straightforward: the selective, index-friendly filter runs first, and its result can be ordered naturally by reading the same hot index the filter used.
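Continuing the same hypothetical model (a sorted list standing in for the index, not Virtuoso internals), the sketch below shows why ORDER BY DESC(?extracted) with LIMIT 30 becomes cheap: the rows matched by the range lookup already sit in O order, so reading them from the high end yields the newest rows first, and the scan can stop after `limit` rows without a separate sort. The `newest_within` name is invented for illustration.

```python
import bisect
from datetime import datetime, timedelta
from itertools import islice

# Same hypothetical (O, S) index for a fixed P and G, as a plain sorted list.
ENTRIES = sorted([
    (datetime(2016, 9, 9, 16, 0), "dbp:A"),
    (datetime(2016, 9, 9, 16, 12), "dbp:B"),
    (datetime(2016, 9, 9, 15, 30), "dbp:C"),
    (datetime(2016, 9, 9, 16, 15), "dbp:D"),
])
KEYS = [o for o, _ in ENTRIES]


def newest_within(now, window_minutes=10, limit=30):
    """Emulate FILTER(?o > cutoff) ORDER BY DESC(?o) LIMIT n without a sort step."""
    cutoff = now - timedelta(minutes=window_minutes)
    start = bisect.bisect_right(KEYS, cutoff)
    # Walk the matching range from the high end: index order already yields
    # descending O, and islice stops as soon as `limit` rows are produced.
    newest_first = (ENTRIES[i] for i in range(len(ENTRIES) - 1, start - 1, -1))
    return [s for _, s in islice(newest_first, limit)]


now = datetime(2016, 9, 9, 16, 20)
print(newest_within(now))           # ['dbp:D', 'dbp:B']
print(newest_within(now, limit=1))  # ['dbp:D']
```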

Note also that if you know the datatype of an object literal, there is no need to write a cast such as xsd:dateTime(): a cast can make an expression index-unfriendly, even if on your specific data it always returns its argument unchanged.
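A toy planner rule in Python illustrates why the cast matters. It is not Virtuoso's optimizer: the `choose_access` function and the `Var`, `Call` and `Const` classes are hypothetical, invented for this sketch. The rule only recognizes a comparison between the bare indexed variable and a constant; wrapping the variable in a function call, such as a cast, hides the index key, even when that call is an identity on the actual data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    arg: object


@dataclass(frozen=True)
class Const:
    value: object


def choose_access(left, op, right, indexed_var="extracted"):
    """Toy rule: only `bare indexed variable <op> constant` maps to a range scan."""
    if op in ("<", "<=", ">", ">=") and left == Var(indexed_var) and isinstance(right, Const):
        return "index range scan"
    # Anything wrapped around the variable, even a cast that returns its input
    # unchanged, hides the index key from this rule.
    return "per-row filter"


cutoff = Const("bif:dateadd('minute', -10, now())")  # folded to a constant once
print(choose_access(Var("extracted"), ">", cutoff))                         # index range scan
print(choose_access(Call("xsd:dateTime", Var("extracted")), ">", cutoff))  # per-row filter
```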

Prefix	IRI
schema	http://schema.org/
n5	http://creativecommons.org/licenses/by/4.0/
n2	http://docs.openlinksw.com/virtuoso/sparqldatediffindexfriendly/
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#
n4	http://www.openlinksw.com/#
xsdh	http://www.w3.org/2001/XMLSchema#

Subject	Predicate	Object
n2:	rdf:type	schema:TechArticle
n2:	rdf:type	schema:APIReference
n2:	schema:name	1.5.73. How to optimize bif:dateadd in SPARQL query using selective index-friendly filter?
n2:	schema:copyrightHolder	_:vb81704
n2:	schema:datePublished	2016-09-09 16:16:54
n2:	schema:headline	1.5.73. How to optimize bif:dateadd in SPARQL query using selective index-friendly filter?
n2:	schema:keywords	OpenLink,Virtuoso,database,RDBMS,relational,SQL,RDF,triple store,linked data,linked open data,Big Data,SPARQL
n2:	schema:license	n5:deed.en_US
n2:	schema:publisher	_:vb81703
n2:	schema:url	n2:
_:vb81703	rdf:type	schema:Organization
_:vb81703	schema:name	OpenLink Software
_:vb81703	schema:url	n4:this
_:vb81704	rdf:type	schema:Organization
_:vb81704	schema:name	OpenLink Software
_:vb81704	schema:url	n4:this
