1.5.73. How to optimize bif:dateadd in a SPARQL query using a selective, index-friendly filter?
How?
Assume the following SPARQL query:
SELECT ?wiki, ?dbp, bif:datediff('second', xsd:dateTime(?extracted), now()) AS ?secondsAgo
FROM <http://nl.dbpedia.org>
WHERE
  {
    ?wiki foaf:primaryTopic ?dbp .
    ?dbp dcterms:modified ?extracted .
    FILTER ( bif:datediff('minute', now(), xsd:dateTime(?extracted)) <= 10 )
  }
ORDER BY DESC(?extracted)
LIMIT 30
Let's take a look at the calculation of:
FILTER ( bif:datediff('minute', now(), xsd:dateTime(?extracted)) <= 10 ) .
For each "is modified" triple we:
- Convert ?extracted to xsd:dateTime;
- Calculate the datediff;
- Compare the result to find out whether it falls inside or outside the 10-minute interval.
Written this way, the filter can lead to a long loop over every such triple: even if the optimizer realizes that the filter is selective, it cannot discover why it is selective, so it cannot turn the condition into an index lookup.
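The per-triple work described above can be pictured with a small Python model. This is only an illustrative sketch, not Virtuoso's actual execution plan: the function name scan_recent, the (subject, value) row layout, and the use of ISO strings are all assumptions made for the example, and the comparison models the intended "modified within the last 10 minutes" check.

```python
from datetime import datetime

def scan_recent(rows, now, max_minutes=10):
    """Toy model of a filter that must be evaluated for every row.

    `rows` is a hypothetical list of (subject, iso_timestamp_string) pairs.
    Returns the matching subjects and the number of rows evaluated.
    """
    hits, evaluated = [], 0
    for subject, raw_value in rows:
        extracted = datetime.fromisoformat(raw_value)         # 1. cast the value
        minutes_ago = (now - extracted).total_seconds() / 60  # 2. compute the difference
        evaluated += 1
        if minutes_ago <= max_minutes:                        # 3. compare with the interval
            hits.append(subject)
    return hits, evaluated

now = datetime(2024, 1, 1, 12, 0, 0)
rows = [
    ("a", "2024-01-01T10:00:00"),
    ("b", "2024-01-01T11:55:00"),
    ("c", "2024-01-01T11:59:30"),
]
hits, evaluated = scan_recent(rows, now)
print(hits, evaluated)  # ['b', 'c'] 3
```

Note that the number of evaluated rows always equals the number of input rows, no matter how few of them match.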
Now let's change the filter to:
FILTER ( ?extracted > bif:dateadd('minute', -10, now())) .
now() can be calculated once at the very beginning of query execution, because it does not depend on any row of any table. bif:dateadd then has all of its arguments known, so the whole bif:dateadd('minute', -10, now()) expression can be calculated just once, producing a constant value. Therefore FILTER ( ?extracted > some_known_value ) can be executed as a single search step: look in the index and get the triples with known P, known G, O greater than the given value, and any S. That is a fast and predictable step, good for both the optimizer and the runtime.
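The single search step can be modelled in Python with a sorted list and a binary search. This is a minimal sketch under stated assumptions, not a description of Virtuoso internals: the parallel lists objects and subjects stand in for an index ordered by O within a fixed P and G, and range_search is a hypothetical helper name.

```python
import bisect
from datetime import datetime, timedelta

def range_search(objects, subjects, now, minutes=10):
    """Toy model of an index range scan.

    `objects` is a list of datetimes sorted in ascending order and
    `subjects[i]` is the subject stored with `objects[i]` (both hypothetical).
    Returns the (object, subject) pairs whose object is strictly greater
    than the cutoff.
    """
    cutoff = now - timedelta(minutes=minutes)     # computed once, like dateadd(..., now())
    start = bisect.bisect_right(objects, cutoff)  # first position with object > cutoff
    return list(zip(objects[start:], subjects[start:]))

now = datetime(2024, 1, 1, 12, 0, 0)
objects = [now - timedelta(minutes=m) for m in (60, 10, 5, 1)]  # ascending order
subjects = ["a", "b", "c", "d"]
print([s for _, s in range_search(objects, subjects, now)])  # ['c', 'd']
```

The binary search finds the start of the matching range without touching the older entries, and the result comes back already ordered by the timestamp.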
We can rephrase the query so that the filter is index-friendly:
SELECT ?wiki, ?dbp, bif:datediff('second', xsd:dateTime(?extracted), now()) AS ?secondsAgo
FROM <http://nl.dbpedia.org>
WHERE
  {
    ?wiki foaf:primaryTopic ?dbp .
    ?dbp dcterms:modified ?extracted .
    FILTER ( ?extracted > bif:dateadd('minute', -10, now()) )
  }
ORDER BY DESC(?extracted)
LIMIT 30
In this case the presence or absence of ORDER BY does not matter much, because the query is far more straightforward: the selective, index-friendly filter is applied first, and the selection can be ordered naturally via the hot index used by the filter.
Note also that if you know the datatype of an object literal, there is no need to write a cast such as xsd:dateTime: the cast can make an expression index-unfriendly even if it always returns its argument unchanged on your specific data.