16.5.9. Lookup Optimization -- BIJECTION and RETURNS Options

IRI class declarations have one subtle problem. To benefit from a relational index, the SPARQL optimizer should compose an equality between a table column and a known SQL value, not between the return value of an IRI class and a known composed IRI. In addition, redundant calculation of IRIs takes time. To enable this optimization, an IRI class declaration should end with an option (bijection) clause. For some simple format strings the compiler may recognize the bijection automatically, but an explicit declaration is always a good idea.
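
The effect of a bijective IRI class on lookups can be illustrated outside Virtuoso. The following Python sketch is only a model, not how the SPARQL compiler is implemented: the format string, the ITEMS dictionary (standing in for an indexed table) and all function names are hypothetical. It contrasts composing an IRI for every row with inverting the known IRI once and probing the index.

```python
import re

# Hypothetical IRI class with one format string; not taken from Virtuoso.
IRI_FORMAT = "http://example.com/item%d"
_IRI_RE = re.compile(r"http://example\.com/item(-?[0-9]+)")

# Toy stand-in for a table with an index on its integer key.
ITEMS = {1: "pen", 2: "ink", 42: "paper"}


def item_iri(item_id):
    """Forward mapping: key column value -> IRI string."""
    return IRI_FORMAT % item_id


def item_iri_inverse(iri):
    """Inverse mapping: IRI string -> key value, or None if not in the class."""
    m = _IRI_RE.fullmatch(iri)
    if m is None:
        return None
    item_id = int(m.group(1))
    # Reject non-canonical spellings such as "item007" so that the two
    # functions stay exact inverses of each other.
    return item_id if item_iri(item_id) == iri else None


def lookup_by_scan(iri):
    """No bijection known: compose an IRI per row and compare strings."""
    return [name for item_id, name in ITEMS.items() if item_iri(item_id) == iri]


def lookup_by_index(iri):
    """Bijection known: invert the constant once, then probe the index."""
    item_id = item_iri_inverse(iri)
    return [ITEMS[item_id]] if item_id in ITEMS else []
```

Both lookups return the same rows for any IRI, but the second never formats an IRI per row. This is roughly the kind of rewrite that option (bijection) permits.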

Note:

See also: Wikipedia - Bijection. In mathematics, a bijection, or bijective function, is a function f from a set X to a set Y such that, for every y in Y, there is exactly one x in X such that f(x) = y.

Alternatively, f is bijective if it is a one-to-one correspondence between those sets; i.e., both one-to-one (injective) and onto (surjective).

The SPARQL compiler may produce large amounts of SQL code when a query contains an equality between two calculated IRIs and these IRIs may come from many different IRI classes. It is possible to provide hints that let the compiler check whether two IRI classes form disjoint sets of possible IRI values. The more disjoint sets are found, the fewer possible combinations remain, so the resulting SQL query will contain fewer unions of joins. The SPARQL compiler can prove some properties of sprintf format strings. For example, it can prove that the set of all strings printed by "http://example.com/item%d" and the set of strings printed by "http://example.com/item%d/" are disjoint. It can also prove some more complicated statements about unions and intersections of sets of strings. An IRI or literal class declaration may contain an option (returns ...) clause that specifies one or more sprintf patterns that cover the set of generated values. Consider a better version of the IRI class declaration listed above:

create iri class oplsioc:grantee_iri using
  function DB.DBA.GRANTEE_URI (in id integer)
    returns varchar,
  function DB.DBA.GRANTEE_URI_INVERSE (in id_iri varchar)
    returns integer
  option ( bijection,
    returns "http://myhost/sys/group?id=%d"
    union   "http://myhost/sys/user?id=%d" ) .
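
To see what the patterns in a returns clause give the compiler, the sketch below translates sprintf formats into regular expressions and reports which declared patterns could have produced a given IRI. It is an illustration in Python under stated assumptions, not the compiler's algorithm: the helper names are hypothetical, only %d and %u are handled, and checking sample strings shows disjointness for those samples only, whereas the compiler proves it for all strings.

```python
import re

# Assumed output alphabets of the directives handled by this sketch.
_DIRECTIVES = {"%d": r"-?[0-9]+", "%u": r"[0-9]+"}


def format_to_regex(fmt):
    """Translate a sprintf format with %d/%u directives into a regex."""
    parts = re.split(r"(%[du])", fmt)
    # Directives become digit patterns; everything else is literal text.
    return re.compile("".join(_DIRECTIVES.get(p) or re.escape(p) for p in parts))


def matching_patterns(iri, patterns):
    """Return the patterns whose set of printable strings contains iri."""
    return [p for p in patterns if format_to_regex(p).fullmatch(iri)]


ITEM_PATTERNS = ["http://example.com/item%d", "http://example.com/item%d/"]
GRANTEE_PATTERNS = ["http://myhost/sys/group?id=%d", "http://myhost/sys/user?id=%d"]
```

For example, matching_patterns("http://myhost/sys/user?id=5", GRANTEE_PATTERNS) reports only the user pattern. When no string can match two patterns at once, an equality between values of those patterns can never hold, so no SQL needs to be generated for that combination.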

It is very important to keep IRI classes easy to parse and easy to distinguish by the text of the IRI string.

  • Format %U is better than %s, especially in the middle of an IRI, because the %U fragment cannot contain characters like "/" or "="; one may prove that /%U/ and /abra%d/cadabra/ are disjoint but /%s/ and /abra%d/cadabra/ are not disjoint.

  • It is better when a variable part like %U or %d is placed between characters that cannot occur in the %U or %d output, i.e. %U is placed between "/", "&" or "=" and %d is placed between non-digits; order_line_%d is better than order-line-%d because a minus sign may be part of the %d output.

  • End-of-line is treated as a special character, so placing %U or %d between "/" and end of line is as good as placing it between two "/".
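
The guidelines above can be checked on small samples. In this Python sketch the regular expressions that stand for %s, %U and %d output are assumptions made for illustration (in particular, the exact set of characters that %U leaves unescaped is assumed, beyond the statement that "/" and "=" cannot appear), and the double-separator formats are hypothetical.

```python
import re

# Assumed models of what each directive can print.
S = r".*"                                       # %s: any string
U = r"(?:[A-Za-z0-9_.~-]|%[0-9A-Fa-f]{2})*"     # %U: URL-escaped, no "/" or "="
D = r"-?[0-9]+"                                 # %d: optionally signed integer


def can_print(regex_body, text):
    """True if the modelled format can print exactly this text."""
    return re.fullmatch(regex_body, text, re.DOTALL) is not None


sample = "/abra7/cadabra/"
abra_matches = can_print("/abra" + D + "/cadabra/", sample)
s_overlaps = can_print("/" + S + "/", sample)   # %s may swallow the inner "/"
u_overlaps = can_print("/" + U + "/", sample)   # %U may not

# "-" can belong to %d output, so two hyphen-separated formats may overlap.
hyphen_a = can_print("order-line-" + D, "order-line--5")   # reads -5
hyphen_b = can_print("order-line--" + D, "order-line--5")  # reads 5
# "_" never belongs to %d output, so the same text has one reading only.
under_a = can_print("order_line_" + D, "order_line__5")
under_b = can_print("order_line__" + D, "order_line__5")
```

In this model s_overlaps is true while u_overlaps is false, and both hyphen formats accept the same string while only one underscore format does. Separators that the variable part can never print are what make such formats easy to tell apart.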

In some cases, option (returns ...) can be used for IRI classes that are declared using a sprintf format when the actual data have a more specific format. Consider a literal class declaration that is used to output strings, where the application knows that all these strings are ISBN numbers:

create literal class example:isbn_ref "%s" (in isbn varchar not null)
  option ( bijection, returns "%u-%u-%u-%u" union "%u-%u-%u-X" )
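
As a quick illustration of how much the returns clause narrows the "%s" format, the Python sketch below models the two declared patterns as one regular expression and tests a few strings. The function name and the sample values are chosen for illustration; the check covers only the hyphenated shape that the patterns describe and does not validate ISBN check digits.

```python
import re

# "%u-%u-%u-%u" union "%u-%u-%u-X", with %u modelled as one or more digits.
ISBN_SHAPE = re.compile(r"[0-9]+-[0-9]+-[0-9]+-(?:[0-9]+|X)")


def covered_by_returns(value):
    """True if value has a shape that the two returns patterns can print."""
    return ISBN_SHAPE.fullmatch(value) is not None
```

Under this model, "0-306-40615-2" and "0-8044-2957-X" are covered, while an unhyphenated value such as "0306406152" is not. A declaration like this is therefore only safe if the stored data always use the hyphenated form.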

Sometimes interoperability restrictions will force you to violate these rules, but please try to follow them as often as possible.