When setting up a repository in Sesame, you can make a number of choices: should the repository support versioning or security, or should it be as fast as possible? What database will it use, or will it be in-memory?
In this chapter, we look at several of these configuration options in more detail.
The setup of each Sesame repository is configured using the Configure Sesame! tool. As we have already seen in Server administration, this tool lets you tweak numerous parameters, which we discuss in more detail here.
In the repository tab (Figure 3.6, “The "Repository" tab”), you declare the repository id and title. The id is the name by which Sesame knows the repository: all client access must use this identifier.
The title is for human convenience and can be used to give a short description of the repository's purpose. Clients such as the web interface use it to represent the repository to the end user.
The most important part of the repository configuration is the sail stack, which can be found in the "repository details" screen (Figure 3.7, “The "Repository details" window”). Here, you configure where the actual repository storage resides, whether inferencing, security, versioning and similar features should be used, and which additional options are needed.
The sail stack is represented top-to-bottom. In the example, we see two sail declarations: org.openrdf.sesame.sailimpl.sync.SyncRdfSchemaRepository and org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository. The first sail is stacked on top of the second one, which means that it operates by calling methods on the Sail underneath it. The second sail is the base sail: it is the lowest in the stack and operates directly on the actual data source rather than on another sail. In this example, the base sail is an RDF Schema-aware driver for relational databases that currently supports MySQL (3.23.47 and higher), PostgreSQL (7.0.2 and higher) and Oracle 9i.
The SyncRdfSchemaRepository is optional, but we strongly recommend using it. This Sail handles concurrent access; without it, Sesame would behave unpredictably when several users access the repository simultaneously.
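The stacking principle can be illustrated with a small Java sketch. The interface `SimpleSail` and the classes `MemoryBaseSail`, `SyncSail` and `SailStackDemo` are hypothetical and are not Sesame's Sail API. The read/write lock is likewise only an assumed way of handling concurrent access, not a description of how SyncRdfSchemaRepository is implemented. What the sketch does show is the shape of a stack: the upper sail adds one concern and then calls the same method on the sail underneath it, and only the base sail touches the data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical minimal interface; NOT Sesame's real Sail API.
interface SimpleSail {
    void addStatement(String s, String p, String o);
    int size();
}

// Base sail: works directly on the data source (here an unsynchronized in-memory set).
class MemoryBaseSail implements SimpleSail {
    private final Set<List<String>> triples = new HashSet<>();

    public void addStatement(String s, String p, String o) {
        triples.add(List.of(s, p, o));
    }

    public int size() {
        return triples.size();
    }
}

// Stacked sail: adds (assumed) read/write locking, then delegates to the sail below it.
class SyncSail implements SimpleSail {
    private final SimpleSail base;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    SyncSail(SimpleSail base) {
        this.base = base;
    }

    public void addStatement(String s, String p, String o) {
        lock.writeLock().lock();            // writers get exclusive access
        try {
            base.addStatement(s, p, o);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int size() {
        lock.readLock().lock();             // readers may proceed concurrently
        try {
            return base.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}

class SailStackDemo {
    // Builds the stack top-to-bottom and writes distinct statements from several threads.
    static int concurrentInsert(int threads, int perThread) {
        SimpleSail stack = new SyncSail(new MemoryBaseSail());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    stack.addStatement("ex:s" + id + "_" + i, "ex:p", "ex:o");
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stack.size();
    }
}
```

Because every write passes through the lock in `SyncSail`, the unsynchronized `HashSet` in the base sail is never modified by two threads at once. Without the upper layer, the same concurrent writes could corrupt the set.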
Other base sails to choose from include:
All base sails that work on relational databases need a number of parameters to function:
The RDBMS-based sails also take some optional parameters:
The memory-based sails take four optional parameters:
The native sail has one required parameter:
The native sail also has an optional triple-indexes parameter, with which one can specify the indexing strategy the native sail should take. We will explain this in more detail in the next section.
The native store uses B-Trees for indexing statements, where the index key consists of three fields: subject (s), predicate (p) and object (o). The order of these fields in the key determines how useful an index is for a specific triple query pattern. Searching for triples with a specific subject in an index that has the subject as its first field is significantly faster than searching for the same triples in an index where the subject is the second or third field. In the worst case, a query pattern that does not fit the index order results in a sequential scan over the entire set of triples.
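The effect of key order can be illustrated with a Java sketch in which a sorted set stands in for a B-Tree. The class `SortedTripleIndex` is hypothetical and does not reflect the native store's actual code. It stores each triple as a key permuted into the index's field order (for example "spo"). A lookup then does a range scan over the bound key prefix, or scans all entries when the first key field is unbound. The `examined` counter records how many entries a lookup visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: a sorted set stands in for one B-Tree index of the native store.
class SortedTripleIndex {
    // Compares composite keys field by field, as a B-Tree orders its keys.
    private static final Comparator<List<String>> KEY_ORDER = (a, b) -> {
        for (int i = 0; i < 3; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return 0;
    };

    private final String order;                       // key field order, e.g. "spo" or "pos"
    private final TreeSet<List<String>> keys = new TreeSet<>(KEY_ORDER);
    int examined;                                     // entries visited by the last find()

    SortedTripleIndex(String order) {
        this.order = order;
    }

    // Picks field 's', 'p' or 'o' from a triple given in subject-predicate-object order.
    private static String field(List<String> spo, char f) {
        return spo.get("spo".indexOf(f));
    }

    void add(String s, String p, String o) {
        List<String> spo = List.of(s, p, o);
        List<String> key = new ArrayList<>();
        for (char f : order.toCharArray()) key.add(field(spo, f));
        keys.add(key);
    }

    // Finds triples matching a pattern in which null means "unbound".
    // Results are returned as keys in this index's field order.
    List<List<String>> find(String s, String p, String o) {
        List<String> pattern = Arrays.asList(s, p, o);
        // Usable prefix: the leading key fields that are bound in the pattern.
        List<String> prefix = new ArrayList<>();
        for (char f : order.toCharArray()) {
            String v = field(pattern, f);
            if (v == null) break;
            prefix.add(v);
        }
        Iterable<List<String>> candidates = keys;     // empty prefix: scan everything
        if (!prefix.isEmpty()) {
            List<String> low = new ArrayList<>(prefix);
            while (low.size() < 3) low.add("");       // "" sorts before any other string
            candidates = keys.tailSet(low, true);     // range scan starts at the prefix
        }
        examined = 0;
        List<List<String>> result = new ArrayList<>();
        for (List<String> key : candidates) {
            if (!key.subList(0, prefix.size()).equals(prefix)) break; // left the prefix range
            examined++;
            if (matches(key, pattern)) result.add(key);
        }
        return result;
    }

    // Checks every bound field, including those beyond the usable prefix.
    private boolean matches(List<String> key, List<String> pattern) {
        for (int i = 0; i < 3; i++) {
            String v = field(pattern, order.charAt(i));
            if (v != null && !v.equals(key.get(i))) return false;
        }
        return true;
    }
}
```

In this sketch, a lookup by subject on an "spo" index visits only the entries that share that subject. The same lookup on a "pos" index falls back to visiting every entry.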
By default, the native store only uses a single index, with a subject-predicate-object key pattern. However, it is possible to define different indexes for the native store, using the triple-indexes parameter. This can be used to optimize performance for query patterns that occur frequently.
The subject, predicate and object fields are represented by the characters 's', 'p' and 'o', respectively. Indexes are specified as 3-letter words made from these three characters. Multiple indexes can be specified by separating these words with commas, spaces and/or tabs. For example, the string "spo, pos" specifies two indexes: a subject-predicate-object index and a predicate-object-subject index.
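Here is a minimal Java sketch of how such a specification string could be interpreted. The class `IndexSpec` and its selection heuristic are assumptions made for illustration. The heuristic prefers the index whose key starts with the longest run of bound fields. Sesame's own parser and query planner may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parsing a triple-indexes value; not Sesame code.
class IndexSpec {
    // Splits on commas, spaces and tabs; each word must be a permutation of "spo".
    static List<String> parse(String spec) {
        List<String> indexes = new ArrayList<>();
        for (String word : spec.trim().split("[,\\s]+")) {
            if (word.isEmpty()) continue;
            boolean permutation = word.length() == 3
                    && word.indexOf('s') >= 0 && word.indexOf('p') >= 0 && word.indexOf('o') >= 0;
            if (!permutation) throw new IllegalArgumentException("invalid index: " + word);
            indexes.add(word);
        }
        return indexes;
    }

    // Number of leading key fields that are bound in the query ("s", "po", ...).
    static int usablePrefix(String index, String boundFields) {
        int n = 0;
        while (n < index.length() && boundFields.indexOf(index.charAt(n)) >= 0) n++;
        return n;
    }

    // Picks the index with the longest usable prefix; a prefix of 0 means a full scan.
    static String best(List<String> indexes, String boundFields) {
        String best = indexes.get(0);
        for (String idx : indexes) {
            if (usablePrefix(idx, boundFields) > usablePrefix(best, boundFields)) best = idx;
        }
        return best;
    }
}
```

`IndexSpec.parse("spo, pos")` yields the two words `spo` and `pos`. For a query that binds only the predicate and object, `best` picks `pos`. A query that binds only the object gets a usable prefix of 0 from both indexes, which corresponds to a sequential scan.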
Creating multiple indexes speeds up queries whose patterns fit one of the indexes, but there is a cost to take into account as well. Adding and removing data becomes more expensive, because each index has to be updated. Also, each index takes up additional disk space.
The native store automatically creates and drops indexes upon (re)initialization. The parameter can therefore be adjusted at any time. On the first refresh of the configuration, the native store switches to the new indexing strategy without loss of data.
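The reconfiguration step can be pictured as a set difference between the indexes that exist on disk and the configured ones. The Java sketch below is hypothetical. The class `IndexReconciler` and its in-memory sets are stand-ins, not the native store's files or code. It illustrates one way the change can happen without losing data: every index contains all triples, so a missing index can be filled by reordering the keys of any existing index. Obsolete indexes are discarded only afterwards.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of index reconciliation; not the native store's implementation.
class IndexReconciler {
    // Reorders a key from field order 'from' to field order 'to' (both permutations of "spo").
    static List<String> permute(List<String> key, String from, String to) {
        List<String> out = new ArrayList<>();
        for (char f : to.toCharArray()) out.add(key.get(from.indexOf(f)));
        return out;
    }

    // Returns the indexes that remain after (re)initialization with the configured orders.
    static Map<String, Set<List<String>>> reconcile(Map<String, Set<List<String>>> onDisk,
                                                    List<String> configured) {
        // Every index holds all triples, so any existing index can seed the new ones.
        String sourceOrder = onDisk.isEmpty() ? null : onDisk.keySet().iterator().next();
        Map<String, Set<List<String>>> result = new LinkedHashMap<>();
        for (String order : configured) {
            if (onDisk.containsKey(order)) {
                result.put(order, onDisk.get(order));     // already present: keep it
                continue;
            }
            Set<List<String>> built = new HashSet<>();    // missing: create and fill it
            if (sourceOrder != null) {
                for (List<String> key : onDisk.get(sourceOrder)) {
                    built.add(permute(key, sourceOrder, order));
                }
            }
            result.put(order, built);
        }
        // Indexes that are no longer configured are dropped simply by not being carried
        // over, and only after all new indexes have been filled.
        return result;
    }
}
```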
The basic set of RDFS inference rules (as defined in the RDF(S) MT semantics) is sometimes insufficient for building custom applications. For example, some applications need to define their own transitive, symmetric or inverse properties. An infrastructure for defining such custom inference rules lets developers tune the Sesame inferencer to better suit their application.
Since Sesame release 0.95, we provide an alternative inferencer that works with the org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository sail. This custom inferencer can be initialized with a set of axiomatic triples and inference rules defined in an external file. The format of these definitions is simple and is explained in more detail in the next section.
The customizable inferencer also supports inter-rule dependencies: you can state explicitly which rules are triggered when a rule infers a new statement. This information is given in an additional 'triggers_rule' tag inside the 'rule' tag. The 'triggers_rule' tag contains a list of 'rule' tags whose name attributes identify the affected rules.
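The following Java sketch illustrates how such dependency information can drive forward chaining. It is a hypothetical model, not the custom inferencer's implementation. The `Rule` class, the scheduling loop and the two toy rules are assumptions; the toy rules are an rdfs1-like rule and a transitive `ex:partOf` rule, using made-up prefixed names. Every rule runs once. After that, a rule is re-run only when a rule that lists it in its trigger list has inferred at least one new triple.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of trigger-driven forward chaining; not Sesame's inferencer.
class TriggeredInferencer {
    static final class Rule {
        final Function<Set<List<String>>, Set<List<String>>> infer; // triples the rule derives
        final List<String> triggers;                                 // names from <triggers_rule>

        Rule(Function<Set<List<String>>, Set<List<String>>> infer, List<String> triggers) {
            this.infer = infer;
            this.triggers = triggers;
        }
    }

    // Runs the rules to a fixpoint and returns the number of rule applications.
    // Assumes every name in a trigger list is a key of 'rules'.
    static int run(Map<String, Rule> rules, Set<List<String>> store) {
        ArrayDeque<String> agenda = new ArrayDeque<>(rules.keySet()); // every rule runs once
        Set<String> queued = new HashSet<>(rules.keySet());
        int applications = 0;
        while (!agenda.isEmpty()) {
            String name = agenda.poll();
            queued.remove(name);
            applications++;
            boolean changed = false;
            for (List<String> t : rules.get(name).infer.apply(store)) {
                changed |= store.add(t);                  // true only for genuinely new triples
            }
            if (changed) {
                for (String next : rules.get(name).triggers) {
                    if (queued.add(next)) agenda.add(next); // re-run only the triggered rules
                }
            }
        }
        return applications;
    }

    // Two toy rules: rdfs1-like (predicate => rdf:Property) and transitive ex:partOf.
    static Map<String, Rule> demoRules() {
        Map<String, Rule> rules = new LinkedHashMap<>();
        rules.put("rdfs1", new Rule(store -> {
            Set<List<String>> out = new HashSet<>();
            for (List<String> t : store) out.add(List.of(t.get(1), "rdf:type", "rdf:Property"));
            return out;
        }, List.of()));
        rules.put("partOfTrans", new Rule(store -> {
            Set<List<String>> out = new HashSet<>();
            for (List<String> t1 : store) {
                for (List<String> t2 : store) {
                    if (t1.get(1).equals("ex:partOf") && t2.get(1).equals("ex:partOf")
                            && t1.get(2).equals(t2.get(0))) {
                        out.add(List.of(t1.get(0), "ex:partOf", t2.get(2)));
                    }
                }
            }
            return out;
        }, List.of("partOfTrans", "rdfs1")));
        return rules;
    }
}
```

In this model, a rule that is missing from a trigger list is not re-run, even if new triples would allow it to fire again. Trigger lists therefore need to be complete for the full closure to be reached.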
The definition file is in XML and should conform to the following DTD:
<!DOCTYPE InferenceRules [
  <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
  <!ENTITY daml 'http://www.daml.org/2001/03/daml+oil#'>
  <!ELEMENT InferenceRules (axiom | rule)*>
  <!ELEMENT axiom (subject, predicate, object)>
  <!ELEMENT rule ((premise+, consequent, triggers_rule?) | EMPTY)>
  <!ATTLIST rule name CDATA #REQUIRED>
  <!ELEMENT premise (subject, predicate, object)>
  <!ELEMENT consequent (subject, predicate, object)>
  <!ELEMENT triggers_rule (rule)*>
  <!ELEMENT subject EMPTY>
  <!ATTLIST subject
            var     CDATA      #IMPLIED
            uri     CDATA      #IMPLIED
            pattern CDATA      #IMPLIED
            escape  CDATA      #IMPLIED
            type    (resource) #IMPLIED>
  <!ELEMENT predicate EMPTY>
  <!ATTLIST predicate
            var     CDATA      #IMPLIED
            uri     CDATA      #IMPLIED
            pattern CDATA      #IMPLIED
            escape  CDATA      #IMPLIED
            type    (resource) #IMPLIED>
  <!ELEMENT object EMPTY>
  <!ATTLIST object
            var     CDATA      #IMPLIED
            uri     CDATA      #IMPLIED
            pattern CDATA      #IMPLIED
            escape  CDATA      #IMPLIED
            type    (resource) #IMPLIED>
]>
If a 'uri' attribute is present in a 'subject', 'predicate' or 'object' tag, its value is taken to be the URI of a resource.
The 'var' attribute of these tags marks the component as a variable and gives the variable's name. This attribute cannot be used within an 'axiom' tag.
For example, here are two of the axiomatic triples defined in the RDF(S) MT semantics, as they appear in the configuration file:
<axiom>
  <subject uri="&rdfs;subPropertyOf"/>
  <predicate uri="&rdfs;domain"/>
  <object uri="&rdf;Property"/>
</axiom>
<axiom>
  <subject uri="&rdfs;subPropertyOf"/>
  <predicate uri="&rdfs;range"/>
  <object uri="&rdf;Property"/>
</axiom>
The following inference rule states that any resource used as a predicate is of type rdf:Property:
<rule name="rdfs1">
  <premise>
    <subject var="xxx"/>
    <predicate var="aaa"/>
    <object var="yyy"/>
  </premise>
  <consequent>
    <subject var="aaa"/>
    <predicate uri="&rdf;type"/>
    <object uri="&rdf;Property"/>
  </consequent>
  <triggers_rule>
    <rule name="rdfs2" />
    <rule name="rdfs3" />
    <rule name="rdfs4a" />
    <rule name="rdfs5b" />
    <rule name="rdfs6" />
    <rule name="rdfs9" />
  </triggers_rule>
</rule>
In the above example 'xxx', 'aaa' and 'yyy' are variables and 'rdf:type' and 'rdf:Property' are exact resource URIs.
A 'pattern' attribute, in conjunction with an 'escape' attribute, defines a pattern for matching resource names. Both can appear only in a triple component that denotes a variable, i.e. one with a 'var' attribute. Use '?' to match any single character and '*' to match any sequence of one or more characters.
Use the character declared in the 'escape' attribute to escape a literal '?' or '*' within the pattern. The 'pattern' and 'escape' attributes need to be specified for a given variable only once per rule (note that in the example below, they are given only once, for the variable 'id').
An example of rule using pattern matching:
<rule name="rdfsXI">
  <premise>
    <subject var="xxx"/>
    <predicate var="id" pattern="&rdf;_*" escape="\"/>
    <object var="yyy"/>
  </premise>
  <consequent>
    <subject var="id"/>
    <predicate uri="&rdf;type"/>
    <object uri="&rdfs;ContainerMembershipProperty"/>
  </consequent>
  <triggers_rule>
    <rule name="rdfs2" />
    <rule name="rdfs3" />
    <rule name="rdfs6" />
    <rule name="rdfs9" />
    <rule name="rdfs10" />
  </triggers_rule>
</rule>
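To make these matching semantics concrete, here is a Java sketch that translates a rule pattern into a java.util.regex expression. The class `ResourcePattern` is hypothetical. It follows the rules stated above: '?' matches exactly one character and '*' matches one or more. It additionally assumes that the escape character makes whatever character follows it literal. The XML parser expands the entity reference `&rdf;` before any matching happens, so the pattern in the rule above is effectively `http://www.w3.org/1999/02/22-rdf-syntax-ns#_*`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the 'pattern'/'escape' semantics; not Sesame code.
class ResourcePattern {
    // Translates a rule pattern into a regular expression:
    //   '?'         -> exactly one character
    //   '*'         -> one or more characters
    //   escape char -> the next character is literal (assumed to apply to any character)
    static Pattern toRegex(String pattern, char escape) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == escape && i + 1 < pattern.length()) {
                re.append(Pattern.quote(String.valueOf(pattern.charAt(++i))));
            } else if (c == '?') {
                re.append('.');
            } else if (c == '*') {
                re.append(".+");
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    // True if the whole resource URI matches the pattern.
    static boolean matches(String pattern, char escape, String uri) {
        return toRegex(pattern, escape).matcher(uri).matches();
    }
}
```

With that pattern, `rdf:_1` and `rdf:_42` (in expanded form) match. `rdf:_` does not match, because '*' requires at least one character.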
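Note that a triple matches such a template when every fixed resource (given by a 'uri' attribute) equals the corresponding component of the triple and every variable can be bound consistently: a variable that occurs more than once in a rule must be bound to the same value at each occurrence. A variable with a 'pattern' attribute additionally matches only resources whose names fit the pattern.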
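The following Java sketch shows this matching for a single template. The class `TemplateMatcher` and its `Component` type are hypothetical illustrations, not part of Sesame. A component is either a fixed resource or a named variable. Matching either extends the current variable bindings or fails.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of template matching; not part of Sesame.
class TemplateMatcher {
    // A template component: either a fixed resource ('uri') or a named variable ('var').
    static final class Component {
        final String uri;
        final String varName;

        private Component(String uri, String varName) {
            this.uri = uri;
            this.varName = varName;
        }

        static Component ofUri(String uri) {
            return new Component(uri, null);
        }

        static Component ofVar(String name) {
            return new Component(null, name);
        }
    }

    // Matches a (subject, predicate, object) template against a triple.
    // Returns the bindings extended by this match, or null if the triple does not match.
    static Map<String, String> match(List<Component> template, List<String> triple,
                                     Map<String, String> bindings) {
        Map<String, String> out = new HashMap<>(bindings);
        for (int i = 0; i < 3; i++) {
            Component c = template.get(i);
            String value = triple.get(i);
            if (c.uri != null) {
                if (!c.uri.equals(value)) return null;           // fixed resource differs
            } else {
                String previous = out.putIfAbsent(c.varName, value);
                if (previous != null && !previous.equals(value)) {
                    return null;                                 // variable bound to another value
                }
            }
        }
        return out;
    }
}
```

To enforce a shared variable, match the first premise of a rule, then pass the resulting bindings to the second premise. Once 'yyy' is bound by the first premise, the second premise only matches triples whose subject has that same value.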
Consider a property with the URI http://somewhere.org#partOf. In our example domain, we wish to ensure that this resource is always present in the repository, so we add an axiomatic triple stating that it is a property:
<axiom>
  <subject uri="http://somewhere.org#partOf"/>
  <predicate uri="&rdf;type"/>
  <object uri="&rdf;Property"/>
</axiom>
We also wish to define that the property is transitive. To this end, we add a single inference rule:
<rule name="userPartOf">
  <premise>
    <subject var="xxx"/>
    <predicate uri="http://somewhere.org#partOf"/>
    <object var="yyy"/>
  </premise>
  <premise>
    <subject var="yyy"/>
    <predicate uri="http://somewhere.org#partOf"/>
    <object var="zzz"/>
  </premise>
  <consequent>
    <subject var="xxx"/>
    <predicate uri="http://somewhere.org#partOf"/>
    <object var="zzz"/>
  </consequent>
  <triggers_rule>
    <rule name="rdfs2" />
    <rule name="rdfs3" />
    <rule name="rdfs6" />
    <rule name="userPartOf" />
  </triggers_rule>
</rule>
Suppose the repository contains these two triples: T1 = (Finger.1, partOf, Hand.Left) and T2 = (Hand.Left, partOf, Human.1). Because the same variable 'yyy' is used in both 'premise' tags, the rule fires only if T1.object = T2.subject, which holds here. A triple corresponding to the 'consequent' tag is then added to the repository using the current variable bindings. It has the form TInfer = (T1.subject, partOf, T2.object), i.e. TInfer = (Finger.1, partOf, Human.1).
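The rule's effect can be sketched in Java by applying it repeatedly until no new triple appears. This mirrors the rule listing itself ('userPartOf') in its own 'triggers_rule' tag. The class `PartOfClosure` is a hypothetical illustration, not the custom inferencer's code. It deliberately ignores the other triggered rules (rdfs2, rdfs3, rdfs6).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of applying the userPartOf rule to a fixpoint; not Sesame code.
class PartOfClosure {
    static final String PART_OF = "http://somewhere.org#partOf";

    static Set<List<String>> close(Set<List<String>> triples) {
        Set<List<String>> store = new HashSet<>(triples);
        boolean changed = true;
        while (changed) {                                      // the rule re-triggers itself
            Set<List<String>> inferred = new HashSet<>();
            for (List<String> t1 : store) {
                if (!t1.get(1).equals(PART_OF)) continue;      // premise 1: (xxx, partOf, yyy)
                for (List<String> t2 : store) {
                    // premise 2: (yyy, partOf, zzz); the shared 'yyy' requires T1.object = T2.subject
                    if (t2.get(1).equals(PART_OF) && t1.get(2).equals(t2.get(0))) {
                        inferred.add(List.of(t1.get(0), PART_OF, t2.get(2))); // consequent: (xxx, partOf, zzz)
                    }
                }
            }
            changed = store.addAll(inferred);                  // stop once nothing new is added
        }
        return store;
    }
}
```

Starting from T1 and T2 above, the result contains the two original triples plus (Finger.1, partOf, Human.1).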
The inferencer used by a repository based on the org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository sail is determined by a parameter passed to it during initialization. To start using the custom inferencer on a repository, add the following extra parameter to the configuration of that repository:
An example rules file, containing the axioms and entailment rules as specified by the January 23 Working Draft of the RDF Model Theory, can be found in the Sesame source tree, specifically in src/org/openrdf/sesame/sailimpl/rdbms/entailment-rdf-mt-20030123.xml. The custom inferencer uses this file by default if the rule-file parameter is not specified.
Changes to the rules file are not automatically reapplied to the data already in the repository. Clear the repository before changing the rules to avoid inconsistencies.
The rules also affect the dependency information used by the truth maintenance system (TMS). The default inferencer stores dependencies in a database table that can only handle cases where at most two triples lead to the inference of a new one. Inference rules in the configuration file can have an arbitrary number of 'premise' tags, so the structure of the default dependency table cannot represent all of them. To avoid loss of data, the structure of that table is never altered: it is created only if it does not already exist, a check that is performed during repository initialization. It is therefore best to apply new or modified inference rules to a completely clean data store (database).
[This section not yet available. See the documentation at http://www.ontotext.com/omm/ for details.]