Query language


Wandora uses a custom query language to select topics in a topic map. Currently the query language is used in the search tool, in the Custom topic panel and in the Query topic map. The image below illustrates Wandora's query interface in the search tool, which is opened with the menu option Edit > Find....


[Image: Find query tab.gif]


Introduction

Wandora does not use any standard query language. Instead, queries are performed by invoking a method of a Java class that implements a certain interface. The class may do anything whatsoever as long as it returns the query results in the format specified by the Java interface. Wandora does, however, include a number of predefined query directive classes designed so that complex queries can be built by combining them. This somewhat resembles a traditional query language.

The queries are defined using a generic scripting language. Wandora uses the Java Scripting API, so it should be possible to use a number of different languages. Examples in this article should work with the Mozilla Rhino 1.6 scripting engine using ECMAScript. This scripting engine should be present in most installations. ECMAScript syntax is very similar to regular Java syntax.

The following example demonstrates a query that returns the number of instances of a topic.

1 importPackage(org.wandora.query2);
2 new Count(
3   new Instances()
4 );

The first line imports the query package. This is one of the few common cases where ECMAScript syntax differs from normal Java syntax. Lines 2 to 4 contain the actual query. The Count directive counts the number of rows in the result of the directive inside it. The Instances directive inside Count on line 3 selects all instances of the input.

All directives may return any number of rows and get as input a single row. Each row may contain any number of values. Values are indexed with column role names which can be any text strings but are generally formed like URIs. Thus each row resembles a topic map association, only without an association type. One of the columns in a row is marked as active. This is usually the column that was last added or modified by a directive and its value is the primary input for other directives. Most directives use the default column name "#DEFAULT".

It was said above that each directive receives only a single row as input. However, in the above example the Count directive counts the rows of the Instances directive, which of course may be any number of rows. The Instances directive and its result are not actually considered input to the Count directive. Count is executed first, so it must have received its input before we even get to Instances. The input to the topmost directive is usually the currently open topic in Wandora. More specifically, the input row contains a single column with the default name, its value is the currently open topic in Wandora, and this column is the active column of the row.

The Count directive passes its input to the inner Instances directive as is. The Instances directive uses the active column of its input, in this case the currently open topic in Wandora, and gets all instances of that. Generally directives add their results to the input row as new columns with the default name and set the new column as active. In this case the only column in input uses the default name and gets overwritten. Thus the result of the Instances directive is some number of rows, each containing a single column with the default name and the value is some topic which is an instance of the input.

The result of the Instances directive goes back to the Count directive, which counts the rows in it. Count adds this number to its own input row, not to the result rows of the Instances directive. Again the single column gets overwritten because it has the default name. The final result is a single row containing a single column with the default name and a number giving the number of instances of the currently open topic.

#DEFAULT
9
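
The evaluation order described above can be imitated in plain ECMAScript. The following is only a sketch under stated assumptions, not Wandora's implementation: the instancesOf table, the row objects and the withColumn, instances and count functions are hypothetical stand-ins that mirror the behaviour described in this section.

```javascript
// Hypothetical toy model; none of this is the org.wandora.query2 API.
// A row holds column values plus the name of its active column.
var DEFAULT = "#DEFAULT";

// Made-up topic map: topic name -> names of its instances.
var instancesOf = {
  "Wandora class": ["Role", "Association type", "Schema type"]
};

// Copies a row, writes value into the named column and makes it active.
function withColumn(row, name, value) {
  var values = {};
  for (var key in row.values) {
    values[key] = row.values[key];
  }
  values[name] = value;
  return { values: values, active: name };
}

// Stand-in for Instances: one result row per instance of the active value.
function instances(row) {
  var found = instancesOf[row.values[row.active]] || [];
  return found.map(function (topic) {
    return withColumn(row, DEFAULT, topic);
  });
}

// Stand-in for Count: hands its own input to the inner directive, then
// writes the row count into the input row, not into the inner result rows.
function count(inner) {
  return function (row) {
    return [withColumn(row, DEFAULT, inner(row).length)];
  };
}

// The initial input: a single default column holding the "open" topic.
var input = { values: { "#DEFAULT": "Wandora class" }, active: DEFAULT };
var result = count(instances)(input);

console.log(JSON.stringify(result[0].values)); // {"#DEFAULT":3}
```

Because both stand-ins write to the default column, the topic in the input row is overwritten by the count, which is why only one column remains.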

Now let's modify the query slightly.

1 importPackage(org.wandora.query2);
2 new Count(
3   new Instances()
4 ).from(
5   new Instances()
6 )

The from method on line 4 causes the input for the Count directive to come from some other directive. In this case we again use an Instances directive. The execution of this query goes as follows. The initial input, the currently open topic, first goes to the directive inside the from part, that is, the Instances directive on line 5. This returns the instances of the currently open topic. Then each of these result rows is fed one at a time to the Count directive, and all the results of Count are combined. The Count directive itself works exactly as in the previous example; it only gets a different input this time. Now it counts the instances of each instance of the currently open topic. We still get only one column in the final result because at each step the default column gets overwritten. The final result might look something like this.

#DEFAULT
7
2
5
0
4
1
5
1
9
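
The way from feeds rows from one directive to another can be sketched as follows. This is an illustration only: directives are modelled as plain functions from one row to an array of rows, and the instancesOf data and the instances, count and from functions are hypothetical, not part of Wandora.

```javascript
// Hypothetical toy model of from(); not the org.wandora.query2 classes.
// Rows carry only the default column here to keep the sketch short.
var instancesOf = {
  "Wandora class": ["Role", "Schema type"],
  "Role": ["a", "b", "c"],
  "Schema type": ["x"]
};

// Stand-in for Instances.
function instances(row) {
  return (instancesOf[row["#DEFAULT"]] || []).map(function (topic) {
    return { "#DEFAULT": topic };
  });
}

// Stand-in for Count: one row holding the inner directive's row count.
function count(inner) {
  return function (row) {
    return [{ "#DEFAULT": inner(row).length }];
  };
}

// Stand-in for to.from(source): run source on the input, feed each of
// its rows to `to` one at a time and concatenate everything `to` returns.
function from(to, source) {
  return function (row) {
    var out = [];
    source(row).forEach(function (sourceRow) {
      out = out.concat(to(sourceRow));
    });
    return out;
  };
}

var query = from(count(instances), instances);
var counts = query({ "#DEFAULT": "Wandora class" }).map(function (row) {
  return row["#DEFAULT"];
});

console.log(JSON.stringify(counts)); // [3,1]
```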

In the above table each number tells the number of instances of some topic which is an instance of the currently open topic. But this isn't very useful since we can't know which number corresponds to which topic. So let's modify the query a bit again.

1 importPackage(org.wandora.query2);
2 new Count(
3   new Instances()
4 ).as("#count").from(
5   new Instances().as("#instance")
6 )

The as methods on lines 4 and 5 change the column name to something other than the default. On line 5 we rename the column containing the instance topics to "#instance" and on line 4 the column for the count number to "#count". Now our final result has these two columns and we can actually see which topic each of the instance counts belongs to.

#instance                      #count
Role                           7
Wandora variant name version   2
Association type               5
Wandora class                  0
Wandora language               4
Role class                     1
Content type                   5
Occurrence type                1
Schema type                    9
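
Why renaming keeps both values can be shown with a small sketch. It assumes a simplified row model (an object of values plus the active column name); the setColumn, asColumn and constant helpers are hypothetical and only imitate the described behaviour of as.

```javascript
// Hypothetical sketch of as(); not Wandora's classes.
// Rows look like { values: {...}, active: "column name" }.
function setColumn(row, name, value) {
  var values = {};
  for (var key in row.values) {
    values[key] = row.values[key];
  }
  values[name] = value;
  return { values: values, active: name };
}

// Stand-in for d.as(newName): renames the active column of each result row.
function asColumn(directive, newName) {
  return function (row) {
    return directive(row).map(function (result) {
      var values = {};
      for (var key in result.values) {
        values[key === result.active ? newName : key] = result.values[key];
      }
      return { values: values, active: newName };
    });
  };
}

// Stand-in directive that writes a fixed value to the default column.
function constant(value) {
  return function (row) {
    return [setColumn(row, "#DEFAULT", value)];
  };
}

var start = { values: { "#DEFAULT": "Wandora class" }, active: "#DEFAULT" };

// Without renaming, the second directive overwrites the first one's value.
var overwritten = constant(7)(constant("Role")(start)[0])[0];

// With renaming, both values survive under their own column names.
var instanceRow = asColumn(constant("Role"), "#instance")(start)[0];
var kept = asColumn(constant(7), "#count")(instanceRow)[0];

console.log(JSON.stringify(overwritten.values)); // {"#DEFAULT":7}
console.log(JSON.stringify(kept.values)); // {"#instance":"Role","#count":7}
```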

As was mentioned earlier, directives usually use the active column value of the input row. This active column is the last column that was added or modified. In most cases this is the right choice for input, but not always. Let's say we want to get the base name of the instance topics too and add a BaseName directive, as is done on line 5 in the following example.

1 importPackage(org.wandora.query2);
2 new Count(
3   new Instances()
4 ).as("#count").from(
5   new BaseName().as("#basename").from(
6     new Instances().as("#instance")
7   )
8 )

The base name column is the last column added and thus the active column. The Instances directive inside Count on line 3 tries to use it as input. This fails because base names are strings, not topics. To fix this we need to change the active column manually.

1 importPackage(org.wandora.query2);
2 new Count(
3   new Instances().of("#instance")
4 ).as("#count").from(
5   new BaseName().as("#basename").from(
6     new Instances().as("#instance")
7   )
8 )

The of method on line 3 changes the active column before the input gets to the Instances directive and this query works as expected.
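
The effect of switching the active column can be sketched like this. The topics store, its identifiers and the instances and ofColumn functions are made up for illustration; the real Instances directive may report the failure differently than by throwing.

```javascript
// Hypothetical sketch of of(); not Wandora's classes.
// Made-up topic store keyed by an invented identifier.
var topics = {
  "si:schema-type": { baseName: "Schema type", instances: ["si:a", "si:b"] }
};

// Stand-in for Instances: treats the active column value as a topic key.
function instances(row) {
  var topic = topics[row.values[row.active]];
  if (!topic) {
    throw new Error("not a topic: " + row.values[row.active]);
  }
  return topic.instances.map(function (id) {
    var values = {};
    for (var key in row.values) {
      values[key] = row.values[key];
    }
    values["#DEFAULT"] = id;
    return { values: values, active: "#DEFAULT" };
  });
}

// Stand-in for d.of(name): switches the active column before d sees the row.
function ofColumn(directive, name) {
  return function (row) {
    return directive({ values: row.values, active: name });
  };
}

// A row as it looks after BaseName: the base name column is active.
var row = {
  values: { "#instance": "si:schema-type", "#basename": "Schema type" },
  active: "#basename"
};

var failed = false;
try {
  instances(row); // the active value is a base name string, not a topic
} catch (e) {
  failed = true;
}

var rows = ofColumn(instances, "#instance")(row);

console.log(failed, rows.length); // true 2
```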

Now that we have the base name, we can use it for something. We can for example only include rows where the base name contains the word "type".

1  importPackage(org.wandora.query2);
2  new Count(
3    new Instances().of("#instance")
4  ).as("#count").from(
5    new BaseName().as("#basename").from(
6      new Instances().as("#instance")
7    ).where(
8      new Regex(".*type.*")
9    )
10 )

The where method on line 7 is used to filter rows. It contains a filtering directive, in this case Regex which filters based on a regular expression. The regular expression ".*type.*" matches everything that contains the word "type" somewhere in the base name.

We could call the where method after everything else at line 10 and get the same result (we would have to add an of method call too so that the comparison is done using the right column). However, this would cause the Count directive to be processed for all rows, even those that are eventually going to be filtered out anyway. In this case it may not have a drastic effect, but the directive inside Count on line 3 could be something much more complex and take a significant amount of time to process. Then the place where the filtering is done would have a significant effect on the time it takes to execute the query. In general, you should try to reduce the number of rows processed as early as possible.

The final result of the above query is:

#instance          #basename          #count
Association type   Association type   5
Content type       Content type       5
Occurrence type    Occurrence type    1
Schema type        Schema type        9
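
The advice about filtering as early as possible can be made concrete by counting how often a costly step runs. The sketch below uses plain ECMAScript arrays rather than Wandora directives; the expensive function is a hypothetical stand-in for a slow inner directive, and the anchored pattern assumes the regular expression has to match the whole base name.

```javascript
// Base names taken from the example result earlier in this section.
var baseNames = [
  "Role", "Wandora variant name version", "Association type",
  "Wandora class", "Wandora language", "Role class",
  "Content type", "Occurrence type", "Schema type"
];

var calls = 0;

// Hypothetical stand-in for a costly inner directive.
function expensive(name) {
  calls += 1;
  return name.length;
}

// Same pattern text as in the example query, anchored to the whole string.
function matchesType(name) {
  return /^.*type.*$/.test(name);
}

function toRow(name) {
  return { name: name, value: expensive(name) };
}

// Filter last: the costly step runs for every row.
calls = 0;
var late = baseNames
  .map(function (name) { return toRow(name); })
  .filter(function (row) { return matchesType(row.name); });
var lateCalls = calls;

// Filter first: the costly step runs only for rows that survive.
calls = 0;
var early = baseNames
  .filter(function (name) { return matchesType(name); })
  .map(function (name) { return toRow(name); });
var earlyCalls = calls;

console.log(lateCalls, earlyCalls, late.length === early.length); // 9 4 true
```

Both orders produce the same four rows, but the early filter runs the costly step four times instead of nine.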

Some directives take TopicOperands as parameters. These are essentially things that will resolve into a topic in some way. It can be an actual Topic object, a String containing the subject identifier of a topic or a directive which produces the topic as the default value of the first result row. If the directive operand produces more than one row, all the other rows are ignored. The operand gets as input the same input row as the directive using the operand.

The most common case is to use the subject identifier, which you pass to the directive simply as a String. The following example illustrates this. A topic is passed directly to IsOfType as a subject identifier String, so the query gets all instances that are of the specified type.

1 importPackage(org.wandora.query2);
2 new Instances()
3 .where(
4   new IsOfType("http://www.wandora.org/core/associationtype")
5 )

You need to use a directive as an operand only when the operand somehow depends on the input row. In this case you typically just have an Of directive which picks one column from the input row, but in theory the directive could be as complex as you want. The following example gets instances of the current topic that are instances of themselves. Here the IsOfType filter depends on the input row being filtered instead of filtering with a static topic as in the previous example. The Of directive could in this case be replaced with a simple Identity to do the same thing, because the active (and only) column in the input is the topic itself.

1 importPackage(org.wandora.query2);
2 new Instances()
3 .where(
4   new IsOfType(new Of("#DEFAULT"))
5 )
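
The three kinds of TopicOperand described above could be resolved along the following lines. This is a sketch under assumptions, not Wandora's code: the topicsBySi store, the resolveOperand function and the twoRows directive are hypothetical, and directives are modelled as plain functions returning arrays of rows.

```javascript
// Hypothetical sketch of TopicOperand resolution; not Wandora's implementation.
// Made-up topic store keyed by subject identifier.
var topicsBySi = {
  "http://www.wandora.org/core/associationtype": { baseName: "Association type" }
};

function resolveOperand(operand, inputRow) {
  if (typeof operand === "function") {
    // A directive: run it on the same input row and take the default
    // column of the first result row; any further rows are ignored.
    var rows = operand(inputRow);
    return rows.length > 0 ? rows[0]["#DEFAULT"] : null;
  }
  if (typeof operand === "string") {
    // A subject identifier: look the topic up.
    return topicsBySi[operand] || null;
  }
  // Anything else is assumed to be a topic object already.
  return operand;
}

var associationType = topicsBySi["http://www.wandora.org/core/associationtype"];
var inputRow = { "#DEFAULT": associationType };

// Stand-in for a directive operand that returns two rows.
function twoRows(row) {
  return [{ "#DEFAULT": row["#DEFAULT"] }, { "#DEFAULT": "ignored" }];
}

var fromString = resolveOperand("http://www.wandora.org/core/associationtype", inputRow);
var fromTopic = resolveOperand(associationType, inputRow);
var fromDirective = resolveOperand(twoRows, inputRow);

console.log(fromString === associationType,
            fromTopic === associationType,
            fromDirective === associationType); // true true true
```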


You can find examples of queries on all of the directive pages (see list below). There are also some more Complex example queries on a separate page.

Directives

Query Structure

Note that As, From, Join and Of directives can be constructed by calling similarly named methods in directives instead of creating them using the constructors of the classes directly. Using the methods to define the query structure usually results in much more readable queries.

  • As([String original,] String newRole) - Changes the role name of the active column or the specified column.
  • From(Directive to,Directive from) - Takes the results from one directive and feeds them one row at a time to another combining all results.
  • First([int count,] Directive directive) - Returns the first or the first N rows of specified directive.
  • If(Directive cond,Directive then[,Directive else]) - Returns results of one of two directives depending on the input.
  • Join(Directive d1,Directive d2,...) - Joins the results of inner directives by performing a cartesian product on the results of them.
  • Last([int count,] Directive directive) - Returns the last or last N rows of specified directive.
  • Of(String role) - Changes the active column of input to the specified column.
  • Recursive(Directive recursion[,int maxDepth[,boolean onlyLeaves]]) - Applies a directive recursively.
  • Roles(String s1[,...]) - Returns the input row but with only the specified roles. If a role is not found in input, it is added with null value.
  • Union(Directive d1,Directive d2[,...]) - Joins the results of inner directives by concatenating them.
  • Unique(Directive directive) - Removes duplicate rows.
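
The difference between Join and Union in the list above can be illustrated with plain arrays of rows. The join and union functions below are hypothetical stand-ins, and the sketch assumes the two inputs use different column names, since this page does not say how columns with the same name are merged.

```javascript
// Hypothetical sketch; rows are plain objects of column values.
// Stand-in for Union: concatenates the two result sets.
function union(a, b) {
  return a.concat(b);
}

// Stand-in for Join: pairs every row of a with every row of b.
function join(a, b) {
  var out = [];
  a.forEach(function (rowA) {
    b.forEach(function (rowB) {
      var merged = {};
      var key;
      for (key in rowA) { merged[key] = rowA[key]; }
      for (key in rowB) { merged[key] = rowB[key]; }
      out.push(merged);
    });
  });
  return out;
}

var letters = [{ "#letter": "a" }, { "#letter": "b" }];
var numbers = [{ "#number": 1 }, { "#number": 2 }, { "#number": 3 }];

var unioned = union(letters, numbers);
var joined = join(letters, numbers);

console.log(unioned.length, joined.length); // 5 6
console.log(JSON.stringify(joined[0])); // {"#letter":"a","#number":1}
```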

Aggregate

  • Average(Directive directive) - Returns the average of values of active column.
  • Count(Directive directive) - Counts the number of rows returned by the inner directive using the same input the Count directive got.
  • Concat(Directive directive) - Concatenates the String representation of active column values.
  • Sum(Directive directive) - Returns the sum of values of active column.

Primitive

  • Empty() - Returns an empty result.
  • Identity() - Returns the input row as is.
  • Literals(String s1,[...]) - Returns the strings provided to constructor.
  • Null() - Returns a null value.
  • Static(ArrayList<Result> rows) - Returns the provided result rows.

Filtering

Note: All filtering directives can be used with the where method present in all directives. A.where(B) resolves to new From(B,A).

  • And(WhereDirective d1,WhereDirective d2,[...]) - Includes rows which satisfy all inner filtering directives.
  • Compare(String operand1,String operator,String operand2[,boolean numeric]) - Compares the values of two roles in the input row.
  • Exists(Directive directive) - Includes rows where the inner directive returns a non-empty result using the row itself as input.
  • IsOfType(TopicOperand op) - Includes rows where the active column value is an instance of the specified type.
  • Not(WhereDirective directive) - Returns rows which do not satisfy the inner filtering directive.
  • Or(WhereDirective d1,WhereDirective d2[,...]) - Includes rows which satisfy at least one of the inner filtering directives.
  • Regex(String regex[,int mode]) - Includes rows where the active column matches the specified regular expression.

Topic maps directives

Result and input rows may contain values of any type. Most directives not related to topic maps use only string values. Most topic map related directives assume that input values are either topics or subject identifiers of topics, so it is usually not necessary to convert string values to topics first. The most common reason to convert explicitly is to ensure that the final result of the query contains topics instead of strings, which allows you to use all of Wandora's topic related tools on the result table. To convert subject identifier strings to topics, use the Topics directive.

  • AllTopics() - Returns all topics of the topic map.
  • BaseName() - Gets the base name of the active column value of input.
  • Instances() - Gets all instances of the active column value of input.
  • IsOfType(TopicOperand op) - Includes rows where the active column value is an instance of the specified type.
  • Occurrence([TopicOperand type][,TopicOperand version]) - Gets the occurrence with specified type and version from the active column value of input. You can leave out the version to get all occurrences of a type or both parameters to get all occurrences in the topic.
  • Players(TopicOperand associationType,TopicOperand r1[,...]) - Gets players of specified roles from associations of specified type.
  • SubjectIdentifiers() - Gets all subject identifiers of the active column value of input.
  • Topics() - Gets the topic with subject identifier equal to the string value of the active column value of input.
  • Types() - Gets all types of the active column value of input.
  • Variant([TopicOperand type][,TopicOperand version]) - Gets the variant name with the specified type and version from the active column value of input. You can leave out the version to get all variants of a type or both parameters to get all variants in the topic.
  • Variant2(TopicOperand scope, [...]) - Gets the variant name with the specified scope. You can use this to get variants with any kind of scope while the other directive is specifically for variants which have exactly two topics in the scope, a type and a version.

Others

  • Eval(String expression) - Evaluates a script.
  • Regex(String regex[,String replace][,int mode]) - Filters and/or performs search and replace operations using regular expressions.

Notes, known issues, and limitations

You can work around most of these limitations by using the Eval directive. It takes a custom script which can then do pretty much anything. See the examples on the directive page.

  • Explicit new operator in front of directives can be omitted.
  • You can use the query language with Jython scripts too. You need Jython version 2.5.1 or later. Jython scripts don't return the value of the last line; instead you have to assign the query directive to a variable named query. The syntax is naturally slightly different: you don't use the new operator when constructing objects, and the usual Python whitespace restrictions apply. Also note that the Eval directive still uses the default scripting engine, i.e. ECMAScript, even if invoked through Jython. For example, one of the introduction examples in Jython:
from org.wandora.query2 import *
query=Count(
  Instances()
).as("#count").from(
  Instances().as("#instance")
)


See also

Wandora also supports TMQL queries.
