R in Wandora

From WandoraWiki
(Difference between revisions)
(Undo revision 10646 by Eero (talk))
 
Line 1: Line 1:
[http://www.r-project.org/ R] may be used to compute graph statistics related to Topic Maps or other groups of topics in Wandora. The selected group of topics is represented in R as an [http://igraph.sourceforge.net/ igraph]. The igraph library is thus a prerequisite for graph analysis of Topic Maps in R. Several use cases are presented below. Refer to the [http://igraph.sourceforge.net/doc/R/00Index.html igraph documentation] for further methods of statistical graph analysis.
+
Wandora can be used with the [http://www.r-project.org/ R language]. R is an environment for statistical computing and graphing. Properties of the topic map and its topics can be accessed from R and statistics and graphs can be generated from them.
  
== Handling topics in R ==
+
== Setting up R ==
  
The R interface is exposed through the R console found in the toolbar in Wandora. Scripts may also be run using the [http://www.wandora.org/wiki/R_topic_panel R topic panel] in '''view -> Add topic panel -> R'''.
+
To use R in Wandora you need to install R, then install a few R libraries using the package manager inside R and finally possibly adjust some environment parameters in the Wandora startup script. These steps are explained in detail below.
  
We may acquire the targeted set of topics from Wandora using methods of classes that have been exposed by the [http://www.wandora.org/api/org/wandora/application/tools/r/RBridge.html Java-R-bridge]. We may for example query for all the visible topics simply by calling <code>getAllTopics()</code> from <code>rinit.r</code> or more specifically
+
=== Installing R ===
  
# Retrieve the Wandora Java object using getWandora() from org.wandora.application.Wandora
+
Download R from the R website at http://www.r-project.org/ and follow the installation instructions there. The default installation installs both 32-bit and 64-bit versions. If you decide to only install one, make sure it matches your Java runtime environment version.
# Retrieve the topics themselves using Java method calls and unwrap the returned iterator to a list suitable for further use.
+
  
wandora <- J("org.wandora.application.Wandora")$getWandora()
+
On Linux environments R may also be available using the package repository of your Linux distribution. In Ubuntu the name of the package you need is ''r-base''.
tm <- wandora$getTopicMap()
+
ts <- unwrapIterator(getTopicMap()$getTopics())
+
  
An igraph object is next created with <code>makeGraphJava</code> where the helper class [http://www.wandora.org/api/org/wandora/application/tools/r/RHelper.html org.wandora.application.tools.r.RHelper] is utilized for heavy lifting. A similar but slower function using R is implemented in <code>makeGraph</code>. The resulting object is next used for the actual statistical analysis.
+
=== Installing required R libraries ===
  
===Example #1: Calculate the graph diameter===
+
At the very least you must install the ''rJava'' library. Do this using the package manager inside R. First run R using the administrator account. On Windows right click the icon and select "Run as Administrator". The installation may have created two icons on your desktop or the start menu, one for 32-bit R and one for 64-bit. You should use the one that matches your Java runtime environment. Note that you might have a 32-bit Java runtime environment even if your Windows is 64-bit.
  
We may now use R to replicate the functionality of the [http://www.wandora.org/wiki/Topic_map_diameter Topic Map diameter] calculator provided in Wandora. In comparison this process should prove to be faster, less memory consuming and therefore more suitable for large datasets.
+
On Linux run R with root, for example in console using "sudo R".
  
g <- makeGraphJava(ts)
+
When installing a package you will be prompted to select a mirror for download. Just select your country or one close to it. To install the package issue the command in R:
d <- diameter(g)
+
  
A full example is written out in <code>GetDiameter.r</code> found among other examples in build/resources/r.
+
  install.packages("rJava")
  
===Further statistics===
+
Most likely you will also want to install the ''igraph'' library. It is needed to plot network graphs.
  
Having constructed the graph there are a multitude of statistics we may compute using functions provided in igraph. In addition to the diameter calculation introduced above a few basic ones are listed here.
+
  install.packages("igraph")
  
* [http://en.wikipedia.org/wiki/Centrality#Eigenvector_centrality Eigenvector centrality] and [http://en.wikipedia.org/wiki/Betweenness_centrality betweenness centrality] using <code>evcent(g)</code> and <code>betweenness(g)</code>
+
Currently Wandora has a problem with the default graphics device in Windows environment. To be able to plot anything in Windows you will need to install the ''JavaGD'' graphics device. This is only needed in Windows.
* Degree distribution via <code>degree.distribution(g)</code>
+
  
 +
  install.packages("JavaGD")
  
<gallery perrow=2 widths=300px heights=300px>
+
=== Setting up environment variables ===
File:random500.jpg|Betweenness centrality plotted against eigenvector centrality for a random generated graph with 500 topics and 1000 associations.
+
File:service.jpg|Betweenness centrality plotted against eigenvector centrality for the [[The Service Map - Helsinki region public services]]
+
</gallery>
+
  
== Mapping vertices to topics ==
+
Next make sure that the environment variables are set up correctly in the Wandora startup script. In Windows open the ''bin/SetR.bat'' file and in Linux the ''bin/SetR.sh'' file. If you did a standard installation of R then the Linux start-up script likely needs no changes at all.
  
<code>makeGraphJava(topics)</code> transforms the Topic Map representation to a rudimentary undirected graph where topics are represented as vertices identified by a running index. Associations are in turn represented as a set of edges pairing the vertices together. It is difficult to target a specific topic by its SI or other identifier in Wandora since all topic data excluding associations is lost in the transformation.
+
The Windows start-up script however has two things that may need adjusting. Make sure the first line points to your R installation directory. In particular, the R version number may need to be changed. Also make sure that the processor architecture matches your Java installation. Note that you may have a 32-bit Java even if your system is 64-bit. If you aren't sure which Java version you have you can simply try both settings and see which one works. The architecture is specified on lines 5 and 6. For 32-bit use
  
In order to preserve the mapping between topics and respective vertices we may construct the map by hand and use it to call <code>makeGraph(topics, indiceMap)</code> in rinit.r. More specifically
+
  set R_ARCH=i386
 +
  REM set R_ARCH=x64
  
# Fetch all the topics we want in the graph. Here we fetch all topics contained in the Topic Map.
+
And for 64-bit
# Construct the indice map where each subject identifier string maps to an index in the graph.
+
# Get the string representation of a subject identifier of each topic. This is equivalent to t.getOneSubjectIdentifier( ).toString();
+
#Append the si-index pair to the map. Indices range from 1 to <code>length(ts)</code>.
+
# Call makeGraph with indices specified in order to make sure our indices are used to construct the graph.
+
<pre>
+
ts <- getAllTopics()
+
ind <- list()
+
for(t in ts){
+
  si <- .jcall(t,"Lorg/wandora/topicmap/Locator;","getOneSubjectIdentifier")
+
  si <- .jcall(si,"Ljava/lang/String;","toString")
+
  ind [[si]]<-length(ind)+1
+
}
+
g <- makeGraph(ts,ind)
+
</pre>
+
We may now select a topic in Wandora and import it to R with <code>getContextTopics()</code>. An example is given below.
+
  
# Get the selected topics from Wandora.
+
  REM set R_ARCH=i386
# Get the string representation of the SI and using it look up the index from the map  constructed earlier.
+
  set R_ARCH=x64
<pre>
+
cts <- getContextTopics()
cind <- list()
+
for (ct in cts){
+
  si <- .jcall(ct, "Lorg/wandora/topicmap/Locator;","getOneSubjectIdentifier")
+
  si <- .jcall(si, "Ljava/lang/String;","toString")
+
  cind[[length(cind)+1]] <- ind[[si]]
+
}
+
</pre>
+
  
Having found the indices of vertices for the selected topics we can now compute statistics related to those vertices in the graph. For example we may compute the immediate neighborhood sizes of those topics with <code>neighborhood.size(g,1,cind)</code>.
+
Other parameters should be correct unless you have customized your R installation beyond a standard setup.
  
A full example is again written out in GetContextTopicsNeighbours.r in build/resources/r. We may also define a reverse lookup for SIs in the following manner:
+
You should now be able to use R inside Wandora.
  
<pre>
+
== Plotting in Windows ==
getSI <- function(i){
+
  for(t in ts){
+
    si <- .jcall(t,"Lorg/wandora/topicmap/Locator;","getOneSubjectIdentifier")
+
    si <- .jcall(si,"Ljava/lang/String;","toString")
+
    if(ind[[si]] == i){
+
      return(si)
+
    }
+
  }
+
  return("")
+
}
+
</pre>
+
  
===Example #2: Find the topics in a community===
+
The default graphics device doesn't work correctly in Windows. If you try to plot anything you will get an unresponsive graphics window. If at any time you accidentally open it you can close it cleanly in Wandora R Console with
  
This lookup may be utilized in finding topics for a subset of vertices from the graph. In this case we first compute communities for a set of topics. We then pick a community and find the topics in it.
+
  dev.off()
  
Again, fetch all topics in Wandora.
+
To work around this issue you need to use the JavaGD graphics device. You first need to load the JavaGD library with
ts <- getAllTopics()
+
  
Construct the vertex index to topic SI mapping discussed above.
+
  library("JavaGD")
  
<pre>
+
Then initialize the graphics device with
ind <- list()
+
for(t in ts){
+
  si <- .jcall(t,"Lorg/wandora/topicmap/Locator;","getOneSubjectIdentifier")
+
  si <- .jcall(si,"Ljava/lang/String;","toString")
+
  ind[[si]]<-length(ind)+1
+
}
+
g <- makeGraph(ts,ind)
+
</pre>
+
  
Get the communities of the graph g. Here we use random walk to distinguish communities.
+
  JavaGD()
+
ns <- walktrap.community(g)
+
  
Get all the vertex IDs in the community with the ID 10
+
This will open an empty graphics window. You can then use plot normally to plot in this window.
  
<pre>
+
== R console in Wandora ==
bigComVer <- list()
+
mem <- membership(ns)
+
for(m in seq_along(mem)){
+
  if(mem[[m]] == 10){
+
    bigComVer[[length(bigComVer)+1]] <- m
+
  }
+
}
+
</pre>
+
  
Finally find the SIs for the vertices we found above.
+
You can open the R console in Wandora by clicking the ''R console'' button in the top toolbar. Assuming that you have installed R and set up the environment correctly you will get the standard R greeting with R version and license information.
+
sis <- lapply(bigComVer,getSI)
+
  
This approach may also be used with the diameter calculations to find the vertices of the longest path in the topic map. We find the vertex IDs with
+
  R version 2.11.1 (2010-05-31)
 +
  Copyright (C) 2010 The R Foundation for Statistical Computing
 +
  ISBN 3-900051-07-0
 +
 
 +
  R is free software and comes with ABSOLUTELY NO WARRANTY.
 +
  You are welcome to redistribute it under certain conditions.
 +
  Type 'license()' or 'licence()' for distribution details.
 +
 
 +
    Natural language support but running in an English locale
 +
 
 +
  R is a collaborative project with many contributors.
 +
  Type 'contributors()' for more information and
 +
  'citation()' on how to cite R or R packages in publications.
 +
 
 +
  Type 'demo()' for some demos, 'help()' for on-line help, or
 +
  'help.start()' for an HTML browser interface to help.
 +
  Type 'q()' to quit R.
 +
 
 +
  >
  
d <- get.diameter(g)
+
Otherwise you'll get an error message and instructions about how to set up your R environment.
  
and find the corresponding SIs with
+
You can issue R commands in the text area at the bottom part of the window. This includes almost everything you can do in R. One notable exception is that the help system doesn't work properly, so "?plot" and the like don't do anything. Also a few other functions have been disabled because they don't work very well when R is run inside Java. These functions include ''q'', ''quit'', ''demo'', ''contributors'' and ''citation''.
 +
 
 +
You can browse the topic map in Wandora while having the R console open. This way you can select topics in the main Wandora window and then get references to those topics in the R environment (see next section).
 +
 
 +
== Using R with topic maps in Wandora ==
 +
 
 +
There are a couple of ways to access the topic map in R. Some of these rely heavily on the Java topic map API used in Wandora. You will need to call the Java methods of the topic objects. To find out more about the API look at the [http://www.wandora.org/wandora/docs/api/ javadocs] of Wandora. Mostly you will only need to look at the [http://www.wandora.org/wandora/docs/api/org/wandora/topicmap/Topic.html Topic] and [http://www.wandora.org/wandora/docs/api/org/wandora/topicmap/TopicMap.html TopicMap] classes and a few classes related to them. The Java methods are accessed with the $ indexing operator. For example if the variable ''t'' contains a topic you can get the base name of that with
 +
 
 +
  t$getBaseName()
 +
 
 +
The R environment in Wandora is initialized by running the ''/build/resources/conf/rinit.r'' file. This file defines some functions that make accessing the topic map slightly easier. You can get a reference to the topic map object itself with ''getTopicMap'' function in R. Alternatively you can get a list of all the topics in the topic map with ''getAllTopics'' or the currently selected topics in Wandora with ''getContextTopics''. For example, to get the base names of the currently selected topics in Wandora use
 +
 
 +
  lapply(getContextTopics(),function(t) t$getBaseName())
 +
 
 +
To plot something you first need to gather the data you want to plot. For example, you could plot a histogram that visualizes the number of associations the topics have in the topic map. '''Note that you may have to copy this and other examples on this page one line at a time'''.
 +
 
 +
  ts<-getAllTopics() # get a list of all topics
 +
  as<-lapply(ts,function(t) t$getAssociations()$size()) # as has the number of associations in each topic
 +
  fac<-factor(unlist(as)) # make a factor from as
 +
  plot(fac) # plot the factor
 +
 
 +
In Windows you should open the JavaGD graphics device before the last plot line. Do this with
 +
 
 +
  library("JavaGD") # loads the JavaGD, only need to do this once per R session
 +
  JavaGD() # opens a JavaGD graphics window
 +
 
 +
There is also a function to set up a graph object that can be used with the ''igraph'' library. Use the ''makeGraphJava'' function and pass it a list of topic objects. It returns a graph object that can be plotted directly. See the [http://igraph.sourceforge.net/doc/R/00Index.html igraph R documentation] for more information about how to customize the plot. In particular, the plot function may be passed parameters relating to the layout of the graph. Before using ''makeGraphJava'' you have to load the igraph library. For example, to plot a network of the currently selected topics in a circular layout use the following. You must have selected some topics in Wandora for this to work.
 +
 
 +
  library("igraph") # loads the library, only need to do this once per R session
 +
  plot(makeGraphJava(getContextTopics()),layout=layout.circle)
 +
 
 +
Again in Windows you need to remember to use the JavaGD library before plotting.
 +
 
 +
Instead of getting a list of selected topics you can get a list of selected associations with ''getContextAssociations''. After this you can get the players with ''getPlayers'', which takes as parameters a list of associations and a role. The role can be either a topic object or a string giving the base name of the role. So for example you could select some associations and then get the topics playing the role ''value'' with
 +
 
 +
  getPlayers(getContextAssociations(),"value")
 +
 
 +
You can convert topics to strings or numbers using ''as.character'' or ''as.numeric'' respectively. These use the topic base name to do the conversion. If you want to use a variant name or get the data from an occurrence you will have to use the topic map API to get the desired value. But if you have your numeric data in the base name and can get it listed in a table in Wandora and then selected then you can get a simple vector of numbers with something like
 +
 
 +
  sapply( getPlayers(getContextAssociations(),"value"), as.numeric )
 +
 
 +
Note that ''as.numeric'' is fairly lenient when converting topic base names to numbers and knows how to skip non-numeric characters in the base name.
 +
 
 +
== Example ==
 +
 
 +
In this example we will use the [[SPARQL extractor]] to extract some data relating to the demographics of Helsinki. We will then make a map plot of the districts of Helsinki with colours indicating the total population in the district.
 +
 
 +
First we use the SPARQL extractor to extract the data. Go to ''File'' menu and select ''Extract/Other/SPARQL extractor''.
 +
 
 +
[[Image:Rexample1.png]]
 +
 
 +
We are going to use the [http://www.hri.fi/en/ Helsinki Region Infoshare] SPARQL end point so select the HRI tab. Then clear the query text area and replace it with the following. It will get the population of all areas of Helsinki that themselves don't have any sub areas.
 +
 
 +
SELECT ?area ?poly ?value
 +
WHERE {
 +
  ?area rdf:type dimension:Alue;
 +
      geo:polygon ?poly.
 +
  ?item rdf:type scv:Item;
 +
      rdf:value ?value;
 +
      dimension:ikäryhmä ikäryhmä:Väestö_yhteensä;
 +
      dimension:vuosi vuosi:_2009;
 +
      dimension:yksikkö yksikkö:Henkilöä;
 +
      dimension:alue ?area.
 +
  OPTIONAL { ?narrower skos:broader ?area . }
 +
  FILTER ( !BOUND(?narrower) )
 +
}
  
sis <- lapply(d,getSI)
+
Then click ''Extract''.
  
== Importing graphs from R to Wandora ==
+
[[Image:Rexample2.png]]
  
Above we've used the rJava bridge to import topics from Wandora to R. We may also import graphs from R to Wandora as topics and associations. Auxiliary data may be specified to use as base names and occurrence data for the extracted topics.
+
After a few seconds you should get a message informing you that one result set was extracted.
  
===Example #3: the bull graph===
+
Next find the result set topic on the left hand side (labelled 1 below) and double click it to open the topic. Then select all the rows in the result set by right clicking somewhere on the association table (labelled 2). In the context menu choose ''Select/Select all''. Then open the R console by clicking the button in the toolbar (labelled 3).
  
As a simple example we use igraph to generate a [http://en.wikipedia.org/wiki/Bull_graph bull graph] which we will import to Wandora. First we create the graph.
+
[[Image:Rexample3.png]]
  
g <- graph.famous("bull");
+
Then give the following commands to R. You can copy and paste all of it at once in the input box at the lower part of the window. If you aren't doing this in Windows then you can remove lines 5 and 6 (the JavaGD part). Press enter or the ''Evaluate'' button to evaluate the commands.
  
Next we attach a name to each of the vertices of the graph. The order in the list corresponds to the vertex IDs in the graph. We could as well use the actual names specified in the graph if the vertices are already named.
+
associations<-getContextAssociations()
 +
polygons<-getPlayers(associations,"poly")
 +
polygons<-lapply(polygons,function(p){extractPolygon(getDisplayName(p),reverse=TRUE)})
 +
values<-sapply(getPlayers(associations,"value"),as.numeric)
 +
library("JavaGD") # Remove these two lines if you
 +
JavaGD()          # aren't doing this on Windows
 +
plotPolygons(polygons,values)
  
names <- c("first","second","third","fourth","fifth")
+
[[Image:Rexample4.png]]
  
Now we can call <code>createTopics</code> in <code>rinit.r</code> with
+
You should get an R plot window containing the final plot.
  
createTopics(g,baseNames=names)
+
[[Image:Rexample5.png]]
  
We may further add to the created topics by specifying an array of occurrence data in the form
+
We'll now go through the short R code line by line to see what it does.
  
{| class="wikitable"
+
1 associations<-getContextAssociations()
| Occ. type / Vertex ID
+
2 polygons<-getPlayers(associations,"poly")
| 1
+
3 polygons<-lapply(polygons,function(p){extractPolygon(getDisplayName(p),reverse=TRUE)})
| 2
+
4 values<-sapply(getPlayers(associations,"value"),as.numeric)
| 3
+
5 library("JavaGD") # Remove these two lines if you
| 4
+
6 JavaGD()          # aren't doing this on Windows
| ...
+
7 plotPolygons(polygons,values)
|-
+
| Occurrence type 1
+
| value 1
+
| value 2
+
| value 3
+
| value 4
+
| ...
+
|-
+
| Occurrence type 2
+
| value 1
+
| value 2
+
| value 3
+
| value 4
+
| ...
+
|-
+
|...
+
|...
+
|...
+
|...
+
|...
+
|...
+
|}
+
  
As a trivial example we may add the base names as occurrences of type foo with
+
The first line gets the associations we selected in Wandora. The second line gets all the players of those associations that play the role "poly" (strictly speaking "poly" is the base name of the role topic). The third line extracts the polygon data from those topics. This is done by applying the anonymous function to all topics in the ''polygons'' list. The ''extractPolygon'' function extracts the polygon data from a string which we get from the display name of the topic with the ''getDisplayName'' function. The ''reverse'' parameter to ''extractPolygon'' swaps x and y axes. This is done because latitude (the y axis) is first in the polygon data. The fourth line gets the population for each area. This is done by getting the player topics of role "value" and converting them to numeric. Lines five and six set up the JavaGD display device. Finally on line seven the polygons and their associated values are plotted with ''plotPolygons''.
<pre>
+
occ<-list()
+
occ[["foo"]]<-names
+
createTopics(g,baseNames=names,occurrences=occ)
+
</pre>
+
  
[[File:R_1.png|600px]]
+
Note that all of ''getContextAssociations'', ''getPlayers'', ''extractPolygon'', ''getDisplayName'' and ''plotPolygons'' are defined in the ''rinit.r'' R script that is loaded when you first start the R console in Wandora. In addition to this, ''as.numeric'' is extended to handle topic objects there as well.
  
A vertex of a graph imported into Wandora: the name 'second' is set as a base name as well as an occurrence of type foo
+
== R language resources ==
  
 +
This short introduction to the R language integration in Wandora does not try to teach you the R language. If you find the R language interesting and would like to know more, please refer to the available online [http://cran.r-project.org/manuals.html manuals] such as
  
[[File:R_visualization.png|600px]]
+
* [http://cran.r-project.org/doc/manuals/R-intro.html An introduction to R]
 +
* [http://cran.r-project.org/doc/manuals/R-lang.html R language Definition]
  
The imported graph is here visualized with the [[D3 graph service module | D3 graph visualization]]
+
Also, the [http://cran.r-project.org/other-docs.html Contributed Documentation] page has an excellent list of R language resources.
  
 
== See also ==
 
  
* [[R in Wandora]]
+
* [[Statistical analysis of Topic Maps in R]]

Latest revision as of 10:26, 15 July 2013

Wandora can be used with the R language. R is an environment for statistical computing and graphing. Properties of the topic map and its topics can be accessed from R and statistics and graphs can be generated from them.


Setting up R

To use R in Wandora you need to install R, then install a few R libraries using the package manager inside R and finally possibly adjust some environment parameters in the Wandora startup script. These steps are explained in detail below.

Installing R

Download R from the R website at http://www.r-project.org/ and follow the installation instructions there. The default installation installs both 32-bit and 64-bit versions. If you decide to only install one, make sure it matches your Java runtime environment version.

On Linux environments R may also be available using the package repository of your Linux distribution. In Ubuntu the name of the package you need is r-base.

Installing required R libraries

At the very least you must install the rJava library. Do this using the package manager inside R. First run R using the administrator account. On Windows right click the icon and select "Run as Administrator". The installation may have created two icons on your desktop or the start menu, one for 32-bit R and one for 64-bit. You should use the one that matches your Java runtime environment. Note that you might have a 32-bit Java runtime environment even if your Windows is 64-bit.

On Linux run R with root, for example in console using "sudo R".

When installing a package you will be prompted to select a mirror for download. Just select your country or one close to it. To install the package issue the command in R:

 install.packages("rJava")

Most likely you will also want to install the igraph library. It is needed to plot network graphs.

 install.packages("igraph")

Currently Wandora has a problem with the default graphics device in Windows environment. To be able to plot anything in Windows you will need to install the JavaGD graphics device. This is only needed in Windows.

 install.packages("JavaGD")
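The three installs above can also be driven from a list, so that only packages that are not yet present get fetched. The following is a minimal sketch using base R only. The helper names `missingPackages` and `requiredPackages` are made up for this example and are not part of Wandora or rinit.r.

```r
# Hypothetical helper (not part of Wandora): returns the names in `needed`
# that do not appear in `installed`.
missingPackages <- function(needed, installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

# Hypothetical helper: the packages named in this section. JavaGD is added
# only on Windows, where the default graphics device does not work.
requiredPackages <- function(windows = .Platform$OS.type == "windows") {
  pkgs <- c("rJava", "igraph")
  if (windows) pkgs <- c(pkgs, "JavaGD")
  pkgs
}

# To install whatever is missing, run R as administrator/root and issue:
#   todo <- missingPackages(requiredPackages())
#   if (length(todo) > 0) install.packages(todo)
```

The install call itself is left as a comment so that pasting the block does not trigger a download by accident.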

Setting up environment variables

Next make sure that the environment variables are set up correctly in the Wandora startup script. In Windows open the bin/SetR.bat file and in Linux the bin/SetR.sh file. If you did a standard installation of R then the Linux start-up script likely needs no changes at all.

The Windows start-up script however has two things that may need adjusting. Make sure the first line points to your R installation directory. In particular, the R version number may need to be changed. Also make sure that the processor architecture matches your Java installation. Note that you may have a 32-bit Java even if your system is 64-bit. If you aren't sure which Java version you have you can simply try both settings and see which one works. The architecture is specified on lines 5 and 6. For 32-bit use

 set R_ARCH=i386
 REM set R_ARCH=x64

And for 64-bit

 REM set R_ARCH=i386
 set R_ARCH=x64
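If you are unsure which R build a given icon starts, you can ask R itself. The sketch below uses only base R and reports the R side; the Java side has to be checked separately, for example with "java -version" on the command line.

```r
# Pointer size in bytes times 8 gives the bitness of the running R build.
rBits <- .Machine$sizeof.pointer * 8

# R.version$arch names the architecture R was built for.
cat("R architecture:", R.version$arch, "\n")
cat("R is a", rBits, "bit build\n")
```

A 32-bit result corresponds to the i386 setting above and a 64-bit result to the x64 setting.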

Other parameters should be correct unless you have customized your R installation beyond a standard setup.

You should now be able to use R inside Wandora.

Plotting in Windows

The default graphics device doesn't work correctly in Windows. If you try to plot anything you will get an unresponsive graphics window. If at any time you accidentally open it you can close it cleanly in Wandora R Console with

 dev.off()

To work around this issue you need to use the JavaGD graphics device. You first need to load the JavaGD library with

 library("JavaGD")

Then initialize the graphics device with

 JavaGD()

This will open an empty graphics window. You can then use plot normally to plot in this window.

R console in Wandora

You can open the R console in Wandora by clicking the R console button in the top toolbar. Assuming that you have installed R and set up the environment correctly you will get the standard R greeting with R version and license information.

 R version 2.11.1 (2010-05-31)
 Copyright (C) 2010 The R Foundation for Statistical Computing
 ISBN 3-900051-07-0
 
 R is free software and comes with ABSOLUTELY NO WARRANTY.
 You are welcome to redistribute it under certain conditions.
 Type 'license()' or 'licence()' for distribution details.
 
   Natural language support but running in an English locale
 
 R is a collaborative project with many contributors.
 Type 'contributors()' for more information and
 'citation()' on how to cite R or R packages in publications.
 
 Type 'demo()' for some demos, 'help()' for on-line help, or
 'help.start()' for an HTML browser interface to help.
 Type 'q()' to quit R.
 
 >

Otherwise you'll get an error message and instructions about how to set up your R environment.

You can issue R commands in the text area at the bottom part of the window. This includes almost everything you can do in R. One notable exception is that the help system doesn't work properly, so "?plot" and the like don't do anything. Also a few other functions have been disabled because they don't work very well when R is run inside Java. These functions include q, quit, demo, contributors and citation.

You can browse the topic map in Wandora while having the R console open. This way you can select topics in the main Wandora window and then get references to those topics in the R environment (see next section).

Using R with topic maps in Wandora

There are a couple of ways to access the topic map in R. Some of these rely heavily on the Java topic map API used in Wandora. You will need to call the Java methods of the topic objects. To find out more about the API look at the javadocs of Wandora. Mostly you will only need to look at the Topic and TopicMap classes and a few classes related to them. The Java methods are accessed with the $ indexing operator. For example if the variable t contains a topic you can get the base name of that with

 t$getBaseName()

The R environment in Wandora is initialized by running the /build/resources/conf/rinit.r file. This file defines some functions that make accessing the topic map slightly easier. You can get a reference to the topic map object itself with getTopicMap function in R. Alternatively you can get a list of all the topics in the topic map with getAllTopics or the currently selected topics in Wandora with getContextTopics. For example, to get the base names of the currently selected topics in Wandora use

 lapply(getContextTopics(),function(t) t$getBaseName())

To plot something you first need to gather the data you want to plot. For example, you could plot a histogram that visualizes the number of associations the topics have in the topic map. Note that you may have to copy this and other examples on this page one line at a time.

 ts<-getAllTopics() # get a list of all topics
 as<-lapply(ts,function(t) t$getAssociations()$size()) # as has the number of associations in each topic
 fac<-factor(unlist(as)) # make a factor from as
 plot(fac) # plot the factor
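To see what the factor plot shows without a topic map at hand, the sketch below runs the same steps on made-up association counts. The numbers are invented for illustration only. The table gives the bar heights that plot(fac) draws.

```r
# Invented association counts for six topics, standing in for
# lapply(ts, function(t) t$getAssociations()$size()).
as <- list(1L, 3L, 1L, 2L, 3L, 1L)

fac <- factor(unlist(as))  # one level per distinct association count
counts <- table(fac)       # how many topics have each association count
print(counts)

# plot(fac) draws one bar per level, with the heights held in `counts`.
```

Each bar therefore answers the question "how many topics have exactly this many associations".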

In Windows you should open the JavaGD graphics device before the final plot line. Do this with

 library("JavaGD") # loads the JavaGD, only need to do this once per R session
 JavaGD() # opens a JavaGD graphics window

There is also a function to set up a graph object that can be used with the igraph library. Use the makeGraphJava function and pass it a list of topic objects. It returns a graph object that can be plotted directly. See the igraph R documentation for more information about how to customize the plot. In particular, the plot function may be passed parameters relating to the layout of the graph. Before using makeGraphJava you have to load the igraph library. For example, to plot a network of the currently selected topics in a circular layout use the following. You must have selected some topics in Wandora for this to work.

 library("igraph") # loads the library, only need to do this once per R session
 plot(makeGraphJava(getContextTopics()),layout=layout.circle)

Again in Windows you need to remember to use the JavaGD library before plotting.

Instead of getting a list of selected topics you can get a list of selected associations with getContextAssociations. After this you can get the players with getPlayers, which takes as parameters a list of associations and a role. The role can be either a topic object or a string giving the base name of the role. So for example you could select some associations and then get the topics playing the role value with

 getPlayers(getContextAssociations(),"value")

You can convert topics to strings or numbers using as.character or as.numeric respectively. Both use the topic base name for the conversion. If you want to use a variant name or get the data from an occurrence, you will have to use the topic map API to get the desired value. If your numeric data is in the base names, and you can list the topics in a table in Wandora and select them, you can get a simple vector of numbers with something like

 sapply( getPlayers(getContextAssociations(),"value"), as.numeric )
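
The difference between the list returned by getPlayers and the vector produced by sapply can be seen with plain R data. The sketch below models each association as a named list; playersOf is a hypothetical stand-in for getPlayers, and the names and values are invented. Real associations in Wandora are Java objects reached through the R bridge.

```r
# Made-up stand-in data: each association is a named list that maps a
# role base name to a player base name.
assocs <- list(
  list(area = "Area A", value = "10"),
  list(area = "Area B", value = "20")
)

# Hypothetical counterpart of getPlayers for the stand-in data.
playersOf <- function(associations, role) {
  lapply(associations, function(a) a[[role]])
}

players <- playersOf(assocs, "value")   # a list, one player per association
values  <- sapply(players, as.numeric)  # a plain numeric vector
values
```

A plain vector is what most R statistics and plotting functions expect, which is why the conversion goes through sapply rather than lapply.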

Note that as.numeric is fairly lenient when converting topic base names to numbers and knows how to skip non-numeric characters in the base name.
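
One way such lenient parsing could work is sketched below. lenientNumeric is a hypothetical helper written for this page and is not the implementation in rinit.r: it simply drops every character that is not a digit, a decimal point or a minus sign before converting.

```r
# Hypothetical helper, not Wandora's actual as.numeric extension.
lenientNumeric <- function(x) {
  # Keep only digits, decimal points and minus signs.
  cleaned <- gsub("[^0-9.-]", "", as.character(x))
  # Strings with nothing numeric left become NA instead of raising a warning.
  suppressWarnings(as.numeric(cleaned))
}

parsed <- lenientNumeric(c("1 234 persons", "12.5 km", "n/a"))
parsed
```

A rule this simple has limits: a base name such as "2009-2010" keeps its inner minus sign and comes out as NA, so check the results when base names hold more than one number.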

Example

In this example we will use the SPARQL extractor to extract some data relating to the demographics of Helsinki. We will then make a map plot of the districts of Helsinki with colours indicating the total population in the district.

First we use the SPARQL extractor to extract the data. Go to the File menu and select Extract/Other/SPARQL extractor.

Rexample1.png

We are going to use the Helsinki Region Infoshare SPARQL endpoint, so select the HRI tab. Then clear the query text area and replace its contents with the following query. It gets the population of all areas of Helsinki that do not themselves have any sub-areas.

SELECT ?area ?poly ?value
WHERE { 
  ?area rdf:type dimension:Alue;
     geo:polygon ?poly.
  ?item rdf:type scv:Item;
     rdf:value ?value;
     dimension:ikäryhmä ikäryhmä:Väestö_yhteensä;
     dimension:vuosi vuosi:_2009;
     dimension:yksikkö yksikkö:Henkilöä;
     dimension:alue ?area.
  OPTIONAL { ?narrower skos:broader ?area . }
  FILTER ( !BOUND(?narrower) )
}

Then click Extract.

Rexample2.png

After a few seconds you should get a message informing you that one result set was extracted.

Next find the result set topic on the left-hand side (labelled 1 below) and double-click it to open the topic. Then select all the rows in the result set by right-clicking somewhere on the association table (labelled 2) and choosing Select/Select all in the context menu. Then open the R console by clicking the button in the toolbar (labelled 3).

Rexample3.png

Then give the following commands to R. You can copy and paste all of them at once into the input box in the lower part of the window. If you aren't doing this on Windows, you can remove lines 5 and 6 (the JavaGD part). Press enter or the Evaluate button to evaluate the commands.

associations<-getContextAssociations()
polygons<-getPlayers(associations,"poly")
polygons<-lapply(polygons,function(p){extractPolygon(getDisplayName(p),reverse=TRUE)})
values<-sapply(getPlayers(associations,"value"),as.numeric)
library("JavaGD") # Remove these two lines if you
JavaGD()          # aren't doing this on Windows
plotPolygons(polygons,values)

Rexample4.png

You should get an R plot window containing the final plot.

Rexample5.png

We'll now go through the short R code line by line to see what it does.

1 associations<-getContextAssociations()
2 polygons<-getPlayers(associations,"poly")
3 polygons<-lapply(polygons,function(p){extractPolygon(getDisplayName(p),reverse=TRUE)})
4 values<-sapply(getPlayers(associations,"value"),as.numeric)
5 library("JavaGD") # Remove these two lines if you
6 JavaGD()          # aren't doing this on Windows
7 plotPolygons(polygons,values)

The first line gets the associations we selected in Wandora. The second line gets all the players of those associations that play the role "poly" (strictly speaking, "poly" is the base name of the role topic). The third line extracts the polygon data from those topics by applying an anonymous function to every topic in the polygons list. Inside that function, getDisplayName returns the display name of the topic as a string and extractPolygon extracts the polygon data from that string. The reverse parameter of extractPolygon swaps the x and y axes, which is needed because latitude (the y axis) comes first in the polygon data. The fourth line gets the population of each area by getting the player topics of the role "value" and converting them to numbers. Lines five and six set up the JavaGD display device. Finally, on line seven, the polygons and their associated values are plotted with plotPolygons.

Note that all of getContextAssociations, getPlayers, extractPolygon, getDisplayName and plotPolygons are defined in the rinit.r R script that is loaded when you first start the R console in Wandora. In addition to this, as.numeric is extended to handle topic objects there as well.
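
For readers who want to see the idea without Wandora, the two polygon steps can be approximated in base R. Everything in this sketch is an assumption: the input format "lat,lon lat,lon ...", the helper names parsePolygon and drawPolygons, and the coordinates and values. The real extractPolygon and plotPolygons in rinit.r may work differently.

```r
# Hypothetical input format: "lat,lon lat,lon ..." (an assumption; the
# strings returned by the endpoint may be formatted differently).
parsePolygon <- function(s, reverse = FALSE) {
  pairs <- strsplit(strsplit(trimws(s), "[[:space:]]+")[[1]], ",")
  m <- do.call(rbind, lapply(pairs, as.numeric))
  if (reverse) m <- m[, 2:1, drop = FALSE]  # swap columns so x is longitude
  colnames(m) <- c("x", "y")
  m
}

# Hypothetical plotting helper: fills each polygon with a colour chosen
# from the palette according to its value.
drawPolygons <- function(polys, values, palette = rev(heat.colors(10))) {
  all <- do.call(rbind, polys)
  plot(all, type = "n", asp = 1, xlab = "x", ylab = "y")
  # Map each value onto one of the equally wide palette bins.
  bins <- cut(values, breaks = length(palette), labels = FALSE)
  for (i in seq_along(polys)) polygon(polys[[i]], col = palette[bins[i]])
  invisible(bins)
}

# Two invented triangles with invented values.
p1 <- parsePolygon("60.1,24.9 60.2,24.9 60.2,25.0", reverse = TRUE)
p2 <- parsePolygon("60.1,25.0 60.2,25.0 60.2,25.1", reverse = TRUE)
bins <- drawPolygons(list(p1, p2), c(100, 5000))
```

Splitting the value range into equally wide bins is the simplest colour mapping. For skewed data such as population counts, quantile-based breaks often give a more readable map.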

R language resources

This short introduction to the R language integration in Wandora does not try to teach you the R language. If you find R interesting and would like to know more, please refer to the available online manuals.

The Contributed Documentation page also has an excellent list of R language resources.

See also
