SOM classifier
Wandora's SOM (Self Organizing Map) classifier generates a two-dimensional artificial neural network and trains the network with given topic map associations. As a consequence, topics with similar associations end up located near each other in the neural network. After training, the Wandora user can view, analyze, and create new associations using the neural network. For example, a grouping association can be created to represent topic similarity. Consider Wandora's SOM classifier an experimental feature.
To start the SOM classifier, select a large number of binary associations and choose SOM classifier... in the context of your selection. The classifier first asks for a grouping role. Players of the grouping role specify labels for the training vectors. Let's say the user has selected eight binary associations:
a-q a-e a-t b-e b-l c-q c-o d-e
Here the letters represent association players. All associations have the same type and roles. The user selects the first player as the grouping role. The training vector labels are a, b, c, and d. The training vectors for this selection are:
a: [ q=1, e=1, t=1, l=0, o=0 ]
b: [ q=0, e=1, t=0, l=1, o=0 ]
c: [ q=1, e=0, t=0, l=0, o=1 ]
d: [ q=0, e=1, t=0, l=0, o=0 ]
in other words
a: [ 1, 1, 1, 0, 0 ]
b: [ 0, 1, 0, 1, 0 ]
c: [ 1, 0, 0, 0, 1 ]
d: [ 0, 1, 0, 0, 0 ]
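To make the vector construction concrete, below is a minimal Python sketch that builds the same binary training vectors from the association pairs, assuming the first player of each pair is the grouping role player. It only illustrates the idea and is not Wandora's implementation.

    # Binary associations as (grouping role player, other player) pairs.
    associations = [("a", "q"), ("a", "e"), ("a", "t"), ("b", "e"),
                    ("b", "l"), ("c", "q"), ("c", "o"), ("d", "e")]

    # Labels come from the grouping role, vector dimensions from the other players.
    labels = sorted({group for group, other in associations})
    dimensions = []
    for group, other in associations:
        if other not in dimensions:
            dimensions.append(other)      # keep first-seen order: q, e, t, l, o

    # One binary training vector per label: 1 if the association exists, 0 otherwise.
    vectors = {label: [0] * len(dimensions) for label in labels}
    for group, other in associations:
        vectors[group][dimensions.index(other)] = 1

    for label in labels:
        print(label, vectors[label])      # a [1, 1, 1, 0, 0] ... d [0, 1, 0, 0, 0]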
When the training vectors are ready, Wandora creates a neuron matrix. The matrix is a two-dimensional plane of neuron cells. Each neuron contains a random vector with the same dimension as the training vectors; in our example this dimension is 5. The random vector contains 5 randomly selected ones (1) and zeros (0), for example [1, 0, 1, 0, 1]. Wandora then trains the matrix with the training vectors. The training can be described with the following algorithm:
loop
    select a training vector
    for the selected training vector, find the best matching matrix node
    adjust the vector in the best matching node and its neighbours towards the training vector
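The sketch below expresses the same training loop in Python with NumPy. It is a generic SOM update, not Wandora's exact code; the matrix size, number of training steps, learning rate, and neighbourhood radius are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng()
    grid_h, grid_w, dim = 4, 4, 5                    # illustrative 4x4 neuron matrix
    som = rng.integers(0, 2, (grid_h, grid_w, dim)).astype(float)   # random 0/1 neuron vectors
    training = np.array([[1, 1, 1, 0, 0],            # a
                         [0, 1, 0, 1, 0],            # b
                         [1, 0, 0, 0, 1],            # c
                         [0, 1, 0, 0, 0]], float)    # d
    steps = 1000

    for step in range(steps):
        vec = training[rng.integers(len(training))]                  # select a training vector
        dist = np.linalg.norm(som - vec, axis=2)                     # distance to every neuron
        by, bx = np.unravel_index(np.argmin(dist), dist.shape)       # best matching node (BMU)
        lr = 0.5 * (1.0 - step / steps)                              # decaying learning rate
        radius = 2.0 * (1.0 - step / steps) + 0.5                    # decaying neighbourhood radius
        for y in range(grid_h):
            for x in range(grid_w):
                d = np.hypot(y - by, x - bx)                         # grid distance to the BMU
                if d <= radius:
                    influence = np.exp(-(d * d) / (2 * radius * radius))
                    som[y, x] += lr * influence * (vec - som[y, x])  # pull neuron towards the vector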
When training ends, the neuron matrix contains adapted neurons. Due to the training method, the neurons resemble the training vectors, and nearby neurons are generally more similar to each other than distant neurons. The multi-dimensional input space, i.e. the training vectors, can now be visualized on a two-dimensional plane, i.e. the matrix.
Finally, Wandora opens a visualization of the neuron matrix and places each grouping topic in the matrix cell that best matches its training vector. The visualization allows the user to create new associations, for example associations grouping similar topics, i.e. topics placed near each other.
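Continuing the sketch above, placing each grouping topic is just another best matching unit lookup over the trained matrix; again, this is only an illustration of the idea, not Wandora's implementation.

    labels = ["a", "b", "c", "d"]
    placement = {}
    for label, vec in zip(labels, training):
        dist = np.linalg.norm(som - vec, axis=2)
        placement[label] = np.unravel_index(np.argmin(dist), dist.shape)   # (row, column) of best cell
    print(placement)    # e.g. {'a': (0, 3), 'b': (2, 0), ...}; cells depend on the random start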
Additional notes
- If your training set is very large (>1000 topics), the matrix training may take a while.
- Different runs with the same source data may result in different SOMs due to the initial random vectors in the neuron matrix.
- You probably get the best results if the training vectors are not orthogonal.
SOM classification example
Let's say we have a large topic map containing artworks and keywords. Each artwork has been tagged with relevant keywords, i.e. the topic map contains associations between artworks and keywords. In our example each keyword tags multiple artworks and each artwork has multiple tags. We select all the artwork-keyword associations as shown below and start the SOM classifier in the context of the selection. We select Artworks as the grouping role.
Wandora builds the training vectors and the neuron matrix, and trains the matrix with the vectors. When the training sequence ends, Wandora opens a visualization of the matrix and places all artwork topics in the best matching nodes of the matrix. Below is an example of such a visualization.
Now the Wandora user can browse the visualization and perform topic map operations using the collected data. For example, the user may create associations that group the topics in selected nodes. In general, topics placed in the same node should somehow resemble each other. Below is a screenshot of the SOM visualization where the user has selected three nodes and is about to create such grouping associations.
Below is a screenshot of the associations created in the previous screenshot. Note the association type and roles. As the semantic interpretation of topics in the same node is not evident, a general association type SOM-Group-1220093644251 is used. The number sequence is there just to distinguish different groups. The user can change the base name and subject identifiers of the group's association type later in Wandora, if necessary. Each group is an instance of the SOM-Group topic.
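As a rough illustration of this grouping step, the following continuation of the sketch collects the placed labels by matrix cell and gives each group a SOM-Group-<number> style name. Deriving the numbers from the current time is an assumption, and none of this is Wandora's internal code.

    import time
    from collections import defaultdict

    groups = defaultdict(list)
    for label, cell in placement.items():
        groups[cell].append(label)        # topics placed in the same cell form one group

    base = int(time.time() * 1000)        # assumption: number sequence derived from current time
    for i, (cell, members) in enumerate(groups.items()):
        group_name = "SOM-Group-%d" % (base + i)    # distinct number per group
        print(group_name, cell, members)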
In this example I have used a dataset containing artworks and keywords provided by the Finnish National Gallery. Each artwork is labeled with its identification code, and the keywords are in Finnish.