SOM classifier
Wandora's SOM (Self-Organizing Map) classifier generates a two-dimensional artificial neural network and trains the network with given topic map associations. As a consequence, topics with similar associations end up located near each other in the neural network. After training, the Wandora user can view, analyze, and create new associations using the neural network. For example, a grouping association can be created to represent topic similarity. Consider Wandora's SOM classifier an experimental feature.
To start the SOM classifier, select a large number of binary associations and select SOM classifier... in the context of your selection. The classifier first requests a grouping role. Grouping role players specify labels for the training vectors. Let's say the user has selected eight binary associations:
a-q a-e a-t b-e b-l c-q c-o d-e
Here the letters represent association players. All associations have the same type and roles. The user selects the first role as the grouping role. The grouping topics are the first players of the associations: a, b, c, and d. Thus the training vector labels are a, b, c, and d. About the training vectors:
- The training vector dimension equals the number of distinct player topics in the second role. In our example the second-role players are q, e, t, l, and o. Thus the training vector dimension is 5.
- If the grouping topic is associated with the second-role player, the value in the vector slot is 1.
- If the grouping topic is not associated with the second-role player, the value in the vector slot is 0.
This gives us training vectors for the selection:
a: [ q=1, e=1, t=1, l=0, o=0 ]
b: [ q=0, e=1, t=0, l=1, o=0 ]
c: [ q=1, e=0, t=0, l=0, o=1 ]
d: [ q=0, e=1, t=0, l=0, o=0 ]
In other words
a: [ 1, 1, 1, 0, 0 ]
b: [ 0, 1, 0, 1, 0 ]
c: [ 1, 0, 0, 0, 1 ]
d: [ 0, 1, 0, 0, 0 ]
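The vector construction described above can be sketched in Python. This is a hypothetical standalone sketch, not Wandora's actual implementation; the association pairs are the eight from the example.

```python
# Hypothetical sketch of building training vectors from binary associations;
# not Wandora's actual implementation.
# Each pair is (grouping-role player, second-role player).
associations = [("a", "q"), ("a", "e"), ("a", "t"),
                ("b", "e"), ("b", "l"),
                ("c", "q"), ("c", "o"),
                ("d", "e")]

# Vector dimension = number of distinct second-role players,
# kept in order of first appearance: q, e, t, l, o.
features = []
for _, second in associations:
    if second not in features:
        features.append(second)

# A slot is 1 if the grouping topic is associated with that player, else 0.
vectors = {}
for group, second in associations:
    vec = vectors.setdefault(group, [0] * len(features))
    vec[features.index(second)] = 1

print(features)       # ['q', 'e', 't', 'l', 'o']
print(vectors["a"])   # [1, 1, 1, 0, 0]
```

Running this reproduces exactly the four labeled vectors listed above.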
When the training vectors are ready, Wandora creates a neuron matrix. The matrix is a two-dimensional grid of neuron cells. Each neuron contains a random vector with a dimension equal to that of the training vectors; in our example this dimension is 5. The random vector contains 5 randomly selected ones (1) and zeros (0), for example [1, 0, 1, 0, 1]. Then Wandora trains the matrix with the training vectors. The training can be described with the following algorithm:
loop:
    select a training vector
    find the best matching matrix node for the selected training vector
    adjust the vector in the best matching node, and in its neighbors,
        towards the training vector
When the training ends, the matrix contains adapted neurons. Due to the training method, the neurons resemble the training vectors, and nearby neurons are generally more similar than distant ones. The multi-dimensional input space, i.e. the training vectors, can now be visualized on a two-dimensional plane, i.e. the matrix.
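The training loop above can be sketched as a small runnable program. This is a toy sketch under simplifying assumptions, not Wandora's implementation: the learning rate and neighborhood radius are fixed, whereas a real SOM usually decays both over time, and training vectors are cycled in order rather than sampled.

```python
import random

def train_som(vectors, width=4, height=4, epochs=200, lr=0.5, radius=1.5):
    # Toy SOM trainer: a sketch of the algorithm above, not Wandora's code.
    dim = len(next(iter(vectors.values())))
    rnd = random.Random(42)  # fixed seed; other seeds yield different maps
    # Each cell of the neuron matrix starts with a random 0/1 vector.
    grid = [[[float(rnd.randint(0, 1)) for _ in range(dim)]
             for _ in range(width)] for _ in range(height)]

    def bmu(vec):
        # Best matching unit: the cell whose vector is closest to vec.
        return min(((y, x) for y in range(height) for x in range(width)),
                   key=lambda p: sum((a - b) ** 2
                                     for a, b in zip(grid[p[0]][p[1]], vec)))

    labels = list(vectors)
    for t in range(epochs):
        vec = vectors[labels[t % len(labels)]]   # select a training vector
        by, bx = bmu(vec)                        # its best matching node
        for y in range(height):                  # pull the node and its
            for x in range(width):               # neighbors towards vec
                if (y - by) ** 2 + (x - bx) ** 2 <= radius ** 2:
                    cell = grid[y][x]
                    for i in range(dim):
                        cell[i] += lr * (vec[i] - cell[i])
    # Place each label in its best matching cell, like the visualization.
    return {label: bmu(vectors[label]) for label in labels}

vectors = {"a": [1, 1, 1, 0, 0], "b": [0, 1, 0, 1, 0],
           "c": [1, 0, 0, 0, 1], "d": [0, 1, 0, 0, 0]}
placement = train_som(vectors)
print(placement)  # maps each label to a (row, column) cell
```

With a fixed random seed the placement is reproducible; changing the seed changes the map, which mirrors the note below about different runs producing different SOMs.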
Finally, Wandora opens a visualization of the neuron matrix and places each grouping topic in the matrix cell that best matches its training vector. The visualization allows the user to create new associations, for example associations grouping similar topics, i.e. topics located near each other.
Additional notes
- If your training set is very large (over 1000 grouping topics and/or a training vector dimension over 1000), the matrix training may take a while.
- Different runs with the same source data may result in different SOMs due to the initial random vectors in the neuron matrix.
- My intuition says you get the best results if the training vectors are not orthogonal.
- The SOM classifier assumes the source associations are binary.
- For more information, read about Self-Organizing Maps.
SOM classification example
Let's say we have a large topic map containing artworks and keywords. Each artwork has been tagged with relevant keywords, i.e. the topic map contains associations between artworks and keywords. In our example each keyword tags multiple artworks and each artwork has multiple tags. We select all the artwork-keyword associations as shown below and start the SOM classifier in the context of the selection. We select Artworks as the grouping role.
Wandora builds the training vectors and the neuron matrix, and trains the matrix with the vectors. When the training sequence ends, Wandora opens a visualization of the matrix and places all artwork topics in the best matching nodes of the matrix. Below is an example of such a visualization. The white-edged squares represent neuron matrix cells. Empty squares have no matching training vectors, that is, no artworks in our example. Cells with tiny text have matching training vectors; some cells even have multiple matches. Each text line in a cell represents one match. The black horizontal pixel line at the bottom of a cell is a match counter: one pixel in the line represents one match.
The user can now draw conclusions using the visualization:
- Training vectors (=artworks) located in the same cell ought to be very similar, even identical.
- Training vectors (=artworks) located in nearby cells ought to be similar.
- Training vectors (=artworks) located in distant cells ought to be different.
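These rules of thumb can be sanity-checked on the small a/b/c/d example from earlier: the Euclidean distance between training vectors predicts which labels should land near each other on the map. This is an illustrative sketch, not part of Wandora.

```python
import math

# The example training vectors from earlier (slot order q, e, t, l, o).
vectors = {"a": [1, 1, 1, 0, 0], "b": [0, 1, 0, 1, 0],
           "c": [1, 0, 0, 0, 1], "d": [0, 1, 0, 0, 0]}

def dist(u, v):
    # Euclidean distance between two training vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# d differs from a in two slots but from c in three slots,
# so d should end up nearer to a than to c on the trained map.
print(dist(vectors["d"], vectors["a"]))  # sqrt(2), about 1.41
print(dist(vectors["d"], vectors["c"]))  # sqrt(3), about 1.73
```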
The Wandora user can now browse the visualization and perform topic map operations using the collected data. The bottom-left slider zooms the matrix. Clicking a cell in the matrix selects it. To select multiple cells, hold down the SHIFT or CTRL key. Right-clicking a cell reveals a context menu with options for:
- Selecting and deselecting cells.
- Copying cell data to the system clipboard. This may be useful if you want to further analyze the outcome of the SOM learning process.
- Copying the matrix as an image.
- Creating associations between the topics labeling the training vectors. You can create associations with the Group and Permutate options. The Group option creates a specific group topic and associates each training vector label topic with the group topic (see image below). The Permutate option creates associations between the training vector label topics directly.
For example, the Wandora user may create associations that group the topics in selected nodes. In general, topics placed in the same node should somehow resemble each other. Below is a screenshot of the SOM visualization where the user has selected three nodes and is about to create such grouping associations.
Below is a screenshot of the associations created in the previous screenshot. Note the association type and roles. As the semantic interpretation of topics in the same node is not clear, a general association type SOM-Group-1220093644251 is used. The number sequence is there just to distinguish different groups. The user can change the base name and subject identifiers of a group's association type later in Wandora, if necessary. Each group is an instance of the SOM-Group topic.
In this example I have used a dataset containing artworks and keywords provided by the Finnish National Gallery. Each artwork is labeled with its identification code, and the keywords are in Finnish.