Excel extractors

Excel is a spreadsheet application created by Microsoft. Excel saves spreadsheets in the XLS file format. Wandora contains several Excel extractors that read XLS files and transform them into topic maps. Wandora's Excel extractors are located under the menu File > Extract > Excel. The available Excel extractors are:

  • Excel deep topic extractor
  • Excel adjacency list extractor
  • Excel adjacency matrix extractor
  • Excel topic tree extractor
  • Excel topic occurrences extractor
  • Excel topic names extractor
  • Excel sheet extractor

Excel deep topic extractor

The Excel deep topic extractor turns each spreadsheet cell into a topic and associates that topic with property topics derived from the cell's properties. The transformed properties include the cell's location (its coordinates in the spreadsheet), format, formula, comment and color.

Note that the Excel deep topic extractor doesn't create associations between topics created from different spreadsheet cells.

Excel adjacency list extractor

The Excel adjacency list extractor interprets each spreadsheet row as an association. The first row specifies the association role types. Each column represents one player in the association. If a column has no value, that player is ignored. The association type is a default Excel association type topic.

The next screen capture shows a simple adjacency list in OpenOffice Calc. The first row contains the roles and all following rows contain association player pairs.
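The row-to-association rule can be illustrated with a minimal Python sketch. This is not Wandora's implementation: the function name rows_to_associations, the dictionary layout and the type name "Excel association" are assumptions made for illustration, and the sheet is modelled as a plain list of rows rather than a real spreadsheet file. The sketch also assumes that a row with no players at all produces no association.

```python
def _is_blank(cell):
    """True for cells that carry no value (None or whitespace only)."""
    return cell is None or str(cell).strip() == ""


def rows_to_associations(rows, association_type="Excel association"):
    """Turn adjacency-list rows into association records.

    rows[0] holds the role names; every later row holds one player per
    column. Blank cells are skipped, so the corresponding player is
    simply left out of the association.
    """
    if not rows:
        return []
    roles = rows[0]
    associations = []
    for row in rows[1:]:
        players = {}
        for role, cell in zip(roles, row):
            if _is_blank(role) or _is_blank(cell):
                continue
            players[str(role)] = str(cell)
        if players:  # assumption: fully empty rows create nothing
            associations.append({"type": association_type, "players": players})
    return associations


# Hypothetical sheet: one header row with roles, two data rows.
sheet = [
    ["Role1", "Role2"],
    ["Player1", "Player2"],
    ["Player3", ""],
]
for association in rows_to_associations(sheet):
    print(association)
```

In this sketch the second data row yields an association with a single player, because its Role2 column is empty.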


Excel adjacency list.png


When the XLS file is extracted, Wandora creates a topic for each spreadsheet cell and an association for each row (except the first row). The next screen capture shows Wandora's association type topic and all associations.


Excel adjacency list extracted.png

Excel adjacency matrix extractor

The Excel adjacency matrix extractor interprets the spreadsheet as a matrix where the first row and the first column are topics. Whenever a spreadsheet cell has a value, Wandora creates an association between the column and row topics.

The next screen capture shows an adjacency matrix in OpenOffice Calc. The first row and the first column contain labels that will be topics in Wandora. The cell value X marks an association between the row and column topics.
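As a rough illustration of the matrix rule, the following Python sketch lists the topic pairs that would be associated. It is a sketch under stated assumptions, not Wandora's code: the function name matrix_to_associations is hypothetical, the sheet is a plain list of rows, and any non-blank cell value counts as "has a value".

```python
def _is_blank(cell):
    """True for cells that carry no value (None or whitespace only)."""
    return cell is None or str(cell).strip() == ""


def matrix_to_associations(rows):
    """Return (row_topic, column_topic) pairs for every non-blank cell.

    The first row supplies the column topics and the first column
    supplies the row topics; the top-left corner cell is ignored.
    """
    if not rows:
        return []
    column_topics = rows[0][1:]
    pairs = []
    for row in rows[1:]:
        if not row or _is_blank(row[0]):
            continue
        row_topic = str(row[0])
        for column_topic, cell in zip(column_topics, row[1:]):
            if _is_blank(column_topic) or _is_blank(cell):
                continue
            pairs.append((row_topic, str(column_topic)))
    return pairs


# Hypothetical matrix: an X marks an association between row and column.
matrix = [
    ["",        "Player1", "Player2"],
    ["Player1", "",        "X"],
    ["Player2", "",        "X"],
]
print(matrix_to_associations(matrix))
```

Note that the second data row associates Player2 with itself, which is a legal result of this rule.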


Excel adjacency matrix.png


When the XLS file is extracted, Wandora creates topics and associations. The next screen capture shows Wandora after the extraction, with all created associations.


Excel association matrix extracted.png


Wandora features a Graph topic panel used to visualize topics. The next screen capture shows the created adjacency matrix associations in the Graph topic panel. The graph contains three separate subgraphs. The Graph topic panel doesn't show a topic's associations with itself, hence Player9 appears to be alone.


Excel association matrix extracted as graph.png

Excel topic tree extractor

The Excel topic tree extractor interprets the spreadsheet as a vertical list of topics where indented topics are associated with the previous outer topic. For example, topics created from B2 and B3 are associated with the topic created from A1. Moreover, a topic created from C4 is associated with the topic created from B3.
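One way to resolve "the previous outer topic" is to keep a stack of the currently open ancestors, as in the Python sketch below. This is an illustration only: the function name tree_associations is hypothetical, the sheet is a plain list of rows, and the sketch assumes that only the first non-blank cell of each row is used and that its column index is the indentation depth.

```python
def _is_blank(cell):
    """True for cells that carry no value (None or whitespace only)."""
    return cell is None or str(cell).strip() == ""


def tree_associations(rows):
    """Return (parent, child) pairs from an indented, vertical topic list."""
    stack = []  # (depth, topic) entries for the chain of open ancestors
    pairs = []
    for row in rows:
        # Depth is the column index of the first non-blank cell in the row.
        found = next(
            ((i, str(c)) for i, c in enumerate(row) if not _is_blank(c)),
            None,
        )
        if found is None:
            continue  # skip rows without any value
        depth, topic = found
        # Drop ancestors that are at the same depth or deeper.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            pairs.append((stack[-1][1], topic))
        stack.append((depth, topic))
    return pairs


# The example from the text: values placed in cells A1, B2, B3 and C4.
sheet = [
    ["A1", None, None],
    [None, "B2", None],
    [None, "B3", None],
    [None, None, "C4"],
]
print(tree_associations(sheet))
```

For the example layout this produces the pairs A1-B2, A1-B3 and B3-C4, matching the associations described above.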

Excel topic occurrences extractor

The Excel topic occurrences extractor assumes that the spreadsheet's first column contains the occurrence carrier topics and that the first row contains the occurrence types. All other cells in the spreadsheet contain occurrence values.
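The layout can be read as a table keyed by carrier topic and occurrence type. The following Python sketch shows that reading; it is not Wandora's implementation. The function name sheet_to_occurrences and the sample labels are hypothetical, the sheet is a plain list of rows, and the sketch assumes that blank cells produce no occurrence.

```python
def _is_blank(cell):
    """True for cells that carry no value (None or whitespace only)."""
    return cell is None or str(cell).strip() == ""


def sheet_to_occurrences(rows):
    """Map each carrier topic to a {occurrence type: value} dictionary.

    The first column holds the carrier topics, the first row holds the
    occurrence types, and every other cell holds an occurrence value.
    """
    if not rows:
        return {}
    occurrence_types = rows[0][1:]
    occurrences = {}
    for row in rows[1:]:
        if not row or _is_blank(row[0]):
            continue
        carrier = str(row[0])
        for occurrence_type, cell in zip(occurrence_types, row[1:]):
            if _is_blank(occurrence_type) or _is_blank(cell):
                continue  # assumption: blank cells create no occurrence
            occurrences.setdefault(carrier, {})[str(occurrence_type)] = str(cell)
    return occurrences


# Hypothetical sheet with two carrier topics and two occurrence types.
sheet = [
    ["",       "Description", "Homepage"],
    ["Topic1", "First topic", "http://example.org/1"],
    ["Topic2", "",            "http://example.org/2"],
]
print(sheet_to_occurrences(sheet))
```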

Excel topic names extractor

The Excel topic names extractor is similar to the Excel topic occurrences extractor but creates variant names. The extractor assumes that the spreadsheet's first column contains the named topics and that the first row contains the scope topics, i.e. language topics. All other cells in the spreadsheet contain names.

Excel sheet extractor

The extractors described above process all sheets in the spreadsheet file. This is not always desirable, because different sheets may have different cell layouts. The Excel sheet extractor is a meta extractor that collects all other extractors under a single interface and lets the user decide which extractor is applied to which sheet. The user can also skip sheets.

Spreadsheet cell topics

The previous chapters described the different Excel extractors. Wandora creates a topic out of each non-empty spreadsheet cell. By default, the topic's subject identifier is created by prepending the prefix http://www.wandora.org/si/excel/cell/ to the URL-encoded cell value. The unmodified cell value is used as the topic's basename. Each cell topic is typed with a special cell type topic, whose subject identifier is http://www.wandora.org/si/excel/cell and whose basename is Excel cell.
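The default naming scheme can be sketched in a few lines of Python. The two URLs and the basename Excel cell come from the description above; everything else is an assumption made for illustration, including the function name cell_topic, the dictionary layout, and the use of percent-encoding via urllib.parse.quote. The exact URL encoding Wandora applies may differ, for example in how spaces are encoded.

```python
from urllib.parse import quote

CELL_SI_PREFIX = "http://www.wandora.org/si/excel/cell/"
CELL_TYPE_SI = "http://www.wandora.org/si/excel/cell"
CELL_TYPE_BASENAME = "Excel cell"


def cell_topic(value):
    """Describe the topic created for one non-empty cell value."""
    text = str(value)
    return {
        # Prefix plus the URL-encoded cell value (encoding is an assumption).
        "subject_identifier": CELL_SI_PREFIX + quote(text, safe=""),
        # The unmodified cell value is the basename.
        "basename": text,
        # Every cell topic is typed with the shared cell type topic.
        "type": CELL_TYPE_SI,
    }


topic = cell_topic("Player 1")
print(topic["subject_identifier"])
print(topic["basename"])
```

Because the subject identifier depends only on the cell value, two cells with the same value map to the same identifier in this sketch.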
