Database extractor
Wandora has a feature to import any relational database. Feature introduces a general conversion from SQL database to topic maps.
Contents |
Using the database import tool
Database conversion is done with the Database Extractor tool found in File > Extract > Other menu. After launching the tool, it will ask for the database to import. Either select the database if it is already configured or create a new database entry and fill in appropriate connection details.
After selecting the database the tool will show the database schema with options on how to process each column. You can ignore some columns by unchecking the include check box. To ignore a complete table, uncheck all its columns. You can also choose to make topics from column values instead of making occurrences (see conversion details below). You can also choose which column will be used as topic base name.
After confirming import options the import process will start. Note that the import process may take anywhere from seconds to hours depending on the size of the database.
Postprocessing after conversion
The conversion works for (nearly) any kind of relational database and thus it is very general. For this reason converted databases usually need post prosessing, either manually or with other tools, before using in a production environment.
Post processing should at least include renaming association, role and topic types and possibly merging some of these topics. Some occurrences or association members may be unneeded and can be deleted with built in Wandora tools.
Traditional databases will also often refer to same concepts in several different ways. For example names of people may have been written last or first name first, may or may not include middle initials and may contain typing errors. All these should be corrected so that a single concept (in this case a person) only has a single topic.
Conversion details
Basic idea is that each row of each table is converted into a topic, there are some exceptions to this rule. Since primary key(s) identify the row inside the table, it is natural to use the pimary keys with the table name as the subject identifier of the topic.
Values of each normal column, that is column that is not a reference to some other (or same) table, is converted into either an occurrence or a new topic which is then associated with the topic representing the row. Creating topics for column values is natural when the value is shared by many rows, for example when the value represents a year. Creating occurrences is better suited for situations where most or all rows have unique values and creating separate topics would only make the topic map more complex without offering any additional linkage between topics.
Columns that reference some other column in another or same table are converted into associations. This is the natural thing to do as the referenced row is also a topic and the reference associates these two topics (rows). Note: Current version of Wandora does not support database schemas where several columns together reference another row.
Tables that only contain references are treated differently. Each row in such a table is converted into an association instead of a topic. This is done because each row links several topics without adding any other information. Thus creating an association for each row is more meaningful than creating a topic.
Tables with only one primary key that is also a foreign key are also treated slightly differently. This kind of table can usually be thought to be extending another row in the table that is referenced. For this reason subject identifier is created with the table name of the referenced table. This will cause topics of both this and the referenced tables to merge. This means that information from both tables will be in the same topic.
Troubleshooting
Zero dates in MySQL database may cause some trouble in SQL database import. Default behavior for MySQL JDBC driver is to throw an exception whenever such a value is encountered. This results in an error message in Wandora:
org.wandora.topicmap.TopicMapException: java.sql.SQLException: Cannot convert value '0000-00-00 00:00:00' from column 3 to TIMESTAMP.
This behavior can be changed with connection parameter as documented here. However, using the MySQL template in database configuration hides the actual connection string so you can't add any connection parameters. You must use database type Other. Then you have complete control over the connection. For MySQL the database driver is com.mysql.jdbc.Driver and connection string is jdbc:mysql://server/database. You can add other parameters after the conection string, in this case we want to specify zeroDateTimeBehavior so use connection string jdbc:mysql://server/database?zeroDateTimeBehavior=convertToNull. Remember to replace server and database with your database IP address and database name respectively.