The open data movement is taking steps forward here in Finland. Some time ago the Helsinki city library [1] released all of its library data as a huge MARCXML dump [2] under the rather liberal Creative Commons Attribution-ShareAlike 1.0 license. I thought it would be nice to have the MARCXML dump converted to a Topic Maps format, and asked the Topic Maps mailing list whether anybody had made such a conversion [3].
It was a bit of a surprise to me that the question raised such a lively debate, resulting in more than twenty submissions to the mailing list in a very short time. Thanks to all who shared their experiences. Another surprising observation was that there really was no MARCXML to Topic Maps transformation available except [4]. Thank you, Maria.
After some investigation I decided to write the MARCXML to Topic Maps conversion from scratch. After reading people's opinions and experiences about MARC, it became clear to me that one shouldn't attempt a semantic transformation but rather a wrapper transformation: in other words, model the MARCXML schema itself using Topic Maps. It took a couple of days to make a first draft of the conversion feature [6], then a few more days to fix bugs and add some new features such as batch conversion [7]. The MARCXML to Topic Maps conversion feature will be part of the next Wandora release, to be published in late August 2010.
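To illustrate what a wrapper transformation means in practice, here is a minimal Python sketch. It is not the Wandora implementation; the function name wrap_marcxml, the topic naming scheme ("record:", "marc:", "value:") and the sample record are all made up for this example. It only shows the idea: MARC field tags and subfield codes become association types as they are, without any attempt to interpret what they mean.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML (MARC 21 "slim" schema).
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Invented sample record, for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">12345</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Example Author</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example Title</subfield>
    </datafield>
  </record>
</collection>"""


def wrap_marcxml(xml_text):
    """Mirror the MARCXML structure as topics and associations.

    Returns (topics, associations), where topics is a set of topic
    names and each association is a tuple
    (type topic, record topic, value topic).
    """
    topics = set()
    associations = []
    root = ET.fromstring(xml_text)
    for index, record in enumerate(root.iter(MARC_NS + "record")):
        # Use control field 001 as the record identifier if present.
        record_id = None
        for cf in record.findall(MARC_NS + "controlfield"):
            if cf.get("tag") == "001":
                record_id = (cf.text or "").strip()
        record_topic = "record:" + (record_id or str(index))
        topics.add(record_topic)
        for df in record.findall(MARC_NS + "datafield"):
            tag = df.get("tag")
            for sf in df.findall(MARC_NS + "subfield"):
                value = (sf.text or "").strip()
                if not value:
                    continue
                # The association type is just tag + subfield code,
                # e.g. "marc:245$a"; no semantics are assigned to it.
                type_topic = "marc:%s$%s" % (tag, sf.get("code"))
                value_topic = "value:" + value
                topics.update((type_topic, value_topic))
                associations.append((type_topic, record_topic, value_topic))
    return topics, associations


topics, associations = wrap_marcxml(SAMPLE)
for association in associations:
    print(association)
```

Because equal values map to the same topic name, two records sharing an author or a keyword end up connected through that topic, which is what makes the result browsable.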
Back to the MARCXML dumps of the Helsinki city library. The data has been divided into 69 MARCXML dumps, each containing 10000 records. Having converted some of these dumps, it looks like 10000 records explode to ~100000 topics and ~200000 associations. It is already clear that one just can't make a single in-memory topic map out of all that data. Nearly 7 million topics and 14 million associations are just too much for Wandora. One option could be a database topic map. Another option, perhaps the best one, is to divide the topic map into 69 sub-topic maps as well, and leave merging to the user.
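The totals above are a simple extrapolation from the per-dump figures. As a rough sketch (the function name estimate is made up, and the per-dump counts are the approximate ones observed so far, not exact measurements):

```python
RECORDS_PER_DUMP = 10000
TOPICS_PER_DUMP = 100000        # approximate, from converted dumps
ASSOCIATIONS_PER_DUMP = 200000  # approximate, from converted dumps


def estimate(dumps):
    """Return (records, topics, associations) for a number of dumps."""
    return (
        dumps * RECORDS_PER_DUMP,
        dumps * TOPICS_PER_DUMP,
        dumps * ASSOCIATIONS_PER_DUMP,
    )


records, topics, associations = estimate(69)
print(records, topics, associations)  # 690000 6900000 13800000
```

This assumes the topic counts simply add up. In a merged topic map, shared topics such as authors and keywords would merge, so the real total would be somewhat smaller.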
Now, what can you do with the Topic Maps conversion of the Helsinki city library data? Well, I found it very interesting to browse all that data using Wandora. The associative data model of Topic Maps makes it very enjoyable to float around the data, using keywords, person names, organizations, etc. as pathways to the next room. When the data is a topic map, users can also easily merge their own data into the whole, and of course export the data in some interesting formats. One friend immediately came up with a solution where users can make libraries of their own, give recommendations, and simply give a thumbs-up to books they like. The road is open.
Kind Regards,
Aki
[1] http://www.helmet.fi/
[2] http://data.kirjastot.fi/
[3] http://www.infoloom.com/pipermail/topic ... 08227.html
[4] http://www.infoloom.com/pipermail/topic ... 08231.html
[5] http://www.springerlink.com/content/e2q56340t07g795w/
[6] http://www.infoloom.com/pipermail/topic ... 08298.html
[7] http://www.infoloom.com/pipermail/topic ... 08314.html