YLE Elävä arkisto and Arkivet
The Finnish National Broadcasting company (YLE) has published metadata for its Elävä arkisto and Arkivet archives. The Wandora team has created several extractors for the Wandora application that transform the published metadata into topic maps format. The extractors will be part of Wandora releases published after 2015-05-06. Until then, we are releasing the topic maps created with the extractors. The data packages used in the transformation were downloaded on 2015-05-06. This page provides download links to the resulting topic maps and shows some simple use cases for the data.
The topic maps may interest Finnish developers who explore YLE's metadata and build applications and visualizations with the data. The original metadata and the resulting topic maps are mostly in Finnish.
Download
The YLE Elävä arkisto and Arkivet topic map is available as a Wandora project file:
- yle_elava-arkisto.wpr (6 927 KB)
As the project is very large, we suggest you load it into a Wandora application started with 16 GB of memory. Start Wandora with the startup script Wandora-16g.bat or Wandora-16g.sh.
In addition to the Wandora project file, the topic maps data is available as XTM and JTM topic maps:
- yle_elava-arkisto_xtm.zip (7 133 KB, uncompressed 236 896 KB)
- yle_elava-arkisto_jtm.zip (10 678 KB, uncompressed 281 279 KB)
These topic map files can be used in other topic maps applications too.
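Outside Wandora, a JTM file can be inspected with any JSON parser. The following is a minimal sketch, assuming the file follows the JTM 1.0 layout (top-level `topics` and `associations` arrays, with topic names under `names`). The sample data and its example.org identifiers are hypothetical and do not come from the actual export.

```python
import json

# Hypothetical two-topic sample in JTM 1.0 layout; all identifiers are made up.
SAMPLE_JTM = """
{
  "version": "1.0",
  "item_type": "topicmap",
  "topics": [
    {"subject_identifiers": ["http://example.org/article/20-92781"],
     "names": [{"value": "Nalle puh (20-92781)"}]},
    {"subject_identifiers": ["http://example.org/tag/matti-pellonpaa"],
     "names": [{"value": "Matti Pellonpää (tag)"}]}
  ],
  "associations": [
    {"type": "si:http://example.org/schema/tag",
     "roles": [
       {"type": "si:http://example.org/schema/article-role",
        "player": "si:http://example.org/article/20-92781"},
       {"type": "si:http://example.org/schema/tag-role",
        "player": "si:http://example.org/tag/matti-pellonpaa"}
     ]}
  ]
}
"""


def summarize_jtm(jtm_text):
    """Return (topic count, association count, list of topic names)."""
    topic_map = json.loads(jtm_text)
    topics = topic_map.get("topics", [])
    associations = topic_map.get("associations", [])
    # Collect every name value of every topic, in document order.
    names = [
        name["value"]
        for topic in topics
        for name in topic.get("names", [])
        if "value" in name
    ]
    return len(topics), len(associations), names


topic_count, association_count, topic_names = summarize_jtm(SAMPLE_JTM)
```

To try this on the real data, pass the text of the extracted JTM file to `summarize_jtm`. The uncompressed file is large, so parsing it in one call needs a corresponding amount of memory.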
License
The metadata license is Creative Commons Attribution-ShareAlike 4.0.
The original metadata provider is the Finnish National Broadcasting company (YLE) and its Elävä arkisto and Arkivet archives.
About the topic maps
The Elävä arkisto and Arkivet topic maps are wrapped into a Wandora project file. The project file is loaded into the Wandora application with the menu option File > Open project. The project contains five topic map layers. Apart from the base layer, each layer contains information from one source file. As all source files use identifiers consistently, the same topics in different layers merge. The topic map layers are:
- Base layer holds Wandora's base ontology.
- yle - elava-arkisto - articles includes metadata from articles.csv available at http://elavaarkisto.kokeile.yle.fi/data/articles.csv . The topic map contains topics for article web pages, including the article identifier, name, source service and article language.
- yle - elava-arkisto - article addons includes metadata from articles-additional-fields.csv available at http://elavaarkisto.kokeile.yle.fi/data/articles-additional-fields.csv . The topic map contains article editors, contributors and additional identifiers.
- yle - elava-arkisto - article media includes metadata from media-article.csv available at http://elavaarkisto.kokeile.yle.fi/data/media-article.csv . The topic map contains media identifiers that are linked to the articles. Media identifiers can be used to resolve the videos and audio tracks of the articles.
- yle - elava-arkisto - tags includes metadata from article-tags.csv available at http://elavaarkisto.kokeile.yle.fi/data/article-tags.csv . The topic map contains tags linked to the articles.
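The effect of consistent identifiers across layers can be illustrated with a small sketch. It is a simplified model and not Wandora's actual merge logic: each layer is represented as a plain dictionary keyed by article identifier, and the property names (`name`, `tags`) are hypothetical.

```python
def merge_layers(*layers):
    """Combine layers so that entries sharing an identifier become one topic."""
    merged = {}
    for layer in layers:
        for identifier, properties in layer.items():
            # The same identifier in two layers ends up as a single entry.
            merged.setdefault(identifier, {}).update(properties)
    return merged


# Simplified stand-ins for the articles layer and the tags layer.
articles_layer = {"20-92781": {"name": "Nalle puh"}}
tags_layer = {"20-92781": {"tags": ["Matti Pellonpää"]}}

merged_topics = merge_layers(articles_layer, tags_layer)
```

Because both layers use the identifier 20-92781, the merged result holds one entry that carries both the article name and its tags.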
Once the project file is opened in the Wandora application, the user should see all available topic map layers at the bottom left of the application window. The topic tree at the top left shows the Elava-arkisto topic, which can be used to start browsing the data. The next screen capture shows the Wandora application after the user has opened the topic Elava-arkisto article in the Traditional topic panel and scrolled down to the instances of the article. The Layer info panel on the right shows some statistics of the topic map layer yle - elava-arkisto - articles. The number of articles is close to 12067 (the layer also contains some extra topics that are not articles).
Next, the Wandora user expands the topic Elava-arkisto article in the tree on the left and locates the article Nalle puh (20-92781). The number that follows the article name is the article's identifier. The identifier is also stored in the article topic as an occurrence. The user double-clicks the Nalle puh (20-92781) article and Wandora opens it in the Traditional topic panel. The user switches the current layer to yle - elava-arkisto - tags and the layer statistics on the right change.
The Wandora user opens the article topic Nalle puh (20-92781) in the Webview and chooses to view the topic's subject locator. The article's subject locator is the actual WWW address of the article in YLE Elävä arkisto. Wandora displays the WWW page. Unfortunately, Wandora can't play embedded videos because they require the Flash plugin. Wandora's Webview is based on Java's Webview component, which doesn't support Flash videos at the moment.
The Wandora user closes the Webview and scrolls down to the Elava-arkisto tag associations of the topic Nalle puh (20-92781). The article's tags include Matti Pellonpää (tag). Matti Pellonpää is a well-known Finnish actor.
The Wandora user double-clicks the topic Matti Pellonpää (tag) to view all articles that are tagged with the topic. It appears that Elävä arkisto contains 13 articles that relate to Matti Pellonpää in some way. The user can now continue browsing the articles tagged with Matti Pellonpää (tag).
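The same tag lookup can be sketched outside Wandora as an index from tags to article identifiers. This is only an illustration: the column names `article_id` and `tag` are assumptions, not the actual headers of article-tags.csv, and the sample rows (including the identifier 20-00001 and the tag "example tag") are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real article-tags.csv may use other columns.
SAMPLE_CSV = (
    "article_id,tag\n"
    "20-92781,Matti Pellonpää\n"
    "20-92781,example tag\n"
    "20-00001,Matti Pellonpää\n"
)


def articles_by_tag(csv_text):
    """Map each tag to the list of article identifiers that carry it."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        index[row["tag"]].append(row["article_id"])
    return dict(index)


tag_index = articles_by_tag(SAMPLE_CSV)
```

Looking up `tag_index["Matti Pellonpää"]` then returns every sample article that carries the tag, which mirrors opening the tag topic in Wandora.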
Finally, the user opens the Webview once again and chooses to view the Timeline. This feature is not yet available in the current (2015-05-06) official Wandora release. The timeline shows the articles by their publishing date.
See also
You can do a lot more with YLE Elävä arkisto and Arkivet topic maps. Look at Wandora's documentation for ideas. Some obvious ideas:
- Create a simple Embedded HTTP server service that wraps media identifiers in HTML/JavaScript code that implements a viewer for the media pieces.
- Create a simple Embedded HTTP server service API that outputs article data in a machine readable format for a given tag.
- Export articles and tags into a GML graph and visualize it in the Gephi application.
- Try GATE/ANNIE to extract person and organization topics out of editors and contributors occurrences.
- Try Stanford Named Entity Recognizer integration to extract person and organization topics out of editors and contributors occurrences.
- Expand tag topics with Freebase extractor.
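As an illustration of the GML idea in the list above, the following sketch writes article and tag pairs as a small undirected graph in GML text. It is independent of Wandora's own export features, and the function name and input shape are assumptions made for this example.

```python
def to_gml(article_tag_pairs):
    """Build GML text where each article and each tag is a node."""
    node_ids = {}
    lines = ["graph [", "  directed 0"]

    def node_id(label):
        # Assign a running integer id the first time a label is seen.
        if label not in node_ids:
            node_ids[label] = len(node_ids)
            safe_label = label.replace('"', "&quot;")
            lines.append(f'  node [ id {node_ids[label]} label "{safe_label}" ]')
        return node_ids[label]

    # Register nodes first so that all node entries precede the edges.
    edges = [(node_id(article), node_id(tag)) for article, tag in article_tag_pairs]
    for source, target in edges:
        lines.append(f"  edge [ source {source} target {target} ]")
    lines.append("]")
    return "\n".join(lines)


gml_text = to_gml([("Nalle puh (20-92781)", "Matti Pellonpää (tag)")])
```

Writing `gml_text` to a file with UTF-8 encoding gives a graph file that tools accepting GML should be able to open. Some strict GML readers expect ASCII only, in which case the non-ASCII characters in labels would need to be encoded first.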
If you are new to Wandora, see Wandora's web site. You can download the Wandora application here. In addition to the YLE Elävä arkisto and Arkivet transformation, we have converted several other data packages into topic maps format. See the Topic map gallery.