YLE Elävä arkisto and Arkivet
Revision as of 11:33, 7 May 2015
The Finnish National Broadcasting Company (YLE) has published the metadata of its Elävä arkisto and Arkivet archives. The Wandora team has created several extractors for the Wandora application that transform the published metadata into the topic maps format. The extractors will be included in Wandora releases published after 2015-05-06. The data packages used in the topic maps transformation were downloaded on 2015-05-06. This page provides download links to the transformed topic maps and shows some simple use cases for the data.
The topic maps may interest Finnish developers who explore YLE's metadata and build applications and visualizations with the data. The information in the original metadata and in the topic maps transformations is mostly in Finnish.
Download
The YLE Elävä arkisto and Arkivet topic map is available as a Wandora project file:
As the project is very large, we suggest loading it into a Wandora application that has 16 GB of memory available. Start Wandora with the startup script Wandora-16g.bat (Windows) or Wandora-16g.sh.
License
The metadata is licensed under Creative Commons Attribution-ShareAlike 4.0.
The original metadata provider is the Finnish National Broadcasting Company (YLE) and its Elävä arkisto and Arkivet archives.
About the topic maps
The Elävä arkisto and Arkivet topic maps are wrapped into a Wandora project file. Open the project file in the Wandora application with the menu option File > Open project. The project contains five topic map layers. Each layer, except the base layer, contains information from one source file. As all source files use identifiers consistently, the same topics in different layers merge. The topic map layers are:
- Base layer holds Wandora's base ontology.
- yle - elava-arkisto - articles includes metadata from articles.csv, available at http://elavaarkisto.kokeile.yle.fi/data/articles.csv . The topic map contains topics for article web pages, including the article identifier, name, source service and article language.
- yle - elava-arkisto - article addons includes metadata from articles-additional-fields.csv, available at http://elavaarkisto.kokeile.yle.fi/data/articles-additional-fields.csv . The topic map contains article editors, contributors and additional identifiers.
- yle - elava-arkisto - article media includes metadata from media-article.csv, available at http://elavaarkisto.kokeile.yle.fi/data/media-article.csv . The topic map contains media identifiers linked to the articles. Media identifiers can be used to resolve the videos and audio tracks of the articles.
- yle - elava-arkisto - tags includes metadata from article-tags.csv, available at http://elavaarkisto.kokeile.yle.fi/data/article-tags.csv . The topic map contains tags linked to the articles.
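The merging of the same topics across layers relies on every source file using the same article identifiers. The following minimal Python sketch illustrates the idea outside Wandora by reading two tiny CSV snippets and grouping their rows by identifier. It is a simplified analogue, not Wandora's merging implementation, and the column names (`id`, `name`, `article_id`, `tag`) are assumptions, since this page does not describe the real CSV headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature versions of two source files. The column names
# ("id", "name", "article_id", "tag") are assumptions; check the real
# headers of articles.csv and article-tags.csv before reuse.
ARTICLES_CSV = "id,name\n20-92781,Nalle puh\n"
TAGS_CSV = "article_id,tag\n20-92781,Matti Pellonpää\n"


def load_layer(text, key_field):
    """Read one CSV source into a dict: identifier -> list of row dicts."""
    layer = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        layer[row[key_field]].append(row)
    return layer


def merge_layers(*layers):
    """Combine layers so that rows sharing an identifier end up together.

    A much simplified analogue of topics with the same identifier
    merging across Wandora's topic map layers.
    """
    merged = defaultdict(list)
    for layer in layers:
        for key, rows in layer.items():
            merged[key].extend(rows)
    return dict(merged)


topics = merge_layers(
    load_layer(ARTICLES_CSV, "id"),
    load_layer(TAGS_CSV, "article_id"),
)
print(topics["20-92781"])
```

Because the grouping key is just the identifier string, adding a third source (for example the media file) only requires another `load_layer` call with that file's identifier column.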
Once the project file is opened in the Wandora application, the user should see all available topic map layers in the bottom left corner of the application window. The topic tree in the top left shows the Elava-arkisto topic, which can be used to start browsing the data. The next screen capture shows the Wandora application after the user has opened the topic Elava-arkisto article in the Traditional topic panel and scrolled down to its instances. The Layer info panel on the right shows statistics for the topic map layer yle - elava-arkisto - articles. The number of articles is close to 12067, as the layer also contains some extra topics that are not articles.
Next the Wandora user expands the topic Elava-arkisto article in the topic tree on the left and locates the article Nalle puh (20-92781). The number that follows the article name is the article's identifier, which is also stored in the article topic as an occurrence. The user double-clicks the Nalle puh (20-92781) article and Wandora opens it in the Traditional topic panel. The user then switches the current layer to yle - elava-arkisto - tags, and the layer statistics on the right change accordingly.
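Since article topic names end with the identifier in parentheses, a script working with exported topic names may want to split the two apart. This is a minimal Python sketch; the identifier pattern (digits, a hyphen, digits) is an assumption generalized from the single example Nalle puh (20-92781).

```python
import re

# Matches "<name> (<identifier>)" at the end of a base name, e.g.
# "Nalle puh (20-92781)". The digits-hyphen-digits identifier pattern
# is an assumption based on this one example.
BASE_NAME = re.compile(r"^(?P<name>.*?)\s*\((?P<id>\d+-\d+)\)$")


def split_base_name(base_name):
    """Return (article name, identifier), or (base_name, None) if no id."""
    match = BASE_NAME.match(base_name)
    if match is None:
        return base_name, None
    return match.group("name"), match.group("id")


print(split_base_name("Nalle puh (20-92781)"))  # ('Nalle puh', '20-92781')
```

In practice the identifier occurrence stored in the article topic is the more reliable source; parsing the name is only a convenience for exported text.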
The Wandora user opens the article topic Nalle puh (20-92781) in Webview and chooses to view the topic's subject locator. The article's subject locator is the actual WWW address of the article in YLE Elävä arkisto, http://yle.fi/aihe/artikkeli/2008/02/19/nalle-puh . Wandora displays the web page. Unfortunately, Wandora can't play the embedded videos, as they require the Flash plugin. Wandora's Webview is based on Java's WebView component, which doesn't support Flash videos at the moment.
The Wandora user closes the Webview and scrolls down to the Elava-arkisto tag associations of the topic Nalle puh (20-92781). The article tags include Matti Pellonpää (tag). Matti Pellonpää is a well-known Finnish actor.
The Wandora user double-clicks the topic Matti Pellonpää (tag) to view all articles tagged with it. It appears that Elävä arkisto contains 13 articles related to Matti Pellonpää in some way.
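Viewing all articles for a tag amounts to following the article-tag associations in the reverse direction. Outside Wandora, the same lookup can be sketched in Python as an inverted index. The (article, tag) pairs below are hypothetical placeholders, not real data from article-tags.csv.

```python
from collections import defaultdict

# Hypothetical (article, tag) pairs as they might be read from
# article-tags.csv; "Article A" and "Article B" are placeholders.
ARTICLE_TAGS = [
    ("Nalle puh (20-92781)", "Matti Pellonpää"),
    ("Article A", "Matti Pellonpää"),
    ("Article B", "Another tag"),
]


def build_tag_index(pairs):
    """Invert article -> tag associations into tag -> sorted article list."""
    index = defaultdict(set)
    for article, tag in pairs:
        index[tag].add(article)
    return {tag: sorted(articles) for tag, articles in index.items()}


index = build_tag_index(ARTICLE_TAGS)
print(len(index["Matti Pellonpää"]))  # 2
```

Using a set per tag means a duplicated row in the source file does not inflate the article count.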
Finally, the user opens the Webview once again and chooses to view the Timeline. This feature is not yet available in the current (2015-05-06) official Wandora release. The timeline places the articles according to their publishing dates.
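A timeline is essentially the tagged articles sorted by publishing date. This Python sketch shows that ordering step. The `published` field name and its ISO 8601 format are assumptions, as this page does not describe how dates are stored in the YLE metadata, and the article names are placeholders.

```python
from datetime import date

# Hypothetical articles with publishing dates; the field name
# "published" and the YYYY-MM-DD format are assumptions.
articles = [
    {"name": "Article B", "published": "2009-03-01"},
    {"name": "Article A", "published": "1998-11-20"},
    {"name": "Article C", "published": "2012-06-15"},
]


def timeline(items):
    """Return (date, name) pairs in chronological order."""
    return sorted(
        (date.fromisoformat(item["published"]), item["name"]) for item in items
    )


for day, name in timeline(articles):
    print(day.isoformat(), name)
```

Parsing the strings into `date` objects, rather than sorting the raw strings, keeps the ordering correct even if a different date format is substituted later.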
See also
You can do a lot more with the YLE Elävä arkisto and Arkivet topic maps. Look at Wandora's documentation for ideas. Some obvious ones:
- Create a simple Embedded HTTP server service that wraps media identifiers into HTML/JavaScript code implementing a viewer for the media pieces.
- Create a simple Embedded HTTP server service API that outputs article data in a machine-readable format for a given tag.
- Export articles and tags into a GML graph and visualize it in the Gephi application.
- Try GATE/ANNIE to extract person and organization topics from the editor and contributor occurrences.
- Try the Stanford Named Entity Recognizer integration to extract person and organization topics from the editor and contributor occurrences.
- Expand tag topics with Freebase extractor.
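For the tag API idea, the following Python sketch uses only the standard library to answer `GET /articles?tag=<tag>` with JSON. It does not use Wandora's Embedded HTTP server or its service API. The URL path, the JSON shape and the in-memory `TAG_INDEX` are all hypothetical; a real service would read the index from the topic map.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory tag index (tag -> list of article records).
TAG_INDEX = {
    "Matti Pellonpää": [{"id": "20-92781", "name": "Nalle puh"}],
}


def articles_for_tag_json(tag):
    """Serialize the articles linked to a tag as a JSON document."""
    return json.dumps(
        {"tag": tag, "articles": TAG_INDEX.get(tag, [])}, ensure_ascii=False
    )


class TagHandler(BaseHTTPRequestHandler):
    """Answers GET /articles?tag=<tag> with JSON; other paths get 404."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/articles":
            self.send_error(404)
            return
        # parse_qs decodes percent-encoded UTF-8, e.g. "Pellonp%C3%A4%C3%A4".
        tag = parse_qs(url.query).get("tag", [""])[0]
        body = articles_for_tag_json(tag).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the sketch locally; port 8080 is an arbitrary choice."""
    HTTPServer(("localhost", port), TagHandler).serve_forever()
```

Calling `serve()` and opening `http://localhost:8080/articles?tag=Matti%20Pellonp%C3%A4%C3%A4` should return the single placeholder record. Unknown tags return an empty `articles` list rather than an error.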
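For the Gephi idea, an article-tag graph can be written in GML as one node per distinct article or tag and one edge per association. This Python sketch writes that structure from (article, tag) pairs. It is an independent illustration rather than Wandora's own GML export, and it assumes the importer accepts UTF-8 labels such as "Pellonpää". Strict GML encodes such characters as entities.

```python
def to_gml(article_tags):
    """Write (article, tag) pairs as an undirected GML graph string."""
    ids = {}
    lines = ["graph [", "  directed 0"]

    def node_id(label):
        # Give each distinct article or tag label exactly one node id.
        if label not in ids:
            ids[label] = len(ids)
            # GML strings may not contain raw double quotes.
            escaped = label.replace('"', "&quot;")
            lines.append(f'  node [ id {ids[label]} label "{escaped}" ]')
        return ids[label]

    edges = [(node_id(article), node_id(tag)) for article, tag in article_tags]
    for source, target in edges:
        lines.append(f"  edge [ source {source} target {target} ]")
    lines.append("]")
    return "\n".join(lines)


gml = to_gml([
    ("Nalle puh (20-92781)", "Matti Pellonpää"),
    ("Article A", "Matti Pellonpää"),  # placeholder article
])
print(gml)
```

Writing the result to a file with `open("articles.gml", "w", encoding="utf-8")` and opening it in Gephi should show the shared tag as a hub connecting both articles.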