YLE Elävä arkisto and Arkivet

The [http://yle.fi/ Finnish National Broadcasting Company] (YLE) has [http://elavaarkisto.kokeile.yle.fi/data/ published metadata of its Elävä arkisto and Arkivet archives]. The Wandora team has created several extractors for the Wandora application that transform the published metadata into topic maps format. The extractors will be part of Wandora releases published after 2015-05-06. Until then, we are releasing the topic maps created with the extractors. The data packages used in the transformation were downloaded on 2015-05-06. This page provides download links to the resulting topic maps and walks through some simple use cases for the data.
  
The topic maps may interest Finnish developers who explore YLE's metadata and build applications and visualizations with the data. The information in the original metadata and in the topic map transformations is mostly in Finnish.
  
 
== Download ==
 
 
The YLE Elävä arkisto and Arkivet topic map is available as a Wandora project file:
  
* [http://www.wandora.org/download/other/yle_elava-arkisto.wpr yle_elava-arkisto.wpr] (6 927 KB)

As the project is very large, we suggest loading it into a Wandora application started with 16 GB of memory. Start Wandora with the startup script '''Wandora-16g.bat''' or '''Wandora-16g.sh'''.

In addition to the Wandora project file, the topic map data is available as XTM and JTM topic maps:

* [http://www.wandora.org/download/other/yle_elava-arkisto_xtm.zip yle_elava-arkisto_xtm.zip] (7 133 KB, uncompressed 236 896 KB)
* [http://www.wandora.org/download/other/yle_elava-arkisto_jtm.zip yle_elava-arkisto_jtm.zip] (10 678 KB, uncompressed 281 279 KB)

These topic map files can be used in other topic maps applications too.
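
As a rough illustration of using the JTM file outside Wandora, the following Python sketch counts topics, associations and names in a JTM document. It assumes the JTM 1.0 layout with top-level <code>topics</code> and <code>associations</code> arrays. The sample data and its example.org identifiers are made up for the demonstration and are not taken from the actual download. Loading the real file (about 281 MB uncompressed) this way needs a correspondingly large amount of memory.

```python
import json


def load_jtm(path):
    """Read a JTM (JSON Topic Maps) file into a Python dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_jtm(jtm):
    """Count topics, associations and topic names in a parsed JTM document."""
    topics = jtm.get("topics", [])
    associations = jtm.get("associations", [])
    name_count = sum(len(topic.get("names", [])) for topic in topics)
    return {
        "topics": len(topics),
        "associations": len(associations),
        "names": name_count,
    }


# Hypothetical miniature topic map; identifiers are invented for illustration.
SAMPLE = {
    "version": "1.0",
    "item_type": "topicmap",
    "topics": [
        {
            "subject_identifiers": ["http://example.org/article/20-92781"],
            "names": [{"value": "Nalle puh (20-92781)"}],
        },
        {
            "subject_identifiers": ["http://example.org/tag/matti-pellonpaa"],
            "names": [{"value": "Matti Pellonpää (tag)"}],
        },
    ],
    "associations": [
        {
            "type": "si:http://example.org/tagged-with",
            "roles": [
                {"player": "si:http://example.org/article/20-92781",
                 "type": "si:http://example.org/article"},
                {"player": "si:http://example.org/tag/matti-pellonpaa",
                 "type": "si:http://example.org/tag"},
            ],
        }
    ],
}

if __name__ == "__main__":
    print(summarize_jtm(SAMPLE))
```

To try it on the real data, unzip the JTM package and pass the result of <code>load_jtm</code> with the extracted file's path to <code>summarize_jtm</code>.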
  
 
== License ==
 
  
 
The metadata license is [http://creativecommons.org/licenses/by-sa/4.0/legalcode Creative Commons Attribution-ShareAlike 4.0].

The original metadata provider is the [http://yle.fi/ Finnish National Broadcasting Company] (YLE) and its Elävä arkisto and Arkivet archives.
== About the topic maps ==
The Elävä arkisto and Arkivet topic maps are wrapped into a [[How to save and load project|Wandora project file]]. Load the project file into the Wandora application with the menu option '''File > Open project'''. The project contains five topic map layers. Apart from the base layer, each layer contains information from one source file. As all source files use identifiers consistently, topics that represent the same subject in different layers merge. The topic map layers are:

* '''Base''' layer holds Wandora's base ontology.
* '''yle - elava-arkisto - articles''' includes metadata from '''articles.csv''', available at http://elavaarkisto.kokeile.yle.fi/data/articles.csv . The topic map contains topics for article web pages, including the article identifier, name, source service and article language.
* '''yle - elava-arkisto - article addons''' includes metadata from '''articles-additional-fields.csv''', available at http://elavaarkisto.kokeile.yle.fi/data/articles-additional-fields.csv . The topic map contains article editors, contributors and additional identifiers.
* '''yle - elava-arkisto - article media''' includes metadata from '''media-article.csv''', available at http://elavaarkisto.kokeile.yle.fi/data/media-article.csv . The topic map contains media identifiers that are linked to the articles. Media identifiers can be used to resolve the videos and audio tracks of the articles.
* '''yle - elava-arkisto - tags''' includes metadata from '''article-tags.csv''', available at http://elavaarkisto.kokeile.yle.fi/data/article-tags.csv . The topic map contains tags linked to the articles.
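
The merging of layers described above can be sketched in a few lines of Python. This is a simplified illustration, not Wandora's implementation: it assumes each topic carries exactly one subject identifier, whereas real topics can have several. The field names (<code>si</code>, <code>names</code>, <code>occurrences</code>) and the sample values are hypothetical.

```python
def merge_layers(*layers):
    """Merge topics from several layers by their shared subject identifier.

    Each layer is a list of dictionaries with a hypothetical "si" key
    (subject identifier) plus optional "names" and "occurrences".
    """
    merged = {}
    for layer in layers:
        for topic in layer:
            key = topic["si"]
            target = merged.setdefault(
                key, {"si": key, "names": set(), "occurrences": {}}
            )
            # Topics with the same identifier pool their names and occurrences.
            target["names"].update(topic.get("names", ()))
            target["occurrences"].update(topic.get("occurrences", {}))
    return merged


# Invented sample records; only the article identifier appears on this page.
ARTICLES_LAYER = [
    {
        "si": "http://example.org/yle/article/20-92781",
        "names": ["Nalle puh (20-92781)"],
        "occurrences": {"language": "fi"},
    }
]
ADDONS_LAYER = [
    {
        "si": "http://example.org/yle/article/20-92781",
        "occurrences": {"editor": "Example Editor"},
    }
]

if __name__ == "__main__":
    result = merge_layers(ARTICLES_LAYER, ADDONS_LAYER)
    for topic in result.values():
        print(topic["si"], sorted(topic["names"]), topic["occurrences"])
```

Because both sample records use the same identifier, the result holds a single topic that carries the name from the first layer and the occurrences from both.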

Once the project file is opened in the Wandora application, the user should see all available topic map layers at the bottom left of the application window. [[Working with topic trees|The topic tree]] at the top left shows the '''Elava-arkisto''' topic, which can be used to start browsing the data. The next screen capture shows the Wandora application after the user has opened the topic '''Elava-arkisto article''' in the [[Traditional topic panel]] and has scrolled down to the topic's instances. The Layer info panel on the right shows some statistics of the topic map layer '''yle - elava-arkisto - articles'''. The number of articles is close to 12067 (the layer also contains some extra topics that are not articles).

[[File:yle_elavaarkisto_example01.gif|center]]

Next, the Wandora user expands the topic '''Elava-arkisto article''' in the tree on the left and locates the article '''Nalle puh (20-92781)'''. The number that follows the article name is the identifier of the article. The identifier is also stored in the article topic as an occurrence. The user double-clicks the '''Nalle puh (20-92781)''' article and Wandora opens it in the [[Traditional topic panel]]. The user switches the current layer to '''yle - elava-arkisto - tags''' and the layer statistics on the right change.

[[File:yle_elavaarkisto_example04.gif|center]]

The Wandora user opens the article topic '''Nalle puh (20-92781)''' in the [[Webview]] and chooses to view the topic's subject locator. The article's subject locator is the [http://yle.fi/aihe/artikkeli/2008/02/19/nalle-puh actual WWW address of the article] in YLE Elävä arkisto. Wandora shows the WWW page. Unfortunately, Wandora can't show embedded videos because they require the Flash plugin. Wandora's Webview is based on Java's WebView component, which doesn't support Flash videos at the moment.

[[File:yle_elavaarkisto_example06.gif|center]]

The Wandora user closes the [[Webview]] and scrolls down to the '''Elava-arkisto tag''' associations of the topic '''Nalle puh (20-92781)'''. The article's tags include '''Matti Pellonpää (tag)'''. [http://fi.wikipedia.org/wiki/Matti_Pellonp%C3%A4%C3%A4 Matti Pellonpää] is a well-known Finnish actor.

[[File:yle_elavaarkisto_example07.gif|center]]

The Wandora user double-clicks the topic '''Matti Pellonpää (tag)''' to view all articles that are tagged with the topic. It appears that Elävä arkisto contains 13 articles that relate in some way to Matti Pellonpää. The user could now continue browsing to the articles tagged with '''Matti Pellonpää (tag)'''.

[[File:yle_elavaarkisto_example08.gif|center]]

Finally, the user opens the [[Webview]] once again and chooses to view [[Timeline service module|the Timeline]]. This feature is not yet available in the current (2015-05-06) official Wandora release. The articles are placed on the timeline by their publishing date.

[[File:yle_elavaarkisto_example09.gif|center]]

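
The tag lookup from the walkthrough above (from a tag to all articles that carry it) can also be sketched directly on tabular data. The Python example below builds a tag-to-articles index from CSV rows. The column names <code>article_id</code> and <code>tag</code> and all sample rows are assumptions made for illustration; the real header of '''article-tags.csv''' is not described on this page and may differ.

```python
import csv
import io
from collections import defaultdict


def articles_by_tag(rows):
    """Build an index from tag name to the set of article identifiers.

    Each row is a mapping with the hypothetical keys "article_id" and "tag".
    """
    index = defaultdict(set)
    for row in rows:
        index[row["tag"]].add(row["article_id"])
    return index


# Invented sample rows; the second article identifier is made up.
SAMPLE_CSV = """article_id,tag
20-92781,Matti Pellonpää
20-92781,Example tag
20-00001,Matti Pellonpää
"""

if __name__ == "__main__":
    index = articles_by_tag(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    print(sorted(index["Matti Pellonpää"]))  # ['20-00001', '20-92781']
```

Using sets keeps each article at most once per tag even if the source repeats a row.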
== See also ==
You can do a lot more with the YLE Elävä arkisto and Arkivet topic maps. Look at Wandora's [[Wandora|documentation]] for ideas. Some obvious ideas:

* Create a simple [[Embedded HTTP server]] service that wraps media identifiers into HTML/JavaScript code that implements a viewer for the media pieces.
* Create a simple [[Embedded HTTP server]] service API that outputs article data in a machine-readable format for a given tag.
* Export articles and tags into a [[Graph Modeling Language export|GML graph]] and visualize it in the [https://gephi.github.io/ Gephi application].
* Try [[GATE/ANNIE integration|GATE/ANNIE]] to extract person and organization topics out of the editors and contributors occurrences.
* Try the [[Stanford Named Entity Recognizer integration]] to extract person and organization topics out of the editors and contributors occurrences.
* Expand tag topics with the [[Freebase extractor]].

If you are new to Wandora, see [http://wandora.org Wandora's web site]. You can download the Wandora application [http://wandora.org/www/download here]. In addition to the YLE Elävä arkisto and Arkivet topic maps transformation, we have converted several other data packages into topic maps format. See the [[Topic map gallery]].
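
To give a feel for the GML idea listed above, here is a minimal Python sketch that writes article and tag pairs as a GML graph. It is independent of Wandora's own GML export and only illustrates the target format. The function name and the sample pairs are hypothetical. Labels are written as UTF-8 with only quotes and ampersands escaped, which is an assumption about what the consuming tool accepts.

```python
def to_gml(pairs):
    """Render (article, tag) pairs as an undirected graph in GML text form."""
    ids = {}
    node_lines = []
    edge_lines = []

    def node_id(kind, label):
        # Key by kind so an article and a tag with equal labels stay separate.
        key = (kind, label)
        if key not in ids:
            ids[key] = len(ids)
            safe = label.replace("&", "&amp;").replace('"', "&quot;")
            node_lines.append(f'  node [ id {ids[key]} label "{safe}" ]')
        return ids[key]

    for article, tag in pairs:
        source = node_id("article", article)
        target = node_id("tag", tag)
        edge_lines.append(f"  edge [ source {source} target {target} ]")

    return "\n".join(["graph [", "  directed 0", *node_lines, *edge_lines, "]"])


# Sample pairs; "Example tag (tag)" is invented for illustration.
SAMPLE_PAIRS = [
    ("Nalle puh (20-92781)", "Matti Pellonpää (tag)"),
    ("Nalle puh (20-92781)", "Example tag (tag)"),
]

if __name__ == "__main__":
    print(to_gml(SAMPLE_PAIRS))
```

The output can be saved with a <code>.gml</code> extension and opened in a graph tool that reads GML.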

Latest revision as of 09:51, 12 May 2015
