Parsing LTM files in Wandora, problems.....


Postby athanassios » Wed Nov 07, 2012 1:29 pm

Hi,
I am trying to import a very simple LTM file with only three lines:
@"utf-8"
#VERSION "1.3"
#BASEURI "http://neurorganon.org/TIR"

The file is generated in a Windows environment by writing a VBA stream out to a file.
When I try to import the file, Wandora gives me the following strange warnings and reports 4 broken associations (?).
When the file also contains topics, they are found and added to the Wandora topic map as normal.

Any ideas why this is happening?

Reading file 'test.ltm'.
Merging '/home/athanassios/Work/NULO/test.ltm' to context layer while reading.
Found no base URI for topic map. Using default base 'http://www.wandora.org/ltm-import/'.

Warning: Unrecognized element: "@"utf-8"" near line number 1, after topic number 0 and association number 0
Warning: Unrecognized element: "#VERSION "1.3"" near line number 2, after topic number 0 and association number 0
Warning: Unrecognized element: "#BASEURI "http://neurorganon.org/TIR"" near line number 2, after topic number 0 and association number 0

Found total 0 topics, 0 associations and 0 occurrences.
Real number of topics, associations and merges in topic map may be smaller due to merges.
Found also 4 broken associations.
Total 1 files imported!
Done
Last edited by athanassios on Fri Nov 09, 2012 12:04 pm, edited 1 time in total.
athanassios
 
Posts: 47
Joined: Wed Sep 07, 2011 12:16 pm
Location: Greece

Re: Importing automatically generated LTM file, problems....

Postby akivela » Wed Nov 07, 2012 2:30 pm

Hi Athanassios

Your problem sounds very much like a BOM problem. Many text editors in Windows (including Notepad) add a special character at the start of a text file to mark UTF-8 encoding. This character is called a Byte Order Mark (BOM). Wandora's LTM parser doesn't recognize the BOM character, so the parse fails. To prevent LTM parse errors, I suggest you remove the BOM character from your LTM file with a text editor such as Notepad++.

For more information about BOM see http://en.wikipedia.org/wiki/Byte_order_mark
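
For readers who would rather strip the BOM with a script than with an editor, here is a minimal Python sketch. It assumes the file is UTF-8, where the BOM is the three bytes EF BB BF; the function names `strip_utf8_bom` and `remove_bom_from_file` are made up for this example and have nothing to do with Wandora itself.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path: str) -> bool:
    """Rewrite the file in place without a leading BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # nothing to do, file left untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

Calling `remove_bom_from_file("test.ltm")` before the import should leave the `@"utf-8"` directive as the very first bytes of the file.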

Kind Regards,
Aki / Wandora Team
akivela
Site Admin
 
Posts: 256
Joined: Tue Sep 18, 2007 10:20 am
Location: Helsinki, Finland

Re: Parsing LTM files in Wandora, BOM problem solved

Postby athanassios » Fri Nov 09, 2012 11:53 am

Thanks Aki, yes, it was indeed a BOM problem. The first time I met the endianness issue was back in 1995, when I was processing raw sound data from/to files on Unix workstations. But the term BOM is new to me ;-) Another example of how important ontology and the mapping of terms are. The problem is now solved: I skipped the first two bytes in the stream and wrote the rest of the content into a file without the BOM. The solution was not mine, but the searching with the right terms was :)
Googling "VBA file BOM" gave me 961,000 results. My solution was ranked 11th, and I had to spend a few minutes discarding the rest.

ResourceTitle: "Spotty Alley: How to export data into UTF-8 without BOM in Visual Basic: "
ResourceURL: http://axlr8r.blogspot.gr/2011/05/how-t ... thout.html
-----------
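
The linked post is about VBA. As a comparison only, the same idea (write UTF-8 without a BOM) can be sketched in Python, where the codec name decides whether a BOM is emitted. The helper `write_ltm` and the sample text are assumptions made for this illustration.

```python
# Sample LTM header lines, used only for the illustration.
text = '@"utf-8"\n#VERSION "1.3"\n'

# "utf-8-sig" prepends the BOM bytes EF BB BF; plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")


def write_ltm(path: str, content: str) -> None:
    """Write content as UTF-8 without a BOM (hypothetical helper)."""
    # newline="" writes line endings exactly as they appear in content.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(content)
```

A file written this way starts directly with the `@` character, so nothing needs to be skipped afterwards.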

I am using this as an example to highlight the main issue in searching, which is the indexing of resources based on standard terms and the relations between these terms in ontologies. For example, in our case the term BOM (Byte Order Mark) should have been characterised as a type of Unicode character and related closely to the term "endianness". I assume that many computer scientists are familiar with the latter term, but searching for a solution to the problem using it would have produced poorly ranked results!

Nevertheless, the problem is that we have not agreed on a standard method of describing terms and of relating these terms to binary information resources. I believe NULO is making progress on both of these issues; visitors to your blog may read more about TIRs and BIRs and about indexing methods in NULO at (http://neurorganon.org/NULO).
Last edited by athanassios on Fri Nov 09, 2012 2:35 pm, edited 6 times in total.

Re: Parsing LTM files, the nested quoted string in occurren

Postby athanassios » Fri Nov 09, 2012 12:03 pm

OK, here is another problem with importing an LTM file.
--------------------------------------------------------------
Parsing nested quoted strings inside an occurrence causes errors, i.e. quotes inside a quoted string.

Warning: Unrecognized element: "......" in at least one respect. bla bla bla bla bla "} /RW-CYC" near line number 2929, after topic number 295 and association number 0

Found also 25 broken associations.

Found also 4 broken occurrences.
This happened in occurrences of topics whose IDs have the form TOPIC-(s). Wandora did not like parenthesized topic names.

The easy solution is to avoid quotes in the content of the value property of occurrences, but that of course adds limitations on the user side. Perhaps you would like to consider modifying the LTM parser.

Thanks for your help so far
athanassios

Re: Parsing LTM files in Wandora, problems.....

Postby akivela » Mon Nov 12, 2012 1:01 pm

Hi Athanassios

You probably mean the actual occurrence resource when you write about an occurrence.

Wandora expects the occurrence resource to start with double opening square brackets [[ and to end with double closing square brackets ]]. Any characters (except double closing square brackets) can appear inside the occurrence resource, so you should be able to include quote (") characters in it. Here is an example of an LTM occurrence line that Wandora recognizes:

Code:
{TimeSpan_1700_Century,at_some_time_within,[[1700-1799]]}
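
To make the bracket rule concrete, here is a small Python sketch that builds and reads lines of the shape shown above. It is an illustration only, not Wandora's actual parser; the regular expression and the names `format_occurrence` and `parse_occurrence` are assumptions for this example.

```python
import re

# Illustrative pattern for {topic,type,[[resource]]}; not Wandora's grammar.
OCCURRENCE = re.compile(
    r"\{\s*(?P<topic>[^,\s]+)\s*,\s*(?P<type>[^,\s]+)\s*,\s*"
    r"\[\[(?P<resource>.*?)\]\]\s*\}",
    re.DOTALL,  # let the resource span several lines
)


def format_occurrence(topic: str, occ_type: str, resource: str) -> str:
    """Build an occurrence line, rejecting the one forbidden sequence."""
    if "]]" in resource:
        raise ValueError("resource must not contain ']]'")
    return "{%s,%s,[[%s]]}" % (topic, occ_type, resource)


def parse_occurrence(line: str):
    """Return (topic, type, resource), or None if the line does not match."""
    m = OCCURRENCE.search(line)
    if m is None:
        return None
    return m.group("topic"), m.group("type"), m.group("resource")
```

Because the resource is delimited by `[[` and `]]` rather than by quotes, a value such as `He said "hi"` passes through both functions unchanged.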


Kind regards,
Aki / Wandora Team

