Oct. 8th, 2006

orinoco77
Ok, you 'orrible lot. I've got my Tcl hat back on at last and I'm doing some work on my GEDCOM parser, specifically rewriting the bugger from scratch. The way I did it the first time worked, but it wasn't exactly optimal and it was a sod to extend. I'm trying a different tack this time. Instead of reading the file in line by line and trying to work out what each line is and who it belongs to, I'm reading the whole file in, splitting it into GEDCOM records (each one starts at a line beginning with 0) and parsing each record individually. Much better for my sanity, I think. The question is, if any of you have done this before, am I doing it a sensible way here? It seems sensible to me: at least I know where my records start and end, and I don't need to keep track of that while I'm processing any more. I can also cope with oddly formatted files, where the family records are interspersed with the person records, without doing my checks line by line; I just test each record and parse it accordingly.
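
For anyone who wants to see the shape of the idea, here is a minimal sketch of the record-splitting step. It is written in Python rather than Tcl purely as a language-neutral illustration, and it is not the actual parser. The names `split_records` and `record_type` and the sample data are made up for the example, and it assumes each line is `LEVEL [@XREF@] TAG [VALUE]` with no leading junk such as a byte-order mark.

```python
import re

def split_records(text):
    """Split GEDCOM text into records; each record is a list of lines
    whose first line begins with level 0."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if re.match(r"0(\s|$)", line):
            records.append([line])      # a level-0 line opens a new record
        elif records:
            records[-1].append(line)    # anything else belongs to the open record
        # lines before the first level-0 line are ignored in this sketch
    return records

def record_type(record):
    """Return the tag on a record's first line, e.g. INDI, FAM or HEAD."""
    parts = record[0].split()
    # '0 @I1@ INDI' carries a cross-reference id before the tag; '0 HEAD' does not
    if len(parts) >= 3 and parts[1].startswith("@"):
        return parts[2]
    return parts[1] if len(parts) > 1 else ""

# Hypothetical sample with a family record ahead of the person record
sample = """0 HEAD
1 CHAR UTF-8
0 @F1@ FAM
1 HUSB @I1@
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1900
0 TRLR
"""

for rec in split_records(sample):
    kind = record_type(rec)
    if kind == "INDI":
        print("person record:", rec[0])
    elif kind == "FAM":
        print("family record:", rec[0])
```

Because each record is classified by its own first line, the order in which person and family records appear in the file stops mattering.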

So, any thoughts? The theory is that, properly done, this will allow me to add new types of information more easily. The only fly in the ointment is the various hierarchies of data, dates within events and so forth. I can build a tree of these relationships quite easily, but I'm not sure that'll help when I come to put it all in a database.
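
One possible way to get the hierarchy into a database is to store every line as a row that points at its parent row (an adjacency list). The sketch below is again Python for illustration only, not Tcl and not the real parser. The helper names, the `node` table layout and the sample record are all assumptions, and it assumes levels never jump by more than one.

```python
import sqlite3

def parse_line(line):
    """Split 'LEVEL [@XREF@] TAG [VALUE]' into (level, xref, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # record header such as '0 @I1@ INDI'
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        return level, parts[1], rest[0], rest[1] if len(rest) > 1 else ""
    tag = parts[1] if len(parts) > 1 else ""
    value = parts[2] if len(parts) > 2 else ""
    return level, None, tag, value

def record_to_rows(lines):
    """Turn one record into (id, parent_id, tag, xref, value) rows."""
    rows = []
    stack = []  # stack[n] holds the row id of the latest line at level n
    for row_id, line in enumerate(lines):
        level, xref, tag, value = parse_line(line)
        del stack[level:]  # forget lines at this level or deeper
        parent_id = stack[-1] if stack else None
        rows.append((row_id, parent_id, tag, xref, value))
        stack.append(row_id)
    return rows

# Hypothetical person record with two events
record = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "2 PLAC London",
    "1 DEAT",
    "2 DATE 1950",
]
rows = record_to_rows(record)

# Row ids here are only unique within one record; a real loader would
# need ids that are unique across the whole file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER, parent INTEGER, tag TEXT, xref TEXT, value TEXT)"
)
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)

# A date within an event becomes a self-join from child to parent
birth = conn.execute(
    "SELECT d.value FROM node d JOIN node e ON d.parent = e.id "
    "WHERE e.tag = 'BIRT' AND d.tag = 'DATE'"
).fetchone()
print(birth[0])
```

The trade-off is that every nested lookup costs a join, but new tags need no schema change, which fits the goal of adding new types of information easily.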

When I'm done and confident in it, I'll most likely package it up as a Tcl package in its own right, in the hope that it'll save others this hassle. GEDCOM parsers seem to exist for practically every language except Tcl, which, while I know it's a tiny minority language, still seems a tad unusual. It has packages for dealing with just about everything else.