Oct. 8th, 2006 03:49 pm
GEDCOM madness
Ok, you 'orrible lot. I've got my Tcl hat back on at last and I'm doing some work on my GEDCOM parser, specifically rewriting the bugger from scratch. The way I did it the first time worked, but it wasn't exactly optimal and it was a sod to extend. I'm trying a different tack this time. Instead of reading the file in line-by-line and trying to work out what each line is and who it belongs to, I'm reading the whole file in, splitting it into GEDCOM records (each one starting at a line whose level number is 0) and parsing each record on its own. Much better for my sanity, I think. The question is, if any of you have done this before: am I doing it a sensible way here? It seems sensible to me. At least I know where my records start and end, and I don't need to keep track of that while I'm processing any more. I can also cope with oddly formatted files, where the family records are interspersed with the person records, without doing my checks line-by-line. This way I can just test each record and parse it accordingly.
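For the curious, the splitting step might look something like the sketch below. It isn't the actual parser: the proc names `splitRecords` and `recordType` are made up for illustration, and it assumes the whole file is already in a string and that a level-0 line looks like `0 HEAD` or `0 @I1@ INDI`.

```tcl
# Split GEDCOM text into records. Each record is a list of lines
# whose first line is a level-0 line.
proc splitRecords {text} {
    set records {}
    set current {}
    foreach line [split $text \n] {
        # Trimming also removes a trailing \r from CRLF files.
        set line [string trim $line]
        if {$line eq ""} continue
        # A level number of 0 opens a new record.
        if {[regexp {^0(\s|$)} $line]} {
            if {[llength $current] > 0} {
                lappend records $current
            }
            set current {}
        }
        lappend current $line
    }
    if {[llength $current] > 0} {
        lappend records $current
    }
    return $records
}

# Hypothetical helper: return the tag of a record's level-0 line
# (INDI, FAM, HEAD, ...), skipping the @XREF@ if there is one.
proc recordType {record} {
    set first [lindex $record 0]
    if {[regexp {^0\s+@[^@]+@\s+(\S+)} $first -> tag]} {
        return $tag
    }
    if {[regexp {^0\s+(\S+)} $first -> tag]} {
        return $tag
    }
    return ""
}

# Made-up sample data, with a family record sitting between other records.
set sample "0 HEAD\n1 SOUR demo\n0 @I1@ INDI\n1 NAME John /Smith/\n0 @F1@ FAM\n1 HUSB @I1@\n0 TRLR"
foreach rec [splitRecords $sample] {
    puts "[recordType $rec]: [llength $rec] line(s)"
}
```

Because each record carries its own type on its first line, the order of person and family records in the file stops mattering; the caller can just switch on `recordType`.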
So, any thoughts? The theory is that, properly done, this will allow me to add new types of information more easily. The only fly in the ointment is the various hierarchies of data, dates within events and so forth. I can build a tree of these relationships quite easily, but I'm not sure that'll help when I come to put it all in a database.
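One possible way round the tree-versus-database worry, offered purely as a sketch: skip the nested tree and flatten each record into rows that carry a parent id, which maps onto a single self-referencing table. The proc name `recordToRows` and the row layout `{id parentId level xref tag value}` are my own assumptions here, and the code assumes well-formed input where levels never jump by more than one.

```tcl
# Flatten one record (a list of GEDCOM lines) into rows of
# {id parentId level xref tag value}. parentId is 0 for the top line.
proc recordToRows {record} {
    set rows {}
    set id 0
    # lastAt(n) holds the id of the most recent line seen at level n.
    array set lastAt {}
    foreach line $record {
        # Assumed line shape: level, optional @XREF@, tag, optional value.
        if {![regexp {^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$} \
                $line -> level xref tag value]} {
            continue
        }
        # Force decimal so a stray leading zero is not read as octal.
        scan $level %d level
        incr id
        set up [expr {$level - 1}]
        set parent 0
        if {$level > 0 && [info exists lastAt($up)]} {
            set parent $lastAt($up)
        }
        set lastAt($level) $id
        lappend rows [list $id $parent $level $xref $tag $value]
    }
    return $rows
}

# Made-up sample record: a person with a birth and a death event.
set rec [list \
    "0 @I1@ INDI" \
    "1 NAME John /Smith/" \
    "1 BIRT" \
    "2 DATE 1 JAN 1900" \
    "2 PLAC London" \
    "1 DEAT" \
    "2 DATE 2 FEB 1970"]
foreach row [recordToRows $rec] {
    puts $row
}
```

Each DATE row ends up pointing at its own BIRT or DEAT row, so the hierarchy survives without the database needing to know about nesting. Whether that beats dedicated tables per record type is a separate question.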
When I'm done and confident in it, I'll most likely package it up as a Tcl package in its own right, in the hope that it'll save others this hassle. GEDCOM parsers seem to exist for practically every language except Tcl, which, while I know it's a tiny minority language, still seems a tad unusual. It has packages for dealing with just about everything else.