Genes Reunited Blog
By Peter Christian, author of The Genealogist's Internet (Fully revised 5th edition out now from Bloomsbury £16.99)
If you've spent any time using online resources to explore your family tree, you will almost certainly have come across information that seems to be wrong. And while you try to think of all the ways in which your surname could have become mangled, you undoubtedly ask yourself: why isn't this material more accurate?
Of course, one of the reasons may be, as David Annal has pointed out in a recent blog post, that the original document contains an error. This may be inadvertent, as where a census enumerator mishears a name, or the head of household doesn't actually know the ages and birthplaces of everyone in the house (my own great-grandfather William Christian was clearly ignorant of where his wife was born, even though they were first cousins). Or it can be deliberate, as when someone lies about their age on a marriage or military enlistment record.
Another possibility is that the information is in fact correct, but just doesn't fit in with what you think you have already established. If you're working from the childhood memories of aged relatives, or from family legends, or you have been kept in the dark about an illegitimacy or pre-marital pregnancy, you may well have accepted as true something which documentary evidence will later disagree with. That marriage certificate you can't find may well never have been issued!
But even if we take these possibilities into account, there are self-evident mistakes in online records. If you find someone called 'Smth', for example, you can be sure that no registrar or census enumerator ever seriously thought this was the correct spelling for a surname; it's most likely going to be a modern transcription error.
So how do such mistakes get through, when they appear to be so obvious? The answer, quite simply, is that getting a perfectly accurate transcription for any sizable collection of records is a very challenging task. The handwriting of previous centuries, even where documents have been well preserved, is often difficult to read. In the case of a 16th century parish register or will, it can be nigh impossible without serious study. The fact that many transcriptions are done from digital scans of monochrome microfilm also means that the transcriber is already working from a lower quality source, and that's even if there were no scratches or specks of dust on the film when it was scanned.
In any case, a more accurate transcription is by definition one which takes longer. This may not matter where volunteers are transcribing or indexing in their own time without a deadline. But for any large project, with a national rather than local set of records, there is a limit on how much time can be devoted to attaining perfection. For a commercial data service any additional costs will have to be passed on to customers. Even an academic project which is made available free of charge has to work within the limits of its agreed funding. And with very large datasets, even a tiny error rate will leave many errors: an almost certainly unachievable accuracy rate of 99.9% would still leave around 30,000 people with misspelt surnames in a typical Victorian census.
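The arithmetic behind that figure is easy to check. The sketch below is only an illustration: the census size of 30 million is an assumption implied by the numbers above (99.9% accuracy leaving around 30,000 errors), not a count taken from any particular census, and `expected_errors` is a made-up helper name.

```python
def expected_errors(records, accuracy):
    """Return the expected number of mistranscribed records."""
    return round(records * (1 - accuracy))


# Assumed census size, for illustration only: about 30 million people.
CENSUS_SIZE = 30_000_000

# Compare a few accuracy rates, including the 99.9% mentioned above.
for accuracy in (0.99, 0.999, 0.9999):
    errors = expected_errors(CENSUS_SIZE, accuracy)
    print(f"{accuracy:.2%} accurate -> {errors:,} misspelt surnames")
```

Even at 99.99%, a rate ten times better again, the same assumed census would still contain thousands of misspelt surnames.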
There's also an important constraint on transcribers, who are instructed to write down what they see, not what they think ought to be on the page. In fact the last thing you want is for the transcriber to try and second-guess a historical document. Of course it should be 'Smith' not 'Smth', but if the name was hastily written and there is no clear dot for the i, the transcriber can't easily tell what the registrar, enumerator or parish clerk thought he was writing. How can the transcriber know it isn't meant to be 'Snith'? (There is a Canadian sportsman Justin Snith, and current electoral registers show several Sniths in the UK.)
You might think that typed and printed records, however, ought to be 100% accurate - clearly they are not open to the same sources of error. But they have their own problems: most such documents are transcribed automatically by optical character recognition (OCR), where specialist software attempts to identify each character of the text from a scanned image. If you've got an e-book reader, you'll probably already know that even for modern printed books OCR is not foolproof: spaces between words are occasionally dropped, similar-looking letters confused, adjacent letters run together (even while I was preparing this blog post, I came across 'stifi' for 'still' and 'stem' for 'stern' in a John le Carré e-book). Then consider the problems of projects like the British Newspaper Archive: hastily and cheaply printed text on thin paper (perhaps so thin that you can see through it to the print on the other side of the page), where both the ink and paper may have had a couple of centuries to deteriorate. Add to that the wide variety of letter-forms and ligatures, and the range of type sizes and styles found in old newspapers and you can readily understand why sources like these cannot necessarily be more accurately transcribed than handwritten records.
The only records immune to all of these problems are the relatively few modern records which were created electronically and which therefore have not had to be transcribed at all. Civil registration records for England and Wales have been electronic since 1985, so the indexes to these available online are as accurate as the official original versions. Modern cemetery indexes are generally taken directly from the electronic records of the local authorities responsible for the cemetery in question.
These errors are certainly inevitable - it's in the nature of the material and the processes of transcription - but that doesn't mean that there's nothing to be done about them. Data services can do their part by allowing fuzzy searches which turn up near matches, or by simply being more flexible about indexing their transcriptions. (There's no reason why 'Smth' can't be indexed as 'Smth', 'Smith' and 'Snith'.) An additional, possibly incorrect index entry is of no consequence to a family historian - it's the missing, presumed misspelt, individuals that are the problem. Likewise, genealogists need to be flexible about what they search for. It's certainly worth keeping a note of genuine variants and likely misspellings of an ancestral surname. And a useful trick is to search on fewer fields - even if a surname is completely misspelt, searching on a forename will turn up the individual sought, though it may take a while to sort through all the entries.
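The flexible indexing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any real data service works: the `KNOWN_SURNAMES` list, the `index_keys` and `build_index` helpers, the sample records and the 0.6 similarity cutoff are all hypothetical, and the near-match test uses `get_close_matches` from Python's standard `difflib` module.

```python
from difflib import get_close_matches

# Hypothetical reference list of surnames known to exist.
KNOWN_SURNAMES = ["Smith", "Snith", "Smyth", "Jones", "Christian"]


def index_keys(transcribed, cutoff=0.6):
    """Return every key a transcribed surname should be indexed under:
    the spelling as transcribed, plus any known surname that resembles it."""
    near = get_close_matches(transcribed, KNOWN_SURNAMES, n=5, cutoff=cutoff)
    return [transcribed] + [name for name in near if name != transcribed]


def build_index(records):
    """Map each index key to the ids of the records filed under it."""
    index = {}
    for record_id, surname in records:
        for key in index_keys(surname):
            index.setdefault(key, []).append(record_id)
    return index


# Two made-up records: one mistranscribed, one clean.
records = [(1, "Smth"), (2, "Jones")]
index = build_index(records)

# A search for 'Smith' now finds record 1, even though it was
# transcribed as 'Smth'; so does a search for 'Snith' or 'Smth'.
print(index.get("Smith", []))
```

The transcription itself is left exactly as written, so nothing is second-guessed; only the index gains extra entries. A looser cutoff produces more spurious entries, which, as noted above, cost the searcher far less than a missing one.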
For all the irritation caused by errors in digitised records and the occasional unfindable ancestor, it's important not to forget what online records have brought us. The ability to conduct a single search over the census records for the entire nation can save the weeks of work required to look through endless reels of microfilm. And it's not possible to search more than the tiniest percentage of the newspapers in the British Newspaper Archive if you have to work with the original documents. In fact, even if you do search microfilms or originals, it's very easy to miss the entry you're looking for, something which won't happen with online records.
Peter Christian, April 2013.
For more information about online genealogy, visit Peter Christian's website at http://spub.co.uk/