Thursday, September 10, 2009

Google Books and bad metadata

What was I saying earlier about the need for good metadata in order to open up free content? Today the Times Higher tweeted about the Google Books metadata fiasco and the ensuing online discussion on the blog of linguist Professor Geoffrey Nunberg. At the end of August he blogged about the millions of metadata errors on Google Books. At first you almost think this is funny, but then it dawns on you how much information Google holds and how much trust its billions of users place in it.

A crucial point made on Geoff's blog is that Google Books will likely become 'The Library' in the future. No-one else will repeat the scanning Google has done, so we're stuck with Google Books as our one online source for digitised books. This means that Google absolutely has to get this right and, more importantly, that it must alert users to the service's ongoing limitations.

Frustration of different database platforms

I've just come back from co-running an information skills session for GP tutors. I've run this session a few times now, along with NHS librarian/trainer Elizabeth Saunders from Worcester Trust. Each time we seem to spend longer explaining how the different Medline platforms work than the GPs actually get to spend practising with the database. I would like someone to explain to me how developing different versions of the same database benefits our users (the GPs and, eventually, the medical students). I'm assuming that it's partly down to the requirements of different networks/systems within different institutions and partly down to the data being licensed to more than one platform provider.

This reminded me of my recent post (below) about the design of databases and how it affects our ability to find information. Today, I first demonstrated how to use the Ovid Medline platform via our eLibrary. Several versions of Medline are available this way, mostly covering different date ranges (just to confuse users further), but the search interface is fairly intuitive and the search functionality is very powerful. It's what I'll train the medical students on, so it's useful for the GP tutors to see it in action.

Elizabeth then took over, demonstrated the NHS Evidence site (which I think is excellent for finding information) and then went on to demo the NHS version of Medline. This version looks very different from Ovid and behaves somewhat differently too. It should be just as powerful and bring back the same results. However, this isn't always the case: when Elizabeth and I did some testing last year, we found that some searches returned slightly different results on the two platforms.

There are several reasons why this is frustrating:

  1. Having to demonstrate two versions of the same database is time-consuming and confusing, especially when demonstrating just one database can be confusing enough.
  2. If the same database (on a different platform) brings back even slightly different results, then this undermines the idea that you are undertaking a comprehensive literature search. If 'good enough' is okay, then fine, but users need to understand the limits of a database, and if the limits are different for each version you will lose them pretty quickly.
  3. It means that, as a librarian, I have to get to know each version before showing it to users, and I don't want to have to do this. I want to get to know the best version and show my users that one.
So, how does this link to re-using learning material? Well, journal literature is learning material and if I can't demonstrate an easy(ish) way of finding relevant material to my users, then someone is doing something wrong, and it ain't me.