Category: Search

Feb 28 2010

Accent Folding


A List Apart has been a steady source of thought-provoking inspiration over the years, not only from a website building perspective, but also because much of what they publish crosses boundaries and impacts other projects and interests in my life.

Their current article, Accent Folding, has major implications for library data in general, and library catalogs in particular.  It deals with the issue of Unicode and pattern matching, namely how one creates search tools that allow for variations in how words containing accents, stress marks, and other non-ASCII characters are typed.  The most succinct example:

There is no excuse for your software to play dumb when the user types “cafe” instead of “café.”

The article presents methods of “normalizing” text to allow for proper matching, and should be read by anyone who gets to deal with library data for reports and searching aids.  If you know how to use regular expressions, you will likely be in for a treat.
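To make the idea of "normalizing" concrete, here is a minimal sketch in Python.  It is not the code from the A List Apart article; it assumes that Unicode decomposition (NFKD) followed by dropping the combining marks is an acceptable way to fold accents, and the function names fold_accents and matches are my own.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining accent marks removed and case folded."""
    # NFKD splits a letter such as "é" into "e" plus a separate combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep the base letters and drop the combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, candidate: str) -> bool:
    """Compare two strings after folding both of them the same way."""
    return fold_accents(query) == fold_accents(candidate)


print(matches("cafe", "café"))  # True
```

Letters that have no Unicode decomposition, such as "ø", pass through this sketch unchanged, so a real search tool would likely need an explicit mapping table on top of it.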

The other example they present, this time to demonstrate the limitations of accent folding, uses Japanese to illustrate just how differently the same data can be presented:

These four sentences all say “Children like to watch television” in Japanese:

  • Kanji: 子供はテレビを見るのが好きです。
  • Hiragana: こども は てれび を みる の が すき です 。
  • Romaji: kodomo wa terebi o miru noga suki desu.
  • Cyrillic: кодомо ва тэрэби о миру нога суки дэсу.

Even if you don’t end up applying this directly to your work, the information in this article will deepen your appreciation of the challenges contained within your data, and of how tough it can be to make it “just work” sometimes.

Jan 03 2010

Unusual Articles


If you spend any amount of time perusing Wikipedia, you will encounter articles on topics that are exceedingly trivial, offbeat, or hard to classify.  For those who seek these articles, Wikipedia has a page for it:

Wikipedia: Unusual Articles

This could be considered a place to check for offbeat reference questions (although the Wikipedia search function should offer better results).  Note that you would not want to assume that an article listed on this page would remain, as “all such lists have a risk of being deleted because of lack of neutral definition of what really is ‘unusual’.”

A few articles of note:

found via ResourceShelf

Dec 02 2009

How College Students Seek Information in the Digital Age


How College Students Seek Information in the Digital Age (pdf) is a report from Project Information Literacy, maintained by the Information School at the University of Washington, that contains a few surprises for libraries:

  • Course readings were the first place most students turned to for course-related research (97%).
  • Over 80% of students used library-provided research databases.
  • Usage of each library offering (research databases, OPAC, print materials, and study areas) was above 50%.

Now the not so good:

  • All interactive library research (talk to a librarian, attend a training session, use chat, e-mail or other online “Ask A Librarian” service) fell below 25%.
  • Students are missing out on potential resources (including library research assistance), simply because those resources are not within their range of research activity.

Where are students going for assistance?  They tend to go to their instructors for guidance and assistance, but otherwise they simply use the resources they already know about, or discover in the course of their research.

What might this mean for libraries?  We should push for better interaction with instructors, so that they will be more likely to understand the full range of resources available for students to use, and will be more likely to refer students to an interactive library resource (which was done only 26% of the time, the only result on the survey below 60%).

We also should examine our online presence.  How does it present research resources?  Will someone looking for a particular type of information be able to locate all the resources that the library has to offer?  Print and online library guides for these activities can also be very beneficial.

This report should be read, and reviewed, with each of our libraries in mind.  By understanding that the people we interact with are only one-fourth of the population using our resources, we can begin to re-focus our efforts to ensure that what we have to offer will be used effectively.

found via Bill Drew, who found it via the Free Range Librarian

Nov 10 2009

Springo


I have always been a fan of the idea of creating a mediated search engine – one in which the results have been reviewed in order to ensure a lean, relevant results list.

The reality of the idea is, shall we say, a different matter.  The process is time-consuming, labor-intensive, and usually falls short in the relevance category.

Springo may be the reality that falls closer to the ideal.  Focusing on topic-based searches, they provide results that reflect sites that people most use when they are seeking solutions to more generalized questions, such as e-mail software, movie reviews, or open source software.

The results aren’t perfect, but they do appear relevant.  What I notice most are the sites I would expect to rank in the top tier, but that don’t.  It is a great resource, especially for those who might otherwise find it challenging to wade through several dozen results to find what they need.

My other observation is that I almost immediately began to use the URL to form my search strings, rather than use the provided interface.  It just seems faster and easier to do so.  Plus, it would be nice to be able to right-click (Windows-centric) in order to open results in a new tab.  Minor quibbles, though, for an effective tool that has been a long time coming.

found via Library Journal (print edition, 1 November 2009)

Sep 03 2009

HealthBase, continued


A follow-up to yesterday’s post on HealthBase:

It wasn’t what I was thinking of when I provided a caveat regarding Wikipedia being used as a source, but apparently some search results have offbeat listings, occasionally with negative associations.

My caveat was with regard to the fact that Wikipedia is what I call a “starting point” for information search, not a source itself.  This actually places it in the same category as HealthBase, in that one should not take any particular piece of information as accurate, but should use it to follow through to primary sources and to develop search terms and a broader understanding of the topic.

On HealthBase’s problem:  this is to be expected with new methods of indexing and searching, and this type of problem will tend to show itself with a broad base of users.  I still think it is a great starting point for health information searches, and that the searching algorithm will become more sophisticated over time.

Sep 02 2009

HealthBase


HealthBase is a health information search tool created by NetBase that should be on your short list of resources.

Search results are drawn from a diverse range of resources, including WebMD, PubMed, MedlinePlus, and the Mayo Clinic.  NetBase uses a semantic-based indexing system to obtain the context of articles, and provides targeted results categories to allow the user to find specific information on their topic.

It is a very useful service, with the usual caveats regarding health information on the web (they also index health information from Wikipedia, so be sure to check sources).

found via ResourceShelf and TechCrunch

Aug 27 2009

Google Library?


As the Google Books settlement works its way to becoming reality, it is becoming apparent that Google Books will be transformed into something very much resembling a library.

Think of how this might change our roles in society.

found via LISNews

Aug 23 2009

BookChaser


I wrote a post about the BookChaser Editions service last year, but encountered a reference to another service they offer, and this led me to others:

  • BookChaser Covers : compares cover images available from Amazon, Google Books, LibraryThing, and Open Library.
  • BookChaser BookInfo : compares information about a book obtained from Amazon, Google Books, ISBNDB, Library of Congress, LibraryThing, Open Library, and WorldCat.
  • ISBN Analysis Tool : compares x-ISBN-like service availability for a given ISBN obtained from Amazon, Google Books, LibraryThing and WorldCat.

All lookup services are by ISBN.

Aug 20 2009

PLOS Currents : Influenza


For those who have been following H1N1 influenza virus news (and those who might expect to get questions about it), the Public Library of Science (PLOS) and Google have launched a new mashup service:

PLOS Currents : Influenza is built utilizing Google Knol and a new service from the National Center for Biotechnology Information (NCBI) called Rapid Research Notes.  This service allows the user an easy way to follow current research and search for relevant scientific information.

As we approach influenza season, expect greater levels of concern and interest in H1N1.

found via the Official Google Blog

Aug 09 2009

TinEye Reverse Image Search


TinEye is an image search engine with a twist:  you provide it with an image, and it returns a list of websites utilizing that image.

I can quickly think of a few really good uses of this service:

  • You have an image that you downloaded sometime in the past, but you don’t remember where you got it.
  • You have found an image that would be perfect for a project, but you aren’t sure who the owner is, or what the usage terms might be.
  • You are the owner of an image, and want to ensure that it isn’t being misappropriated by others.
  • You have an image, and want to locate a site with a higher quality or different version of the image.

found via a comment thread in MetaFilter

Jun 17 2009

ISBN-UPC-EAN Lookups


If you are involved with the selection or ordering process, then you are very likely to be familiar with searching for items by the International Standard Book Number, or ISBN.  The newer, 13-digit ISBN is actually based on the European Article Number, or EAN, which makes books consistent with most international trade goods.  The EAN was developed as an expansion of the common Universal Product Code, or UPC.
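The relationship between the 10-digit and 13-digit ISBN can be shown in a few lines.  This is a minimal sketch, assuming the standard conversion for books: drop the old check digit, add the 978 "Bookland" prefix, and compute a new EAN-style check digit.  The function name isbn10_to_isbn13 is my own.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its 13-digit (EAN) form with the 978 prefix."""
    cleaned = isbn10.replace("-", "").replace(" ", "")
    if len(cleaned) != 10:
        raise ValueError("expected 10 characters after removing hyphens")
    # Drop the old check digit (which may be "X") and add the prefix.
    body = "978" + cleaned[:9]
    if not body.isdigit():
        raise ValueError("the first nine characters must be digits")
    # EAN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```

The old check digit cannot simply be carried over, because the ISBN-10 and EAN-13 check digits are calculated with different weights.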

Enough theory?  How about web sites that offer lookup services that can help you find booksellers, prices, and even reviews and summaries of the books you wish to acquire?

  • BookFinder – This site returns a large number of booksellers (many, many used booksellers!), although it seems odd that it doesn’t display the book’s title.
  • CheckUPC.com – A good summary, and a variety of printable bar codes make this a decent site for book information.
  • ISBN.nu – This is one I have used for years, and is still the one I turn to when our primary vendors don’t have a book in stock.
  • ISBNdb.com – With summaries, subjects, similar items, and physical details, this site is a great resource for information about books.
  • OCLC’s xISBN service – This service returns, in XML format, a list of related ISBNs (other editions of the book) for the ISBN you append to their base URL ( http://xisbn.worldcat.org/webservices/xid/isbn/ ).  It isn’t pretty, but when you need it, it is very helpful.
  • ThingISBN – Similar to xISBN, LibraryThing provides a service where you append your ISBN to the end of their base URL ( http://www.librarything.com/api/thingISBN/ ) and get a list of related ISBNs in XML format.
  • UPC Database – This site returns a large number of booksellers of the group; it also lets you know that the UPC is associated with that fictional country that so many people enjoy visiting:  Bookland.
  • Wikipedia’s Book Sources – If you want a service that can give you dozens (and dozens!) of places where you can “Find This Book”, then you need to try this one.
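For the two services in the list above that work by appending an ISBN to a base URL (xISBN and ThingISBN), the lookup address can be built as follows.  This is only a sketch: the base URLs are the ones quoted in the list, the function name related_isbn_urls is my own, and the code does not fetch anything or check whether the services respond.

```python
# Base URLs as quoted in the list above.
XISBN_BASE = "http://xisbn.worldcat.org/webservices/xid/isbn/"
THINGISBN_BASE = "http://www.librarything.com/api/thingISBN/"


def related_isbn_urls(isbn: str) -> dict:
    """Build the lookup URL for each service by appending the bare ISBN."""
    # Remove hyphens and spaces so only the ISBN characters are appended.
    cleaned = isbn.replace("-", "").replace(" ", "")
    return {
        "xISBN": XISBN_BASE + cleaned,
        "ThingISBN": THINGISBN_BASE + cleaned,
    }


for service, url in related_isbn_urls("0-306-40615-2").items():
    print(service, url)
```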

For comparison, here are links to results for the same book (Stephen King : The Dark Tower):

Sources and further information:

Jun 01 2009

National Library of Australia’s Search Prototype


The National Library of Australia has launched the beta of their new search interface, SBDS Prototype (SBDS stands for Single Business Discovery Service, I think), and the search experience is not only better than any other library-related search I have used, it is faster than most of them as well!

Other reactions:

This is an excellent example of what is possible today, and what we should all strive for in our search interfaces.  There is such a diversity of resources, and unifying these into a usable and fast single-search service is a credit to the developers at the National Library of Australia.

May 23 2009

Open Jurist


Open Jurist is a great add-on to the free case law resources I wrote about a few days ago.  It consists of over 600,000 opinions from the federal court system, including the United States Supreme Court and the Federal Appellate Courts, and looks to be a great resource for research into federal court cases.

One minor negative:  one of the first searches I performed, Bush v. Gore (the Supreme Court decision regarding the 2000 presidential election), didn’t work as it should have because the case is listed as “George Bushs v. Albert Gore”.  The “s” at the end of the word kept it from the first page of results.  I initially thought it might have referred to the plurality of petitioners (“et al.”), but a Google search actually produces no results when the “s” is included.  I looked for a way to notify those who run the site, but only found an e-mail address that was to be used “if you have access to more cases or know where we can get more of them”.  Any website meant to offer a service should have (at least) a method for general contact.

(Note:  see comment regarding the “one minor negative” paragraph!)

found via ResourceShelf

May 18 2009

Feedmil


Feedmil is a search engine for RSS feeds.  It does this specific task very, very well.  Search for feeds relating to any keywords you wish, and modify your results using a set of sliders that emphasize/de-emphasize words that show up in your results.

I wish they had a more detailed “about” page, especially information on how they determine popularity, authority, quality, and relevance.

found via RSS4Lib

May 18 2009

Common Chemistry


Common Chemistry is a resource from Chemical Abstracts Service (CAS) which allows one to search for chemical information using a variety of search terms, whether the terms be common names (aspirin, table salt), basic chemical names (acetylsalicylic acid, sodium chloride), or even the official CAS registration number (50-78-2, 7647-14-5).
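A search tool that accepts both names and CAS numbers has to tell them apart.  The sketch below is not how Common Chemistry does it (I don't know their method); it assumes the published CAS format of two to seven digits, two digits, and a single check digit, and the function name is_cas_number is my own.

```python
import re


def is_cas_number(term: str) -> bool:
    """Return True if term is formatted like a CAS number and its check digit fits."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", term.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


for term in ("aspirin", "50-78-2", "7647-14-5"):
    print(term, is_cas_number(term))
```

Both numbers from the paragraph above pass the check, while "aspirin" is treated as a name.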

Although this does not search the entire CAS database, it is an excellent starting point for most of the chemical information questions posed by students and the general public.

Note the link to the Wikipedia entry (just above the disclaimer) for many, but not all, results; not many “authoritative” resources are confident enough in their users to connect them with resources developed and maintained by the crowd.

found via ResourceShelf

May 16 2009

Wolfram Alpha


Wolfram|Alpha is a new type of internet resource that has just gone “live”.  Many are calling it a new type of “search engine”, which it technically is, but it isn’t a search engine in the way we are used to envisioning one.  Others are calling it an “answer engine”, which isn’t a bad description.  Wolfram takes the user’s query and builds a response from a variety of resources that resembles what one might get from an almanac or encyclopedic resource.  Wolfram’s own “about” page avoids any particular label.

Right now it has a relatively limited set of resources.  It does well with towns, states, people, movies, word definitions and many scientific questions in the areas of mathematics and chemistry (which is understandable considering that it was built using Mathematica as the foundation).  It doesn’t do well with music, books, television shows, and abstract concepts.  Much of this has to do with the data sets and methodology that are in place, and I expect to see much greater depth and breadth represented as time goes on.

I spent a bit of time last night and this afternoon playing with it, and am fairly impressed.  Try a few queries, and be sure to view the “Source Information” link at the bottom of the results.  This provides a fairly generic listing of the resources used for that type of query.  Not all of the sources were necessarily used for your particular query, but they are the sources from which data of that type is extracted.  One can see that with additional resources, Wolfram could become a powerful first step for research.

Some sample queries:

and it doesn’t know quite how to handle leading articles, either:

This is going to be a great reference; it needs time to mature and for additional data sets to be included.  This is certainly one place where the use of library cataloging information would be very, very beneficial.  Imagine being able to connect the dots between books, authors, publication dates, settings/locales, etc. and other data sets.

There are many sources of data that are mostly silos waiting to be tapped, Infochimps.org being one example.  How quickly and effectively they are able to incorporate useful data will partly determine how successful this resource becomes.  Improving their context recognition (i.e. figuring out what a person wants to know based on their typed input) is the other, perhaps harder, challenge.

Other sources of commentary and information:

In closing, I like their nod to 2001: A Space Odyssey in their load exceeded error screen:

I'm sorry Dave, I'm afraid I can't do that...


May 14 2009

Google and Microformats


Google has made the jump into supporting Microformats as well as RDFa, calling their implementation “Rich Snippets”!

This is great news on several different levels. Semantic markup within web pages provides a way to target searches much more effectively.  TechCrunch provides an excellent example:

“If I was to write a post that mentioned “The President” without naming him, Google probably wouldn’t realize that I was talking about President Obama – it might think I was referring to another US president, or perhaps the leader of a company. But using RDFa I could tag the words “The President” with “Barack Obama”. That tag would be visible to machines spidering the page for indexing (resulting in smarter search results), but wouldn’t be shown to users reading the post. In effect, it’s a way to tell search engines about your content without exposing your visitors to extraneous text.”

In addition, sites that provide well-structured metadata have the potential to be much more usable (and useful).  Library web sites, especially OPACs and Resource pages, should include structured information that details the context of the displayed content.  Using microformats in our web sites will benefit everyone involved over time.  As David Peterson notes on the SitePoint blog:

“Now that Google is supporting structured data it is high time to learn how to use this stuff.”
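To illustrate the TechCrunch example, here is a small Python sketch of what a machine sees versus what a reader sees.  The markup is hypothetical: the property name "v:name" and the content value are my own illustration, not taken from the TechCrunch post or from Google's documentation, and the class name AnnotationParser is likewise made up.

```python
from html.parser import HTMLParser

# Hypothetical RDFa-style markup, for illustration only.
SNIPPET = (
    '<p><span property="v:name" content="Barack Obama">The President</span>'
    " spoke today.</p>"
)


class AnnotationParser(HTMLParser):
    """Collect machine-readable annotations and visible text separately."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # (property, content) pairs for the indexer
        self.visible = []      # text a human reader would see

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes:
            self.annotations.append(
                (attributes["property"], attributes.get("content"))
            )

    def handle_data(self, data):
        self.visible.append(data)


parser = AnnotationParser()
parser.feed(SNIPPET)
parser.close()
print("".join(parser.visible))  # The President spoke today.
print(parser.annotations)       # [('v:name', 'Barack Obama')]
```

The reader only ever sees "The President", while anything spidering the page can pick up the name attached to it.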

Apr 24 2009

Stupid Disclaimer


A brief rant, if you will accommodate me for a moment:

I encountered a disclaimer in an e-mail that strikes me as extreme enough to mention:

This email, and any attachment, is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, copying, dissemination or other use of this information by persons or entities other than the intended recipient is prohibited.

This came as part of a response from a company I had asked about the availability of an item.  Note that, by a strict interpretation of the statement, only the specific recipient of the message can use the information contained within.  If the e-mail had been from the company’s legal department, or if it hadn’t been about a product with a great deal of publicity and interest, there might have been some justification.

I know that legal boilerplate such as this seems to go along with incorporation, and that many of the employees of this company must stifle a groan every time they send information on their products, but these statements can be worded in such a way that they don’t throw a giant blanket of silence over simple sale information.  Or, perhaps, the statement can be reserved for those departments that handle legal, fiscal, and personnel matters, and a “lighter” disclaimer be used for general public communication.

This is something that falls into the same general category as Copyfraud, in that it attempts to place a much stronger restriction on something that doesn’t legally deserve it.

Or am I supposed to take the information about whether a particular item is available for sale to the grave?

My own disclaimer:  I changed the language of the disclaimer a tad, even though a quick internet search revealed several companies using the same wording as the e-mail.

Apr 16 2009

Evernote


Evernote is an online service that serves an interesting purpose:  it allows you to indicate digital items that you wish to remember, it stores them, and then makes the entire collection searchable.

Or more specifically, you can have it remember all your blog posts, tweets, iPhone items (photographs, etc.), typed notes, e-mails… whatever you tell it to store.  Everything gets indexed in their database, and will be there for you to retrieve at whatever time you wish to do so.

Right now this is simply a neat idea, and assuming that it works as smoothly as its description, a good way of archiving the wide variety of communication and digital storage we use in our daily lives.  However, I think it is more than that… I suspect that this is the social leading edge of what is becoming more and more necessary in the digital age: the necessity of having some sort of structure to the hodge-podge of data that accumulates like peanut shells in a sports bar.

Another way of viewing this is that it is similar to the ideas behind the Semantic Web.  This isn’t a perfect match, of course, but the ability to match up commonalities between different chunks of data is the goal in each of these endeavors.  Understand that the amount and variation of the data is not going to be reduced in the years to come… we are going to need tools like this just to keep abreast of the tide of information that we will encounter.

Watch for other companies to address this idea; I will likely wait for something that can reside on my own server space (perhaps syncing indexes with others for greater effect), and preferably open source, rather than trust that this or some other cloud will achieve permanence.

found via the Proverbial Lone Wolf Librarian

Mar 31 2009

Social Backrub


This is just one of my passing thoughts, which I suspect is understood by many, but not necessarily expressed this way:

Google’s PageRank is, for all practical purposes, a form of social networking applied to the concept of a particular HTML tag.  The ranking system is built upon the idea that someone, somewhere, decided that something on their web page was so associated with another web page that it needed to be wrapped in <a> </a> tags with the web page’s address referenced.  Thousands (millions!) of people found it imperative to add these tags around their text, thereby making it possible to judge the importance of specific web sites by aggregating these millions (billions!) of tags.  Will we look back at this and call it the beginning of social networking on the web?
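The "aggregating" step can be sketched in a few lines of Python.  This is a simplified version of the published PageRank idea, not Google's production system; the damping value of 0.85 is the figure commonly cited for the original algorithm, and the tiny four-page web and the function name pagerank are my own assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a {page: [pages it links to]} mapping."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each link passes along an equal part of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical web: three pages link to "c", and "c" links back to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c
```

Page "c" comes out on top because three other pages chose to link to it, which is the social vote described above.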

the thought passed through my head while reading Stefano’s Linotype
