Posts tagged: Search

Feb 28 2010

Accent Folding


A List Apart has been a steady source of thought-provoking inspiration over the years, not only from a website building perspective, but also because much of what they publish crosses boundaries and impacts other projects and interests in my life.

Their current article, Accent Folding, has major implications for library data in general, and library catalogs in particular.  It deals with Unicode and pattern matching, namely how one creates search tools that allow for variations in how people type words containing accents, stress marks, and other non-ASCII characters.  The most succinct example:

There is no excuse for your software to play dumb when the user types “cafe” instead of “café.”

The article presents methods of “normalizing” text to allow for proper matching, and should be read by anyone who gets to deal with library data for reports and searching aids.  If you know how to use regular expressions, you will likely be in for a treat.
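
To make the idea of "normalizing" concrete, here is a minimal sketch in Python.  It is not the article's own code; it assumes the standard library's unicodedata module is an acceptable stand-in, and the function names fold_accents and matches are hypothetical, chosen only for illustration.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return a lowercase, accent-free form of text for loose matching.

    Assumption: decomposing to NFKD and dropping combining marks is
    "good enough" folding; real catalogs may need language-specific rules.
    """
    # Split each accented character into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold so that "Café" and "cafe" compare as equal.
    return stripped.casefold()


def matches(query: str, candidate: str) -> bool:
    """Hypothetical helper: True if the folded query occurs in the folded candidate."""
    return fold_accents(query) in fold_accents(candidate)


if __name__ == "__main__":
    print(fold_accents("café"))              # cafe
    print(matches("cafe", "Café Society"))   # True
```

The original text is left untouched; only the comparison uses the folded form, so a record can still display "café" while matching a search for "cafe".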

The other example they present, this time to demonstrate the limitations of accent folding, uses Japanese to illustrate just how differently the same data can be presented:

These four sentences all say “Children like to watch television” in Japanese:

  • Kanji: 子供はテレビを見るのが好きです。
  • Hiragana: こども は てれび を みる の が すき です 。
  • Romaji: kodomo wa terebi o miru noga suki desu.
  • Cyrillic: кодомо ва тэрэби о миру нога суки дэсу.

Even if you don’t end up applying this directly to your work, the information in this article will deepen your appreciation for the challenges contained within your data, and for how tough it can be to make it “just work” sometimes.

Nov 10 2009

Springo


I have always been a fan of the idea of creating a mediated search engine – one in which the results have been reviewed in order to ensure a lean, relevant results list.

The reality of the idea is, shall we say, a different matter.  The process is time-consuming, labor-intensive, and usually falls short in the relevance category.

Springo may be the reality that falls closer to the ideal.  Focusing on topic-based searches, they provide results that reflect sites that people most use when they are seeking solutions to more generalized questions, such as e-mail software, movie reviews, or open source software.

The results aren’t perfect, but they do appear relevant.  What I mostly notice are the sites I would expect to rank in the top tier that do not.  It is a great resource, especially for those who might otherwise find it challenging to wade through several dozen results to find what they need.

My other observation is that I almost immediately began to use the URL to form my search strings, rather than use the provided interface.  It just seems faster and easier to do so.  Plus, it would be nice to be able to right-click (Windows-centric) in order to open results in a new tab.  Minor quibbles, though, for an effective tool that has been a long time coming.

found via Library Journal (print edition, 1 November 2009)

Sep 03 2009

HealthBase, continued


A follow-up to yesterday’s post on HealthBase:

It wasn’t what I was thinking of when I provided a caveat regarding Wikipedia being used as a source, but apparently some search results have offbeat listings, occasionally with negative associations.

My caveat concerned the fact that Wikipedia is what I call a “starting point” for an information search, not a source itself.  This actually places it in the same category as HealthBase, in that one should not take any particular piece of information as accurate, but should use it to trace primary sources, develop search terms, and build a broader understanding of the topic.

On HealthBase’s problem:  this is to be expected with new methods of indexing and searching, and this type of problem will tend to show itself with a broad base of users.  I still think it is a great starting point for health information searches, and that the searching algorithm will become more sophisticated over time.

Sep 02 2009

HealthBase


HealthBase is a health information search tool created by NetBase that should be on your short list of resources.

Search results are drawn from a diverse range of resources, including WebMD, PubMed, Medline Plus, and the Mayo Clinic.  NetBase uses a semantic-based indexing system to obtain the context of articles, and provides targeted results categories to allow the user to find specific information on their topic.

It is a very useful service, with the usual caveats regarding health information on the web (they also index health information from Wikipedia, so be sure to check sources).

found via ResourceShelf and TechCrunch

Jun 01 2009

National Library of Australia’s Search Prototype


The National Library of Australia has launched the beta of their new search interface, SBDS Prototype (SBDS stands for Single Business Discovery Service, I think), and the search experience is not only better than any other library-related search I have used, it is faster than most of them as well!

This is an excellent example of what is possible today, and what we should all strive for in our search interfaces.  There is such a diversity of resources, and unifying these into a usable and fast single-search service is a credit to the developers at the National Library of Australia.

Apr 16 2009

Evernote


Evernote is an online service that serves an interesting purpose:  it allows you to indicate digital items that you wish to remember, it stores them, and then makes the entire collection searchable.

Or more specifically, you can have it remember all your blog posts, tweets, iPhone items (photographs, etc.), typed notes, e-mails… whatever you tell it to store.  Everything gets indexed in their database, and will be there for you to retrieve at whatever time you wish to do so.

Right now this is simply a neat idea and, assuming that it works as smoothly as its description, a good way of archiving the wide variety of communication and digital storage we use in our daily lives.  However, I think it is more than that… I suspect that this is the social leading edge of what is becoming more and more necessary in the digital age: having some sort of structure for the hodge-podge of data that accumulates like peanut shells in a sports bar.

Another way of viewing this is that it is similar to the ideas behind the Semantic Web.  This isn’t a perfect match, of course, but the ability to match up commonalities between different chunks of data is the goal in each of these endeavors.  Understand that the amount and variation of the data is not going to be reduced in the years to come… we are going to need tools like this just to keep abreast of the tide of information that we will encounter.

Watch for other companies to address this idea; I will likely wait for something that can reside on my own server space (perhaps syncing indexes with others for greater effect), and preferably open source, rather than trust that this or some other cloud will achieve permanence.

found via the Proverbial Lone Wolf Librarian

Apr 11 2008

Berkeley Accord


ILS Basic Discovery Interfaces, a.k.a. the Berkeley Accord

In what may turn out to be a significant event in the history of library tech, a group called the ILS Discovery Task Force has generated an outline detailing what amounts to an Application Programming Interface (API) for the library OPAC. They are calling this the Berkeley Accord. Not only have they hashed out the basic understanding, but the following companies/organizations have signed the document:

  1. Talis
  2. Ex Libris
  3. LibLime
  4. BiblioCommons
  5. SirsiDynix
  6. Polaris Library Systems
  7. VTLS
  8. California Digital Library
  9. OCLC
  10. AquaBrowser

What does this mean? Sometime in the hopefully not-too-distant future, someone can create an online search tool and know that it will work with OPACs from many different ILSs. Browsing the web is a similar experience in Internet Explorer 7, Opera 9, or Firefox 3 because they share an understanding of how to display the HTML and CSS found on the web. In the same way, searching various libraries through the same interface, built on a shared understanding of how to access the information in the ILS, can make research more effective for everyone.

If this is realized, it will make our jobs easier, our patrons happier, and the institution of the library more powerful and effective. It can be a “win” for everyone who recognizes that the future is dependent on advancing search technology and interoperability.

Of note is the lone abstention: Innovative Interfaces, Inc. (III). They indicate that while they agree with the general principles, they cannot offer their support until much greater detail is known about the framework. My initial thought is to question this: if you feel that this is a good foundation, then agree to it and work to build upon it. If there are flaws, express them and work to build support for an improved foundation. What comes to mind is a zen koan:

“When walking, just walk. When sitting, just sit. Above all, don’t wobble.”

found via a posting on NGC4lib (Next Generation Catalog for Libraries) by Eric Lease Morgan

Dec 26 2007

Monopolies, Libraries, and Challenges


A somewhat rambling essay, but one that is important nonetheless:

Joe Wilcox has posted an interesting essay at Microsoft Watch regarding Google’s merger with DoubleClick, the internet advertising company.  I strongly disagree with some of his interpretations (he tries to have it both ways, and by defending Microsoft and chastising Google, he simply muddies the water), but the essay has me thinking about the good and bad of monopolies in libraryland.

First is the love-hate relationship I have with “monopolies”.  Oftentimes a monopoly reduces choices for the user/consumer, and the litmus test for this is whether the company/organization channels its energy towards preventing competition, rather than out-performing competition.  Efforts towards providing a better product/service than one’s competitor are rarely in vain.  Even if a company fails, the level of product/service is usually improved across the board.

Next, the concept of open standards is, for better or worse, tied up with monopolies.  A group with a monopoly is able to set standards much more effectively.  If the standards are set in a fair manner, i.e. not simply to prevent competition against one’s own product/service, then the monopoly can actually be more efficient.  If not, it isn’t truly an open standard, as much as it is a proprietary standard.

Libraries, then… we are swimming in a sea of standards, and companies that create them.  We are living with standards that work only for us, such as MARC, and aren’t of much (if any) benefit outside libraries.  The bibliographic information contained within them is of great benefit and value, but the standard is not very useful.

However, so much of our energy is tied up in this standard (and others, if we think about it), and it is dragging us down.  It is important to understand that the information is what has value; the value in how we store and access it is reflected in the ease of use, and the interest in using that storage/access method.

MARC has lost its luster, and we should move forward.  The information, however, is more valuable than ever, and we need to figure out how to maximize this value.  Making it easy for everyone to use, not only libraries, should be our top priority.  When Amazon or Google (or companies/groups like them) really want to access our bibliographic records, and use their structure, this will be when we know we have fixed the worst of our problems.  Is FRBR/RDA the answer?  I suspect not, simply because a new way needs to be much easier to describe and apply.

Google is, and has been for a while, the 800-pound gorilla in the search business.  This came about because their search tools were, and are, simply better than their competitors’.  I don’t think this will last forever, but there are many benefits to their dominance.  They are able to set “standards” for web design that encourage compliant web site design and discourage link farms and spam sites.  They have mastered, to a large extent, the art of interpreting the keyword search.  People now think in keywords when they search (which is why the natural language search engines are languishing in obscurity).

In libraryland, OCLC is our 800-pound gorilla.  When they come out with something new (and the last couple of years have been fantastic, with WorldCat leading the way), libraries pay attention.  If they set a particular course, it makes a great deal of sense to follow that same path.

Is this the best way, though?  Should the 800 pounders lead the way in information discovery?  How might they prevent innovation from happening, or are we doing that to ourselves already?  Is the slow pace of FRBR/RDA a reflection of the size of the beast as it slouches towards Bethlehem to be born, or simply the complexity of the solution?

One thing I have noticed on many blogs and listservs is that we love to talk about what is wrong and right about libraries and technology and search, but it is usually individuals and small groups taking the lead and deciding to blaze a new trail.  Open-ILS and LibraryThing are but two examples of dozens where people saw a need and decided to take charge of fulfilling it.

Why haven’t we come up with a new way to deal with bibliographic information?  Does one person, or a group, need to simply decide to do it?  The library community seems to be spinning its wheels on the issue, so perhaps this is the case.

Who wants to take on the challenge?

Jun 27 2007

Google as Publisher?


Google as Publisher: Is Google Poised for a New Push into the Information Industry? is a report for sale (for $1,295.00!!!) by Outsell Inc. detailing how the world’s largest search company would be able to become one of the world’s top publishers “with the flip of a switch.”

For those of us without the chunk of change to purchase the report, their Press Release will just have to do.

As for the report itself, it would be a financial gain for Google to start selling the public domain books they have scanned. They would become the world’s largest print-on-demand company, and could reduce the number of out-of-print books significantly.  Food for thought….

found on Open Access News
