new order

Even as I become progressively less librarian-ish (for someone still officially seeking an LS rather than an IS degree), I still find reference librarian Brian Herzog’s Swiss Army Librarian to be my most compelling professional blog read, more so than the more technically minded Jessamyn West. He’s my antidote to Society for Librarians Who Say Mofo, so it shouldn’t surprise me that he has me reflecting semi-favorably on the concept of abandoning Dewey in the public library.

I’m not an organizational traditionalist – I have 50+ tags for my emails and I still frequently find that I need to search for items, but not because I’ve forgotten what tag something is filed under – so it is probably hypocritical of me to feel that knowledge is something people should have to work for. I’m not so much endorsing Dewey as I am saying that there need to be places where people learn how to navigate information systems, and a public library is a good place for this. Still, Herzog is right, and Ranganathan with him, in pointing out that a lot of Dewey organization is counter-intuitive to how people look for information these days. Rigor has its place, but any information system should be concerned with saving the time of the user.


i am extremely morose?

I can’t explain why I find this so fascinating. It’s like watching an internet train wreck.

Samuel Morse’s birthday edition:


tagging and tagging

In response to:

Guy, Marieke & Tonkin, Emma. (2006) “Folksonomies: Tidying up tags?” D-Lib Magazine 12(1).

Marshall, Catherine C. (2009) “Do tags work?” Tekka 4(1).

I’ve been reading a lot about tagging for a paper in another class and it amazes me how every article that cites tagging on the web uses delicious and flickr as examples.

Regarding the power law, it would be interesting to do a study where participants are asked to tag items without being shown the tags ascribed by others, to see whether an organic consensus emerges.

I think an issue just as important as the power law objection is the design of how tags can be added and searched for. Not all tag-enabled platforms accept tags the same way. If users are only able to tag with single-word tags, retrieval is hampered. “Information Retrieval” is not the same thing as “information” “retrieval.” In order to combine multi-word tags into single-word tags, we get “information-retrieval”, “information_retrieval”, and “informationRetrieval.” None of these search terms will bring up the others. In addition, there is the issue of conventional shorthand to contend with. I and many others are more likely to tag with “info retrieval” than to write out “information.”
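To make the variant problem concrete, here is a minimal sketch of how a platform might fold those spellings together before searching. The function name `normalize_tag` and the choice of a lowercase, space-separated canonical form are assumptions made for illustration, not something any particular tagging site is known to do.

```python
import re

def normalize_tag(tag: str) -> str:
    """Collapse common multi-word tag spellings to one lowercase, space-separated form."""
    # Insert a space at camelCase boundaries: "informationRetrieval" -> "information Retrieval"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", tag)
    # Treat hyphens, underscores, and runs of whitespace as a single separator
    spaced = re.sub(r"[-_\s]+", " ", spaced)
    return spaced.strip().lower()

variants = [
    "information-retrieval",
    "information_retrieval",
    "informationRetrieval",
    "Information Retrieval",
]
normalized = {normalize_tag(v) for v in variants}
print(normalized)  # all four spellings collapse to one form
```

This only handles punctuation and capitalization. Shorthand such as “info retrieval” would still be treated as a different tag unless the system also kept some kind of abbreviation list.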


In response to:

Hertzum, Morten. (2004) “Small-scale classification schemes: A Field study of requirements engineering.” Computer supported cooperative work 13(1): 35-61.

In the SILS lab this semester we have been working on reorganizing both our internal and external help documentation. I had not previously thought about this as a classification issue, or considered the different alternatives for that structure. Naturally, things must be organized along multiple dimensions. Our public documentation is all stored and shared as XML or static webpages. Our internal documentation is changing from a sort of mixed bag of pages to a more structured wiki-style format.

In each case, the structure is arranged so that the information can be browsed through multiple facets, but the documents still need a primary classification for ready access. For the purpose of restructuring the pages, I have sifted through all of our public documentation (roughly 75 pages and 100-125 links) and arranged them visually using a drawing program (OmniGraffle, though for the document to be available and useful to everyone, I will need to save it as a Visio file). In doing so, I have color coded the pages so that they are classified according to the primary subject of the document. In drawing out the site map of our help pages, I’ve been better able to understand the roles of these documents, and should be able to rearrange the hierarchical collection of links on the right side of the lab webpages, instead of having all of our technical documentation misleadingly labeled “FAQs.”

why I wish I had taken stats…

In response to:

Dumais, Susan. (2003) “Data-driven approaches to information access.” Cognitive science. 27: 491-524.

I was having a hard time understanding what was going on with LSA, and had to write up a description of what I thought was happening while going through the explanation in section 2.1. What I got from the list of steps was as follows:

They create a matrix (glorified spreadsheet) of the count of every word in every document, choose a variable (number of terms or frequency of occurrence in a single doc) above which the count is retained and below which the term is discarded (or is that inverse?). Then they create sets of documents based on the similarity of term frequency.

Then immediately afterward, I realized I was wrong. What LSA is actually about is tracking relationships between words. So with the TOEFL example, it is really a huge accomplishment when viewed in context, but it cuts away at the impressiveness that LSA isn’t able to outperform the ESL students who actually take the test. That is, LSA doesn’t perform as well at interpreting meaning as a native speaker. It is a step, and it outperforms word-matching, but we aren’t there yet. Given the example of physician as a synonym for doctor or nurse, if the LSA dimensions were also controlled for synonymy, it would outperform the ESL students. But the whole point of LSA is that it doesn’t require a thesaurus to control for synonymy. The net effect is that LSA performs at the same level as ESL students, but in different ways. Related to the Gmail April Fool’s joke on “Autopilot”, it makes me wonder: if LSA were a student, would it pass its classes? After all, the ESL students continue to learn after the test is administered.

I am probably still missing the point on how the magical math works, but it seems to me that physician would have significantly lower co-occurrence with doctor than it would with nurse, so even though doctor and physician show up in the same contexts, physician and nurse OR doctor and nurse would have higher co-occurrence. Nurse appears some variable amount more often in context than either physician or doctor, which seems like it could be a major problem throughout LSA.
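A toy version of the matrix-and-reduction idea may help here. The sketch below is not the procedure from Dumais's paper: the six "documents" are invented, the counts are raw (no term weighting), and keeping `k = 2` dimensions is an arbitrary choice for such a tiny corpus. It only shows the general mechanism of building a term-by-document matrix, taking a singular value decomposition with NumPy, and comparing words in the reduced space.

```python
import numpy as np

terms = ["doctor", "physician", "nurse", "patient", "hospital", "guitar", "music"]
# Invented corpus: "doctor" and "physician" never appear in the same document.
docs = [
    "doctor nurse patient hospital",
    "physician nurse patient hospital",
    "doctor nurse patient",
    "physician nurse hospital",
    "guitar music",
    "guitar music",
]

# Term-by-document count matrix: one row per term, one column per document.
counts = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Singular value decomposition; keep only the k largest dimensions.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
term_vectors = U[:, :k] * S[:k]

def compare(w1, w2):
    """Return (similarity on raw counts, similarity in the reduced space)."""
    i, j = terms.index(w1), terms.index(w2)
    return cosine(counts[i], counts[j]), cosine(term_vectors[i], term_vectors[j])

for pair in [("doctor", "physician"), ("doctor", "nurse"), ("doctor", "guitar")]:
    raw_sim, lsa_sim = compare(*pair)
    print(f"{pair[0]}/{pair[1]}  raw: {raw_sim:.2f}  reduced: {lsa_sim:.2f}")
```

On the raw counts, doctor and physician have a similarity of zero because they share no document. In the reduced space they come out close to 1, since both keep company with nurse, patient, and hospital. In this toy, nurse lands just as close to doctor as physician does, which is more or less the worry raised above; whether that holds for a real corpus with hundreds of dimensions is not something this sketch can show.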

understanding the Semantic Web

In response to:

Berners-Lee, Tim; Hendler, James & Lassila, Ora. (2001) “The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities.” Scientific American: 1-6.

Shadbolt, Nigel; Hall, Wendy & Berners-Lee, Tim. (2006) “The Semantic Web Revisited.” IEEE Intelligent Systems: 96-101.

A doubter’s (amusing) view of The Semantic Web: Marshall, Catherine C. (2004) Taking a stand on the Semantic Web.

Shadbolt admits that there isn’t a way to effortlessly use ontologies, which limits their application. I am inclined to follow his argument that the implementation of the Semantic Web will depend upon a small core of devoted early adopters from professional/expert communities. Regardless of sceptics like Marshall, if the promise can be fulfilled in these niche communities, it has the possibility of spreading. I do think that Marshall makes an interesting point in raising the question of possible dangers of full Semantic Web adoption, though I don’t know enough about information security to know whether her concerns related to identity theft are spurious or well-founded. The narrative presented by Berners-Lee certainly brings to mind potential privacy concerns.

Shadbolt’s comparison of folksonomies (tagging) and ontologies is very useful. The advantage of an ontology is that, by using URIs, it can differentiate between – from Marshall’s example – a beehive (hair style) and a beehive (in which bees live), whereas a folksonomy cannot because it depends exclusively on words, and can only be clarified by adding terms to the search string. As an example of the ridiculous places that a casual search can take an information seeker, a circulation clerk describes a catalog search for “harmonica” (please excuse the language). An ontology should allow for more sophisticated searching than a folksonomy, though it is possible that unexpected results can appear in either.
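The beehive distinction can be sketched in a few lines. Everything here is hypothetical: the photo identifiers are invented, the `example.org` URIs stand in for whatever identifiers a real ontology would publish, and a plain dictionary is used instead of actual RDF tooling.

```python
# Folksonomy-style index: items are described by bare words.
tagged = {
    "photo-1": {"beehive", "hair"},
    "photo-2": {"beehive", "bees"},
}

# Ontology-style index: each sense of "beehive" gets its own identifier.
# These URIs are made up for illustration.
HAIRSTYLE = "http://example.org/concepts/beehive-hairstyle"
BEE_NEST = "http://example.org/concepts/beehive-nest"
annotated = {
    "photo-1": {HAIRSTYLE},
    "photo-2": {BEE_NEST},
}

def search(index, key):
    """Return the items whose set of tags or concepts contains the key."""
    return sorted(item for item, keys in index.items() if key in keys)

print(search(tagged, "beehive"))    # the word alone matches both senses
print(search(annotated, BEE_NEST))  # the identifier matches only one
```

With tags, the only way to narrow the first search is to add another word and hope the tagger used it too; with identifiers, the disambiguation is done once, when the item is described.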

As an interesting note, I had no idea that Creative Commons was an RDF-based representation. My impression was that CC was an information policy advocacy group, and I had never considered how their rights licenses were applied to documents.

verbose and ambiguous

In response to:

Gruber, Thomas R. (1995) “Toward principles for the design of ontologies used for knowledge sharing?” International Journal of Human-Computer Studies 43(5-6): 907-928.


Sharman, Raj; Kishore, Rajiv & Ramesh, Ram. (2004) “Computational ontologies and information systems II: formal specification.” The Communications of the Association for Information Systems 14: 1-25.

Gruber’s first case study was a struggle to understand, but having muddled through it, I think I can see the advantages of an ontology for this purpose. To create context, the development of the unit of measure ontology would answer a potential problem in the cutlery: measurements are significant parts of item attributes (the length of a chef’s knife blade, 8″ or 10″), but they can be expressed in different units (English or metric) by different manufacturers. This sort of ontological feature would mean that the retailer doesn’t have to standardize the units when data is entered, but can still retrieve information with a standard metric, yes?
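One way to picture this is a catalog that stores each length together with its unit and converts only at query time. This is a rough sketch and not Gruber's ontology: the `Length` class, the `CM_PER_UNIT` table, the product names, and the `blades_at_least` helper are all invented for illustration.

```python
from dataclasses import dataclass

# Conversion factors to a common base unit (centimetres).
CM_PER_UNIT = {"in": 2.54, "cm": 1.0, "mm": 0.1}

@dataclass(frozen=True)
class Length:
    """A magnitude paired with the unit it was entered in."""
    magnitude: float
    unit: str

    def to(self, unit: str) -> "Length":
        # Convert through the base unit so any pair of known units works.
        cm = self.magnitude * CM_PER_UNIT[self.unit]
        return Length(cm / CM_PER_UNIT[unit], unit)

# Each manufacturer's data is kept in whatever unit it arrived in.
catalog = {
    "chef's knife A": Length(8, "in"),
    "chef's knife B": Length(10, "in"),
    "chef's knife C": Length(20, "cm"),
}

def blades_at_least(min_cm: float):
    """Names of items whose blade is at least min_cm centimetres long."""
    return sorted(
        name for name, length in catalog.items()
        if length.to("cm").magnitude >= min_cm
    )

print(blades_at_least(20.2))
```

The point of the design is that nobody standardizes units at data entry; the query picks the standard, and the conversion knowledge lives in one place.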

Nicely, the bibliographic case study made a lot more sense to me. However, Gruber failed to define some of the symbols used in the ontologies (at least so far as I could identify). I’m unclear on what =>, <=, and = stand for, and I don’t understand what defining as “if and only if” means. Nor do I understand the meaning of “float” or “double float.” Still, I think I understand the general concepts well enough that I can make use of the reading.
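For what it is worth, here is a small sketch of the logic involved, assuming the arrows follow the usual convention in logic notation, where `=>` reads “implies”, `<=` reads “is implied by”, and “if and only if” means both at once. That reading of Gruber's symbols is an assumption, not something confirmed from the paper, and the function names are invented.

```python
def implies(p: bool, q: bool) -> bool:
    # "p => q" is false only when p is true and q is false.
    return (not p) or q

def iff(p: bool, q: bool) -> bool:
    # "p if and only if q": implication in both directions,
    # which amounts to p and q having the same truth value.
    return implies(p, q) and implies(q, p)

# Print the full truth table for both connectives.
for p in (True, False):
    for q in (True, False):
        print(f"p={p!s:5} q={q!s:5} implies={implies(p, q)!s:5} iff={iff(p, q)}")
```

Read this way, a definition stated as “if and only if” is a two-way street: anything that meets the conditions belongs to the class, and anything in the class meets the conditions.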