new order

Even as I become progressively less librarian-ish (for someone still officially seeking an LS rather than an IS degree), I still find reference librarian Brian Herzog’s Swiss Army Librarian to be my most compelling professional blog read. More so than the more technically minded Jessamyn West. He’s my antidote to Society for Librarians Who Say Mofo, so it shouldn’t surprise me that he has me reflecting semi-favorably on the concept of abandoning Dewey in the public library.

I’m not an organizational traditionalist – I have 50+ tags for my emails and I still frequently find that I need to search for items, but not because I’ve forgotten what tag something is filed under – so it is probably hypocritical of me to feel that knowledge is something people should have to work for. I’m not so much endorsing Dewey as I am saying that there need to be places where people learn how to navigate information systems, and a public library is a good place for this. Still, Herzog is right, and in line with Ranganathan, in pointing out that a lot of Dewey organization is counter-intuitive to how people look for information these days. Rigor has its place, but any information system should be concerned with saving the time of the user.

i am extremely morose?

I can’t explain why I find this so fascinating. It’s like watching an internet train wreck.

Samuel Morse’s birthday edition:

iamextremely2009april27

tagging and tagging

In response to:

Guy, Marieke & Tonkin, Emma. (2006) “Folksonomies: Tidying up tags?” D-Lib Magazine 12(1).

Marshall, Catherine C. (2009) “Do tags work?” Tekka 4(1).

I’ve been reading a lot about tagging for a paper in another class, and it amazes me how every article that cites tagging on the web uses Delicious and Flickr as examples.

Regarding the power law, it would be interesting to do a study where participants are asked to tag items without being shown the tags ascribed by others, to see if there is an organic consensus.

I think an issue just as important as the power law objection is the design of how tags can be added and searched for. Not all tag-enabled platforms accept tags the same way. If users are only able to tag with single-word tags, retrieval is hampered. “Information Retrieval” is not the same thing as “information” “retrieval.” In order to combine multi-word tags into single-word tags, we get “information-retrieval”, “information_retrieval”, and “informationRetrieval.” None of these search terms will bring up the others. In addition, there is the issue of conventional shorthand to contend with. I and many others are more likely to tag with “info retrieval” than to write out “information.”
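One way a platform could soften this problem is to normalize tags before storing or searching them. What follows is only a minimal sketch under stated assumptions: the `normalize_tag` function and the `SHORTHAND` table are hypothetical names invented for illustration, not the behavior of Delicious, Flickr, or any other real service.

```python
import re

# Hypothetical shorthand table (an assumption for this sketch);
# a real system would need a curated, much longer list.
SHORTHAND = {"info": "information"}


def normalize_tag(tag: str) -> str:
    """Reduce a tag to lowercase words joined by single spaces."""
    # Split camelCase boundaries: "informationRetrieval" -> "information Retrieval"
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", tag)
    # Treat hyphens, underscores, and runs of whitespace as word separators
    words = re.split(r"[-_\s]+", spaced.strip().lower())
    # Expand known shorthand word by word, dropping any empty pieces
    return " ".join(SHORTHAND.get(word, word) for word in words if word)


variants = [
    "Information Retrieval",
    "information-retrieval",
    "information_retrieval",
    "informationRetrieval",
    "info retrieval",
]
normalized = {normalize_tag(tag) for tag in variants}
```

Under these assumptions all five variants collapse to the single form "information retrieval". The trade-off is that any normalization rule also throws information away, so a tag where the hyphen or capitalization actually mattered would be flattened along with the rest.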

classification

In response to:

Hertzum, Morten. (2004) “Small-scale classification schemes: A field study of requirements engineering.” Computer Supported Cooperative Work 13(1): 35-61.

In the SILS lab this semester we have been working on reorganizing both our internal and external help documentation. I had not previously thought about this as a classification issue, or considered the different alternatives for that structure. Naturally, things must be organized along multiple dimensions. Our public documentation is all stored and shared as XML or static webpages. Our internal documentation is changing from a sort of mixed bag of pages to a more structured wiki-style format.

In each case, the structure is arranged so that the information can be browsed through multiple facets, but the documents still need a primary classification for ready access. For the purpose of restructuring the pages, I have sifted through all of our public documentation (roughly 75 pages and 100-125 links) and arranged them visually using a drawing program (OmniGraffle, though for the document to be available and useful to everyone, I will need to save it as a Visio file). In doing so, I have color coded the pages so that they are classified according to the primary subject of the document. In drawing out the site map of our help pages, I’ve been better able to understand the roles of these documents, and should be able to rearrange the hierarchical collection of links on the right side of the lab webpages, instead of having all of our technical documentation misleadingly labeled “FAQs.”

why I wish I had taken stats…

In response to:

Dumais, Susan. (2003) “Data-driven approaches to information access.” Cognitive Science 27: 491-524.

I was having a hard time understanding what was going on with LSA, and had to write up a description of what I thought was happening while going through the explanation in section 2.1. What I got from the list of steps was as follows:

They create a matrix (glorified spreadsheet) of the count of every word in every document, then choose a threshold (number of terms or frequency of occurrence in a single doc) above which the count is retained and below which the term is discarded (or is that the inverse?). Then they create sets of documents based on the similarity of term frequency.
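The “glorified spreadsheet” part of that description can be sketched in a few lines. This is only the counting step, on three made-up sentences invented for illustration; the function name `term_document_matrix` is likewise hypothetical. LSA proper goes on to weight these counts and apply a singular value decomposition to reduce the matrix to fewer dimensions, and none of that is shown here.

```python
from collections import Counter

# Three made-up documents, used only for illustration.
docs = {
    "d1": "the doctor saw the patient",
    "d2": "the physician saw the patient",
    "d3": "the nurse helped the doctor",
}


def term_document_matrix(docs):
    """Return {term: [count in each document]}, one row per term."""
    # Count the words in each document separately
    counts = {name: Counter(text.split()) for name, text in docs.items()}
    # The vocabulary is every distinct word seen in any document
    vocab = sorted({word for c in counts.values() for word in c})
    # One row per term, one column per document (in the order of `docs`);
    # a Counter returns 0 for words a document does not contain
    return {term: [counts[name][term] for name in docs] for term in vocab}


matrix = term_document_matrix(docs)
doctor_row = matrix["doctor"]        # counts in d1, d2, d3
physician_row = matrix["physician"]  # counts in d1, d2, d3
```

In this toy data “doctor” and “physician” never appear in the same document, so their raw rows share no column, even though both sit next to “saw the patient.” The dimension reduction step that follows in LSA is what is meant to bring such words closer together.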

Then immediately afterward, I realized I was wrong. What LSA is actually about is tracking relationships between words. The TOEFL result is really a huge accomplishment when viewed in context, but it is less impressive given that LSA isn’t able to outperform the ESL students who actually take the test. That is, LSA doesn’t perform as well at interpreting meaning as a native speaker. It is a step, and it outperforms word-matching, but we aren’t there yet. Given the example of choosing doctor or nurse as a synonym for physician, if the LSA dimensions were also controlled for synonymy, it would outperform the ESL students. But the whole point of LSA is that it doesn’t require a thesaurus to control for synonymy. The net effect is that LSA performs at the same level as ESL students, but in different ways. Related to the Gmail April Fool’s joke on “Autopilot”, it makes me wonder: if LSA were a student, would it pass its classes? After all, the ESL students continue to learn after the test is administered.

I am probably still missing the point on how the magical math works, but it seems to me that physician would have significantly lower co-occurrence with doctor than it would with nurse, so even though doctor and physician show up in the same contexts, physician and nurse OR doctor and nurse would have higher co-occurrence. Nurse appears some-variable more often in context than either physician or doctor, which seems like it could be a major problem throughout LSA.

Ontology Alphabet-soup

In response to:

Coffin, R. Let Semantics bring sophistication to your applications.

and

Fonseca, Frederico. (2007) “The double role of ontologies in information science research.” Journal of the American Society for Information Science and Technology 58(6): 786-793.

Initialisms and acronyms are necessary tools of information science, and the bane of understanding. OWL does not make me think of Web Ontology Language. It makes me think of owls. Fly-at-night, rodent-catching owls. IS is ingrained in my brain as Information Science, and yet it is quite accurately used by Fonseca as Information Systems. So every time Fonseca tried to differentiate ontologies for and of information systems, I read “information science,” muddling my understanding of an abstract subject.

An ontology is a model of that which exists. An ontology can assist in creating modeling tools, or it can be a collection of statements about a domain. Using Coffin’s examples:

Green Pepper is-a software.

Green Pepper is-a vegetable.

This is an important distinction that a simple search for “Green Pepper” on a search engine isn’t going to make. I’m not sure, however, that it really makes the case for ontologies clearer. The domain of Google is everything that has been indexed by Google. Searching “Green Pepper software” would have located items relevant to Coffin’s search, but that works by keyword matching rather than by identifying a defined domain. He explains ontologies as statements about a domain, but Google has a high retrieval relevancy rate (indicated by its market share in searching) without differentiating the domain of food from the domain of software. The majority of Google users don’t know the search strategies for getting more out of Google, and just manage by adding and removing terms from their search.
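To make the contrast with keyword searching concrete, the two is-a statements can be written down as data and queried by domain. This is a rough sketch only: plain Python tuples stand in for an ontology language, and the names `STATEMENTS`, `senses`, and `in_domain` are invented for illustration rather than taken from Coffin or from OWL.

```python
# Each statement is a (subject, relation, object) triple,
# mirroring the two "is-a" examples above.
STATEMENTS = [
    ("Green Pepper", "is-a", "software"),
    ("Green Pepper", "is-a", "vegetable"),
]


def senses(subject, statements):
    """List everything the subject is declared to be."""
    return [
        obj
        for subj, relation, obj in statements
        if subj == subject and relation == "is-a"
    ]


def in_domain(subject, domain, statements):
    """True if the subject has an is-a statement placing it in the domain."""
    return domain in senses(subject, statements)


all_senses = senses("Green Pepper", STATEMENTS)
is_software = in_domain("Green Pepper", "software", STATEMENTS)
```

The difference from adding “software” to a keyword search is that here the domain is an explicit statement about the thing itself, rather than a word that happens to appear near it on a page.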

classification and yes, I’m talking about tagging again

In response to:

Kwasnik, Barbara H.. (1999) “The role of classification in knowledge representation and discovery.” Library Trends 48(1): 22-47.

Hunter, Eric. Classification made simple. Chapters 2 and 3.

The a priori requirement of hierarchies and trees contrasts with tagging, and highlights one of the advantages of tagging that I have been considering recently. Time is an important aspect of any organization system, and for a while I had been seeing the issue one-sidedly. User-generated tags are more likely to suffer a degradation in usefulness over time, as the trends in assigning keywords may change. A stricter, authority-controlled classification, like Library of Congress Subject Headings, may serve better for recall, since the keys to access are consistently applied. However, with many of the things that tags are used to organize (frequently, emerging technology), there isn’t sufficient knowledge beforehand, or thought on the subject is changing too rapidly, to create a hierarchical classification. Tagging serves as an interim measure, making blog posts and the like retrievable in the short term. LCSH is effective once a subject has made it to printed books.