Ontology Alphabet-soup

In response to:

Coffin, R. Let Semantics bring sophistication to your applications.

and

Fonseca, Frederico. (2007) “The double role of ontologies in information science research.” Journal of the American Society for Information Science and Technology 58(6): 786-793.

Initialisms and acronyms are necessary tools of information science, and the bane of understanding. OWL does not make me think of Web Ontology Language. It makes me think of owls. Fly-at-night, rodent-catching owls. IS is ingrained in my brain as Information Science, and yet it is quite accurately used by Fonseca as Information Systems. Yet every time Fonseca tried to differentiate ontologies for and of information systems, I read information science, muddling my understanding of an already abstract subject.

An ontology is a model of that which exists. An ontology can assist in creating modeling tools, or it can be a collection of statements about a domain. Using Coffin’s examples:

Green Pepper is-a software.

Green Pepper is-a vegetable.

This is an important distinction that a simple search for “Green Pepper” on a search engine isn’t going to make. I’m not sure, however, that it really makes the case for ontologies clearer. The domain of Google is everything that has been indexed by Google. Searching “Green Pepper software” would have located items relevant to Coffin’s search, but that works by keyword matching rather than by identifying a defined domain. He explains ontologies as statements about a domain, but Google has a high retrieval relevancy rate (indicated by its market share in search) without differentiating the domain of food from the domain of software. The majority of Google users don’t know the search strategies for getting more out of Google, and just manage by adding and removing terms from their searches.
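To make the contrast concrete, here is a minimal sketch of Coffin’s two statements stored as subject/predicate/object triples. The names `STATEMENTS`, `senses`, and `is_a` are my own inventions for illustration, not anything from Coffin’s article or from an ontology library.

```python
# Hypothetical, hand-rolled statement list; a real ontology would
# live in something like OWL rather than a Python list.
STATEMENTS = [
    ("Green Pepper", "is-a", "software"),
    ("Green Pepper", "is-a", "vegetable"),
]


def senses(term, statements=STATEMENTS):
    """Return every class the term is asserted to belong to."""
    return [obj for subj, pred, obj in statements
            if subj == term and pred == "is-a"]


def is_a(term, cls, statements=STATEMENTS):
    """True if the statements assert that term is-a cls."""
    return cls in senses(term, statements)


print(senses("Green Pepper"))  # ['software', 'vegetable']
```

The point of the sketch is only that the two senses are stored as separate, explicit statements. A keyword search gets the same separation by having the user type “software” after the term.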

classification and yes, I’m talking about tagging again

In response to:

Kwasnik, Barbara H. (1999) “The role of classification in knowledge representation and discovery.” Library Trends 48(1): 22-47.

Hunter, Eric. Classification made simple. Chapters 2 and 3.

The a priori requirement of hierarchies and trees contrasts with tagging, and highlights one of the advantages of tagging that I have been considering recently. Time is an important aspect of any organization system, and for a while I had been seeing the issue one-sidedly. User-generated tags are more likely to suffer a degradation in usefulness over time, as the trends in assigning keywords may change. A stricter, authority-controlled classification, like Library of Congress Subject Headings, may serve better for recall since the keys to access are consistently applied. However, with many of the things that tags are used to organize (frequently, emerging technology), either there isn’t sufficient knowledge beforehand or thought on the subject is changing too rapidly to create a hierarchical classification. Tagging serves as an interim measure, making blog posts and the like retrievable in the short term. LCSH is effective once a subject has made it to printed books.

google users now more tired than depressed

Still a FAIL for international mental health status.

Also increasingly lonely and jealous, possibly because we are no longer happy or pleased. But why are we terrified of Chinese people?

CVs in the bibliographic sense

In response to:

Svenonius, Elaine. (2000) “The intellectual foundation of information organization.” Cambridge: MIT Press. Chapter 8.

Bailey, Penny. (2007) “Always start with structure.” Library + information gazette, 19 October–1 November, p. 9.

Zeng, Marcia Lei. 2005. Construction of Controlled Vocabularies, A Primer (based on Z39.19).

In spite of my personal love for tagging, I’ve been developing a higher degree of skepticism regarding the long-term utility of tags in information retrieval and am more prone to agree with Bailey in regard to controlled vocabularies. Of course, it still depends on what one is attempting to classify. The entirety of the internet, for example, is outside the possible scope of a controlled vocabulary simply because the domain expands exponentially every day. I doubt that the proposed librarian-vetted search engine Reference Extract will get out of the planning stages. Admittedly, this project isn’t really about creating a controlled vocabulary, but rather about automatically extracting keywords from a query and matching them to a body of ranked web resources. Still, the project will have to find a way to resolve the issues of automated keyword extraction brought up in the Svenonius chapter.

In other instances, a controlled vocabulary is key to information retrieval. Using the cutlery catalog project as an example, some knives have a special feature called kullenschliff or scallops. They appear regularly on santoku, though they are not a traditional feature of this style. The presence or absence of kullenschliff is not generally a searchable feature in online retail catalogs. There isn’t a standard for whether kullenschliff are even mentioned, even within the Sur la Table catalog alone. However, if it were indexed, the synonym ring would need to include kullenschliff, kullens, scallops, and possibly other words, and there is the possibility that the understanding of “scallop” might not be the same for all customers. Some customers might think of a serrated edge rather than the kullens in the side of the blade. The term also touches on the issues of international vocabulary. The word kullenschliff is German, and yet the feature appears more regularly on a Japanese-style blade. What’s more, it is a challenging word. If I hadn’t picked up a singular obsession with this term and feature because of the cataloging project, it is extremely doubtful that I would be able to remember it.
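A synonym ring of the kind described above might be sketched like this. The three variants are the ones named in the paragraph; the catalog entries, the choice of “kullenschliff” as the preferred term, and the function names are all hypothetical and are not taken from any real retail catalog.

```python
# Hypothetical synonym ring: each variant points at one preferred term.
SYNONYM_RING = {
    "kullenschliff": "kullenschliff",
    "kullens": "kullenschliff",
    "scallops": "kullenschliff",
}

# Made-up catalog entries, indexed with preferred terms only.
CATALOG = {
    "santoku, 7 inch": {"kullenschliff"},
    "chef's knife, 8 inch": set(),
}


def normalize(query):
    """Map a customer's word to the preferred term, or None if unknown."""
    return SYNONYM_RING.get(query.strip().lower())


def search(query):
    """Return the names of catalog items indexed under the query's preferred term."""
    term = normalize(query)
    if term is None:
        return []
    return sorted(name for name, features in CATALOG.items()
                  if term in features)


print(search("Scallops"))  # ['santoku, 7 inch']
```

Note what the ring cannot do: a customer who types “scallops” meaning a serrated edge gets the same result as one who means the kullens, so the ambiguity has to be settled somewhere else, such as a scope note.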

 

santoku with kullenschliff

Incidentally, as of this post, Flickr doesn’t have a single image tagged with “kullenschliff”, but the first image returned for “santoku” is of a blade with this feature.

“aboutness”

In response to:

Hjørland, Birger. (2001) “Towards a Theory of Aboutness, Subject, Topicality, Theme, Domain, Field, Content … And Relevance.” Journal of the American Society for Information Science and Technology 52(9): 774-778.

Layne, Sara Shatford. (1994) “Some issues in the indexing of images.” Journal of the American Society for Information Science 45(8): 583-588.

I am against the term “aboutness.” I fail to see how it adds anything that “subject” doesn’t already cover and refer to more clearly. Frequently, or perhaps I should say formerly, I have been fond of the ten-cent word when a five-cent word would do. However, buried under acronyms and initialisms, and swimming in a sea of vocabulary drawn from many disciplines, I believe that information scientists should be cautious in adding new terms. I would say this is even more important when the new term is drawn from a frequently used word. Rheme is a bit arcane to me, but once looked up it is relatively clear: a rheme is the comment made about a theme (the topic). It pairs well with theme, and I have no quarrel with its use. “Aboutness”, however, seems to obscure more than illuminate.

For example, the photograph here could be said to be about loneliness from an artistic perspective, but the subject of the image is a flower. Alternately, I could say that the subjects of this image are loneliness, a flower, etc. However, it feels unnatural to me to say that this image is about a flower, loneliness, etc. I will also note that Layne states the subject may be both of and about, which I am using in support of Hjørland’s rejection of the term. I am interested in hearing arguments in support of the term, but as it stands it bothers me. One can only say that something is “the picture of loneliness” metaphorically.

Lapsana apogonoides

enterprise information contexts

In response to:

Arnon Rosenthal, Len Seligman & Scott Renner. (2004) “From semantic integration to semantics management: case studies and a way forward.” ACM SIGMOD Record 33(4): 44-50.

Philip A. Bernstein & Laura M. Haas. (2008) “Information integration in the enterprise.” Communications of the ACM 51(9): 72-79.

Philip A. Bernstein & Sergey Melnik. (2007) “Model management 2.0: manipulating richer mappings.” In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, p. 1-12.

Data standards, like the metric standard, have to be flexible enough that they can describe anything thrown at them (ergo, the organization projects we are working on, from kitchen knives to computer programs). The metric system began as a French project to derive a unit of measure from the arc of the earth, and now tells time by the jiggle of an atom. Of course, historians of the meter have discovered that the measure of the meter was fudged from the beginning, and scientists know that there is no material from which we can derive a perfect unit of measure. All that can be done is to get very, very close.

This is to say that from the first sentence of the case study, I was skeptical of the DoD Data Administration Program’s (or any very large organization’s) ability to create a standard that will meet all information needs. The pre-implementation system described rather resembles the measurement standards of pre-meter Europe, but information is not quantifiable in the same way that a measure of length or weight or time is. Unified systems, like the DoD program, thus seem impractical. I might be thinking too narrowly about this, but the case studies cited by Rosenthal appear to support me.

Sticking to the analogy, the communities of interest (COI) strategy might be like acknowledging that while oats are best sold in kilograms, a horse can still be measured in hands.

document language

In response to:

Svenonius, Elaine. (2000) “The intellectual foundation of information organization.” Cambridge: MIT Press. Chapter 7.

and

Dunsire, Gordon. (2007) “Distinguishing content from carrier: the RDA/ONIX framework for resource categorization.” D-Lib Magazine 12(1/2): n.p.

Document language is seemingly the easiest concept we have discussed, and so far as traditional bibliographic data is concerned, it might require many rules but it is fairly clear. A book, after all, has a fixed number of pages and is of a particular size. My copy of Jonathan Strange and Mr Norrell is 846 pages (plus a few pages of positive reviews, a title page, and an about-the-author page) and is softcover (though here I start having a problem, because I don’t know if AACR would prefer the use of “paperback”, which has connotations for me of a light summer read, and at over 800 pages JS & Mr N is pretty hefty for beach reading). It has most certainly been printed since the copyright date of 2004, though the softcover is usually released after the hardcover.

But then there are digital documents. The marketing website for the book I’ve been describing is an example. It has probably been available since shortly before the first release of the book, and it is still in existence. I can, however, rely on the book being persistent for years, provided I care for it properly. The website could disappear at any time. It is, however, primarily text-based, so if I wanted to preserve the “about the author” statements “written” by the main characters of the book, I could do that easily. In contrast, the site for the new Coraline film is likely to be available only for a limited time, and is so intensely interactive that there would be no way for someone other than the owners to preserve it. Its transitory nature perhaps makes it a bad example, but it does require many of the attributes of an RDA/ONIX record (ImageDimensionality: two-dimensional, ImageMovement: mixed?, Interactivity: interactive).
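As a rough sketch, the attributes I guessed at for the Coraline site could be written down as a record like the one below. The field names mirror the three attributes named above, but the `ContentRecord` class, the `unresolved` helper, and the trick of marking a guess with a trailing question mark are my own assumptions. None of it has been checked against the actual RDA/ONIX value lists.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ContentRecord:
    """Hypothetical container for a few RDA/ONIX-style content attributes."""
    image_dimensionality: Optional[str] = None
    image_movement: Optional[str] = None
    interactivity: Optional[str] = None

    def unresolved(self):
        """List attributes that are missing or still marked as a guess ('?')."""
        return [name for name, value in asdict(self).items()
                if value is None or value.endswith("?")]


# Values are the ones guessed at in the paragraph above.
coraline_site = ContentRecord(
    image_dimensionality="two-dimensional",
    image_movement="mixed?",
    interactivity="interactive",
)

print(coraline_site.unresolved())  # ['image_movement']
```

Keeping the uncertain value visibly uncertain seems more honest to me than forcing a choice. A cataloger would presumably resolve it against the framework’s own definitions.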

It makes sense to me that many of the attributes of carrier are open to being user defined, since they are likely to change over time. At this point we still burn CDs and DVDs fairly frequently, but it is probable that this FixationMethod will be supplanted, though we don’t entirely know how yet. It is interesting that fewer of the content attributes can be user defined, but those that can include subject and genre. I’m not clear on why that would be. I’m also bemused by the SensoryMode attribute (sight; hearing; touch; taste; smell; none). It comes very close to being able to describe a cake.