
understanding the Semantic Web

In response to:

Berners-Lee, Tim; Hendler, James & Lassila, Ora. (2001) “The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities.” Scientific American: 1-6.

Shadbolt, Nigel; Hall, Wendy & Berners-Lee, Tim. (2006) “The Semantic Web Revisited.” IEEE Intelligent Systems: 96-101.

A doubter’s (amusing) view of The Semantic Web: Marshall, Catherine C. (2004) Taking a stand on the Semantic Web.

Shadbolt admits that there isn’t a way to effortlessly use ontologies, which limits their application. I am inclined to follow his argument that the implementation of the Semantic Web will depend upon a small core of devoted early-adopters from professional/expert communities. Regardless of sceptics like Marshall, if the promise can be fulfilled in these niche communities, it has the possibility of spreading. I do think that Marshall makes an interesting point in raising the question of possible dangers of full-Semantic Web adoption, though I don’t know enough about information security to know whether her concerns related to identity theft are spurious or well-founded. The narrative presented by Berners-Lee certainly brings to mind potential privacy concerns.

Shadbolt’s comparison of folksonomies (tagging) and ontologies is very useful. The advantage of an ontology is that, by using URIs, it can differentiate between – from Marshall’s example – a beehive (hair style) and a beehive (in which bees live), whereas a folksonomy cannot, because it depends exclusively on words and can only be clarified by adding terms to the search string. As an example of the ridiculous places that a casual search can take an information seeker, a circulation clerk describes a catalog search for “harmonica” (please excuse the language). An ontology should allow for more sophisticated searching than a folksonomy, though unexpected results can appear in either.
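To make the beehive distinction concrete, here is a minimal sketch in Python. The URIs, item names, and function names are all made up for illustration (example.org is a placeholder domain), and this is not how any particular Semantic Web tool works; it only shows why a bare word cannot separate the two meanings while an identifier can.

```python
# Hypothetical concept URIs; example.org is a reserved placeholder domain.
HAIRSTYLE = "http://example.org/concepts/beehive-hairstyle"
APIARY = "http://example.org/concepts/beehive-apiary"

# Folksonomy: each item carries only bare words.
tagged = {
    "photo-1": {"beehive"},  # a photo of a hair style
    "photo-2": {"beehive"},  # a photo of a hive full of bees
}

# Ontology-style: each item points at a concept URI.
# Both concepts share the same human-readable label.
labels = {HAIRSTYLE: "beehive", APIARY: "beehive"}
annotated = {"photo-1": HAIRSTYLE, "photo-2": APIARY}


def search_by_tag(word):
    """Return every item tagged with the word, whatever it meant."""
    return sorted(item for item, tags in tagged.items() if word in tags)


def search_by_concept(uri):
    """Return only the items annotated with this exact concept."""
    return sorted(item for item, concept in annotated.items() if concept == uri)


print(search_by_tag("beehive"))      # both photos come back
print(search_by_concept(HAIRSTYLE))  # only the hair style comes back
```

The tag search has no way to tell the photos apart without extra search terms, which is the limitation Shadbolt describes.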

As an interesting note, I had no idea that Creative Commons was an RDF-based representation. My impression was that CC was an information policy advocacy group, and I had never considered how their rights licenses were applied to documents.


verbose and ambiguous

In response to:

Gruber, Thomas R. (1995) “Toward principles for the design of ontologies used for knowledge sharing?” International Journal of Human-Computer Studies 43(5-6): 907-928.


Sharman, Raj; Kishore, Rajiv & Ramesh, Ram. (2004) “Computational ontologies and information systems II: formal specification.” The Communications of the Association for Information Systems 14: 1-25.

Gruber’s first case study was a struggle to understand, but having muddled through it, I think I can see the advantages of an ontology for this purpose. To put it in context, a unit-of-measure ontology would answer a potential problem in the cutlery catalog: measurements are significant item attributes (the length of a chef’s knife blade, 8″ or 10″), but different manufacturers can express them in different units (English or metric). This sort of ontological feature would mean that the retailer doesn’t have to standardize the units when data is entered, but can still retrieve information with a standard metric, yes?
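As a rough illustration of that idea (not Gruber’s actual ontology), the sketch below stores each blade length in whatever unit the manufacturer used and converts only at query time. The product names, the `Length` class, and `blades_at_least` are invented for the example; the conversion factors are the standard definitions of the inch, centimetre, and millimetre in metres.

```python
from dataclasses import dataclass

# Conversion factors to a common base unit (metres).
TO_METRES = {"in": 0.0254, "cm": 0.01, "mm": 0.001}


@dataclass(frozen=True)
class Length:
    """A quantity that remembers the unit it was entered in."""
    magnitude: float
    unit: str

    def to(self, unit):
        """Convert to another unit by going through metres."""
        metres = self.magnitude * TO_METRES[self.unit]
        return Length(metres / TO_METRES[unit], unit)


# Hypothetical catalog entries, each kept in the manufacturer's own unit.
catalog = {
    "chef's knife A": Length(8, "in"),
    "chef's knife B": Length(20, "cm"),
    "chef's knife C": Length(10, "in"),
}


def blades_at_least(min_cm):
    """Query in centimetres regardless of how each length was entered."""
    return sorted(
        name
        for name, length in catalog.items()
        if length.to("cm").magnitude >= min_cm
    )


print(blades_at_least(20.3))
```

Nothing is standardized at data entry; the retailer asks one question in one unit and the conversion happens behind the query.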

Happily, the bibliographic case study made a lot more sense to me. However, Gruber failed to define some of the symbols used in the ontologies (at least so far as I could identify). I’m unclear on what =>, <=, and = stand for, and I don’t understand what defining something as “if and only if” means. Nor do I understand the meaning of “float” or “double float.” Still, I think I understand the general concepts well enough that I can make use of the reading.
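For anyone with the same questions, here is a sketch based on the conventional reading of these symbols in logic; it assumes Gruber’s notation follows that convention, which the paper itself should be checked against. Under that reading, `A => B` means “A implies B”, `A <= B` means “A is implied by B”, and “if and only if” means both directions hold at once. The function names are my own. (A “float” is, in most programming languages, a number stored with a decimal fraction; a “double float” is the same kind of number stored with more precision.)

```python
from itertools import product


def implies(p, q):
    """A => B: false only when A is true and B is false."""
    return (not p) or q


def iff(p, q):
    """A if and only if B: implication holds in both directions."""
    return implies(p, q) and implies(q, p)


def truth_table():
    """Rows of (A, B, A => B, A <= B, A iff B) for every combination."""
    return [
        (p, q, implies(p, q), implies(q, p), iff(p, q))
        for p, q in product([True, False], repeat=2)
    ]


for row in truth_table():
    print(row)
```

A definition stated as “if and only if” is therefore a complete one: anything meeting the condition belongs to the class, and anything in the class meets the condition.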

CVs in the bibliographic sense

In response to:

Svenonius, Elaine. (2000) “The intellectual foundation of information organization.” Cambridge: MIT Press. Chapter 8.

Bailey, Penny. (2007) “Always start with structure.” Library + information gazette, 19 October–1 November, p. 9.

Zeng, Marcia Lei. 2005. Construction of Controlled Vocabularies, A Primer (based on Z39.19).

In spite of my personal love for tagging, I’ve been developing a higher degree of skepticism regarding the long-term utility of tags in information retrieval and am more inclined to agree with Bailey in regard to controlled vocabularies. Of course, it still depends on what one is attempting to classify. The entirety of the internet, for example, is outside the possible scope of a controlled vocabulary simply because the domain expands exponentially every day. I doubt that the proposed librarian-vetted search engine Reference Extract will get out of the planning stages. Admittedly, this project isn’t really about creating a controlled vocabulary, but rather about automatically extracting keywords from a query and matching them to a body of ranked web resources. Still, the project will have to find a way to resolve the issues of automated keyword extraction brought up in the Svenonius chapter.

In other instances, a controlled vocabulary is key to information retrieval. Using the cutlery catalog project as an example, some knives have a special feature called kullenschliff or scallops. They appear regularly on santoku, though they are not a traditional feature of this style. The presence or absence of kullenschliff is not generally a searchable feature in online retail catalogs – there isn’t a standard for whether kullenschliff are even mentioned, even just within the Sur la Table catalog. However, if the feature were indexed, the synonym ring would need to include kullenschliff, kullens, scallops, and possibly other words, and the understanding of “scallop” might not be the same for all customers. Some customers might think of a serrated edge rather than the kullens in the side of the blade. The term also touches on the issues of international vocabulary. The word kullenschliff is German, and yet the feature appears more regularly on a Japanese-style blade. What’s more, it is a challenging word. If I hadn’t picked up a singular obsession with this term and feature because of the cataloging project, it is extremely doubtful that I would be able to remember it.
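A synonym ring like the one described can be sketched in a few lines of Python. The product descriptions and function names below are invented, and the ring holds only the three terms named above; a real index would need more care, especially with a word as ambiguous as “scallops”.

```python
# The ring: searching for any one term searches for all of them.
SYNONYM_RING = {"kullenschliff", "kullens", "scallops"}

# Hypothetical catalog descriptions, written inconsistently on purpose.
PRODUCTS = {
    "santoku 7in": "Santoku with kullenschliff edge",
    "santoku 5in": "Santoku knife, scallops along the blade",
    "bread knife": "Serrated bread knife",
}


def expand(term):
    """Replace a ring member with the whole ring; leave other terms alone."""
    term = term.lower()
    return SYNONYM_RING if term in SYNONYM_RING else {term}


def search(term):
    """Match a product if its description contains any expanded term."""
    terms = expand(term)
    return sorted(
        name
        for name, description in PRODUCTS.items()
        if any(t in description.lower() for t in terms)
    )


print(search("kullens"))   # finds both santoku, however they were described
print(search("serrated"))  # finds only the bread knife
```

The ring does nothing for the customer who types “scallops” and means a serrated edge; that ambiguity has to be handled elsewhere in the vocabulary.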


santoku with kullenschliff




Incidentally, as of this post, flickr doesn’t have a single image tagged with “kullenschliff”, but the first image returned for “santoku” is of a blade with this feature.

enterprise information contexts

In response to:

Arnon Rosenthal, Len Seligman & Scott Renner. (2004) “From semantic integration to semantics management: case studies and a way forward.” ACM SIGMOD Record 33(4): 44-50.

Philip A. Bernstein & Laura M. Haas. (2008) “Information integration in the enterprise.” Communications of the ACM 51(9): 72-79.

Philip A. Bernstein & Sergey Melnik. (2007) “Model management 2.0: manipulating richer mappings.” In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, p. 1-12.

Data standards, like the metric standard, have to be flexible enough that they can describe anything thrown at them (ergo, the organization projects we are working on, from kitchen knives to computer programs). The metric system began as a French project to derive a unit of measure from the arc of the earth, and now tells time by the jiggle of an atom. Of course, historians of the meter have discovered that the measure of the meter was fudged from the beginning, and scientists know that there is no material from which we can derive a perfectly exact unit of measure. All that can be done is to get very, very close.

This is to say that from the outset, from the first sentence of the case study, I was skeptical of the DoD Data Administration Program’s (or any very large organization’s) ability to create a standard that will meet all information needs. The pre-implementation system described rather resembles the measurement standards of pre-meter Europe, but information is not quantifiable in the same way that a measure of length or weight or time is. Unified systems, like the DoD program, thus seem impractical. I might be thinking too narrowly about this, but the case studies cited by Rosenthal appear to support me.

Sticking to the analogy, the communities of interest (COI) strategy might be like acknowledging that while oats are best sold in kilograms, a horse can still be measured in hands.

document language

In response to:

Svenonius, Elaine. (2000) “The intellectual foundation of information organization.” Cambridge: MIT Press. Chapter 7.


Dunsire, Gordon. (2007) “Distinguishing content from carrier: the RDA/ONIX framework for resource categorization.” D-Lib Magazine 13(1/2): n.p.

Document language is seemingly the easiest concept we have discussed, and so far as traditional bibliographic data is concerned, it might require many rules but it is fairly clear. A book, after all, has a fixed number of pages and is of a particular size. My copy of Jonathan Strange and Mr Norrell is 846 pages (plus a few pages of positive reviews, a title page, and an about-the-author page) and is softcover (though here I start having a problem, because I don’t know whether AACR would prefer “paperback”; that word has connotations for me of a light summer read, and at over 800 pages JS & Mr N is pretty hefty for beach reading). It has most certainly been printed since the copyright date of 2004, though the softcover is usually released after the hardcover.

But then there are digital documents. The marketing website for the book I’ve been describing is an example. It has probably been available since shortly before the first release of the book, and it is still in existence. I can rely on the book being persistent for years, provided I care for it properly; the website could disappear at any time. It is, however, primarily text-based, so if I wanted to preserve the “about the author” statements “written” by the main characters of the book, I could do that easily. In contrast, the site for the new Coraline film is likely to be available only for a limited time, and is so intensely interactive that there would be no way for someone other than the owners to preserve it. Its transitory nature perhaps makes it a bad example, but it does require many of the attributes of an RDA/ONIX record (ImageDimensionality: two-dimensional, ImageMovement: mixed?, Interactivity: interactive).

It makes sense to me that many of the attributes of carrier are open to being user defined since they are likely to change over time – at this point we still burn CDs and DVDs fairly frequently, but it is probable that this FixationMethod will be supplanted, though we don’t entirely know how yet. It is interesting that fewer of the content attributes can be user defined, but those that can include subject and genre. I’m not clear on why that would be. I’m also bemused by the SensoryMode attribute (sight; hearing; touch; taste; smell; none). It comes very close to being able to describe a cake.
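To get a feel for how such a record might hang together, here is a loose sketch in Python. It borrows only the attribute names and values mentioned in these paragraphs; the class, its field layout, and the sensory modes I assign to the film site are my own assumptions, not the framework’s actual schema.

```python
from dataclasses import dataclass

# The SensoryMode values as listed above; treated here as a closed list.
SENSORY_MODES = {"sight", "hearing", "touch", "taste", "smell", "none"}


@dataclass
class ResourceCategory:
    """A hypothetical, much-reduced stand-in for an RDA/ONIX description."""
    image_dimensionality: str
    image_movement: str
    interactivity: str
    sensory_mode: frozenset

    def __post_init__(self):
        # Reject any sensory mode outside the controlled list.
        unknown = set(self.sensory_mode) - SENSORY_MODES
        if unknown:
            raise ValueError(f"not a SensoryMode value: {sorted(unknown)}")


# The film site as guessed at above; sight and hearing are my assumption.
film_site = ResourceCategory(
    image_dimensionality="two-dimensional",
    image_movement="mixed",
    interactivity="interactive",
    sensory_mode=frozenset({"sight", "hearing"}),
)

print(film_site)
```

A closed list like `SENSORY_MODES` is what separates a controlled attribute from a user-defined one: anything off the list is refused rather than quietly added.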

principles of organizational language

In response to:

Svenonius, Elaine. (2000) “The intellectual foundation of information organization.” Cambridge: MIT Press. Chapter 5.


Jacob, Elin K. & Shaw, Deborah. (1998) “Sociocognitive Perspectives on Representation.” Annual Review of Information Science and Technology 33.

I am still reading through the lens of user-generated versus professional organization systems. I do think that Weinberger’s third order of order is a new reality; perhaps not completely new, but different from previous organization environments. Certainly items could be found or collocated by title, author, and subject in book catalogs, and then with somewhat greater ease in card catalogs, but the degree to which this is possible in digital environments makes it significant. From the perspective of the principles of organization, one of the ways that the third order relates to organization is through tagging. Individual knowledge structures in the form of tags attached to an item can reflect only one person’s organizational interests, but one user’s tags will probably reflect the interests and cognitive patterns of the discourse domains discussed by Hjørland and Albrechtsen. As a discourse domain is a construct of individual knowledge structures, individuals’ tag contributions taken in aggregate should create a functional organization system for a relatively homogeneous discourse community. The problem comes from the need to collocate according to the knowledge and needs of a heterogeneous group. Tibbo points the problem out in relation to abstracting language, but anyone interested in tagging for long-term organization needs to be concerned about the changing nature of language and meaning. Tags may bring up current documents, but this sort of organizational structure is likely to lose value over time.

project planning

system users