January 15, 2005

Tagalicious - some thoughts on tagging content

  • Reported by Andreas

Recently, there has been much ado about tagging online content. Needless to say, much of the hype started with services such as del.icio.us and Flickr, amazing pieces of social software that make it easy to catalogue (tag) digital content. At the same time, they make it easy for the user to keep track of the catalogue as a whole by means of RSS feeds. The uncontrolled vocabularies that emerge through user-generated tagging are also known as folksonomies (see also Adam Mathes' fine paper on folksonomies and Louis Rosenfeld's criticism of the concept). To continue the story: on January 2nd, Brian Dear was dreaming about Taggle, "a service where you type in a keyword, and you get back all the hits that have that word as a tag". And a week or so later, Technorati launched its Technorati Tag service, allowing people to browse through blog posts, Flickr photos and del.icio.us links by means of, guess what, tags.

Now, I think all this tagging is a good thing: metadata are important, certainly when talking about visual data such as pictures. What I don't quite agree with, though, is the way in which tags are linked to the content they describe.

Take a look at an individual photo posted at Flickr (example): everybody understands that the tags on the right describe the picture on the left, but some semantic/structural glue is lacking, in my opinion. (This is also true for the picture's title and copyright information.) A quick peek at the page's source, for instance, reveals that the tags, title and copyright info are not reused as attribute values in the page's head section, nor as alternative text for the picture they describe. The all sizes page doesn't offer much more: no alt or title attribute values there either. Furthermore, the keyword, comment and title EXIF fields of the JPEG file you download remain empty. Once off the web, the tags, the CC license, etc. are gone...
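To make the "glue" idea a bit more concrete, here is a minimal sketch of how a page template could repeat the visible metadata in the head section and on the image itself. Everything in it is an assumption for illustration: the function name photo_markup, its parameters, and the choice of a "copyright" meta name (a common convention, not a formal standard). It says nothing about how Flickr actually builds its pages.

```python
from html import escape


def photo_markup(src, title, tags, license_name):
    """Return (head_markup, img_markup) that reuse the photo's metadata.

    Hypothetical helper: names and output layout are illustrative only.
    """
    keywords = ", ".join(tags)
    alt_text = f"{title} ({keywords})"
    # Metadata repeated in the head section of the page.
    head = (
        f'<meta name="keywords" content="{escape(keywords, quote=True)}">\n'
        f'<meta name="copyright" content="{escape(license_name, quote=True)}">'
    )
    # Metadata repeated as alt and title attributes on the image.
    img = (
        f'<img src="{escape(src, quote=True)}" '
        f'alt="{escape(alt_text, quote=True)}" '
        f'title="{escape(title, quote=True)}">'
    )
    return head, img


head, img = photo_markup("photo.jpg", "Sunset", ["beach", "sunset"], "CC BY 2.0")
print(head)
print(img)
```

The point is only that the tags, title and license travel with the markup instead of living solely in the visible layout.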

Del.icio.us has another problem. Because it doesn't allow spaces in tag names, users resort to made-up words: tocheckout is one of them. Thus, in certain cases, your only option is to assign flawed metadata to the content you tag, which lowers the value of your tagging effort.
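One possible way around this, sketched below, is to keep whitespace as the tag separator but let double quotes group a multi-word tag. This is not how del.icio.us works; the function name parse_tags and the quoting rule are assumptions made for the sake of the example. It relies on Python's standard shlex module.

```python
import shlex


def parse_tags(raw):
    """Split a tag string on whitespace, keeping "quoted phrases" together."""
    lexer = shlex.shlex(raw, posix=True)
    lexer.whitespace_split = True  # split on whitespace only
    lexer.quotes = '"'             # only double quotes group words
    lexer.commenters = ""          # do not treat '#' as a comment marker
    return list(lexer)


print(parse_tags('"to check out" webdesign css'))
```

With a rule like this, "to check out" stays a readable three-word tag, while single-word tags behave exactly as before.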

Technorati Tag, in turn, hooks into the categories people use when posting on their blogs. Sounds fine to me. However, for people without RSS/Atom or categories, Technorati suggests inserting <a href="http://technorati.com/tag/[tagname]" rel="tag">[tagname]</a> under your blog posts. It might well work with Technorati's Tag service, but in fact it is nothing more than a (meaningless) link under a blog entry. Same story here: I have the feeling some semantic glue (à la Dublin Core, or XFN) is lacking.
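For what it's worth, here is a rough sketch of how a crawler might read such links. It assumes the tag name is simply the last path segment of the href, as in the URL pattern quoted above; the class name RelTagParser is invented for this example, and this is not a description of Technorati's actual crawler.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class RelTagParser(HTMLParser):
    """Collect tag names from <a rel="tag" href=".../tagname"> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").split()
        href = attr.get("href") or ""
        if "tag" in rel_values and href:
            # Assumption: the tag name is the last path segment of the URL.
            last_segment = href.rstrip("/").rsplit("/", 1)[-1]
            self.tags.append(unquote(last_segment))


parser = RelTagParser()
parser.feed('<p>A post.</p><a href="http://technorati.com/tag/folksonomy" rel="tag">folksonomy</a>')
print(parser.tags)
```

Note that the only machine-readable hint is the rel value; everything else has to be inferred from the URL, which is exactly the thin glue I am complaining about.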

It might well be that the various services will add more semantic glue in future versions (Flickr is still in Beta, for instance), or else that some interesting API implementations will emerge.

What do Web-Graphics readers think of this?

Comments

1. January 15, 2005 09:13 PM

Jim Posted…

Take a look at an individual photo posted at Flickr (example): everybody understands that the tags on the right describe the picture on the left, but some semantic/structural glue is lacking, in my opinion.

This is where RDF shines. You have multiple resources, and RDF defines the relationship between them. For instance, you could have the following resources:

  • An image
  • Person A
  • Person B
  • Location C

They all have URIs (no, people don't need websites, just invent a URN for them). An RDF file can describe the relationship between them. For instance:

  • Image -> taken by -> Person A
  • Image -> subject -> Person B
  • Image -> location -> Location C

Then you can query an RDF store with something like "show me a list of the photos of Person B", or "show me a list of the photos taken at Location C" and so on. The RDF store can even describe relationships between resources not under the control of the person running the RDF store.
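The idea above can be sketched with plain Python tuples standing in for a real RDF store. The URNs, the predicate names and the helper subjects_where are all invented for this illustration; a real setup would use an RDF library and proper vocabulary URIs for the predicates.

```python
# Hypothetical URNs, invented for this sketch.
IMAGE_1 = "urn:example:image:1"
IMAGE_2 = "urn:example:image:2"
PERSON_A = "urn:example:person:a"
PERSON_B = "urn:example:person:b"
LOCATION_C = "urn:example:location:c"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    (IMAGE_1, "takenBy", PERSON_A),
    (IMAGE_1, "subject", PERSON_B),
    (IMAGE_1, "location", LOCATION_C),
    (IMAGE_2, "takenBy", PERSON_B),
    (IMAGE_2, "location", LOCATION_C),
]


def subjects_where(triples, predicate, obj):
    """Return every subject that has the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# "Show me a list of the photos of Person B."
photos_of_b = subjects_where(TRIPLES, "subject", PERSON_B)
# "Show me a list of the photos taken at Location C."
photos_at_c = subjects_where(TRIPLES, "location", LOCATION_C)
print(photos_of_b, photos_at_c)
```

Because the relationship is named explicitly in each triple, "photos of Person B" and "photos taken by Person B" stay distinct, which a flat list of tags cannot express.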

I'm not surprised the HTML and EXIF information is lacking, they aren't suitable for encoding this type of information.

As far as Technorati's markup for tagging is concerned, I don't like the fact that you have to insert elements that have existing semantics into your pages. What was wrong with adding a meta element to the page? The fact that it reminds people that this information is, ultimately, untrustworthy? Or because the link pumps up Technorati's profile, both for Google PageRank and for readers of the weblogs?

2. January 16, 2005 02:25 AM

tom sherman Posted…

Interesting thoughts. All this talk of "tags" reminds me of the brouhaha surrounding Microsoft's "Smart Tags" technology. Remember that? There's Chris Kaminski's classic article on the subject and the All My FAQs Wiki page on the subject.

I'm sure you'll scoff, but if you want to see a great example of uncategorized tagging gone awry (and useless), check out the "Interests" section of RateMyBody.com (a profile site similar to hotornot.com). Users can list interests, which become links--and hypothetically bring like-minded people together. Problem is, without context, organization, hierarchy, or a controlled vocabulary, it's utterly useless.

3. January 16, 2005 10:36 AM

Andreas Posted…

I’m not surprised the HTML and EXIF information is lacking, they aren’t suitable for encoding this type of information.

Why no EXIF? Adding metadata to pictures through EXIF fields sounds like a good idea to me.

4. January 16, 2005 01:06 PM

tom sherman Posted…

Adding metadata to pictures through EXIF fields sounds like a good idea to me.

Agreed. The media company I work for has an online photo sales website. It's important to put copyright info in the EXIF fields, because photos often get separated from their original context once they are downloaded.

I also believe there's a place in the EXIF standard for this information, at least in the 2.2 standard. Take a look at the EXIF 2.2 standard (watch out! 750k PDF!).

Some interesting fields: ImageDescription, Artist, Copyright, DateTime, UserComment, ImageUniqueID, SubjectArea, DateTimeOriginal, DateTimeDigitized. There might be more--check out section 4.6 of the PDF.
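As a rough sketch of how a photo site's metadata could map onto a few of these fields, the snippet below builds a dictionary keyed by EXIF tag number. The function name build_exif_fields and the decision to store tags in UserComment are assumptions for illustration; actually writing the values into a JPEG would need an imaging library, which is left out here. The 8-byte "ASCII" prefix is the character code the EXIF standard requires at the start of UserComment.

```python
# EXIF tag numbers for four of the fields listed above.
EXIF_TAGS = {
    "ImageDescription": 0x010E,
    "Artist": 0x013B,
    "Copyright": 0x8298,
    "UserComment": 0x9286,
}

# UserComment starts with an 8-byte character code identifier.
ASCII_PREFIX = b"ASCII\x00\x00\x00"


def build_exif_fields(title, photographer, license_text, tags):
    """Map site metadata to EXIF tag numbers (hypothetical helper)."""
    comment = "tags: " + ", ".join(tags)
    return {
        EXIF_TAGS["ImageDescription"]: title,
        EXIF_TAGS["Artist"]: photographer,
        EXIF_TAGS["Copyright"]: license_text,
        # Assumption of this sketch: tags are stored in UserComment.
        EXIF_TAGS["UserComment"]: ASCII_PREFIX + comment.encode("ascii", "replace"),
    }


fields = build_exif_fields("Sunset", "Andreas", "CC BY 2.0", ["beach", "sunset"])
print(fields)
```

That way the copyright and description would at least survive the download, even if the tags only ride along as free text.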

BTW, here's some background info from Kodak on "What is EXIF?" Apologies if I've read the spec incorrectly--specs are boring. :)