TECH621 Discussion: The “Long Tail” in Social Bookmarking

In response to what we discussed in class, here are some references related to the long tail in social bookmarking.

Terms in a tagging system are usually considered to follow a power-law distribution. They tend to converge on a small subset of prevalent keywords, while obscure or problematic tags fall into the “long tail” and thus get filtered out of the central area. This is regarded as an illustration of Zipf’s Law (Zipf, 1935): “in a corpus of natural language utterances, the frequency of any word is roughly inversely proportional to its rank in the frequency table”. This convergence is useful for filtering out problematic tags and for reaching a point of consensus in the collective intelligence. However, some researchers point out that useful information is hidden in the long tail, because the tail contains informal metadata, and that search methods should be improved to search across the long tail rather than rely only on a small subset of the tags (Tonkin, 2006).
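To make the rank-frequency idea concrete, here is a minimal Python sketch. It assumes an idealized Zipf distribution with exponent 1 (real tag data only approximately follows this), and the function names (`rank_frequency`, `zipf_frequencies`, `head_tail_share`) and the toy tags are hypothetical, made up for illustration. It ranks tags by how often they were applied, then measures how much of the total frequency falls outside an arbitrarily chosen "head" of top-ranked tags.

```python
from collections import Counter


def rank_frequency(tag_assignments):
    """Rank tags by how often they were applied (rank 1 = most frequent)."""
    # Counter.most_common() sorts by count, descending; ties keep first-seen order.
    return Counter(tag_assignments).most_common()


def zipf_frequencies(n_tags, top_frequency):
    """Idealized Zipf frequencies (exponent 1 is an assumption): rank r gets top_frequency / r."""
    return [top_frequency / r for r in range(1, n_tags + 1)]


def head_tail_share(frequencies, head_size):
    """Split total frequency into the top `head_size` ranks (head) and the rest (tail).

    `frequencies` must already be sorted by rank; `head_size` is an arbitrary cutoff.
    """
    total = sum(frequencies)
    head = sum(frequencies[:head_size])
    return head / total, (total - head) / total


if __name__ == "__main__":
    # Hypothetical toy bookmarks: each string is one tag assignment.
    tags = ["web", "web", "web", "design", "design", "toread"]
    print(rank_frequency(tags))  # [('web', 3), ('design', 2), ('toread', 1)]

    freqs = zipf_frequencies(n_tags=1000, top_frequency=1000.0)
    head, tail = head_tail_share(freqs, head_size=10)
    print(f"head: {head:.2f}, tail: {tail:.2f}")  # head: 0.39, tail: 0.61
```

Under these assumptions, the top 10 of 1,000 tags carry only about 39% of all tag assignments, leaving about 61% in the tail. This is one way to see why cutting a folksonomy down to its most popular tags can discard a lot of information, which is the concern Tonkin (2006) raises.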

I also found an online forum discussing how you can use long-tail keywords to get traffic to your website. The basic idea is that you bookmark your website with low-competition keywords; each of them gets a low volume of searches, but cumulatively you get enough traffic. I haven’t looked into whether there are scholarly publications about this idea. As for the long tail in social tagging itself, I am sure many other papers discuss it; here are just a few:
1. Golder, S., & Huberman, B. A. (2005). The structure of collaborative tagging systems. arXiv preprint cs/0508082.
2. Tonkin, E. (2006). Searching the long tail: Hidden structure in social tagging. Proceedings of the 17th ASIS&T SIG/CR Classification Research Workshop. Austin, TX.
3. Zipf, G. K. (1935). The psycho-biology of language: An introduction to dynamic philology. Boston, MA.