In response to what we discussed in class, here are some references related to the long tail in social bookmarking.
Terms in a tagging system are usually considered to follow a power-law distribution: they tend to converge on a small subset of prevalent keywords, while obscure or problematic tags fall into the “long tail” and are thus filtered out of the central area. This is regarded as an illustration of Zipf’s Law (Zipf, 1935): “in a corpus of natural language utterances, the frequency of any word is roughly inversely proportional to its rank in the frequency table”. This property is useful for filtering out problematic tags and for reaching a point of convergence in the collective intelligence. However, some researchers point out that useful information is hidden in the long tail, since the tail contains informal metadata, and that search methods should therefore be improved to search across the long tail rather than relying only on a small subset of the tags (Tonkin, 2006).
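To make the head/tail idea concrete, here is a minimal Python sketch. Under Zipf's Law the frequency f of the tag at rank r is roughly C/r, so the product r·f stays roughly constant. The sketch ranks tags by frequency, checks that product, and splits the ranking into a "head" and a "long tail". The function names, the 80% `head_share` cutoff, and the tag counts are all hypothetical: the counts are synthetic, built to follow 1/rank exactly, and are not data from any real bookmarking site.

```python
from collections import Counter


def rank_tags(tag_assignments):
    """Return (tag, count) pairs sorted by descending count (rank 1 first)."""
    counts = Counter(tag_assignments)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def zipf_products(ranked):
    """Return rank * count for each tag; Zipf's Law predicts a roughly constant value."""
    return [rank * count for rank, (_, count) in enumerate(ranked, start=1)]


def split_head_tail(ranked, head_share=0.8):
    """Split ranked tags into a head covering head_share of all tag uses and a long tail.

    head_share is a hypothetical cutoff chosen only for illustration.
    """
    total = sum(count for _, count in ranked)
    head, tail = [], []
    covered = 0  # tag uses accounted for by tags already placed in the head
    for tag, count in ranked:
        if covered < head_share * total:
            head.append(tag)
        else:
            tail.append(tag)
        covered += count
    return head, tail


# Synthetic, hypothetical tag assignments whose counts are 60 / rank.
assignments = (["web"] * 60 + ["design"] * 30 + ["blog"] * 20
               + ["tools"] * 15 + ["toread"] * 12 + ["misc"] * 10)

ranked = rank_tags(assignments)
print(zipf_products(ranked))  # [60, 60, 60, 60, 60, 60]
head, tail = split_head_tail(ranked)
print(head, tail)  # ['web', 'design', 'blog', 'tools'] ['toread', 'misc']
```

Filtering that keeps only `head` corresponds to the "central area" described above. Tonkin's point would amount to searching over both `head` and `tail` instead of discarding the tail.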