If your goal is to find out how people on Reddit actually talk about it (rather than what kind of discussion by *other people* on the subject redditors value), a cutoff at the 100-karma threshold is probably going to produce horribly skewed results. Because karma determines presentation order, it grows non-linearly: comments that are already ranked highly get seen more and so collect still more votes, which means that only a few very popular comments will get more than single-digit karma. If you want a smaller data set that is still representative, I would isolate comments with exactly one karma point.
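To make the suggestion concrete, here is a minimal sketch of the two filters side by side. It assumes the comments are already loaded as a list of dicts with a `score` field holding the karma; the function names and the toy data are purely illustrative and not taken from any real data set.

```python
from collections import Counter


def score_histogram(comments):
    """Count how many comments have each karma score."""
    return Counter(c["score"] for c in comments)


def exactly_one_point(comments):
    """Keep comments sitting at exactly one karma point."""
    return [c for c in comments if c["score"] == 1]


def at_least(comments, threshold=100):
    """Keep comments at or above a karma threshold (the cutoff argued against above)."""
    return [c for c in comments if c["score"] >= threshold]


# Toy data, made up for illustration only.
comments = [
    {"body": "a", "score": 1},
    {"body": "b", "score": 1},
    {"body": "c", "score": 1},
    {"body": "d", "score": 2},
    {"body": "e", "score": 3},
    {"body": "f", "score": 0},
    {"body": "g", "score": -2},
    {"body": "h", "score": 5},
    {"body": "i", "score": 150},
    {"body": "j", "score": 1200},
]

if __name__ == "__main__":
    print(score_histogram(comments))
    print(len(exactly_one_point(comments)), "comments at exactly one point")
    print(len(at_least(comments)), "comments at 100 or more")
```

Running the histogram on your own data first is a cheap way to check how lopsided the karma distribution really is before committing to either filter.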

Written by

Resident hypertext crank. Author of Big and Small Computing: Trajectories for the Future of Software. http://www.lord-enki.net

