When I did something similar a couple of years ago for work, I used them to derive the vectors (because part of the goal was to guess which classification an unfiled application would fall under & how similar it was to competitors). It makes less sense to use them in training a model if you only care about genuine similarity, & not about how easy that similarity is to find.

Resident hypertext crank. Author of Big and Small Computing: Trajectories for the Future of Software. http://www.lord-enki.net

