README file for moVMF v. 1.0
USAGE: moVMF [switches] word-doc-file
-a algorithm
s: moVMF algorithm (default)
-i [s|p|r|f]
initialization method:
s -- subsampling
p -- random perturbation (default)
r -- totally random
f -- read from file
-c number-of-clusters
-e epsilon
-s suppress output
-v version-number
-n no dump
-d dump the clustering process
-p perturbation-magnitude
the distance between the initial concept vectors and the centroid
will be less than this.
-N number-of-samples
-O the name of output matrix
-t scaling scheme
-K lower bound on Kappa
-Z upper bound on Kappa
-S Run SOFT moVMF (default is HARD moVMF)
-E encoding-scheme
1: normalized term frequency (default)
2: normalized term frequency inverse document frequency
-o objective-function
1: nonweighted (default)
2: weighted
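As a rough illustration of the two -E encoding schemes, the sketch below
scales each document vector to unit length, with and without an inverse
document frequency weight. It is not taken from the moVMF source: the use
of the L2 norm, the idf formula log(n/df), and the function names are all
assumptions made for the example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def tfidf_documents(docs):
    """Weight term frequencies by idf = log(n / df), then normalize.

    The idf formula is an assumption; the README does not define it.
    """
    n_docs = len(docs)
    n_words = len(docs[0])
    # df[w] = number of documents in which word w occurs
    df = [sum(1 for doc in docs if doc[w] > 0) for w in range(n_words)]
    idf = [math.log(n_docs / d) if d > 0 else 0.0 for d in df]
    return [l2_normalize([tf * g for tf, g in zip(doc, idf)]) for doc in docs]

# Two documents over a three-word vocabulary (raw term counts).
docs = [[3, 0, 4], [0, 2, 0]]
tf_unit = [l2_normalize(doc) for doc in docs]   # scheme 1 (sketch)
tfidf_unit = tfidf_documents(docs)              # scheme 2 (sketch)
```

Either way every document ends up on the unit sphere, which is the
setting the von Mises-Fisher distribution is defined on.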
File Format
The input file must be in the CCS (compressed column storage) sparse
matrix format. In the input matrix, each column represents a document
(or some other entity), and each row represents a word (or a feature).
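For readers unfamiliar with compressed column storage, here is a minimal
sketch of the idea: the nonzeros are stored column by column, together
with their row indices and a pointer array marking where each column
starts. The function and variable names are hypothetical, and the sketch
says nothing about the exact on-disk layout that moVMF expects.

```python
def dense_to_ccs(matrix):
    """Convert a dense matrix (list of rows) into three CCS arrays.

    Returns (col_ptr, row_idx, values), where the nonzeros of column j
    are values[col_ptr[j]:col_ptr[j + 1]] and row_idx holds their rows.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    col_ptr, row_idx, values = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            v = matrix[i][j]
            if v != 0:
                row_idx.append(i)
                values.append(v)
        # Mark where the next column begins.
        col_ptr.append(len(values))
    return col_ptr, row_idx, values

# 3 words (rows) x 2 documents (columns).
word_doc = [
    [2, 0],
    [0, 1],
    [1, 3],
]
col_ptr, row_idx, values = dense_to_ccs(word_doc)
```

Because each column is a document, all the words of one document sit
next to each other in the values array.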
The clustering is written out to an output file, whose format depends on
whether we are doing soft or hard clustering.
For soft clustering, for example, the output file takes this format:
[num documents]
[p(c_1|d_1) ... p(c_k|d_1)]
[p(c_1|d_2) ... p(c_k|d_2)]
...
[p(c_1|d_n) ... p(c_k|d_n)]
That is, the clustering information file just lists the posterior
probability of each class given the document. Each row has 'k' values
if we created 'k' clusters, and there are 'n' rows, one for each
document.
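The soft-clustering output described above can be read back with a few
lines of code. The sketch below assumes that the square brackets in the
format description are placeholders rather than literal characters, and
that the values on each row are separated by whitespace; both are
assumptions, as are the function names.

```python
def parse_soft_clustering(text):
    """Parse a document count followed by one row of posteriors per document."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_docs = int(lines[0])
    rows = [[float(x) for x in ln.split()] for ln in lines[1:]]
    if len(rows) != n_docs:
        raise ValueError("expected %d rows, found %d" % (n_docs, len(rows)))
    return rows

def hard_labels(posteriors):
    """Pick the index of the most probable cluster for each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]

# Made-up example with n = 3 documents and k = 3 clusters.
sample = "3\n0.7 0.2 0.1\n0.1 0.8 0.1\n0.3 0.3 0.4\n"
posteriors = parse_soft_clustering(sample)
labels = hard_labels(posteriors)
```

Taking the most probable cluster per row is one simple way to turn the
soft output into a hard assignment.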
Sample commandline
Suppose we have a dataset called 'data'. Then the following files should
be in the current directory: data_dim, data_txx_nz, data_col_ccs,
data_row_ccs
moVMF -c 3 -S -K 1 -Z 100 -t txx -O clusters data
This command takes the dataset 'data' and produces 3 soft clusters. -K
gives a lower bound on kappa; if you do not give this option, the program
will not produce correct results. -Z gives an upper bound on the estimate
of the kappa parameter. The clusters will be written to a file named
clusters_txx_doctoclus.3
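To show what bounding kappa means in practice, the sketch below estimates
the concentration of one cluster from its unit vectors and clamps the
result to the range given by -K and -Z. It uses the common approximation
kappa = (r*d - r^3) / (1 - r^2), where r is the mean resultant length and
d the dimension. Whether moVMF v. 1.0 uses exactly this estimator is an
assumption, and the function name is hypothetical.

```python
import math

def estimate_kappa(unit_vectors, kappa_min, kappa_max):
    """Approximate the vMF concentration and clamp it to [kappa_min, kappa_max]."""
    n = len(unit_vectors)
    d = len(unit_vectors[0])
    # Sum the vectors componentwise, then take the mean resultant length.
    resultant = [sum(v[i] for v in unit_vectors) for i in range(d)]
    r_bar = math.sqrt(sum(c * c for c in resultant)) / n
    if r_bar >= 1.0:
        # All vectors coincide: the estimate diverges, so use the upper bound.
        return kappa_max
    kappa = (r_bar * d - r_bar ** 3) / (1.0 - r_bar ** 2)
    return min(max(kappa, kappa_min), kappa_max)

tight = [[1.0, 0.0], [1.0, 0.0]]     # identical directions
spread = [[1.0, 0.0], [-1.0, 0.0]]   # opposite directions
kappa_tight = estimate_kappa(tight, 1, 100)
kappa_spread = estimate_kappa(spread, 1, 100)
```

In this approximation the estimate grows without limit as the vectors of
a cluster become identical and drops to zero as they spread out, so the
two bounds keep it inside a usable range.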
Last modified: Mon Oct 31 11:22:47 CST 2005