Usage: ./NMF -i input_data_matrix [options]
General requirements and recommendations:
1. The input data matrix must contain a column of row names (first column) and a row of column names (first row), such that the [0, 0] entry of the matrix is empty.
2. For an input matrix containing no zero entries, it is recommended that -idealization be set to 0 (the default) and that both -scaling and -normalizing be set to 'T'.
3. For an input matrix containing zero entries, -idealization may optionally be set to 0.1. However, it is recommended that both -scaling and -normalizing be set to 'F' to avoid numerical issues, particularly when -idealization is 0.
4. If the matrix contains a large number of zero entries, setting -idealization to 0.1 is recommended, together with setting both -scaling and -normalizing to 'T' for faster convergence.
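The layout in requirement 1 can be checked before running the program. The Python sketch below writes and re-reads a matrix with row names in the first column, column names in the first row and an empty top-left cell, and reports the fraction of zero entries that requirements 2 to 4 depend on. It assumes a tab-delimited text file with no missing values; the delimiter hpcNMF actually expects is not stated here, and the function names are hypothetical helpers, not part of hpcNMF.

```python
import csv
import io

def format_input_matrix(row_names, col_names, values, delimiter="\t"):
    """Return matrix text: column names in row 0, row names in column 0,
    and an empty [0, 0] cell. The tab delimiter is an assumption."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow([""] + list(col_names))  # empty top-left entry
    for name, row in zip(row_names, values):
        writer.writerow([name] + [str(v) for v in row])
    return buf.getvalue()

def parse_input_matrix(text, delimiter="\t"):
    """Parse matrix text, enforcing the empty [0, 0] entry and equal row lengths.
    Assumes every value is numeric (no missing-value tokens)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header = rows[0]
    if header[0] != "":
        raise ValueError("the [0, 0] entry must be empty")
    col_names = header[1:]
    row_names, values = [], []
    for r in rows[1:]:
        if len(r) != len(header):
            raise ValueError(
                f"row {r[0]!r}: expected {len(col_names)} values, got {len(r) - 1}")
        row_names.append(r[0])
        values.append([float(x) for x in r[1:]])
    return row_names, col_names, values

def zero_fraction(values):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in values)
    zeros = sum(1 for row in values for v in row if v == 0.0)
    return zeros / total
```

What counts as "a large number" of zero entries is not specified, so any cutoff applied to `zero_fraction` is the user's own choice.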
Options:
-s: scheme (algorithm); one of 'KL', 'Renyi', 'ED', 'GammaH', 'GammaJ', 'Gamma_dualKL', 'IG_dualKL'
-n: number of iterations (update steps), defaults to 2000
-c: number of runs of the chosen algorithm, defaults to 20
-extscale: maximum number of runs to attempt
-k: rank range, given as min-max; defaults to 2-2 (rank 2 only)
-t: clustering target, defaults to 'PATTERN' for clustering based on columns of the input matrix; optionally 'AMPLITUDE' for clustering based on rows of the input matrix
-cs: clustering scheme, defaults to 'Binary' which implements consensus clustering
-r: reference file (string) specifying the true class labels, if available, used to calculate the misclassification rate
-scaling: initial scaling for faster convergence, defaults to 'F'
-normalizing: H matrix normalization, defaults to 'F'
-alphas: comma-separated list of alpha values, valid only for the Renyi algorithm; defaults to 1.0
-idealization: defaults to 0, optionally 0.1
-nthreads: defaults to 1; can be set to at most 100 (only valid for the multithreaded version)
-negatives: how negative values in the input matrix are handled; defaults to 'T', in which case all negative values are set to zero; if set to 'F', the program terminates when negative values are present
-NA: how missing values in the input matrix, if any, are treated. The following options are available:
0 --- no missing values are allowed
1 --- missing values are set to the column mean
2 --- missing values are set to the column median
3 --- missing values are set to the row mean
4 --- missing values are set to the row median
5 --- missing values are ignored during simulation
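The -NA and -negatives rules can be reproduced when preparing a matrix outside the program. The Python sketch below is an illustration under stated assumptions, not hpcNMF's own code: missing entries are represented as `None`, each mean or median is taken over the observed entries of that column or row only, and the function names are hypothetical.

```python
import statistics

def impute_missing(values, mode):
    """Apply -NA style handling to a list-of-rows matrix with None for missing."""
    if mode == 0:
        if any(v is None for row in values for v in row):
            raise ValueError("missing values are not allowed (-NA 0)")
        return [list(row) for row in values]
    if mode == 5:
        return [list(row) for row in values]  # left as-is; the program skips them
    if mode in (1, 2):    # statistics per column
        lines = [[row[j] for row in values] for j in range(len(values[0]))]
    elif mode in (3, 4):  # statistics per row
        lines = values
    else:
        raise ValueError(f"unknown -NA option {mode}")
    stat = statistics.mean if mode in (1, 3) else statistics.median
    # Fill value per column (modes 1, 2) or per row (modes 3, 4), from observed entries.
    fills = [stat([v for v in line if v is not None]) for line in lines]
    return [[(fills[j] if mode in (1, 2) else fills[i]) if v is None else v
             for j, v in enumerate(row)] for i, row in enumerate(values)]

def handle_negatives(values, negatives="T"):
    """'T': clip negative entries to zero; 'F': refuse to continue."""
    has_neg = any(v is not None and v < 0 for row in values for v in row)
    if has_neg and negatives == "F":
        raise ValueError("negative values present and -negatives is 'F'")
    return [[0.0 if (v is not None and v < 0) else v for v in row] for row in values]
```

A column or row with no observed entries makes `statistics.mean` raise, so such lines would need separate handling.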
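To show what one run of a scheme with -n update steps looks like, here is a toy Python sketch of the standard Lee-Seung multiplicative updates for the generalized Kullback-Leibler divergence. It is an assumption that hpcNMF's 'KL' scheme behaves like this; the initialization range, the `eps` guard and all function names are hypothetical, and pure Python is only practical for very small matrices.

```python
import math
import random

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(A))]

def kl_divergence(V, W, H):
    """Generalized KL divergence D(V || WH), taking 0 * log(0) as 0."""
    WH = matmul(W, H)
    d = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            wh = WH[i][j]
            d += (v * math.log(v / wh) if v > 0 else 0.0) - v + wh
    return d

def nmf_kl(V, k, n_iter=2000, seed=0, eps=1e-12):
    """Multiplicative updates minimizing D(V || WH).

    V is m x n and nonnegative; W is m x k and H is k x n.
    n_iter plays the role of -n; eps only guards against division by zero.
    """
    rng = random.Random(seed)  # one seed per run; c runs would use c seeds
    m, n = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H_aj <- H_aj * (sum_i W_ia V_ij / (WH)_ij) / (sum_i W_ia)
        WH = matmul(W, H)
        for a in range(k):
            w_sum = sum(W[i][a] for i in range(m)) + eps
            for j in range(n):
                num = sum(W[i][a] * V[i][j] / (WH[i][j] + eps) for i in range(m))
                H[a][j] *= num / w_sum
        # W_ia <- W_ia * (sum_j H_aj V_ij / (WH)_ij) / (sum_j H_aj), using the new H
        WH = matmul(W, H)
        for a in range(k):
            h_sum = sum(H[a][j] for j in range(n)) + eps
            for i in range(m):
                num = sum(H[a][j] * V[i][j] / (WH[i][j] + eps) for j in range(n))
                W[i][a] *= num / h_sum
    return W, H
```

Because these updates never increase the divergence, repeating them from several random starts (the role -c plays) and keeping the best or pooling the results is the usual way to cope with local minima.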
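The -t and -cs options describe consensus clustering over the -c runs. A common construction (Brunet et al., 2004) assigns each column to the factor with the largest entry in H, builds a binary connectivity matrix per run and averages it across runs. The Python sketch below follows that construction; that hpcNMF's 'Binary' scheme matches it exactly is an assumption, and the function names are hypothetical.

```python
def cluster_labels(factor, target):
    """PATTERN: factor is H (k x n); label each column by its largest entry.
    AMPLITUDE: factor is W (m x k); label each row by its largest entry."""
    if target == "PATTERN":
        k, n = len(factor), len(factor[0])
        return [max(range(k), key=lambda a: factor[a][j]) for j in range(n)]
    if target == "AMPLITUDE":
        return [max(range(len(row)), key=lambda a: row[a]) for row in factor]
    raise ValueError(f"unknown target {target!r}")

def consensus_matrix(label_runs):
    """Average of binary connectivity matrices, one per run.

    Entry [i][j] is the fraction of runs in which items i and j share a cluster.
    Labels are only compared within a run, so label permutations across runs
    do not matter."""
    n = len(label_runs[0])
    cons = [[0.0] * n for _ in range(n)]
    for labels in label_runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    cons[i][j] += 1.0
    runs = len(label_runs)
    return [[x / runs for x in row] for row in cons]
```

Entries near 0 or 1 indicate stable cluster assignments across runs, which is what makes the consensus matrix useful for comparing ranks in the -k range.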
References:
Devarajan, K., Wang, G. (2016). hpcNMF: A high-performance toolbox for non-negative matrix factorization. COBRA Preprint Series, Article 115 (to appear). http://biostats.bepress.com/cobra/art115
Devarajan, K., Wang, G., Ebrahimi, N. (2015). A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing. Machine Learning, 99(1): 137-163. doi: 10.1007/s10994-014-5470-z. PMID: 25821345. Also available as COBRA Preprint Series, Article 80 (July 2011). http://biostats.bepress.com/cobra/art80
Devarajan, K., Cheung, V.C. (2014). On non-negative matrix factorization algorithms for signal-dependent noise with application to electromyography data. Neural Computation, 26(6): 1128-1168. doi: 10.1162/NECO_a_00576. PMID: 24684448.
Devarajan, K., Ebrahimi, N. (2008). Class discovery via nonnegative matrix factorization. American Journal of Mathematical and Management Sciences, 28(3&4): 457-467.