`signssm` -- SiGN-SSM: Gene Network Estimation with State Space Model.

*Single process-single/multi thread*

signssm [ *options* ] *input_file*

*Parallel execution via MPI*

mpirun [ *MPI options* ] *INSTALLPATH/*signssm [ *options* ] *input_file*

*Parallel execution via Grid Engine in HGC Shirokane1/2*

qsub -t 1-*N* [ *GE options* ] *INSTALLPATH/*signssm_sge.sh
[ *options* ] *input_file*

After finishing the estimation, SiGN-SSM produces three output files per model.

- *prefix*.D*000*.S*000*.A.dat
- *prefix*.D*000*.S*000*.B.dat
- *prefix*.D*000*.S*000*.K.dat

The first file ("`*.A.dat`") contains the estimated SSM model parameters.
The file consists of 9 matrices, each starts with a header line, followed by a tab-separated
matrix. A header line consists of 3 tab-separated columns. Each column represents the matrix
(parameter) name, the number of rows of the matrix, and the number of columns of the matrix.
A vector is represented as a single column matrix. The matrices are separated by an empty line.
The matrix names are as follows:

- "`SSM_p_k`": the values of *p* (the number of genes) and *k* (the size of system dimensions) represented by a two-column single-row matrix.
- "`SSM::exp`": *x*_{0} (the initial state vector).
- "`SSM::var`": variance matrix of the state variables.
- "`SSM::H`", "`SSM::R`", "`SSM::F`", "`SSM::Q`", "`SSMResult::D`", "`SSMResult::L`": Matrices or vectors *H*, *R*, *F*, *Q*, *D*, *L* explained in ABOUT SSM.
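
The layout described above (header line, tab-separated rows, empty line between matrices) can be read with a few lines of Python. This is only a minimal sketch based on that description; the function name `parse_ssm_parameters` and the sample text are illustrative and not part of SiGN-SSM.

```python
from typing import Dict, List


def parse_ssm_parameters(text: str) -> Dict[str, List[List[float]]]:
    """Parse the text of a *.A.dat file into {matrix name: rows of floats}."""
    matrices: Dict[str, List[List[float]]] = {}
    block: List[str] = []
    # The extra "" at the end flushes the last block even without a trailing blank line.
    for line in text.splitlines() + [""]:
        if line.strip():
            block.append(line)
            continue
        if not block:
            continue
        # Header line: name <TAB> number of rows <TAB> number of columns.
        name, n_rows, n_cols = [f.strip() for f in block[0].split("\t")[:3]]
        rows = [[float(v) for v in ln.split("\t") if v.strip()] for ln in block[1:]]
        if len(rows) != int(n_rows) or any(len(r) != int(n_cols) for r in rows):
            raise ValueError(f"matrix {name!r} does not match its header")
        matrices[name] = rows
        block = []
    return matrices


# Invented two-matrix sample, much smaller than a real file.
sample = "SSM_p_k\t1\t2\n3\t2\n\nSSM::exp\t2\t1\n0.5\n-0.25\n"
params = parse_ssm_parameters(sample)
p, k = (int(v) for v in params["SSM_p_k"][0])
```

Vectors come back as single-column matrices, matching the file format.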

The second file ("`*.B.dat`") consists of one tab-separated line.
Its columns are, in order: the size of dimensions, the set ID, the log-likelihood,
the BIC, the number of loops, whether or not the likelihood decreased
monotonically during the EM algorithm, and whether or not the algorithm converged
within the loop limit.
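
As a rough illustration, the seven columns can be read into a small record and compared across models. The sketch assumes that the first, second, and fifth columns are integers and the third and fourth are real numbers; the two flag columns are kept as raw text because their encoding is not described here. `FitSummary`, `parse_summary_line`, and the sample lines are hypothetical, and picking the smallest BIC assumes the usual convention that a smaller BIC is preferred.

```python
from typing import NamedTuple


class FitSummary(NamedTuple):
    dimension: int
    set_id: int
    log_likelihood: float
    bic: float
    loops: int
    monotone: str   # raw text: the encoding of this flag is an unknown here
    converged: str  # raw text for the same reason


def parse_summary_line(line: str) -> FitSummary:
    """Split one *.B.dat line into its seven tab-separated columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 7:
        raise ValueError(f"expected 7 tab-separated columns, got {len(fields)}")
    return FitSummary(int(fields[0]), int(fields[1]), float(fields[2]),
                      float(fields[3]), int(fields[4]), fields[5], fields[6])


# Invented example lines, not real SiGN-SSM output.
lines = ["4\t1\t-1234.5\t2700.25\t812\t1\t1\n",
         "5\t1\t-1200.0\t2750.5\t40000\t1\t0\n"]
summaries = [parse_summary_line(ln) for ln in lines]
# Assumption: smaller BIC is better.
best = min(summaries, key=lambda s: s.bic)
print(best.dimension, best.bic)
```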

The third file ("`*.K.dat`") contains the state variables and
observation variables calculated from the estimated model parameters.
This file is intended to be read by Gnuplot.
The file contains 7 matrices.
The order and meaning of the matrices are explained in
Output Files in HOW TO USE.

In addition to these files, if `--pvalues on` is specified (default),
the following two files are generated.

- *prefix*.D*000*.S*000*.P.dat
- *prefix*.D*000*.S*000*.m.dat

The former file contains a matrix of the p values. Each p value
corresponds to the statistical significance (the result of the statistical
test) of the value in the input data at the same position (excluding
header rows and gene names). The latter file contains the integrated
p values for genes obtained by statistical meta-analysis.
The calculation and generation of these files can be suppressed with
the "`--pvalues off`" option.

(alphabetical order)

`-d` *X* [ `-`*Y* [`:`*Z* ]]`,` ...

Range of dimensions. *X*, *Y*, and *Z* are integer values.
(default: `4`)

If you specify only *X*, the program estimates an SSM for the single
specified dimension. Specifying *X*-*Y* estimates multiple
SSMs, one for each dimension from *X* to *Y*. *Z*, if given, is the
increment step from *X* to *Y*; if you omit it, *Z*=1 is assumed.
For example, "`-d 4-6`" makes the program estimate SSMs for dimensions 4, 5, and 6.
Multiple ranges can be specified by concatenating them with commas. For example,
"`-d 4,8-10`" specifies dimensions 4, 8, 9, and 10.
The program does not select the best dimension automatically. Users may select
one model by comparing the BIC (Bayesian Information Criterion) of the estimated models.
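
To make the `-d` syntax concrete, the following sketch expands a specification string into the list of dimensions it denotes. It is an illustration of the rules described above, not SiGN-SSM's own parser; `expand_dimensions` is a hypothetical helper, and it assumes that a step which does not land exactly on *Y* simply stops at the last value not exceeding *Y*.

```python
from typing import List


def expand_dimensions(spec: str) -> List[int]:
    """Expand a string such as "4,8-10" or "2-10:4" into a list of dimensions."""
    dims: List[int] = []
    for part in spec.split(","):
        # Each part has the form X, X-Y, or X-Y:Z.
        range_text, _, step_text = part.strip().partition(":")
        start_text, dash, stop_text = range_text.partition("-")
        start = int(start_text)
        stop = int(stop_text) if dash else start
        step = int(step_text) if step_text else 1  # Z defaults to 1
        if step < 1 or stop < start:
            raise ValueError(f"invalid dimension range: {part!r}")
        dims.extend(range(start, stop + 1, step))
    return dims


print(expand_dimensions("4-6"))     # [4, 5, 6]
print(expand_dimensions("4,8-10"))  # [4, 8, 9, 10]
```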

`-h`

Show the help message and quit.

`-i` *N*

Integer ID to distinguish concurrent executions. (default: *N*=`1`)

This value is used to initialize a random number generator.
Therefore, the program will produce the same result if you specify the
same ID and the same random seed (see `-r` option).
Use this option to avoid producing the same results when you run
in parallel on a job dispatch/queueing system such as SGE.
Do not specify this if you run the program with signssm_sge.sh on SGE.

`-L` { `0` | `1` | `2` }

Log file mode (default: `0`).

- `0`: Automatic mode. When running as a single process (multi-thread) program, the log message is output to the standard error. When running with MPI, only the root process generates the log file named "*prefix*.log" where *prefix* is specified by the `-o` option. If the `--sge` or `--perm` option is specified, each process produces the log file named "*prefix*.log.*XXXXXX*" where *XXXXXX* is a 6-digit number that represents the task ID.
- `1`: Force each process to output a log file named "*prefix*.log.*XXXXXX*".
- `2`: Redirect all the messages to the standard error. The log files are not produced.
- Other: Suppress all the log messages.

`--perm` *ssm_file*

Permutation test mode. (default: not specified)

This mode reads the SSM model parameters from *ssm_file* and then performs a single execution of the permutation test.
To perform a permutation test, you need to run this mode many times and compile the results into a single file,
using `signproc` program. This outputs a single file "` prefix.XXXXXX`"
where

`--ssmperm` *key1=value1,key2=value2,...*

Compilation of permutation test result mode. (default: not specified)

This compiles the files of permutation test results generated by the permutation test mode (`--perm` option)
into a single network and outputs it as a tab-separated text file. The output file name can be given
by the `-o` option. You need to specify the arguments in key=value format.
Available arguments are listed below:

`prefix=`*prefix* | The prefix of the files to be processed.

`ssm=`*ssm_file* | File containing the SSM model parameters (*.A.dat file).

`bg=`*N* | The first index of the suffix of the processing files. (default: 1)

`ed=`*N* | The last index of the suffix of the processing files. (default: 1000)

`th=`*threshold* | The significance level of the p value left in the final network.
(default: 0.05)
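
The key=value format can be illustrated by a small parser that fills in the defaults listed above. This is only a sketch of the argument format, not code from SiGN-SSM; `parse_ssmperm` and the file names in the example are hypothetical.

```python
from typing import Dict, Union

# Defaults and value types taken from the argument list above.
DEFAULTS = {"bg": 1, "ed": 1000, "th": 0.05}
CONVERTERS = {"prefix": str, "ssm": str, "bg": int, "ed": int, "th": float}


def parse_ssmperm(arg: str) -> Dict[str, Union[str, int, float]]:
    """Turn "key1=value1,key2=value2,..." into a dict with defaults applied."""
    options: Dict[str, Union[str, int, float]] = dict(DEFAULTS)
    for item in arg.split(","):
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or key not in CONVERTERS:
            raise ValueError(f"unrecognised argument: {item!r}")
        options[key] = CONVERTERS[key](value.strip())
    return options


# "perm" and "model.A.dat" are made-up names for illustration.
opts = parse_ssmperm("prefix=perm,ssm=model.A.dat,ed=500")
print(opts)
```

Here `bg` and `th` keep their defaults (1 and 0.05) because they are not given in the argument string.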

`-r` *N*

Integer random seed. (default: *N*=`38`)

This value is used to initialize the random number generator together with
`-i` option.

`-s` *N*

Number of sets (or executions) for a single dimension. (default: *N*=`1`)

The program produces *N* results for each size of dimensions specified by the `-d` option.
If `--perm` is specified with this option, a single job (process) performs *N* tests
and outputs *N* test results in a single output file.

`--shift` { `0` | `1` | `2` }

Mean shift mode. (default: `1`)

- `0`: Do not perform mean shift of the input data.
- `1`: Perform mean shift for each replicate in the input data before estimation.
- `2`: Perform mean shift for the entire time point in the input data before estimation.

`--sge`

SGE mode. (default: not specified)

If this is specified, the program runs for only 1 set with the iterations given by
`-n` option regardless of `-s` and `-d` options. That is, the
program estimates for the *i*-th set where *i* represents the ID
specified by `-i` option. The total number of sets is (sets) x (dimensions),
and each execution corresponds to one of these sets.
This is useful when you execute via SGE.

`--ssm` *ssm_file*

Read the SSM file and apply it to the input data. (default: not specified)

If this is specified, the program reads the SSM model parameters from a file and does not estimate them from the input data. This is useful when you want to apply the estimated model parameters to a different input data set to calculate the state and observation variables from the model and the input data.

`--threads` *N*

Number of threads. (default: *N*=`1`) AVAILABLE ONLY FOR SINGLE PROCESS EXECUTION.

This specifies the number of threads to be used when it runs as a single process. Specify a value less than or equal to the number of CPU cores in your computer.

(alphabetical order)

`-e` *EXT*

The suffix (extension) of the output file names. (default: "`dat`")

`--each` { `on` | `off` }

Output matrices and vectors of the result into separate files. (default: not specified)

The estimated SSM model parameters ("`*.A.dat`" file) are stored in files named
"`prefix.D000.S000.*.dat`" where "`*`" is one of the following:

- `H`: observation matrix H.
- `R`: observation noise vector (diagonal elements of) R.
- `F`: system transition matrix F.
- `D`: gene-to-module projection matrix D.
- `L`: diagonal elements of *L* = *H*' *R*^{-1} *H*.
- `x`: initial state variable *x*_{0}.

The state and observation variables ("`*.K.dat`" file) are stored in files named
"`prefix.D000.S000.*.dat`" where "`*`" is one of the following:

- `Xp.`*r*: one-ahead prediction of the state variables for the *r*-th replicate.
- `Xf.`*r*: filtering of the state variables.
- `Xs.`*r*: smoothing of the state variables.
- `Yp.`*r*: one-ahead prediction of the observation variables.
- `Yf.`*r*: filtering of the observation variables.
- `Ys.`*r*: smoothing of the observation variables.

In addition, "` prefix.Y.r.dat`" is output and contains
the mean shifted (by default) input data of the

`-o` *PREFIX*

The prefix of the output file names. (default: "`result`")

`--proc-file`

`-P`

Insert the number of processes to the output file prefix. (default: not specified) AVAILABLE FOR MPI EXECUTION ONLY.

If this is specified, letters "`.P 0000`" is added at the end of
the prefix (

`--pvalues` { `on` | `off` }

Calculate and output the p values of the input time series data. (default: `on`)

`--state` { `on` | `off` }

Output the estimated state and observation variables. (default: `on`)

(alphabetical order)

`--em-loop` *N*

`-l` *N*

Maximum number of loops of a single EM algorithm execution until convergence. (default: *N*=`40000`)

`-F` { `on` | `off` }

`--constrain-F` { `on` | `off` }

Apply constraint on diagonal elements of F (system coefficient matrix). (default: `on`)

`-g` *X*

`--constrain-Fg` *X*

Strength of constraint on F. (default: *X*=`0.8`)

A real value ranging from 0.0 to 1.0 can be specified. Strong constraint
may cause difficulty in model parameter estimation to fit to the input
data. This is used only when "`-F on`" is specified (default).

`-n` *N*

Number of iterations (number of initial values) for a single EM algorithm execution.
(default: *N*=`100`)

The program chooses the best result from *N* executions of the EM algorithm.

`--retry` *N*

Maximum retry count. -1 for unlimited retry. (default: *N*=`-1`)

The EM algorithm sometimes fails when the initial values are bad. If SiGN-SSM detects an estimation failure, it automatically retries the estimation with different initial values. This specifies the maximum retry count.

`--RrI` { `yes` | `no` }

`-R` { `yes` | `no` }

Whether or not to assume that the observation noise *R* = *r* *I*. (default: no)

If `yes` is specified, then *R* = *r* *I* is assumed,
and if `no`, then *R* = diag(*r*_{1}, ..., *r*_{p} )
is assumed.

`--update-mu` { `on` | `off` }

Whether or not to update *μ*_{0} (= *x*_{0}).
(default: on)

If `off` is specified, then the initial state variable *x*_{0}
is fixed and not updated during the EM algorithm.

(alphabetical order)

`--F-max` *X*

Upper bound of random values for initializing F. (default: 1.5)

`--F-min` *X*

Lower bound of random values for initializing F. (default: -1.5)

`--H-mean` *X*

Mean of normally distributed random values for initializing H. (default: 0.0)

`--H-SD` *X*

Standard deviation of normally distributed random values for initializing H. (default: 1.0)

`--mu` *X*

Mean of normally distributed random values for initializing
*x*_{0}. (default: 0.0)

`--SD` *X*

Standard deviation of normally distributed random values for initializing
*x*_{0}. (default: 1.0)