Single process-single/multi thread
Parallel execution via MPI
SiGN-L1 estimates gene networks from gene expression data. For simple network estimation, a structural equation model is available, in which SiGN-L1 estimates the network structure using L1-regularized sparse learning algorithms such as lasso. The network profiler estimates multiple network structures based on extra data called a modulator, which characterizes the individual samples.
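To illustrate what L1-regularized estimation for a single target gene looks like, here is a minimal sketch of lasso by cyclic coordinate descent in plain Python. It is not the SiGN-L1 implementation; the function names, the objective scaling (1/2n), and the fixed iteration count are assumptions made for this example. Parents whose coefficient stays at zero would correspond to absent edges.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows (samples) with p values each (candidate parents);
    y holds the n expression values of one child gene.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # residual = y - X b, and b starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(col[i] * (residual[i] + col[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / norm
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                b[j] = new_bj
    return b


# Toy data (hypothetical): the child depends only on the first parent.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
coefficients = lasso_coordinate_descent(X, y, lam=0.5)
```

In this toy case the second parent receives a coefficient of exactly zero, which is the sparsity effect that makes lasso suitable for selecting edges.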
In the structural equation model, the network structure can be output as an edge list file, as a coefficient matrix, or in the CSML format. In the network profiler mode, the network structures can be output as edge list files or as coefficient matrices. Note that the network profiler estimates a network for every sample in your input data matrix.
SiGN-L1 supports single-process multi-thread execution and multi-process multi-thread execution with MPI (hybrid parallelization). Thread parallelization is implemented with OpenMP, so you can control the number of threads with the OpenMP environment variables, e.g. OMP_NUM_THREADS for the number of threads per process.
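As a sketch of how the thread count could be set when launching a program from a script, the following Python snippet copies the current environment, sets OMP_NUM_THREADS, and runs a child process with it. The helper names are hypothetical, and the child process here is only a stand-in that echoes the variable; replace the command list with the actual SiGN-L1 command line for your installation.

```python
import os
import subprocess
import sys


def openmp_env(threads_per_process):
    """Return a copy of the current environment with OMP_NUM_THREADS set."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(threads_per_process)
    return env


def run_with_threads(command, threads_per_process):
    """Run `command` (a list of arguments) with the given OpenMP thread count."""
    return subprocess.run(
        command,
        env=openmp_env(threads_per_process),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in child process: it only prints the variable it received.
echo_command = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
result = run_with_threads(echo_command, 4)
print(result.stdout.strip())
```

Setting the variable per launch, rather than globally in the shell, keeps different runs from interfering with each other.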
For network estimation with the structural equation model, parallelization is realized by splitting the target (child) genes among the workers. Thus the degree of parallelization (the number of concurrent executions) is limited to the number of children.
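The limit described above can be seen in a small sketch. The round-robin assignment below is an assumption made for illustration; this manual does not state how SiGN-L1 actually distributes the children among processes and threads.

```python
def split_children(children, n_workers):
    """Assign child genes to workers round-robin; returns one list per worker."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    assignments = [[] for _ in range(n_workers)]
    for index, gene in enumerate(children):
        assignments[index % n_workers].append(gene)
    return assignments


def effective_parallelism(children, n_workers):
    """Number of workers that actually receive at least one child gene."""
    return sum(1 for chunk in split_children(children, n_workers) if chunk)


# Hypothetical gene names: three children spread over eight workers.
children = ["geneA", "geneB", "geneC"]
busy_workers = effective_parallelism(children, 8)
```

With three children and eight workers, only three workers have anything to do, so requesting more workers than children brings no further speed-up in this mode.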
The network profiler performs leave-one-out cross validation and structure estimation for every sample in the input data. Thus a higher degree of parallelization is achieved, and in practice you do not need to worry about an upper limit on the degree of parallelization.
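For readers unfamiliar with leave-one-out cross validation, here is a generic sketch: each sample is held out once, a model is fitted on the remaining samples, and the prediction error on the held-out sample is accumulated. The function names and the trivial mean predictor are assumptions for this example and do not reflect the internal SiGN-L1 code.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, held_out_index) for each of the n_samples folds."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


def loo_cv_error(samples, fit, predict):
    """Mean squared leave-one-out error; samples is a list of (x, y) pairs."""
    total = 0.0
    for train, held_out in leave_one_out_splits(len(samples)):
        model = fit([samples[i] for i in train])
        x, y = samples[held_out]
        total += (predict(model, x) - y) ** 2
    return total / len(samples)


def fit_mean(train_samples):
    """Toy model: the mean of the training responses."""
    return sum(y for _, y in train_samples) / len(train_samples)


def predict_mean(model, x):
    """The toy model ignores x and always predicts the stored mean."""
    return model


samples = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
error = loo_cv_error(samples, fit_mean, predict_mean)
```

Because every fold is independent of the others, the folds can be evaluated concurrently, which is why this mode offers many more units of parallel work than the structural equation model.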
SiGN-L1 accepts an EDF file as the input data matrix. For the list files (the -x, -y, -z, --select-sample-cv, and --select-sample-final options), a text file with one gene name or one 1-origin (1-based) index number per line is accepted.
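The list-file convention can be sketched as follows. This helper is hypothetical and only illustrates the one-entry-per-line format with 1-based indices; in particular, giving gene names priority over numeric indices and skipping blank lines are assumptions of this sketch, not documented SiGN-L1 behavior.

```python
def resolve_list_entries(lines, gene_names):
    """Map each list-file line to a 0-based position in gene_names.

    A line that matches a gene name is looked up by name; otherwise it is
    read as a 1-based index number.
    """
    name_to_index = {name: i for i, name in enumerate(gene_names)}
    resolved = []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines (an assumption of this sketch)
        if entry in name_to_index:
            resolved.append(name_to_index[entry])
        elif entry.isdigit() and 1 <= int(entry) <= len(gene_names):
            resolved.append(int(entry) - 1)  # convert 1-based to 0-based
        else:
            raise ValueError(f"unknown gene or index: {entry!r}")
    return resolved


# Hypothetical gene names and list-file content.
genes = ["geneA", "geneB", "geneC"]
positions = resolve_list_entries(["geneB\n", "1\n", "\n"], genes)
```

Here the line "1" refers to the first gene, geneA, because the index numbers start at one rather than zero.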
-m { semlasso | npflasso | npfenet | npfrenet }
Method (mode, or algorithm) of the program. The semlasso method performs network estimation by the structural equation model with lasso. The npflasso method performs the network profiler with lasso. The npfenet method performs the network profiler with elastic net. The npfrenet method performs the network profiler with recursive elastic net. By default, npflasso is assumed.
-x file
Parent candidate (regulator) list file. The file must be a text file in which each line contains the name of a single regulator gene.
-y file
Children (target) list file. The file must be a text file in which each line contains the name of a single target gene.
-z file
Modulator list file. Available for the network profiler mode (-m npflasso and -m npfenet) only. The file must be a text file in which each line contains the name of a single modulator gene.
-Z file
Modulator data file. Available for the network profiler mode (-m npflasso and -m npfenet) only. If this option is not specified, the input data file is used for the modulator data. The file can be a tab-separated matrix file or an EDF file; specify the file type with the --Z-type option. By default, an EDF format file is assumed. The file must contain the same number of samples as the input file.
--Z-type { edf | matrix }
The file format type of the modulator data file for the -Z option.
list: The file contains the list of gene names. Each line has one gene name.
matrix: The file is a tab-separated text file containing a matrix of modulator values.
--Z-args key=value,...
Options for the modulator data file given by the -Z option. See File Formats for available options.
--out-list prefix
Output the estimated networks as an edge list. Multiple files are generated, depending on the parallelization. Several list formats are available; see the --out-list-type option.
--out-list-type { 1 | 2 | 3 }
Type of the list format for the --out-list option. 3 is the smallest format.
--out-B prefix
-B prefix
Output the estimated coefficient matrix. The coefficients for a single child are stored in a single file. The file has n rows and p columns (an n-by-p matrix), where n is the number of samples and p is the number of parent genes (regulators). In the semlasso mode, the file has only one row. The files are distinguished by a postfix number in the file name. By default, the 1-based (1-origin) index of the child in the children list given by the -y option is used as the postfix number. Specify the --out-B-name option to use the gene name as the file postfix instead.
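To make the n-by-p layout concrete, the following sketch takes an in-memory coefficient matrix for one child and lists, for each sample (row), the parents with a nonzero coefficient. It assumes the matrix has already been loaded; reading the file itself is left out because the on-disk layout is not described in this section. The function name and the gene names are hypothetical.

```python
def edges_per_sample(coefficients, parent_names, child_name):
    """For each sample (row), list (parent, child, coefficient) for nonzero weights."""
    edges = []
    for row in coefficients:
        if len(row) != len(parent_names):
            raise ValueError("each row needs one coefficient per parent")
        edges.append(
            [(parent, child_name, weight)
             for parent, weight in zip(parent_names, row)
             if weight != 0.0]
        )
    return edges


# Two samples (rows) and two candidate parents (columns) for one child.
matrix = [
    [0.0, 0.5],
    [-0.2, 0.0],
]
sample_edges = edges_per_sample(matrix, ["parent1", "parent2"], "child1")
```

In the semlasso mode the same function would be called with a single-row matrix, giving one edge set for the whole data set.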
--out-CSML file
Output the estimated network in the CSML format. This is available for the semlasso mode only.
-H value,...
List of comma-separated real values of the hyperparameter candidates.
Available only for the network profiler mode.
Cross validation is performed to determine the hyperparameter, and the best
value is chosen from the list. The default list is 0.01,
-G value,...
List of comma-separated real values of λ2 candidates.
This is available for the network profiler mode with elastic net model
only (-m npfenet).
Cross validation is performed to determine λ2, and
it is chosen from the list. The default list is 0.00001,
--select-sample-cv file
--select-sample-final file
--fix-cv-mui
--fix-final-mui
--log-mode n
-L n
--help
-h
Show the help message and quit.
-v n
Verbose mode. By default, 0 is assumed.