SiGN-BN Two Step HC algorithm

Abstract

The Two Step HC (TSHC) algorithm is a Bayesian network structure learning algorithm developed for mRNA and miRNA (ncRNA) mixed data sets.

For data sets that include both mRNAs and miRNAs, the ordinary network estimation algorithm does not work well because the expression patterns of mRNAs and miRNAs differ.

The Two Step HC algorithm overcomes this problem by performing the HC algorithm twice.

Method

As described above, the Two Step HC performs the HC algorithm twice.

In the first step, it estimates only relationships (network edges) from miRNAs to mRNAs. Edges from miRNA to miRNA, from mRNA to miRNA, and from mRNA to mRNA are all prohibited.

In the second step, the edges from miRNAs to mRNAs estimated in the first step are fixed. It then estimates the remaining relationships, except those from miRNAs to miRNAs.
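The edge rules of the two steps can be sketched as a small predicate. This is only an illustration of the constraints described above, not the SiGN-BN implementation; the function name edge_allowed, its parameters, and the gene names are hypothetical. It also assumes one reading of the second step: miRNA-to-mRNA edges are limited to those fixed in the first step.

```python
def edge_allowed(parent, child, mirnas, step, fixed_edges=frozenset()):
    """Return True if the edge parent -> child may be considered in the given step.

    mirnas      : set of node names that are miRNAs (everything else is an mRNA)
    fixed_edges : set of (miRNA, mRNA) pairs kept from step 1 (used in step 2)
    """
    if parent == child:
        return False  # no self loops
    parent_is_mi = parent in mirnas
    child_is_mi = child in mirnas
    if step == 1:
        # Step 1: only miRNA -> mRNA edges are searched.
        return parent_is_mi and not child_is_mi
    if step == 2:
        if parent_is_mi and child_is_mi:
            return False  # miRNA -> miRNA stays prohibited
        if parent_is_mi:
            # Assumption: miRNA -> mRNA edges are exactly those fixed in step 1.
            return (parent, child) in fixed_edges
        return True  # mRNA -> mRNA and mRNA -> miRNA are estimated in step 2
    raise ValueError("step must be 1 or 2")


# Hypothetical node names, for illustration only.
mirnas = {"miR-1", "miR-2"}
nodes = ["miR-1", "miR-2", "geneA", "geneB"]
step1_candidates = [(p, c) for p in nodes for c in nodes
                    if edge_allowed(p, c, mirnas, step=1)]
```

Here step1_candidates contains only the pairs that run from an miRNA to an mRNA.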

Please refer to our paper by Arima et al. (2014) for a real example of gene network analysis with this algorithm.

Software Supporting TSHC

SiGN-BN HC+BS

Later than release 1.1.0

SiGN-BN NNSR

Later than release 0.10.

How to use

Specify the following options for SiGN-BN HC+BS and/or SiGN-BN NNSR software.

--algo tshc

Use the Two Step HC algorithm instead of the normal HC algorithm.

-A key=value,...

Detailed options for the algorithm. The following keys and values are available.

first=file

A file that lists the miRNA nodes (genes), one node name per line. Edges between genes listed in the file are prohibited during network estimation.

firstnet=file

(Optional) A file that contains miRNA target prediction results, as tab-separated text. Each line starts with the gene name of an miRNA in the first column, followed by a tab-separated list of its predicted target gene names. If this file is specified, only the predicted targets of each miRNA are considered as candidate children of that miRNA.
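The two file formats described above can be checked with a short reader before running the software. The following is a minimal sketch; the function names, the sample contents, and the handling of blank lines are assumptions for illustration, and SiGN-BN itself may treat edge cases differently.

```python
import io


def read_node_list(handle):
    """Read a 'first' style file: one node name per line (blank lines skipped)."""
    return [line.strip() for line in handle if line.strip()]


def read_target_table(handle):
    """Read a 'firstnet' style file: miRNA<TAB>target1<TAB>target2..."""
    targets = {}
    for line in handle:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        fields = [f for f in fields if f]
        if not fields:
            continue  # skip blank lines (an assumption of this sketch)
        mirna, predicted = fields[0], fields[1:]
        targets.setdefault(mirna, []).extend(predicted)
    return targets


# Hypothetical file contents; in practice, open the real files instead.
first_text = "miR-1\nmiR-2\n"
firstnet_text = "miR-1\tgeneA\tgeneB\nmiR-2\tgeneC\n"

mirnas = read_node_list(io.StringIO(first_text))
targets = read_target_table(io.StringIO(firstnet_text))
```

A useful sanity check is that every miRNA in the target table also appears in the node list.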

--skel-type parents_targets --skel TF_file --skel-args inv.target=mi_file

(Optional) This restricts the parents of mRNAs to transcription factors (or other specific genes given in the file) in the second HC step. Relationships between miRNAs are also prohibited. TF_file and mi_file are plain text files in which each line contains the gene name of a transcription factor or an miRNA, respectively.

--skel-type CHLIST --skel file

(Optional) This uses, for example, ChIP-Seq data to restrict the edges in the second HC step. The file is a tab-separated text file in which the first column of each line is a gene, followed by a list of its candidate children. Children of genes that do not appear in the first column are not restricted.
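The restriction rule of the candidate children list can be sketched as follows. This is an illustration of the behaviour described above, not the software's code; the function name child_permitted and the gene names are hypothetical.

```python
def child_permitted(parent, child, candidate_children):
    """Return True if parent -> child is allowed under a candidate children list.

    candidate_children maps a gene (first column of the file) to the set of
    its candidate children (remaining columns of the same line).
    """
    if parent not in candidate_children:
        # Genes absent from the first column are not restricted.
        return True
    return child in candidate_children[parent]


# Hypothetical content equivalent to the single line "TF1<TAB>geneA<TAB>geneB".
chlist = {"TF1": {"geneA", "geneB"}}
```

With this list, TF1 can only be a parent of geneA or geneB, while any gene not listed in the first column keeps all of its possible children.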

-p, -m, -s, and -S apply to the first HC execution. To set these options for the second HC execution, pass them as arguments to the -A option, as p=, m=, s= and S=.

E.g. "-A p=10,m=20" sets the number of parent candidates to 10 and the maximum number of parents to 20 in the second HC execution.

Note for the NNSR algorithm

Always specify the following two options: "--rw-int 1" and "--ex-int 1".
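The options described in this section can be assembled programmatically, for example when scripting several runs. The sketch below builds only the TSHC-related part of the argument list, using the flags listed above; the helper name tshc_options and the file names are hypothetical, and the executable name, input file, and other required options are deliberately left out.

```python
def tshc_options(first, firstnet=None, second_step=None, nnsr=False):
    """Build the TSHC-related command-line options as a list of strings.

    first       : path to the miRNA node list (value of the 'first' key)
    firstnet    : optional path to the target prediction file
    second_step : optional dict such as {"p": 10, "m": 20} for the second HC run
    nnsr        : add the two options noted for the NNSR algorithm
    """
    detail = ["first=" + first]
    if firstnet is not None:
        detail.append("firstnet=" + firstnet)
    for key, value in (second_step or {}).items():
        detail.append(f"{key}={value}")
    args = ["--algo", "tshc", "-A", ",".join(detail)]
    if nnsr:
        args += ["--rw-int", "1", "--ex-int", "1"]
    return args


# Hypothetical file names, for illustration only.
opts = tshc_options("mirna.txt", "targets.txt", {"p": 10, "m": 20}, nnsr=True)
print(" ".join(opts))
```

Returning a list rather than a single string makes it straightforward to append the result to the rest of a command without extra quoting.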
