Tutorial on Gene Network Analysis with the ΔECv method.


This is a tutorial of the gene network analysis using our Edge Contribution value (ECv) method. This reproduces results presented in Tanaka, Y. et al (2020). In this paper, we have used a 18 sample gene expression dataset (GSE49644) by Sun, Y. et al (2014), that measured two conditions of control and treated with TGFβ for 3 lung cancer cell lines with triplicate experiments. By using the ΔECv method, the paper succeeded in extracting a subnetwork consisting of differentially regulated edges (DREs) related to epithelial mesenchymal transition (EMT) in lung cancer cells.

To obtain the EMT related subnetwork, you first estimate a basal gene network consisting of the entire genes in the dataset using the SiGN-BN NNSR algorithm. Then, by using this estimated basal network, you calculate an ECv for every estimated edge. Finally, you can extract differentially regulated edges (DREs) by screening out edges whose ΔECvs (difference of ECvs between two conditions you want to compare) are greater than a certain threshold.

Step 1: Software and computer preparation

To estimate a basal gene network, you use SiGN-BN NNSR software available on the SiGN-BN website. Please, check if it works well with your computer system. We recommend to use a system equipped with at least 32 CPU cores. SiGN-BN NNSR is already installed in SHIROKANE supercomputer system provided by Human Genome Center, The Institute of Medical Science, The University of Tokyo. So if you do not have a computer enough for network estimation, it is an good option to use it.

The ECv calculation is performed by the separate software. To obtain it, please email Prof. Tamada who is a developer of the ECv method. It is free to academic users. If you want to use it for commercial purposes, please contact TLO Kyoto. It has the rights for commercial use of the method.

^ Go to Top

Step 2: Data preparation

First, download "GSE49644_annotate_results.txt.gz" from Gene Expression Omnibus website. This is a pre-processed and annotated gene expression data matrix. There is a link to download a gzip-compressed text file of the data set. It is already preprocessed by annotating gene names, log-2 conversion and normalization. Decompress the gzipped file, for example, by executing the following command:

$ gzip -d GSE49644_annotate_results.txt.gz

Next, you need to modify it so that SiGN-BN can read it. Extract column 2 of gene names, and column 4 and everything after that consisting of expression data. Then replace "symbol" at the first line with "@name" to convert it to the EDF format. This can be done by the following command:

$ cut -f2,4- GSE49644_annotated_results.txt |sed -e '1 s/symbol/@name/' > GSE49644.txt

Note: Some gene names are wrongly converted by Excel in this file, e.g. 8-Sep, 5-Mar, etc. Replace them with correct gene names if you want to have the correct gene names ;-)

^ Go to Top

Step 3: Network estimation

Estimate a basal gene network with the NNSR algorithm. In SHIROKANE, this can be done by executing the following command (using 64 cores, T=1,000,000 and output threshold 0.1):

$ qsub -o so -e se -pe mpi-fillup 64 ~tamada/sign/signmpi.sh ~tamada/sign/signbnnnsr.0.16.7 -T 1000000 --oth 0.1 -o result1 GSE49644.txt

If you use your own system with OpenMPI, the command may be as follows:

$ mpiexec -np 64 signbnnnsr.0.16.7 -T 1000000 --oth 0.1 -o result1 GSE49644.txt

If the execution succeeds, this generates result1.sgn3 that is a text file containing the estimated network in SGN3 format. Execute the network estimation at least three times for evaluating whether the algorithm was converged or not with the input data. The Smaller number of samples you use, the larger -T value you require. Saves estimated networks as different files, e.g., result2.sgn3 and result3.sgn3. To do so, execute:

$ qsub -o so -e se -pe mpi-fillup 64 ~tamada/sign/signmpi.sh ~tamada/sign/signbnnnsr.0.16.7 -T 1000000 --oth 0.1 -o result2 GSE49644.txt

$ qsub -o so -e se -pe mpi-fillup 64 ~tamada/sign/signmpi.sh ~tamada/sign/signbnnnsr.0.16.7 -T 1000000 --oth 0.1 -o result3 GSE49644.txt


$ mpiexec -np 64 signbnnnsr.0.16.7 -T 1000000 --oth 0.1 -o result2 GSE49644.txt

$ mpiexec -np 64 signbnnnsr.0.16.7 -T 1000000 --oth 0.1 -o result3 GSE49644.txt

on your system.

To evaluate whether the network estimation was converged, compare structures of every two of the estimated networks by using SiGN-Proc software. To do so, use the --comp filter of SiGN-Proc.

$ signproc -t SGN3 result1.sgn3 --comp file=result2.sgn3,type=sgn3

This compares result1.sgn3 and result2.sgn3, and outputs a confusion matrix for result1.sgn3 by regarding result2 as the true network. It may output, for example, the following comparison result:

COML: TP      RTP     FP      FN      TN        Sn     Sp
COMP: 125121  436     5690    5646    196824735 0.957  0.953

Sn here represents the ratio of whether edges of result2 were included in result1. An expample of criterion to consider that two networks are similar is 0.95, that is, 95% edges are completely identical in thses two networks. Thus we can consider these are almost the same and the algorithm was converged well. Do this comparison also for result1 and result3, and result2 and result3.

^ Go to Top

Step 4: Model parameter generation and ECv calculation

The estimated basal gene network contains only information of edge existence and does not have model parameters of B-splines between parents and children, that are required for ECv calculation. You need to re-fit (calculate) the regression model to estimate model parameters using the estimated structure and the input data. To do this, use SiGN-Proc --score filter, such as:

$ signproc -t result1.sgn3 --score data=GSE49644.txt,edgeprop --output file=result1_param.sgn3,type=sgn3

The output file result1_param.sgn3 has the estimated model parameters.

Next, we calculate ECvs of all the edges for particular samples as below:

$ ecv --net result1_param.sgn3 --name --out result1_ecv.txt GSE49644.txt

The output file result1_ecv.txt is a tab separated text file where each row represents edge and each column corresponds to samples. The first three columns are parent node names, child node names, and parent-child node names concatenated with _ (underscore).

This calculates ECvs and outputs them as the ECv matrix,

^ Go to Top

Step 5: Extracts edges (subnetworks) by the ΔECv method.

Process the ECv matrix file using your favorite scripting language, such as R, or Python, etc. The following is the example using R to extract DREs.

Reading of the ECv matrix.

> X <- read.table("result1_ecv.txt", header=T)

> dim(X)

[1] 154369     21

Extracting DREs whose ΔECvs are greater than 1.0, which corresponds to the 2-fold change, for all three experiments with three cell lines. The triplicate data were averaged before calculating Δ by rowMeans() function.

> E <- X[(abs(rowMeans(X[,4:6]) - rowMeans(X[,16:18])) > 1.0) & (abs(rowMeans(X[,7:9]) - rowMeans(X[,13:15])) > 1.0) & (abs(rowMeans(X[,10:12]) - rowMeans(X[,19:21])) > 1.0),]

> dim(E)

[1] 120  21

This example shows that 120 edges are extracted as DREs.

To save them in a file, perform as follows:

> dre <- rep(1,120)

> EE <- data.frame(E[,1:3],dre)

> write.table(EE, "emt.txt", quote=F, row.names=F, sep="\t")

This example added a column, named as dre, that is consisting of values of 1 to distinguish DREs from other basal edges after combining them in the later step.

To extract unique genes contained in the extracted subnetwork, perform as follows:

> G <- unique(c(as.character(E[,1]),as.character(E[,2])))

> length(G)

[1] 150

If you want to save them, use write.table():

> write.table(G, "genes.txt", quote=F, row.names=F, col.names=F)

^ Go to Top

Step 6: Adding the basal network edges

This is an optional process. In our paper, we presented the network consisting of not only DREs but also basal edges as dashed line. See Fig. 4 of our paper. This is realized by extracting the induced subnetwork using the node (gene) list genes.txt generated in the previous step. An induced network (graph) is a subgraph consisting of edges connecting to given list of nodes. This can be done by SiGN-Proc as follows:

$ signproc result1.sgn3 --subnet file=genes.txt,dist=1, type=induce --output file=dist1.txt, type=txt,args='{H,P,N,name}'

Import dist1.txt in Cytoscape. You can highlight DREs by additionally importing emt.txt (combining this file in the edge list table) and specify the different appearance for edges whose values of the dre property are 1.

^ Go to Top