Welcome to the SiGN website.

SiGN is a collection of software for large-scale
gene network estimation, built on three different
gene network models: state space models,
nonparametric Bayesian networks, and L1 regularization.
A **gene network** is a graphical representation of regulatory relationships
between genes. You can estimate, infer, or model these gene-to-gene
relationships from observed gene expression profiles using various mathematical
models and computational methods.
All three models in SiGN require a huge amount of computational resources
to estimate large-scale gene networks from observed data.
Therefore, SiGN is designed to be able to exploit the speed of 10 petaflops,
which is planned to be achieved by the Japanese flagship
supercomputer "K computer" in 2012.
The software, including all the models, will be
freely available to "K computer"
and HGC supercomputer system users.
Some models are distributed as open source software.
The estimated networks can be viewed and analyzed with
Cell Illustrator Online.

SiGN-SSM is open source
gene network estimation software able to run
**in parallel** on PCs and massively parallel supercomputers. The
software estimates a **state space model** (SSM), which is a statistical
dynamic model suitable for analyzing short and/or replicated time
series gene expression profiles. SiGN-SSM implements a novel
parameter constraint that effectively stabilizes the estimated models.
In addition, when run on supercomputers, it can determine the gene
network structure with a statistical permutation test in a practical
amount of time. SiGN-SSM is applicable not only to analyzing temporal
regulatory dependencies between genes, but also to extracting
differentially regulated genes from time series expression
profiles.
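
As a rough illustration of what a state space model for time series expression data looks like, the sketch below simulates a linear Gaussian SSM in which a few hidden state variables drive the observed gene expression values. The linear Gaussian form, the dimensions, the noise levels, and all names (`simulate_ssm`, `F`, `H`) are assumptions made for this example; it is not SiGN-SSM's implementation, and it does not include the parameter constraint or the permutation test.

```python
import numpy as np

def simulate_ssm(F, H, n_steps, state_noise=0.1, obs_noise=0.1, seed=0):
    """Simulate x_t = F x_{t-1} + v_t and y_t = H x_t + w_t with Gaussian noise."""
    rng = np.random.default_rng(seed)
    k = F.shape[0]          # number of hidden state variables
    p = H.shape[0]          # number of observed genes
    x = rng.normal(size=k)  # initial hidden state
    states, observations = [], []
    for _ in range(n_steps):
        x = F @ x + state_noise * rng.normal(size=k)   # system (state) equation
        y = H @ x + obs_noise * rng.normal(size=p)     # observation equation
        states.append(x)
        observations.append(y)
    return np.array(states), np.array(observations)

# Hypothetical setup: 2 hidden states drive 5 observed genes over 8 time points.
F = np.array([[0.9, 0.1],
              [0.0, 0.8]])
H = np.random.default_rng(1).normal(size=(5, 2))
states, expression = simulate_ssm(F, H, n_steps=8)
print(expression.shape)  # (8, 5): time points x genes
```

Estimating a model of this kind means recovering matrices such as `F` and `H` from the observed profiles alone; the short series length (8 time points here) is what makes stabilizing constraints useful.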

SiGN-BN implements several algorithms for estimating gene networks using Bayesian network models. It uses B-spline nonparametric regression to model parent-child relationships, which makes it suitable for modeling non-linear gene-gene regulation. Because learning a Bayesian network structure that fits given gene expression data generally requires a huge amount of computation time, Bayesian networks are not widely used for large-scale gene regulatory network analyses. Our research group has developed several algorithms that overcome this problem using supercomputers. Currently, three algorithms are available, depending on the network size: (a) a greedy hill-climbing algorithm combined with the bootstrap method, applicable to up to 1000 genes; (b) a neighbor node sampling & repeat algorithm, applicable to genome-wide gene networks; and (c) a recently developed novel algorithm for estimating globally optimal network structures of up to 32 genes. Learning the optimal Bayesian network structure is an NP-hard problem, and previously optimal networks had been estimated only for up to 29 nodes. We parallelized the dynamic programming based algorithm and, using it, succeeded in estimating a 32-node globally optimal network in approximately 1 week with 256 CPU cores of the HGC supercomputer system.
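
To give a feel for the dynamic programming idea behind exact structure learning, here is a minimal serial sketch that finds the best-scoring network over a handful of variables by working through subsets of nodes. Everything in it is an assumption for illustration: the function names are hypothetical, the local score is a linear Gaussian BIC used as a simple stand-in for a B-spline nonparametric regression score, and nothing is parallelized. It is not SiGN-BN's algorithm.

```python
import numpy as np

def local_score(data, child, parents):
    """Stand-in local score: Gaussian BIC of a linear regression (assumption)."""
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = float(np.sum((y - X @ coef) ** 2))
    return -0.5 * n * np.log(rss / n) - 0.5 * X.shape[1] * np.log(n)

def optimal_network(data):
    """Exact structure search by dynamic programming over subsets of variables."""
    d = data.shape[1]
    full = (1 << d) - 1

    def members(mask):
        return [i for i in range(d) if mask >> i & 1]

    # best_parents[v][C]: (score, parent set) of the best parents of v within C.
    best_parents = []
    for v in range(d):
        table = {}
        for C in range(full + 1):      # increasing order: subsets are done first
            if C >> v & 1:
                continue
            cand = members(C)
            best = (local_score(data, v, cand), tuple(cand))
            for u in cand:             # or reuse the best of a smaller candidate set
                best = max(best, table[C & ~(1 << u)])
            table[C] = best
        best_parents.append(table)

    # best_net[S]: (best total score on S, sink node chosen last).
    best_net = {0: (0.0, None)}
    for S in range(1, full + 1):
        best_net[S] = max(
            (best_net[S & ~(1 << v)][0] + best_parents[v][S & ~(1 << v)][0], v)
            for v in members(S)
        )

    # Backtrack: peel off sinks to recover each node's parent set.
    parents, S = {}, full
    while S:
        v = best_net[S][1]
        S &= ~(1 << v)
        parents[v] = best_parents[v][S][1]
    return best_net[full][0], parents

# Hypothetical data: a chain x0 -> x1 -> x2 plus an unrelated variable x3.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + 0.5 * rng.normal(size=n)
x2 = -1.0 * x1 + 0.5 * rng.normal(size=n)
x3 = rng.normal(size=n)
data = np.column_stack([x0, x1, x2, x3])
score, parents = optimal_network(data)
print(parents)
```

The tables grow with the number of subsets of nodes, that is, exponentially in the number of genes, which is why each additional node is costly and why a parallel implementation on a supercomputer matters at around 30 nodes.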

SiGN-BN is currently available to HGC supercomputer users.

SiGN-L1 is network estimation software based on sparse learning. It uses L1 regularization for simultaneous parameter estimation and model selection in statistical graphical models such as graphical Gaussian models and vector autoregressive models. Currently, three algorithms are available for estimating sparse network structures with L1 regularization: (a) weighted lasso, (b) recursive elastic net, and (c) relevance-weighted recursive elastic net. The first two algorithms aim at inferring large-scale networks of up to 100,000 molecules, whereas the third is intended for gene network estimation and comparison under various biological conditions and is applicable to medium-sized networks of up to 1,000 genes.
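
The way L1 regularization performs parameter estimation and model selection at the same time can be seen in a small weighted lasso regression, sketched below with plain coordinate descent. The setup is an assumption for illustration: one target gene is regressed on ten candidate regulators, and the names `lasso_cd` and `soft_threshold`, the penalty value, and the simulated data are hypothetical. It is not SiGN-L1's implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, weights=None, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| by coordinate descent."""
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # residual with feature j left out
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam * w[j]) / col_sq[j]
    return b

# Hypothetical data: only regulators 0 and 3 actually influence the target gene.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)

b_hat = lasso_cd(X, y, lam=0.1)
print(np.flatnonzero(b_hat))  # indices of the regulators kept in the model
```

Coefficients that the penalty drives to exactly zero correspond to absent edges, so the sparse network structure falls out of the fit itself. The optional `weights` argument gives each candidate edge its own penalty, which is the general idea behind a weighted lasso.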

SiGN-L1 is currently available to collaborators of our laboratory.

SiGN is developed as part of the ISLiM (Next-generation integrated simulation of living matter) project of the RIKEN Computational Science Research Program. The computational resources required for the development of SiGN are provided by the HGC Supercomputer System, Human Genome Center, Institute of Medical Science, The University of Tokyo, and by the RIKEN supercomputer system RICC. SiGN is also supported by the Systems Cancer project of the Grants-in-Aid for Scientific Research on Innovative Areas, MEXT, Japan.

Copyright © 2010 - 2012

Laboratory of DNA Information Analysis & Laboratory of DNA Sequence Analysis

Human Genome Center,
Institute of Medical Science,
The University of Tokyo

Data Analysis Fusion Team

RIKEN Computational Science Research Program