SiGN: Large-Scale Gene Network Estimation Software

Welcome to the SiGN website.

SiGN is a collection of large-scale gene network estimation software built on three different gene network models: state space models, nonparametric Bayesian networks, and L1 regularization. A gene network is a graphical representation of regulatory relationships between genes. These gene-to-gene relationships can be estimated, inferred, or modeled from observed gene expression profiles using various mathematical models and computational methods. All three models in SiGN require a huge amount of computational resources to estimate large-scale gene networks from observed data. SiGN is therefore designed to exploit a speed of 10 petaflops, which the Japanese flagship supercomputer "K computer" is planned to achieve in 2012. The software, with all of its models, will be freely available to users of the "K computer" and the HGC supercomputer system. Some models are distributed as open source software. The estimated networks can be viewed and analyzed with Cell Illustrator Online.

List of Software

SiGN-SSM

SiGN-SSM is open source gene network estimation software that runs in parallel on PCs and on massively parallel supercomputers. The software estimates a state space model (SSM), a statistical dynamic model suitable for analyzing short and/or replicated time series of gene expression profiles. SiGN-SSM implements a novel parameter constraint that effectively stabilizes the estimated models. In addition, when run on supercomputers, it can determine the gene network structure with a statistical permutation test in a practical amount of time. SiGN-SSM is applicable not only to analyzing temporal regulatory dependencies between genes, but also to extracting differentially regulated genes from time series expression profiles.
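For readers unfamiliar with state space models, the general idea can be sketched as a linear-Gaussian system in which a small hidden state evolves over time and the observed expression values are noisy linear readouts of that state. The sketch below only simulates such a system; it is not SiGN-SSM's estimation procedure, and the function name `simulate_ssm`, the matrices `F` and `H`, and the noise parameters are all assumptions made for illustration.

```python
import random


def simulate_ssm(F, H, x0, steps, state_sd=0.1, obs_sd=0.1, seed=0):
    """Simulate a generic linear-Gaussian state space model (illustrative only).

    State equation:       x_t = F x_{t-1} + v_t,  v_t ~ N(0, state_sd^2)
    Observation equation: y_t = H x_t     + w_t,  w_t ~ N(0, obs_sd^2)

    F, H and the noise levels are hypothetical inputs, not SiGN-SSM parameters.
    Returns the list of observation vectors y_1 .. y_steps.
    """
    rng = random.Random(seed)

    def matvec(M, v):
        # Plain matrix-vector product on nested lists.
        return [sum(m * a for m, a in zip(row, v)) for row in M]

    x = list(x0)
    observations = []
    for _ in range(steps):
        # Advance the hidden state, then read out a noisy observation.
        x = [xi + rng.gauss(0.0, state_sd) for xi in matvec(F, x)]
        y = [yi + rng.gauss(0.0, obs_sd) for yi in matvec(H, x)]
        observations.append(y)
    return observations


if __name__ == "__main__":
    # One hidden factor driving two "genes" over five time points.
    for y in simulate_ssm([[0.9]], [[1.0], [-0.5]], [1.0], steps=5):
        print(y)
```

In this toy setting, estimating a model from data would mean recovering `F`, `H`, and the noise levels from the observed series, which is the computationally demanding part that the software addresses.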

SiGN-BN

SiGN-BN implements several algorithms for estimating gene networks using Bayesian network models. It uses B-spline nonparametric regression to model parent-child relationships, which makes it suitable for modeling non-linear gene-gene regulation. Because learning a Bayesian network structure that fits given gene expression data generally requires a huge amount of computation time, Bayesian networks are not widely used for large-scale gene regulatory network analyses. Our research group has developed several algorithms that overcome this problem using supercomputers. Three algorithms are currently available, depending on the network size: (a) a greedy hill-climbing algorithm combined with the bootstrap method, applicable to up to 1000 genes; (b) a neighbor node sampling & repeat algorithm, applicable to genome-wide gene networks; and (c) a recently developed novel algorithm for estimating globally optimal network structures of up to 32 genes. Learning the optimal Bayesian network structure is an NP-hard problem, and until now optimal networks had been estimated only for up to 29 nodes. We developed a parallel version of the dynamic programming based algorithm and, using it, succeeded in estimating a globally optimal 32-node network in approximately 1 week with 256 CPU cores of the HGC supercomputer system.
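To illustrate why optimal structure learning is limited to a few dozen nodes, here is a minimal serial sketch of a dynamic programming search over node subsets. It is a textbook-style formulation, not the parallel algorithm used in SiGN-BN, and it assumes a decomposable score supplied by the caller: the names `optimal_network` and `local_score` are hypothetical, and no B-spline regression is involved. The tables it builds have one entry per subset of nodes, so memory and time grow as 2^n.

```python
def optimal_network(n, local_score):
    """Find a highest-scoring DAG over nodes 0..n-1 by DP over subsets.

    local_score(v, parents) is an assumed, caller-supplied function that
    returns the score of node v given a frozenset of parent nodes.
    Returns (total_score, parents) with parents[v] a frozenset.
    """
    full = (1 << n) - 1

    def members(mask):
        return [i for i in range(n) if mask >> i & 1]

    # best_ps[v][mask]: best (score, parent_mask) for v with parents chosen
    # from the candidate set encoded by mask.
    best_ps = [dict() for _ in range(n)]
    for v in range(n):
        for mask in range(full + 1):
            if mask >> v & 1:
                continue  # a node cannot be its own parent
            best = (local_score(v, frozenset(members(mask))), mask)
            for u in members(mask):
                # Smaller masks were filled in earlier iterations.
                cand = best_ps[v][mask & ~(1 << u)]
                if cand[0] > best[0]:
                    best = cand
            best_ps[v][mask] = best

    # best_net[mask]: best (score, sink) for a DAG restricted to mask, where
    # the sink takes its parents from the remaining nodes in mask.
    best_net = {0: (0.0, None)}
    for mask in range(1, full + 1):
        best = None
        for v in members(mask):
            rest = mask & ~(1 << v)
            s = best_net[rest][0] + best_ps[v][rest][0]
            if best is None or s > best[0]:
                best = (s, v)
        best_net[mask] = best

    # Peel off sinks one at a time to recover the parent sets.
    parents = {}
    mask = full
    while mask:
        v = best_net[mask][1]
        mask &= ~(1 << v)
        parents[v] = frozenset(members(best_ps[v][mask][1]))
    return best_net[full][0], parents


if __name__ == "__main__":
    # Toy score: reward three edges that form a cycle, penalize all others.
    w = {(0, 1): 2.0, (1, 2): 3.0, (2, 0): 1.0}
    toy = lambda v, ps: sum(w.get((u, v), -1.0) for u in ps)
    print(optimal_network(3, toy))
```

In the toy example the three rewarded edges form a cycle, so the search has to drop one of them to keep the graph acyclic, and it drops the lowest-weight edge.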

SiGN-BN is currently available for HGC supercomputer users.

SiGN-L1

SiGN-L1 is network estimation software based on sparse learning. It uses L1-regularization for simultaneous parameter estimation and model selection in statistical graphical models such as graphical Gaussian models and vector autoregressive models. Currently, three algorithms are available for estimating sparse network structures with L1-regularization: (a) weighted lasso, (b) recursive elastic net, and (c) relevance-weighted recursive elastic net. Whereas the first two algorithms aim at inferring large-scale networks of up to 100,000 molecules, the third is intended for gene network estimation and comparison under various biological conditions and is applicable to medium-sized networks of up to 1,000 genes.
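The way L1-regularization performs parameter estimation and model selection at once can be seen in a plain lasso regression: weak coefficients are shrunk exactly to zero, and the surviving non-zero coefficients can be read as candidate edges when one gene is regressed on the others. The following is a minimal sketch of the ordinary lasso solved by coordinate descent. It is not the weighted lasso or the recursive elastic net implemented in SiGN-L1, and the names `lasso_cd`, `soft_threshold`, and `lam` are assumptions made for this example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Plain lasso by cyclic coordinate descent (illustrative sketch).

    Minimizes (1 / (2 n)) * ||y - X b||^2 + lam * ||b||_1, where X is a list
    of n rows with p predictors each and y is the target vector.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # an all-zero predictor carries no information
            # Covariance of predictor j with the residual that excludes j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta


if __name__ == "__main__":
    # The target depends strongly on predictor 0 and only weakly on predictor 1.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    y = [2.0, 0.1, 2.0, 0.1]
    print(lasso_cd(X, y, lam=0.2))
```

In the example the weak dependence on the second predictor falls below the threshold `lam` and is removed from the model entirely, which is the sparsity effect that makes L1 methods attractive for very large networks.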

SiGN-L1 is currently available to collaborators of our laboratory.

ACKNOWLEDGEMENTS

SiGN is developed as part of the ISLiM (Next-Generation Integrated Simulation of Living Matter) project of the RIKEN Computational Science Research Program. The computational resources required for the development of SiGN are provided by the HGC Supercomputer System, Human Genome Center, Institute of Medical Science, The University of Tokyo, and by the RIKEN supercomputer system RICC. SiGN is also supported by the Systems Cancer project of the Grants-in-Aid for Scientific Research on Innovative Areas, MEXT, Japan.

Copyright © 2010 - 2012
Laboratory of DNA Information Analysis & Laboratory of DNA Sequence Analysis
Human Genome Center, Institute of Medical Science, The University of Tokyo

Data Analysis Fusion Team
RIKEN Computational Science Research Program

Contact: Yoshinori Tamada <tamada ATMARK ims.u-tokyo.ac.jp>