SiGN: Large-Scale Gene Network Estimation Software

Welcome to the SiGN website.

SiGN is a collection of large-scale gene network estimation software built on three gene network models: state space models, nonparametric Bayesian networks, and L1 regularization. A gene network is a graphical representation of the regulatory relationships between genes. These gene-to-gene relationships can be estimated, inferred, or modeled from observed gene expression profiles using various mathematical models and computational methods. All three models in SiGN require a huge amount of computational resources to estimate large-scale gene networks from observed data. SiGN is therefore designed to exploit a speed of 10 petaflops, which the Japanese flagship supercomputer "K computer" achieved in 2011. We are currently working on supporting the "Fugaku computer", which is being developed by RIKEN R-CCS. The software with all the models was freely available to users of the "Fugaku computer" and the HGC supercomputer system. Some of the models are distributed as open source software. The estimated networks can be viewed and analyzed with Cell Illustrator Online.

List of Software

SiGN-SSM

SiGN-SSM is open source gene network estimation software that runs in parallel on PCs and on massively parallel supercomputers. The software estimates a state space model (SSM), a statistical dynamic model suited to analyzing short and/or replicated time-series gene expression profiles. SiGN-SSM implements a novel parameter constraint that effectively stabilizes the estimated models. On supercomputers, it can also determine the gene network structure with a statistical permutation test within a practical time. SiGN-SSM is applicable not only to the analysis of temporal regulatory dependencies between genes, but also to the extraction of differentially regulated genes from time-series expression profiles.
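As a rough illustration of what a state space model looks like for time-series expression data, the sketch below simulates a generic linear Gaussian SSM in which a small hidden state evolves over time and each gene's expression is a noisy linear readout of that state. The matrices F and H, the noise levels, and the function names are hypothetical and chosen only for this example; they are not taken from SiGN-SSM, and neither its parameter constraint nor its permutation test is reproduced here.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def simulate_ssm(F, H, n_steps, state_sd=0.1, obs_sd=0.1, seed=0):
    """Simulate x_t = F x_{t-1} + v_t and y_t = H x_t + w_t.

    F is the (hypothetical) state transition matrix, H maps the hidden
    state to observed gene expression, and v_t, w_t are Gaussian noise.
    Returns a list of n_steps expression profiles, one per time point.
    """
    rng = random.Random(seed)
    # Draw the initial hidden state from a standard normal distribution.
    x = [rng.gauss(0.0, 1.0) for _ in range(len(F))]
    profiles = []
    for _ in range(n_steps):
        # System model: propagate the hidden state and add system noise.
        x = [m + rng.gauss(0.0, state_sd) for m in matvec(F, x)]
        # Observation model: project onto genes and add observation noise.
        y = [m + rng.gauss(0.0, obs_sd) for m in matvec(H, x)]
        profiles.append(y)
    return profiles


# Two hidden state variables driving four observed genes (toy values).
F = [[0.9, 0.0],
     [0.2, 0.7]]
H = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5],
     [-1.0, 0.3]]
profiles = simulate_ssm(F, H, n_steps=8)
```

Estimation works in the opposite direction: given only the observed profiles, one would fit F, H, and the noise parameters, which is the computationally expensive part.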

SiGN-BN

SiGN-BN implements several algorithms for estimating gene networks using Bayesian network models. It uses B-spline nonparametric regression to model parent-child relationships, which makes it suitable for modeling non-linear gene-gene regulation. Because learning a Bayesian network structure that fits given gene expression data generally requires a huge amount of computational time, Bayesian networks are not widely used for large-scale gene regulatory network analyses. Our research group has developed several algorithms that overcome this problem using supercomputers. Currently, three algorithms are available depending on the network size: (a) a greedy hill-climbing algorithm combined with the bootstrap method, applicable to up to 1000 genes, and (b) a neighbor node sampling & repeat algorithm, applicable to genome-wide gene networks. In addition to these, (c) we recently developed a novel algorithm for estimating globally optimal network structures of up to 37 genes for the discrete model. Learning the optimal Bayesian network structure is an NP-hard problem, and optimal networks had previously been estimated only for up to 29 nodes. We developed a parallel version of the dynamic programming based algorithm and, using it, succeeded in estimating a globally optimal network of 37 nodes in 2 hours and 17 minutes with 165,888 CPU cores of the K computer.
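To give a feel for why optimal structure learning is so costly, here is a minimal serial sketch of a textbook dynamic programming scheme over node subsets. It is not the parallel algorithm used in SiGN-BN. The function optimal_dag, the bitmask encoding of parent sets, and the toy scoring function are all assumptions made for this example; a real application would plug in a score computed from expression data. The table has one entry per subset of nodes, so it grows as 2^n, which is 137,438,953,472 subsets for 37 nodes.

```python
def optimal_dag(n, local_score):
    """Find a highest-scoring DAG over n nodes by DP over node subsets.

    local_score(v, parent_mask) returns the score of node v given the
    parent set encoded as a bitmask (bit u set means u is a parent).
    Returns (best_total_score, {node: parent_mask}).
    """
    full = (1 << n) - 1

    # best_ps[v][C]: best (score, parent_mask) for v with parents inside C.
    best_ps = [dict() for _ in range(n)]
    for v in range(n):
        for C in range(full + 1):
            if C & (1 << v):
                continue  # a node cannot be its own parent
            best = (local_score(v, C), C)
            rest = C
            while rest:
                low = rest & -rest  # lowest set bit of the candidate set
                cand = best_ps[v][C & ~low]  # smaller mask, already filled
                if cand[0] > best[0]:
                    best = cand
                rest &= ~low
            best_ps[v][C] = best

    # best_net[S]: best (score, sink) for a network on the node subset S.
    best_net = {0: (0.0, -1)}
    for S in range(1, full + 1):
        best = None
        rest = S
        while rest:
            low = rest & -rest
            v = low.bit_length() - 1  # try v as the sink of subset S
            prev = S & ~low
            total = best_net[prev][0] + best_ps[v][prev][0]
            if best is None or total > best[0]:
                best = (total, v)
            rest &= ~low
        best_net[S] = best

    # Backtrack: peel off sinks to recover each node's parent set.
    parents = {}
    S = full
    while S:
        _, v = best_net[S]
        S &= ~(1 << v)
        parents[v] = best_ps[v][S][1]
    return best_net[full][0], parents


# Toy score (hypothetical): reward true parents, penalize false ones.
TRUE_PARENTS = {0: 0b000, 1: 0b001, 2: 0b011}


def toy_score(v, parent_mask):
    hits = bin(parent_mask & TRUE_PARENTS[v]).count("1")
    misses = bin(parent_mask & ~TRUE_PARENTS[v]).count("1")
    return 2.0 * hits - 1.0 * misses


best_score, best_parents = optimal_dag(3, toy_score)
```

With this toy score the search recovers the edges 0 -> 1, 0 -> 2, and 1 -> 2. Both the time and the memory of this scheme are exponential in the number of nodes, which is why each additional node in an optimal search is a significant step.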

SiGN-L1

SiGN-L1 is network estimation software based on sparse learning. It uses L1-regularization for simultaneous parameter estimation and model selection of statistical graphical models such as graphical Gaussian models and vector autoregressive models. Currently, three algorithms are available for estimating sparse network structures with L1-regularization: (a) weighted lasso, (b) recursive elastic net, and (c) relevance-weighted recursive elastic net. Whereas the first two algorithms aim at inferring large-scale networks of up to 100,000 molecules, the third is intended for gene network estimation and comparison under various biological conditions and is applicable to medium-sized networks of up to 1,000 genes.
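The way L1-regularization performs parameter estimation and model selection at the same time can be seen in a plain lasso regression. The following is a minimal sketch using cyclic coordinate descent with soft-thresholding; it is a generic textbook lasso, not the weighted lasso or the recursive elastic net variants listed above, and the function names and toy data are assumptions for illustration only.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 over b.

    X is a list of n rows with p predictors each (for example, the
    expression of p candidate regulator genes), y is the response
    (the expression of one target gene), and lam is the L1 penalty.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant-zero predictor carries no information
            # Correlation of predictor j with the partial residual.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            # Keep the residual consistent with the updated coefficient.
            residual = [r - c * delta for c, r in zip(col, residual)]
            beta[j] = new_beta
    return beta


# Toy data: the target depends only on the first of two predictors.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

On this toy data the coefficient of the irrelevant predictor is set exactly to zero while the relevant one is shrunk from 2.0 to 1.5. In a network setting, one common generic approach is to repeat such a regression for every gene and read the nonzero coefficients as candidate edges.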

SiGN-L1 is currently available to collaborators of our laboratory.

ACKNOWLEDGEMENTS

SiGN was initially developed in the ISLiM (Next-generation integrated simulation of living matter) project in the RIKEN Computational Science Research Program, and was then supported by The Strategic Programs for Innovative Research (SCLS) field 1. Currently, we are supported by The Priority Issue on Post-K computer (Integrated Computational Life Science to Support Personalized and Preventive Medicine) under the FLAGSHIP 2020 Project, and by Conquering Cancer through Neo-dimensional Systems Understanding, a MEXT Grant-in-Aid for Scientific Research on Innovative Areas. The computational resources required for the development of SiGN are provided by the HGC Supercomputer System, Human Genome Center, Institute of Medical Science, The University of Tokyo, and by the RIKEN supercomputer system RICC. SiGN was also supported by the Systems Cancer project of the Grants-in-Aid for Scientific Research on Innovative Areas, MEXT, Japan.

Copyright © 2010 - 2021

Contact: Yoshinori Tamada <y DOT tamada ATMARK hirosaki-u.ac.jp>

Current Affiliation: Department of Medical Data Intelligence, Innovation Center for Health Promotion,
Graduate School of Medicine, Hirosaki University