This is a Google Summer of Code (GSoC) 2017 project.

More details are available on the project homepage and in the source code.

Background

Graphical models provide a powerful and flexible framework for understanding complex multivariate data. These models, sometimes also referred to as network models, capture dependencies in multivariate data, allowing statisticians to discover underlying connections among measured variables. These models have been widely used in applied statistics and machine learning, with particular success in genetics, neuroscience, and finance.

In big-data settings, the observed variables are often of multiple types, including binary, count-valued, continuous, categorical, and bounded. Classical graphical models, however, typically assume that the data are generated from a multivariate Gaussian distribution, so that each variable is marginally Gaussian. While mathematically tractable, this assumption is plainly inappropriate for mixed multi-modal data, so these models cannot be applied to such data directly.

Recently, the project mentors proposed a block randomized adaptive iterative lasso (“BRAIL”) procedure for fitting mixed graphical models. In this project, we develop a new R package that makes graphical models for mixed multi-modal data readily available to a wide audience. The package is designed to support fitting, simulating from, and visualizing mixed graphical models.
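BRAIL itself is not reproduced here. As a rough, hypothetical illustration of the resampling idea suggested by its name and by the `B` and `tau` arguments in the BRAIL usage example, the sketch below refits a lasso on `B` random half-samples and keeps the variables selected in at least a fraction `tau` of the fits (a stability-selection-style rule). We assume, without confirmation from the package, that `B` counts resamples and `tau` is a selection-frequency threshold; the block-wise and adaptive steps of BRAIL are omitted, and `lasso_cd` and `subsample_select` are hypothetical helpers written in base R for this example only.

```r
# Hypothetical helpers for illustration; NOT the BRAIL implementation.

# Coordinate descent for the Gaussian lasso (assumed scaling):
#   minimize (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1
lasso_cd <- function(X, y, lambda, n_sweeps = 200) {
  n <- nrow(X)
  b <- rep(0, ncol(X))
  r <- as.vector(y)                        # current residual y - X b
  col_ss <- colSums(X^2) / n
  for (s in seq_len(n_sweeps)) {
    for (j in seq_len(ncol(X))) {
      r <- r + X[, j] * b[j]               # remove coordinate j's contribution
      rho_j <- sum(X[, j] * r) / n
      b[j] <- sign(rho_j) * max(abs(rho_j) - lambda, 0) / col_ss[j]
      r <- r - X[, j] * b[j]               # add the updated contribution back
    }
  }
  b
}

# Stability-selection-style support estimate (assumed roles of B and tau):
# refit on B random half-samples, keep variables selected in >= tau of them.
subsample_select <- function(X, y, lambda, B = 20, tau = 0.8) {
  n <- nrow(X)
  hits <- numeric(ncol(X))
  for (k in seq_len(B)) {
    idx <- sample.int(n, floor(n / 2))
    hits <- hits + (lasso_cd(X[idx, , drop = FALSE], y[idx], lambda) != 0)
  }
  freq <- hits / B
  list(selected = which(freq >= tau), freq = freq)
}
```

Thresholding selection frequencies, rather than trusting a single lasso fit, is what makes a resampling scheme robust to variables that enter the model only by chance on one sample.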

Implementations

Five main pieces of work were completed over the summer:

  • an ADMM solver in C++ for l1-penalized Gaussian, logistic, and Poisson regression, with warm starts and early stopping based on support convergence
  • a Newton solver in C++ for l2-penalized Gaussian, logistic, and Poisson regression
  • the BRAIL algorithm in R, parallelized with foreach
  • the MixedGraph fitting routine in R, parallelized with foreach
  • plotting of MixedGraph objects using igraph, Cytoscape, and Cytoscape.js
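The ADMM item above can be illustrated with a minimal base-R sketch for the Gaussian case only. It is not the package's C++ implementation: the objective scaling (1/2)||y - Xb||^2 + lambda ||b||_1, the fixed penalty parameter `rho`, and the helper names `admm_lasso` and `soft_threshold` are our assumptions. Warm starts enter through `z_init`, and early stopping follows the support-convergence idea: iteration stops once the nonzero pattern of the coefficients has been unchanged for `support_stability` consecutive iterations (the argument name is borrowed from the `glmLasso` usage example; its exact semantics in the package are assumed).

```r
# Hypothetical sketch, not the package's C++ solver.
# Soft-thresholding: sign(v) * max(|v| - k, 0), elementwise
soft_threshold <- function(v, k) sign(v) * pmax(abs(v) - k, 0)

# ADMM for the Gaussian lasso (assumed scaling):
#   minimize (1/2) * ||y - X b||^2 + lambda * ||b||_1   subject to b = z
admm_lasso <- function(X, y, lambda, rho = 1, z_init = NULL,
                       support_stability = 10, max_iter = 1000) {
  p <- ncol(X)
  z <- if (is.null(z_init)) rep(0, p) else z_init  # warm start
  u <- rep(0, p)                                   # scaled dual variable
  # Cholesky factor of X'X + rho * I, computed once and reused
  R <- chol(crossprod(X) + rho * diag(p))
  Xty <- as.vector(crossprod(X, y))
  prev_support <- z != 0
  stable <- 0
  for (iter in seq_len(max_iter)) {
    # b-update: solve (X'X + rho I) b = X'y + rho (z - u)
    rhs <- Xty + rho * (z - u)
    b <- as.vector(backsolve(R, forwardsolve(t(R), rhs)))
    # z-update: proximal step for the l1 penalty
    z <- soft_threshold(b + u, lambda / rho)
    # dual update
    u <- u + b - z
    # early stopping: count consecutive iterations with an unchanged support
    support <- z != 0
    stable <- if (identical(support, prev_support)) stable + 1 else 0
    prev_support <- support
    if (stable >= support_stability) break
  }
  list(coef = z, iter = iter)
}
```

Factoring X'X + rho I once with `chol()` keeps each iteration to two triangular solves. A warm start along a decreasing lambda path would pass the previous `fit$coef` as `z_init`. Logistic and Poisson losses have no closed-form b-update, so they would need an inner solver in that step.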

Note: the package uses Rcpp and RcppArmadillo to integrate R and C++ code. For algorithm details, see here.
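For the Newton solver in the list above, a minimal base-R sketch of the logistic case follows. We assume a penalty of (lambda/2)||b||^2, no intercept, and that a tolerance like the `thresh` argument in the `glmRidge` usage example bounds the size of the Newton step; `newton_ridge_logistic` is a hypothetical name, and the package's C++ code may differ. Each step solves (X'WX + lambda I) d = g, where W holds the weights mu(1 - mu) and g is the penalized gradient.

```r
# Hypothetical sketch, not the package's C++ solver.
# Newton's method for ridge-penalized logistic regression (no intercept):
#   minimize -sum(y * eta - log(1 + exp(eta))) + (lambda / 2) * ||b||^2,
#   where eta = X b. Stops when the largest |Newton step| < thresh (assumed).
newton_ridge_logistic <- function(X, y, lambda, thresh = 1e-8, max_iter = 100) {
  p <- ncol(X)
  b <- rep(0, p)
  for (iter in seq_len(max_iter)) {
    eta <- as.vector(X %*% b)
    mu <- 1 / (1 + exp(-eta))                    # fitted probabilities
    grad <- -as.vector(crossprod(X, y - mu)) + lambda * b
    W <- mu * (1 - mu)                           # Hessian weights
    H <- crossprod(X, W * X) + lambda * diag(p)  # X'WX + lambda I
    step <- as.vector(solve(H, grad))
    b <- b - step
    if (max(abs(step)) < thresh) break
  }
  list(coef = b, iter = iter)
}
```

The ridge term keeps the Hessian positive definite even when X'WX is singular (for example when p > n), which is one reason a plain Newton iteration is a reasonable choice for the l2 case, while the nonsmooth l1 case is handled by ADMM instead.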

Installation

Install from GitHub:

library(devtools)
install_github("Xia-Zhang/MixedGraphs")

Usage

  • glmLasso

    library("MixedGraphs")
    X <- matrix(rnorm(30 * 200), 30, 200)
    y <- rbinom(30, 1, 0.5)
    glmLasso(X, y, lambda = 0.5, family = "binomial", support_stability = 10) 
  • glmRidge

    glmRidge(X, y, lambda = 0.5, family = "binomial", thresh = 0.005)
  • BRAIL

    X <- lapply(1:2, function(x) {matrix(rnorm(10 * 200), 10, 200)})
    y <- rnorm(10)
    BRAIL(X, y, family = "gaussian", tau = 0.8, B = 20, doPar = TRUE)
  • MixedGraph

    X <- lapply(1:3, function(x) {matrix(rnorm(12), nrow = 4)})
    crf_structure <- matrix(c(1, 0, 1, 1, 1, 1, 0, 0, 1), 3, 3)
    brail_control <- list(B = 5, tau = 0.6)
    G <- MixedGraph(X, crf_structure, brail_control = brail_control)
  • plot.MixedGraph

    plot(G, method = "igraph", weighted = TRUE)
    plot(G, method = "cytoscape", layout = "")
    plot(G, method = "cytoscape.js", "attributes-layout")

Student

Xia Zhang
Department of Computer Science and Technology, Peking University

Mentors

Genevera Allen
Departments of Statistics and ECE, Rice University
Jan and Dan Duncan Neurological Research Institute, Baylor College of Medicine and Texas Children’s Hospital

Michael Weylandt
Department of Statistics, Rice University

License

GPL (>= 2)