MI Network and Hypergraph Analysis with miEdgeR

About this vignette

This walkthrough demonstrates how to build mutual-information (MI) networks and hypergraphs from single-cell RNA-seq data using miEdgeR. We extract cluster-specific gene expression, compute pairwise MI, refine networks through percolation and community detection, and finally assemble and visualize a hypergraph of gene modules.

Prerequisites:

R (≥ 4.2.0)

Seurat object with clustering metadata (eg. MG) and optional pseudotime

Packages: miEdgeR, Seurat, igraph, hypergraph, dplyr, ggplot2, ggraph, flextable

Introduction

This vignette walks through the key steps of MI network construction and higher-order hypergraph analysis. Starting with a Seurat object, we show how to filter for highly variable genes, compute and threshold mutual information (MI), remove noise via percolation, detect communities, and represent those communities as hyperedges.

Note: This vignette requires the Zhou et al Control Cortex dataset Microglia (MG) cluster https://doi.org/10.1038/s41591-019-0695-9.

Load Libraries and Data

We load all necessary packages and read in the preprocessed Seurat object. Ensure your object contains a clustering column (MG in seurat demo Zhou et al Control Cortex data) corresponding to cluster IDs.

library(Seurat)
library(miEdgeR)
library(igraph)
library(dplyr)
library(ggplot2)
library(ggraph)
library(flextable)

seurat_obj <- readRDS('Zhou_2020_control.rds')

Data exploration

Exploring the preprocessed Zhou et al Control Cortex data. For this tutorial we will focus on cluster “MG”

DimPlot(seurat_obj, group.by='cell_type', label=TRUE) + NoLegend()

UMAP plot showing clustering of Zhou et al Control Cortex data

features <- c(
  "RBFOX3",      # neurons
  "SLC17A7",     # excitatory
  "GAD1",        # inhibitory
  "GFAP",        # astrocytes
  "MBP",         # oligodendrocytes
  "PDGFRA",      # OPCs
  "CX3CR1"       # microglia
)

VlnPlot(seurat_obj, features = features, group.by = "cell_type", pt.size = 0)

violin plot showing gene expression of selected genes in Zhou et al Control Cortex data

# VlnPlot(seurat_obj, features = features)

Building a Mutual-Information Network

Here we call compute_mi_network(), which under the hood:

  1. Extracts expression for cluster “MG” (up to 1,500 cells; genes > 5% of cells).
  2. Filters to the top 3,000 most variable genes (after removing housekeeping genes).
  3. Discretizes expression (adaptive binning) and computes pairwise MI in parallel.
  4. Constructs an undirected igraph object by thresholding at the 90th percentile of MI values.

You can adjust percentile (e.g. 0.85 for more liberal edges) or supply fixed_threshold for absolute MI cutoffs.

Note: computing mi-network can be computationally demanding. We strongly advice to limit the number of cells and genes to the most variable ones

set.seed(123)

result <- compute_mi_network(
  seurat_obj,
  cluster_id = "MG",
  cluster_field = "cell_type",
  assay_name = "RNA",
  counts_layer = "counts",
  data_layer = "data",
  min_expr_pct = 0.05,
  top_n_genes = 3000,
  n_cores = max(1, parallel::detectCores() - 3)
)

mi_matrix <- result$mi_matrix

graph_percentile <- build_mi_graph(mi_matrix, 
                                   threshold_method = "percentile", 
                                   percentile = 0.90)

cat("Vertices:", vcount(graph_percentile), "\n")
## Vertices: 2924
cat("Edges:", ecount(graph_percentile), "\n")
## Edges: 427547
cat("Density:", graph.density(graph_percentile), "\n")
## Density: 0.1000478

The resulting MI matrix was filtered using the 90th percentile threshold with build_mi_graph(), producing a graph with 2924 genes (nodes), 427,547 edges, and a density of ~0.1. This indicates substantial gene-gene co-dependence within the cluster.

Consensus Edges via Percolation

Real-world networks can be noisy. We apply a simple “percolation” strategy:

This yields a consensus graph that filters out spurious edges.

In miEdgeR v0.2+, percolation + edge canonicalization + weight hygiene are handled inside percolate_graph().

g_consensus <- percolate_graph(
  graph_percentile,
  n_iterations   = 100,
  edge_fraction  = 0.6,
  freq_threshold = 0.5,
  percentile     = 0.9
)

cat("Consensus vertices:", vcount(g_consensus), "\n")
## Consensus vertices: 2924
cat("Consensus edges:", ecount(g_consensus), "\n")
## Consensus edges: 41999
cat("Consensus density:", graph.density(g_consensus), "\n")
## Consensus density: 0.009827946

Community Detection

We use detect_communities() which performs community detection and overlap expansion consistently.

set.seed(123)

comm_out <- detect_communities(
  g_consensus,
  min_size          = 10,
  overlap_threshold = 0.2
)

# v0.2 may return either:
# - a plain list of communities
# - or a richer list with $communities, $bridge, $gene2mod

# v0.2 canonical structure
stopifnot("groups" %in% names(comm_out))

large_communities <- comm_out$groups

cat("Communities (size >= 10):", length(large_communities), "\n")
## Communities (size >= 10): 2
cat("Sizes:", sapply(large_communities, length), "\n")
## Sizes: 23 969

Clustering coefficient

Compute additional network metrics for the consensus graph.

clustering_coeff <- transitivity(g_consensus, type = "global")
cat("Global clustering coefficient:", clustering_coeff, "\n")
## Global clustering coefficient: 0.4478631
degree_centrality      <- degree(g_consensus)
betweenness_centrality <- betweenness(g_consensus)
closeness_centrality   <- closeness(g_consensus)

top_degree      <- names(sort(degree_centrality, decreasing = TRUE))[1:10]
top_betweenness <- names(sort(betweenness_centrality, decreasing = TRUE))[1:10]
top_closeness   <- names(sort(closeness_centrality, decreasing = TRUE))[1:10]

centrality_table <- data.frame(
  Metric = c("Degree", "Betweenness", "Closeness"),
  Top_Genes = c(
    paste(top_degree, collapse = ", "),
    paste(top_betweenness, collapse = ", "),
    paste(top_closeness, collapse = ", ")
  ),
  stringsAsFactors = FALSE
)

ft_centrality <- flextable::flextable(centrality_table) %>%
  flextable::autofit() %>%
  flextable::set_header_labels(Metric = "Centrality Metric", Top_Genes = "Top 10 Genes") %>%
  flextable::bg(part = "header", bg = "gray") %>%
  flextable::theme_vanilla()

ft_centrality

Centrality Metric

Top 10 Genes

Degree

VPS13C, SENP6, ATRX, COP1, BIRC6, USP34, PIAS1, DOCK2, SETX, GNB1

Betweenness

VPS13C, SENP6, FTH1, RHBDF2, GNB1, COP1, FTL, ATRX, ALOX5, SETX

Closeness

SPP1, AC131944.1, HSPA1A, HSPA1B, MTRNR2L8, MTRNR2L1, LINC01320, DHFR, MTRNR2L12, VPS13C

Network Visualization

Visualize a subset of the MI network. Nodes are colored by community, and top hub genes per community are labeled.

plot_mi_network(g_consensus, large_communities, title = "MG MI Network Communities")

network plot of MG communities

Hypergraph Visualization

To visualize overlapping gene communities, we constructed a hypergraph where genes are connected to their assigned communities. Overlapping genes (shared across communities) are highlighted in red, community nodes in green, and unique genes in blue.

plot_hypergraph(
  large_communities,
  comm_indices = c(1, 2),
  title        = "MG Hypergraph with Overlapping Communities",
  key_genes    = NULL
)

Hypergraph visualization of overlapping communities

Community Sizes

Visualize community sizes.

comm_sizes_df <- data.frame(
  Community = paste("Community", seq_along(large_communities)),
  Size      = sapply(large_communities, length)
)

p_comm_sizes <- ggplot(comm_sizes_df, aes(x = Community, y = Size, fill = Community)) +
  geom_bar(stat = "identity") +
  labs(title = "Community Sizes (MG)", x = "Community", y = "Size") +
  theme_minimal() +
  theme(legend.position = "none")

print(p_comm_sizes)

bar plot showing community sizes

Hub Genes Table from Full Graph

Generate a table of top hub genes (by degree) for each community using the full graph.

summarize_hub_genes(graph_percentile, large_communities, cluster_name = "MG")

Cluster

Community

Top Hub Genes

MG

Community 1

FTH1, FTL, TMSB10, PFN1, TMSB4X, ATP5F1E, PTMA, OAZ1, YBX1, HSP90AB1

Community 2

SENP6, USP34, GNB1, RAP1A, VPS13C, COP1, CEP170, DOCK2, SETX, ALOX5

We identified top hub genes within each community by ranking nodes based on degree centrality. Using summarize_hub_genes(). Key genes extracted potentially involved in core regulatory processes.

Hub Genes (Percolated)

Identify top hub genes for communities from percolated graph.

summarize_hub_genes(g_consensus, large_communities, cluster_name = "MG")

Cluster

Community

Top Hub Genes

MG

Community 1

FTH1, FTL, TMSB10, TMSB4X, ATP5F1E, HSP90AA1, PTMA, TPT1, PFN1, NACA

Community 2

VPS13C, SENP6, ATRX, COP1, BIRC6, USP34, PIAS1, DOCK2, SETX, GNB1

GO Enrichment Analysis

Perform enrichment for communities after filtering of housekeeping genes which tend to be over-represented due to their high mi-scores

# Perform enrichment
enrich_results <- enrich_go(large_communities, ontology = "BP")
# Plot the top 4 terms per community
dot_plot <- plot_go_terms(enrich_results, top_n_terms = 5, ontology = "BP")
print(dot_plot)

Dot plot of enriched terms

## Save Results

Save Results

# saveRDS(result, file = "mi_network_results.rds")
sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_SG.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_SG.UTF-8        LC_COLLATE=en_SG.UTF-8    
##  [5] LC_MONETARY=en_SG.UTF-8    LC_MESSAGES=en_SG.UTF-8   
##  [7] LC_PAPER=en_SG.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_SG.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Asia/Singapore
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] flextable_0.9.7        ggraph_2.2.1           miEdgeR_0.2.0         
##  [4] pheatmap_1.0.12        ggVennDiagram_1.5.2    ggpubr_0.6.0          
##  [7] org.Hs.eg.db_3.18.0    AnnotationDbi_1.64.1   IRanges_2.36.0        
## [10] S4Vectors_0.40.2       Biobase_2.62.0         BiocGenerics_0.48.1   
## [13] clusterProfiler_4.10.1 ggplot2_3.5.1          dplyr_1.1.4           
## [16] igraph_2.1.1           SeuratObject_5.0.2     Seurat_4.3.0.1        
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.6                matrixStats_1.4.1       spatstat.sparse_3.1-0  
##   [4] bitops_1.0-9            enrichplot_1.22.0       HDO.db_0.99.1          
##   [7] httr_1.4.7              RColorBrewer_1.1-3      tools_4.3.2            
##  [10] sctransform_0.4.1       backports_1.5.0         utf8_1.2.4             
##  [13] R6_2.5.1                lazyeval_0.2.2          uwot_0.2.2             
##  [16] withr_3.0.2             sp_2.1-4                gridExtra_2.3          
##  [19] progressr_0.15.0        cli_3.6.5               textshaping_0.4.0      
##  [22] spatstat.explore_3.3-3  scatterpie_0.2.4        officer_0.6.7          
##  [25] labeling_0.4.3          sass_0.4.9              S7_0.2.0               
##  [28] spatstat.data_3.1-2     ggridges_0.5.6          pbapply_1.7-2          
##  [31] askpass_1.2.1           systemfonts_1.3.1       yulab.utils_0.2.0      
##  [34] gson_0.1.0              DOSE_3.28.2             parallelly_1.39.0      
##  [37] RSQLite_2.3.7           generics_0.1.3          gridGraphics_0.5-1     
##  [40] ica_1.0-3               spatstat.random_3.3-2   zip_2.3.1              
##  [43] car_3.1-3               GO.db_3.18.0            Matrix_1.6-4           
##  [46] ggbeeswarm_0.7.2        fansi_1.0.6             abind_1.4-8            
##  [49] infotheo_1.2.0.1        lifecycle_1.0.4         yaml_2.3.10            
##  [52] carData_3.0-5           qvalue_2.34.0           Rtsne_0.17             
##  [55] grid_4.3.2              blob_1.2.4              promises_1.3.0         
##  [58] crayon_1.5.3            miniUI_0.1.2            lattice_0.21-9         
##  [61] cowplot_1.1.3           KEGGREST_1.42.0         pillar_1.9.0           
##  [64] knitr_1.48              fgsea_1.28.0            future.apply_1.11.3    
##  [67] codetools_0.2-19        fastmatch_1.1-4         leiden_0.4.3.1         
##  [70] glue_1.8.0              ggfun_0.1.7             spatstat.univar_3.1-1  
##  [73] fontLiberation_0.1.0    data.table_1.16.2       vctrs_0.6.5            
##  [76] png_0.1-8               treeio_1.26.0           spam_2.11-0            
##  [79] gtable_0.3.6            cachem_1.1.0            xfun_0.49              
##  [82] mime_0.12               tidygraph_1.3.1         survival_3.5-7         
##  [85] fitdistrplus_1.2-1      ROCR_1.0-11             nlme_3.1-163           
##  [88] ggtree_3.10.1           bit64_4.5.2             fontquiver_0.2.1       
##  [91] RcppAnnoy_0.0.22        GenomeInfoDb_1.38.8     bslib_0.8.0            
##  [94] irlba_2.3.5.1           vipor_0.4.7             KernSmooth_2.23-22     
##  [97] colorspace_2.1-1        DBI_1.2.3               ggrastr_1.0.2          
## [100] tidyselect_1.2.1        bit_4.5.0               compiler_4.3.2         
## [103] xml2_1.3.6              fontBitstreamVera_0.1.1 plotly_4.10.4          
## [106] shadowtext_0.1.4        scales_1.3.0            lmtest_0.9-40          
## [109] stringr_1.5.1           digest_0.6.37           goftest_1.2-3          
## [112] spatstat.utils_3.1-1    rmarkdown_2.29          XVector_0.42.0         
## [115] htmltools_0.5.8.1       pkgconfig_2.0.3         highr_0.11             
## [118] fastmap_1.2.0           rlang_1.1.6             htmlwidgets_1.6.4      
## [121] shiny_1.9.1             farver_2.1.2            jquerylib_0.1.4        
## [124] zoo_1.8-12              jsonlite_1.8.9          BiocParallel_1.36.0    
## [127] GOSemSim_2.28.1         RCurl_1.98-1.16         magrittr_2.0.3         
## [130] Formula_1.2-5           GenomeInfoDbData_1.2.11 ggplotify_0.1.2        
## [133] dotCall64_1.2           patchwork_1.3.0         munsell_0.5.1          
## [136] Rcpp_1.0.13-1           ape_5.8                 viridis_0.6.5          
## [139] gdtools_0.4.1           reticulate_1.40.0       stringi_1.8.4          
## [142] zlibbioc_1.48.2         MASS_7.3-60             plyr_1.8.9             
## [145] parallel_4.3.2          listenv_0.9.1           ggrepel_0.9.6          
## [148] deldir_2.0-4            Biostrings_2.70.3       graphlayouts_1.2.0     
## [151] splines_4.3.2           tensor_1.5              uuid_1.2-1             
## [154] spatstat.geom_3.3-3     ggsignif_0.6.4          reshape2_1.4.4         
## [157] evaluate_1.0.1          tweenr_2.0.3            httpuv_1.6.15          
## [160] RANN_2.6.2              tidyr_1.3.1             openssl_2.2.2          
## [163] purrr_1.0.2             polyclip_1.10-7         future_1.34.0          
## [166] scattermore_1.2         ggforce_0.4.2           broom_1.0.7            
## [169] xtable_1.8-4            tidytree_0.4.6          rstatix_0.7.2          
## [172] later_1.3.2             viridisLite_0.4.2       ragg_1.3.3             
## [175] tibble_3.2.1            aplot_0.2.3             beeswarm_0.4.0         
## [178] memoise_2.0.1           cluster_2.1.4           globals_0.16.3