MI Network and Hypergraph Analysis with miEdgeR

Data exploration

Exploring the preprocessed Zhou et al Control Cortex data. For this tutorial we will focus on cluster “MG”

DimPlot(seurat_obj, group.by='cell_type', label=TRUE) + NoLegend()

UMAP plot showing clustering of Zhou et al Control Cortex data

features <- c(
  "RBFOX3",      # neurons
  "SLC17A7",     # excitatory
  "GAD1",        # inhibitory
  "GFAP",        # astrocytes
  "MBP",         # oligodendrocytes
  "PDGFRA",      # OPCs
  "CX3CR1"       # microglia
)

VlnPlot(seurat_obj, features = features, group.by = "cell_type", pt.size = 0)

violin plot showing gene expression of selected genes in Zhou et al Control Cortex data

# VlnPlot(seurat_obj, features = features)

Building a Mutual-Information Network

Here we call compute_mi_network(), which under the hood:

Extracts expression for cluster “MG” (up to 1,500 cells; genes > 5% of cells).
Filters to the top 3,000 most variable genes (after removing housekeeping genes).
Discretizes expression (adaptive binning) and computes pairwise MI in parallel.
Constructs an undirected igraph object by thresholding at the 90th percentile of MI values.

You can adjust percentile (e.g. 0.85 for more liberal edges) or supply fixed_threshold for absolute MI cutoffs.

Note: computing mi-network can be computationally demanding. We strongly advice to limit the number of cells and genes to the most variable ones

set.seed(123)

result <- compute_mi_network(
  seurat_obj,
  cluster_id = "MG",
  cluster_field = "cell_type",
  assay_name = "RNA",
  counts_layer = "counts",
  data_layer = "data",
  min_expr_pct = 0.05,
  top_n_genes = 3000,
  n_cores = max(1, parallel::detectCores() - 3)
)

mi_matrix <- result$mi_matrix

graph_percentile <- build_mi_graph(mi_matrix, 
                                   threshold_method = "percentile", 
                                   percentile = 0.90)

cat("Vertices:", vcount(graph_percentile), "\n")

## Vertices: 2924

cat("Edges:", ecount(graph_percentile), "\n")

## Edges: 427547

cat("Density:", graph.density(graph_percentile), "\n")

## Density: 0.1000478

The resulting MI matrix was filtered using the 90th percentile threshold with build_mi_graph(), producing a graph with 2924 genes (nodes), 427,547 edges, and a density of ~0.1. This indicates substantial gene-gene co-dependence within the cluster.

Consensus Edges via Percolation

Real-world networks can be noisy. We apply a simple “percolation” strategy:

Pre-filter: Keep only strong edges (by percentile).
Resample 100 times, each time retaining 60% of edges at random.
Consensus: Edges appearing in ≥ 50% of iterations are considered stable.

This yields a consensus graph that filters out spurious edges.

In miEdgeR v0.2+, percolation + edge canonicalization + weight hygiene are handled inside percolate_graph().

g_consensus <- percolate_graph(
  graph_percentile,
  n_iterations   = 100,
  edge_fraction  = 0.6,
  freq_threshold = 0.5,
  percentile     = 0.9
)

cat("Consensus vertices:", vcount(g_consensus), "\n")

## Consensus vertices: 2924

cat("Consensus edges:", ecount(g_consensus), "\n")

## Consensus edges: 41999

cat("Consensus density:", graph.density(g_consensus), "\n")

## Consensus density: 0.009827946

Community Detection

We use detect_communities() which performs community detection and overlap expansion consistently.

set.seed(123)

comm_out <- detect_communities(
  g_consensus,
  min_size          = 10,
  overlap_threshold = 0.2
)

# v0.2 may return either:
# - a plain list of communities
# - or a richer list with $communities, $bridge, $gene2mod

# v0.2 canonical structure
stopifnot("groups" %in% names(comm_out))

large_communities <- comm_out$groups

cat("Communities (size >= 10):", length(large_communities), "\n")

## Communities (size >= 10): 2

cat("Sizes:", sapply(large_communities, length), "\n")

## Sizes: 23 969

Clustering coefficient

Compute additional network metrics for the consensus graph.

clustering_coeff <- transitivity(g_consensus, type = "global")
cat("Global clustering coefficient:", clustering_coeff, "\n")

## Global clustering coefficient: 0.4478631

degree_centrality      <- degree(g_consensus)
betweenness_centrality <- betweenness(g_consensus)
closeness_centrality   <- closeness(g_consensus)

top_degree      <- names(sort(degree_centrality, decreasing = TRUE))[1:10]
top_betweenness <- names(sort(betweenness_centrality, decreasing = TRUE))[1:10]
top_closeness   <- names(sort(closeness_centrality, decreasing = TRUE))[1:10]

centrality_table <- data.frame(
  Metric = c("Degree", "Betweenness", "Closeness"),
  Top_Genes = c(
    paste(top_degree, collapse = ", "),
    paste(top_betweenness, collapse = ", "),
    paste(top_closeness, collapse = ", ")
  ),
  stringsAsFactors = FALSE
)

ft_centrality <- flextable::flextable(centrality_table) %>%
  flextable::autofit() %>%
  flextable::set_header_labels(Metric = "Centrality Metric", Top_Genes = "Top 10 Genes") %>%
  flextable::bg(part = "header", bg = "gray") %>%
  flextable::theme_vanilla()

ft_centrality

Centrality Metric	Top 10 Genes
Degree	VPS13C, SENP6, ATRX, COP1, BIRC6, USP34, PIAS1, DOCK2, SETX, GNB1
Betweenness	VPS13C, SENP6, FTH1, RHBDF2, GNB1, COP1, FTL, ATRX, ALOX5, SETX
Closeness	SPP1, AC131944.1, HSPA1A, HSPA1B, MTRNR2L8, MTRNR2L1, LINC01320, DHFR, MTRNR2L12, VPS13C

Network Visualization

Visualize a subset of the MI network. Nodes are colored by community, and top hub genes per community are labeled.

plot_mi_network(g_consensus, large_communities, title = "MG MI Network Communities")

network plot of MG communities

Hypergraph Visualization

To visualize overlapping gene communities, we constructed a hypergraph where genes are connected to their assigned communities. Overlapping genes (shared across communities) are highlighted in red, community nodes in green, and unique genes in blue.

plot_hypergraph(
  large_communities,
  comm_indices = c(1, 2),
  title        = "MG Hypergraph with Overlapping Communities",
  key_genes    = NULL
)

Hypergraph visualization of overlapping communities

Community Sizes

Visualize community sizes.

comm_sizes_df <- data.frame(
  Community = paste("Community", seq_along(large_communities)),
  Size      = sapply(large_communities, length)
)

p_comm_sizes <- ggplot(comm_sizes_df, aes(x = Community, y = Size, fill = Community)) +
  geom_bar(stat = "identity") +
  labs(title = "Community Sizes (MG)", x = "Community", y = "Size") +
  theme_minimal() +
  theme(legend.position = "none")

print(p_comm_sizes)

bar plot showing community sizes

Hub Genes Table from Full Graph

Generate a table of top hub genes (by degree) for each community using the full graph.

summarize_hub_genes(graph_percentile, large_communities, cluster_name = "MG")

Cluster	Community	Top Hub Genes
MG	Community 1	FTH1, FTL, TMSB10, PFN1, TMSB4X, ATP5F1E, PTMA, OAZ1, YBX1, HSP90AB1
MG	Community 2	SENP6, USP34, GNB1, RAP1A, VPS13C, COP1, CEP170, DOCK2, SETX, ALOX5

We identified top hub genes within each community by ranking nodes based on degree centrality. Using summarize_hub_genes(). Key genes extracted potentially involved in core regulatory processes.

Hub Genes (Percolated)

Identify top hub genes for communities from percolated graph.

summarize_hub_genes(g_consensus, large_communities, cluster_name = "MG")

Cluster	Community	Top Hub Genes
MG	Community 1	FTH1, FTL, TMSB10, TMSB4X, ATP5F1E, HSP90AA1, PTMA, TPT1, PFN1, NACA
MG	Community 2	VPS13C, SENP6, ATRX, COP1, BIRC6, USP34, PIAS1, DOCK2, SETX, GNB1

GO Enrichment Analysis

Perform enrichment for communities after filtering of housekeeping genes which tend to be over-represented due to their high mi-scores

# Perform enrichment
enrich_results <- enrich_go(large_communities, ontology = "BP")

# Plot the top 4 terms per community
dot_plot <- plot_go_terms(enrich_results, top_n_terms = 5, ontology = "BP")
print(dot_plot)

Dot plot of enriched terms

## Save Results

Save Results

# saveRDS(result, file = "mi_network_results.rds")

sessionInfo()

## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_SG.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_SG.UTF-8        LC_COLLATE=en_SG.UTF-8    
##  [5] LC_MONETARY=en_SG.UTF-8    LC_MESSAGES=en_SG.UTF-8   
##  [7] LC_PAPER=en_SG.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_SG.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Asia/Singapore
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] flextable_0.9.7        ggraph_2.2.1           miEdgeR_0.2.0         
##  [4] pheatmap_1.0.12        ggVennDiagram_1.5.2    ggpubr_0.6.0          
##  [7] org.Hs.eg.db_3.18.0    AnnotationDbi_1.64.1   IRanges_2.36.0        
## [10] S4Vectors_0.40.2       Biobase_2.62.0         BiocGenerics_0.48.1   
## [13] clusterProfiler_4.10.1 ggplot2_3.5.1          dplyr_1.1.4           
## [16] igraph_2.1.1           SeuratObject_5.0.2     Seurat_4.3.0.1        
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.6                matrixStats_1.4.1       spatstat.sparse_3.1-0  
##   [4] bitops_1.0-9            enrichplot_1.22.0       HDO.db_0.99.1          
##   [7] httr_1.4.7              RColorBrewer_1.1-3      tools_4.3.2            
##  [10] sctransform_0.4.1       backports_1.5.0         utf8_1.2.4             
##  [13] R6_2.5.1                lazyeval_0.2.2          uwot_0.2.2             
##  [16] withr_3.0.2             sp_2.1-4                gridExtra_2.3          
##  [19] progressr_0.15.0        cli_3.6.5               textshaping_0.4.0      
##  [22] spatstat.explore_3.3-3  scatterpie_0.2.4        officer_0.6.7          
##  [25] labeling_0.4.3          sass_0.4.9              S7_0.2.0               
##  [28] spatstat.data_3.1-2     ggridges_0.5.6          pbapply_1.7-2          
##  [31] askpass_1.2.1           systemfonts_1.3.1       yulab.utils_0.2.0      
##  [34] gson_0.1.0              DOSE_3.28.2             parallelly_1.39.0      
##  [37] RSQLite_2.3.7           generics_0.1.3          gridGraphics_0.5-1     
##  [40] ica_1.0-3               spatstat.random_3.3-2   zip_2.3.1              
##  [43] car_3.1-3               GO.db_3.18.0            Matrix_1.6-4           
##  [46] ggbeeswarm_0.7.2        fansi_1.0.6             abind_1.4-8            
##  [49] infotheo_1.2.0.1        lifecycle_1.0.4         yaml_2.3.10            
##  [52] carData_3.0-5           qvalue_2.34.0           Rtsne_0.17             
##  [55] grid_4.3.2              blob_1.2.4              promises_1.3.0         
##  [58] crayon_1.5.3            miniUI_0.1.2            lattice_0.21-9         
##  [61] cowplot_1.1.3           KEGGREST_1.42.0         pillar_1.9.0           
##  [64] knitr_1.48              fgsea_1.28.0            future.apply_1.11.3    
##  [67] codetools_0.2-19        fastmatch_1.1-4         leiden_0.4.3.1         
##  [70] glue_1.8.0              ggfun_0.1.7             spatstat.univar_3.1-1  
##  [73] fontLiberation_0.1.0    data.table_1.16.2       vctrs_0.6.5            
##  [76] png_0.1-8               treeio_1.26.0           spam_2.11-0            
##  [79] gtable_0.3.6            cachem_1.1.0            xfun_0.49              
##  [82] mime_0.12               tidygraph_1.3.1         survival_3.5-7         
##  [85] fitdistrplus_1.2-1      ROCR_1.0-11             nlme_3.1-163           
##  [88] ggtree_3.10.1           bit64_4.5.2             fontquiver_0.2.1       
##  [91] RcppAnnoy_0.0.22        GenomeInfoDb_1.38.8     bslib_0.8.0            
##  [94] irlba_2.3.5.1           vipor_0.4.7             KernSmooth_2.23-22     
##  [97] colorspace_2.1-1        DBI_1.2.3               ggrastr_1.0.2          
## [100] tidyselect_1.2.1        bit_4.5.0               compiler_4.3.2         
## [103] xml2_1.3.6              fontBitstreamVera_0.1.1 plotly_4.10.4          
## [106] shadowtext_0.1.4        scales_1.3.0            lmtest_0.9-40          
## [109] stringr_1.5.1           digest_0.6.37           goftest_1.2-3          
## [112] spatstat.utils_3.1-1    rmarkdown_2.29          XVector_0.42.0         
## [115] htmltools_0.5.8.1       pkgconfig_2.0.3         highr_0.11             
## [118] fastmap_1.2.0           rlang_1.1.6             htmlwidgets_1.6.4      
## [121] shiny_1.9.1             farver_2.1.2            jquerylib_0.1.4        
## [124] zoo_1.8-12              jsonlite_1.8.9          BiocParallel_1.36.0    
## [127] GOSemSim_2.28.1         RCurl_1.98-1.16         magrittr_2.0.3         
## [130] Formula_1.2-5           GenomeInfoDbData_1.2.11 ggplotify_0.1.2        
## [133] dotCall64_1.2           patchwork_1.3.0         munsell_0.5.1          
## [136] Rcpp_1.0.13-1           ape_5.8                 viridis_0.6.5          
## [139] gdtools_0.4.1           reticulate_1.40.0       stringi_1.8.4          
## [142] zlibbioc_1.48.2         MASS_7.3-60             plyr_1.8.9             
## [145] parallel_4.3.2          listenv_0.9.1           ggrepel_0.9.6          
## [148] deldir_2.0-4            Biostrings_2.70.3       graphlayouts_1.2.0     
## [151] splines_4.3.2           tensor_1.5              uuid_1.2-1             
## [154] spatstat.geom_3.3-3     ggsignif_0.6.4          reshape2_1.4.4         
## [157] evaluate_1.0.1          tweenr_2.0.3            httpuv_1.6.15          
## [160] RANN_2.6.2              tidyr_1.3.1             openssl_2.2.2          
## [163] purrr_1.0.2             polyclip_1.10-7         future_1.34.0          
## [166] scattermore_1.2         ggforce_0.4.2           broom_1.0.7            
## [169] xtable_1.8-4            tidytree_0.4.6          rstatix_0.7.2          
## [172] later_1.3.2             viridisLite_0.4.2       ragg_1.3.3             
## [175] tibble_3.2.1            aplot_0.2.3             beeswarm_0.4.0         
## [178] memoise_2.0.1           cluster_2.1.4           globals_0.16.3