This walkthrough demonstrates how to build mutual-information (MI) networks and hypergraphs from single-cell RNA-seq data using miEdgeR. We extract cluster-specific gene expression, compute pairwise MI, refine networks through percolation and community detection, and finally assemble and visualize a hypergraph of gene modules.
Prerequisites:
R (≥ 4.2.0)
Seurat object with clustering metadata (eg. MG) and optional pseudotime
Packages: miEdgeR, Seurat, igraph, hypergraph, dplyr, ggplot2, ggraph, flextable
This vignette walks through the key steps of MI network construction and higher-order hypergraph analysis. Starting with a Seurat object, we show how to filter for highly variable genes, compute and threshold mutual information (MI), remove noise via percolation, detect communities, and represent those communities as hyperedges.
Note: This vignette requires the Zhou et al Control Cortex dataset Microglia (MG) cluster https://doi.org/10.1038/s41591-019-0695-9.
We load all necessary packages and read in the preprocessed Seurat object. Ensure your object contains a clustering column (MG in seurat demo Zhou et al Control Cortex data) corresponding to cluster IDs.
library(Seurat)
library(miEdgeR)
library(igraph)
library(dplyr)
library(ggplot2)
library(ggraph)
library(flextable)
seurat_obj <- readRDS('Zhou_2020_control.rds')Exploring the preprocessed Zhou et al Control Cortex data. For this tutorial we will focus on cluster “MG”
DimPlot(seurat_obj, group.by='cell_type', label=TRUE) + NoLegend()features <- c(
"RBFOX3", # neurons
"SLC17A7", # excitatory
"GAD1", # inhibitory
"GFAP", # astrocytes
"MBP", # oligodendrocytes
"PDGFRA", # OPCs
"CX3CR1" # microglia
)
VlnPlot(seurat_obj, features = features, group.by = "cell_type", pt.size = 0)# VlnPlot(seurat_obj, features = features)Here we call compute_mi_network(), which under the hood:
You can adjust percentile (e.g. 0.85 for more liberal edges) or supply fixed_threshold for absolute MI cutoffs.
Note: computing mi-network can be computationally demanding. We strongly advice to limit the number of cells and genes to the most variable ones
set.seed(123)
result <- compute_mi_network(
seurat_obj,
cluster_id = "MG",
cluster_field = "cell_type",
assay_name = "RNA",
counts_layer = "counts",
data_layer = "data",
min_expr_pct = 0.05,
top_n_genes = 3000,
n_cores = max(1, parallel::detectCores() - 3)
)
mi_matrix <- result$mi_matrix
graph_percentile <- build_mi_graph(mi_matrix,
threshold_method = "percentile",
percentile = 0.90)
cat("Vertices:", vcount(graph_percentile), "\n")## Vertices: 2924
cat("Edges:", ecount(graph_percentile), "\n")## Edges: 427547
cat("Density:", graph.density(graph_percentile), "\n")## Density: 0.1000478
The resulting MI matrix was filtered using the 90th percentile threshold with build_mi_graph(), producing a graph with 2924 genes (nodes), 427,547 edges, and a density of ~0.1. This indicates substantial gene-gene co-dependence within the cluster.
Real-world networks can be noisy. We apply a simple “percolation” strategy:
This yields a consensus graph that filters out spurious edges.
In miEdgeR v0.2+, percolation + edge canonicalization + weight hygiene are handled inside percolate_graph().
g_consensus <- percolate_graph(
graph_percentile,
n_iterations = 100,
edge_fraction = 0.6,
freq_threshold = 0.5,
percentile = 0.9
)
cat("Consensus vertices:", vcount(g_consensus), "\n")## Consensus vertices: 2924
cat("Consensus edges:", ecount(g_consensus), "\n")## Consensus edges: 41999
cat("Consensus density:", graph.density(g_consensus), "\n")## Consensus density: 0.009827946
We use detect_communities() which performs community detection and overlap expansion consistently.
set.seed(123)
comm_out <- detect_communities(
g_consensus,
min_size = 10,
overlap_threshold = 0.2
)
# v0.2 may return either:
# - a plain list of communities
# - or a richer list with $communities, $bridge, $gene2mod
# v0.2 canonical structure
stopifnot("groups" %in% names(comm_out))
large_communities <- comm_out$groups
cat("Communities (size >= 10):", length(large_communities), "\n")## Communities (size >= 10): 2
cat("Sizes:", sapply(large_communities, length), "\n")## Sizes: 23 969
Compute additional network metrics for the consensus graph.
clustering_coeff <- transitivity(g_consensus, type = "global")
cat("Global clustering coefficient:", clustering_coeff, "\n")## Global clustering coefficient: 0.4478631
degree_centrality <- degree(g_consensus)
betweenness_centrality <- betweenness(g_consensus)
closeness_centrality <- closeness(g_consensus)
top_degree <- names(sort(degree_centrality, decreasing = TRUE))[1:10]
top_betweenness <- names(sort(betweenness_centrality, decreasing = TRUE))[1:10]
top_closeness <- names(sort(closeness_centrality, decreasing = TRUE))[1:10]
centrality_table <- data.frame(
Metric = c("Degree", "Betweenness", "Closeness"),
Top_Genes = c(
paste(top_degree, collapse = ", "),
paste(top_betweenness, collapse = ", "),
paste(top_closeness, collapse = ", ")
),
stringsAsFactors = FALSE
)
ft_centrality <- flextable::flextable(centrality_table) %>%
flextable::autofit() %>%
flextable::set_header_labels(Metric = "Centrality Metric", Top_Genes = "Top 10 Genes") %>%
flextable::bg(part = "header", bg = "gray") %>%
flextable::theme_vanilla()
ft_centralityCentrality Metric | Top 10 Genes |
|---|---|
Degree | VPS13C, SENP6, ATRX, COP1, BIRC6, USP34, PIAS1, DOCK2, SETX, GNB1 |
Betweenness | VPS13C, SENP6, FTH1, RHBDF2, GNB1, COP1, FTL, ATRX, ALOX5, SETX |
Closeness | SPP1, AC131944.1, HSPA1A, HSPA1B, MTRNR2L8, MTRNR2L1, LINC01320, DHFR, MTRNR2L12, VPS13C |
Visualize a subset of the MI network. Nodes are colored by community, and top hub genes per community are labeled.
plot_mi_network(g_consensus, large_communities, title = "MG MI Network Communities")To visualize overlapping gene communities, we constructed a hypergraph where genes are connected to their assigned communities. Overlapping genes (shared across communities) are highlighted in red, community nodes in green, and unique genes in blue.
plot_hypergraph(
large_communities,
comm_indices = c(1, 2),
title = "MG Hypergraph with Overlapping Communities",
key_genes = NULL
)Visualize community sizes.
comm_sizes_df <- data.frame(
Community = paste("Community", seq_along(large_communities)),
Size = sapply(large_communities, length)
)
p_comm_sizes <- ggplot(comm_sizes_df, aes(x = Community, y = Size, fill = Community)) +
geom_bar(stat = "identity") +
labs(title = "Community Sizes (MG)", x = "Community", y = "Size") +
theme_minimal() +
theme(legend.position = "none")
print(p_comm_sizes)Generate a table of top hub genes (by degree) for each community using the full graph.
summarize_hub_genes(graph_percentile, large_communities, cluster_name = "MG")Cluster | Community | Top Hub Genes |
|---|---|---|
MG | Community 1 | FTH1, FTL, TMSB10, PFN1, TMSB4X, ATP5F1E, PTMA, OAZ1, YBX1, HSP90AB1 |
Community 2 | SENP6, USP34, GNB1, RAP1A, VPS13C, COP1, CEP170, DOCK2, SETX, ALOX5 |
We identified top hub genes within each community by ranking nodes based on degree centrality. Using summarize_hub_genes(). Key genes extracted potentially involved in core regulatory processes.
Identify top hub genes for communities from percolated graph.
summarize_hub_genes(g_consensus, large_communities, cluster_name = "MG")Cluster | Community | Top Hub Genes |
|---|---|---|
MG | Community 1 | FTH1, FTL, TMSB10, TMSB4X, ATP5F1E, HSP90AA1, PTMA, TPT1, PFN1, NACA |
Community 2 | VPS13C, SENP6, ATRX, COP1, BIRC6, USP34, PIAS1, DOCK2, SETX, GNB1 |
Perform enrichment for communities after filtering of housekeeping genes which tend to be over-represented due to their high mi-scores
# Perform enrichment
enrich_results <- enrich_go(large_communities, ontology = "BP")# Plot the top 4 terms per community
dot_plot <- plot_go_terms(enrich_results, top_n_terms = 5, ontology = "BP")
print(dot_plot)## Save Results# saveRDS(result, file = "mi_network_results.rds")sessionInfo()## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_SG.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_SG.UTF-8 LC_COLLATE=en_SG.UTF-8
## [5] LC_MONETARY=en_SG.UTF-8 LC_MESSAGES=en_SG.UTF-8
## [7] LC_PAPER=en_SG.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_SG.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Asia/Singapore
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] flextable_0.9.7 ggraph_2.2.1 miEdgeR_0.2.0
## [4] pheatmap_1.0.12 ggVennDiagram_1.5.2 ggpubr_0.6.0
## [7] org.Hs.eg.db_3.18.0 AnnotationDbi_1.64.1 IRanges_2.36.0
## [10] S4Vectors_0.40.2 Biobase_2.62.0 BiocGenerics_0.48.1
## [13] clusterProfiler_4.10.1 ggplot2_3.5.1 dplyr_1.1.4
## [16] igraph_2.1.1 SeuratObject_5.0.2 Seurat_4.3.0.1
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.6 matrixStats_1.4.1 spatstat.sparse_3.1-0
## [4] bitops_1.0-9 enrichplot_1.22.0 HDO.db_0.99.1
## [7] httr_1.4.7 RColorBrewer_1.1-3 tools_4.3.2
## [10] sctransform_0.4.1 backports_1.5.0 utf8_1.2.4
## [13] R6_2.5.1 lazyeval_0.2.2 uwot_0.2.2
## [16] withr_3.0.2 sp_2.1-4 gridExtra_2.3
## [19] progressr_0.15.0 cli_3.6.5 textshaping_0.4.0
## [22] spatstat.explore_3.3-3 scatterpie_0.2.4 officer_0.6.7
## [25] labeling_0.4.3 sass_0.4.9 S7_0.2.0
## [28] spatstat.data_3.1-2 ggridges_0.5.6 pbapply_1.7-2
## [31] askpass_1.2.1 systemfonts_1.3.1 yulab.utils_0.2.0
## [34] gson_0.1.0 DOSE_3.28.2 parallelly_1.39.0
## [37] RSQLite_2.3.7 generics_0.1.3 gridGraphics_0.5-1
## [40] ica_1.0-3 spatstat.random_3.3-2 zip_2.3.1
## [43] car_3.1-3 GO.db_3.18.0 Matrix_1.6-4
## [46] ggbeeswarm_0.7.2 fansi_1.0.6 abind_1.4-8
## [49] infotheo_1.2.0.1 lifecycle_1.0.4 yaml_2.3.10
## [52] carData_3.0-5 qvalue_2.34.0 Rtsne_0.17
## [55] grid_4.3.2 blob_1.2.4 promises_1.3.0
## [58] crayon_1.5.3 miniUI_0.1.2 lattice_0.21-9
## [61] cowplot_1.1.3 KEGGREST_1.42.0 pillar_1.9.0
## [64] knitr_1.48 fgsea_1.28.0 future.apply_1.11.3
## [67] codetools_0.2-19 fastmatch_1.1-4 leiden_0.4.3.1
## [70] glue_1.8.0 ggfun_0.1.7 spatstat.univar_3.1-1
## [73] fontLiberation_0.1.0 data.table_1.16.2 vctrs_0.6.5
## [76] png_0.1-8 treeio_1.26.0 spam_2.11-0
## [79] gtable_0.3.6 cachem_1.1.0 xfun_0.49
## [82] mime_0.12 tidygraph_1.3.1 survival_3.5-7
## [85] fitdistrplus_1.2-1 ROCR_1.0-11 nlme_3.1-163
## [88] ggtree_3.10.1 bit64_4.5.2 fontquiver_0.2.1
## [91] RcppAnnoy_0.0.22 GenomeInfoDb_1.38.8 bslib_0.8.0
## [94] irlba_2.3.5.1 vipor_0.4.7 KernSmooth_2.23-22
## [97] colorspace_2.1-1 DBI_1.2.3 ggrastr_1.0.2
## [100] tidyselect_1.2.1 bit_4.5.0 compiler_4.3.2
## [103] xml2_1.3.6 fontBitstreamVera_0.1.1 plotly_4.10.4
## [106] shadowtext_0.1.4 scales_1.3.0 lmtest_0.9-40
## [109] stringr_1.5.1 digest_0.6.37 goftest_1.2-3
## [112] spatstat.utils_3.1-1 rmarkdown_2.29 XVector_0.42.0
## [115] htmltools_0.5.8.1 pkgconfig_2.0.3 highr_0.11
## [118] fastmap_1.2.0 rlang_1.1.6 htmlwidgets_1.6.4
## [121] shiny_1.9.1 farver_2.1.2 jquerylib_0.1.4
## [124] zoo_1.8-12 jsonlite_1.8.9 BiocParallel_1.36.0
## [127] GOSemSim_2.28.1 RCurl_1.98-1.16 magrittr_2.0.3
## [130] Formula_1.2-5 GenomeInfoDbData_1.2.11 ggplotify_0.1.2
## [133] dotCall64_1.2 patchwork_1.3.0 munsell_0.5.1
## [136] Rcpp_1.0.13-1 ape_5.8 viridis_0.6.5
## [139] gdtools_0.4.1 reticulate_1.40.0 stringi_1.8.4
## [142] zlibbioc_1.48.2 MASS_7.3-60 plyr_1.8.9
## [145] parallel_4.3.2 listenv_0.9.1 ggrepel_0.9.6
## [148] deldir_2.0-4 Biostrings_2.70.3 graphlayouts_1.2.0
## [151] splines_4.3.2 tensor_1.5 uuid_1.2-1
## [154] spatstat.geom_3.3-3 ggsignif_0.6.4 reshape2_1.4.4
## [157] evaluate_1.0.1 tweenr_2.0.3 httpuv_1.6.15
## [160] RANN_2.6.2 tidyr_1.3.1 openssl_2.2.2
## [163] purrr_1.0.2 polyclip_1.10-7 future_1.34.0
## [166] scattermore_1.2 ggforce_0.4.2 broom_1.0.7
## [169] xtable_1.8-4 tidytree_0.4.6 rstatix_0.7.2
## [172] later_1.3.2 viridisLite_0.4.2 ragg_1.3.3
## [175] tibble_3.2.1 aplot_0.2.3 beeswarm_0.4.0
## [178] memoise_2.0.1 cluster_2.1.4 globals_0.16.3