GSE183973 DESeq2

Mikias HW

2024-03-15

Work in progress (https://github.com/MikiasHWT/Bulk-RNA-Seq)

Purpose : Comparison of RNA expression of three innate immune cell populations across three patient groups.

Load libraries

library(prettydoc)
# library(GEOquery) # import datasets from GEO (unused)
library(DESeq2) # Diverential gene expression analysis
library(vsn) # Transformation
library(apeglm) # log fold shrinking (adaptive t prior shrinkage estimator)
library(ashr) # log fold shrinking (adaptive shrinkage estimator)
library(pheatmap) # heatmaps
library(RColorBrewer) # Color themes
library(EnhancedVolcano) # Pleasing volcano plots
library(gridExtra) # GGplot extention
library(DT) # nice data tables
library(SparseArray)
library(ggvenn)
library(qvalue)

library(org.Mm.eg.db) # Mouse database
library(org.Hs.eg.db) # Human database
library(DOSE)
library(pathview)
library(clusterProfiler) # over representation/gene enrichment
library(AnnotationHub)
library(ensembldb)
library(enrichplot)
library(biomaRt)
library(ReactomePA)
library(ggupset)
library(cowplot)
library(europepmc)
library(ggstance)
# library(GO.db) # Gene ontology
# library(GOstats) # Gene ontology
library(tidyverse) # data wrangling & visualization
library(conflicted)

Conflicted functions

# Set function preferences
conflicts_prefer(dplyr::select)

conflicts_prefer(dplyr::filter)

conflicts_prefer(dplyr::slice)

conflicts_prefer(SparseArray::rowSds)

Define MA Plotting function MA plot visualizes relationships between log ratio & mean values of two variables: • “M” = minus in the log scale. Log ratios on (y) axis. • “A” = average in the log scale. The mean values on (x) axis.

# MA Plots
# Arranged into 2 rows
# Color & translucency by DEG
# Plot titles fed in as arguments
# Mean counts (x-axis) Log10 transformed

plotMA_arrange_iterative <- function(res_list) {
  plots <- list()

  for (i in seq_along(res_list)) {
    res <- res_list[[i]]
    res_name <- names(res_list)[i] # Get the name of the res object
    p <- plotMA(res, returnData = TRUE) # DESeq2's PlotMA function
    p_plot <- ggplot(p, aes(x = mean, y = lfc, color = isDE)) +
      geom_point(aes(alpha = ifelse(isDE == "TRUE", 1, 0.1)), show.legend = FALSE) + # Reduce non DEG genes transperancy
      scale_x_continuous(trans = "log10") + # Scale mean counts
      ylim(-2, 2) +
      labs(title = res_name) # Use the name in the plot title

    plots[[i]] <- p_plot
  }

  do.call(grid.arrange, c(plots, nrow = 2))
}

Define Heatmap plotting function

Preparation

Note: Using GEOquery will require some troubleshooting as the GSE submission for this dataset lacked feature information. Instead the CSV files for gene counts and metadata can be download directly from GEO and saved locally.

Load data

# Load CSV's for gene counts & metadata. Retain untidy names for ease
counts <- read.csv("Data/GSE183973_bulkRNA_gene_counts.csv",
  check.names = FALSE
)

metadata <- read.csv("Data/GSE183973_metadata_samples.csv",
  check.names = FALSE
)

head(counts)
head(metadata)

Initially, Two sets of variables are considered for this analysis.

Cell types: mono, ma, mreg

Patient Groups: smoker, non_smoker, copd

There are 9 patients evenly spread across the 3 groups. Each patient contributing Blood Monocytes (mono), Alveolar Macrophages (ma), and Regulatory Macrophages (mreg), the latter two being extracted from bronchoalveolar lavage fluid. Cells were isolated by Flow Cytometry and sequenced in Bulk.

Clean up

# Set conditions variables as factors
metadata$patient_group <- factor(metadata$patient_group)

metadata$cell_type <- factor(metadata$cell_type)

# Define missing column names
colnames(counts)[1] <- "genes"

row.names(counts) <- counts$genes

colnames(metadata)[1] <- "samples"

Match samples

# Remove genes column (gene names retained as index)
# Assign to new df to retain genes column for later data exploration
geneCounts <- counts |>
  select(-genes)

# Match counts column names and metadata row names
desired_order <- metadata$samples

geneCounts <- geneCounts[, desired_order]

# Confirm match
all(colnames(geneCounts) %in% metadata$samples)
## [1] TRUE
all(colnames(geneCounts) == metadata$samples)
## [1] TRUE

Construct DESeqDataSet object

# Define design matrix for DESeq2
dds <- DESeqDataSetFromMatrix(
  countData = geneCounts,
  colData = metadata,
  design = ~ cell_type + patient_group
)

# normalization by estimating size factor
dds <- estimateSizeFactors(dds)

# remove low expressed genes
keep <- rowSums(counts(dds)) >= 10

dds <- dds[keep, ]

# set factor level (reference group) : Im interested in comparing copd and smokers to the reference group non-smokers in this case.
dds$patient_group <- relevel(dds$patient_group, ref = "non_smoker")

# Run DEG analysis
dds <- DESeq(dds)
## using pre-existing size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
dds
## class: DESeqDataSet 
## dim: 21090 27 
## metadata(1): version
## assays(4): counts mu H cooks
## rownames(21090): WASH7P RP11-34P13.7 ... RP11-65G9.1 PARP4P1
## rowData names(34): baseMean baseVar ... deviance maxCooks
## colnames(27): 10.smoker.ag.mono_NGS19.K315 11.smoker.ag.ma_NGS19.K316
##   ... 8.smoker.lc.ma_NGS19.K313 9.smoker.lc.mreg_NGS19.K314
## colData names(5): samples patient_group patient_initials cell_type
##   sizeFactor
# lists the contracts
resultsNames(dds)
## [1] "Intercept"                          "cell_type_mono_vs_ma"              
## [3] "cell_type_mreg_vs_ma"               "patient_group_copd_vs_non_smoker"  
## [5] "patient_group_smoker_vs_non_smoker"

Results

Note U-shaped p-value histograms-can indicate that a one-sided test was performed on data where there is signal in both directions, or it can indicate that there is dependence among the variables in the data.

Might be worth checking cell_type:patient_group interaction term. As Pvalue histogram has a little Ushape to it.

View results of default DEG analysis

# defaul results with padj = 0.1
res <- results(dds)

# Specifying a strict FDR (p-adj) cutoff for posterity. (default is 0.1)
res05 <- results(dds, alpha = 0.05)

# Default contrast will be "smoker" vs reference group "non_smoker"
head(as.data.frame(res05))
summary(res05)
## 
## out of 21090 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 86, 0.41%
## LFC < 0 (down)     : 102, 0.48%
## outliers [1]       : 60, 0.28%
## low counts [2]     : 4078, 19%
## (mean count < 2)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
# Count of DEG's & display pvalue histograms
sum(res$padj < 0.1, na.rm = TRUE)
## [1] 311
hist(res$pvalue)

# Set a more strict FDR cutoff
sum(res05$padj < 0.05, na.rm = TRUE)
## [1] 188
hist(res05$pvalue)

FDR estimations

qobj <- qvalue(res$pvalue)

summary(qobj)
## 
## Call:
## qvalue(p = res$pvalue)
## 
## pi0: 1   
## 
## Cumulative number of significant calls:
## 
##           <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1    <1
## p-value       97    240   698   1103  1673 2724 21030
## q-value        9     18    74    113   169  258 21030
## local FDR      8     10    38     67    89  143  2155
hist(qobj)

plot(qobj)

Test interaction term

# dds2 <- DESeqDataSetFromMatrix(
#   countData = geneCounts,
#   colData = metadata,
#   design = ~ cell_type + patient_group + cell_type:patient_group
# )
#
# dds2 <- estimateSizeFactors(dds2)
#
# keep <- rowSums(counts(dds2)) >= 10
#
# dds2 <- dds2[keep, ]
#
# dds2$patient_group <- relevel(dds2$patient_group, ref = "non_smoker")
#
# dds2 <- DESeq(dds2)
#
# resultsNames(dds2)
#
# res105 <- results(dds2, alpha = 0.1)
#
# summary(res105)
#
# sum(res105$padj < 0.1, na.rm = TRUE)
#
# hist(res105$pvalue)
#
# qobj2 <- qvalue(res105$pvalue)
#
# summary(qobj2)
#
# hist(qobj2)
#
# plot(qobj2)

Viewing DEG’s alternate contracts

# Next im curious about seeing the remaining group comparisons

# COPD vs Non-Smoker
resCOPD <- results(dds, alpha = 0.05, contrast = c("patient_group", "copd", "non_smoker"))

head(as.data.frame(resCOPD))
summary(resCOPD)
## 
## out of 21090 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 779, 3.7%
## LFC < 0 (down)     : 704, 3.3%
## outliers [1]       : 60, 0.28%
## low counts [2]     : 5699, 27%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
sum(resCOPD$padj < 0.05, na.rm = TRUE)
## [1] 1483
hist(resCOPD$pvalue)

qobj <- qvalue(resCOPD$pvalue)

summary(qobj)
## 
## Call:
## qvalue(p = resCOPD$pvalue)
## 
## pi0: 0.7801735   
## 
## Cumulative number of significant calls:
## 
##           <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1    <1
## p-value      327    791  2011   2934  3970 5580 21021
## q-value       60    154   463    926  1473 2317 21030
## local FDR     39     96   273    476   799 1257 11013
hist(qobj)

plot(qobj)

# COPD vs Smoker
resWild <- results(dds, alpha = 0.05, contrast = c("patient_group", "copd", "smoker"))

head(as.data.frame(resWild))
summary(resWild)
## 
## out of 21090 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 434, 2.1%
## LFC < 0 (down)     : 357, 1.7%
## outliers [1]       : 60, 0.28%
## low counts [2]     : 8145, 39%
## (mean count < 13)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
sum(resWild$padj < 0.05, na.rm = TRUE)
## [1] 791
hist(resWild$pvalue)

qobj <- qvalue(resWild$pvalue)

summary(qobj)
## 
## Call:
## qvalue(p = resWild$pvalue)
## 
## pi0: 0.7705233   
## 
## Cumulative number of significant calls:
## 
##           <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1    <1
## p-value      167    472  1578   2493  3536 5112 21023
## q-value       21     36   174    327   708 1539 21030
## local FDR     13     25    88    187   356  785 19802
hist(qobj)

plot(qobj)

Log fold Skrink DEG’s

Aids in visualization and gene ranking

# Adaptive t prior shrinkage estimator
resLFC <- lfcShrink(dds,
  coef = "patient_group_smoker_vs_non_smoker",
  type = "apeglm"
)
## using 'apeglm' for LFC shrinkage. If used in published research, please cite:
##     Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
##     sequence count data: removing the noise and preserving large differences.
##     Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
# Original DESeq2 shrinkage estimator, an adaptive Normal distribution as prior.
resNorm <- lfcShrink(dds,
  coef = "patient_group_smoker_vs_non_smoker",
  type = "normal"
)
## using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).
## 
## Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
## See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
## Reference: https://doi.org/10.1093/bioinformatics/bty895
# Adaptive shrinkage estimator. Fits a mixture of Normal distributions to form the prior
resAsh <- lfcShrink(dds,
  coef = "patient_group_smoker_vs_non_smoker",
  type = "ashr"
)
## using 'ashr' for LFC shrinkage. If used in published research, please cite:
##     Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
##     https://doi.org/10.1093/biostatistics/kxw041

Subset results

# Extract differentially expressed genes
subRes05 <- subset(res, padj < 0.05)

subRes01 <- subset(res, padj < 0.01)

subRes005 <- subset(res, padj < 0.005)

# DEG05 <- subRes05@rownames
#
# DEG01 <- subRes01@rownames
#
# DEG005 <- subRes005@rownames

Plot subset results

# counts |>
#   filter(genes %in% DEG) |>
#   ggplot(aes(x = samples, y = genes, fill = counts)) +
#   geom_tile() +
#   scale_fill_gradient(low = "white", high = "red") +
#   theme(axis.text.x = element_text(angle = 45, hjust = 1))

Visualize

Visualize DEG results

# Visualize log2 fold changes of a given variable over the mean of normalized counts for all samples

res_list <- list(
  "padj=0.1" = res,
  "padj=0.05" = res05,
  "Log fold shrunk padj=0.1" = resLFC
)

plotMA_arrange_iterative(res_list)
## Warning: Removed 357 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 357 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 38 rows containing missing values or values outside the scale range
## (`geom_point()`).

Visualizing alternate contrasts

res_list <- list(
  "Smoker vs Non_Smoker DEG's" = res,
  "COPD v Non Smoker DEG's" = resCOPD,
  "COPD v Smoker DEG's" = resWild
)

plotMA_arrange_iterative(res_list)
## Warning: Removed 357 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 455 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 364 rows containing missing values or values outside the scale range
## (`geom_point()`).

Visualize Log Fold Shrunken MA Plots

# Plot log fold shrunken DEG's
res_list <- list(
  "apeglm" = resLFC,
  "normal" = resNorm,
  "ashr" = resAsh
)

plotMA_arrange_iterative(res_list)
## Warning: Removed 38 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 18 rows containing missing values or values outside the scale range
## (`geom_point()`).

visualize subset results

# Plot log fold shrunken DEG's
res_list <- list(
  "Subset 0.05" = subRes05,
  "Subset 0.01" = subRes01,
  "Subset 0.005" = subRes005
)

plotMA_arrange_iterative(res_list)
## Warning: Removed 58 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_point()`).

Note: If there is unwanted variation present in the data (e.g. batch effects) it is always recommend to correct for this, which can be accommodated in DESeq2 by including in the design any known batch variables or by using functions/packages such as svaseq in sva (Leek 2014) or the RUV functions in RUVSeq (Risso et al. 2014) to estimate variables that capture the unwanted variation. In addition, the ashr developers have a specific method for accounting for unwanted variation in combination with ashr (Gerard and Stephens 2017).

Transform

Raw gene variance

Raw RNA sequencing counts have a high degree of variability; which increases with a genes average expression level. This variability needs to be accounted before some downstream analysis such as visualization and clustering. We do this by transforming the data such that the variance is stabilized across the entire spectrum of expression values, allowing for use of statistical models/methods where that is a requirement.

# Filter out low expressed genes
keep <- rowSums(geneCounts) > 10

filtCounts <- as.matrix(geneCounts[keep, ])

# Visualize gene expression values by sample
boxplot(filtCounts, main = "Raw counts", ylab = "Gene Expression", las = 2)

# Mean expression vs standard deviation (SD)
plot(rowMeans(filtCounts), rowSds(filtCounts),
  main = "Raw counts: Mean vs SD",
  xlim = c(0, 10000),
  ylim = c(0, 5000)
)

# Save as ggplot
Raw <- ggplot(filtCounts, aes(
  x = rowMeans(filtCounts),
  y = rowSds(filtCounts)
)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Raw", x = "Mean", y = "SD")

Various Transformations

Log2 fold transformation were the standard in the past, but they can result in over representation of low expressed genes. As such various other methods are implemented. The goal is that the standard deviation stays consistent across all mean values of gene expression. vst and rlog have pros and cons determined by the number of samples.

# Get log2 counts
logcounts <- log2(filtCounts + 1)

# Plot mean vs SD
Log2 <- ggplot(logcounts, aes(
  x = rowMeans(logcounts),
  y = rowSds(logcounts)
)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Log2", x = "Mean", y = "SD")

# Variance stabilizing transformation
vst_counts <- vst(filtCounts)

# Plot
VST <- ggplot(vst_counts, aes(
  x = rowMeans(vst_counts),
  y = rowSds(vst_counts)
)) +
  geom_point() +
  geom_smooth() +
  labs(title = "VST", x = "Mean", y = "SD")

# Rlog counts
rlog_counts <- rlog(filtCounts)

# Plot
Rlog <- ggplot(rlog_counts, aes(
  x = rowMeans(rlog_counts),
  y = rowSds(rlog_counts)
)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Rlog", x = "Mean", y = "SD")

# Arrange Plots
grid.arrange(Raw, Log2, VST, Rlog, nrow = 2)
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

# select <- order(rowMeans(rlog_counts), decreasing = TRUE)[1:20]
#
# df <- as.data.frame(metadata[, c("patient_group", "cell_type")])
#
# pheatmap(rlog_counts[select,], main = "rlog",
#          cluster_rows = TRUE, show_rownames = TRUE,
#          cluster_cols = TRUE, show_colnames = FALSE)
# Variance stabilizing transformation. (log2 scaling for large counts)
vsd <- vst(dds, blind = FALSE)

rld <- rlog(dds, blind = FALSE)

ntd <- normTransform(dds)


#
# meanSdPlot(assay(ntd))
#
# meanSdPlot(assay(vsd))
#
# meanSdPlot(assay(rld))

Quality Assesment

Quality assesment with Heatmaps

select <- order(rowMeans(counts(dds, normalized = TRUE)),
  decreasing = TRUE
)[1:20]

df <- as.data.frame(colData(dds)[, c("patient_group", "cell_type")])

pheatmap(assay(ntd)[select, ],
  cluster_rows = TRUE, show_rownames = TRUE,
  cluster_cols = TRUE, annotation_col = df, show_colnames = FALSE,
  main = "ntd", scale = "row"
)

pheatmap(assay(vsd)[select, ],
  cluster_rows = TRUE, show_rownames = TRUE,
  cluster_cols = TRUE, annotation_col = df, show_colnames = FALSE,
  main = "vsd", scale = "row"
)

pheatmap(assay(rld)[select, ],
  cluster_rows = TRUE, show_rownames = TRUE,
  cluster_cols = TRUE, annotation_col = df, show_colnames = FALSE,
  main = "rld", scale = "row"
)

Sample distance

# Determine between group variety
sampleDists <- dist(t(assay(vsd)))

sampleDistMatrix <- as.matrix(sampleDists)

rownames(sampleDistMatrix) <- paste(vsd$cell_type, vsd$patient_group, sep = "-")

colnames(sampleDistMatrix) <- NULL

colors <- colorRampPalette(rev(brewer.pal(9, "Blues")))(255)

pheatmap(sampleDistMatrix,
  clustering_distance_rows = sampleDists,
  clustering_distance_cols = sampleDists,
  col = colors
)

PCA plots

plotPCA(vsd, intgroup = c("cell_type", "patient_group"))

plotPCA(vsd, intgroup = c("cell_type"))

pcaData <- plotPCA(vsd,
  intgroup = c("cell_type", "patient_group"),
  returnData = TRUE
)

percentVar <- round(100 * attr(pcaData, "percentVar"))

ggplot(pcaData, aes(PC1, PC2, color = patient_group, shape = cell_type)) +
  geom_point(size = 3) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance")) +
  coord_fixed()

Plot Counts

# Plot the gene with the smallest p adj value across specified groupings
c1 <- plotCounts(dds, gene = which.min(res$padj), intgroup = c("cell_type"), returnData = TRUE)

c1_plot <- ggplot(c1, aes(x = cell_type, y = count)) +
  geom_point(position = position_jitter(w = 0.1, h = 0)) +
  scale_y_log10(breaks = c(25, 100, 400))

c2 <- plotCounts(dds, gene = which.min(res$padj), intgroup = c("patient_group"), returnData = TRUE)

c2_plot <- ggplot(c2, aes(x = patient_group, y = count)) +
  geom_point(position = position_jitter(w = 0.1, h = 0)) +
  scale_y_log10(breaks = c(25, 100, 400))

c3 <- plotCounts(dds, gene = which.min(res$padj), intgroup = c("cell_type", "patient_group"), returnData = TRUE)

c3_plot <- ggplot(c3, aes(x = patient_group, y = count, color = cell_type)) +
  geom_point(position = position_jitter(w = 0.1, h = 0)) +
  scale_y_log10(breaks = c(25, 100, 400))

gridExtra::grid.arrange(c1_plot, c2_plot, c3_plot, nrow = 2)

# plot chosen genes per chosen conditions
plotCounts(dds, gene = which.min(res$padj), intgroup = "patient_group")

plotCounts(dds, gene = which.min(res$padj), intgroup = c("patient_group", "cell_type"))

plotCounts(dds, gene = "CD101", intgroup = "patient_group")

p <- plotCounts(dds,
  gene = which.min(res$padj),
  intgroup = c("patient_group", "cell_type"),
  returnData = TRUE
)

ggplot(p, aes(x = patient_group, y = count, color = cell_type)) +
  geom_point(position = position_jitter(w = 0.1, h = 0), size = 3) +
  scale_y_log10(breaks = c(25, 100, 400))

Exploratory Data Analysis

Merge counts & Metadata for Data exploration

# Pivot DF for exploratory data analysis
countsLong <- counts |>
  pivot_longer(cols = !genes, names_to = "samples", values_to = "counts")

head(countsLong)
# Annotate counts with metadata
counts <- countsLong |>
  left_join(metadata, by = c("samples" = "samples"))

head(counts)

Random EDA

# barplot
counts |>
  filter(genes == "CD101") |>
  ggplot(aes(x = samples, y = counts, fill = cell_type)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# density plot
counts |>
  filter(genes == "FTL") |>
  ggplot(aes(x = counts, fill = cell_type)) +
  geom_density(alpha = 0.5)

# boxplot
counts |>
  filter(genes == "LYZ") |>
  ggplot(aes(x = cell_type, y = counts)) +
  geom_boxplot()

# voilinplot
counts |>
  filter(genes == "CD74") |>
  ggplot(aes(x = cell_type, y = counts)) +
  geom_violin()

# scatterplot
counts |>
  filter(genes == "FTL" | genes == "LYZ") |>
  pivot_wider(names_from = genes, values_from = counts) |>
  ggplot(aes(x = FTL, y = LYZ, color = cell_type)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

# heatmap
genesOfInterest <- c("FTL", "FN1", "CD74", "LYZ")

counts |>
  filter(genes %in% genesOfInterest) |>
  ggplot(aes(x = samples, y = genes, fill = counts)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Volcano Plot

EnhancedVolcano(res,
  lab = rownames(res),
  x = "log2FoldChange",
  y = "pvalue",
  title = "Smoker vs Non-smoker",
  pCutoff = 1e-05,
  FCcutoff = 0.5,
  pointSize = 2,
  labSize = 6.0,
  colAlpha = 0.3
)

resLFC_DF <- as.data.frame(resLFC) |>
  rownames_to_column("GeneID")

resLFC_DF <- resLFC_DF |>
  mutate(`-log10(pvalue)` = -log10(pvalue))

ggplot(resLFC_DF, aes(x = log2FoldChange, y = `-log10(pvalue)`)) +
  geom_point(aes(colour = padj < 0.05), size = 1) +
  geom_text(data = ~ top_n(.x, 5, wt = -padj), aes(label = GeneID)) +
  labs(title = "Smoker vs NonSmoker")
## Warning: Removed 60 rows containing missing values or values outside the scale range
## (`geom_point()`).

Subset results

# Extract differentially expressed genes
DE_genes <- subset(res, padj < 0.05)

summary(DE_genes)
## 
## out of 189 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 86, 46%
## LFC < 0 (down)     : 103, 54%
## outliers [1]       : 0, 0%
## low counts [2]     : 0, 0%
## (mean count < 2)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
DEG <- DE_genes@rownames

Plot subset results

plotMA(DE_genes)

counts |>
  filter(genes %in% DEG) |>
  ggplot(aes(x = samples, y = genes, fill = counts)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Exporting results

# order results by pvalue
resOrdered <- res[order(res$pvalue), ]

resOrdered
## log2 fold change (MLE): patient group smoker vs non smoker 
## Wald test p-value: patient group smoker vs non smoker 
## DataFrame with 21090 rows and 6 columns
##                baseMean log2FoldChange     lfcSE       stat      pvalue
##               <numeric>      <numeric> <numeric>  <numeric>   <numeric>
## HLA-DPB2        31.5128      -2.169615 0.2259472   -9.60231 7.81732e-22
## PLEKHG3        486.4914      -0.668327 0.0919471   -7.26860 3.63227e-13
## YPEL4          101.9154       1.368702 0.1981824    6.90628 4.97540e-12
## PLTP           546.4092      -2.160822 0.3264863   -6.61841 3.63075e-11
## SARDH           15.2484       4.876028 0.7729927    6.30799 2.82688e-10
## ...                 ...            ...       ...        ...         ...
## NRARP           3.15119     -2.9343118  0.927136 -3.1649196          NA
## P2RY8         641.71420     -0.8002904  0.569591 -1.4050256          NA
## FAM9B           2.24115     -4.5838024  1.719012 -2.6665326          NA
## XIST         2725.38469      0.0420529  2.928501  0.0143599          NA
## CTD-2328D6.1    3.14600      9.0527378  3.163472  2.8616462          NA
##                     padj
##                <numeric>
## HLA-DPB2     1.29361e-17
## PLEKHG3      3.00534e-09
## YPEL4        2.74443e-08
## PLTP         1.50204e-07
## SARDH        9.03196e-07
## ...                  ...
## NRARP                 NA
## P2RY8                 NA
## FAM9B                 NA
## XIST                  NA
## CTD-2328D6.1          NA
summary(resOrdered)
## 
## out of 21090 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 140, 0.66%
## LFC < 0 (down)     : 171, 0.81%
## outliers [1]       : 60, 0.28%
## low counts [2]     : 4482, 21%
## (mean count < 2)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
resOrdered05 <- subset(resOrdered, padj < 0.05)
resOrdered05
## log2 fold change (MLE): patient group smoker vs non smoker 
## Wald test p-value: patient group smoker vs non smoker 
## DataFrame with 189 rows and 6 columns
##             baseMean log2FoldChange     lfcSE      stat      pvalue        padj
##            <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
## HLA-DPB2     31.5128      -2.169615 0.2259472  -9.60231 7.81732e-22 1.29361e-17
## PLEKHG3     486.4914      -0.668327 0.0919471  -7.26860 3.63227e-13 3.00534e-09
## YPEL4       101.9154       1.368702 0.1981824   6.90628 4.97540e-12 2.74443e-08
## PLTP        546.4092      -2.160822 0.3264863  -6.61841 3.63075e-11 1.50204e-07
## SARDH        15.2484       4.876028 0.7729927   6.30799 2.82688e-10 9.03196e-07
## ...              ...            ...       ...       ...         ...         ...
## PI3          7.24133      -3.157026  0.912284  -3.46058 0.000539023   0.0482149
## PFKFB3    1163.56592      -0.637206  0.184499  -3.45372 0.000552915   0.0489829
## OR2A1-AS1   35.49370       1.192636  0.345349   3.45342 0.000553529   0.0489829
## LPAR5       54.24131      -1.032522  0.299226  -3.45065 0.000559241   0.0492251
## PTPN3       14.19870      -1.807723  0.524275  -3.44805 0.000564660   0.0494391
# write.csv(as.data.frame(resOrdered05), file="Smoker_NonSmoker_results_padj05.csv")

#Venn Diagram

vennTest <- tibble(Geneid = rownames(res)) |>
  mutate(NonSmokerVSSmoker = res$padj < 0.05 &
    !is.na(res$padj) &
    res$log2FoldChange > 0) |>
  mutate(Smoker = res$padj < 0.05 &
    !is.na(res$padj) &
    res$log2FoldChange < 0) |>
  mutate(NonSmokerVSCOPD = resCOPD$padj < 0.05 &
    !is.na(resCOPD$padj) &
    resCOPD$log2FoldChange > 0) |>
  mutate(COPD = resCOPD$padj < 0.05 &
    !is.na(resCOPD$padj) &
    resCOPD$log2FoldChange < 0)

ggvenn(vennTest, set_name_size = 4)

Next Analysis: Enrichment/Over-representation analysis

https://yulab-smu.top/biomedical-knowledge-mining-book/index.html

file:///C:/Users/Owner/Downloads/RNASEQ20_Day3_HandsOn.pdf ## ORA & Enrichment

# Create background dataset for hypergeometric testing using all genes tested for significance in the results
all_genes <- as.character(rownames(res))

# Extract significant results
signif_res <- res[res$padj < 0.05 & !is.na(res$padj), ]

signif_genes <- as.character(rownames(signif_res))

# Run GO enrichment analysis
egoBP <- enrichGO(
  gene = signif_genes,
  universe = all_genes,
  keyType = "SYMBOL",
  OrgDb = org.Hs.eg.db,
  ont = "BP",
  pAdjustMethod = "BH",
  qvalueCutoff = 0.05,
  readable = TRUE
)

head(egoBP)
egoCC <- enrichGO(
  gene = signif_genes,
  universe = all_genes,
  keyType = "SYMBOL",
  OrgDb = org.Hs.eg.db,
  ont = "CC",
  pAdjustMethod = "BH",
  qvalueCutoff = 0.05,
  readable = TRUE
)

head(egoCC)
egoMF <- enrichGO(
  gene = signif_genes,
  universe = all_genes,
  keyType = "SYMBOL",
  OrgDb = org.Hs.eg.db,
  ont = "MF",
  pAdjustMethod = "BH",
  qvalueCutoff = 0.05,
  readable = TRUE
)

head(egoMF)
# Output results from GO analysis to a table
clusterBP_summary <- data.frame(egoBP)

clusterCC_summary <- data.frame(egoCC)

clusterMF_summary <- data.frame(egoMF)


# Visualize
dotplot(egoBP, showCategory = 12)

goplot(egoBP)
## Warning: ggrepel: 43 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

# dotplot(egoCC, showCategory = 12) # no results for Cellular Component

# goplot(egoCC)

dotplot(egoMF, showCategory = 12)

goplot(egoMF)

# # Plot enrichment map
# emapplot(ego, showCategory = 50)
#
# # Calculate term similarity matrix
# sim_matrix <- pairwise_termsim(ego)
#
# # Plot enrichment map
# emapplot(ego, showCategory = 50, sim_matrix = sim_matrix)

# To color genes by log2 fold changes
signif_res_lFC <- signif_res$log2FoldChange

cnetplot(egoBP,
  categorySize = "pvalue",
  showCategory = 5,
  foldChange = signif_res_lFC,
  vertex.label.font = 6
)
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
##  The foldChange parameter will be removed in the next version.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.

cnetplot(egoMF,
  categorySize = "pvalue",
  showCategory = 5,
  foldChange = signif_res_lFC,
  vertex.label.font = 6
)
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
##  The foldChange parameter will be removed in the next version.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.

GO over-representation analysis

data(geneList, package = "DOSE")
gene <- names(geneList)[abs(geneList) > 2]

# Entrez gene ID
head(gene)
## [1] "4312"  "8318"  "10874" "55143" "55388" "991"
# GroupGO
ggo <- groupGO(
  gene = signif_genes,
  OrgDb = org.Hs.eg.db,
  keyType = "SYMBOL",
  ont = "CC",
  level = 3,
  readable = TRUE
)

head(ggo)
# # EnrichGO
# ego <- enrichGO(gene          = gene,
#                 universe      = names(geneList),
#                 OrgDb         = org.Hs.eg.db,
#                 ont           = "CC",
#                 pAdjustMethod = "BH",
#                 pvalueCutoff  = 0.01,
#                 qvalueCutoff  = 0.05,
#         readable      = TRUE)
#
# head(ego)
#
# goplot(ego)
#
# gene.df <- bitr(gene, fromType = "ENTREZID",
#         toType = c("ENSEMBL", "SYMBOL"),
#         OrgDb = org.Hs.eg.db)
#
# ego2 <- enrichGO(gene         = gene.df$ENSEMBL,
#                 OrgDb         = org.Hs.eg.db,
#                 keyType       = 'ENSEMBL',
#                 ont           = "CC",
#                 pAdjustMethod = "BH",
#                 pvalueCutoff  = 0.01,
#                 qvalueCutoff  = 0.05)
#
# head(ego2, 3)
#
# goplot(ego2)

# Gene IDs can be mapped to gene Symbols by using the parameter readable=TRUE or setReadable() function.

GO Gene Set Enrichment Analysis

ego3 <- gseGO(
  geneList = geneList,
  OrgDb = org.Hs.eg.db,
  ont = "CC",
  minGSSize = 100,
  maxGSSize = 500,
  pvalueCutoff = 0.05,
  verbose = FALSE
)
## Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
## minSize, : For some pathways, in reality P-values are less than 1e-10. You can
## set the `eps` argument to zero for better estimation.
goplot(ego3)

KEGG enrichment analysis

search_kegg_organism("ece", by = "kegg_code")
ecoli <- search_kegg_organism("Escherichia coli", by = "scientific_name")

dim(ecoli)
## [1] 65  3
head(ecoli)
# KEGG pathway over-representation analysis
data(geneList, package = "DOSE")

gene <- names(geneList)[abs(geneList) > 2]

kk <- enrichKEGG(
  gene = gene,
  organism = "hsa",
  pvalueCutoff = 0.05
)
## Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
## Warning in utils::download.file(url, quiet = TRUE, method = method, ...): the
## 'wininet' method is deprecated for http:// and https:// URLs
## Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
## Warning in utils::download.file(url, quiet = TRUE, method = method, ...): the
## 'wininet' method is deprecated for http:// and https:// URLs
head(kk)
# KEGG pathway gene set enrichment analysis
kk2 <- gseKEGG(
  geneList = geneList,
  organism = "hsa",
  minGSSize = 120,
  pvalueCutoff = 0.05,
  verbose = FALSE
)
## Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
## minSize, : For some pathways, in reality P-values are less than 1e-10. You can
## set the `eps` argument to zero for better estimation.
head(kk2)
# KEGG module over-representation analysis
mkk <- enrichMKEGG(
  gene = gene,
  organism = "hsa",
  pvalueCutoff = 1,
  qvalueCutoff = 1
)
## Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/module"...
## Warning in utils::download.file(url, quiet = TRUE, method = method, ...): the
## 'wininet' method is deprecated for http:// and https:// URLs
## Reading KEGG annotation online: "https://rest.kegg.jp/list/module"...
## Warning in utils::download.file(url, quiet = TRUE, method = method, ...): the
## 'wininet' method is deprecated for http:// and https:// URLs
head(mkk)
# KEGG module gene set enrichment analysis
mkk2 <- gseMKEGG(
  geneList = geneList,
  organism = "hsa",
  pvalueCutoff = 1
)
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
head(mkk2)

Visualize & Exporting enriched KEGG pathways

# # Opens interactive web browser
# browseKEGG(kk, 'hsa04110')

# # Output pathway PNG's
# hsa04110 <- pathview(gene.data  = geneList,
#                      pathway.id = "hsa04110",
#                      species    = "hsa",
#                      limit      = list(gene=max(abs(geneList)), cpd=1))

Reactome enrichment analysis

The input gene ID should be Entrez gene ID. Use clusterProfiler::bitr() to convert biological IDs.

# Reactome pathway over-representation analysis
data(geneList, package = "DOSE")

de <- names(geneList)[abs(geneList) > 1.5]

head(de)
## [1] "4312"  "8318"  "10874" "55143" "55388" "991"
x <- enrichPathway(gene = de, pvalueCutoff = 0.05, readable = TRUE)

head(x)
# Reactome pathway gene set enrichment analysis
y <- gsePathway(geneList,
  pvalueCutoff = 0.2,
  pAdjustMethod = "BH",
  verbose = FALSE
)
## Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
## minSize, : For some pathways, in reality P-values are less than 1e-10. You can
## set the `eps` argument to zero for better estimation.
head(y)
# Pathway Visualization

viewPathway("E2F mediated regulation of DNA replication",
  readable = TRUE,
  foldChange = geneList
)
## Loading required package: graphite

Disease enrichment analysis

# Over-representation analysis for disease ontology
data(geneList)

gene <- names(geneList)[abs(geneList) > 1.5]

head(gene)
## [1] "4312"  "8318"  "10874" "55143" "55388" "991"
x <- enrichDO(
  gene = gene,
  ont = "DO",
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  universe = names(geneList),
  minGSSize = 5,
  maxGSSize = 500,
  qvalueCutoff = 0.05,
  readable = FALSE
)
head(x)

Biological theme comparison

# Comparing multiple gene lists
data(gcSample)

str(gcSample)
## List of 8
##  $ X1: chr [1:216] "4597" "7111" "5266" "2175" ...
##  $ X2: chr [1:805] "23450" "5160" "7126" "26118" ...
##  $ X3: chr [1:392] "894" "7057" "22906" "3339" ...
##  $ X4: chr [1:838] "5573" "7453" "5245" "23450" ...
##  $ X5: chr [1:929] "5982" "7318" "6352" "2101" ...
##  $ X6: chr [1:585] "5337" "9295" "4035" "811" ...
##  $ X7: chr [1:582] "2621" "2665" "5690" "3608" ...
##  $ X8: chr [1:237] "2665" "4735" "1327" "3192" ...
ck <- compareCluster(geneCluster = gcSample, fun = enrichKEGG)
## Warning in utils::download.file(url, quiet = TRUE, method = method, ...): the
## 'wininet' method is deprecated for http:// and https:// URLs
## Warning in utils::download.file(url, quiet = TRUE, method = method, ...): the
## 'wininet' method is deprecated for http:// and https:// URLs
ck <- setReadable(ck, OrgDb = org.Hs.eg.db, keyType = "ENTREZID")

head(ck)
# Formula interface of compareCluster
# compareCluster() function also supports passing a formula to describe more complicated experimental designs
mydf <- data.frame(Entrez = names(geneList), FC = geneList)

mydf <- mydf[abs(mydf$FC) > 1, ]

mydf$group <- "upregulated"

mydf$group[mydf$FC < 0] <- "downregulated"

mydf$othergroup <- "A"

mydf$othergroup[abs(mydf$FC) > 2] <- "B"

formula_res <- compareCluster(Entrez ~ group + othergroup,
  data = mydf,
  fun = "enrichKEGG"
)

head(formula_res)
# Visualization of functional profile comparison
dotplot(ck)

dotplot(formula_res)

dotplot(formula_res, x = "group") +
  facet_grid(~othergroup)

# Gene-Concept Network
cnetplot(ck)
## Warning: ggrepel: 154 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Visualization of functional enrichment result

# Bar Plot
data(geneList)

de <- names(geneList)[abs(geneList) > 2]

edo <- enrichDGN(de)

barplot(edo, showCategory = 20)

mutate(edo, qscore = -log(p.adjust, base = 10)) |>
  barplot(x = "qscore")

# Dot plot
edo2 <- gseDO(geneList)
## preparing geneSet collections...
## GSEA analysis...
## Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
## minSize, : For some pathways, in reality P-values are less than 1e-10. You can
## set the `eps` argument to zero for better estimation.
## leading edge analysis...
## done...
dotplot(edo, showCategory = 30) +
  ggtitle("dotplot for ORA")

dotplot(edo2, showCategory = 30) +
  ggtitle("dotplot for GSEA")

# Gene-Concept Network

## convert gene ID to Symbol
edox <- setReadable(edo, "org.Hs.eg.db", "ENTREZID")

p1 <- cnetplot(edox, foldChange = geneList)
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
##  The foldChange parameter will be removed in the next version.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## categorySize can be scaled by 'pvalue' or 'geneNum'
p2 <- cnetplot(edox, categorySize = "pvalue", foldChange = geneList)
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
##  The foldChange parameter will be removed in the next version.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
p3 <- cnetplot(edox, foldChange = geneList, circular = TRUE, colorEdge = TRUE)
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
##  The foldChange parameter will be removed in the next version.
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(edge = your_value)' instead of 'colorEdge'.
##  The colorEdge parameter will be removed in the next version.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
cowplot::plot_grid(p1, p2, p3,
  ncol = 3,
  labels = LETTERS[1:3],
  rel_widths = c(.8, .8, 1.2)
)
## Warning: ggrepel: 14 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 14 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 28 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

## Labelling nodes by selected subset.
p1 <- cnetplot(edox,
  node_label = "category",
  cex_label_category = 1.2
)
## Warning in cnetplot.enrichResult(x, ...): Use 'cex.params = list(category_label = your_value)' instead of 'cex_label_category'.
##  The cex_label_category parameter will be removed in the next version.
p2 <- cnetplot(edox,
  node_label = "gene",
  cex_label_gene = 0.8
)
## Warning in cnetplot.enrichResult(x, ...): Use 'cex.params = list(gene_label = your_value)' instead of 'cex_label_gene'.
##  The cex_label_gene parameter will be removed in the next version.
p3 <- cnetplot(edox, node_label = "all")

p4 <- cnetplot(edox,
  node_label = "none",
  color_category = "firebrick",
  color_gene = "steelblue"
)
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(category = your_value)' instead of 'color_category'.
##  The color_category parameter will be removed in the next version.
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(gene = your_value)' instead of 'color_gene'.
##  The color_gene parameter will be removed in the next version.
cowplot::plot_grid(p1, p2, p3, p4, ncol = 2, labels = LETTERS[1:4])
## Warning: Removed 57 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).
## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 25 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

##  Using cnetplot to visualize data relationships
set.seed(123)

x <- list(A = letters[1:10], B = letters[5:12], C = letters[sample(1:26, 15)])

p1 <- cnetplot(x)

set.seed(123)

d <- setNames(rnorm(26), letters)

p2 <- cnetplot(x, foldChange = d) +
  scale_color_gradient2(name = "associated data", low = "darkgreen", high = "firebrick")
## Warning in cnetplot.enrichResult(x, ...): Use 'color.params = list(foldChange = your_value)' instead of 'foldChange'.
##  The foldChange parameter will be removed in the next version.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
cowplot::plot_grid(p1, p2, ncol = 2, labels = LETTERS[1:2])

# Heatmap-like functional classification
p1 <- heatplot(edox, showCategory = 5)

p2 <- heatplot(edox, foldChange = geneList, showCategory = 5)

cowplot::plot_grid(p1, p2, ncol = 1, labels = LETTERS[1:2])

# Tree plot
edox2 <- pairwise_termsim(edox)

p1 <- treeplot(edox2)
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
p2 <- treeplot(edox2, hclust_method = "average")
## Warning in treeplot.enrichResult(x, ...): Use 'cluster.params = list(method = your_value)' instead of 'hclust_method'.
##  The hclust_method parameter will be removed in the next version.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
aplot::plot_list(p1, p2, tag_levels = "A")

# Enrichment Map
edo <- pairwise_termsim(edo)

p1 <- emapplot(edo)

p2 <- emapplot(edo, cex_category = 1.5)
## Warning in emapplot.enrichResult(x, showCategory = showCategory, ...): Use 'cex.params = list(category_node = your_value)' instead of 'cex_category'.
##  The cex_category parameter will be removed in the next version.
p3 <- emapplot(edo, layout = "kk")
## Warning in emapplot.enrichResult(x, showCategory = showCategory, ...): Use 'layout.params = list(layout = your_value)' instead of 'layout'.
##  The layout parameter will be removed in the next version.
p4 <- emapplot(edo, cex_category = 1.5, layout = "kk")
## Warning in emapplot.enrichResult(x, showCategory = showCategory, ...): Use 'layout.params = list(layout = your_value)' instead of 'layout'.
##  The layout parameter will be removed in the next version.
## Warning in emapplot.enrichResult(x, showCategory = showCategory, ...): Use 'cex.params = list(category_node = your_value)' instead of 'cex_category'.
##  The cex_category parameter will be removed in the next version.
cowplot::plot_grid(p1, p2, p3, p4, ncol = 2, labels = LETTERS[1:4])
## Warning: ggrepel: 20 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 15 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 28 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 28 unlabeled data points (too many overlaps). Consider increasing max.overlaps

# Biological theme comparison
data(gcSample)

xx <- compareCluster(gcSample,
  fun = "enrichKEGG",
  organism = "hsa", pvalueCutoff = 0.05
)

xx <- pairwise_termsim(xx)

p1 <- emapplot(xx)

p2 <- emapplot(xx, legend_n = 2)
## Warning in emapplot.compareClusterResult(x, showCategory = showCategory, : Use 'pie.params = list(legend_n = your_value)' instead of 'legend_n'.
##  The legend_n parameter will be removed in the next version.
p3 <- emapplot(xx, pie = "count")
## Warning in emapplot.compareClusterResult(x, showCategory = showCategory, : Use 'pie.params = list(pie = your_value)' instead of 'pie'.
##  The pie parameter will be removed in the next version.
p4 <- emapplot(xx, pie = "count", cex_category = 1.5, layout = "kk")
## Warning in emapplot.compareClusterResult(x, showCategory = showCategory, : Use 'pie.params = list(pie = your_value)' instead of 'pie'.
##  The pie parameter will be removed in the next version.
## Warning in emapplot.compareClusterResult(x, showCategory = showCategory, : Use 'layout.params = list(layout = your_value)' instead of 'layout'.
##  The layout parameter will be removed in the next version.
## Warning in emapplot.compareClusterResult(x, showCategory = showCategory, : Use 'cex.params = list(category_node = your_value)' instead of 'cex_category'.
##  The cex_category parameter will be removed in the next version.
cowplot::plot_grid(p1, p2, p3, p4, ncol = 2, labels = LETTERS[1:4])
## Warning: ggrepel: 34 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: ggrepel: 36 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 36 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## Warning: ggrepel: 38 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

# UpSet Plot
# For over-representation analysis, upsetplot will calculate the overlaps among different gene sets
upsetplot(edo)

# For GSEA result, it will plot the fold change distributions of different categories (e.g. unique to pathway, overlaps among different pathways).
upsetplot(kk2)

# ridgeline plot for expression distribution of GSEA result
ridgeplot(edo2)
## Picking joint bandwidth of 0.224

# running score and preranked list of GSEA result
# gseaplot for GSEA result(by = "runningScore")
p1 <- gseaplot(edo2,
  geneSetID = 1,
  by = "runningScore",
  title = edo2$Description[1]
)

p2 <- gseaplot(edo2,
  geneSetID = 1,
  by = "preranked",
  title = edo2$Description[1]
)

p3 <- gseaplot(edo2,
  geneSetID = 1,
  title = edo2$Description[1]
)

cowplot::plot_grid(p1, p2, p3, ncol = 1, labels = LETTERS[1:3])
## Warning in as_grob.default(plot): Cannot convert object of class gglistlist
## into a grob.

# Gseaplot2 for GSEA result.
gseaplot2(edo2, geneSetID = 1, title = edo2$Description[1])

# Gseaplot2 for GSEA result of multile gene sets.
gseaplot2(edo2, geneSetID = 1:3)

# Gseaplot2 for GSEA result of multile gene sets(add pvalue_table)
gseaplot2(edo2,
  geneSetID = 1:3, pvalue_table = TRUE,
  color = c("#E495A5", "#86B875", "#7DB0DD"), ES_geom = "dot"
)

# Gseaplot2 for GSEA result of multile gene sets(add subplots)
p1 <- gseaplot2(edo2, geneSetID = 1:3, subplots = 1)

p2 <- gseaplot2(edo2, geneSetID = 1:3, subplots = 1:2)

cowplot::plot_grid(p1, p2, ncol = 1, labels = LETTERS[1:2])
## Warning in as_grob.default(plot): Cannot convert object of class gglistlist
## into a grob.

# Ranked list of genes belong to the specific gene set.
gsearank(edo2, 1, title = edo2[1, "Description"])

# Gsearank for multiple gene sets
pp <- lapply(1:3, function(i) {
  anno <- edo2[i, c("NES", "pvalue", "p.adjust")]
  lab <- paste0(names(anno), "=", round(anno, 3), collapse = "\n")

  gsearank(edo2, i, edo2[i, 2]) + xlab(NULL) + ylab(NULL) +
    annotate("text", 10000, edo2[i, "enrichmentScore"] * .75, label = lab, hjust = 0, vjust = 0)
})

plot_grid(plotlist = pp, ncol = 1)

# pubmed trend of enriched terms
terms <- edo$Description[1:5]

p <- pmcplot(terms, 2010:2024)

p2 <- pmcplot(terms, 2010:2024, proportion = FALSE)

plot_grid(p, p2, ncol = 2)

Manipulating enrichment result

data(geneList)

de <- names(geneList)[1:100]

x <- enrichDO(de)

# Filter
filter(x, p.adjust < .05, qvalue < 0.2)
## #
## # over-representation test
## #
## #...@organism     Homo sapiens 
## #...@ontology     DO 
## #...@keytype      ENTREZID 
## #...@gene     chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05 
## #...77 enriched terms found
## 'data.frame':    77 obs. of  9 variables:
##  $ ID         : chr  "DOID:11054" "DOID:1107" "DOID:10534" "DOID:10907" ...
##  $ Description: chr  "urinary bladder cancer" "esophageal carcinoma" "stomach cancer" "microcephaly" ...
##  $ GeneRatio  : chr  "11/84" "10/84" "13/84" "6/84" ...
##  $ BgRatio    : chr  "246/10312" "219/10312" "416/10312" "76/10312" ...
##  $ pvalue     : num  4.55e-06 1.06e-05 2.86e-05 3.38e-05 3.83e-05 ...
##  $ p.adjust   : num  0.00235 0.00275 0.00311 0.00311 0.00311 ...
##  $ qvalue     : num  0.00176 0.00206 0.00233 0.00233 0.00233 ...
##  $ geneID     : chr  "4312/6280/4605/6279/7153/6241/983/5080/332/2146/6790" "4312/6280/3868/6279/8140/890/5918/332/2146/4321" "4312/2305/6279/10403/7153/259266/8140/890/4085/81930/332/2146/3887" "1062/23397/259266/4085/5080/1111" ...
##  $ Count      : int  11 10 13 6 10 10 7 10 8 6 ...
## #...Citation
##   Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
##   R/Bioconductor package for Disease Ontology Semantic and Enrichment
##   analysis. Bioinformatics 2015, 31(4):608-609
# Arrange
mutate(x,
  geneRatio = parse_ratio(GeneRatio)
) |>
  arrange(desc(geneRatio))
## #
## # over-representation test
## #
## #...@organism     Homo sapiens 
## #...@ontology     DO 
## #...@keytype      ENTREZID 
## #...@gene     chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05 
## #...77 enriched terms found
## 'data.frame':    77 obs. of  10 variables:
##  $ ID         : chr  "DOID:10534" "DOID:3459" "DOID:11054" "DOID:2394" ...
##  $ Description: chr  "stomach cancer" "breast carcinoma" "urinary bladder cancer" "ovarian cancer" ...
##  $ GeneRatio  : chr  "13/84" "12/84" "11/84" "11/84" ...
##  $ BgRatio    : chr  "416/10312" "493/10312" "246/10312" "441/10312" ...
##  $ pvalue     : num  2.86e-05 6.09e-04 4.55e-06 8.55e-04 1.06e-05 ...
##  $ p.adjust   : num  0.00311 0.01199 0.00235 0.01474 0.00275 ...
##  $ qvalue     : num  0.00233 0.00898 0.00176 0.01105 0.00206 ...
##  $ geneID     : chr  "4312/2305/6279/10403/7153/259266/8140/890/4085/81930/332/2146/3887" "4312/6280/6279/7153/6241/4751/890/4085/332/6286/2146/6790" "4312/6280/4605/6279/7153/6241/983/5080/332/2146/6790" "4312/259266/820/983/10232/6362/332/6286/9212/4321/6790" ...
##  $ Count      : int  13 12 11 11 10 10 10 10 10 9 ...
##  $ geneRatio  : num  0.155 0.143 0.131 0.131 0.119 ...
## #...Citation
##   Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
##   R/Bioconductor package for Disease Ontology Semantic and Enrichment
##   analysis. Bioinformatics 2015, 31(4):608-609
# Select
select(x, -geneID) |> head()
# Mutate
# k/M
y <- mutate(x,
  richFactor = Count / as.numeric(sub("/\\d+", "", BgRatio))
)

y
## #
## # over-representation test
## #
## #...@organism     Homo sapiens 
## #...@ontology     DO 
## #...@keytype      ENTREZID 
## #...@gene     chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
## #...pvalues adjusted by 'BH' with cutoff <0.05 
## #...77 enriched terms found
## 'data.frame':    77 obs. of  10 variables:
##  $ ID         : chr  "DOID:11054" "DOID:1107" "DOID:10534" "DOID:10907" ...
##  $ Description: chr  "urinary bladder cancer" "esophageal carcinoma" "stomach cancer" "microcephaly" ...
##  $ GeneRatio  : chr  "11/84" "10/84" "13/84" "6/84" ...
##  $ BgRatio    : chr  "246/10312" "219/10312" "416/10312" "76/10312" ...
##  $ pvalue     : num  4.55e-06 1.06e-05 2.86e-05 3.38e-05 3.83e-05 ...
##  $ p.adjust   : num  0.00235 0.00275 0.00311 0.00311 0.00311 ...
##  $ qvalue     : num  0.00176 0.00206 0.00233 0.00233 0.00233 ...
##  $ geneID     : chr  "4312/6280/4605/6279/7153/6241/983/5080/332/2146/6790" "4312/6280/3868/6279/8140/890/5918/332/2146/4321" "4312/2305/6279/10403/7153/259266/8140/890/4085/81930/332/2146/3887" "1062/23397/259266/4085/5080/1111" ...
##  $ Count      : int  11 10 13 6 10 10 7 10 8 6 ...
##  $ richFactor : num  0.0447 0.0457 0.0312 0.0789 0.0394 ...
## #...Citation
##   Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
##   R/Bioconductor package for Disease Ontology Semantic and Enrichment
##   analysis. Bioinformatics 2015, 31(4):608-609
# Visualizing rich factor of enriched terms using lolliplot.
ggplot(y,
  showCategory = 20,
  aes(richFactor, fct_reorder(Description, richFactor))
) +
  geom_segment(aes(xend = 0, yend = Description)) +
  geom_point(aes(color = p.adjust, size = Count)) +
  scale_color_viridis_c(guide = guide_colorbar(reverse = TRUE)) +
  scale_size_continuous(range = c(2, 10)) +
  theme_minimal() +
  xlab("rich factor") +
  ylab(NULL) +
  ggtitle("Enriched Disease Ontology")

# Slice
x <- gsePathway(geneList)
## preparing geneSet collections...
## GSEA analysis...
## Warning in fgseaMultilevel(pathways = pathways, stats = stats, minSize =
## minSize, : For some pathways, in reality P-values are less than 1e-10. You can
## set the `eps` argument to zero for better estimation.
## leading edge analysis...
## done...
y <- arrange(x, abs(NES)) |>
  group_by(.add = sign(NES)) |>
  slice(1:5)
## Warning: The `add` argument of `group_by()` is deprecated as of dplyr 1.0.0.
## ℹ Please use the `.add` argument instead.
## ℹ The deprecated feature was likely used in the dplyr package.
##   Please report the issue at <https://github.com/tidyverse/dplyr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
y <- arrange(x, abs(NES)) |>
  group_by(sign(NES)) |>
  slice(1:5)

# ggplot(y, aes(NES, fct_reorder(Description, NES), fill = qvalues),
#   showCategory = 10
# ) +
#   geom_col(orientation = "y") +
#   scale_fill_continuous(
#     low = "red", high = "blue",
#     guide = guide_colorbar(reverse = TRUE)
#   ) +
#   theme_minimal() +
#   ylab(NULL)

# Summarise
pbar <- function(x) {
  pi=seq(0, 1, length.out=11)

  mutate(x, pp = cut(p.adjust, pi)) %>%
    group_by(pp) %>% 
    summarise(cnt = n()) %>% 
    ggplot(aes(pp, cnt)) + geom_col() + 
    theme_minimal() +
    xlab("p value intervals") +
    ylab("Frequency") + 
    ggtitle("p value distribution")
}    

x <- enrichDO(de, pvalueCutoff=1, qvalueCutoff=1)

set.seed(2020-09-10)

random_genes <- sample(names(geneList), 100)

y <- enrichDO(random_genes, pvalueCutoff=1, qvalueCutoff=1)

p1 <- pbar(x)
p2 <- pbar(y)

cowplot::plot_grid(p1, p2, ncol=1, labels = LETTERS[1:2])

Prepare geneList

GSEA analysis requires a ranked gene list, which contains three features:

numeric vector: fold change or other type of numerical variable
named vector: every number has a name, the corresponding gene ID
sorted vector: number should be sorted in decreasing order

If you import your data from a csv file, the file should contains two columns, one for gene ID (no duplicated ID allowed) and another one for fold change. You can prepare your own geneList via the following command:

Showing specific pathways

data(geneList)
de <- names(geneList)[1:100]

x <- enrichDO(de)

## show top 10 most significant pathways and want to exclude the second one
## dotplot(x, showCategory = x$Description[1:10][-2])

set.seed(123)
selected_pathways <- sample(x$Description, 10)
selected_pathways
##  [1] "esophagus adenocarcinoma"               
##  [2] "aortic disease"                         
##  [3] "bronchiolitis obliterans"               
##  [4] "constipation"                           
##  [5] "endometrial cancer"                     
##  [6] "lymph node disease"                     
##  [7] "acute myeloid leukemia"                 
##  [8] "myasthenia gravis"                      
##  [9] "malignant pleural mesothelioma"         
## [10] "gastrointestinal system benign neoplasm"
p1 <- dotplot(x, showCategory = 10, font.size=14)
p2 <- dotplot(x, showCategory = selected_pathways, font.size=14)


cowplot::plot_grid(p1, p2, labels=LETTERS[1:2])

# extract genes of a specific term/pathway
id <- x$ID[1:3]
id
## [1] "DOID:11054" "DOID:1107"  "DOID:10534"
x[[id[1]]]
##  [1] "4312" "6280" "4605" "6279" "7153" "6241" "983"  "5080" "332"  "2146"
## [11] "6790"
geneInCategory(x)[id]
## $`DOID:11054`
##  [1] "4312" "6280" "4605" "6279" "7153" "6241" "983"  "5080" "332"  "2146"
## [11] "6790"
## 
## $`DOID:1107`
##  [1] "4312" "6280" "3868" "6279" "8140" "890"  "5918" "332"  "2146" "4321"
## 
## $`DOID:10534`
##  [1] "4312"   "2305"   "6279"   "10403"  "7153"   "259266" "8140"   "890"   
##  [9] "4085"   "81930"  "332"    "2146"   "3887"
# Wrap long axis labels
y <- enrichPathway(de)

p1 <- dotplot(y, label_format = 20) 

p2 <- dotplot(y, label_format = function(x) stringr::str_wrap(x, width=20))

cowplot::plot_grid(p1, p2, ncol=2, labels=c("A", "B")) 

# # Gene set enrichment analysis (GSEA) using clusterProfiler and Pathview
# mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
#
# mart <- useEnsembl(biomart = "ensembl",
#                      dataset = "hsapiens_gene_ensembl")
#
# genes <- getBM(
#   filters = "chromosome_name",
#   attributes = "ensembl_gene_id",
#   values = all_genes,
#   mart = mart
# )
#
#
# indNA <- which(is.na(genes$entrezgene_id))
#
# genes_noNA <- genes[-indNA, ]
#
# indnodup <- which(duplicated(genes_noNA$ entrezgene_id) == F)
#
# genes_noNA_nodup <- genes_noNA[indnodup, ]
#
# lFC <- res$log2FoldChange[-indNA]
#
# lFC <- lFC[indnodup]
#
# names(lFC) <- genes_noNA_nodup$entrezgene_id
#
# # Sort fold changes in decreasing order
# lFC <- sort(lFC, decreasing = TRUE)
#
#
# # Perform the GSEA using KEGG gene sets:
# gseaKEGG <- gseKEGG(
#   geneList = lFC,
#   organism = "mmu",
#   nPerm = 1000, # default number permutations
#   minGSSize = 5, # minimum gene set size
#   pvalueCutoff = 0.1, # padj cutoff value
#   verbose = FALSE
# )
#
# # Extract the GSEA results
# gseaKEGG_results <- gseaKEGG@result

Session Info

sessionInfo()
## R version 4.3.3 (2024-02-29 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] graphite_1.46.0             conflicted_1.2.0           
##  [3] lubridate_1.9.3             forcats_1.0.0              
##  [5] stringr_1.5.1               purrr_1.0.2                
##  [7] readr_2.1.5                 tidyr_1.3.1                
##  [9] tibble_3.2.1                tidyverse_2.0.0            
## [11] ggstance_0.3.7              europepmc_0.4.3            
## [13] cowplot_1.1.3               ggupset_0.3.0              
## [15] ReactomePA_1.44.0           biomaRt_2.56.1             
## [17] enrichplot_1.20.3           ensembldb_2.24.1           
## [19] AnnotationFilter_1.24.0     GenomicFeatures_1.52.2     
## [21] AnnotationHub_3.8.0         BiocFileCache_2.8.0        
## [23] dbplyr_2.5.0                clusterProfiler_4.8.3      
## [25] pathview_1.40.0             DOSE_3.26.2                
## [27] org.Hs.eg.db_3.17.0         org.Mm.eg.db_3.17.0        
## [29] AnnotationDbi_1.62.2        qvalue_2.32.0              
## [31] ggvenn_0.1.10               dplyr_1.1.4                
## [33] SparseArray_1.0.12          S4Arrays_1.0.6             
## [35] abind_1.4-5                 Matrix_1.6-5               
## [37] DT_0.33                     gridExtra_2.3              
## [39] EnhancedVolcano_1.18.0      ggrepel_0.9.5              
## [41] ggplot2_3.5.0               RColorBrewer_1.1-3         
## [43] pheatmap_1.0.12             ashr_2.2-63                
## [45] apeglm_1.22.1               vsn_3.68.0                 
## [47] DESeq2_1.40.2               SummarizedExperiment_1.30.2
## [49] Biobase_2.60.0              MatrixGenerics_1.12.3      
## [51] matrixStats_1.3.0           GenomicRanges_1.52.1       
## [53] GenomeInfoDb_1.36.4         IRanges_2.34.1             
## [55] S4Vectors_0.38.2            BiocGenerics_0.46.0        
## [57] prettydoc_0.4.1            
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.3                      ProtGenerics_1.32.0          
##   [3] bitops_1.0-7                  HDO.db_0.99.1                
##   [5] httr_1.4.7                    Rgraphviz_2.44.0             
##   [7] numDeriv_2016.8-1.1           tools_4.3.3                  
##   [9] utf8_1.2.4                    R6_2.5.1                     
##  [11] mgcv_1.9-1                    lazyeval_0.2.2               
##  [13] withr_3.0.0                   prettyunits_1.2.0            
##  [15] preprocessCore_1.62.1         cli_3.6.2                    
##  [17] scatterpie_0.2.2              labeling_0.4.3               
##  [19] sass_0.4.9                    KEGGgraph_1.60.0             
##  [21] SQUAREM_2021.1                mvtnorm_1.2-4                
##  [23] ggridges_0.5.6                mixsqp_0.3-54                
##  [25] Rsamtools_2.16.0              yulab.utils_0.1.4            
##  [27] gson_0.1.0                    invgamma_1.1                 
##  [29] bbmle_1.0.25.1                limma_3.56.2                 
##  [31] rstudioapi_0.16.0             RSQLite_2.3.6                
##  [33] generics_0.1.3                gridGraphics_0.5-1           
##  [35] BiocIO_1.10.0                 GO.db_3.17.0                 
##  [37] fansi_1.0.6                   lifecycle_1.0.4              
##  [39] yaml_2.3.8                    blob_1.2.4                   
##  [41] promises_1.3.0                crayon_1.5.2                 
##  [43] bdsmatrix_1.3-7               lattice_0.22-5               
##  [45] KEGGREST_1.40.1               pillar_1.9.0                 
##  [47] knitr_1.46                    fgsea_1.26.0                 
##  [49] rjson_0.2.21                  codetools_0.2-19             
##  [51] fastmatch_1.1-4               glue_1.7.0                   
##  [53] downloader_0.4                ggfun_0.1.4                  
##  [55] data.table_1.15.4             vctrs_0.6.5                  
##  [57] png_0.1-8                     treeio_1.24.3                
##  [59] urltools_1.7.3                gtable_0.3.4                 
##  [61] emdbook_1.3.13                cachem_1.0.8                 
##  [63] xfun_0.43                     mime_0.12                    
##  [65] tidygraph_1.3.1               coda_0.19-4.1                
##  [67] interactiveDisplayBase_1.38.0 nlme_3.1-164                 
##  [69] ggtree_3.8.2                  bit64_4.0.5                  
##  [71] progress_1.2.3                filelock_1.0.3               
##  [73] bslib_0.7.0                   affyio_1.70.0                
##  [75] irlba_2.3.5.1                 colorspace_2.1-0             
##  [77] DBI_1.2.2                     tidyselect_1.2.1             
##  [79] bit_4.0.5                     compiler_4.3.3               
##  [81] curl_5.2.1                    graph_1.78.0                 
##  [83] xml2_1.3.6                    DelayedArray_0.26.7          
##  [85] triebeard_0.4.1               shadowtext_0.1.3             
##  [87] rtracklayer_1.60.1            scales_1.3.0                 
##  [89] affy_1.78.2                   rappdirs_0.3.3               
##  [91] digest_0.6.35                 rmarkdown_2.26               
##  [93] XVector_0.40.0                htmltools_0.5.8.1            
##  [95] pkgconfig_2.0.3               highr_0.10                   
##  [97] fastmap_1.1.1                 rlang_1.1.3                  
##  [99] htmlwidgets_1.6.4             shiny_1.8.1.1                
## [101] farver_2.1.1                  jquerylib_0.1.4              
## [103] jsonlite_1.8.8                BiocParallel_1.34.2          
## [105] GOSemSim_2.26.1               RCurl_1.98-1.14              
## [107] magrittr_2.0.3                GenomeInfoDbData_1.2.10      
## [109] ggplotify_0.1.2               patchwork_1.2.0              
## [111] munsell_0.5.1                 Rcpp_1.0.12                  
## [113] ggnewscale_0.4.10             ape_5.8                      
## [115] viridis_0.6.5                 stringi_1.8.3                
## [117] ggraph_2.2.1                  zlibbioc_1.46.0              
## [119] MASS_7.3-60.0.1               plyr_1.8.9                   
## [121] parallel_4.3.3                Biostrings_2.68.1            
## [123] graphlayouts_1.1.1            splines_4.3.3                
## [125] hms_1.1.3                     locfit_1.5-9.9               
## [127] igraph_2.0.3                  reshape2_1.4.4               
## [129] BiocVersion_3.17.1            XML_3.99-0.16.1              
## [131] evaluate_0.23                 BiocManager_1.30.22          
## [133] tzdb_0.4.0                    tweenr_2.0.3                 
## [135] httpuv_1.6.15                 polyclip_1.10-6              
## [137] ggforce_0.4.2                 xtable_1.8-4                 
## [139] restfulr_0.0.15               reactome.db_1.84.0           
## [141] tidytree_0.4.6                later_1.3.2                  
## [143] viridisLite_0.4.2             snow_0.4-4                   
## [145] truncnorm_1.0-9               aplot_0.2.2                  
## [147] memoise_2.0.1                 GenomicAlignments_1.36.0     
## [149] timechange_0.3.0