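1 Introduction

In single-cell RNA-seq analysis, gene signature (or "module") scoring is a simple yet powerful way to evaluate the strength of a biological signal in a transcriptome. The signal is typically associated with a specific cell type or biological process.

UCell is an R package for scoring gene signatures in single-cell datasets. UCell signature scores are based on the Mann-Whitney U statistic and are robust to dataset size and heterogeneity. Computing them takes less time and memory than other available methods, so large datasets can be processed in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and it includes functions to interact directly with Seurat objects.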
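The Mann-Whitney U idea can be made concrete with a short base-R sketch that scores a single cell. It is written from the formula described in the UCell paper (Andreatta & Carmona, 2021), not from the package source. The function name `ucell_score_sketch` is hypothetical, and `maxRank = 1500` is assumed to match the package default. Two behaviours are assumptions of this sketch: genes missing from the data are treated as ranked beyond `maxRank`, and a cell with no signature gene in the top `maxRank` scores 0. It is not a substitute for `ScoreSignatures_UCell`.

```r
# Minimal sketch of a Mann-Whitney U-based signature score for ONE cell.
# Written from the formula in the UCell paper; not the UCell package code.
ucell_score_sketch <- function(expr, signature, maxRank = 1500) {
  # Rank genes by decreasing expression (rank 1 = most expressed gene);
  # tied genes receive their average rank.
  r <- rank(-expr, ties.method = "average")
  # Look up the ranks of the signature genes.
  sig_r <- r[match(signature, names(expr))]
  # Assumption: genes missing from the data count as ranked beyond maxRank.
  sig_r[is.na(sig_r)] <- maxRank + 1
  # All genes beyond maxRank share the same rank, maxRank + 1.
  sig_r[sig_r > maxRank] <- maxRank + 1
  # Assumption: a cell with no signature gene in the top maxRank scores 0.
  if (all(sig_r > maxRank)) return(0)
  n <- length(signature)
  # U statistic: rank sum minus its smallest possible value, n(n + 1) / 2.
  u <- sum(sig_r) - n * (n + 1) / 2
  # Normalize: the score is 1 when the signature genes occupy the top n ranks.
  1 - u / (n * maxRank)
}

# Toy expression values for one cell (made up for illustration).
expr <- c(GAPDH = 20, CD3D = 12, CD3E = 9, CD14 = 1, LYZ = 0)
ucell_score_sketch(expr, c("CD3D", "CD3E"), maxRank = 3)  # 2/3: ranks 2 and 3 instead of 1 and 2
```

The score depends only on ranks within each cell. That is why it is insensitive to library size and to the composition of the rest of the dataset.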

2 Quick start

To test your installation, load a small sample dataset and run UCell:

library(UCell)
data(sample.matrix)
gene.sets <- list(Tcell_signature = c("CD2","CD3E","CD3D"),
                  Myeloid_signature = c("SPI1","FCER1G","CSF1R"))
scores <- ScoreSignatures_UCell(sample.matrix, features=gene.sets)
head(scores)
##                       Tcell_signature_UCell Myeloid_signature_UCell
## L4_TCACTATTCATCTCTA               0.6904444                       0
## L1_TCCTTCTTCTTTACAC               0.5935556                       0
## L5_AAAGTGAAGGCGCTCT               0.6093333                       0
## E2L3_CCTCAGTAGTGCAGGT             0.8508889                       0
## L5_CCCTCTCGTTCTAAGC               0.6562222                       0

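3 Get some testing data

For this demo, we will download a single-cell dataset of lung cancer (Zilionis et al. (2019) Immunity) through the scRNAseq package. This dataset contains >170,000 single cells. For simplicity, this demo focuses on the immune cells, as annotated by the authors, and keeps only the first 5000 of them.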

library(scRNAseq)
lung <- ZilionisLungData()
immune <- lung$Used & lung$used_in_NSCLC_immune
lung <- lung[,immune]
lung <- lung[,1:5000]

exp.mat <- Matrix::Matrix(counts(lung), sparse = TRUE)

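4 Define gene signatures

Here we define some simple gene sets based on the "Human Cell Landscape" signatures from Han et al. (2020) Nature. You can edit the existing signatures, or add new ones as elements of the list. Genes with a trailing "-" (e.g. "LCK-") are treated by UCell as negative genes, that is, genes expected to be absent or lowly expressed in the target population.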

signatures <- list(Tcell = c("CD3D","CD3E","CD3G","CD2","TRAC"),
                   Myeloid = c("CD14","LYZ","CSF1R","FCER1G","SPI1","LCK-"),
                   NK = c("KLRD1","NCAM1","NKG7","CD3D-","CD3E-"),
                   Plasma_cell = c("IGKC","IGHG3","IGHG1","IGHA1","CD19-"))
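Signatures that mix positive and negative genes can be handled in two steps: split the gene list on the trailing sign, then combine the two scores. The base-R sketch below illustrates this under stated assumptions. Our understanding of the UCell documentation is that the positive-gene score minus the weighted negative-gene score is reported, floored at zero. Please check `?ScoreSignatures_UCell` (the `w_neg` weight) for the exact behaviour. The helpers `split_signature` and `combine_scores` are hypothetical. Accepting an optional trailing "+" on positive genes is also an assumption.

```r
# Hypothetical helper: split a signature into positive and negative genes.
# A trailing "-" marks a negative gene; a trailing "+" (assumed optional)
# marks a positive one. Hyphens inside names (e.g. "HLA-DRA") are untouched.
split_signature <- function(genes) {
  is_neg <- grepl("-$", genes)
  clean <- sub("[+-]$", "", genes)
  list(pos = clean[!is_neg], neg = clean[is_neg])
}

# Hypothetical combination rule: score of the positive genes minus the
# weighted score of the negative genes, floored at zero.
combine_scores <- function(pos_score, neg_score, w_neg = 1) {
  pmax(pos_score - w_neg * neg_score, 0)
}

myeloid <- split_signature(c("CD14", "LYZ", "CSF1R", "FCER1G", "SPI1", "LCK-"))
myeloid$neg                  # "LCK"
combine_scores(0.52, 0.10)   # 0.42
combine_scores(0.20, 0.50)   # 0 (floored)
```

Flooring at zero keeps the combined score on the same 0 to 1 scale as a purely positive signature.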

5 Run UCell

Run ScoreSignatures_UCell to obtain signature scores for all cells directly:

u.scores <- ScoreSignatures_UCell(exp.mat, features=signatures)
head(u.scores)
##        Tcell_UCell Myeloid_UCell NK_UCell Plasma_cell_UCell
## bcHTNA           0     0.5234000        0        0.00000000
## bcHNVA           0     0.5120000        0        0.01991667
## bcALZN           0     0.3593333        0        0.00000000
## bcFWBP           0     0.1558000        0        0.00000000
## bcBJYE           0     0.4639333        0        0.00000000
## bcGSBJ           0     0.5460000        0        0.00000000

Show the distribution of predicted scores:

library(reshape2)
library(ggplot2)
melted <- reshape2::melt(u.scores)
colnames(melted) <- c("Cell","Signature","UCell_score")
p <- ggplot(melted, aes(x=Signature, y=UCell_score)) +
    geom_violin(aes(fill=Signature), scale = "width") +
    geom_boxplot(width=0.1, outlier.size=0) +
    theme_bw() + theme(axis.text.x=element_blank())
p
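The plot needs the cells x signatures score matrix in "long" format, with one row per (cell, signature) pair. A base-R equivalent of `reshape2::melt()` for a matrix is sketched below on a small toy matrix whose values are made up. It may be handy if you prefer not to depend on reshape2.

```r
# Toy score matrix shaped like the output of ScoreSignatures_UCell:
# one row per cell, one column per signature (values are made up).
scores <- matrix(c(0,    0.52, 0.10,
                   0.69, 0,    0.05),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("cell1", "cell2"),
                                 c("Tcell_UCell", "Myeloid_UCell", "NK_UCell")))

# Base-R alternative to reshape2::melt(): as.table() + as.data.frame()
# produce one row per (row name, column name) pair plus the value.
long <- as.data.frame(as.table(scores), stringsAsFactors = FALSE)
colnames(long) <- c("Cell", "Signature", "UCell_score")
long
```

The resulting data frame can be passed to the same `ggplot()` call as `melted`.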

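6 Pre-calculating gene rankings

The time- and memory-demanding step in UCell is the calculation of gene rankings for each individual cell. If we plan to experiment with signatures, by editing them or adding new cell subtypes, we can pre-calculate the gene rankings once and then apply new signatures to these pre-calculated ranks. Run the StoreRankings_UCell function to pre-calculate gene rankings over a dataset: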

set.seed(123)
ranks <- StoreRankings_UCell(exp.mat)
ranks[1:5,1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
##           . . . . .
## 5_8S_rRNA . . . . .
## 7sk       . . . . .
## a1bg      . . . . .
## a1bg-as1  . . . . .
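The printed ranks are all "." (zero) because, presumably, only the top `maxRank` genes of each cell are kept and all other entries are stored as 0. This is what keeps the rank matrix sparse, and rarely expressed genes such as those shown above never reach the top. The base-R sketch below reproduces that idea on a toy matrix. The function name `store_ranks_sketch` is hypothetical, and the zero-for-"beyond maxRank" encoding is our assumption about how the stored ranks are laid out.

```r
# Minimal sketch of per-cell gene ranking with a rank cap (not UCell code).
# Rows are genes, columns are cells. Ranks worse than maxRank are stored as 0,
# so most entries of a large rank matrix are zero.
store_ranks_sketch <- function(mat, maxRank = 1500) {
  # Rank genes within each cell by decreasing expression; ties get average ranks.
  ranks <- apply(mat, 2, function(x) rank(-x, ties.method = "average"))
  # Keep only the top maxRank positions; everything else becomes 0.
  ranks[ranks > maxRank] <- 0
  dimnames(ranks) <- dimnames(mat)
  ranks
}

# Toy counts: 4 genes x 2 cells (made up for illustration).
m <- matrix(c(5, 3, 0, 1,
              0, 0, 7, 2), nrow = 4,
            dimnames = list(c("g1", "g2", "g3", "g4"), c("c1", "c2")))
store_ranks_sketch(m, maxRank = 2)
# c1 keeps g1 (rank 1) and g2 (rank 2); c2 keeps g3 (rank 1) and g4 (rank 2)
```

Wrapping the result in `Matrix::Matrix(..., sparse = TRUE)` gives a `dgCMatrix` that prints zeros as ".", as in the output above.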

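We can then apply our signature set, or any new signature, to the pre-calculated ranks. Because the ranking step is skipped, the computation is much faster.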

set.seed(123)
u.scores.2 <- ScoreSignatures_UCell(features=signatures, precalc.ranks = ranks)

melted <- reshape2::melt(u.scores.2)
colnames(melted) <- c("Cell","Signature","UCell_score")
p <- ggplot(melted, aes(x=Signature, y=UCell_score)) +
    geom_violin(aes(fill=Signature), scale = "width") +
    geom_boxplot(width=0.1, outlier.size = 0) +
    theme_bw() + theme(axis.text.x=element_blank())
p

new.signatures <- list(Mast.cell = c("TPSAB1","TPSB2","CPA3","MS4A2"),
                       Lymphoid = c("LCK"))

u.scores.3 <- ScoreSignatures_UCell(features=new.signatures, precalc.ranks = ranks)
melted <- reshape2::melt(u.scores.3)
colnames(melted) <- c("Cell","Signature","UCell_score")
p <- ggplot(melted, aes(x=Signature, y=UCell_score)) +
    geom_violin(aes(fill=Signature), scale = "width") +
    geom_boxplot(width=0.1, outlier.size=0) +
    theme_bw() + theme(axis.text.x=element_blank())
p
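Scoring from stored ranks only needs a lookup of the signature genes' ranks in each cell followed by the U-statistic formula, so no re-ranking is involved. A base-R sketch of that step, applied to several signatures at once, is shown below. It assumes a plain R matrix of ranks in which 0 means "ranked beyond `maxRank`". That encoding, the function name `score_from_ranks_sketch`, and the treatment of missing genes are all assumptions. The `_UCell` column suffix only mimics the outputs shown above.

```r
# Sketch: score several signatures from a stored rank matrix (genes x cells)
# in which 0 means "ranked beyond maxRank". No re-ranking of the data is needed.
score_from_ranks_sketch <- function(ranks, signatures, maxRank = 1500) {
  scores <- lapply(signatures, function(sig) {
    n <- length(sig)
    # Start with every signature gene "beyond maxRank" in every cell ...
    r <- matrix(maxRank + 1, nrow = n, ncol = ncol(ranks))
    present <- sig %in% rownames(ranks)
    # ... then copy the stored ranks of the genes that exist in the data.
    r[present, ] <- as.matrix(ranks[sig[present], , drop = FALSE])
    # Stored zeros also mean "beyond maxRank".
    r[r == 0] <- maxRank + 1
    # Mann-Whitney U per cell, normalized so a perfect signature scores 1.
    u <- colSums(r) - n * (n + 1) / 2
    s <- 1 - u / (n * maxRank)
    # Cells with no signature gene in the top maxRank get 0.
    s[colSums(r <= maxRank) == 0] <- 0
    s
  })
  out <- do.call(cbind, scores)
  rownames(out) <- colnames(ranks)
  # Mimic the "_UCell" suffix seen in the package output.
  colnames(out) <- paste0(names(signatures), "_UCell")
  out
}

# Toy stored ranks: 4 genes x 2 cells, maxRank = 2 (made up for illustration).
ranks <- matrix(c(1, 2, 0, 0,
                  0, 0, 1, 2), nrow = 4,
                dimnames = list(c("g1", "g2", "g3", "g4"), c("c1", "c2")))
sigs <- list(A = c("g1", "g2"), B = c("g3", "g4"), Mixed = c("g1", "g3"))
score_from_ranks_sketch(ranks, sigs, maxRank = 2)
# c1: A = 1, B = 0, Mixed = 0.75; c2: A = 0, B = 1, Mixed = 0.75
```

Adding a signature costs one lookup and one column sum per cell. This is why re-scoring against stored ranks is cheap compared with ranking the data again.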

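7 Multi-core processing

If your machine has multiple cores and enough RAM, running UCell in parallel can speed up the analysis considerably. The example below runs on a single core; to parallelize on, for example, 4 cores, set workers=4: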

BPPARAM <- BiocParallel::MulticoreParam(workers=1)
u.scores <- ScoreSignatures_UCell(exp.mat, features=signatures, BPPARAM=BPPARAM)
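Parallel scoring is possible because each cell is scored independently: cells can be split into chunks, scored in separate workers, and the results stacked back together. The sketch below shows that pattern with `parallel::mclapply` from base R instead of BiocParallel. We believe UCell also processes cells in chunks (see the `chunk.size` argument in `?ScoreSignatures_UCell`), but this is not its implementation. The per-chunk score used here (fraction of signature genes detected in each cell) is a hypothetical placeholder, not the UCell score. `score_chunk` and `score_in_chunks` are illustrative names.

```r
library(parallel)

# Placeholder per-chunk scorer (NOT the UCell score): for each signature,
# the fraction of its genes with non-zero expression in each cell.
score_chunk <- function(mat, signatures) {
  do.call(cbind, lapply(signatures, function(sig) {
    colMeans(mat[intersect(sig, rownames(mat)), , drop = FALSE] > 0)
  }))
}

# Split cells (columns) into consecutive chunks, score each chunk in a
# worker, then stack the per-chunk results into one cells x signatures matrix.
score_in_chunks <- function(mat, signatures, chunk.size = 1000, cores = 1) {
  idx <- split(seq_len(ncol(mat)), ceiling(seq_len(ncol(mat)) / chunk.size))
  # mclapply forks worker processes; with cores = 1 it runs sequentially.
  # Forking is not available on Windows, where only cores = 1 works.
  res <- mclapply(idx, function(i) score_chunk(mat[, i, drop = FALSE], signatures),
                  mc.cores = cores)
  do.call(rbind, res)
}

# Toy counts: 3 genes x 4 cells (made up for illustration).
m <- matrix(c(1, 0, 2,
              0, 0, 3,
              4, 5, 0,
              1, 1, 1), nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), paste0("c", 1:4)))
score_in_chunks(m, list(S1 = c("g1", "g2"), S2 = "g3"), chunk.size = 2)
```

The chunked result is identical to scoring all cells at once. Chunk size then becomes a trade-off between per-worker memory and scheduling overhead.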

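8 Interacting with SingleCellExperiment or Seurat

SingleCellExperiment and Seurat are popular environments for single-cell analysis. The UCell package implements functions to interact directly with these pipelines, as described in the dedicated vignettes available on the Bioc landing page.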

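9 Resources

Please address any questions to the UCell GitHub repository.

More demos are available on the Bioc landing page and in the UCell demo repository.

If you find UCell useful, you may also check out the scGate package, which relies on UCell scores to automatically purify populations of interest based on gene signatures.

See also SignatuR for easy storage and retrieval of gene signatures.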

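10 References

  • Andreatta, M., Carmona, S.J. (2021) UCell: Robust and scalable single-cell gene signature scoring. Computational and Structural Biotechnology Journal
  • Zilionis, R., Engblom, C., ..., Klein, A.M. (2019) Single-cell transcriptomics of human and mouse lung cancers reveals conserved myeloid populations across individuals and species. Immunity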

11 Session Info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_GB              LC_COLLATE=C
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods
## [8] base
## 
## other attached packages:
##  [1] reshape2_1.4.4              scater_1.25.2
##  [3] scuttle_1.7.2               patchwork_1.1.1
##  [5] ggplot2_3.3.6               sp_1.5-0
##  [7] SeuratObject_4.1.0          Seurat_4.1.1
##  [9] UCell_2.1.2                 scRNAseq_2.11.0
## [11] SingleCellExperiment_1.19.0 SummarizedExperiment_1.27.1
## [13] Biobase_2.57.1              GenomicRanges_1.49.0
## [15] GenomeInfoDb_1.33.3         IRanges_2.31.0
## [17] S4Vectors_0.35.1            BiocGenerics_0.43.1
## [19] MatrixGenerics_1.9.1        matrixStats_0.62.0
## [21] BiocStyle_2.25.0
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                    reticulate_1.25
##   [3] tidyselect_1.1.2              RSQLite_2.2.15
##   [5] AnnotationDbi_1.59.1          htmlwidgets_1.5.4
##   [7] grid_4.2.1                    BiocParallel_1.31.12
##   [9] Rtsne_0.16                    ScaledMatrix_1.5.0
##  [11] munsell_0.5.0                 codetools_0.2-18
##  [13] ica_1.0-3                     future_1.27.0
##  [15] miniUI_0.1.1.1                withr_2.5.0
##  [17] spatstat.random_2.2-0         colorspace_2.0-3
##  [19] progressr_0.10.1              filelock_1.0.2
##  [21] highr_0.9                     knitr_1.39
##  [23] ROCR_1.0-11                   tensor_1.5
##  [25] listenv_0.8.0                 labeling_0.4.2
##  [27] GenomeInfoDbData_1.2.8        polyclip_1.10-0
##  [29] farver_2.1.1                  bit64_4.0.5
##  [31] parallelly_1.32.1             vctrs_0.4.1
##  [33] generics_0.1.3                xfun_0.31
##  [35] BiocFileCache_2.5.0           R6_2.5.1
##  [37] ggbeeswarm_0.6.0              rsvd_1.0.5
##  [39] AnnotationFilter_1.21.0       bitops_1.0-7
##  [41] spatstat.utils_2.3-1          cachem_1.0.6
##  [43] DelayedArray_0.23.1           assertthat_0.2.1
##  [45] promises_1.2.0.1              BiocIO_1.7.1
##  [47] scales_1.2.0                  beeswarm_0.4.0
##  [49] rgeos_0.5-9                   gtable_0.3.0
##  [51] beachmat_2.13.4               globals_0.16.0
##  [53] goftest_1.2-3                 ensembldb_2.21.3
##  [55] rlang_1.0.4                   splines_4.2.1
##  [57] rtracklayer_1.57.0            lazyeval_0.2.2
##  [59] spatstat.geom_2.4-0           BiocManager_1.30.18
##  [61] yaml_2.3.5                    abind_1.4-5
##  [63] GenomicFeatures_1.49.5        httpuv_1.6.5
##  [65] tools_4.2.1                   bookdown_0.27
##  [67] ellipsis_0.3.2                spatstat.core_2.4-4
##  [69] jquerylib_0.1.4               RColorBrewer_1.1-3
##  [71] ggridges_0.5.3                Rcpp_1.0.9
##  [73] plyr_1.8.7                    sparseMatrixStats_1.9.0
##  [75] progress_1.2.2                zlibbioc_1.43.0
##  [77] purrr_0.3.4                   RCurl_1.98-1.8
##  [79] prettyunits_1.1.1             rpart_4.1.16
##  [83] pbapply_1.0 -6                ggrepel_0.9.1
##  [87] cluster_2.1.3                 magrittr_2.0.3
##  [89] magick_2.7.3                  RSpectra_0.16-1
##  [91] data.table_1.14.2             scattermore_0.8
##  [93] lmtest_0.9-40                 RANN_2.6.1
##  [95] ProtGenerics_1.29.0           fitdistrplus_1.1-8
##  [97] hms_1.1.1                     mime_0.12
## [101] XML_3.99-0.10                 gridExtra_2.3
## [103] compiler_4.2.1                biomaRt_2.53.2
## [105] tibble_3.1.8                  KernSmooth_2.23-20
## [107] crayon_1.5.1                  htmltools_0.5.3
## [109] mgcv_1.8-40                   later_1.3.0
## [111] tidyr_1.2.0                   DBI_1.1.3
## [113] ExperimentHub_2.5.0           dbplyr_2.2.1
## [115] MASS_7.3-58.1                 rappdirs_0.3.3
## [117] Matrix_1.4-1                  cli_3.3.0
## [119] parallel_4.2.1                igraph_1.3.4
## [121] pkgconfig_2.0.3               GenomicAlignments_1.33.1
## [123] plotly_4.10.0                 spatstat.sparse_2.1-1
## [125] xml2_1.3.3                    vipor_0.4.5
## [127] bslib_0.4.0                   XVector_0.37.0
## [129] stringr_1.4.0                 digest_0.6.29
## [131] sctransform_0.3.3             RcppAnnoy_0.0.19
## [133] spatstat.data_2.2-0           Biostrings_2.65.1
## [135] rmarkdown_2.14                leiden_0.4.2
## [137] uwot_0.1.11                   DelayedMatrixStats_1.19.0
## [139] restfulr_0.0.15               curl_4.3.2
## [141] shiny_1.7.2                   Rsamtools_2.13.3
## [143] rjson_0.2.21                  nlme_3.1-158
## [145] lifecycle_1.0.1               jsonlite_1.8.0
## [147] BiocNeighbors_1.15.1          viridisLite_0.4.0
## [149] fansi_1.0.3                   pillar_1.8.0
## [151] lattice_0.20-45               KEGGREST_1.37.3
## [153] fastmap_1.1.0                 httr_1.4.3
## [155] survival_3.3-1                interactiveDisplayBase_1.35.0
## [157] glue_1.6.2                    png_0.1-7
## [159] BiocVersion_3.16.0            bit_4.0.4
## [161] stringi_1.7.8                 sass_0.4.2
## [163] blob_1.2.3                    BiocSingular_1.13.0
## [165] AnnotationHub_3.5.0           memoise_2.0.1
## [167] dplyr_1.0.9                   irlba_2.3.5
## [169] future.apply_1.9.0