1 Integrating an HDF5 backend for MultiAssayExperiment

1.1 Dependencies

library(MultiAssayExperiment)
library(HDF5Array)
library(SummarizedExperiment)

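1.2 HDF5Array and DelayedArray constructor

The HDF5Array package provides an on-disk representation of large datasets without the need to load them into memory. Convenient lazy evaluation operations allow the user to manipulate such large data files based on metadata. The DelayedMatrix class in the DelayedArray package provides a way to connect to a large matrix that is stored on disk.

First, we create a small matrix from which to construct a DelayedMatrix object.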

smallMatrix <- matrix(rnorm(10e5), ncol = 20)

We add row names and column names to the matrix object so that it is compatible with the MultiAssayExperiment representation.

rownames(smallMatrix) <- paste0("GENE", seq_len(nrow(smallMatrix)))
colnames(smallMatrix) <- paste0("SampleID", seq_len(ncol(smallMatrix)))

Here we use the DelayedArray constructor function to create a DelayedMatrix object.

smallMatrix <- DelayedArray(smallMatrix)
class(smallMatrix)
## [1] "DelayedMatrix"
## attr(,"package")
## [1] "DelayedArray"
# show method
smallMatrix
## <50000 x 20> matrix of class DelayedMatrix and type "double":
##             SampleID1   SampleID2   SampleID3 ...  SampleID19  SampleID20
## GENE1      -0.4312294   1.4409797  -2.0504121   .   0.7351786  -1.5447315
## GENE2       0.4393512   0.9604603  -0.3668742   .   0.7862599   1.2339122
## GENE3       0.1843361  -1.0383368  -0.4712812   .  -0.7851665  -3.1241170
## GENE4      -0.4610344  -0.5515895  -0.5869010   .   0.6907843   0.4806580
## GENE5      -0.2497588   0.1431237  -1.4726718   .   1.0122940   0.5614055
## ...               ...         ...         ...   .         ...         ...
## GENE49996  0.65586323  1.38831001 -0.17642264   .  0.20264386  1.68516901
## GENE49997 -0.25786834  0.35891334 -0.79591330   . -0.86248700 -0.78932440
## GENE49998 -1.46505565 -0.90918203 -0.71795594   . -1.26711203 -1.03178018
## GENE49999 -0.22687653 -0.31894112 -1.83867213   . -0.45287311  0.09292224
## GENE50000 -0.26584102  1.52669880  0.07709258   .  0.24782881 -2.19351922
dim(smallMatrix)
## [1] 50000    20
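
The lazy evaluation mentioned at the start of this section can be illustrated with a minimal sketch. It is not part of the original vignette: the objects m, dm, delayed and realized are hypothetical names introduced only for this example, and the matrix is kept tiny so that it runs quickly. It assumes that subsetting and arithmetic on a DelayedMatrix are recorded as delayed operations, and that values are only computed when the object is realized, here with as.matrix().

```r
library(DelayedArray)

# Hypothetical toy data, much smaller than smallMatrix above
set.seed(1)
m <- matrix(rnorm(200), ncol = 4,
            dimnames = list(paste0("GENE", 1:50), paste0("SampleID", 1:4)))
dm <- DelayedArray(m)

# Subsetting and arithmetic are stored as delayed operations on dm
delayed <- (dm[1:10, ] + 1) * 2
is(delayed, "DelayedMatrix")

# Realization: the recorded operations are applied and an ordinary matrix is returned
realized <- as.matrix(delayed)
all.equal(realized, (m[1:10, ] + 1) * 2)
```

The same pattern should apply to an on-disk HDF5Matrix, where delaying the work avoids reading the full dataset until a result is actually needed.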

1.3 Writing to a file with dimnames

Finally, the rhdf5 package stores dimnames in a standard location.

To make use of this functionality, we call writeHDF5Array with the with.dimnames argument:

testh5 <- tempfile(fileext = ".h5")
writeHDF5Array(smallMatrix, filepath = testh5, name = "smallMatrix", with.dimnames = TRUE)
## <50000 x 20> matrix of class HDF5Matrix and type "double":
##             SampleID1   SampleID2   SampleID3 ...  SampleID19  SampleID20
## GENE1      -0.4312294   1.4409797  -2.0504121   .   0.7351786  -1.5447315
## GENE2       0.4393512   0.9604603  -0.3668742   .   0.7862599   1.2339122
## GENE3       0.1843361  -1.0383368  -0.4712812   .  -0.7851665  -3.1241170
## GENE4      -0.4610344  -0.5515895  -0.5869010   .   0.6907843   0.4806580
## GENE5      -0.2497588   0.1431237  -1.4726718   .   1.0122940   0.5614055
## ...               ...         ...         ...   .         ...         ...
## GENE49996  0.65586323  1.38831001 -0.17642264   .  0.20264386  1.68516901
## GENE49997 -0.25786834  0.35891334 -0.79591330   . -0.86248700 -0.78932440
## GENE49998 -1.46505565 -0.90918203 -0.71795594   . -1.26711203 -1.03178018
## GENE49999 -0.22687653 -0.31894112 -1.83867213   . -0.45287311  0.09292224
## GENE50000 -0.26584102  1.52669880  0.07709258   .  0.24782881 -2.19351922

To see the file structure we use h5ls:

h5ls(testh5)
##                    group                  name       otype dclass        dim
## 0                      / .smallMatrix_dimnames   H5I_GROUP
## 1 /.smallMatrix_dimnames                     1 H5I_DATASET STRING      50000
## 2 /.smallMatrix_dimnames                     2 H5I_DATASET STRING         20
## 3                      /           smallMatrix H5I_DATASET  FLOAT 50000 x 20

1.4 Importing HDF5 files

Note that a large matrix stored in an HDF5 file can also be loaded using the HDF5ArraySeed and DelayedArray functions.

hdf5Data <- HDF5ArraySeed(filepath = testh5, name = "smallMatrix")
newDelayedMatrix <- DelayedArray(hdf5Data)
class(newDelayedMatrix)
## [1] "HDF5Matrix"
## attr(,"package")
## [1] "HDF5Array"
newDelayedMatrix
## <50000 x 20> matrix of class HDF5Matrix and type "double":
##             SampleID1   SampleID2   SampleID3 ...  SampleID19  SampleID20
## GENE1      -0.4312294   1.4409797  -2.0504121   .   0.7351786  -1.5447315
## GENE2       0.4393512   0.9604603  -0.3668742   .   0.7862599   1.2339122
## GENE3       0.1843361  -1.0383368  -0.4712812   .  -0.7851665  -3.1241170
## GENE4      -0.4610344  -0.5515895  -0.5869010   .   0.6907843   0.4806580
## GENE5      -0.2497588   0.1431237  -1.4726718   .   1.0122940   0.5614055
## ...               ...         ...         ...   .         ...         ...
## GENE49996  0.65586323  1.38831001 -0.17642264   .  0.20264386  1.68516901
## GENE49997 -0.25786834  0.35891334 -0.79591330   . -0.86248700 -0.78932440
## GENE49998 -1.46505565 -0.90918203 -0.71795594   . -1.26711203 -1.03178018
## GENE49999 -0.22687653 -0.31894112 -1.83867213   . -0.45287311  0.09292224
## GENE50000 -0.26584102  1.52669880  0.07709258   .  0.24782881 -2.19351922
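
The write and import steps can be checked together with a small round trip. This is only a sketch and not part of the original vignette: the names m, path, written and reopened are hypothetical, and the matrix is deliberately tiny. It assumes that the HDF5Array() constructor, given the file path and dataset name, picks up dimnames that were written with with.dimnames = TRUE.

```r
library(HDF5Array)

# Hypothetical toy matrix with dimnames
m <- matrix(seq_len(12) / 2, ncol = 3,
            dimnames = list(paste0("GENE", 1:4), paste0("SampleID", 1:3)))

# Write the data and its dimnames to a temporary HDF5 file
path <- tempfile(fileext = ".h5")
written <- writeHDF5Array(m, filepath = path, name = "m", with.dimnames = TRUE)

# Reopen the dataset from disk without loading it into memory
reopened <- HDF5Array(path, name = "m")

# Compare dimnames and values with the in-memory original
identical(rownames(reopened), rownames(m))
identical(colnames(reopened), colnames(m))
all.equal(as.matrix(reopened), m)
```

If the dimnames were not stored, the reopened object would come back without row and column names, and it would no longer line up with the sample identifiers that MultiAssayExperiment relies on.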

1.5 Using a DelayedMatrix with MultiAssayExperiment

A DelayedMatrix alone conforms to the MultiAssayExperiment API requirements. As shown below, the DelayedMatrix can be put into a named list and passed to the MultiAssayExperiment constructor function.

HDF5MAE <- MultiAssayExperiment(experiments = list(smallMatrix = smallMatrix))
sampleMap(HDF5MAE)
## DataFrame with 20 rows and 3 columns
##           assay     primary     colname
##        <factor> <character> <character>
## 1   smallMatrix   SampleID1   SampleID1
## 2   smallMatrix   SampleID2   SampleID2
## 3   smallMatrix   SampleID3   SampleID3
## 4   smallMatrix   SampleID4   SampleID4
## 5   smallMatrix   SampleID5   SampleID5
## ...         ...         ...         ...
## 17  smallMatrix  SampleID17  SampleID17
## 18  smallMatrix  SampleID18  SampleID18
## 19  smallMatrix  SampleID19  SampleID19
## 20  smallMatrix  SampleID20  SampleID20
colData(HDF5MAE)
## DataFrame with 20 rows and 0 columns

1.5.1 SummarizedExperiment with DelayedMatrix backend

A more information-rich DelayedMatrix can be created when it is used in conjunction with the SummarizedExperiment class, and it can even include rowRanges. The flexibility of the MultiAssayExperiment API supports classes with minimal requirements. Additionally, this SummarizedExperiment with the DelayedMatrix backend can be part of a bigger MultiAssayExperiment object. Below is a minimal example of how this would work:

HDF5SE <- SummarizedExperiment(assays = smallMatrix)
assay(HDF5SE)
## <50000 x 20> matrix of class DelayedMatrix and type "double":
##             SampleID1   SampleID2   SampleID3 ...  SampleID19  SampleID20
## GENE1      -0.4312294   1.4409797  -2.0504121   .   0.7351786  -1.5447315
## GENE2       0.4393512   0.9604603  -0.3668742   .   0.7862599   1.2339122
## GENE3       0.1843361  -1.0383368  -0.4712812   .  -0.7851665  -3.1241170
## GENE4      -0.4610344  -0.5515895  -0.5869010   .   0.6907843   0.4806580
## GENE5      -0.2497588   0.1431237  -1.4726718   .   1.0122940   0.5614055
## ...               ...         ...         ...   .         ...         ...
## GENE49996  0.65586323  1.38831001 -0.17642264   .  0.20264386  1.68516901
## GENE49997 -0.25786834  0.35891334 -0.79591330   . -0.86248700 -0.78932440
## GENE49998 -1.46505565 -0.90918203 -0.71795594   . -1.26711203 -1.03178018
## GENE49999 -0.22687653 -0.31894112 -1.83867213   . -0.45287311  0.09292224
## GENE50000 -0.26584102  1.52669880  0.07709258   .  0.24782881 -2.19351922
MultiAssayExperiment(list(HDF5SE = HDF5SE))
## A MultiAssayExperiment object of 1 listed
##  experiment with a user-defined name and respective class.
##  HDF5SE: SummarizedExperiment with 50000 rows and 20 columns
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save data to flat files
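
The remark about rowRanges can be made concrete with a short sketch. It is not taken from the original vignette: the genomic coordinates are invented purely for illustration, and the names m, dm, ranges, se, mae and the assay name "values" are hypothetical. It assumes that a DelayedMatrix is accepted as an assay alongside a GRanges object supplied through the rowRanges argument of the SummarizedExperiment constructor.

```r
library(DelayedArray)
library(GenomicRanges)
library(SummarizedExperiment)
library(MultiAssayExperiment)

# Hypothetical toy data wrapped as a DelayedMatrix
set.seed(1)
m <- matrix(rnorm(12), ncol = 3,
            dimnames = list(paste0("GENE", 1:4), paste0("SampleID", 1:3)))
dm <- DelayedArray(m)

# Made-up genomic ranges, one per row of the matrix
ranges <- GRanges(seqnames = "chr1",
                  ranges = IRanges(start = c(1, 101, 201, 301), width = 50))
names(ranges) <- rownames(dm)

# SummarizedExperiment with a delayed assay and rowRanges
se <- SummarizedExperiment(assays = list(values = dm), rowRanges = ranges)

# Place it in a MultiAssayExperiment under a user-defined name
mae <- MultiAssayExperiment(experiments = list(HDF5SE = se))

# The assay stored inside the container is still a DelayedMatrix
is(assay(mae[["HDF5SE"]]), "DelayedMatrix")
```

Replacing dm with an HDF5Matrix read from disk should work in the same way, since HDF5Matrix extends DelayedMatrix.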

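Additional scenarios are currently in development in which an HDF5Matrix is hosted remotely. Many opportunities exist when considering on-disk and off-disk representations of data with MultiAssayExperiment.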

2 Session info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_GB              LC_COLLATE=C
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
##  [1] HDF5Array_1.26.0            rhdf5_2.42.0
##  [3] DelayedArray_0.24.0         Matrix_1.5-1
##  [5] survminer_0.4.9             ggpubr_0.4.0
##  [7] ggplot2_3.3.6               survival_3.4-0
##  [9] UpSetR_1.4.0                RaggedExperiment_1.22.0
## [11] MultiAssayExperiment_1.24.0 SummarizedExperiment_1.28.0
## [13] Biobase_2.58.0              GenomicRanges_1.50.0
## [15] GenomeInfoDb_1.34.0         IRanges_2.32.0
## [17] S4Vectors_0.36.0            BiocGenerics_0.44.0
## [19] MatrixGenerics_1.10.0       matrixStats_0.62.0
## [21] BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
##  [1] ggtext_0.1.2           bitops_1.0-7           R.cache_0.16.0
##  [4] tools_4.2.1            backports_1.4.1        bslib_0.4.0
##  [7] utf8_1.2.2             R6_2.5.1               DBI_1.1.3
## [10] colorspace_2.0-3       rhdf5filters_1.10.0    withr_2.5.0
## [13] tidyselect_1.2.0       gridExtra_2.3          compiler_4.2.1
## [16] cli_3.4.1              xml2_1.3.3             labeling_0.4.2
## [19] bookdown_0.29          sass_0.4.2             scales_1.2.1
## [22] survMisc_0.5.6         commonmark_1.8.1       stringr_1.4.1
## [25] digest_0.6.30          rmarkdown_2.17         R.utils_2.12.1
## [28] XVector_0.38.0         pkgconfig_2.0.3        htmltools_0.5.3
## [31] fastmap_1.1.0          highr_0.9              rlang_1.0.6
## [34] jquerylib_0.1.4        farver_2.1.1           generics_0.1.3
## [37] zoo_1.8-11             jsonlite_1.8.3         dplyr_1.0.10
## [40] car_3.1-1              R.oo_1.25.0            RCurl_1.98-1.9
## [43] magrittr_2.0.3         GenomeInfoDbData_1.2.9 Rhdf5lib_1.20.0
## [46] Rcpp_1.0.9             munsell_0.5.0          fansi_1.0.3
## [49] abind_1.4-5            lifecycle_1.0.3        R.methodsS3_1.8.2
## [52] stringi_1.7.8          yaml_2.3.6             carData_3.0-5
## [55] zlibbioc_1.44.0        plyr_1.8.7             grid_4.2.1
## [58] lattice_0.20-45        splines_4.2.1          gridtext_0.1.5
## [61] magick_2.7.3           knitr_1.40             pillar_1.8.1
## [64] markdown_1.3           ggsignif_0.6.4         glue_1.6.2
## [67] evaluate_0.17          data.table_1.14.4      BiocManager_1.30.19
## [70] vctrs_0.5.0            gtable_0.3.1           purrr_0.3.5
## [73] tidyr_1.2.1            km.ci_0.5-6            assertthat_0.2.1
## [76] cachem_1.0.6           xfun_0.34              BiocBaseUtils_1.0.0
## [79] xtable_1.8-4           broom_1.0.1            rstatix_0.7.0
## [82] tibble_3.1.8           KMsurv_0.1-5           ellipsis_0.3.2
## [85] R.rsp_0.45.0