从LOLA开始

在本文中，您将使用LOLA包附带的小型示例数据集来初步了解LOLA工作流中最常见的功能。

运行LOLA分析需要3个条件:

一个区域数据库。
userSets:感兴趣的区域(至少有一组区域作为GRanges对象，或多个感兴趣的区域作为GRangesList对象)
userUniverse:测试包含在感兴趣的区域集中的区域集(单个GRanges对象)

让我们加载一个示例regionDBloadRegionDB ()．下面是LOLA附带的一个小例子。数据库位置应该指向一个包含集合子文件夹的文件夹:

图书馆（“洛拉”) dbPath =执行（“extdata”，“hg19”，包=“洛拉”) regionDB =loadRegionDB(dbPath)

##阅读集合注释:

## ucsc_example: found collection annotation:/tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/collection.txt

##读取区域注释…

在readRegionSetAnnotation(dbLocation, collections)中##警告:你没有安装simpleCache，所以你将不能在读入## regionDB后缓存它。安装simpleCache可以加快数据库##的加载。

# # / tmp / RtmpRoVwke / Rinst7aae3f28a8be洛拉/ extdata / hg19 / / ucsc_example /地区

在“ucsc_example”中找到索引文件:/tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/index.txt

## ' [.data]中的警告。table ' (indexDT，， ':= ' (col, as.character(NA))， with ## =FALSE): with=FALSE with:=在2014年10月发布的v1.9.4中已弃用。请将:=的LHS用括号括起来;例如，DT[， ## (myVar):=sum(b)，by=a]分配到变量myVar中持有的列名。看到了吗?':='其他例子。正如2014年警告的那样，现在这是一个警告。

## ' [.data]中的警告。table ' (indexDT，， ':= ' (col, as.character(NA))， with ## =FALSE): with=FALSE with:=在2014年10月发布的v1.9.4中已弃用。请将:=的LHS用括号括起来;例如，DT[， ## (myVar):=sum(b)，by=a]分配到变量myVar中持有的列名。看到了吗?':='其他例子。正如2014年警告的那样，现在这是一个警告。## ' [.data]中的警告。table ' (indexDT，， ':= ' (col, as.character(NA))， with ## =FALSE): with=FALSE with:=在2014年10月发布的v1.9.4中已弃用。请将:=的LHS用括号括起来;例如，DT[， ## (myVar):=sum(b)，by=a]分配到变量myVar中持有的列名。 ## See ?':=' for other examples. As warned in 2014, this is now a warning. ## Warning in `[.data.table`(indexDT, , `:=`(col, as.character(NA)), with ## = FALSE): with=FALSE together with := was deprecated in v1.9.4 released ## Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[, ## (myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. ## See ?':=' for other examples. As warned in 2014, this is now a warning. ## Warning in `[.data.table`(indexDT, , `:=`(col, as.character(NA)), with ## = FALSE): with=FALSE together with := was deprecated in v1.9.4 released ## Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[, ## (myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. ## See ?':=' for other examples. As warned in 2014, this is now a warning.

##集合:ucsc_example。正在创建大小文件…

# # ucsc_example

##读取5个文件…

## 1: /tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/regions/cpgIslandExt.bed

## 2: /tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/regions/ laminb1lld .bed

## 3: /tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/regions/ numtsassemble .bed

## 4: /tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/regions/ vistaenhancors .bed

5: /tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/regions/vistaEnhancers_colNames.bed

regionDB是一个R (list)对象，它有几个元素:

的名字(regionDB)

##[1]“dbLocation”“regionAnno”“collectionAnno”“regionGRL”

dbLocation:一个字符串，记录传递到的数据库文件夹的位置loadRegionDB ()．
collectionAnno:data.table对集合进行注释，使用与您的集合数据库中的注释文件。
regionAnno:data.table注释每个区域集，用行对应数据库中的床文件(还有一个集合记录每个区域集属于哪个集合的列)。
regionGRL:GRangesList对象保存实际的区域，每个区域集有一个列表元素，顺序如下regionAnno．

现在加载了数据库，让我们加载一些示例数据(感兴趣的区域和测试的宇宙):

数据（“sample_input”，包=“洛拉”）#加载用户集数据（“sample_universe”，包=“洛拉”）#加载userUniverse

现在我们有一个GRanges对象叫做userSets和GRanges对象userUniverse．这就是我们进行浓缩计算所需要的。calcLocEnrichemnt ()将测试您的userset和regionDB中的每个region set之间的重叠。

locResults =runLOLA(userSets, userUniverse, regionDB，核=1）

计算单元集重叠…

计算宇宙集重叠…

##计算费雪分数…

runLOLA测试regionDB中每个用户集和每个区域集之间的成对重叠。然后，它使用费雪精确测试来评估重叠的重要性。结果是data.table有几列:

colnames(locResults)

##[1]“userSet”“dbSet”“collection”“pValueLog”##[5]“logOddsRatio”“support”“rnkPV”“rnkLO”##[9]“rnkSup”“maxRnk”“meanRnk”“b”##[13]“c”“d”“description”“cellType”##[17]“组织”“抗体”“治疗”“dataSource”##[21]“filename”“qValue”“size”

头(locResults)

# # userSet dbSet收集pValueLog logOddsRatio支持rnkPV rnkLO # # 1: setB 2 ucsc_example 264.4863407 - 7.7578283 850年1 1 # # 2:刚毛2 ucsc_example 254.6188080 - 8.6487312 632年1 1 # # 3:setB ucsc_example 34.6073689 - 3.3494078 5747 2 2 # # 4:刚毛4 ucsc_example 1.7169689 1.2377725 124 2 2 # # 5:刚毛5 ucsc_example 1.7169689 1.2377725 124 2 2 # # 6:刚毛3 ucsc_example 0.1877354 - 0.9135696 8 4 4 # # rnkSup maxRnk meanRnk b c d描述# # 1:2 2 1.33 452 4981 20546 ucsc_example # # 2:2 2 1.33 670 2510 23017 ucsc_example # # 3: 1 2 1.67 20018 84 980 CpG岛UCSC的注释# # 4:3 3 2.33 761 3018 22926 ucsc_example # # 5: 3 3 2.33 761 3018 22926 ucsc_example # # 6: 5 5 4.33 66 3134 23621 ucsc_example # # cellType组织抗体治疗数据源# # 1:# # 2:不详不详不详不详不详不详不详不详不详不详# # 3:# # 4:不详不详不详不详不详不详不详不详不详不详# # 5:# # 6:不详不详不详不详不详不详不详不详不详不详qValue大小# # 1 # #文件名:laminB1Lads。床3.263317e-264 1302 ## 2: laminB1Lads。bed 1.202713e-254 1302 ## 3: cpgIslandExt.bed 8.232086e-35 28691 ## 4: vistaEnhancers.bed 3.837612e-02 1339 ## 5: vistaEnhancers_colNames.bed 3.837612e-02 1340 ## 6: numtSAssembled.bed 1.000000e+00 78

如果你不熟悉怎么做data.table是用R写的，值得一读这个强大包的文档．列userSet而且dbSet是对应GRangeList对象的索引，用于标识每个成对比较。有一系列列描述统计检验的结果，例如pValueLog，logOdds，以及列联表(支持是重叠吗b，c,d完成2x2表)。排序列只是对测试进行排序pValueLog，logOdds,或支持；属性之后的一系列列注释了数据库区域，具体取决于您如何填充指数表。

你可以在R中探索这些结果，例如，按不同的顺序排序:

locResults [订单(支持,减少=真正的),)

## userSet dbSet collection pValueLog logOddsRatio支持rnkPV ## 1: setB 1 ucsc_example 3.460737e+01 3.3494078 5747 2 ## 2: setA 1 ucsc_example 2.818334e-02 0.8704355 3002 5 ## 3: setB 2 ucsc_example 2.644863e+02 7.7578283 850 1 ## 4: setA 2 ucsc_example 2.546188e+02 8.6487312 632 1 ## 5: setA 4 ucsc_example 1.716969e+00 1.2377725 124 2 ## 7: setB 4 ucsc_example 1.716969e+00 1.2377725 124 2 ## 7: setB 4 ucsc_example 0.000000e+00 0.3489379 80 4 ## 8:setB 5 ucsc_example 0.000000 e + 00 0.3489379 80 4 # # 9:刚毛3 ucsc_example 1.877354 e-01 0.9135696 8 4 # # 10: setB 3 ucsc_example 9.184826 e-06 0.2052377 4 3 # # rnkLO rnkSup maxRnk meanRnk b c d # # 1: 2 1 2 1.67 20018 84 980 # # 2: 3.67 22763 140 924 5 1 5 # # 3: 1 2 2 1.33 452 4981 20546 # # 4: 1.33 670 2510 23017 1 2 2 # # 5: 2 3 3 2.33 761 3018 22926 # # 6: 2 3 3 2.33 761 3018 22926 # # 7: 3 3 4 3.33 805 5751 20193 # # 8: 3 3 4 3.33 805 5751 20193 # # 9: 4 5 5 4.33 66 3134 23621 # # 10:4.33 70 5827 20928 5 5 5 # #描述cellType组织抗体治疗# # 1:CpG岛从UCSC的注释缺缺缺缺# # 2:CpG岛从UCSC的注释缺缺缺缺# # 3:ucsc_example缺缺缺缺# # 4:ucsc_example缺缺缺缺# # 5:ucsc_example缺缺缺缺# # 6:ucsc_example缺缺缺缺# # 7:ucsc_example缺缺缺缺# # 8:ucsc_example缺缺缺缺# # 9:ucsc_example缺缺缺缺# # 10:ucsc_example缺缺缺缺# #数据源文件名qValue大小# # 1:NA cpgIslandExt。床8.232086e-35 28691 ## 2: NA cpgIslandExt。床1.000000e+00 28691 ## 3: NA laminB1Lads。床3.263317e-264 1302 ## 4: NA laminB1Lads。bed 1.202713e-254 1302 ## 5: NA vistaEnhancers.bed 3.837612e-02 1339 ## 6: NA vistaEnhancers_colNames.bed 3.837612e-02 1340 ## 7: NA vistaEnhancers.bed 1.000000e+00 1339 ## 8: NA vistaEnhancers_colNames.bed 1.000000e+00 1340 ## 9: NA numtSAssembled.bed 1.000000e+00 78 ## 10: NA numtSAssembled.bed 1.000000e+00 78

你可以按其中一列排序:

locResults [订单(maxRnk减少=真正的),)

## userSet dbSet collection pValueLog logOddsRatio支持rnkPV ## 1: setA 3 ucsc_example 2.818334e- 01 0.9135696 84 ## 2: setA 1 ucsc_example 2.818334e-02 0.8704355 3002 5 ## 3: setB 3 ucsc_example 9.184826e-06 0.2052377 43 ## 4: setB 4 ucsc_example 0.000000e+00 0.3489379 80 4 ## 5: setB 5 ucsc_example 0.000000e+00 0.3489379 80 4 ## 6: setA 4 ucsc_example 1.716969e+00 1.2377725 124 2 ## 7: setA 5 ucsc_example 1.716969e+00 1.2377725 124 2 ## 8:setB 2 ucsc_example 2.644863 e + 02 7.7578283 850 1 # # 9:刚毛2 ucsc_example 2.546188 e + 02 8.6487312 632 1 # # 10: setB ucsc_example 3.460737 e + 01 3.3494078 5747 2 # # rnkLO rnkSup maxRnk meanRnk b c d # # 1: 4 5 5 4.33 66 3134 23621 # # 2: 3.67 22763 140 924 5 1 5 # # 3: 4.33 70 5827 20928 5 5 5 # # 4: 3 3 4 3.33 805 5751 20193 # # 5: 3 3 4 3.33 805 5751 20193 # # 6: 2 3 3 2.33 761 3018 22926 # # 7: 2 3 3 2.33 761 3018 22926 # # 8: 1.33 452 4981 20546 1 2 2 # # 9: 1 2 2 1.33 670 2510 23017 # # 10:2 1 2 1.67 20018 84 980 # #描述cellType组织抗体治疗# # 1:ucsc_example缺缺缺缺# # 2:CpG岛从UCSC的注释缺缺缺缺# # 3:ucsc_example缺缺缺缺# # 4:ucsc_example缺缺缺缺# # 5:ucsc_example缺缺缺缺# # 6:ucsc_example缺缺缺缺# # 7:ucsc_example缺缺缺缺# # 8:ucsc_example缺缺缺缺# # 9:ucsc_example缺缺缺缺# # 10:CpG岛从UCSC的注释缺缺缺缺# #数据源文件名qValue大小# # 1:NA numtSAssembled。床1.000000e+00 78 ## 2: NA cpgIslandExt。bed 1.000000e+00 28691 ## 3: NA numtSAssembled.bed 1.000000e+00 78 ## 4: NA vistaEnhancers.bed 1.000000e+00 1339 ## 5: NA vistaEnhancers_colNames.bed 1.000000e+00 1340 ## 6: NA vistaEnhancers.bed 3.837612e-02 1339 ## 7: NA vistaEnhancers_colNames.bed 3.837612e-02 1340 ## 8: NA laminB1Lads.bed 3.263317e-264 1302 ## 9: NA laminB1Lads.bed 1.202713e-254 1302 ## 10: NA cpgIslandExt.bed 8.232086e-35 28691

最后，将结果记录到文件中，如下所示:

写出结果:

writeCombinedEnrichment(locResults倒转褶皱=“lolaResults”）

# # lolaResults / userSet_setA.tsv

# # lolaResults / userSet_setB.tsv

默认情况下，该函数将把整个表写入tsv文件。我建议使用inclesplits参数，它告诉函数也打印出由userSet设置的附加表，这样您测试的每个区域集都有自己的结果表。它只是让探索结果更容易一些。

writeCombinedEnrichment(locResults倒转褶皱=“lolaResults”，includeSplits =真正的）

##覆盖lolaResults/userSet_setA.tsv…

##覆盖lolaResults/userSet_setB.tsv…

##覆盖lolaResults/ allenrichment .tsv…

从数据库中提取特定的区域集

如果您有一个大型数据库，您可能有兴趣将LOLA数据库格式用于其他项目，或用于额外的后续分析。在本例中，您可能只对数据库中的一个区域集感兴趣，也可能只对其中的几个区域集感兴趣。LOLA提供了从已加载或未加载的数据库中提取特定区域集的函数。

假设您只想要一个具有“vistaEnhancers”区域集中区域的对象。你可以像这样从加载的数据库中获取它:

getRegionSet(regionDB集合=“ucsc_example”，文件名=“vistaEnhancers.bed”）

##长度为1的GRangesList对象:##[[1]]##包含1339个范围和0个元数据列的GRanges对象:## seqnames ranges strand ##    ## 1 chr1 [3190581,3191428] * ## 2 chr1 [8130439,8131887] * ## 3 chr1 [10593123, 10594209] * ## 4 chr1 [10732070, 10733118] * ## 5 chr1 [10757664, 10758631] * ## ... ... ... ...## 1335 chrX [139380916,139382199] * ## 1336 chrX [139593502,139594774] * ## 1337 chrX [139674499, 139675403] * ## 1338 chrX [147829016,147830159] * ## 1339 chrX [150407692,150409052] * ## ## ------- ## seqinfo:来自未指定基因组的69个序列;没有seqlengths

或者，如果您还没有加载数据库，您可以只给出数据库的路径，LOLA将只加载您感兴趣的特定区域集。它可以包含多个文件名或集合:

getRegionSet(dbPath集合=“ucsc_example”，文件名=“vistaEnhancers.bed”）

##读取集合注释:ucsc_example

## ucsc_example: found collection annotation:/tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/collection.txt

##读取区域注释…

在readRegionSetAnnotation(dbLocation, collections)中##警告:你没有安装simpleCache，所以你将不能在读入## regionDB后缓存它。安装simpleCache可以加快数据库##的加载。

# # / tmp / RtmpRoVwke / Rinst7aae3f28a8be洛拉/ extdata / hg19 / / ucsc_example /地区

在“ucsc_example”中找到索引文件:/tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/index.txt

## ' [.data]中的警告。table ' (indexDT，， ':= ' (col, as.character(NA))， with ## =FALSE): with=FALSE with:=在2014年10月发布的v1.9.4中已弃用。请将:=的LHS用括号括起来;例如，DT[， ## (myVar):=sum(b)，by=a]分配到变量myVar中持有的列名。看到了吗?':='其他例子。正如2014年警告的那样，现在这是一个警告。

## ' [.data]中的警告。table ' (indexDT，， ':= ' (col, as.character(NA))， with ## =FALSE): with=FALSE with:=在2014年10月发布的v1.9.4中已弃用。请将:=的LHS用括号括起来;例如，DT[， ## (myVar):=sum(b)，by=a]分配到变量myVar中持有的列名。看到了吗?':='其他例子。正如2014年警告的那样，现在这是一个警告。## ' [.data]中的警告。table ' (indexDT，， ':= ' (col, as.character(NA))， with ## =FALSE): with=FALSE with:=在2014年10月发布的v1.9.4中已弃用。请将:=的LHS用括号括起来;例如，DT[， ## (myVar):=sum(b)，by=a]分配到变量myVar中持有的列名。 ## See ?':=' for other examples. As warned in 2014, this is now a warning. ## Warning in `[.data.table`(indexDT, , `:=`(col, as.character(NA)), with ## = FALSE): with=FALSE together with := was deprecated in v1.9.4 released ## Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[, ## (myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. ## See ?':=' for other examples. As warned in 2014, this is now a warning. ## Warning in `[.data.table`(indexDT, , `:=`(col, as.character(NA)), with ## = FALSE): with=FALSE together with := was deprecated in v1.9.4 released ## Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[, ## (myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. ## See ?':=' for other examples. As warned in 2014, this is now a warning.

##集合:ucsc_example。正在创建大小文件…

##读取1个文件…

## 1: /tmp/RtmpRoVwke/Rinst7aae3f28a8be/LOLA/extdata/hg19/ucsc_example/regions/vistaEnhancers.bed

##长度为1的GRangesList对象:##[[1]]##包含1339个范围和0个元数据列的GRanges对象:## seqnames ranges strand ##    ## 1 chr1 [3190581,3191428] * ## 2 chr1 [8130439,8131887] * ## 3 chr1 [10593123, 10594209] * ## 4 chr1 [10732070, 10733118] * ## 5 chr1 [10757664, 10758631] * ## ... ... ... ...## 1335 chrX [139380916,139382199] * ## 1336 chrX [139593502,139594774] * ## 1337 chrX [139674499, 139675403] * ## 1338 chrX [147829016,147830159] * ## 1339 chrX [150407692,150409052] * ## ## ------- ## seqinfo:来自未指定基因组的23个序列;没有seqlengths

现在您已经对函数有了基本的了解，接下来可以学习一些其他的小知识，例如使用LOLA Core，以了解它如何在实际数据集上工作。

从LOLA开始

内森·谢菲尔德

2017-04-24

从LOLA开始

探索LOLA结果

从数据库中提取特定的区域集