作者的英文文档对这个包的用法描述的非常清楚, 我只是记录一下自己学习该包用法的一点感悟。

 安装并加载必须的packages

如果你还没有安装,就运行下面的代码安装:

source('http://bioconductor.org/biocLite.R');

install.packages('deconstructSigs')
# dependencies 'BSgenome', 'BSgenome.Hsapiens.UCSC.hg19' 
BiocInstaller::biocLite('BSgenome')
BiocInstaller::biocLite('BSgenome.Hsapiens.UCSC.hg19')

如果你安装好了,就直接加载它们即可

suppressPackageStartupMessages(library("deconstructSigs")) 

输入数据的准备

head(sample.mut.ref)
##   Sample  chr      pos ref alt
## 1      1 chr1   905907   A   T
## 2      1 chr1  1192480   C   A
## 3      1 chr1  1854885   G   C
## 4      1 chr1  9713992   G   A
## 5      1 chr1 12908093   C   A
## 6      1 chr1 17257855   C   T
class(sample.mut.ref)
## [1] "data.frame"
dim(sample.mut.ref)
## [1] 657   5

只需要一个很简单的数据框即可,5列,分别是样本ID,chr,pos,ref,alt ,很容易从MAF文件里面提取。

理解包自带的signature数据

head(t(signatures.nature2013))
##         Signature.1A Signature.1B Signature.2 Signature.3 Signature.4
## A[C>A]A       0.0112       0.0104      0.0105      0.0240      0.0365
## A[C>A]C       0.0092       0.0093      0.0061      0.0197      0.0309
## A[C>A]G       0.0015       0.0016      0.0013      0.0019      0.0183
## A[C>A]T       0.0063       0.0067      0.0037      0.0172      0.0243
## C[C>A]A       0.0067       0.0090      0.0061      0.0194      0.0461
## C[C>A]C       0.0074       0.0047      0.0012      0.0161      0.0614
##         Signature.5 Signature.6 Signature.7 Signature.8 Signature.9
## A[C>A]A      0.0149      0.0017      0.0004      0.0368      0.0120
## A[C>A]C      0.0089      0.0028      0.0005      0.0287      0.0067
## A[C>A]G      0.0022      0.0005      0.0000      0.0017      0.0005
## A[C>A]T      0.0092      0.0019      0.0004      0.0300      0.0068
## C[C>A]A      0.0097      0.0101      0.0012      0.0303      0.0098
## C[C>A]C      0.0050      0.0241      0.0006      0.0203      0.0057
##         Signature.10 Signature.11 Signature.12 Signature.13 Signature.14
## A[C>A]A       0.0007       0.0002       0.0077       0.0007       0.0001
## A[C>A]C       0.0010       0.0010       0.0047       0.0001       0.0042
## A[C>A]G       0.0003       0.0000       0.0017       0.0001       0.0005
## A[C>A]T       0.0092       0.0002       0.0046       0.0002       0.0296
## C[C>A]A       0.0031       0.0007       0.0135       0.0035       0.0056
## C[C>A]C       0.0009       0.0017       0.0112       0.0011       0.0102
##         Signature.15 Signature.16 Signature.17 Signature.18 Signature.19
## A[C>A]A       0.0013       0.0161       0.0018       0.0500       0.0107
## A[C>A]C       0.0040       0.0097       0.0003       0.0076       0.0074
## A[C>A]G       0.0000       0.0022       0.0000       0.0017       0.0005
## A[C>A]T       0.0057       0.0088       0.0032       0.0181       0.0074
## C[C>A]A       0.0106       0.0159       0.0010       0.0965       0.0112
## C[C>A]C       0.0084       0.0100       0.0004       0.0219       0.0159
##         Signature.20 Signature.21 Signature.R1 Signature.R2 Signature.R3
## A[C>A]A       0.0013       0.0001       0.0210       0.0137       0.0044
## A[C>A]C       0.0024       0.0007       0.0065       0.0046       0.0047
## A[C>A]G       0.0000       0.0000       0.0000       0.0048       0.0003
## A[C>A]T       0.0029       0.0006       0.0058       0.0081       0.0034
## C[C>A]A       0.0178       0.0020       0.0076       0.3117       0.0156
## C[C>A]C       0.0375       0.0014       0.0023       0.1477       0.0072
##         Signature.U1 Signature.U2
## A[C>A]A       0.0105       0.0221
## A[C>A]C       0.0005       0.0123
## A[C>A]G       0.0000       0.0028
## A[C>A]T       0.0112       0.0118
## C[C>A]A       0.0173       0.0057
## C[C>A]C       0.0051       0.0081
head(t(signatures.cosmic))
##         Signature.1  Signature.2 Signature.3 Signature.4 Signature.5
## A[C>A]A 0.011098326 6.827082e-04  0.02217231      0.0365 0.014941548
## A[C>A]C 0.009149341 6.191072e-04  0.01787168      0.0309 0.008960918
## A[C>A]G 0.001490070 9.927896e-05  0.00213834      0.0183 0.002207846
## A[C>A]T 0.006233885 3.238914e-04  0.01626515      0.0243 0.009206905
## C[C>A]A 0.006595870 6.774450e-04  0.01878173      0.0461 0.009674904
## C[C>A]C 0.007342368 2.136810e-04  0.01576046      0.0614 0.004952301
##         Signature.6 Signature.7 Signature.8 Signature.9 Signature.10
## A[C>A]A      0.0017      0.0004 0.036718004      0.0120       0.0007
## A[C>A]C      0.0028      0.0005 0.033245722      0.0067       0.0010
## A[C>A]G      0.0005      0.0000 0.002525311      0.0005       0.0003
## A[C>A]T      0.0019      0.0004 0.033598550      0.0068       0.0092
## C[C>A]A      0.0101      0.0012 0.031723757      0.0098       0.0031
## C[C>A]C      0.0241      0.0006 0.025505407      0.0057       0.0009
##         Signature.11 Signature.12 Signature.13 Signature.14 Signature.15
## A[C>A]A       0.0002       0.0077 3.347572e-04       0.0001       0.0013
## A[C>A]C       0.0010       0.0047 6.487361e-04       0.0042       0.0040
## A[C>A]G       0.0000       0.0017 3.814459e-05       0.0005       0.0000
## A[C>A]T       0.0002       0.0046 8.466585e-04       0.0296       0.0057
## C[C>A]A       0.0007       0.0135 1.710090e-03       0.0056       0.0106
## C[C>A]C       0.0017       0.0112 1.159257e-03       0.0102       0.0084
##         Signature.16 Signature.17 Signature.18 Signature.19 Signature.20
## A[C>A]A       0.0161 1.832019e-03  0.050536419       0.0107 1.179962e-03
## A[C>A]C       0.0097 3.422356e-04  0.010939825       0.0074 2.211505e-03
## A[C>A]G       0.0022 1.576225e-06  0.002288073       0.0005 1.616910e-07
## A[C>A]T       0.0088 3.179665e-03  0.019424091       0.0074 3.008010e-03
## C[C>A]A       0.0159 1.032430e-03  0.088768109       0.0112 1.737711e-02
## C[C>A]C       0.0100 4.218801e-04  0.020641391       0.0159 3.650246e-02
##         Signature.21 Signature.22 Signature.23 Signature.24 Signature.25
## A[C>A]A       0.0001 0.0015040704 0.0004533607   0.02864599  0.009896768
## A[C>A]C       0.0007 0.0024510110 0.0003668005   0.02021464  0.006998929
## A[C>A]G       0.0000 0.0000000000 0.0000000000   0.02047900  0.001448443
## A[C>A]T       0.0006 0.0009224525 0.0000000000   0.02460015  0.004966565
## C[C>A]A       0.0020 0.0045496929 0.0001647394   0.06355928  0.014832948
## C[C>A]C       0.0014 0.0037644739 0.0007368748   0.03375700  0.007822175
##         Signature.26 Signature.27 Signature.28 Signature.29 Signature.30
## A[C>A]A 0.0020397729 0.0052056269 0.0013974388   0.06998199    0.0000000
## A[C>A]C 0.0014871623 0.0047382274 0.0009171877   0.05515236    0.0000000
## A[C>A]G 0.0002839456 0.0007826979 0.0000000000   0.01784698    0.0019673
## A[C>A]T 0.0005978656 0.0027182425 0.0005134100   0.02680472    0.0000000
## C[C>A]A 0.0037058501 0.0050650733 0.0011685156   0.05141021    0.0000000
## C[C>A]C 0.0039807234 0.0022341533 0.0003342918   0.02582565    0.0000000
head(t(randomly.generated.tumors)[,1:5])
##                    1           2           3           4           5
## A[C>A]A 1.314202e-03 0.011030105 0.015149499 0.016360877 0.020639201
## A[C>A]C 1.961651e-03 0.007874988 0.012403431 0.012688212 0.005849898
## A[C>A]G 9.687434e-05 0.003877974 0.006080275 0.001582708 0.001652969
## A[C>A]T 2.529105e-03 0.007840416 0.009771184 0.010520043 0.008789111
## C[C>A]A 5.644115e-03 0.014087362 0.017718979 0.017555471 0.037798557
## C[C>A]C 3.838037e-03 0.015529936 0.021613106 0.011758016 0.014110015

可以看到signatures.nature2013里面是对每个样本的96种突变情况算出了百分比,而signatures.cosmic是对30个signature的96种突变情况算出了百分比,随机生成的randomly.generated.tumors里面有100个样本。那么我们也应该对自己的数据算出这个96种突变情况算出了百分比。

算出96种突变情况算出了百分比

sigs.input <- mut.to.sigs.input(mut.ref = sample.mut.ref, 
                                sample.id = "Sample", 
                                chr = "chr", 
                                pos = "pos", 
                                ref = "ref", 
                                alt = "alt")
class(sigs.input)
## [1] "data.frame"
head(t(sigs.input))
##          1 2
## A[C>A]A  9 1
## A[C>A]C  7 1
## A[C>A]G  5 0
## A[C>A]T  7 0
## C[C>A]A 10 3
## C[C>A]C 18 2

很明显,还是一个普通的数据框,这样的计算百分比其实难度很小,我在直播我的基因组上面也讲过。

推断signature的组成

# Determine the signatures contributing to the two example samples
sample_1 = whichSignatures(tumor.ref = sigs.input, 
                           signatures.ref = signatures.nature2013, 
                           sample.id = 1, 
                           contexts.needed = TRUE,
                           tri.counts.method = 'default')

sample_2 = whichSignatures(tumor.ref = sigs.input, 
                           signatures.ref = signatures.nature2013, 
                           sample.id = 2, 
                           contexts.needed = TRUE,
                           tri.counts.method = 'default')

可视化

# Plot example
plot_example <- whichSignatures(tumor.ref = randomly.generated.tumors, 
                       signatures.ref = signatures.nature2013, 
                       sample.id = 13)

# Plot output
plotSignatures(plot_example, sub = 'example')

这里是挑出第13个样本,看看它的signature的组成。randomly.generated.tumors是随机生成的一个signature,里面有100个样本,不过要记住,这个数据集里面的样本,其实是一堆样本的意思。因为对单个样本算signature意义不大,单个样本,如果是WES,那么somatic mutation可能就百来个而已。