Ovarian cancer mutational processes drive site-specific immune evasion

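All enrolled patients consented to an institutional biospecimen banking protocol and to MSK-IMPACT testing29, and all analyses were performed per a biospecimen research protocol. All protocols were approved by the Institutional Review Board (IRB) of Memorial Sloan Kettering Cancer Center. Patients were consented following an IRB-approved standard operating procedure for informed consent. Written informed consent was obtained from all patients before any study-related procedures were conducted. The study was conducted in accordance with the Declaration of Helsinki and the Good Clinical Practice guidelines.
We collected fresh tumour tissues from 42 patients with HGSOC at the time of upfront diagnostic laparoscopic or debulking surgery. Ascites and tumour tissues from the primary site and multiple metastatic sites, including bilateral adnexa, omentum, pelvic peritoneum, bilateral upper quadrants and bowel, were collected in a predetermined, systematic fashion (median of four primary and metastatic tissues per patient) and placed in cold RPMI. A blood sample was collected before surgery to isolate peripheral blood mononuclear cells (PBMCs) for normal WGS. Isolated cells were frozen and stored at −80 °C. In addition, tissues were frozen for bulk DNA extraction and tumour WGS. Tissues were also formalin fixed and paraffin embedded (FFPE) for histological, immunohistochemical and multiplexed immunophenotypic characterization.
We profiled patient samples using five different experimental assays, described below.
Tumour tissue was immediately processed for tissue dissociation. Fresh tissue was cut into 1-mm pieces and dissociated at 37 °C using the Human Tumor Dissociation Kit (Miltenyi Biotec) on a gentleMACS Octo Dissociator. After dissociation, single-cell suspensions were filtered and washed with ammonium-chloride-potassium (ACK) lysing buffer. Cells were stained with trypan blue, and cell counts and viability were assessed using the Countess II Automated Cell Counter (Thermo Fisher) (see ref. 30 for a detailed protocol).
Freshly dissociated cells were stained with a mixture of Ghost Red 780 live/dead marker (Tonbo Biosciences) and Human TruStain FcX Fc receptor blocking solution (BioLegend). Stained samples were then incubated and stained with Alexa Fluor 700 anti-human CD45 antibody (BioLegend). After staining, cells were washed, resuspended in RPMI + 2% FCS and submitted for cell sorting. Cells were sorted into CD45+ and CD45− fractions by fluorescence-assisted cell sorting on a BD FACSAria III flow cytometer (BD Biosciences). Positive and negative controls were prepared and used to establish compensation on the flow cytometer. Cells were sorted into tubes containing RPMI + 2% FCS for sequencing.
Flow-sorted tumour cells were stained with trypan blue, and cell number and viability were assessed using the Countess II Automated Cell Counter (Thermo Fisher). After quality control, single-cell suspensions were loaded onto a Chromium Chip B (10x Genomics, PN 2000060). GEM generation, cDNA synthesis, cDNA amplification and library preparation of 1,400–5,000 cells were performed using the Chromium Single Cell 3′ Reagent Kit v3 (10x Genomics, PN 1000075) according to the manufacturer's protocol. cDNA amplification included 12 cycles, and 0.4–419 ng of the material was used to prepare sequencing libraries with 8–14 cycles of PCR.
Equimolar amounts of indexed libraries were pooled and sequenced on a HiSeq 2500 in rapid mode or on a NovaSeq 6000 in a 28-bp/91-bp, 100-bp/100-bp or 150-bp/150-bp paired-end run using HiSeq Rapid SBS kit v2 or NovaSeq 6000 SP, S1, S2 or S4 Reagent kit (100, 200 or 300 cycles) (Illumina).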
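Frozen banked tissues were cut into sections on charged microscope slides. After histological review, tumour tissues were microdissected, if needed, to enrich for tumour cells31 and subjected to DNA extraction for bulk WGS. Genomic DNA was extracted using the DNeasy Blood & Tissue Kit (QIAGEN) and quantified on a Qubit 3 Fluorometer using the Qubit 1× dsDNA HS Assay Kit (Invitrogen).
PBMCs were resuspended in cold PBS, and DNA was isolated using the DNeasy Blood & Tissue Kit (QIAGEN, 69504) according to the manufacturer's protocol, including a 1-h incubation at 55 °C. DNA was eluted in 0.5× buffer.
DNA quantity was measured using the Quant-iT PicoGreen dsDNA Assay (Thermo Fisher, P11496), and DNA quality was assessed using the TapeStation D1000 ScreenTape (Agilent, 5067-5582). After PicoGreen quantification and quality control by Agilent Bioanalyzer, 500 ng of genomic DNA was sheared using an LE220-plus Focused-ultrasonicator (Covaris, 500569), and sequencing libraries were prepared using the KAPA Hyper Prep Kit (Kapa Biosystems, KK8504). Briefly, a 0.5× size selection was performed with AMPure XP beads (Beckman Coulter, A63882) after post-ligation cleanup. Libraries were not amplified by PCR; they were pooled in equal volumes and quantified on the basis of their initial sequencing performance. Samples were run on a NovaSeq 6000 in a 150-bp/150-bp paired-end run using the NovaSeq 6000 SBS v1 Kit and an S1, S2 or S4 flow cell (Illumina).
Archival FFPE tissues were used in the Advanced Immunomorphology Platforms Laboratory for histological review, including assessment of spatial topology and tumour-infiltrating lymphocytes (TILs), and for immunohistochemical characterization and mpIF analysis of the TME. Slides were initially reviewed by a gynaecological pathologist for diagnosis and assignment of FIGO (International Federation of Gynecology and Obstetrics) stage. Representative H&E-stained slides from each site of interest were digitally scanned to generate virtual slides. These images were subsequently reviewed by two senior gynaecological pathologists for the presence of serous tubal intraepithelial carcinoma (STIC), SET architecture (solid, pseudo-endometrioid and transitional cell-like patterns), micropapillary architecture, counts per high-power field (HPF)32 and tumour cell content (percentage viable). Regions with TILs were also assessed using a quantitative TIL score (low, <42 TILs per HPF in a hotspot; high, 42 or more TILs per HPF in a hotspot)32. Histopathology slides were scanned into whole-slide images using a Leica Aperio AT2 scanner (Leica Biosystems) at ×20 magnification. The most representative tissue block was selected for slide scanning.
We carried out multiparameter quantification of epithelial and immune cell subsets and activation markers using the AkoyaBio Vectra automated imaging system at the MSKCC Parker Institute for Cancer Immunotherapy. We stained whole slides of FFPE tissue for markers of ovarian cancer cells (panCK + CK8–CK18) and of specific leukocyte subsets, including macrophages (CD68) and cytotoxic T cells (CD8), known immune inhibitory proteins (PD-L1) and markers of the activation/exhaustion status of CD8+ T cells (PD-1, TOX).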
FOVs were chosen to include either the entire tissue with minimal field overlap, if the tissue was small, or a distribution of fields with 50% stroma/tumour at the edge plus some central areas of tumour-dense fields. Quality control was performed on marker intensities to ensure that they fell in the range of 5–30 arbitrary units, which also helped guide spectral unmixing: lower values might be close to background, whereas higher values prompted us to check for channel spillage.
Primary antibody staining conditions were optimized using standard immunohistochemical staining on the Leica Bond RX automated research stainer with DAB detection (Leica Bond Polymer Refine Detection, DS9800). Using 4-µm FFPE tissue sections and serial antibody titrations, the optimal antibody concentration was determined, followed by transition to a seven-colour multiplex assay with equivalency. Optimal primary antibody stripping conditions between rounds in the seven-colour assay were determined by performing a cycle of tyramide deposition followed by heat-induced stripping (see below) and subsequent chromogenic development (Leica Bond Polymer Refine Detection, DS9800), after which a senior pathologist visually inspected the slides for chromogenic product under a light microscope. Multiplex assay antibodies and conditions are described in Supplementary Table 6.
Tissue sections were baked for 3 h at 62 °C in vertical slide orientation, with subsequent deparaffinization performed on the Leica Bond RX. This was followed by 30 min of antigen retrieval with Leica Bond ER2 and six sequential cycles of staining, each round including a 30-min combined block and primary antibody incubation (Akoya Antibody Diluent/Block, ARD1001).
For panCK and CK8–CK18, detection was performed using a secondary horseradish peroxidase (HRP)-conjugated polymer (Akoya Opal Polymer HRP Ms + Rb, ARH1001; 10-min incubation).
Detection of all other primary antibodies was performed using a goat anti-mouse Poly HRP secondary antibody or goat anti-rabbit Poly HRP secondary antibody (Invitrogen, B40961 and B40962; 10-min incubation). The HRP-conjugated secondary antibody polymer was detected by fluorescent tyramide signal amplification using Opal dyes 520, 540, 570, 620, 650 and 690 (Akoya, FP1487001KT, FP1494001KT, FP1488001KT, FP1495001KT, FP1496001KT, FP1497001KT). The covalent tyramide reaction was followed by heat-induced stripping of the primary antibody–secondary antibody complex using PerkinElmer AR9 buffer (AR900250ML) and Leica Bond ER2 (90% ER2 and 10% AR9) at 100 °C for 20 min before the next cycle (one cycle of stripping for CD68, PD-1, PD-L1, CD8 and panCK/CK8/CK18 and two cycles of stripping for TOX). After six sequential rounds of staining, sections were stained with Hoechst (Invitrogen, 33342) to visualize nuclei and mounted with ProLong Gold antifade reagent mounting medium (Invitrogen, P36930).
Seven-colour multiplex-stained slides were imaged using the Vectra Multispectral Imaging System version 3 (PerkinElmer). Scanning was performed at ×20 magnification (×200 final magnification). Filter cubes used for multispectral imaging were DAPI, FITC, Cy3, Texas Red and Cy5. A spectral library containing the emitted spectral peaks of the fluorophores in this study was created using Vectra image analysis software (PerkinElmer). Using multispectral images from slides singly stained for each marker, the spectral library was used in InForm 2.4 image analysis software to separate each multispectral cube into individual components (spectral unmixing), allowing identification of the seven marker channels of interest.
The scRNA-seq processing pipeline was built using the 10x Genomics Martian language and computational pipeline framework.
CellRanger software (version 3.1.0) was used to perform read alignment, barcode filtering and unique molecular identifier (UMI) quantification using the 10x GRCh38 transcriptome (version 3.0.0) for FASTQ inputs.
CellRanger-filtered matrices were loaded into individual Seurat objects using the Seurat R package (version 3.0.1)34,35. The resulting gene-by-cell matrix was normalized and scaled for each sample. Cells retained for analysis had a minimum of 500 expressed genes and 1,000 UMI counts and had less than 25% mitochondrial gene expression. Cell cycle phase was assigned using the Seurat CellCycleScoring function. Scrublet (version 0.2.1) was used to calculate and filter cells with a doublet score greater than 0.25. Sample matrices were merged by patient and subsequently renormalized and scaled using default Seurat functions.
Major cell type assignments were computed for each patient with CellAssign (version 0.99.2)36 using a set of curated marker genes. Marker genes were compiled for nine major cell types related to HGSOC (Supplementary Table 4). These major cell types were defined as T cells, B cells, plasma cells, myeloid cells, DCs, mast cells, endothelial cells, fibroblasts and ovarian cancer cells. Before running CellAssign, cells with zero expression for all marker genes were removed from the count matrix. Cell-specific size factors were computed using scran (version 3.11). Default CellAssign parameters were used with a design matrix of patient batch labels. CellAssign returned a probability distribution over the major cell types, and individual cells were labelled by the resulting most probable cell type.
Principal-component analysis (PCA) was performed on the filtered feature-by-barcode matrix. UMAP embeddings including cohort-level and patient-level embeddings for all major cell types were based on the first 50 principal components. UMAP embeddings of major cell type supersets (see below) were based on the 50 batch-corrected harmony components.
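The per-cell filtering thresholds stated above (at least 500 expressed genes and 1,000 UMIs, less than 25% mitochondrial expression, and removal of cells with a Scrublet doublet score above 0.25) can be written as a single predicate. The sketch below is a hypothetical plain-Python illustration, not the Seurat/Scrublet code used in the study; the `CellQC` record and its field names are assumptions, and the per-cell statistics are assumed to have been computed beforehand.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CellQC:
    """Hypothetical per-cell QC summary (not a Seurat data structure)."""
    barcode: str
    n_genes: int          # genes with at least one UMI
    n_umis: int           # total UMI count
    pct_mito: float       # percentage of UMIs from mitochondrial genes
    doublet_score: float  # e.g. a Scrublet doublet score


def passes_qc(cell: CellQC, min_genes: int = 500, min_umis: int = 1000,
              max_pct_mito: float = 25.0, max_doublet: float = 0.25) -> bool:
    """Thresholds as stated in the text; inclusive lower bounds are an assumption."""
    return (cell.n_genes >= min_genes
            and cell.n_umis >= min_umis
            and cell.pct_mito < max_pct_mito        # "less than 25%"
            and cell.doublet_score <= max_doublet)  # cells scoring above 0.25 are removed


def filter_cells(cells: Iterable[CellQC]) -> List[CellQC]:
    """Keep only cells that pass every threshold."""
    return [c for c in cells if passes_qc(c)]
```

Treating "a minimum of 500 expressed genes" as inclusive (≥) is a reading of the text rather than something it states explicitly.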
Diffusion map embeddings and pseudotime estimates were computed for the subset of CD8+ T cells using the R package destiny (v3.0.1)37.
Major cell types identified across samples were split into six supersets: (1) T cells; (2) B cells and plasma cells; (3) myeloid cells, DCs and mast cells; (4) fibroblasts; (5) endothelial cells; and (6) ovarian cancer cells. For each superset, the R package harmony (version 0.1) was used for batch correction to account for patient-specific effects38.
Graph-based clustering was performed for each superset using the Louvain algorithm implemented in Seurat (version 3.0.1) at three different resolutions (0.1, 0.2 and 0.3). Differential expression between identified clusters was computed using a two-sided Wilcoxon rank-sum test as implemented in Seurat FindMarkers. Final results were filtered on log(fold change) > 0.25 and Benjamini–Hochberg-adjusted P < 0.05. Clusters were annotated on the basis of marker genes identified in the differential gene expression analysis. Patient-specific clusters not represented across the full cohort were identified using relative entropy, defined per cluster as the empirical entropy of its patient composition divided by the maximum entropy, so that values range from 0 (all cells from a single patient) to 1 (cells evenly distributed across patients). Clusters with a relative entropy of <0.8 were considered patient-specific clusters and disregarded for downstream analyses.
For T cell clusters, T cells and NK cells were clustered in two steps. Initial coarse-grained clustering resulted in ten different T and NK cell clusters, including four CD4+ T cell clusters, three CD8+ T cell clusters, two NK cell clusters and one cycling T/NK cell cluster (Extended Data Fig. 5a). Subclustering identified a total of 41 distinct fine-grained clusters, broadly defining major T cell and NK cell subtypes (Fig. 2a and Extended Data Fig. 5b).
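As a rough illustration of the patient-specific cluster filter, the sketch below computes a normalized Shannon entropy of each cluster's patient composition (empirical entropy divided by the maximum possible entropy, the log of the number of patients), so that scores lie between 0 and 1, and flags clusters below 0.8. Taking the maximum from the cohort's patient count, using raw rather than size-normalized cell counts per patient, and the function names are all assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def shannon_entropy(counts: Iterable[int]) -> float:
    """Natural-log Shannon entropy of a vector of non-negative counts."""
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts)


def relative_entropy(patient_labels: Iterable[str], n_patients: int) -> float:
    """Empirical entropy of a cluster's patient composition / maximum entropy log(n_patients)."""
    if n_patients < 2:
        raise ValueError("relative entropy needs at least two patients")
    return shannon_entropy(Counter(patient_labels).values()) / math.log(n_patients)


def patient_specific_clusters(cluster_patients: Dict[str, List[str]], n_patients: int,
                              threshold: float = 0.8) -> List[str]:
    """Hypothetical helper: clusters whose relative entropy falls below the threshold."""
    return sorted(cluster for cluster, labels in cluster_patients.items()
                  if relative_entropy(labels, n_patients) < threshold)
```

For instance, a cluster drawn entirely from one of four patients scores 0 and is flagged, whereas a cluster split evenly across all four scores 1.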
These included populations of CD4+ naive and central memory cells (expressing IL7R and TCF7), CD4+ effector memory cells (IL7R, CCL5 and KLRB1), early and late dysfunctional CD4+ T cells (expressing dysfunctional T cell markers CXCL13, TOX2 and PDCD1), regulatory T cells (FOXP3 and IL2RA) and type 17 helper T cells (KLRB1, RORA and RORC). In the CD8+ compartment, we also identified populations of naive/central memory (expressing KLF2, KLF3 and TCF7), activated/cytotoxic (GZMH, GZMK and HLA-DR) and early and late dysfunctional (CXCL13, TOX2, LAG3, HAVCR2, TIGIT and PDCD1) T cells. Notably, the early dysfunctional cluster, in addition to exhaustion-associated genes, was characterized by expression of CXCR6 and ITGAE, which are commonly used to define tissue-resident memory T cells. In the innate compartment, we similarly identified several clusters, including a γδ T cell cluster and several NK cell clusters. Finally, in all compartments, we identified populations of cells marked by expression of type I IFN response genes such as ISG15 and IFIT3, herein named CD4-ISG, CD8-ISG and NK-ISG, with strong upregulation of the JAK–STAT signalling pathway as the dominant feature of these cells (Fig. 2b). The remaining clusters consisted of cycling T and NK cells expressing S phase markers such as MKI67 and G2M markers such as TOP2A (Supplementary Table 4).
For myeloid cell clusters, cDCs of the myeloid lineage were separated into cDC1s, cDC2s and mDCs, marked by expression of CLEC9A, S100B and BIRC3, respectively (Extended Data Figs. 5d and 6a). pDCs were marked by expression of PTGDS. Macrophage clusters were described with respect to their classical (M1-like) or alternative (M2-like) polarization. Six different clusters encompassing both classical and alternatively activated macrophages were identified, as well as a cluster of cycling macrophages (Cycling.M) and a cluster of actively phagocytic macrophages (Clearing.M).
The M1-like and M2-like clusters were labelled according to the top genes defining the clusters (M1.S100A8, M2.CXCL10, M2.SELENOP, M2.MARCO, M2.COL1A1, M2.MMP9) (Extended Data Figs. 5d and 6b). Among these, the M1.S100A8 cluster was the only unambiguous M1-type macrophage cluster, marked by expression of the pro-inflammatory calcium-binding protein genes S100A8 and S100A9 (ref. 22). The M2.CXCL10 cluster was characterized by expression of both M1 (for example, CXCL10) and M2 (for example, PD-L1 and C1QC) markers. CXCL10 is an established downstream target of type I and type II IFN signalling and was found to be expressed along with other CXC-motif chemokines (CXCL9 and CXCL11). The remaining M2 clusters were all marked by high expression of complement component C1QC, which is known to promote M2 polarization23.
InferCNV (version 1.3.5)39,40 was used to identify large-scale copy number alterations in ovarian cancer cells classified by CellAssign. To do this, 3,200 non-cancer cells were randomly sampled from the cohort and used as the set of reference 'normal' cells. After subtracting the reference expression of non-cancer cells and performing chromosome-level smoothing and denoising with InferCNV, we derived a processed expression matrix that represented copy number signals. Cancer cell subclusters were identified by ward.D2 hierarchical clustering and the 'random_trees' partition method using P < 0.05.
Cell state scores were calculated for the exhausted phenotype within the set of T cells using a manually curated list of genes as input to the Seurat AddModuleScore method40. The curated list of genes was derived from a review of single-cell analyses of CD8+ T cell states in human cancers41 (Supplementary Table 4).
Patient specificity scores were computed using a shared nearest-neighbour graph. For a given cell, patient specificity was defined as the observed fraction of its nearest neighbours that come from the same patient divided by the expected fraction of such neighbours in the patient subgraph.
Here the expected fraction of neighbours from the same patient was defined as the global fraction of cells for each patient. Scores were log2 transformed. Hence, a positive patient specificity score indicates an over-representation of cells derived from the same patient among its nearest neighbours, a negative score indicates an under-representation of cells from the same patient and a score of 0 reflects a perfectly mixed neighbourhood of patient labels.
To calculate intra-sample diversity of cluster composition, we used the Shannon entropy H:
$$H = -\sum_{c=1}^{C} p_c \log p_c$$
where $p_c$ is the proportional abundance of cluster c and C is the total number of clusters.
To estimate the similarity or dissimilarity between samples, we used the Bray–Curtis dissimilarity index $D_{ij}$ for samples i and j, defined as
$$D_{ij} = \frac{\sum_{c=1}^{C} \lvert N_{ic} - N_{jc} \rvert}{\sum_{c=1}^{C} \left( N_{ic} + N_{jc} \right)}$$
where $N_{ic}$ and $N_{jc}$ are the counts for cluster c in samples i and j, respectively, and C is the total number of clusters. $D_{ij}$ takes values between 0 (identical samples: $N_{ic} = N_{jc}$ for all c) and 1 (disjoint samples: $N_{ic} > 0$ implies $N_{jc} = 0$). We only considered the upper triangle of the distance matrix D, that is, pairs with i < j. The pairwise distance matrix was estimated by repeatedly subsampling the dataset to a fixed minimum number of cells per sample and averaging the resulting distance matrices over 100 subsampling iterations. We then evaluated intra- and inter-patient dissimilarity on the basis of the distributions of the off-diagonal elements in the averaged distance matrix (for example, all pairs of adnexal samples or all pairs of HRD-Dup samples).
These definitions were used to estimate the intra-sample diversity, intra-patient dissimilarity and inter-patient dissimilarity of cluster composition of cell states within each major cell type superset (cancer cells, Fig. 3g; T and NK cells, Figs. 2d and 4f; myeloid cells, Extended Data Figs. 6d and 11b). Rarefaction of samples was applied in estimation of the Bray–Curtis dissimilarity matrix on the basis of the number of cells for each subset (n = 400 cells per sample).
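A minimal plain-Python sketch of the diversity and dissimilarity measures described above: the Shannon entropy of cluster proportions, the Bray–Curtis index on cluster counts, and a rarefied estimate that averages the index over 100 random subsamples of 400 cells per sample, drawn without replacement. The function names, the seed and the input layout (one cluster label per cell) are illustrative assumptions; how samples with fewer than 400 cells were handled is not stated in this section, and here `random.sample` simply raises a `ValueError`.

```python
import math
import random
from collections import Counter
from typing import List, Mapping


def shannon_entropy(counts: Mapping[str, int]) -> float:
    """H = -sum_c p_c log p_c over clusters with non-zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def bray_curtis(counts_i: Mapping[str, int], counts_j: Mapping[str, int]) -> float:
    """D_ij = sum_c |N_ic - N_jc| / sum_c (N_ic + N_jc)."""
    clusters = set(counts_i) | set(counts_j)
    num = sum(abs(counts_i.get(c, 0) - counts_j.get(c, 0)) for c in clusters)
    den = sum(counts_i.get(c, 0) + counts_j.get(c, 0) for c in clusters)
    return num / den


def rarefied_bray_curtis(labels_i: List[str], labels_j: List[str],
                         n_cells: int = 400, n_iter: int = 100, seed: int = 0) -> float:
    """Mean D_ij over n_iter draws of n_cells cells per sample (without replacement).

    labels_i, labels_j: one cluster label per cell in samples i and j.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        sub_i = Counter(rng.sample(labels_i, n_cells))
        sub_j = Counter(rng.sample(labels_j, n_cells))
        total += bray_curtis(sub_i, sub_j)
    return total / n_iter
```

Rarefying both samples to the same depth keeps the index from being driven by differences in sequencing depth; applying `rarefied_bray_curtis` to every pair with i < j fills the upper triangle of the averaged matrix.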
Finally, we also used non-metric multidimensional scaling (NMDS) to visualize the pairwise distances of cell type abundances in low-dimensional space. We used the rank order of the Bray–Curtis distances in the pairwise dissimilarity matrix D to project differences in cluster composition onto two dimensions using NMDS (cancer cells, Extended Data Fig. 8i; T and NK cells, Extended Data Figs. 7c and 10c; myeloid cells, Extended Data Figs. 7h and 11f).
To estimate the effect of mutational signatures and tumour site specificity on the composition of cell clusters, we considered a GLM that included interactions between signature, site and cluster identity for each major cell type defined in the scRNA-seq, H&E and mpIF data. The data matrix included the counts of every cluster c, sampled from site s in a patient with mutational signature subtype m. Using a binomial model, counts of repeated observations of cell types or cell states can be analysed as binary choices:
$$N_c \sim \mathrm{Binomial}(N, p_c)$$
where $N_c$ is the cell count for cluster c in a sample, N is the total number of cells in the sample and the probability $p_c$ of detecting the cluster can be described by the logit function
$$\mathrm{logit}(p_c) = \log \frac{p_c}{1 - p_c}$$
To account for the effect of mutational signature and anatomical tumour site on the cluster abundance observed in scRNA-seq data, we formulated a GLM of the observed cell counts $N_c$ for a cell type or cell state, in which the logit function is distributed as
$$\mathrm{logit}(p_c) \sim \mathcal{N}\left( \beta_0 + \beta_c x_c + \beta_m x_m + \beta_s x_s + \beta_{cm} x_c x_m + \beta_{cs} x_c x_s,\ \sigma_\varepsilon^2 \right)$$
where $\beta_0$ is a shared constant baseline per cluster that must be inferred; $\beta_c$, $\beta_m$ and $\beta_s$ are individual fixed-effect terms to be inferred; $\beta_{cm}$ and $\beta_{cs}$ are cluster–signature and cluster–site interaction effects to be inferred; $x_c$, $x_m$ and $x_s$ are elements of the model design matrix X; and $\sigma_\varepsilon$ represents measurement noise. We note that for each cluster c we had multiple measurement replicates of $N_c$ across signatures and sites. This formulation was used to fit a GLM of major cell types (Fig. 1f).
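To make the binomial formulation concrete, the sketch below fits a plain binomial GLM with a logit link by Newton–Raphson in pure Python. It is a simplification, not the authors' fitting code: it omits the measurement-noise term σε (so it is an ordinary fixed-effects logistic GLM), and the function names and the toy design (an intercept plus one indicator, standing in for a single contrast such as one site versus a reference site) are hypothetical. In practice a library routine such as statsmodels' `GLM` with a `Binomial` family would be the usual choice.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_binomial_glm(X: Sequence[Sequence[float]], counts: Sequence[int],
                     totals: Sequence[int], max_iter: int = 50,
                     tol: float = 1e-10) -> List[float]:
    """Maximum-likelihood fit of N_c ~ Binomial(N, p_c) with logit(p_c) = X @ beta."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # score vector X^T (y - N p)
        info = [[0.0] * k for _ in range(k)]  # Fisher information X^T W X
        for row, y, n in zip(X, counts, totals):
            eta = sum(bj * xj for bj, xj in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = n * p * (1.0 - p)
            for i in range(k):
                grad[i] += row[i] * (y - n * p)
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        step = _solve(info, grad)             # Newton-Raphson update
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

With 20 of 100 cells in the reference condition and 50 of 100 in the contrast condition, `fit_binomial_glm([[1, 0], [1, 1]], [20, 50], [100, 100])` recovers logit(0.2) ≈ −1.386 for the intercept and ≈ 1.386 for the indicator, the closed-form maximum-likelihood values for this saturated design.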
We also used this formulation to separately fit GLMs of cluster composition for each superset of coarse-grained immune cell types (T and NK cells, Extended Data Fig. 7b; myeloid cells, Extended Data Figs. 7g and 11e) and GLMs of cluster composition for fine-grained immune cell states (T and NK cells, Fig. 2c; DCs, Extended Data Fig. 11a; macrophages, Extended Data Fig. 11a).
To model the abundance of major cell types in the scRNA-seq data from CD45+ and CD45− samples, the GLM included a covariate for CD45+/− flow sorting, with additional fixed-effect sorting coefficients $\beta_f$ and cluster–sorting interaction coefficients $\beta_{cf}$ to be inferred, plus an additional element $x_f$ in the model design matrix (Fig. 1f). Similarly, GLMs for H&E and mpIF data accounted for differences in cell type abundance observed in the tumour and stroma regions by incorporating a covariate for tumour or stroma region counts, with additional fixed-effect region coefficients $\beta_r$ and cluster–region coefficients $\beta_{cr}$ to be inferred, plus an additional element $x_r$ in the model design matrix (Fig. 1f).
To quantify interactions between mutational signature and anatomical tumour site, we also fitted GLMs with an additional interaction term:
$$\mathrm{logit}(p_c) \sim \mathcal{N}\left( \beta_0 + \beta_c x_c + \beta_m x_m + \beta_s x_s + \beta_{cm} x_c x_m + \beta_{cs} x_c x_s + \beta_{csm} x_c x_s x_m,\ \sigma_\varepsilon^2 \right)$$
where the $\beta_{csm}$ terms are cluster-specific signature–site interaction effects to be inferred. This formulation was used to fit GLMs of cluster composition of cell states within each major cell type superset, both for fine-grained clusters (cancer cells, Fig. 3d; T and NK cells, Fig. 4b; DCs, Extended Data Fig. 11a; macrophages, Extended Data Fig. 11a) and coarse-grained clusters (T and NK cells, Extended Data Fig. 10e; myeloid cells, Extended Data Fig. 11e).
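One way to encode the GLM terms above is treatment (reference-level) coding of cluster, signature and site, with interaction columns formed as products of indicator columns. The sketch below builds one design-matrix row as a dictionary. The coding scheme, the choice of the first listed level as reference and all names are assumptions for illustration; it includes only the interactions named in the text (cluster × signature, cluster × site and, optionally, cluster × signature × site).

```python
from itertools import product
from typing import Dict, Sequence


def indicators(value: str, levels: Sequence[str], name: str) -> Dict[str, float]:
    """Treatment coding: one 0/1 column per non-reference level; levels[0] is the reference."""
    return {f"{name}[{lvl}]": float(value == lvl) for lvl in levels[1:]}


def design_row(cluster: str, signature: str, site: str,
               clusters: Sequence[str], signatures: Sequence[str], sites: Sequence[str],
               three_way: bool = True) -> Dict[str, float]:
    """One hypothetical design-matrix row for a (cluster, signature, site) count."""
    x_c = indicators(cluster, clusters, "cluster")
    x_m = indicators(signature, signatures, "signature")
    x_s = indicators(site, sites, "site")
    row = {"intercept": 1.0, **x_c, **x_m, **x_s}  # beta_0, beta_c, beta_m, beta_s
    for (kc, vc), (km, vm) in product(x_c.items(), x_m.items()):
        row[f"{kc}:{km}"] = vc * vm  # cluster x signature (beta_cm)
    for (kc, vc), (ks, vs) in product(x_c.items(), x_s.items()):
        row[f"{kc}:{ks}"] = vc * vs  # cluster x site (beta_cs)
    if three_way:
        for (kc, vc), (km, vm), (ks, vs) in product(x_c.items(), x_m.items(), x_s.items()):
            row[f"{kc}:{km}:{ks}"] = vc * vm * vs  # cluster x signature x site (beta_csm)
    return row
```

With two levels per factor the row has seven columns; with three levels per factor it has 15 columns without the three-way term and 23 with it, which shows how quickly the interaction model grows with the number of clusters and sites.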
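To determine potentially interacting cell type subclusters for the receptor–ligand pair PD-1–PD-L1/PD-L2, we first computed the fraction of sender cells (cancer cell or myeloid cell clusters) expressing the PD-L1 and PD-L2 ligands (CD274 or PDCD1LG2 read counts >0 in >10% of cells) and the fraction of receiver cells (T cell clusters) expressing the PD-1 receptor (PDCD1 read counts >0 in >10% of cells). The co-expression network was constructed as follows: for a group of patients of the same mutational subtype (Extended Data Fig. 13f), an edge was drawn between a sender cell cluster and a receiver cell cluster if the ligand (CD274 or PDCD1LG2) was expressed in the sender cluster and the receptor (PDCD1) was expressed in the receiver cluster within that group.
Reads were aligned using the Burrows–Wheeler Aligner (BWA-MEM) v0.7.17-r1188 (https://sourceforge.net/projects/bio-bwa/).
Single-nucleotide variants (SNVs) and indels were called using MutationSeq (version 4.3.8; model v4.1.2.npz; https://github.com/shahcompbio/mutationseq). We also used Strelka (version 2.8.2) with default parameter settings to identify somatic SNVs and indels42. SNVs and indels were then annotated with SnpEff (version 5.0e) to obtain variant effects and gene coding status. We determined a high-confidence set of SNVs from high-probability calls predicted by MutationSeq (probability ≥ 0.9) and somatic SNVs predicted by Strelka. The high-confidence set of SNVs was further filtered by removing positions that fell in either of the following regions: (1) UCSC Genome Browser blacklists (Duke and DAC) and (2) regions defined in the 'CRG Alignability 36mer track' with more than two nucleotide mismatches, requiring a 36-nucleotide fragment to be unique in the genome even after allowing for two differing nucleotides. Post-processing of this high-confidence set of SNVs and of the somatic indels from Strelka involved removing known variants (SNVs and indels) obtained from the 1000 Genomes Project (release 20130502) and dbSNP (build 142, human_9606). The high-confidence set of somatic SNVs and indels that passed these filters was then used for feature computation in mutational signature analysis and neoantigen prediction.
Rearrangement breakpoints were predicted using LUMPY (version 0.2.12)43, … (version 0.1.08)44 and deStruct (version 0.4.18; https://github.com/amcpherson/destruct), which is derived from nFuse45. Briefly, deStruct extracted discordant and non-mapping reads from BAM files and realigned them using a seed-and-extend strategy. Split alignments across putative breakpoints were attempted for reads that did not fully align to a single locus. Discordant alignments were clustered according to their likelihood of having been produced by the same breakpoint. Multiply mapped reads were assigned to a single mapping location using a previously described method46. Finally, heuristic filters removed predicted breakpoints with poor read coverage of the sequence flanking the predicted breakpoint.
We applied stringent three-step filtering criteria to identify high-confidence breakpoint calls for downstream analysis, as follows:
Step 1: Breakpoints predicted by both algorithms (LUMPY and deStruct) were carried forward.
Step 2: We removed (1) breakpoints in poorly mappable regions, (2) events with a break distance ≤30 bp and (3) breakpoints annotated as deletions with a deletion size <1,000 bp. Furthermore, only high-confidence breakpoints that had at least five supporting reads in the tumour sample and no read support in the matched normal sample were used in the analysis.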
The breakpoints were further filtered by removing positions that fell in either of the following regions: (1) UCSC Genome Browser blacklists (Duke and DAC) and (2) regions defined in the 'CRG Alignability 36mer track' with more than two nucleotide mismatches, requiring a 36-nucleotide fragment to be unique in the genome even after allowing for two differing nucleotides.
Step 3: Predictions with a small break distance and a low number of supporting reads in tumour samples were excluded.
Genome-wide allele-specific copy number was called in matched tumour–normal WGS samples using ReMixT47 and TitanCNA48 with default parameters. A parameter grid search for multiple purity and ploidy solutions was carried out, and the top solution was selected after manual assessment of the copy number segmentations. All tumour samples were run with ploidy = 2 and ploidy = 4 initializations.
We used a commercial assay (Myriad Genetics 'myChoice CDx') to test for genome-wide LOH, the number of chromosomal breakpoints in large-scale state transitions and telomeric allelic imbalance. If the resulting HRD score was greater than 42, the sample was deemed to be HRD.
Genomic DNA isolated from FFPE tumour tissue and matched normal blood was subjected to hybridization capture and sequenced with deep coverage (700×)49. Variant calling for the MSK-IMPACT gene panel and copy number analysis were performed using the MSK-IMPACT clinical pipeline (https://github.com/mskcc/Innovation-IMPACT-Pipeline).
We analysed mutational signatures by integrating SNVs and structural variations detected by bulk WGS in a unified probabilistic approach called multimodal correlated topic models (MMCTM)6. MMCTM analysis enables robust determination of mutational signatures and their correlation structure and delineation of subgroupings based on point mutation signatures50 and structural variations.
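The first two steps of the three-step breakpoint filter described earlier in this section can be sketched as predicates over a breakpoint record. The `Breakpoint` fields and caller names below are hypothetical, and the blacklist, alignability and poor-mappability checks are collapsed into a single precomputed `masked` flag. Step 3 is deliberately left out because its thresholds for a "small" break distance and "low" read support are not specified.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Hypothetical caller labels standing in for LUMPY and deStruct.
REQUIRED_CALLERS = frozenset({"lumpy", "destruct"})


@dataclass(frozen=True)
class Breakpoint:
    """Illustrative breakpoint record (field names are assumptions)."""
    callers: FrozenSet[str]        # algorithms that predicted this breakpoint
    sv_type: str                   # e.g. "deletion", "duplication", "inversion"
    break_distance: Optional[int]  # bp between breakends; None for inter-chromosomal events
    size: int                      # event size in bp
    tumour_reads: int              # supporting reads in the tumour sample
    normal_reads: int              # supporting reads in the matched normal sample
    masked: bool                   # poorly mappable, blacklisted or low-alignability region


def passes_step1(bp: Breakpoint) -> bool:
    """Step 1: keep breakpoints predicted by both algorithms."""
    return REQUIRED_CALLERS <= bp.callers


def passes_step2(bp: Breakpoint) -> bool:
    """Step 2: region, break-distance, deletion-size and read-support filters."""
    if bp.masked:
        return False
    if bp.break_distance is not None and bp.break_distance <= 30:
        return False
    if bp.sv_type == "deletion" and bp.size < 1000:
        return False
    # at least five tumour reads and no read support in the matched normal
    return bp.tumour_reads >= 5 and bp.normal_reads == 0


def high_confidence(breakpoints: Iterable[Breakpoint]) -> List[Breakpoint]:
    return [bp for bp in breakpoints if passes_step1(bp) and passes_step2(bp)]
```

Representing inter-chromosomal events with `break_distance=None` (so they are not removed by the distance rule) is a design choice of this sketch, not something the text specifies.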
We estimated signature probabilities for bulk WGS samples in the MSK SPECTRUM cohort (n = 40) using MMCTM, on the basis of SNV and structural variation signatures inferred from HGSOC (n = 170) and triple-negative breast cancer (n = 139) bulk whole genomes (total n = 309) (Extended Data Fig. 2b). After clustering this meta-cohort of 309 HGSOC and triple-negative breast cancer samples using UMAP and HDBSCAN51, we used it as a training dataset to fit a k-nearest-neighbour (kNN) classifier, which we applied to the SPECTRUM samples (n = 40) to assign each to one of four strata defined solely by SNV and structural variation signature probabilities. A nearest-neighbour graph was built using a Euclidean distance metric, and each unknown test sample was assigned to a stratum by majority vote of its k nearest neighbours (k = 30), requiring m votes for an assignment (m = 25). The four strata included samples enriched for (1) BRCA1-associated HRD point mutation signatures accompanied by tandem duplications (HRD-Dup), (2) BRCA2-associated HRD point mutation signatures accompanied by interstitial deletions (HRD-Del), (3) foldback inversions mediated by breakage–fusion–bridge cycles (FBI) and (4) a group of ambiguous samples near the classifier decision boundaries ('Undetermined') (Extended Data Fig. 2c).
To validate the MMCTM mutational signatures, we used two independent computational methods (Extended Data Fig. 2b). We applied HRDetect18 to validate HRD status on the basis of SNV signatures previously associated with HRD (SBS3, SBS8), short microhomology-mediated indels (ID8) and rearrangement signatures (RS3, RS5).
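Samples with an HRDetect score of >0.1 were defined as HRD. We also applied CHORD19 to validate HRD status and the stratification of HRD-Dup versus HRD-Del cases. CHORD combines SNVs, indels and structural variations and relies on duplications (1–100 kb) to distinguish BRCA1-like from BRCA2-like HRD.
WGS-derived HRD signatures were concordant with BRCA1 or BRCA2 loss in seven of seven cases (Extended Data Fig. 2b). WGS-based and standard-of-care HRD status were concordant in five of six cases. The discordant case (024) was deemed HRD by all three independent WGS signature inference methods (MMCTM, HRDetect and CHORD).
We used WGS copy number inferred by ReMixT47 to characterize focal amplifications and deletions in the MSK SPECTRUM cohort. For focal amplifications, we computed the percentile of each gene relative to the cumulative distribution of total copy number changes across the genome. On the basis of the mean copy number across each gene, we classified as high-level amplifications those in the top 2% of bins with a log2-transformed fold change greater than 1. For homozygous deletions, we considered gene copy number in overlapping segments and classified segments of 10 kb or longer with a copy number below 0.5 as homozygous deletions.
Similarly, we used MSK-IMPACT copy number inferred by FACETS52 to characterize focal amplifications and homozygous deletions in the MSK-IMPACT HGSOC cohort. Focal amplifications and deletions were identified on the basis of the median copy number ratio of each segment, considering only segments shorter than 10 Mb that spanned ten or fewer genes in order to suppress arm-level events. Segments with a total copy number greater than 8 were considered high-level amplifications. Homozygous deletions were called for segments with a total copy number of 0.
To detect allele-specific copy number LOH at the HLA loci in single cells profiled by scRNA-seq, we inferred allele-specific changes on chromosome arm 6p, which harbours the HLA class I and II genes, using SIGNALS5. We first genotyped germline heterozygous single-nucleotide polymorphisms (SNPs) in the scRNA-seq tumour data using cellSNP53. As input, we used the set of heterozygous SNPs identified in the corresponding normal WGS dataset for each sample. The liftOver script provided with cellSNP was used to lift SNP coordinates from the GRCh37 (hg19) to the GRCh38 reference genome. After genotyping, we aggregated SNP counts across all cells and defined the B allele as the allele with the lowest allele frequency at each SNP. Because SNP counts are very sparse in scRNA-seq data, we aggregated cell-level counts of the B allele across chromosome arms to compute the BAF of each arm in each cell. We then generated a cell-by-chromosome-arm BAF matrix and incorporated it into the Seurat gene expression object. To assign allelic imbalance states (balanced, imbalanced, LOH) to chromosome arms in each cell, we used the mean BAF of each arm in each cell as follows: balanced, BAF ≥ 0.35; imbalanced, 0.15 ≤ BAF < 0.35; LOH, BAF < 0.15. Documentation and code are available at https://shahcompbio.github.io/signals/.
To validate our observations of allele-specific changes at the HLA loci on chromosome arm 6p, we used LOHHLA24 to detect gene-level HLA class I LOH from tumour and matched normal WGS data and from tumour and matched normal MSK-IMPACT data.
To validate HLA LOH status by WGS, we used 40 tumour–normal pairs from 40 patients. To validate HLA LOH status by MSK-IMPACT, we selected 1,298 tumour–normal pairs with HGSOC histology from the MSK-IMPACT cohort on the basis of the HGSOC or HGSFT OncoTree classification. This cohort excluded MSK-IMPACT samples from patients in the MSK SPECTRUM cohort.
Patient-specific HLA references were constructed from the normal reads of the WGS and MSK-IMPACT data using … (v4)55. Tumour purity and ploidy were estimated using ReMixT47 for the WGS dataset and FACETS52 for the MSK-IMPACT dataset and were used for the subsequent HLA LOH analysis. HLA LOH was called per allele in tumour samples using LOHHLA. LOH of an HLA gene was called if the estimated copy number of an allele was <0.2 and the allelic imbalance was statistically significant (P < 0.01), as tested on the pairwise differences in log(R) values between the two HLA homologues (paired t-test).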
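The single-cell allelic imbalance procedure described above (a pseudobulk definition of the B allele, pooling of B-allele counts per chromosome arm in each cell, and BAF thresholds of 0.35 and 0.15) can be sketched in plain Python as follows. This is not the SIGNALS code; the record layout is an assumption, ties in pseudobulk allele counts are resolved in favour of the alternate allele, and the per-arm BAF is computed from pooled counts, which is one reading of "aggregated ... to compute the BAF for each arm".

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (cell barcode, SNP id, chromosome arm, ref-allele count, alt-allele count)
Record = Tuple[str, str, str, int, int]


def cell_arm_baf(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    records = list(records)
    # 1) Pseudobulk over all cells: the B allele is the less frequent allele at each SNP.
    ref_tot: Dict[str, int] = defaultdict(int)
    alt_tot: Dict[str, int] = defaultdict(int)
    for _, snp, _, ref, alt in records:
        ref_tot[snp] += ref
        alt_tot[snp] += alt
    b_is_alt = {snp: alt_tot[snp] <= ref_tot[snp] for snp in ref_tot}  # tie -> alt allele
    # 2) Per cell and arm: pool B-allele and total counts over all SNPs on that arm.
    b_sum: Dict[Tuple[str, str], int] = defaultdict(int)
    n_sum: Dict[Tuple[str, str], int] = defaultdict(int)
    for cell, snp, arm, ref, alt in records:
        b_sum[(cell, arm)] += alt if b_is_alt[snp] else ref
        n_sum[(cell, arm)] += ref + alt
    return {key: b_sum[key] / n for key, n in n_sum.items() if n > 0}


def allelic_state(baf: float) -> str:
    """Thresholds from the text: balanced >= 0.35, imbalanced 0.15 to <0.35, LOH < 0.15."""
    if baf >= 0.35:
        return "balanced"
    if baf >= 0.15:
        return "imbalanced"
    return "LOH"
```

Pooling counts across all SNPs on an arm, rather than thresholding individual SNPs, is what makes a per-cell call feasible when most SNPs carry only one or two reads in a given cell.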
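We built a training dataset of cellular annotations for the scanned H&E images. Expert delineation and quantification of the cell and tissue types present in H&E slides were performed on the MSK Slide Viewer, a computational pathology interface for reviewing and annotating histopathology images. Nuclear segmentation was performed using StarDist, a nucleus detection method based on a U-Net neural network architecture56,57. Membrane segmentation was approximated by cell expansion with a 3-µm boundary around each nucleus. The training dataset covered a set of 61 slides from representative patients and sites. To classify tumour, stroma, vasculature and necrosis regions, we trained an artificial neural network (ANN)-based pixel classifier in QuPath (v0.2.3)56 that operates on high-order pixel features across multiple channels and scales within the image. In addition, investigators annotated lymphocytes and 'other' cells in 19 of these slides using the MSK Slide Viewer. After importing these annotations into QuPath, together with the cell segmentations and feature vectors produced by StarDist, we trained an ANN-based cell classifier that operates on cell measurements to identify lymphocytes. We then applied these models for inference on 100 whole-slide H&E images from 35 patients. Segmentation of these samples yielded a total of 24,628,462 cells, and we used the model outputs to compute statistics of lymphocyte density and other spatially derived measurements.
For mpIF images, we performed nuclear segmentation on the basis of DAPI intensity using the watershed algorithm in QuPath (v0.2.3)56, with a minimum DAPI threshold of 1 arbitrary unit and an expected nuclear area between 5 µm² and 100 µm². Membrane segmentation was approximated by cell expansion with a 3-µm boundary around each nucleus. Starting from 1,349 quality-filtered FOVs across 100 tissue samples from 35 patients, segmentation yielded a total of 10,892,612 cells. To annotate tumour and stroma regions, we trained a pixel classifier on examples of panCK+ (tumour) and panCK− (stroma) regions. After nuclear segmentation, we extracted per-cell pixel intensities of functional markers expressed in the cytoplasm (panCK, CD68, CD8, PD-1, PD-L1) and nucleus (TOX) to define cell types and cell states. All channels were manually thresholded on at least one FOV per slide, and marker positivity was determined by applying these thresholds to the mean pixel intensity. Segmented objects that were double or triple positive for multiple cell type markers (panCK, CD68, CD8) were counted as separate cells, yielding a total of 12,359,463 single cells. Marker assignments were used to define epithelial cells (panCK+PD-L1−, panCK+PD-L1+), macrophages (CD68+PD-L1−, CD68+PD-L1+) and CD8+ T cells (CD8+PD-1−TOX−, CD8+PD-1+TOX− and CD8+PD-1+TOX+).
Analysis of spatial topology included estimation of spatial densities and nearest-neighbour distances. Spatial densities as a function of distance from the tumour–stroma boundary were obtained by aggregating cell counts in 10-µm distance bands from the boundary across FOVs, grouped by FOV, and normalizing by the total number of cells of a given phenotype of interest. Error bars were computed as the standard error of the probability p of observing a given phenotype, $\sqrt{p(1-p)/n}$, where n is the total number of cells in the distance band. Cell-to-cell distances between nearest neighbours were computed using a distance matrix $r_{ij}$ for cells i and j, where the value of the (i, j) element of the matrix is the radial distance from cell i to cell j. After these distances were computed for each neighbour, summary statistics of the nearest-neighbour distances were estimated for each phenotype. Proximity counts of phenotypes within a fixed radius r were also determined on the basis of nearest neighbours.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.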
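The nearest-neighbour distances, fixed-radius proximity counts and binomial standard error used in the spatial topology analysis can be sketched as follows, using brute-force Euclidean distances between cell centroid coordinates (assumed to be in µm). The function names and the point representation are illustrative assumptions; for FOVs with many thousands of cells, a k-d tree (for example `scipy.spatial.cKDTree` with its `query` and `query_ball_point` methods) would normally replace the quadratic loops.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical cell centroid (x, y), assumed in µm


def nearest_neighbour_distances(source: Sequence[Point], target: Sequence[Point]) -> List[float]:
    """Distance from each source cell to its closest target cell (brute force, O(n*m)).

    Assumes source and target are different phenotypes; for same-phenotype
    distances each cell's zero distance to itself would need to be excluded.
    """
    return [min(math.hypot(xs - xt, ys - yt) for xt, yt in target) for xs, ys in source]


def counts_within_radius(source: Sequence[Point], target: Sequence[Point], r: float) -> List[int]:
    """Number of target cells within radius r of each source cell."""
    return [sum(1 for xt, yt in target if math.hypot(xs - xt, ys - yt) <= r)
            for xs, ys in source]


def proportion_with_se(k: int, n: int) -> Tuple[float, float]:
    """Fraction p = k/n of cells with a phenotype in a distance band, and SE sqrt(p(1-p)/n)."""
    p = k / n
    return p, math.sqrt(p * (1.0 - p) / n)
```

For example, with CD8+ T cells at (0, 0) and (10, 0) and tumour cells at (3, 4) and (10, 1), the nearest-neighbour distances are 5.0 and 1.0, and each T cell has one tumour cell within r = 5.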

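This article was contributed by the author [lejiaoyi] and does not represent the position of Yanxihao. If reproduced, please credit the source: https://lejiaoyi.cn/zlan/202506-1253.html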