
RESEARCH PRODUCT

Methods of spatial cluster detection in rare childhood cancers: Benchmarking data and results from a simulation study on nephroblastoma

Hermann Brenner, Toni Lange, Claudia Spix, Michael M. Schündeln, Kayvan Bozorgmehr, Maximilian Knoll, Christian Stock

subject

Simulation study; Computer science; Scan statistic; Bayesian probability; Medicine; Context (language use); lcsh:Computer applications to medicine. Medical informatics; Bayesian; 03 medical and health sciences; 0302 clinical medicine; Random distribution; Statistics; Cluster analysis; lcsh:Science (General); Nephroblastoma; Data Article; 030304 developmental biology; 0303 health sciences; Multidisciplinary; Benchmarking; Identification (information); Besag-Newell; Laplace's method; Spatial cluster; lcsh:R858-859.7; Besag York Mollié; Raw data; Childhood cancer; Spatial scan statistic; 030217 neurology & neurosurgery; lcsh:Q1-390

description

Abstract: The potential existence of spatial clusters in childhood cancer incidence is a debated topic. Identification of rare disease clusters in general may help to better understand disease etiology and develop preventive strategies against such entities. The incidence of newly diagnosed childhood malignancies under 15 years of age is 140/1,000,000. In this context, the subgroup of nephroblastoma represents an extremely rare entity with an annual incidence of 7/1,000,000. We evaluated widely used statistical approaches for spatial cluster detection in childhood cancer (Ref. [22] Schündeln et al., 2021, Cancer Epidemiology). For the simulation study, random high-risk clusters of 1 to 50 adjacent districts (NUTS level 3, Nomenclature des unités territoriales statistiques) were generated on the basis of the 402 German administrative districts. Each cluster was simulated with different relative risk levels (1 to 100). For each combination of cluster size and risk level, 2000 iterations were performed. The simulated data were then analyzed by three local clustering tests: the Besag-Newell method, the spatial scan statistic and the Bayesian Besag-York-Mollié approach (fitted by Integrated Nested Laplace Approximation). The performance characteristics of all three methods were systematically documented (sensitivity, specificity, positive/negative predictive values, exact and minimum power, correct classification, positive/negative diagnostic likelihood and false positive/negative rate). This data article links to a Mendeley online repository which includes the raw data of simulated high-risk clusters and simulated cases on the district level for an all-childhood-malignancy scenario as well as for cases of nephroblastoma. These data were used for the evaluation of the three cluster detection methods. The R code for simulation and analysis is available from GitHub.
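The simulation design described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the 5 x 5 grid of districts, the uniform child population of 30,000 per district, the Poisson model for case counts, and all function names (`grid_adjacency`, `grow_cluster`, `poisson`, `simulate_cases`) are assumptions made for illustration. Only the nephroblastoma incidence of 7/1,000,000 and the idea of adjacent high-risk districts with an elevated relative risk are taken from the abstract.

```python
import math
import random


def grid_adjacency(rows, cols):
    """Hypothetical stand-in for a district map: cells of a grid that share an edge are adjacent."""
    adjacency = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
            adjacency[(r, c)] = [
                n for n in candidates if 0 <= n[0] < rows and 0 <= n[1] < cols
            ]
    return adjacency


def grow_cluster(adjacency, size, rng):
    """Pick a random seed district, then keep adding random neighbouring districts."""
    cluster = {rng.choice(sorted(adjacency))}
    while len(cluster) < size:
        frontier = sorted({n for d in cluster for n in adjacency[d]} - cluster)
        if not frontier:  # no adjacent district left to add
            break
        cluster.add(rng.choice(frontier))
    return cluster


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small expected counts of a rare disease."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cases(population, cluster, base_rate, relative_risk, rng):
    """Draw a case count per district; districts inside the cluster get an elevated rate."""
    return {
        d: poisson(pop * base_rate * (relative_risk if d in cluster else 1.0), rng)
        for d, pop in population.items()
    }


rng = random.Random(42)
adjacency = grid_adjacency(5, 5)
population = {d: 30_000 for d in adjacency}  # assumed children per district
cluster = grow_cluster(adjacency, size=4, rng=rng)
cases = simulate_cases(population, cluster, base_rate=7e-6, relative_risk=10, rng=rng)
```

Repeating the last three lines for each combination of cluster size and relative risk would give one simulated data set per iteration, which is the kind of input a cluster detection test is then run on.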
The article also includes analyzed data summarizing the performance of the cluster detection tests in very rare disease entities, using the example of simulated nephroblastoma cases. The raw data from the study can be used to benchmark different spatial statistical methods systematically and to compare their performance characteristics. The analyzed data from the nephroblastoma example can help to interpret the performance of the three applied local cluster detection tests in the setting of extremely rare disease entities. As a practical application, the data and R code can be used for performance analyses when planning to establish surveillance systems for rare disease entities.
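For benchmarking a detection method against the simulated clusters, district-level performance measures can be computed along the following lines. This is a hedged Python sketch that uses the standard confusion-matrix definitions; whether the original study defines each measure in exactly this way is an assumption, and the function name `detection_metrics` and the toy district labels are hypothetical.

```python
def detection_metrics(all_districts, true_cluster, detected):
    """Compare the districts flagged by a test with the simulated high-risk districts."""
    all_districts = set(all_districts)
    true_cluster = set(true_cluster)
    detected = set(detected)

    tp = len(true_cluster & detected)   # high-risk districts that were flagged
    fp = len(detected - true_cluster)   # flagged districts outside the cluster
    fn = len(true_cluster - detected)   # high-risk districts that were missed
    tn = len(all_districts - true_cluster - detected)

    def ratio(numerator, denominator):
        # An undefined ratio (e.g. nothing flagged) is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "correct_classification": ratio(tp + tn, len(all_districts)),
    }


# Toy example: 10 districts, 3 truly high-risk, 4 flagged by a hypothetical test.
metrics = detection_metrics(range(10), true_cluster={0, 1, 2}, detected={1, 2, 3, 4})
```

Averaging these values over all iterations of one cluster size and relative risk level would give a per-scenario summary that can be compared across methods.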

10.1016/j.dib.2020.106683
http://www.sciencedirect.com/science/article/pii/S2352340920315626