
AUTHOR

Andreas Borg


Evaluation of Record Linkage Methods for Iterative Insertions

2009

Objectives: There have been many developments and applications of mathematical methods in the context of record linkage as one area of interdisciplinary research efforts. However, comparative evaluations of record linkage methods are still underrepresented. In this paper improvements of the Fellegi-Sunter model are compared with other elaborated classification methods in order to direct further research endeavors to the most promising methodologies. Methods: The task of linking records can be viewed as a special form of object identification. We consider several non-stochastic methods and procedures for the record linkage task in addition to the Fellegi-Sunter model and perform an e…

Boosting (machine learning); Medical Records Systems, Computerized; Computer science; Decision tree; Health Informatics; Machine learning; Fuzzy Logic; Health Information Management; Germany; Expectation–maximization algorithm; Humans; Registries; Advanced and Specialized Nursing; Electronic Data Processing; Models, Statistical; Data Collection; Decision Trees; Support vector machine; Classification methods; Medical Record Linkage; Data mining; Artificial intelligence; Algorithms; Software; Record linkage
Methods of Information in Medicine
researchProduct

Bagging, bumping, multiview, and active learning for record linkage with empirical results on patient identity data

2011

Record linkage or deduplication deals with the detection and deletion of duplicates in and across files. For this task, this paper introduces and evaluates two new machine-learning methods (bumping and multiview) together with bagging, a tree-based ensemble approach. Whereas bumping represents a tree-based approach as well, multiview is based on the combination of different methods and the semi-supervised learning principle. After providing a theoretical background of the methods, initial empirical results on patient identity data are given. In the empirical evaluation, we calibrate the methods on three different kinds of training data. The results show that the smallest training data set, …

Patient Identification Systems; Training set; Computer science; Active learning (machine learning); Health Informatics; Empirical Research; Machine learning; Computer Science Applications; Task (project management); Set (abstract data type); Tree (data structure); Artificial intelligence; Identity (object-oriented programming); Humans; Bumping; Medical Record Linkage; Data mining; Software; Record linkage
Computer Methods and Programs in Biomedicine

Deterministic Linkage as a Preceding Filter for Other Record Linkage Methods

2015

Deterministic record linkage (RL) is frequently regarded as a rival to more sophisticated strategies like probabilistic RL. We investigate the effect of combining deterministic linkage with other linkage techniques. For this task, we use a simple deterministic linkage strategy as a preceding filter: a data pair is classified as ‘match’ if all values of attributes considered agree exactly, otherwise as ‘nonmatch’. This strategy is separately combined with two probabilistic RL methods based on the Fellegi–Sunter model and with two classification tree methods (CART and Bagging). An empirical comparison was conducted on two real data sets. We used four different partitions into training data a…
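
The exact-agreement rule described in this abstract can be illustrated with a minimal Python sketch. The names `agrees_exactly`, `classify_pair`, and `second_stage` are hypothetical and not taken from the paper. The sketch also assumes that pairs which do not agree exactly are handed on to the second classifier; the truncated abstract does not spell out how the two stages are combined.

```python
from typing import Callable, Mapping, Sequence

Record = Mapping[str, str]


def agrees_exactly(a: Record, b: Record, attributes: Sequence[str]) -> bool:
    """Return True if every considered attribute has the same value in both records."""
    return all(a[name] == b[name] for name in attributes)


def classify_pair(
    a: Record,
    b: Record,
    attributes: Sequence[str],
    second_stage: Callable[[Record, Record], str],
) -> str:
    """Apply the deterministic rule first, then fall back to another classifier."""
    if agrees_exactly(a, b, attributes):
        return "match"
    # Assumption: pairs that fail the exact comparison are passed to the
    # second-stage method (for example a probabilistic or tree-based classifier).
    return second_stage(a, b)


def always_nonmatch(a: Record, b: Record) -> str:
    """Placeholder second stage; using it reproduces the pure deterministic rule."""
    return "nonmatch"


if __name__ == "__main__":
    fields = ["first_name", "last_name", "birth_date"]
    r1 = {"first_name": "Anna", "last_name": "Meier", "birth_date": "1970-01-02"}
    r2 = {"first_name": "Anna", "last_name": "Meier", "birth_date": "1970-01-02"}
    r3 = {"first_name": "Anna", "last_name": "Meyer", "birth_date": "1970-01-02"}
    print(classify_pair(r1, r2, fields, always_nonmatch))
    print(classify_pair(r1, r3, fields, always_nonmatch))
```

With the placeholder second stage, the function behaves like the standalone deterministic strategy; swapping in a different callable models the combined setup.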

Linkage (software); Computer science; Decision tree learning; Population; Probabilistic logic; Filter (higher-order function); Expectation–maximization algorithm; Computer Science (miscellaneous); Data mining; Algorithm; Record linkage; Test data
International Journal of Information Technology & Decision Making

Active learning strategies for the deduplication of electronic patient data using classification trees.

2012

Highlights: Active learning for medical record linkage is used on a large data set. We compare a simple active learning strategy with a more sophisticated variant. The active learning method of Sarawagi and Bhamidipaty (2002) [6] is extended. We deliver insights into the variations of the results due to random sampling in the active learning strategies. Introduction: Supervised record linkage methods often require a clerical review to gain informative training data. Active learning means to actively prompt the user to label data with special characteristics in order to minimise the review costs. We conducted an empirical evaluation to investigate whether…

Active learning; Computer science; Active learning (machine learning); Information Storage and Retrieval; Context (language use); Health Informatics; Semi-supervised learning; Machine learning; Set (abstract data type); Artificial intelligence; Bagging; Data deduplication; Electronic Health Records; Humans; String (computer science); Decision Trees; Online machine learning; Computer Science Applications; Data mining; Medical Record Linkage; String metric; Algorithms
Journal of Biomedical Informatics

Controlling false match rates in record linkage using extreme value theory

2011

Cleansing data from synonyms and homonyms is a relevant task in fields where high quality of data is crucial, for example in disease registries and medical research networks. Record linkage provides methods for minimizing synonym and homonym errors, thereby improving data quality. We focus our attention on the case of homonym errors (in the following denoted as ‘false matches’), in which records belonging to different entities are wrongly classified as equal. Synonym errors (‘false non-matches’) occur when a single entity maps to multiple records in the linkage result. They are not considered in this study because in our application domain they are not as crucial as false matches. Fa…
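
The two error types defined in this abstract can be made concrete with a small Python sketch. It is only an illustration: the function name `count_linkage_errors` and the input layout (true entity identifiers plus a predicted label per record pair) are assumptions and do not come from the paper, which is concerned with controlling the false match rate rather than with simple counting.

```python
from typing import Iterable, Tuple

# Each item: (true entity of record A, true entity of record B, predicted match?)
LabelledPair = Tuple[str, str, bool]


def count_linkage_errors(pairs: Iterable[LabelledPair]) -> Tuple[int, int]:
    """Count homonym errors (false matches) and synonym errors (false non-matches)."""
    false_matches = 0
    false_non_matches = 0
    for entity_a, entity_b, predicted_match in pairs:
        same_entity = entity_a == entity_b
        if predicted_match and not same_entity:
            # Records of different entities wrongly classified as equal.
            false_matches += 1
        elif not predicted_match and same_entity:
            # Records of one entity left unlinked.
            false_non_matches += 1
    return false_matches, false_non_matches


if __name__ == "__main__":
    example = [
        ("p1", "p1", True),   # correct match
        ("p1", "p2", True),   # false match (homonym error)
        ("p3", "p3", False),  # false non-match (synonym error)
        ("p4", "p5", False),  # correct non-match
    ]
    print(count_linkage_errors(example))
```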

Data cleansing; Biomedical Research; Databases, Factual; Calibration (statistics); Computer science; Health Informatics; Plot (graphics); Mean excess plot; Statistics; Registries; Extreme value theory; Linkage (software); Models, Statistical; Computational Biology; Fellegi–Sunter model; Mixture model; Generalized Pareto distribution; Computer Science Applications; Data quality; Statistics of extreme values; Database Management Systems; Medical Record Linkage; Data mining; Algorithms; Medical Informatics; Record linkage
Journal of Biomedical Informatics

Missing values in deduplication of electronic patient data

2011

Data deduplication refers to the process in which records referring to the same real-world entities are detected in datasets such that duplicated records can be eliminated. The denotation ‘record linkage’ is used here for the same problem [1]. A typical application is the deduplication of medical registry data [2, 3]. Medical registries are institutions that collect medical and personal data in a standardized and comprehensive way. The primary aims are the creation of a pool of patients eligible for clinical or epidemiological studies and the computation of certain indices such as the incidence in order to oversee the development of diseases. The latter task in particular requires a database in wh…

Computer science; Inference; Health Informatics; Ambiguity; Patient data; Missing data; Research and Applications; Regression; Neoplasms; Statistics; Data deduplication; Electronic Health Records; Humans; Data mining; Imputation (statistics); Medical Record Linkage; Registries; Record linkage

MAGICPL: A Generic Process Description Language for Distributed Pseudonymization Scenarios

2021

Objectives: Pseudonymization is an important aspect of projects dealing with sensitive patient data. Most projects build their own specialized, hard-coded solutions. However, these overlap in many aspects of their functionality. As any re-implementation binds resources, we would like to propose a solution that facilitates and encourages the reuse of existing components. Methods: We analyzed already-established data protection concepts to gain an insight into their common features and the ways in which their components were linked together. We found that we could represent these pseudonymization processes with a simple descriptive language, which we have called MAGICPL, plus a relati…

Service (systems architecture); Biomedical Research; Computer science; Process (engineering); Health Informatics; Reuse; Medical and health sciences; Clinical medicine; Health Information Management; Component (UML); Humans; General & internal medicine; Pseudonymization; Computer Security; Language; Advanced and Specialized Nursing; Class (computer programming); Application programming interface; Oncology & carcinogenesis; Software engineering; Confidentiality; Software; XML
Methods of Information in Medicine

A practical framework for data management processes and their evaluation in population-based medical registries.

2013

We present a framework for data management processes in population-based medical registries. Existing guidelines lack the concreteness we deem necessary for them to be of practical use, especially concerning the establishment of new registries. Therefore, we propose adjustments and concretisations with regard to data quality, data privacy, data security and registry purposes. First, we separately elaborate on the issues to be included into the framework and present proposals for their improvements. Thereafter, we provide a framework for medical registries based on quasi-standard-operation procedures. The main result is a concise and scientifically based framework that tries to be both broad a…

Information privacy; Nursing (miscellaneous); Computer science; Data management; Population; Data security; Health Informatics; Concreteness; Computer security; Data acquisition; Health Information Management; Germany; Neoplasms; Humans; Registries; Reference Standards; Data science; Data quality; Population Surveillance; Computer data storage; Medical Record Linkage; Confidentiality
Informatics for Health and Social Care