RESEARCH PRODUCT

Linkage of anonymized databases for national and international multicenter epidemiological studies: proposal of a cryptographic algorithm

Gouenou Coatrieux, Maniane Fassa, Benoît Riandey, Gilles Trouessin, Catherine Quantin, François-André Allaert

subject

Patient identity; Patient identification; Patient medical record; Epidemiology; Multicenter studies; Public Health, Environmental and Occupational Health; Computer science; Computer security; Security; Cryptography; Encryption; Public-key cryptography; Hash function; Cryptographic hash function; Hashing; Universal hashing; Secure Hash Algorithm; Anonymized data; Linkage; Personally identifiable information; 020205 medical informatics; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 03 medical and health sciences; 0302 clinical medicine; 030212 general & internal medicine; [INFO.INFO-CR] Computer Science [cs]/Cryptography and Security [cs.CR]; [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]; [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie; 16. Peace & justice; 3. Good health

description

Background: Compiling individual records coming from different sources is very important for multicenter epidemiological studies; however, European directives and other national legislation concerning nominal data processing must be respected. These legal aspects can be satisfied by implementing mechanisms that allow anonymization of patient data (such as hashing techniques). Moreover, for security reasons, official recommendations suggest using different cryptographic keys in combination with a cryptographic hash function for each study. Unfortunately, this type of anonymization procedure is in contradiction with common requirements in public health and biomedical research because it becomes almost impossible to link records from separate data collections where the same entity is not referenced in the same way. Solving this paradox using a methodology based on the combination of hashing and enciphering techniques is the main aim of this article.

Methods: The method relies on one of the best-known hashing functions (the Secure Hash Algorithm) to ensure the anonymity of personal information while providing greater resistance to dictionary attacks, combined with encryption techniques. The originality of the method lies in how the hashing and enciphering techniques are combined: as in asymmetric encryption, two keys are used but the private key depends on the patient's identity.

Results: The combination of hashing and enciphering techniques greatly improves the overall security of the proposed scheme.

Conclusion: This methodology makes the stored data available for use in the field of public health for the benefit of patients, while respecting legal and security requirements.
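The linkage paradox described in the Background can be illustrated with a minimal sketch. It is not the algorithm proposed in the article, whose details the abstract does not give. It only shows per-study keyed hashing, here assumed to be HMAC with SHA-256 from the Python standard library. The function names, the identity fields, and the sample keys are hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize_field(value: str) -> str:
    """Strip accents, surrounding whitespace, and case differences (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", value.strip())
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.upper()


def normalize_identity(last_name: str, first_name: str, birth_date: str) -> bytes:
    """Build one canonical byte string from the (hypothetical) identity fields."""
    fields = (last_name, first_name, birth_date)
    return "|".join(normalize_field(f) for f in fields).encode("utf-8")


def pseudonym(identity: bytes, study_key: bytes) -> str:
    """Keyed hash of the identity; the key is what makes dictionary attacks harder."""
    return hmac.new(study_key, identity, hashlib.sha256).hexdigest()


# Hypothetical keys: each study uses its own secret, as the recommendations suggest.
key_study_a = b"secret-key-of-study-A"
key_study_b = b"secret-key-of-study-B"

patient = normalize_identity("Dupont", "Benoît", "1970-01-31")
same_patient_variant = normalize_identity(" dupont", "Benoit", "1970-01-31")

# Within one study, spelling variants of the same patient map to one pseudonym.
in_a = pseudonym(patient, key_study_a)
in_a_variant = pseudonym(same_patient_variant, key_study_a)

# Across studies, the same patient gets unrelated pseudonyms, so records
# from the two collections cannot be matched directly.
in_b = pseudonym(patient, key_study_b)

print(in_a == in_a_variant)  # within-study match
print(in_a == in_b)          # cross-study match
```

Under these assumptions, the within-study comparison succeeds and the cross-study comparison fails, which is the situation the article sets out to resolve by combining hashing with enciphering.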

10.1016/j.respe.2008.11.002
https://hal.science/hal-00472976/document