Search results for "Hashing"

showing 8 items of 8 documents

Direct lookup and hash-based metadata placement for local file systems

2013

New challenges to file systems' metadata performance are imposed by the continuously growing number of files existing in file systems. The total amount of metadata can become too big to be cached, potentially leading to multiple storage device accesses for a single metadata lookup operation. This paper takes a look at the limitations of traditional file system designs and discusses an alternative metadata handling approach, using hash-based concepts already established for metadata and data placement in distributed storage systems. Furthermore, a POSIX compliant prototype implementation based on these concepts is introduced and benchmarked. A variety of file system metadata and data operati…

File system; Data element; Database; Computer science; Fitxers informàtics -- Oganització; Computer file; File organization (Computer science); Meta Data Services; computer.file_format; Metadata placement; Randomization; computer.software_genre; Metadata repository; Torrent file; Metadata; File system design; Direct lookup; Hashing; Operating system; Data_FILES; Versioning file system; Metadata performance; computer; :Informàtica::Sistemes operatius [Àrees temàtiques de la UPC]
researchProduct

Linkage of anonymized databases for national and international multicenter epidemiological studies: proposal of an algorithm…

2009

Background: Compiling individual records coming from different sources is very important for multicenter epidemiological studies; however, European directives and other national legislation concerning nominal data processing must be respected. These legal aspects can be satisfied by implementing mechanisms that allow anonymization of patient data (such as hashing techniques). Moreover, for security reasons, official recommendations suggest using different cryptographic keys in combination with a cryptographic hash function for each study. Unfortunately, this type of anonymization procedure is in contradiction with common requirements in public health and biomedical research because it becom…

Identité du patient; 020205 medical informatics; Epidemiology; Computer science; Hash function; Encryption; Cryptography; Patient identification; Sécurité; Dossier médical du patient; 02 engineering and technology; Computer security; computer.software_genre; Encryption; Public-key cryptography; 03 medical and health sciences; [INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR]; 0302 clinical medicine; Anonymized data; Hashing; Chainage de données; 0202 electrical engineering electronic engineering information engineering; Cryptographic hash function; Données anonymisées; [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]; 030212 general & internal medicine; Chiffrement; Multicenter studies; [INFO.INFO-CR] Computer Science [cs]/Cryptography and Security [cs.CR]; Secure Hash Algorithm; [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]; business.industry; Universal hashing; Linkage; Hachage; Public Health Environmental and Occupational Health; 16. Peace & justice; 3. Good health; [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie; Etudes multicentriques; Security; [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie; business; computer; Personally identifiable information
researchProduct
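
The keyed-hash anonymization mentioned in this abstract can be illustrated with a minimal sketch. HMAC-SHA-256, the normalization rule, the identity string, and the key values are all assumptions made for illustration; the abstract does not name a specific construction, and this is not the algorithm the paper proposes.

```python
import hashlib
import hmac


def pseudonymize(identity: str, study_key: bytes) -> str:
    """Return a keyed hash of a patient identity (hypothetical helper)."""
    # Assumed normalization: upper-case and collapse runs of whitespace,
    # so trivially different spellings map to the same pseudonym.
    normalized = " ".join(identity.upper().split())
    return hmac.new(study_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical per-study keys; real keys would be generated and stored securely.
key_a = b"hypothetical-key-study-A"
key_b = b"hypothetical-key-study-B"

p1 = pseudonymize("Dupont  Marie 1970-01-31", key_a)
p2 = pseudonymize("dupont marie 1970-01-31", key_a)
p3 = pseudonymize("Dupont Marie 1970-01-31", key_b)

print(p1 == p2)  # True: same key, same normalized identity
print(p1 == p3)  # False: a different study key yields an unrelated pseudonym
```

With one key per study, the same patient receives unrelated pseudonyms in different studies, so records hashed under different keys cannot be matched by comparing pseudonyms directly.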

Large Scale Knowledge Matching with Balanced Efficiency-Effectiveness Using LSH Forest

2017

Evolving Knowledge Ecosystems were proposed to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investigate the u…

LSH forest; ekosysteemit (ekologia); evolving knowledge ecosystems; minhash; big data; locality-sensitive hashing; tietotekniikka; random hyperplane hashing
researchProduct

Locality-Sensitive Hashing for Massive String-Based Ontology Matching

2014

This paper reports initial research results related to the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into an LSH problem are proposed and experimental results are reported. The performed experiments show that using LSH for ontology matching could lead to a very fast matching process. The quality of the alignment achieved in these experiments is comparable to that of state-of-the-art matchers, and the matching is much faster. Further research is needed to find out whether the use of different metrics or specific hardware would improve the results.

Matching (statistics); Computer science; business.industry; String (computer science); Hash function; Big data; string-based ontology matching; Process (computing); computer.software_genre; Locality-sensitive hashing; locality-sensitive hashing; Data mining; business; Ontology alignment; computer; 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT)
researchProduct
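
As a rough illustration of how LSH can be applied to string matching, the sketch below builds MinHash signatures over character 3-grams and groups them into bands. The shingle size, the number of hash functions, the banding scheme, and every function name are assumptions; the abstract does not describe the two transformations the paper proposes, so this should not be read as the authors' method.

```python
import hashlib


def shingles(text: str, n: int = 3) -> set:
    """Character n-grams of a lower-cased label (a short label yields one shingle)."""
    s = text.lower()
    return {s[i:i + n] for i in range(max(1, len(s) - n + 1))}


def _hash(seed: int, gram: str) -> int:
    # One hash function per seed, derived from SHA-1 (an arbitrary choice).
    digest = hashlib.sha1(f"{seed}:{gram}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(text: str, num_hashes: int = 64) -> list:
    """MinHash signature: the minimum hash value per hash function."""
    grams = shingles(text)
    return [min(_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of agreeing positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(sig: list, bands: int = 16) -> set:
    """Split a signature into bands; labels sharing any band key are candidates."""
    rows = len(sig) // bands
    return {(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)}


a = minhash("hasAuthorName")
b = minhash("has_author_name")
print(estimate_jaccard(a, b))
print(bool(band_keys(a) & band_keys(b)))  # whether the two labels become a candidate pair
```

Only labels that share at least one band key need a detailed comparison, which is what avoids comparing every pair of labels across two large ontologies.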

On the Influence of PRNGs on Data Distribution

2012

The amount of digital information produced grows rapidly and constantly. Storage systems use clustered architectures designed to store and process this information efficiently. Their use introduces new challenges in storage systems development, like load-balancing and data distribution. A variety of randomized solutions handling data placement issues have been proposed and utilized. However, to the best of our knowledge, there has not yet been a structured analysis of the influence of pseudo random number generators (PRNGs) on the data distribution. In the first part of this paper we consider Consistent Hashing [1] as a combination of two consecutive phases: distribution of bins and distrib…

Pseudorandom number generator; Structured analysis; Theoretical computer science; Distributed database; Computer science; Random number generation; Server; Load balancing (computing); Consistent hashing; Data structure; 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing
researchProduct
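
For readers unfamiliar with Consistent Hashing, the following is a minimal sketch of the textbook scheme: bins are first placed on a ring, then each item is assigned to the next bin point on the ring. Positions come from SHA-256 here, which is an assumption; the paper studies placements driven by PRNGs, and the class name, replica count, and bin names are hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a 64-bit position on the ring (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, bins, replicas: int = 100):
        # Phase 1: place several points per bin on the ring.
        self._points = sorted(
            (ring_position(f"{name}#{i}"), name)
            for name in bins
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._points]

    def lookup(self, item: str) -> str:
        # Phase 2: an item belongs to the first bin point after its own position,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_right(self._positions, ring_position(item))
        return self._points[idx % len(self._points)][1]


ring = ConsistentHashRing(["disk-a", "disk-b", "disk-c"])
smaller = ConsistentHashRing(["disk-a", "disk-b"])

items = [f"object-{i}" for i in range(1000)]
moved = sum(ring.lookup(x) != smaller.lookup(x) for x in items)
on_removed = sum(ring.lookup(x) == "disk-c" for x in items)
print(moved == on_removed)  # True: only items stored on the removed bin are relocated
```

How evenly the items spread over the bins depends on how the ring positions are generated, which is the kind of effect the paper analyzes for PRNGs.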

Locality-sensitive hashing enables signal classification in high-throughput mass spectrometry raw data at scale

2021

Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: First, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span along the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Existing approaches for signal detection are usually not well suited for processing large amounts of data in parallel or rely on strong assumptions concerning the signals' properties. In this study, it is shown that locali…

business.industry; Computer science; Scalability; Hash function; Pattern recognition; Detection theory; Artificial intelligence; Mass spectrometry; business; Raw data; Thresholding; Synthetic data; Locality-sensitive hashing
researchProduct

Balanced Large Scale Knowledge Matching Using LSH Forest

2015

Evolving Knowledge Ecosystems were proposed recently to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investig…

evolving knowledge ecosystems; Information retrieval; Computer science; business.industry; Big data; 02 engineering and technology; Knowledge ecosystem; computer.software_genre; LSH forest; big data; 020204 information systems; Schema (psychology); 0202 electrical engineering electronic engineering information engineering; Ontology; 020201 artificial intelligence & image processing; Data mining; locality-sensitive hashing; business; computer
researchProduct

Twister Tries

2015

Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large number of items. Besides the unfavorable time complexity of O(n^2), these algorithms have a space complexity of O(n^2), which can be reduced to O(n) if the time complexity is allowed to rise to O(n^2 log2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only lin…

ta113; Hierarchical agglomerative clustering; ta112; Fuzzy clustering; Brown clustering; Computer science; Single-linkage clustering; computer.software_genre; Hierarchical clustering; Locality-sensitive hashing; Data set; CURE data clustering algorithm; locality-sensitive hashing; average linkage; Data mining; Hierarchical clustering of networks; linear complexity; Cluster analysis; hierarchical clustering; Algorithm; computer; Time complexity; Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
researchProduct