0000000000547710

AUTHOR

Johann-Christoph Freytag

Showing 1 related work from this author

Set similarity joins on MapReduce

2018

Set similarity joins, which compute pairs of similar sets, constitute an important operator primitive in a variety of applications, including applications that must process large amounts of data. To handle these data volumes, several distributed set similarity join algorithms have been proposed. Unfortunately, little is known about the relative performance, strengths and weaknesses of these techniques. Previous comparisons are limited to a small subset of relevant algorithms, and the large differences in the various test setups make it hard to draw overall conclusions. In this paper we survey ten recent, distributed set similarity join algorithms, all based on the MapReduce paradigm. We emp…

Computer science; Process (engineering); General Engineering; Joins; Scale (descriptive set theory); 02 engineering and technology; computer.software_genre; Set (abstract data type); Range (mathematics); Operator (computer programming); Similarity (network science); 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Join (sigma algebra); 020201 artificial intelligence & image processing; Data mining; computer; Proceedings of the VLDB Endowment