The aim of this thesis is to extend an existing system, Dedoop, with additional features. Dedoop identifies similarities between pairs of entities in large datasets, processing the data with MapReduce-based methods. Computing these similarities takes a great deal of processing time, and whenever one or more entities in the original dataset change, the whole calculation has to be restarted. This thesis therefore looks for a way to avoid repeating the comparisons in full: by reusing previously calculated match results, the computation can be limited to the subsets of entities that actually require recalculation.
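The idea of reusing earlier match results can be illustrated with a minimal, single-machine sketch. It is not Dedoop's implementation and does not use MapReduce. The function names (`similarity`, `full_match`, `incremental_match`), the string-based similarity measure, and the 0.8 threshold are all assumptions chosen for illustration. The sketch also assumes the caller knows which entity IDs were added, modified, or deleted.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

# Hypothetical types: an entity is just an ID mapped to a text value,
# and a match result maps an unordered ID pair to its similarity score.
Entities = Dict[str, str]
Matches = Dict[FrozenSet[str], float]


def similarity(a: str, b: str) -> float:
    """Illustrative similarity measure (case-insensitive string ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def full_match(entities: Entities, threshold: float = 0.8) -> Matches:
    """Baseline: compare every pair of entities from scratch."""
    results: Matches = {}
    for id_a, id_b in combinations(sorted(entities), 2):
        score = similarity(entities[id_a], entities[id_b])
        if score >= threshold:
            results[frozenset((id_a, id_b))] = score
    return results


def incremental_match(
    entities: Entities,
    previous: Matches,
    changed_ids: Iterable[str],
    threshold: float = 0.8,
) -> Matches:
    """Reuse previous results; recompute only pairs touching a changed entity.

    `changed_ids` must contain every added, modified, or deleted ID.
    """
    changed = set(changed_ids)

    # Keep old results whose two entities are both unchanged.
    results: Matches = {
        pair: score for pair, score in previous.items() if not (pair & changed)
    }

    # Recompute pairs that involve at least one changed, still-existing entity.
    seen = set()
    for cid in changed:
        if cid not in entities:  # deleted entity: nothing to compare
            continue
        for other in entities:
            pair = frozenset((cid, other))
            if other == cid or pair in seen:
                continue
            seen.add(pair)
            score = similarity(entities[cid], entities[other])
            if score >= threshold:
                results[pair] = score
    return results
```

With n entities and k changed ones, the incremental variant performs on the order of k·n comparisons instead of n², while producing the same result as a full rerun under these assumptions.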