BF-MapReduce: A Bloom Filter Based Efficient Lightweight Search

Abstract

MapReduce is an attractive programming model for large-scale data-parallel applications. However, the original MapReduce framework still leaves room for performance optimization. In this paper, we propose BF-MapReduce, a novel lightweight MapReduce index based on Bloom filters. Instead of scanning the whole dataset, our approach uses an auxiliary index to quickly skip unnecessary data segments, which effectively reduces the processing cost in the map phase. Moreover, to handle multi-dimensional datasets, we propose a conversion scheme that maps multi-dimensional data into a one-dimensional index. The experimental results show that our approach is efficient and lightweight: it can reduce task running time dramatically at little storage and maintenance cost.
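The abstract does not describe the index layout in detail, so the following is only a minimal sketch under stated assumptions of how a Bloom filter can let a map phase skip data segments. It assumes one Bloom filter per data segment (for example, one per input split), built over a key formed from the indexed attributes. All names here (`BloomFilter`, `to_1d_key`, `record_key`, `build_segment_index`, `search`) are hypothetical. The `|`-joined string key is a naive stand-in for the paper's multi-dimensional conversion scheme, which is not specified here.

```python
import hashlib
from typing import Iterable, List, Sequence, Tuple


class BloomFilter:
    """Plain Bloom filter: m bits, k positions derived from one SHA-256 digest."""

    def __init__(self, m_bits: int = 1024, k: int = 4) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        # Double hashing: position_i = (h1 + i * h2) mod m; h2 forced odd.
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def to_1d_key(values: Iterable[object]) -> str:
    """Hypothetical 1-D conversion: join attribute values into one string key."""
    return "|".join(str(v) for v in values)


def record_key(rec: Sequence[object], dims: Sequence[int]) -> str:
    """Key of a record over the indexed dimensions `dims`."""
    return to_1d_key(rec[d] for d in dims)


def build_segment_index(segments: List[List[tuple]], dims: Sequence[int],
                        m_bits: int = 1024, k: int = 4) -> List[BloomFilter]:
    """Build one Bloom filter per data segment."""
    index = []
    for seg in segments:
        bf = BloomFilter(m_bits, k)
        for rec in seg:
            bf.add(record_key(rec, dims))
        index.append(bf)
    return index


def search(segments: List[List[tuple]], index: List[BloomFilter],
           dims: Sequence[int], query: Sequence[object]) -> Tuple[List[tuple], int]:
    """Scan only segments whose filter may contain the query key."""
    key = to_1d_key(query)
    hits: List[tuple] = []
    scanned = 0
    for seg, bf in zip(segments, index):
        if not bf.might_contain(key):
            continue  # skip: the key is definitely not in this segment
        scanned += 1
        hits.extend(r for r in seg if record_key(r, dims) == key)
    return hits, scanned
```

For example, with segments `[[(1, "a", 10), (2, "b", 20)], [(3, "c", 30)], [(4, "d", 40)]]` indexed on dimensions `(0, 1)`, querying `(3, "c")` should return `[(3, "c", 30)]` while usually scanning only one segment. Because Bloom filters admit false positives but no false negatives, an occasional extra segment may be scanned, but no matching record is ever skipped.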

Publication
IEEE Conference on Collaboration and Internet Computing (CIC)