Research Article

Contraption of Suffix Array Blocking for Efficacious Record Linkage and De-duplication

by Yamini Warke, Arti Mohanpurkar
Communications on Applied Electronics
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 6
Year of Publication: 2015
Authors: Yamini Warke, Arti Mohanpurkar
10.5120/cae-1565

Yamini Warke, Arti Mohanpurkar. Contraption of Suffix Array Blocking for Efficacious Record Linkage and De-duplication. Communications on Applied Electronics. 1, 6 (April 2015), 6-9. DOI=10.5120/cae-1565

@article{ 10.5120/cae-1565,
author = { Yamini Warke and Arti Mohanpurkar },
title = { Contraption of Suffix Array Blocking for Efficacious Record Linkage and De-duplication },
journal = { Communications on Applied Electronics },
issue_date = { April 2015 },
volume = { 1 },
number = { 6 },
month = { April },
year = { 2015 },
issn = { 2394-4714 },
pages = { 6-9 },
numpages = {9},
url = { https://www.caeaccess.org/archives/volume1/number6/334-1565/ },
doi = { 10.5120/cae-1565 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-09-04T18:37:41.982607+05:30
%A Yamini Warke
%A Arti Mohanpurkar
%T Contraption of Suffix Array Blocking for Efficacious Record Linkage and De-duplication
%J Communications on Applied Electronics
%@ 2394-4714
%V 1
%N 6
%P 6-9
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Record linkage is the process of bringing together information from multiple computerized files for a common purpose. The basic methods compare name and address information across pairs of files to determine which pairs of records are associated with the same entity. An entity might be a business, a person, or some other type of unit that is listed. De-duplication is the task of identifying one or more records in a repository that represent the same object or entity. The same data may be represented differently in different databases, which causes problems. In recent years, diverse indexing techniques have been developed for record linkage and de-duplication. They aim to reduce the number of record pairs to be compared in the similarity matching process while maintaining high matching quality. This paper presents a suffix array blocking scheme for efficient record linkage and de-duplication based on different similarity measures.
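As a rough illustration of how an indexing step can reduce the number of record pairs to compare, the following minimal Python sketch builds suffix-based blocks over a single blocking key per record. It is not the authors' implementation: the function names, the sample records, and the parameters `min_len` (minimum suffix length) and `max_block_size` (largest block that is kept) are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def suffixes(key, min_len):
    """Return every suffix of `key` with at least `min_len` characters.

    Keys shorter than `min_len` yield no suffixes in this sketch.
    """
    return [key[i:] for i in range(len(key) - min_len + 1)]


def suffix_blocks(records, min_len=4, max_block_size=3):
    """Map each suffix to the ids of records whose key contains it.

    Blocks with a single record produce no pairs, and blocks larger than
    `max_block_size` are treated as too common to be useful; both are dropped.
    """
    index = defaultdict(set)
    for rec_id, key in records.items():
        for suf in suffixes(key, min_len):
            index[suf].add(rec_id)
    return {s: ids for s, ids in index.items()
            if 2 <= len(ids) <= max_block_size}


def candidate_pairs(blocks):
    """Collect the distinct record pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical blocking keys (for example, a cleaned given-name field).
records = {1: "catherine", 2: "katherine", 3: "catherina", 4: "smith"}

blocks = suffix_blocks(records)
pairs = candidate_pairs(blocks)
total = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pair(s) instead of {total}: {sorted(pairs)}")
```

With these sample keys, only records 1 and 2 share a suffix of four or more characters, so a single candidate pair would be passed to the similarity matching step instead of all six possible pairs. Record 3 differs from record 1 only in its last character, yet it shares no suffix with it, which shows that plain suffix blocking is sensitive to variations at the end of the key.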

Index Terms

Computer Science
Information Sciences

Keywords

Record linkage, suffix array blocking.