By Thomas N. Herzog

This book helps practitioners gain a deeper understanding, at an applied level, of the issues involved in improving data quality through editing, imputation, and record linkage. The first part of the book deals with methods and models. Here, we focus on the Fellegi-Holt edit-imputation model, the Little-Rubin multiple-imputation scheme, and the Fellegi-Sunter record linkage model. Brief examples are included to show how these techniques work.

In the second part of the book, the authors present real-world case studies in which one or more of these techniques are used. They cover a wide variety of application areas. These include mortgage guarantee insurance, medical, biomedical, highway safety, and social insurance as well as the construction of list frames and administrative lists.

Readers will find this book a mixture of practical advice, mathematical rigor, management insight and philosophy. The long list of references at the end of the book enables readers to delve more deeply into the subjects discussed here. The authors also discuss the software that has been developed to apply the techniques described in our text.



Similar information theory books

Quantum Communications and Cryptography

All current methods of secure communication such as public-key cryptography can eventually be broken by faster computing. At the interface of physics and computer science lies a powerful solution for secure communications: quantum cryptography. Because eavesdropping changes the physical nature of the information, users in a quantum exchange can easily detect eavesdroppers.

Complexity Theory

Complexity theory is the theory of determining the necessary resources for the solution of algorithmic problems and, therefore, the limits of what is possible with the available resources. The results prevent the search for non-existing efficient algorithms. The theory of NP-completeness has influenced the development of all areas of computer science.

Toeplitz and Circulant Matrices: A Review (Foundations and Trends in Communications and Information The)

Toeplitz and Circulant Matrices: A Review derives in a tutorial manner the fundamental theorems on the asymptotic behavior of eigenvalues, inverses, and products of banded Toeplitz matrices and Toeplitz matrices with absolutely summable elements. Mathematical elegance and generality are sacrificed for conceptual simplicity and insight in the hope of making these results available to engineers lacking either the background or endurance to attack the mathematical literature on the subject.

Information Theory and the Brain

Information Theory and the Brain deals with a new and expanding area of neuroscience that provides a framework for understanding neuronal processing. This framework is derived from a conference held in Newquay, UK, where a group of scientists from around the world met to discuss the topic. This book begins with an introduction to the basic concepts of information theory and then illustrates these concepts with examples from research over the last 40 years.

Extra info for Data Quality and Record Linkage Techniques

Example text

8. Practical Tips

We strongly recommend that data analysts organize their tests systematically and proceed in a way that (1) allows them to readily correct obvious errors that are easy to fix but (2) leaves more complicated situations for the more sophisticated methods of Chapter 7, our chapter on editing and imputation.

4 Deming [2006] is a condensed version of Deming [1944] that Scheuren edited.

We also recommend that the analysts preserve the original dataset and save intermediate versions of the dataset periodically.
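One way to follow the advice about preserving the original dataset and saving intermediate versions is sketched below in Python. This is only an illustration under stated assumptions, not a procedure taken from the book: the function names `preserve_original` and `save_version`, the CSV format, and the file-naming scheme are all hypothetical choices made for the example.

```python
import csv
import shutil
import stat
from pathlib import Path


def preserve_original(source: Path, archive_dir: Path) -> Path:
    """Copy the raw file into archive_dir and mark the copy read-only.

    Hypothetical helper: the ".original" naming convention is an assumption.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{source.stem}.original{source.suffix}"
    if target.exists():
        # Never overwrite a preserved original.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(source, target)
    # Read-only for everyone, to guard against accidental edits.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target


def save_version(rows, fieldnames, archive_dir: Path, stem: str, label: str) -> Path:
    """Write rows to the next numbered snapshot, e.g. claims.v002.dedup.csv.

    Hypothetical helper: snapshots are numbered in the order they are saved.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Count the snapshots already saved for this dataset.
    existing = list(archive_dir.glob(f"{stem}.v[0-9][0-9][0-9].*.csv"))
    target = archive_dir / f"{stem}.v{len(existing) + 1:03d}.{label}.csv"
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return target
```

An analyst might call `preserve_original` once before any edits and then call `save_version` after each round of corrections, using the label to record which edit was applied. Any intermediate state can then be recovered if a later correction turns out to be wrong.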

The facilitator can ensure that all critical items are on the list and individual stakeholders are communicating in a fashion that allows understanding. For instance, an experienced systems programmer may be able to explain how various characteristics in the database can help customer-relation representatives and data analysts who may be interested in marketing. An experienced customer-relation representative may be able to identify certain data fields that must be readily available during interactions with customers.

In this chapter, we first discuss a few key properties of high-quality databases/lists. This is followed by a number of typical examples in which lists might be merged. Finally, we present some additional metrics for use in assessing the quality of lists produced by merging two or more lists. Although quantification and the use of appropriate metrics are needed for the quality process, most current quantification approaches are created in an ad hoc fashion that is specific to a given database and its use.
