Golden Record is Dataspace’s record matching and deduplication technology

Is it a match or a duplicate? When trying to match similar records across data sets, you’ll run into two similar but different concepts: matching and deduplication. Quite simply, deduplicating means finding (and, technically, eliminating) records that appear multiple times in a single file or database and that effectively represent the same thing. Matching, on the other hand, means finding records in different data sets that effectively share the same key.
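To make the distinction concrete, here is a minimal Python sketch. It assumes that a lowercased name plus email address is a good enough key, which is my simplification for illustration and not a description of how Golden Record works. The records and the function names (`normalize_key`, `find_duplicates`, `find_matches`) are made up for this example.

```python
def normalize_key(record):
    """Build a comparison key from name and email (an assumed choice of fields)."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def find_duplicates(records):
    """Deduplication: group the records in ONE data set that share a key."""
    groups = {}
    for index, record in enumerate(records):
        groups.setdefault(normalize_key(record), []).append(index)
    # Only keys that occur more than once are duplicates.
    return [indexes for indexes in groups.values() if len(indexes) > 1]


def find_matches(left, right):
    """Matching: pair up records from TWO data sets that share a key."""
    right_index = {}
    for j, record in enumerate(right):
        right_index.setdefault(normalize_key(record), []).append(j)
    return [
        (i, j)
        for i, record in enumerate(left)
        for j in right_index.get(normalize_key(record), [])
    ]


# Illustrative records only (not the data shown in this post).
sales = [
    {"name": "Steve Jones", "email": "sjones@example.com"},
    {"name": "Mary Smith", "email": "msmith@example.com"},
    {"name": "steve jones ", "email": "SJones@example.com"},
]
demographics = [
    {"name": "Steve Jones", "email": "sjones@example.com"},
]

print(find_duplicates(sales))             # positions of duplicate records in sales
print(find_matches(sales, demographics))  # (sales position, demographics position) pairs
```

Notice that both functions lean on the same key. The only difference is whether you compare a data set with itself or with another one.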

As an example, consider the following two data sets. One is a company’s customer database. The other contains purchased demographic data. (FYI, I have obtained permission from this company to show their data.)

Sales Data

Demographic Data

A quick glance at the customer database indicates that record one, Steve Jones, and record nine, also Steve Jones, are exact duplicates. Similarly, you can match the two Steve Jones records with the Steve Jones record in the purchased demographic data.

Complications

A closer look at the data, however, shows a couple of complications that your matching algorithms really need to consider to be effective. (Actually, the list of potential complications in data is endless. Let’s talk about it!)

What Constitutes a Match / Duplicate?

The first, rather obvious question is: exactly what constitutes a match? Look at our customer data again:

Sales Data

As we noted, record one and record nine are most likely duplicates. But what about record three? It has no email address. Does the Steve Jones on record three represent the same person as the Steve Jones on records one and nine? Your matching solution should be able to handle questions like this.
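One way to deal with a missing field is to let the comparison return a third answer, "possible", instead of forcing a yes or no. The sketch below is only an illustration under that assumption. The `compare` function, its three labels, and the sample records are hypothetical and are not taken from Golden Record or from the data above.

```python
def compare(a, b):
    """Return 'match', 'possible', or 'no_match' for two customer records."""
    same_name = a["name"].strip().lower() == b["name"].strip().lower()
    if not same_name:
        return "no_match"

    # Treat a missing or blank email as "unknown", not as "different".
    email_a = (a.get("email") or "").strip().lower()
    email_b = (b.get("email") or "").strip().lower()
    if not email_a or not email_b:
        return "possible"  # same name, but not enough evidence either way

    return "match" if email_a == email_b else "no_match"


# Illustrative records only.
record_one = {"name": "Steve Jones", "email": "sjones@example.com"}
record_three = {"name": "Steve Jones", "email": None}
record_nine = {"name": "Steve Jones", "email": "sjones@example.com"}

print(compare(record_one, record_nine))   # both fields agree
print(compare(record_one, record_three))  # email missing on one side
```

Records that come back as "possible" could go to a person for review, or be settled by comparing additional fields such as address or phone number.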

Synonyms

Alternatively, what about Stephen Jones on record four? He’s got the same last name and the same email address as the Steve Jones on records one and nine. Is he the same person? Probably. One way to handle situations like this is with a robust synonyms file, one that tells the matcher that names like Steve and Stephen can refer to the same person. Make sure your matching solution can handle synonyms.
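Here is a rough sketch of how a synonyms lookup might plug into a comparison. In practice the synonyms would be loaded from a file. In this example a small hand-written dictionary stands in for it, and the entries, the field names, and the `same_person` rule are all assumptions made for illustration.

```python
# A tiny stand-in for a synonyms file: each variant maps to one canonical name.
SYNONYMS = {
    "steve": "stephen",
    "steven": "stephen",
    "bill": "william",
    "will": "william",
    "liz": "elizabeth",
}


def canonical_first_name(name):
    """Lowercase the name and replace a known variant with its canonical form."""
    cleaned = name.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def same_person(a, b):
    """Assumed rule: canonical first name, last name, and email must all agree."""
    return (
        canonical_first_name(a["first"]) == canonical_first_name(b["first"])
        and a["last"].strip().lower() == b["last"].strip().lower()
        and a["email"].strip().lower() == b["email"].strip().lower()
    )


# Illustrative records only.
record_one = {"first": "Steve", "last": "Jones", "email": "sjones@example.com"}
record_four = {"first": "Stephen", "last": "Jones", "email": "sjones@example.com"}

print(same_person(record_one, record_four))
```

Mapping every variant to a single canonical name keeps the comparison itself simple: once both sides are normalized, an ordinary equality check does the rest.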

Do You Have a Matching / Deduplication Problem?

We’ll talk about more possible matching issues in future posts. For now, if you have a matching or deduplication need, check out our matching product, Golden Record. Then, let’s talk! You can reach me at Benjamin.Taub@Dataspace.com.

Thanks for reading!

Ben

Dataspace's Golden Record

Today, the world is undergoing a massive event: the implementation of data privacy laws. (And you thought this was another post about coronavirus, didn’t you?) Once the pandemic passes, data and legal professionals are going to have to return to figuring out data privacy compliance. Europe has GDPR, California has CCPA, and, in both cases, violators are subject to huge fines. And more are on the way. Over the next few years, we’ll be required to comply with a variety of state, national, and international privacy regulations.

Each of these laws provides consumers with rights, like the right to be forgotten and the right to know where and how their data is used.

If yours is like most organizations I’ve worked with, customer and person data is stored in an array of places: CRM systems, sales systems, email marketing systems, spreadsheets, and a ton more. This presents a huge problem: how can you comply if you don’t even know everywhere that a person’s data might lurk?

We’re developing a tool to help you minimize data privacy compliance risk: Golden Record. Golden Record ties together records from all sorts of sources, letting you know all the systems where a contact’s data resides. You get the ability to handle questions from one place. Just as importantly, Golden Record’s algorithms work even if your data sets don’t all share a common customer number. Think about it: how many of the spreadsheets that hold customer data also have the customer ID in them?
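To give a feel for the idea, here is a deliberately simple Python sketch that answers the question "which systems hold this person’s data?" without relying on a customer ID. It links records by normalized email address only, which is my assumption for the example and far cruder than what a real matching tool would do. The system names, records, and the `locate_contact_data` function are hypothetical.

```python
from collections import defaultdict


def locate_contact_data(sources):
    """Map each normalized email to the sorted list of systems that hold it."""
    locations = defaultdict(set)
    for system, records in sources.items():
        for record in records:
            email = (record.get("email") or "").strip().lower()
            if email:  # records without an email cannot be linked by this rule
                locations[email].add(system)
    return {email: sorted(systems) for email, systems in locations.items()}


# Illustrative systems and records only. Note that only the CRM has a customer ID.
sources = {
    "crm": [{"customer_id": 101, "email": "sjones@example.com"}],
    "email_marketing": [{"email": "SJones@example.com "}],
    "spreadsheet": [
        {"name": "Steve Jones", "email": "sjones@example.com"},
        {"name": "Mary Smith", "email": "msmith@example.com"},
    ],
}

locations = locate_contact_data(sources)
print(locations["sjones@example.com"])  # every system holding this contact
```

With a lookup like this, a request to be forgotten becomes a list of systems to visit instead of a hunt through every spreadsheet.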

Want more information on how record matching and a tool like Golden Record can help protect you from incurring expensive fines? Let’s talk! You can reach me at Benjamin.Taub@Dataspace.com or via LinkedIn.

Thanks & stay safe!

Ben