Question Everything: Three new approaches to MDM optimization

If you’ve ever asked questions about MDM in the past, you might have gotten answers like these: “It’s always been done that way.”  “It is what it is.”  “That’s how they showed me to do it when I went to Boot Camp.”  While the “tried and tested” best practices of MDM have been in place for over 15 years now, there are some evolving customer demands that have us thinking about the normal approaches to configuration and questioning some of the default behavior patterns.

In this post we will discuss three questions that may cause you to rethink how you are doing MDM as well.

  1. What should happen to the non-surviving data when a merge occurs?
  2. Should having anonymous values trigger a score that reflects that data is different?
  3. Do all tasks need to be resolved in Inspector?

What Should Happen To The Non-Surviving Data When A Merge Occurs?

In the default behavior of IBM InfoSphere MDM’s Virtual Hub (formerly known as Initiate), when two records are merged one is noted as a Survivor and the other becomes Obsolete.  The Obsolete record becomes completely deactivated – you cannot search for it, it will not be compared, and the record will reject all incoming updates.

Sounds fine, right?  But think about the data that the Obsolete record held before the Merge.  In the default behavior, only the data that belonged to the Survivor record is searchable and usable going forward.  We recently completed a project for a Canadian Province where they questioned this approach.  Their point, a very valid one, was this: Shouldn’t the values from the Obsolete record be added to the Survivor’s historical information so that you can search on that data to retrieve the Survivor?

Wow, what a simple question!  It led to a relatively straightforward solution.  When two records are merged, the demographic data from the Obsolete record is copied into the Survivor with an “Inactive” status.  Voilà!  Now the data from the Obsolete record has a life beyond the merge.  Those demographic values are now searchable and scorable in the context of MDM.
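To make the idea concrete, here is a minimal sketch of that merge behavior in Python.  It is not the InfoSphere MDM API; the MemberRecord and AttributeValue classes and the merge and search functions are hypothetical names invented purely for illustration, and a real implementation would live in your MDM configuration or handlers.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeValue:
    name: str                 # attribute name, e.g. "phone" (illustrative)
    value: str
    status: str = "Active"    # "Active" or "Inactive"


@dataclass
class MemberRecord:
    source_id: str
    status: str = "Active"    # "Active" or "Obsolete"
    attributes: list = field(default_factory=list)


def merge(survivor, obsolete):
    """Merge two records, keeping the Obsolete record's values as Inactive history."""
    known = {(a.name, a.value) for a in survivor.attributes}
    for attr in obsolete.attributes:
        if (attr.name, attr.value) not in known:
            # Copy the value onto the Survivor, flagged as Inactive.
            survivor.attributes.append(
                AttributeValue(attr.name, attr.value, "Inactive")
            )
            known.add((attr.name, attr.value))
    obsolete.status = "Obsolete"
    return survivor


def search(records, name, value):
    """Return non-obsolete records that hold the value, Active or Inactive."""
    return [
        r for r in records
        if r.status != "Obsolete"
        and any(a.name == name and a.value == value for a in r.attributes)
    ]


survivor = MemberRecord("A-1", attributes=[AttributeValue("phone", "555-0100")])
obsolete = MemberRecord("A-2", attributes=[AttributeValue("phone", "555-0199")])
merge(survivor, obsolete)

# Searching on the Obsolete record's old phone now retrieves the Survivor.
hits = search([survivor, obsolete], "phone", "555-0199")
print([r.source_id for r in hits])
```

In this sketch the Obsolete record itself still drops out of search, which matches the default behavior described above; only its values live on, attached to the Survivor.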

Should Having Anonymous Values Trigger A Score That Reflects That Data Is Different?

Many of you are familiar with the IBM InfoSphere MDM concept of “Anonymous Values” where commonly used “fake” values such as BABY, CONFIDENTIAL, (999) 999-9999, or 01/01/1901 are treated as if they are NULL during analysis.  The theory is that by removing these values from the logical processing, we get a more accurate assessment of whether the records match.  When it comes to attributes like Dates, Phones, and Identifiers, the anonymous processing works as you would expect.  However, analysis on Names does not always behave the way you might intend when those anonymous values are stripped out.

While anonymous Name handling has always been promoted as a way to clarify differences, in reality removing the anonymous Names from MDM matching leads the Member Comparison process to register an Exact Match when only the other Names are factored in.  In the example below, BABYGIRL is never compared against AMANDA, so the matching R token alone makes the Names look equal.

Record 1 Name Tokens | Record 2 Name Tokens | Match Result  | Overall Result
BABYGIRL (Anonymous) | AMANDA               | No comparison | EQUAL
R                    | R                    | Exact         |


Instead of listing those values as anonymous during standardization, you can set up scores in the weight tables so that those “Anonymous” names earn a score of zero when they agree.  That way, when one record has the “Anonymous” name and the second record has the real name, the comparison registers them as “Different” and adjusts the score accordingly.  If both records have the “Anonymous” name, you will see that they are “Equal” but no score will be added.

Record 1 Name Tokens | Record 2 Name Tokens | Match Result | Overall Result
BABYGIRL (Anonymous) | AMANDA               | Disagree     | Partial
R                    | R                    | Exact        |
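The difference between the two approaches can be sketched in a few lines of Python.  This is a simplified illustration, not the actual MDM comparison function: the position-by-position token comparison, the ANONYMOUS_NAMES list, and the AGREE_WEIGHT and DISAGREE_WEIGHT values are all assumptions made up for the example, and real weights come from your own weight tables.

```python
# Hypothetical anonymous-name list and weights, for illustration only.
ANONYMOUS_NAMES = {"BABY", "BABYGIRL", "BABYBOY"}
AGREE_WEIGHT = 3.0
DISAGREE_WEIGHT = -2.0


def overall(results):
    """Summarize the token results that were actually compared."""
    compared = [r for r in results if r != "No comparison"]
    if compared and all(r == "Exact" for r in compared):
        return "Equal"
    if "Exact" in compared:
        return "Partial"
    return "Different"


def compare_with_stripping(name_a, name_b):
    """Default style: anonymous tokens are treated as NULL and skipped."""
    results = []
    for a, b in zip(name_a, name_b):
        if a in ANONYMOUS_NAMES or b in ANONYMOUS_NAMES:
            results.append("No comparison")
        elif a == b:
            results.append("Exact")
        else:
            results.append("Disagree")
    return results, overall(results)


def compare_with_weights(name_a, name_b):
    """Alternative: keep anonymous tokens, but give their agreement zero weight."""
    results, score = [], 0.0
    for a, b in zip(name_a, name_b):
        if a == b:
            results.append("Exact")
            # Two anonymous names agree, but earn no score.
            score += 0.0 if a in ANONYMOUS_NAMES else AGREE_WEIGHT
        else:
            results.append("Disagree")
            score += DISAGREE_WEIGHT
    return results, score, overall(results)


record_1 = ["BABYGIRL", "R"]
record_2 = ["AMANDA", "R"]
print(compare_with_stripping(record_1, record_2))
print(compare_with_weights(record_1, record_2))
```

With stripping, the only compared token is R, so the Names come out Equal.  With the zero-weight approach, BABYGIRL vs. AMANDA registers as a disagreement and the overall result drops to Partial, mirroring the two tables above.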


Remember, though… anonymous values also play a role in searching your Buckets.  So, you should still have these Anonymous names listed in a secondary list that is attached to your Name Buckets.  The end result will be worth the reconfiguration!

Do All Tasks Need To Be Resolved In Inspector?

This is a question we get asked a lot… and the answer is surprising to many.  No, you don’t have to work all of your tasks in Inspector.  But let us clarify… while the tasks don’t always have to be worked in the MDM tools, they should still be handled appropriately.

When we really look at the majority of tasks, they fall into two types: Potential Linkage and Potential Duplicate.  Potential Linkages (which are records from different sources with questionable scores) should be worked from the Inspector interface to establish the correct linking between records.  However, when it comes to Potential Duplicates (records from the same source), resolving the records from Inspector is not always the best way to handle the process.

If you resolve Potential Duplicates, either with a merge or a clarification that records are different, the MDM engine creates a Rule.  Those rules act like a legal precedent, upholding the prior decision each time data changes call into question the validity of the match.  In essence, the MDM engine will have to ask for permission each time it reviews those records in the future.  Lots of rules can cause your probabilistic engine to behave more like a deterministic one, which slows down the engine and leads to unexpected results.

On a related note, if you merge two Potential Duplicates in Inspector, you still need to go back to the Source and do the same thing.  But if you merge from the Source, it automatically merges in MDM.  One step vs. two… pretty simple.  Merging from the Source will not only perform the merge in MDM, it will also remove the Task from the queue.  We have many clients who use the Task list as a report, then go through and work the Merges in the Source (where the clinical data also resides) instead of in Inspector.
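If you want to use the Task list as a report in this way, the triage might look something like the following Python sketch.  The Task class, its fields, and the route_tasks function are hypothetical stand-ins for whatever export your own environment produces; nothing here is an MDM or Inspector API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    task_type: str       # e.g. "Potential Linkage" or "Potential Duplicate"
    member_ids: tuple    # source record identifiers involved in the task


def route_tasks(tasks):
    """Split tasks into work queues by where they are best resolved."""
    queues = {"inspector": [], "source": [], "other": []}
    for task in tasks:
        if task.task_type == "Potential Linkage":
            # Cross-source records: establish the link in Inspector.
            queues["inspector"].append(task)
        elif task.task_type == "Potential Duplicate":
            # Same-source records: merge at the Source, which also
            # merges in MDM and clears the Task.
            queues["source"].append(task)
        else:
            queues["other"].append(task)
    return queues


tasks = [
    Task(1, "Potential Linkage", ("LAB-17", "ADT-42")),
    Task(2, "Potential Duplicate", ("ADT-42", "ADT-98")),
    Task(3, "Potential Duplicate", ("ADT-11", "ADT-12")),
]
queues = route_tasks(tasks)
print([t.task_id for t in queues["source"]])
```

The "source" queue is the report you would hand to the team working merges in the Source system, while the "inspector" queue stays with the data stewards.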

Keep Questioning!

We want to leave you with a final thought.  Keep questioning… keep asking why something works a specific way, why a seeming “default” setting is there, and why the “out of the box” configuration is doing what it’s doing.  In most cases the answer is simply “because that’s how it’s always been done.”

If you want to optimize your MDM, please reach out to IMT and we would be happy to help you find the best practices that work for you.