Articles

Getting Started With SSIS Productivity Pack – Premium Lookup


Thank you for watching this video from KingswaySoft. Today I will be introducing the Premium Lookup
component from within the SSIS Productivity Pack. The SSIS Productivity Pack is a collection
of premium SSIS components which enable greater developer productivity and increases the power
of SSIS. The Premium Lookup component can be used to
search for rows in a lookup table based on fields from a primary input. It offers the most flexible way to perform
lookup against any data sources. The component supports both exact match and
approximate match (also often referred to as fuzzy match) for lookup purposes. Each primary row is compared to the lookup
table (or source) based on the specified matching fields and determined to either be an unmatched
or matched row and is directed to the applicable output. In the case of a matched row, a single input
row may produce multiple matched rows if there were several rows in the lookup table that
matched. We’ll quickly configure two source components. I will be using the Premium Flat File Source
component, which is one of the other components offered in the SSIS Productivity Pack. Now let’s drag the Premium Lookup component
from the SSIS Toolbox to the design surface and connect it to the sources. We will specify one source component to be
the Primary Input, and the other to be the Lookup Table Input. Double click to open its Editor form. The ‘Fields to Compare’ grid displays
all the available fields from the Lookup Table Input. Each of these fields can be paired to a field
in the Primary Input. The Match Type Column can be set to either
Exact Match, the generic Fuzzy Match, or a more specific match type. Exact match columns must be matched exactly
to be considered matches. It’s important to note that exact match
filters perform much better than fuzzy match filters. Rows are compared for exact matches before
any fuzzy matching is attempted. Performance will improve if you are able to
add any exact match filters to your Premium Lookup component. Fuzzy match columns must meet a similarity
threshold. If you select FuzzyMatch as the match type,
the similarity threshold will be enabled. Only string types such as VARCHAR, NVARCHAR,
NTEXT and TEXT can be used with a fuzzy match filter, a warning will be shown if any other
type is selected. The similarity threshold is a value between
1% and 100% that determines how similar the values need to be considered to be matches. A value of 1% would be not similar at all
and a value of 100% would an exact match. When using a fuzzy match filter, the higher
the similarity threshold you set, the better the performance will be (and the fewer matches
you will get). A high similarity threshold allows potential
non-matches to be eliminated more quickly. There are a few more specific match types,
such as Address Match, FirstName Match, CompanyName Match, PhoneNumber Match and US Zip Code Match. For the Address Match option, strings are
treated as street addresses, and common street name prefixes and directions are treated as
equivalent. For example N. and North, or Avenue and Ave.
would be treated as exact matches. For the FirstName Match option, strings are
treated as first names, and common nicknames are treated as the same name. For example Bill and William would be an exact
match. For the CompanyName Match option, strings
are treated as company names, and common company suffixes such as INC, CORP and LLC are ignored
when comparing. For the PhoneNumber Match option, strings
are parsed to attempt to identify a 10 digit phone number which is used for comparison,
ignoring any other characters. For the USZipCode Match option, strings are
parsed to attempt to identify a 5 or 9 digit zip code which is used for comparison, ignoring
other characters. Notice that we have an Advanced Settings Column. Click on the button to launch the Editor form. There are a couple of options to specify how
lookup is performed, while the values are compared. Selecting the Trim Leading and Trailing Whitespace
option will remove all whitespace at the beginning and end of the compared values. Selecting the Ignore Case option will ignore
the case of the compared values. Selecting the Ignore All Whitespace option
will remove all whitespace from the compared values. Selecting the Ignore All Punctuation option
will remove all punctuation from the compared values. The Matching Performance slider can be used
to find a balance between the quality and speed of the lookup action. There is a default value set for each of the
fuzzy match types, which is considered to be the recommended setting, but this can be
adjusted if performance is found to be too slow, or if accuracy is too low. Increasing this value will result in faster
performance, but less matches found in the lookup table. Note that the MatchingPerformance property
can be very useful to achieve desired levels of quality and performance. This property is a value between 0 and 1. It uses an asymptotic scale, so small changes
will have a bigger impact as they approach 1 and a smaller impact as they approach 0. If you are experiencing slow performance,
especially with long strings (greater than 15 characters), try changing this value to
something higher, such as 0.8 or 0.9. If your performance is good, and you would
like better fuzzy matching precision, you can try lowering the MatchingPerformance value
for a field. This will be especially useful for very short
string fields (less than 8 characters). Try setting this value to 0.66 or 0.5 for
improved fuzzy matching precision and lower performance. (Press OK)
When any one of Advanced Settings options is checked, its applicable icon will appear
here. If you hover over this area, a tooltip will
appear outlining your settings. Now let’s navigate to the Comparison Settings
page. The Max Number of Matches option determines
the maximum number of matches that will be found for each primary input row. If the number of matches is higher than this
number, the best matches will be displayed. Culture Identifier is defaulted to the user’s
Windows regional setting. This will be used when comparing string values. When the Columns for Similarity of Each Compared
Field option is checked, a column will be added to the matched output for each mapped
field. The runtime value of the field will be the
actual similarity of those fields. The Columns page allows you to specify which
columns will be part of the output. Columns can be toggled for the Matched Rows
Output and the Unmatched Rows output. Available columns are all the columns from
both inputs (Primary Input and Lookup Table Input). Additionally, there are 4 calculated columns
in the Matched Rows Output for the following: Average Similarity Score, Richness Score,
Richness Rank, and Confidence. Those fields start with the underscore character. For more details about these additional fields,
please refer to our online help manual page. The Average Similarity Score column displays
the average similarity score of all compared fields. Fields that are an exact match have a similarity
of 100%. Fields that are a fuzzy match will have a
score less than 100%. The Richness Score column displays the richness
of the row from the lookup table. Richness is calculated based on number of
fields as null or empty. A row with data in every field would have
a richness of 1. The Richness Rank column displays the rank
of the row based on its richness, relative to the other matches found. For example, if 5 matches are found, the match
with the highest richness will be ranked 1, and the match with the lowest richness will
be ranked 5. The Confidence column displays the confidence
score for the match, which is a representation of how probable it is if this is the correct
match. The confidence score is calculated based on
the similarity to the primary input, and the total number of matches found. The last page is the Error Handling page,
where there are 3 error-handling mechanisms to choose from. The default option is ‘fail on error’,
where the entire dataflow will fail as soon as an error occurs. There is also the redirect rows to error output,
where the error output will contain the failed records with extra columns, ErrorCode, ErrorColumn
and ErrorMessage. There is also a third option, which is to
ignore any errors that may have occurred. We can click OK to finish configuring the
component. I will quickly connect this component to some
dummy Data Readers Destination Components for the purpose of showing the outputs for
Matched Rows and Unmatched Rows. There are also Error Outputs for the Primary
Input and the Lookup Table Input, if any error has occurred. We can now execute this data flow task. This concludes the demonstration of the Premium
Lookup component from within our SSIS Productivity Pack. There are many other components in the Productivity
Pack that enable developers to accomplish more in SSIS in a much more productive fashion. Thank you for watching this video. Please feel free to take a look at our other
videos available for viewing on our website or YouTube channel. For any further assistance, please feel free
to contact us.

Leave a Reply

Your email address will not be published. Required fields are marked *