Tolerances and roundings
A tolerance can only be assigned to column defined as a value (meaning that a tolerance can’t be applied to key columns) and, for the moment, is only supported for column defined with types numeric, dateTime and text.
Tolerance for numeric values
If you apply a tolerance of 0.001 to a numeric column, two rows (with same keys) will be considered as equal if the absolute difference between them is less or equal to 0.001.
This is especially helpful in Business Intelligence when you need to compare two queries and you don’t bother about small differences. Another case in Business Intelligence is also when you customers give you some hints about values expected in reports, they are usually a bit imprecise.
It’s also possible to take some short-cuts and write directly in the equalTo tag the columns’ definition. This xml element has the following meaning: all columns are keys except the last one. This last column is a value column where a tolerance of 0.001 must applied.
If you want you can combine both notations. The definition provided in an xml element named column has always the precedence to the tolerance provided the xml element named equalTo. It means that following notation (for a result-set with 6 columns):
must be interpreted as: columns 0, 1 and 4 are keys, column 2 must be ignored, column 3 is a value with a tolerance of 0.001 and column 5 is also a value but with a tolerance of 10.
You can express a tolerance, relative to the expected value, by the means of the % symbol. This only applies to numeric values. When comparing the expected and actual value, the comparer will apply a tolerance of the percentage defined.
To illustrate this, if you’ve two rows with values of 40 and 100, an absolute tolerance of 10 will allow the actual values to be respectively in the intervals [30;50] and [90;110]. A relative tolerance of 10% will change the intervals to [36;44] and [90;110].
The xml syntax requires a % in the attribute tolerance.
Absolute and relative tolerance
Sometimes, you want to use a relative tolerance but you also want to bound thsi tolerance and express that this tolerance cannot be more (or less) than a specific value. It’s possible to achieve this by specifying the value in percentage and also the bound (min or max) between brackets for the xml attribute tolerance.
Sometimes, you want to specify that the tolerance should only be applied on the right or on the left of the expected value. In this case, you must specify the symbols + or - before the percentage of the absolute value.
Tolerance for type “date and time”
If you want to specify the tolerance for dateTime columns, you must express the tolerance in days, hours, minutes, seconds and milliseconds. The correct syntax for two days and an half is
If the tolerance must be set to 15 minutes, you will write:
At the moment only absolute tolerance is supported and its not possible to apply one-sided tolerance.
Reminder, tolerances are only applied to value columns and never to key columns!
Tolerance for type “text”
You can define a tolerance for a text. Following algorithms are supported:
- Hamming Distance
- Jaccard Distance
- Jaro Distance
- Jaro-Winkler Distance
- Levenshtein Distance
- Longest Common Subsequence
- Longest Common Substring
- Overlap Coefficient
- Ratcliff-Obershelp Similarity
- Sorensen-Dice Distance
- Tanimoto Coefficient
For more information about these algorithms or about how it’s implemented, refer to the third-party library FuzzyStrings.
Use the attribute tolerance and specify the name of the algorithm and the threshold value.
If you wish, you can combine the different algorithms by seperating them with a comma. In this case you can’t specify a numeric threshold but must rely on one of the three predefined values: strong, normal or weak.
The roundings are another set of tools to express that two values are equal if they are close to each other. At the opposite of tolerance, rounding is applied to both expected and actual values. If after the rounding’s operation, the two values are strictly equal then the comparison will be positive (and else negative).
The method used must be specified in the column xml definition by the means of values: round, ceiling and floor in the attribute rounding-style.
Nevertheless, the rounding methods are different in same points to their .Net equivalent. Each of them require a step. If the step is less than 1, the rounding will be applied to the decimal part of the value.
Example 1: For a value of 10.52912 with a step of 0.1, the rounding will return
- with a floor style : 10.5
- with a round style : 10.5
- with a ceiling style : 10.6
Example 2: For a value of 10.52912 with a step of 0.01, the rounding will return
- with a floor style : 10.52
- with a round style : 10.53
- with a ceiling style : 10.53
If the step is greater than 1, the rounding will be applied to the integer part of the value.
Example 3: For a value of 10529.12 with a step of 10, the rounding will return
- with a floor style : 10520
- with a round style : 10530
- with a ceiling style : 10530
Example 4: For a value of 10529.12 with a step of 20, the rounding will return
- with a floor style : 10520
- with a round style : 10520
- with a ceiling style : 10540
Date and time columns
It’s also possible to apply the same kind of roundings for dateTime columns. The rounding must be less than or equal to one day.
Example 5: For a value of 2013-10-17 14:47:00 with a step of “1” (day), the rounding will return
- with a floor style : 2013-10-17 00:00:00
- with a round style : 2013-10-18 00:00:00
- with a ceiling style : 2013-10-18 00:00:00
Example 6: For a value of 2013-10-17 14:47:00 with a step of “00:15:00” (15 minutes), the rounding will return
- with a floor style : 2013-10-17 14:45:00
- with a round style : 2013-10-17 14:45:00
- with a ceiling style : 2013-10-17 15:00:00