Class JaroWinklerSimilarity
java.lang.Object
org.apache.commons.text.similarity.JaroWinklerSimilarity
- All Implemented Interfaces:
BiFunction<CharSequence,CharSequence,Double>, ObjectSimilarityScore<CharSequence,Double>, SimilarityScore<Double>
A similarity algorithm indicating the percentage of matched characters between two character sequences.
The Jaro measure is a weighted sum of the percentage of matched characters in each sequence and the percentage of transposed characters. Winkler's modification increases this measure when the initial characters match.
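As a sketch of the measure described above, the usual textbook form (the one given in the Wikipedia article this class cites) can be written as follows. The symbols are assumptions introduced here, not names from this class: m is the number of matching characters, t is half the number of matched characters that appear out of order, l is the length of the common prefix (capped at 4), and p is a scaling factor, conventionally 0.1. This implementation may differ in details such as thresholds or rounding.

```latex
\[
sim_j =
\begin{cases}
0 & \text{if } m = 0 \\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise}
\end{cases}
\qquad
sim_w = sim_j + l \, p \, (1 - sim_j)
\]
```

For "frog" and "fog", m = 3 and t = 0, so sim_j = (3/4 + 3/3 + 3/3) / 3 = 11/12, roughly 0.917. With l = 1 and p = 0.1, sim_w = 11/12 + 0.1 * (1/12) = 0.925, in line with the 0.93 listed for this pair in the method examples.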
This implementation is based on the Jaro Winkler similarity algorithm from https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance.
This code has been adapted from Apache Commons Lang 3.3.
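The algorithm described above can be sketched in plain Java without any dependency. This is an illustrative sketch of the textbook formulation, not the code of this class: the class name JaroWinklerSketch, the method names, the prefix cap of 4, and the scaling factor 0.1 are all assumptions made for the example, and the library's own implementation may differ (for instance in thresholds or null handling).

```java
// Hypothetical, dependency-free sketch of the textbook Jaro-Winkler similarity.
// Not the Commons Text implementation; names and constants are assumptions.
final class JaroWinklerSketch {

    /** Jaro similarity in [0, 1]. */
    static double jaro(CharSequence s1, CharSequence s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 && len2 == 0) {
            return 1.0; // two empty sequences are treated as identical
        }
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }
        // Equal characters count as a match only if they are at most this far apart.
        int window = Math.max(Math.max(len1, len2) / 2 - 1, 0);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk both sequences over matched characters only and count order mismatches.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[j]) {
                j++;
            }
            if (s1.charAt(i) != s2.charAt(j)) {
                outOfOrder++;
            }
            j++;
        }
        double m = matches;
        double t = outOfOrder / 2.0; // transpositions are half the out-of-order matches
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    /** Jaro similarity boosted for a shared prefix of up to 4 characters. */
    static double jaroWinkler(CharSequence s1, CharSequence s2) {
        double j = jaro(s1, s2);
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * 0.1 * (1.0 - j); // 0.1 is the assumed scaling factor
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("frog", "fog"));
        System.out.println(jaroWinkler("hello", "hallo"));
        System.out.println(jaroWinkler("fly", "ant"));
    }
}
```

Unlike the apply methods documented below, this sketch does not reject null inputs; a caller would need to add that check. The two boolean arrays keep the matching step simple at the cost of O(n) extra memory.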
- Since:
- 1.7
Constructor Summary
Constructors
Constructor                Description
JaroWinklerSimilarity()    Creates a new instance.
Method Summary
Modifier and Type    Method                                                      Description
Double               apply(CharSequence left, CharSequence right)                Computes the Jaro Winkler Similarity between two character sequences.
<E> Double           apply(SimilarityInput<E> left, SimilarityInput<E> right)    Computes the Jaro Winkler Similarity between two character sequences.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.function.BiFunction
andThen
Constructor Details
JaroWinklerSimilarity
public JaroWinklerSimilarity()
Creates a new instance.
Method Details
apply
Computes the Jaro Winkler Similarity between two character sequences.
sim.apply(null, null)          = IllegalArgumentException
sim.apply("foo", null)         = IllegalArgumentException
sim.apply(null, "foo")         = IllegalArgumentException
sim.apply("", "")              = 1.0
sim.apply("foo", "foo")        = 1.0
sim.apply("foo", "foo ")       = 0.94
sim.apply("foo", "foo  ")      = 0.91
sim.apply("foo", " foo ")      = 0.87
sim.apply("foo", "  foo")      = 0.51
sim.apply("", "a")             = 0.0
sim.apply("aaapppp", "")       = 0.0
sim.apply("frog", "fog")       = 0.93
sim.apply("fly", "ant")        = 0.0
sim.apply("elephant", "hippo") = 0.44
sim.apply("hippo", "elephant") = 0.44
sim.apply("hippo", "zzzzzzzz") = 0.0
sim.apply("hello", "hallo")    = 0.88
sim.apply("ABC Corporation", "ABC Corp") = 0.91
sim.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
sim.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
sim.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88
- Specified by:
apply in interface BiFunction<CharSequence,CharSequence,Double>
- Specified by:
apply in interface ObjectSimilarityScore<CharSequence,Double>
- Specified by:
apply in interface SimilarityScore<Double>
- Parameters:
left - the first input, must not be null.
right - the second input, must not be null.
- Returns:
- result similarity.
- Throws:
IllegalArgumentException - if either CharSequence input is null.
apply
Computes the Jaro Winkler Similarity between two character sequences.
sim.apply(null, null)          = IllegalArgumentException
sim.apply("foo", null)         = IllegalArgumentException
sim.apply(null, "foo")         = IllegalArgumentException
sim.apply("", "")              = 1.0
sim.apply("foo", "foo")        = 1.0
sim.apply("foo", "foo ")       = 0.94
sim.apply("foo", "foo  ")      = 0.91
sim.apply("foo", " foo ")      = 0.87
sim.apply("foo", "  foo")      = 0.51
sim.apply("", "a")             = 0.0
sim.apply("aaapppp", "")       = 0.0
sim.apply("frog", "fog")       = 0.93
sim.apply("fly", "ant")        = 0.0
sim.apply("elephant", "hippo") = 0.44
sim.apply("hippo", "elephant") = 0.44
sim.apply("hippo", "zzzzzzzz") = 0.0
sim.apply("hello", "hallo")    = 0.88
sim.apply("ABC Corporation", "ABC Corp") = 0.91
sim.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
sim.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
sim.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88
- Type Parameters:
E - The type of similarity score unit.
- Parameters:
left - the first input, must not be null.
right - the second input, must not be null.
- Returns:
- result similarity.
- Throws:
IllegalArgumentException - if either CharSequence input is null.
- Since:
- 1.13.0