Skip to content

Optimize distinct/distinctBy implementations for non-empty collections #4610

@satorg

Description

@satorg

A follow up for #4608 (see my comment #4608 (comment)).
For an example of a pretty well optimized implementation one could check out the corresponding methods in Chain:

def distinct[AA >: A](implicit O: Order[AA]): Chain[AA] = {
if (isEmptyOrSingleton) this
else {
implicit val ord: Ordering[AA] = O.toOrdering
val bldr = Vector.newBuilder[AA]
val seen = mutable.TreeSet.empty[AA]
val it = iterator
while (it.hasNext) {
val next = it.next()
if (seen.add(next))
bldr += next
}
// Result can contain a single element only.
Chain.fromSeq(bldr.result())
}
}

def distinctBy[B](f: A => B)(implicit O: Order[B]): Chain[A] = {
if (isEmptyOrSingleton) this
else {
implicit val ord: Ordering[B] = O.toOrdering
val bldr = Vector.newBuilder[A]
val seen = mutable.TreeSet.empty[B]
val it = iterator
while (it.hasNext) {
val next = it.next()
if (seen.add(f(next)))
bldr += next
}
// Result can contain a single element only.
Chain.fromSeq(bldr.result())
}
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    good first issueIssues that are easier to take on for first time contributorsoptimization

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions