From bd217d4b27174026e11409104f8c8f72d19bfd33 Mon Sep 17 00:00:00 2001 From: Jeremie Knuesel Date: Mon, 23 Jan 2023 13:17:46 +0100 Subject: [PATCH 1/5] Improve documentation of sort-related functions (PR #48387) In particular: * document the `order` keyword in `sort!` * list explicitly the required properties of `lt` in `sort!` * clarify the sequence of "by" transformations if both `by` and `order` are given * show default values in the signatures for `searchsorted` and related functions * add `isunordered` to the manual (it's already exported) --- base/operators.jl | 10 +- base/ordering.jl | 8 +- base/sort.jl | 213 +++++++++++++++++++++++++++----------- doc/src/base/base.md | 1 + doc/src/base/sort.md | 2 +- doc/src/manual/missing.md | 2 +- 6 files changed, 167 insertions(+), 69 deletions(-) diff --git a/base/operators.jl b/base/operators.jl index da55981c5f7f8..b4fbea547238e 100644 --- a/base/operators.jl +++ b/base/operators.jl @@ -154,13 +154,13 @@ Values that are normally unordered, such as `NaN`, are ordered after regular values. [`missing`](@ref) values are ordered last. -This is the default comparison used by [`sort`](@ref). +This is the default comparison used by [`sort!`](@ref). # Implementation Non-numeric types with a total order should implement this function. Numeric types only need to implement it if they have special values such as `NaN`. Types with a partial order should implement [`<`](@ref). -See the documentation on [Alternate orderings](@ref) for how to define alternate +See the documentation on [Alternate Orderings](@ref) for how to define alternate ordering methods that can be used in sorting and related functions. # Examples @@ -328,6 +328,8 @@ New types with a canonical partial order should implement this function for two arguments of the new type. Types with a canonical total order should implement [`isless`](@ref) instead. +See also [`isunordered`](@ref). 
+ # Examples ```jldoctest julia> 'a' < 'b' @@ -1344,7 +1346,7 @@ corresponding position in `collection`. To get a vector indicating whether each in `items` is in `collection`, wrap `collection` in a tuple or a `Ref` like this: `in.(items, Ref(collection))` or `items .∈ Ref(collection)`. -See also: [`∉`](@ref). +See also: [`∉`](@ref), [`insorted`](@ref), [`contains`](@ref), [`occursin`](@ref), [`issubset`](@ref). # Examples ```jldoctest @@ -1382,8 +1384,6 @@ julia> [1, 2] .∈ ([2, 3],) 0 1 ``` - -See also: [`insorted`](@ref), [`contains`](@ref), [`occursin`](@ref), [`issubset`](@ref). """ in diff --git a/base/ordering.jl b/base/ordering.jl index d0c9cb99f9c72..5383745b1dd1f 100644 --- a/base/ordering.jl +++ b/base/ordering.jl @@ -87,8 +87,8 @@ By(by) = By(by, Forward) """ Lt(lt) -`Ordering` which calls `lt(a, b)` to compare elements. `lt` should -obey the same rules as implementations of [`isless`](@ref). +`Ordering` that calls `lt(a, b)` to compare elements. `lt` must +obey the same rules as the `lt` parameter of [`sort!`](@ref). """ struct Lt{T} <: Ordering lt::T @@ -146,8 +146,8 @@ Construct an [`Ordering`](@ref) object from the same arguments used by Elements are first transformed by the function `by` (which may be [`identity`](@ref)) and are then compared according to either the function `lt` or an existing ordering `order`. `lt` should be [`isless`](@ref) or a function -which obeys similar rules. Finally, the resulting order is reversed if -`rev=true`. +that obeys the same rules as the `lt` parameter of [`sort!`](@ref). Finally, +the resulting order is reversed if `rev=true`. 
Passing an `lt` other than `isless` along with an `order` other than [`Base.Order.Forward`](@ref) or [`Base.Order.Reverse`](@ref) is not permitted, diff --git a/base/sort.jl b/base/sort.jl index 985e0e8f597f3..4247afde1fb84 100644 --- a/base/sort.jl +++ b/base/sort.jl @@ -63,8 +63,8 @@ end """ issorted(v, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) -Test whether a vector is in sorted order. The `lt`, `by` and `rev` keywords modify what -order is considered to be sorted just as they do for [`sort`](@ref). +Test whether a collection is in sorted order. The keywords modify what +order is considered sorted, as described in the [`sort!`](@ref) documentation. # Examples ```jldoctest @@ -79,6 +79,9 @@ false julia> issorted([(1, "b"), (2, "a")], by = x -> x[2], rev=true) true + +julia> issorted([1, 2, -2, 3], by=abs) +true ``` """ issorted(itr; @@ -94,14 +97,17 @@ maybeview(v, k) = view(v, k) maybeview(v, k::Integer) = v[k] """ - partialsort!(v, k; by=, lt=, rev=false) + partialsort!(v, k; by=identity, lt=isless, rev=false) -Partially sort the vector `v` in place, according to the order specified by `by`, `lt` and -`rev` so that the value at index `k` (or range of adjacent values if `k` is a range) occurs +Partially sort the vector `v` in place so that the value at index `k` (or +range of adjacent values if `k` is a range) occurs at the position where it would appear if the array were fully sorted. If `k` is a single index, that value is returned; if `k` is a range, an array of values at those indices is returned. Note that `partialsort!` may not fully sort the input array. +For the keyword arguments, see the documentation of [`sort!`](@ref). 
+ + # Examples ```jldoctest julia> a = [1, 2, 4, 3, 4] @@ -148,9 +154,9 @@ partialsort!(v::AbstractVector, k::Union{Integer,OrdinalRange}; partialsort!(v, k, ord(lt,by,rev,order)) """ - partialsort(v, k, by=, lt=, rev=false) + partialsort(v, k, by=identity, lt=isless, rev=false) -Variant of [`partialsort!`](@ref) which copies `v` before partially sorting it, thereby returning the +Variant of [`partialsort!`](@ref) that copies `v` before partially sorting it, thereby returning the same thing as `partialsort!` but leaving `v` unmodified. """ partialsort(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) = @@ -159,7 +165,7 @@ partialsort(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) = # reference on sorted binary search: # http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary -# index of the first value of vector a that is greater than or equal to x; +# index of the first value of vector a that is greater than or equivalent to x; # returns lastindex(v)+1 if x is greater than all values in v. function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer hi = hi + T(1) @@ -178,7 +184,7 @@ function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::key return lo end -# index of the last value of vector a that is less than or equal to x; +# index of the last value of vector a that is less than or equivalent to x; # returns firstindex(v)-1 if x is less than all values of v. 
function searchsortedlast(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer u = T(1) @@ -195,7 +201,7 @@ function searchsortedlast(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keyt return lo end -# returns the range of indices of v equal to x +# returns the range of indices of v equivalent to x # if v does not contain x, returns a 0-length range # indicating the insertion point of x function searchsorted(v::AbstractVector, x, ilo::T, ihi::T, o::Ordering)::UnitRange{keytype(v)} where T<:Integer @@ -288,14 +294,18 @@ for s in [:searchsortedfirst, :searchsortedlast, :searchsorted] end """ - searchsorted(a, x; by=, lt=, rev=false) + searchsorted(v, x; by=identity, lt=isless, rev=false) + +Return the range of indices in `v` where values are equivalent to `x`, or an +empty range located at the insertion point if `v` does not contain values +equivalent to `x`. The vector `v` must be sorted according to the order defined +by the keywords. Refer to [`sort!`](@ref) for the meaning of the keywords and +the definition of equivalence. -Return the range of indices of `a` which compare as equal to `x` (using binary search) -according to the order specified by the `by`, `lt` and `rev` keywords, assuming that `a` -is already sorted in that order. Return an empty range located at the insertion point -if `a` does not contain values equal to `x`. +The range is generally found using binary search, but there are optimized +implementations for `v` values that are ranges of real numbers. -See also: [`insorted`](@ref), [`searchsortedfirst`](@ref), [`sort`](@ref), [`findall`](@ref). +See also: [`searchsortedfirst`](@ref), [`sort!`](@ref), [`insorted`](@ref), [`findall`](@ref). 
# Examples ```jldoctest @@ -313,17 +323,25 @@ julia> searchsorted([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsorted([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 1:0 + +julia> searchsorted([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value, -2 equivalent to 2 +3:5 ``` """ searchsorted """ - searchsortedfirst(a, x; by=, lt=, rev=false) + searchsortedfirst(v, x; by=identity, lt=isless, rev=false) + +Return the index of the first value in `v` greater than or equivalent to `x`. +If `x` is greater than all values in `v` the function returns `lastindex(v) + 1`. -Return the index of the first value in `a` greater than or equal to `x`, according to the -specified order. Return `lastindex(a) + 1` if `x` is greater than all values in `a`. -`a` is assumed to be sorted. +The vector `v` must be sorted according to the order defined by the keywords. +`insert!`ing `x` at the returned index will maintain the sorted order. Refer to +[`sort!`](@ref) for the meaning of the keywords and the definition of +"greater than" and equivalence. -`insert!`ing `x` at this index will maintain sorted order. +The index is generally found using binary search, but there are optimized +implementations for `v` values that are ranges of real numbers. See also: [`searchsortedlast`](@ref), [`searchsorted`](@ref), [`findfirst`](@ref). @@ -343,15 +361,24 @@ julia> searchsortedfirst([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsortedfirst([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 1 + +julia> searchsortedfirst([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value +3 ``` """ searchsortedfirst """ - searchsortedlast(a, x; by=, lt=, rev=false) + searchsortedlast(v, x; by=identity, lt=isless, rev=false) + +Return the index of the last value in `v` less than or equivalent to `x`. +If `x` is less than all values in `v` the function returns `firstindex(v) - 1`. 
-Return the index of the last value in `a` less than or equal to `x`, according to the -specified order. Return `firstindex(a) - 1` if `x` is less than all values in `a`. `a` is -assumed to be sorted. +The vector `v` must be sorted according to the order defined by the keywords. +Refer to [`sort!`](@ref) for the meaning of the keywords and the definition of +"less than" and equivalence. + +The index is generally found using binary search, but there are optimized +implementations for `v` values that are ranges of real numbers. # Examples ```jldoctest @@ -369,16 +396,22 @@ julia> searchsortedlast([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsortedlast([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 0 + +julia> searchsortedlast([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value +5 ``` """ searchsortedlast """ - insorted(x, a; by=, lt=, rev=false) -> Bool + insorted(x, v; by=identity, lt=isless, rev=false) -> Bool -Determine whether an item `x` is in the sorted collection `a`, in the sense that -it is [`==`](@ref) to one of the values of the collection according to the order -specified by the `by`, `lt` and `rev` keywords, assuming that `a` is already -sorted in that order, see [`sort`](@ref) for the keywords. +Determine whether a vector `v` contains any value equivalent to `x`. +The vector `v` must be sorted according to the order defined by the keywords. +Refer to [`sort!`](@ref) for the meaning of the keywords and the definition of +equivalence. + +The check is generally done using binary search, but there are optimized +implementations for `v` values that are ranges of real numbers. See also [`in`](@ref). @@ -398,6 +431,9 @@ false julia> insorted(0, [1, 2, 4, 5, 5, 7]) # no match false + +julia> insorted(2, [1, -1, -2, 3, -4, 4], by=abs) # sorted by absolute value +true ``` !!! 
compat "Julia 1.6" @@ -524,7 +560,7 @@ Base.size(v::WithoutMissingVector) = size(v.data) send_to_end!(f::Function, v::AbstractVector; [lo, hi]) Send every element of `v` for which `f` returns `true` to the end of the vector and return -the index of the last element which for which `f` returns `false`. +the index of the last element for which `f` returns `false`. `send_to_end!(f, v, lo, hi)` is equivalent to `send_to_end!(f, view(v, lo:hi))+lo-1` @@ -724,8 +760,8 @@ Insertion sort traverses the collection one element at a time, inserting each element into its correct, sorted position in the output vector. Characteristics: -* *stable*: preserves the ordering of elements which compare equal -(e.g. "a" and "A" in a sort of letters which ignores case). +* *stable*: preserves the ordering of elements that compare equal +(e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *quadratic performance* in the number of elements to be sorted: it is well-suited to small collections but should not be used for large ones. @@ -965,8 +1001,8 @@ is treated as the first or last index of the input, respectively. `lo` and `hi` may be specified together as an `AbstractUnitRange`. Characteristics: - * *stable*: preserves the ordering of elements which compare equal - (e.g. "a" and "A" in a sort of letters which ignores case). + * *stable*: preserves the ordering of elements that compare equal + (e.g. "a" and "A" in a sort of letters that ignores case). * *not in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`QuickSort`](@ref). * *linear runtime* if `length(lo:hi)` is constant @@ -1242,7 +1278,7 @@ Otherwise, we dispatch to [`InsertionSort`](@ref) for inputs with `length <= 40` perform a presorted check ([`CheckSorted`](@ref)). We check for short inputs before performing the presorted check to avoid the overhead of the -check for small inputs. Because the alternate dispatch is to [`InseritonSort`](@ref) which +check for small inputs. 
Because the alternate dispatch is to [`InsertionSort`](@ref) which has efficient `O(n)` runtime on presorted inputs, the check is not necessary for small inputs. @@ -1323,15 +1359,52 @@ defalg(v::AbstractArray{Union{}}) = DEFAULT_UNSTABLE # for method disambiguation """ sort!(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) -Sort the vector `v` in place. A stable algorithm is used by default. You can select a -specific algorithm to use via the `alg` keyword (see [Sorting Algorithms](@ref) for -available algorithms). The `by` keyword lets you provide a function that will be applied to -each element before comparison; the `lt` keyword allows providing a custom "less than" -function (note that for every `x` and `y`, only one of `lt(x,y)` and `lt(y,x)` can return -`true`); use `rev=true` to reverse the sorting order. These options are independent and can -be used together in all possible combinations: if both `by` and `lt` are specified, the `lt` -function is applied to the result of the `by` function; `rev=true` reverses whatever -ordering specified via the `by` and `lt` keywords. +Sort the vector `v` in place. A stable algorithm is used by default. A specific +algorithm can be selected via the `alg` keyword (see [Sorting Algorithms](@ref) +for available algorithms). + +Elements are first transformed with the function `by` and then compared +according to either the function `lt` or the ordering `order`. Finally, the +resulting order is reversed if `rev=true`. The current implementation applies the +`by` transformation before each comparison rather than once per element. + +Passing an `lt` other than `isless` along with an `order` other than +[`Base.Order.Forward`](@ref) or [`Base.Order.Reverse`](@ref) is not permitted, +otherwise all options are independent and can be used together in all possible +combinations. 
Note that `order` can also include a "by" transformation, in +which case it is applied after that defined with the `by` keyword. For more +information on `order` values see the documentation on [Alternate +Orderings](@ref). + +Relations between two elements are defined as follows (with "less" and +"greater" exchanged when `rev=true`): + +* `x` is less than `y` if `lt(by(x), by(y))` (or `Base.Order.lt(order, by(x), by(y))`) yields true. +* `x` is greater than `y` if `y` is less than `x`. +* `x` and `y` are equivalent if neither is less than the other ("incomparable" + is sometimes used as a synonym for "equivalent"). + +The result of `sort!` is sorted in the sense that every element is greater than +or equivalent to the previous one. + +The `lt` function must define a strict weak order, that is, it must be + +* irreflexive: `lt(x, x)` always yields `false`, +* asymmetric: if `lt(x, y)` yields `true` then `lt(y, x)` yields `false`, +* transitive: `lt(x, y) && lt(y, z)` implies `lt(x, z)`, +* transitive in equivalence: `!lt(x, y) && !lt(y, x)` and `!lt(y, z) && !lt(z, + y)` together imply `!lt(x, z) && !lt(z, x)`. In words: if `x` and `y` are + equivalent and `y` and `z` are equivalent then `x` and `z` must be + equivalent. + +For example `<` is a valid `lt` function for `Int` values but `≤` is not: it +violates irreflexivity. For `Float64` values even `<` is invalid as it violates +the fourth condition: `1.0` and `NaN` are equivalent and so are `NaN` and `2.0` +but `1.0` and `2.0` are not equivalent. + +See also [`sort`](@ref), [`sortperm`](@ref), [`sortslices`](@ref), +[`partialsort!`](@ref), [`partialsortperm`](@ref), [`issorted`](@ref), +[`searchsorted`](@ref), [`insorted`](@ref), [`Base.Order.ord`](@ref). 
# Examples ```jldoctest @@ -1358,6 +1431,29 @@ julia> v = [(1, "c"), (3, "a"), (2, "b")]; sort!(v, by = x -> x[2]); v (3, "a") (2, "b") (1, "c") + +julia> sort(0:3, by=x->x-2, order=Base.Order.By(abs)) # same as sort(0:3, by=x->abs(x-2)) +4-element Vector{Int64}: + 2 + 1 + 3 + 0 + +julia> sort([2, NaN, 1, NaN, 3]) # correct sort with default lt=isless +5-element Vector{Float64}: + 1.0 + 2.0 + 3.0 + NaN + NaN + +julia> sort([2, NaN, 1, NaN, 3], lt=<) # wrong sort due to invalid lt +5-element Vector{Float64}: + 2.0 + NaN + 1.0 + NaN + 3.0 ``` """ function sort!(v::AbstractVector{T}; @@ -1398,15 +1494,15 @@ sort(v::AbstractVector; kws...) = sort!(copymutable(v); kws...) ## partialsortperm: the permutation to sort the first k elements of an array ## """ - partialsortperm(v, k; by=, lt=, rev=false) + partialsortperm(v, k; by=identity, lt=isless, rev=false) Return a partial permutation `I` of the vector `v`, so that `v[I]` returns values of a fully sorted version of `v` at index `k`. If `k` is a range, a vector of indices is returned; if `k` is an integer, a single index is returned. The order is specified using the same -keywords as `sort!`. The permutation is stable, meaning that indices of equal elements -appear in ascending order. +keywords as `sort!`. The permutation is stable: the indices of equal elements +will appear in ascending order. -Note that this function is equivalent to, but more efficient than, calling `sortperm(...)[k]`. +This function is equivalent to, but more efficient than, calling `sortperm(...)[k]`. # Examples ```jldoctest @@ -1432,7 +1528,7 @@ partialsortperm(v::AbstractVector, k::Union{Integer,OrdinalRange}; kwargs...) = partialsortperm!(similar(Vector{eltype(k)}, axes(v,1)), v, k; kwargs...) 
""" - partialsortperm!(ix, v, k; by=, lt=, rev=false) + partialsortperm!(ix, v, k; by=identity, lt=isless, rev=false) Like [`partialsortperm`](@ref), but accepts a preallocated index vector `ix` the same size as `v`, which is used to store (a permutation of) the indices of `v`. @@ -1498,7 +1594,7 @@ end Return a permutation vector or array `I` that puts `A[I]` in sorted order along the given dimension. If `A` has more than one dimension, then the `dims` keyword argument must be specified. The order is specified using the same keywords as [`sort!`](@ref). The permutation is guaranteed to be stable even -if the sorting algorithm is unstable, meaning that indices of equal elements appear in +if the sorting algorithm is unstable: the indices of equal elements will appear in ascending order. See also [`sortperm!`](@ref), [`partialsortperm`](@ref), [`invperm`](@ref), [`indexin`](@ref). @@ -1732,7 +1828,8 @@ end sort!(A; dims::Integer, alg::Algorithm=defalg(A), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) Sort the multidimensional array `A` along dimension `dims`. -See [`sort!`](@ref) for a description of possible keyword arguments. +See the one-dimensional version of [`sort!`](@ref) for a description of +possible keyword arguments. To sort slices of an array, refer to [`sortslices`](@ref). @@ -1886,8 +1983,8 @@ algorithm. Partial quick sort returns the smallest `k` elements sorted from smal to largest, finding them and sorting them using [`QuickSort`](@ref). Characteristics: - * *not stable*: does not preserve the ordering of elements which - compare equal (e.g. "a" and "A" in a sort of letters which + * *not stable*: does not preserve the ordering of elements that + compare equal (e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). @@ -1903,8 +2000,8 @@ Indicate that a sorting function should use the quick sort algorithm, which is *not* stable. 
Characteristics: - * *not stable*: does not preserve the ordering of elements which - compare equal (e.g. "a" and "A" in a sort of letters which + * *not stable*: does not preserve the ordering of elements that + compare equal (e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). @@ -1922,8 +2019,8 @@ subcollection at each step, until the entire collection has been recombined in sorted form. Characteristics: - * *stable*: preserves the ordering of elements which compare - equal (e.g. "a" and "A" in a sort of letters which ignores + * *stable*: preserves the ordering of elements that compare + equal (e.g. "a" and "A" in a sort of letters that ignores case). * *not in-place* in memory. * *divide-and-conquer* sort strategy. diff --git a/doc/src/base/base.md b/doc/src/base/base.md index 7922dd7d67861..d6ba437709128 100644 --- a/doc/src/base/base.md +++ b/doc/src/base/base.md @@ -126,6 +126,7 @@ Core.:(===) Core.isa Base.isequal Base.isless +Base.isunordered Base.ifelse Core.typeassert Core.typeof diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index e93d9716b1487..64a832a6599f7 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -203,7 +203,7 @@ Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = Inline The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed to be stable since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays. -## Alternate orderings +## Alternate Orderings By default, `sort` and related functions use [`isless`](@ref) to compare two elements in order to determine which should come first. 
The diff --git a/doc/src/manual/missing.md b/doc/src/manual/missing.md index 9bddcdfbb2ac2..8c8e801ccac9a 100644 --- a/doc/src/manual/missing.md +++ b/doc/src/manual/missing.md @@ -88,7 +88,7 @@ true ``` The [`isless`](@ref) operator is another exception: `missing` is considered -as greater than any other value. This operator is used by [`sort`](@ref), +as greater than any other value. This operator is used by [`sort!`](@ref), which therefore places `missing` values after all other values: ```jldoctest From 7256a034fff40c85f8793ddf0fdadcc6d080d680 Mon Sep 17 00:00:00 2001 From: Lilith Hafner Date: Fri, 20 Jan 2023 11:36:12 -0600 Subject: [PATCH 2/5] revise sort.md and docstrings in sort.jl, take 1 (part of PR #48363) --- base/sort.jl | 30 ++++++++++-- doc/src/base/sort.md | 109 ++++++++++++++----------------------------- 2 files changed, 61 insertions(+), 78 deletions(-) diff --git a/base/sort.jl b/base/sort.jl index 4247afde1fb84..1645ba8bf76af 100644 --- a/base/sort.jl +++ b/base/sort.jl @@ -1978,9 +1978,9 @@ struct MergeSortAlg <: Algorithm end """ PartialQuickSort{T <: Union{Integer,OrdinalRange}} -Indicate that a sorting function should use the partial quick sort -algorithm. Partial quick sort returns the smallest `k` elements sorted from smallest -to largest, finding them and sorting them using [`QuickSort`](@ref). +Indicate that a sorting function should use the partial quick sort algorithm. +Partial quick sort is like quick sort, but is only required to find and sort the +elements that would end up in `v[k]` were `v` fully sorted. Characteristics: * *not stable*: does not preserve the ordering of elements that @@ -1988,6 +1988,27 @@ Characteristics: ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). + +Note that `PartialQuickSort(k)` does not necessarily sort the whole array. 
For example, + +```jldoctest +julia> x = rand(100); + +julia> k = 50:100; + +julia> s1 = sort(x; alg=QuickSort); + +julia> s2 = sort(x; alg=PartialQuickSort(k)); + +julia> map(issorted, (s1, s2)) +(true, false) + +julia> map(x->issorted(x[k]), (s1, s2)) +(true, true) + +julia> s1[k] == s2[k] +true +``` """ struct PartialQuickSort{T <: Union{Integer,OrdinalRange}} <: Algorithm k::T @@ -2022,7 +2043,8 @@ Characteristics: * *stable*: preserves the ordering of elements that compare equal (e.g. "a" and "A" in a sort of letters that ignores case). - * *not in-place* in memory. + * *not in-place* in memory — requires a temporary + array of half the size of the input array. * *divide-and-conquer* sort strategy. """ const MergeSort = MergeSortAlg() diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 64a832a6599f7..2cf7a74849684 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -1,7 +1,7 @@ # Sorting and Related Functions -Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays of -values. By default, Julia picks reasonable algorithms and sorts in standard ascending order: +Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays +of values. By default, Julia picks reasonable algorithms and sorts in ascending order: ```jldoctest julia> sort([2,3,1]) @@ -11,7 +11,7 @@ julia> sort([2,3,1]) 3 ``` -You can easily sort in reverse order as well: +You can sort in reverse order as well: ```jldoctest julia> sort([2,3,1], rev=true) @@ -21,7 +21,8 @@ julia> sort([2,3,1], rev=true) 1 ``` -To sort an array in-place, use the "bang" version of the sort function: +`sort` constructs a sorted copy leaving its input unchanged. 
Use the "bang" version of +the sort function to mutate an existing array: ```jldoctest julia> a = [2,3,1]; @@ -35,8 +36,8 @@ julia> a 3 ``` -Instead of directly sorting an array, you can compute a permutation of the array's indices that -puts the array into sorted order: +Instead of directly sorting an array, you can compute a permutation of the array's +indices that puts the array into sorted order: ```julia-repl julia> v = randn(5) @@ -64,7 +65,7 @@ julia> v[p] 0.382396 ``` -Arrays can easily be sorted according to an arbitrary transformation of their values: +Arrays can be sorted according to an arbitrary transformation of their values: ```julia-repl julia> sort(v, by=abs) @@ -100,9 +101,12 @@ julia> sort(v, alg=InsertionSort) 0.382396 ``` -All the sorting and order related functions rely on a "less than" relation defining a total order +All the sorting and order related functions rely on a "less than" relation defining a +[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order) on the values to be manipulated. The `isless` function is invoked by default, but the relation -can be specified via the `lt` keyword. +can be specified via the `lt` keyword, a function that takes two array elements and returns true +if and only if the first argument is "less than" the second. See [Alternate orderings](@ref) for +more info. ## Sorting Functions @@ -134,65 +138,23 @@ Base.Sort.partialsortperm! ## Sorting Algorithms -There are currently four sorting algorithms available in base Julia: +There are currently four sorting algorithms publicly available in base Julia: * [`InsertionSort`](@ref) * [`QuickSort`](@ref) * [`PartialQuickSort(k)`](@ref) * [`MergeSort`](@ref) -`InsertionSort` is an O(n²) stable sorting algorithm. It is efficient for very small `n`, -and is used internally by `QuickSort`. +By default, the `sort` family of functions uses stable sorting algorithms that are fast +on most inputs. 
The exact algorithm choice is an implementation detail to allow for +future performance improvements. Currently, a hybrid of `RadixSort`, `ScratchQuickSort`, +`InsertionSort`, and `CountingSort` is used based on input type, size, and composition. +Implementation details are subject to change but currently available in the extended help +of `??Base.DEFAULT_STABLE` and the docstrings of internal sorting algorithms listed there. -`QuickSort` is a very fast sorting algorithm with an average-case time complexity of -O(n log n). `QuickSort` is stable, i.e., elements considered equal will remain in the same -order. Notice that O(n²) is worst-case complexity, but it gets vanishingly unlikely as the -pivot selection is randomized. - -`PartialQuickSort(k::OrdinalRange)` is similar to `QuickSort`, but the output array is only -sorted in the range of `k`. For example: - -```jldoctest -julia> x = rand(1:500, 100); - -julia> k = 50:100; - -julia> s1 = sort(x; alg=QuickSort); - -julia> s2 = sort(x; alg=PartialQuickSort(k)); - -julia> map(issorted, (s1, s2)) -(true, false) - -julia> map(x->issorted(x[k]), (s1, s2)) -(true, true) - -julia> s1[k] == s2[k] -true -``` - -!!! compat "Julia 1.9" - The `QuickSort` and `PartialQuickSort` algorithms are stable since Julia 1.9. - -`MergeSort` is an O(n log n) stable sorting algorithm but is not in-place – it requires a temporary -array of half the size of the input array – and is typically not quite as fast as `QuickSort`. -It is the default algorithm for non-numeric data. - -The default sorting algorithms are chosen on the basis that they are fast and stable. -Usually, `QuickSort` is selected, but `InsertionSort` is preferred for small data. -You can also explicitly specify your preferred algorithm, e.g. -`sort!(v, alg=PartialQuickSort(10:20))`. - -The mechanism by which Julia picks default sorting algorithms is implemented via the -`Base.Sort.defalg` function. 
It allows a particular algorithm to be registered as the -default in all sorting functions for specific arrays. For example, here is the default -method from [`sort.jl`](https://github.com/JuliaLang/julia/blob/master/base/sort.jl): - -```julia -defalg(v::AbstractArray) = DEFAULT_STABLE -``` - -You may change the default behavior for specific types by defining new methods for `defalg`. +You can explicitly specify your preferred algorithm with the `alg` keyword +(e.g. `sort!(v, alg=PartialQuickSort(10:20))`) or reconfigure the default sorting algorithm +for custom types by adding a specialized method to the `Base.Sort.defalg` function. For example, [InlineStrings.jl](https://github.com/JuliaStrings/InlineStrings.jl/blob/v1.3.2/src/InlineStrings.jl#L903) defines the following method: ```julia @@ -200,22 +162,21 @@ Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = Inline ``` !!! compat "Julia 1.9" - The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed - to be stable since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays. + The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed to be stable + since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays. ## Alternate Orderings -By default, `sort` and related functions use [`isless`](@ref) to compare two -elements in order to determine which should come first. The -[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining -alternate orderings on the same set of elements. Instances of `Ordering` define -a [total order](https://en.wikipedia.org/wiki/Total_order) on a set of elements, -so that for any elements `a`, `b`, `c` the following hold: - -* Exactly one of the following is true: `a` is less than `b`, `b` is less than - `a`, or `a` and `b` are equal (according to [`isequal`](@ref)). 
-* The relation is transitive - if `a` is less than `b` and `b` is less than `c` - then `a` is less than `c`. +By default, `sort`, `searchsorted`, and related functions use [`isless`](@ref) to compare +two elements in order to determine which should come first. The +[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining alternate +orderings on the same set of elements. Instances of `Ordering` define a +[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order). +To be a strict partial order, for any elements `a`, `b`, `c` the following hold: + +* if `a == b`, then `lt(a, b) == false`; +* `lt(a, b) && lt(b, a) == false`; and +* if `lt(a, b) && lt(b, c) == true`, then `lt(a, c) == true` The [`Base.Order.lt`](@ref) function works as a generalization of `isless` to test whether `a` is less than `b` according to a given order. From 9532291f56d5c3f06898d0ba8057df8bbafcaa18 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Mon, 23 Jan 2023 13:00:30 -0600 Subject: [PATCH 3/5] Change "partial" to "weak" (part of PR #48363) Thanks @knuesel --- doc/src/base/sort.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 2cf7a74849684..2b737d81998f7 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -102,7 +102,7 @@ julia> sort(v, alg=InsertionSort) ``` All the sorting and order related functions rely on a "less than" relation defining a -[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order) +[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings) on the values to be manipulated. The `isless` function is invoked by default, but the relation can be specified via the `lt` keyword, a function that takes two array elements and returns true if and only if the first argument is "less than" the second. 
See [Alternate orderings](@ref) for @@ -171,12 +171,12 @@ By default, `sort`, `searchsorted`, and related functions use [`isless`](@ref) t two elements in order to determine which should come first. The [`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining alternate orderings on the same set of elements. Instances of `Ordering` define a -[strict partial order](https://en.wikipedia.org/wiki/Partially_ordered_set#Strict_partial_order). -To be a strict partial order, for any elements `a`, `b`, `c` the following hold: +[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings). +To be a strict weak order, for any elements `a`, `b`, `c` the following hold: -* if `a == b`, then `lt(a, b) == false`; -* `lt(a, b) && lt(b, a) == false`; and -* if `lt(a, b) && lt(b, c) == true`, then `lt(a, c) == true` +* `lt(a, b) && lt(b, a) === false`; +* if `lt(a, b) && lt(b, c)`, then `lt(a, c)`; and +* if `!lt(a, b) && !lt(b, c)`, then `!lt(a, c)` The [`Base.Order.lt`](@ref) function works as a generalization of `isless` to test whether `a` is less than `b` according to a given order. From 23178f70eb2b4aec961b339fcf18fef03aa9dc44 Mon Sep 17 00:00:00 2001 From: Lilith Hafner Date: Sat, 28 Jan 2023 09:21:09 -0600 Subject: [PATCH 4/5] remove changes that maybe shouldn't be backported --- base/operators.jl | 10 +- base/ordering.jl | 8 +- base/sort.jl | 223 +++++++++++--------------------------- doc/src/base/base.md | 1 - doc/src/base/sort.md | 51 +++++---- doc/src/manual/missing.md | 2 +- 6 files changed, 98 insertions(+), 197 deletions(-) diff --git a/base/operators.jl b/base/operators.jl index b4fbea547238e..da55981c5f7f8 100644 --- a/base/operators.jl +++ b/base/operators.jl @@ -154,13 +154,13 @@ Values that are normally unordered, such as `NaN`, are ordered after regular values. [`missing`](@ref) values are ordered last. -This is the default comparison used by [`sort!`](@ref). +This is the default comparison used by [`sort`](@ref). 
# Implementation Non-numeric types with a total order should implement this function. Numeric types only need to implement it if they have special values such as `NaN`. Types with a partial order should implement [`<`](@ref). -See the documentation on [Alternate Orderings](@ref) for how to define alternate +See the documentation on [Alternate orderings](@ref) for how to define alternate ordering methods that can be used in sorting and related functions. # Examples @@ -328,8 +328,6 @@ New types with a canonical partial order should implement this function for two arguments of the new type. Types with a canonical total order should implement [`isless`](@ref) instead. -See also [`isunordered`](@ref). - # Examples ```jldoctest julia> 'a' < 'b' @@ -1346,7 +1344,7 @@ corresponding position in `collection`. To get a vector indicating whether each in `items` is in `collection`, wrap `collection` in a tuple or a `Ref` like this: `in.(items, Ref(collection))` or `items .∈ Ref(collection)`. -See also: [`∉`](@ref), [`insorted`](@ref), [`contains`](@ref), [`occursin`](@ref), [`issubset`](@ref). +See also: [`∉`](@ref). # Examples ```jldoctest @@ -1384,6 +1382,8 @@ julia> [1, 2] .∈ ([2, 3],) 0 1 ``` + +See also: [`insorted`](@ref), [`contains`](@ref), [`occursin`](@ref), [`issubset`](@ref). """ in diff --git a/base/ordering.jl b/base/ordering.jl index 5383745b1dd1f..d0c9cb99f9c72 100644 --- a/base/ordering.jl +++ b/base/ordering.jl @@ -87,8 +87,8 @@ By(by) = By(by, Forward) """ Lt(lt) -`Ordering` that calls `lt(a, b)` to compare elements. `lt` must -obey the same rules as the `lt` parameter of [`sort!`](@ref). +`Ordering` which calls `lt(a, b)` to compare elements. `lt` should +obey the same rules as implementations of [`isless`](@ref). 
""" struct Lt{T} <: Ordering lt::T @@ -146,8 +146,8 @@ Construct an [`Ordering`](@ref) object from the same arguments used by Elements are first transformed by the function `by` (which may be [`identity`](@ref)) and are then compared according to either the function `lt` or an existing ordering `order`. `lt` should be [`isless`](@ref) or a function -that obeys the same rules as the `lt` parameter of [`sort!`](@ref). Finally, -the resulting order is reversed if `rev=true`. +which obeys similar rules. Finally, the resulting order is reversed if +`rev=true`. Passing an `lt` other than `isless` along with an `order` other than [`Base.Order.Forward`](@ref) or [`Base.Order.Reverse`](@ref) is not permitted, diff --git a/base/sort.jl b/base/sort.jl index 1645ba8bf76af..b3dbaf9ac2d79 100644 --- a/base/sort.jl +++ b/base/sort.jl @@ -63,8 +63,8 @@ end """ issorted(v, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) -Test whether a collection is in sorted order. The keywords modify what -order is considered sorted, as described in the [`sort!`](@ref) documentation. +Test whether a vector is in sorted order. The `lt`, `by` and `rev` keywords modify what +order is considered to be sorted just as they do for [`sort`](@ref). # Examples ```jldoctest @@ -79,9 +79,6 @@ false julia> issorted([(1, "b"), (2, "a")], by = x -> x[2], rev=true) true - -julia> issorted([1, 2, -2, 3], by=abs) -true ``` """ issorted(itr; @@ -97,17 +94,14 @@ maybeview(v, k) = view(v, k) maybeview(v, k::Integer) = v[k] """ - partialsort!(v, k; by=identity, lt=isless, rev=false) + partialsort!(v, k; by=, lt=, rev=false) -Partially sort the vector `v` in place so that the value at index `k` (or -range of adjacent values if `k` is a range) occurs +Partially sort the vector `v` in place, according to the order specified by `by`, `lt` and +`rev` so that the value at index `k` (or range of adjacent values if `k` is a range) occurs at the position where it would appear if the array were fully sorted. 
If `k` is a single index, that value is returned; if `k` is a range, an array of values at those indices is returned. Note that `partialsort!` may not fully sort the input array. -For the keyword arguments, see the documentation of [`sort!`](@ref). - - # Examples ```jldoctest julia> a = [1, 2, 4, 3, 4] @@ -154,9 +148,9 @@ partialsort!(v::AbstractVector, k::Union{Integer,OrdinalRange}; partialsort!(v, k, ord(lt,by,rev,order)) """ - partialsort(v, k, by=identity, lt=isless, rev=false) + partialsort(v, k, by=, lt=, rev=false) -Variant of [`partialsort!`](@ref) that copies `v` before partially sorting it, thereby returning the +Variant of [`partialsort!`](@ref) which copies `v` before partially sorting it, thereby returning the same thing as `partialsort!` but leaving `v` unmodified. """ partialsort(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) = @@ -165,7 +159,7 @@ partialsort(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) = # reference on sorted binary search: # http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary -# index of the first value of vector a that is greater than or equivalent to x; +# index of the first value of vector a that is greater than or equal to x; # returns lastindex(v)+1 if x is greater than all values in v. function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer hi = hi + T(1) @@ -184,7 +178,7 @@ function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::key return lo end -# index of the last value of vector a that is less than or equivalent to x; +# index of the last value of vector a that is less than or equal to x; # returns firstindex(v)-1 if x is less than all values of v. 
function searchsortedlast(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer u = T(1) @@ -201,7 +195,7 @@ function searchsortedlast(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keyt return lo end -# returns the range of indices of v equivalent to x +# returns the range of indices of v equal to x # if v does not contain x, returns a 0-length range # indicating the insertion point of x function searchsorted(v::AbstractVector, x, ilo::T, ihi::T, o::Ordering)::UnitRange{keytype(v)} where T<:Integer @@ -294,18 +288,14 @@ for s in [:searchsortedfirst, :searchsortedlast, :searchsorted] end """ - searchsorted(v, x; by=identity, lt=isless, rev=false) - -Return the range of indices in `v` where values are equivalent to `x`, or an -empty range located at the insertion point if `v` does not contain values -equivalent to `x`. The vector `v` must be sorted according to the order defined -by the keywords. Refer to [`sort!`](@ref) for the meaning of the keywords and -the definition of equivalence. + searchsorted(a, x; by=, lt=, rev=false) -The range is generally found using binary search, but there are optimized -implementations for `v` values that are ranges of real numbers. +Return the range of indices of `a` which compare as equal to `x` (using binary search) +according to the order specified by the `by`, `lt` and `rev` keywords, assuming that `a` +is already sorted in that order. Return an empty range located at the insertion point +if `a` does not contain values equal to `x`. -See also: [`searchsortedfirst`](@ref), [`sort!`](@ref), [`insorted`](@ref), [`findall`](@ref). +See also: [`insorted`](@ref), [`searchsortedfirst`](@ref), [`sort`](@ref), [`findall`](@ref). 
# Examples ```jldoctest @@ -323,25 +313,17 @@ julia> searchsorted([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsorted([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 1:0 - -julia> searchsorted([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value, -2 equivalent to 2 -3:5 ``` """ searchsorted """ - searchsortedfirst(v, x; by=identity, lt=isless, rev=false) + searchsortedfirst(a, x; by=, lt=, rev=false) -Return the index of the first value in `v` greater than or equivalent to `x`. -If `x` is greater than all values in `v` the function returns `lastindex(v) + 1`. +Return the index of the first value in `a` greater than or equal to `x`, according to the +specified order. Return `lastindex(a) + 1` if `x` is greater than all values in `a`. +`a` is assumed to be sorted. -The vector `v` must be sorted according to the order defined by the keywords. -`insert!`ing `x` at the returned index will maintain the sorted order. Refer to -[`sort!`](@ref) for the meaning of the keywords and the definition of -"greater than" and equivalence. - -The index is generally found using binary search, but there are optimized -implementations for `v` values that are ranges of real numbers. +`insert!`ing `x` at this index will maintain sorted order. See also: [`searchsortedlast`](@ref), [`searchsorted`](@ref), [`findfirst`](@ref). @@ -361,24 +343,15 @@ julia> searchsortedfirst([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsortedfirst([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 1 - -julia> searchsortedfirst([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value -3 ``` """ searchsortedfirst """ - searchsortedlast(v, x; by=identity, lt=isless, rev=false) - -Return the index of the last value in `v` less than or equivalent to `x`. -If `x` is less than all values in `v` the function returns `firstindex(v) - 1`. - -The vector `v` must be sorted according to the order defined by the keywords. 
-Refer to [`sort!`](@ref) for the meaning of the keywords and the definition of -"less than" and equivalence. + searchsortedlast(a, x; by=, lt=, rev=false) -The index is generally found using binary search, but there are optimized -implementations for `v` values that are ranges of real numbers. +Return the index of the last value in `a` less than or equal to `x`, according to the +specified order. Return `firstindex(a) - 1` if `x` is less than all values in `a`. `a` is +assumed to be sorted. # Examples ```jldoctest @@ -396,22 +369,16 @@ julia> searchsortedlast([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsortedlast([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 0 - -julia> searchsortedlast([1, -1, -2, 2, -2, 3, -4, 4], 2, by=abs) # sorted by absolute value -5 ``` """ searchsortedlast """ - insorted(x, v; by=identity, lt=isless, rev=false) -> Bool - -Determine whether a vector `v` contains any value equivalent to `x`. -The vector `v` must be sorted according to the order defined by the keywords. -Refer to [`sort!`](@ref) for the meaning of the keywords and the definition of -equivalence. + insorted(x, a; by=, lt=, rev=false) -> Bool -The check is generally done using binary search, but there are optimized -implementations for `v` values that are ranges of real numbers. +Determine whether an item `x` is in the sorted collection `a`, in the sense that +it is [`==`](@ref) to one of the values of the collection according to the order +specified by the `by`, `lt` and `rev` keywords, assuming that `a` is already +sorted in that order, see [`sort`](@ref) for the keywords. See also [`in`](@ref). @@ -431,9 +398,6 @@ false julia> insorted(0, [1, 2, 4, 5, 5, 7]) # no match false - -julia> insorted(2, [1, -1, -2, 3, -4, 4], by=abs) # sorted by absolute value -true ``` !!! compat "Julia 1.6" @@ -760,8 +724,8 @@ Insertion sort traverses the collection one element at a time, inserting each element into its correct, sorted position in the output vector. 
Characteristics: -* *stable*: preserves the ordering of elements that compare equal -(e.g. "a" and "A" in a sort of letters that ignores case). +* *stable*: preserves the ordering of elements which compare equal +(e.g. "a" and "A" in a sort of letters which ignores case). * *in-place* in memory. * *quadratic performance* in the number of elements to be sorted: it is well-suited to small collections but should not be used for large ones. @@ -1001,8 +965,8 @@ is treated as the first or last index of the input, respectively. `lo` and `hi` may be specified together as an `AbstractUnitRange`. Characteristics: - * *stable*: preserves the ordering of elements that compare equal - (e.g. "a" and "A" in a sort of letters that ignores case). + * *stable*: preserves the ordering of elements which compare equal + (e.g. "a" and "A" in a sort of letters which ignores case). * *not in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`QuickSort`](@ref). * *linear runtime* if `length(lo:hi)` is constant @@ -1359,52 +1323,15 @@ defalg(v::AbstractArray{Union{}}) = DEFAULT_UNSTABLE # for method disambiguation """ sort!(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) -Sort the vector `v` in place. A stable algorithm is used by default. A specific -algorithm can be selected via the `alg` keyword (see [Sorting Algorithms](@ref) -for available algorithms). - -Elements are first transformed with the function `by` and then compared -according to either the function `lt` or the ordering `order`. Finally, the -resulting order is reversed if `rev=true`. The current implemention applies the -`by` transformation before each comparison rather than once per element. - -Passing an `lt` other than `isless` along with an `order` other than -[`Base.Order.Forward`](@ref) or [`Base.Order.Reverse`](@ref) is not permitted, -otherwise all options are independent and can be used together in all possible -combinations. 
Note that `order` can also include a "by" transformation, in -which case it is applied after that defined with the `by` keyword. For more -information on `order` values see the documentation on [Alternate -Orderings](@ref). - -Relations between two elements are defined as follows (with "less" and -"greater" exchanged when `rev=true`): - -* `x` is less than `y` if `lt(by(x), by(y))` (or `Base.Order.lt(order, by(x), by(y))`) yields true. -* `x` is greater than `y` if `y` is less than `x`. -* `x` and `y` are equivalent if neither is less than the other ("incomparable" - is sometimes used as a synonym for "equivalent"). - -The result of `sort!` is sorted in the sense that every element is greater than -or equivalent to the previous one. - -The `lt` function must define a strict weak order, that is, it must be - -* irreflexive: `lt(x, x)` always yields `false`, -* asymmetric: if `lt(x, y)` yields `true` then `lt(y, x)` yields `false`, -* transitive: `lt(x, y) && lt(y, z)` implies `lt(x, z)`, -* transitive in equivalence: `!lt(x, y) && !lt(y, x)` and `!lt(y, z) && !lt(z, - y)` together imply `!lt(x, z) && !lt(z, x)`. In words: if `x` and `y` are - equivalent and `y` and `z` are equivalent then `x` and `z` must be - equivalent. - -For example `<` is a valid `lt` function for `Int` values but `≤` is not: it -violates irreflexivity. For `Float64` values even `<` is invalid as it violates -the fourth condition: `1.0` and `NaN` are equivalent and so are `NaN` and `2.0` -but `1.0` and `2.0` are not equivalent. - -See also [`sort`](@ref), [`sortperm`](@ref), [`sortslices`](@ref), -[`partialsort!`](@ref), [`partialsortperm`](@ref), [`issorted`](@ref), -[`searchsorted`](@ref), [`insorted`](@ref), [`Base.Order.ord`](@ref). +Sort the vector `v` in place. A stable algorithm is used by default. You can select a +specific algorithm to use via the `alg` keyword (see [Sorting Algorithms](@ref) for +available algorithms). 
The `by` keyword lets you provide a function that will be applied to +each element before comparison; the `lt` keyword allows providing a custom "less than" +function (note that for every `x` and `y`, only one of `lt(x,y)` and `lt(y,x)` can return +`true`); use `rev=true` to reverse the sorting order. These options are independent and can +be used together in all possible combinations: if both `by` and `lt` are specified, the `lt` +function is applied to the result of the `by` function; `rev=true` reverses whatever +ordering specified via the `by` and `lt` keywords. # Examples ```jldoctest @@ -1431,29 +1358,6 @@ julia> v = [(1, "c"), (3, "a"), (2, "b")]; sort!(v, by = x -> x[2]); v (3, "a") (2, "b") (1, "c") - -julia> sort(0:3, by=x->x-2, order=Base.Order.By(abs)) # same as sort(0:3, by=abs(x->x-2)) -4-element Vector{Int64}: - 2 - 1 - 3 - 0 - -julia> sort([2, NaN, 1, NaN, 3]) # correct sort with default lt=isless -5-element Vector{Float64}: - 1.0 - 2.0 - 3.0 - NaN - NaN - -julia> sort([2, NaN, 1, NaN, 3], lt=<) # wrong sort due to invalid lt -5-element Vector{Float64}: - 2.0 - NaN - 1.0 - NaN - 3.0 ``` """ function sort!(v::AbstractVector{T}; @@ -1494,15 +1398,15 @@ sort(v::AbstractVector; kws...) = sort!(copymutable(v); kws...) ## partialsortperm: the permutation to sort the first k elements of an array ## """ - partialsortperm(v, k; by=ientity, lt=isless, rev=false) + partialsortperm(v, k; by=, lt=, rev=false) Return a partial permutation `I` of the vector `v`, so that `v[I]` returns values of a fully sorted version of `v` at index `k`. If `k` is a range, a vector of indices is returned; if `k` is an integer, a single index is returned. The order is specified using the same -keywords as `sort!`. The permutation is stable: the indices of equal elements -will appear in ascending order. +keywords as `sort!`. The permutation is stable, meaning that indices of equal elements +appear in ascending order. 
-This function is equivalent to, but more efficient than, calling `sortperm(...)[k]`. +Note that this function is equivalent to, but more efficient than, calling `sortperm(...)[k]`. # Examples ```jldoctest @@ -1528,7 +1432,7 @@ partialsortperm(v::AbstractVector, k::Union{Integer,OrdinalRange}; kwargs...) = partialsortperm!(similar(Vector{eltype(k)}, axes(v,1)), v, k; kwargs...) """ - partialsortperm!(ix, v, k; by=identity, lt=isless, rev=false) + partialsortperm!(ix, v, k; by=, lt=, rev=false) Like [`partialsortperm`](@ref), but accepts a preallocated index vector `ix` the same size as `v`, which is used to store (a permutation of) the indices of `v`. @@ -1594,7 +1498,7 @@ end Return a permutation vector or array `I` that puts `A[I]` in sorted order along the given dimension. If `A` has more than one dimension, then the `dims` keyword argument must be specified. The order is specified using the same keywords as [`sort!`](@ref). The permutation is guaranteed to be stable even -if the sorting algorithm is unstable: the indices of equal elements will appear in +if the sorting algorithm is unstable, meaning that indices of equal elements appear in ascending order. See also [`sortperm!`](@ref), [`partialsortperm`](@ref), [`invperm`](@ref), [`indexin`](@ref). @@ -1828,8 +1732,7 @@ end sort!(A; dims::Integer, alg::Algorithm=defalg(A), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) Sort the multidimensional array `A` along dimension `dims`. -See the one-dimensional version of [`sort!`](@ref) for a description of -possible keyword arguments. +See [`sort!`](@ref) for a description of possible keyword arguments. To sort slices of an array, refer to [`sortslices`](@ref). @@ -1978,18 +1881,18 @@ struct MergeSortAlg <: Algorithm end """ PartialQuickSort{T <: Union{Integer,OrdinalRange}} -Indicate that a sorting function should use the partial quick sort algorithm. 
-Partial quick sort is like quick sort, but is only required to find and sort the -elements that would end up in `v[k]` were `v` fully sorted. +Indicate that a sorting function should use the partial quick sort +algorithm. Partial quick sort returns the smallest `k` elements sorted from smallest +to largest, finding them and sorting them using [`QuickSort`](@ref). Characteristics: - * *not stable*: does not preserve the ordering of elements that - compare equal (e.g. "a" and "A" in a sort of letters that + * *not stable*: does not preserve the ordering of elements which + compare equal (e.g. "a" and "A" in a sort of letters which ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). -Note that `PartialQuickSort(k)` does not necessarily sort the whole array. For example, + Note that `PartialQuickSort(k)` does not necessarily sort the whole array. For example, ```jldoctest julia> x = rand(100); @@ -2008,7 +1911,6 @@ julia> map(x->issorted(x[k]), (s1, s2)) julia> s1[k] == s2[k] true -``` """ struct PartialQuickSort{T <: Union{Integer,OrdinalRange}} <: Algorithm k::T @@ -2021,8 +1923,8 @@ Indicate that a sorting function should use the quick sort algorithm, which is *not* stable. Characteristics: - * *not stable*: does not preserve the ordering of elements that - compare equal (e.g. "a" and "A" in a sort of letters that + * *not stable*: does not preserve the ordering of elements which + compare equal (e.g. "a" and "A" in a sort of letters which ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). @@ -2040,12 +1942,13 @@ subcollection at each step, until the entire collection has been recombined in sorted form. Characteristics: - * *stable*: preserves the ordering of elements that compare - equal (e.g. "a" and "A" in a sort of letters that ignores + * *stable*: preserves the ordering of elements which compare + equal (e.g. 
"a" and "A" in a sort of letters which ignores case). - * *not in-place* in memory — requires a temporary - array of half the size of the input array. + * *not in-place* in memory. * *divide-and-conquer* sort strategy. + * *good performance* for large collections but typically not quite as + fast as [`QuickSort`](@ref). """ const MergeSort = MergeSortAlg() diff --git a/doc/src/base/base.md b/doc/src/base/base.md index d6ba437709128..7922dd7d67861 100644 --- a/doc/src/base/base.md +++ b/doc/src/base/base.md @@ -126,7 +126,6 @@ Core.:(===) Core.isa Base.isequal Base.isless -Base.isunordered Base.ifelse Core.typeassert Core.typeof diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 2b737d81998f7..4af6c866bff46 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -1,7 +1,7 @@ # Sorting and Related Functions -Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays -of values. By default, Julia picks reasonable algorithms and sorts in ascending order: +Julia has an extensive, flexible API for sorting and interacting with already-sorted arrays of +values. 
By default, Julia picks reasonable algorithms and sorts in standard ascending order: ```jldoctest julia> sort([2,3,1]) @@ -11,7 +11,7 @@ julia> sort([2,3,1]) 3 ``` -You can sort in reverse order as well: +You can easily sort in reverse order as well: ```jldoctest julia> sort([2,3,1], rev=true) @@ -36,8 +36,8 @@ julia> a 3 ``` -Instead of directly sorting an array, you can compute a permutation of the array's -indices that puts the array into sorted order: +Instead of directly sorting an array, you can compute a permutation of the array's indices that +puts the array into sorted order: ```julia-repl julia> v = randn(5) @@ -65,7 +65,7 @@ julia> v[p] 0.382396 ``` -Arrays can be sorted according to an arbitrary transformation of their values: +Arrays can easily be sorted according to an arbitrary transformation of their values: ```julia-repl julia> sort(v, by=abs) @@ -101,12 +101,9 @@ julia> sort(v, alg=InsertionSort) 0.382396 ``` -All the sorting and order related functions rely on a "less than" relation defining a -[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings) +All the sorting and order related functions rely on a "less than" relation defining a total order on the values to be manipulated. The `isless` function is invoked by default, but the relation -can be specified via the `lt` keyword, a function that takes two array elements and returns true -if and only if the first argument is "less than" the second. See [Alternate orderings](@ref) for -more info. +can be specified via the `lt` keyword. ## Sorting Functions @@ -162,21 +159,23 @@ Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = Inline ``` !!! compat "Julia 1.9" - The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed to be stable - since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays. 
- -## Alternate Orderings - -By default, `sort`, `searchsorted`, and related functions use [`isless`](@ref) to compare -two elements in order to determine which should come first. The -[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining alternate -orderings on the same set of elements. Instances of `Ordering` define a -[strict weak order](https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings). -To be a strict weak order, for any elements `a`, `b`, `c` the following hold: - -* `lt(a, b) && lt(b, a) === false`; -* if `lt(a, b) && lt(b, c)`, then `lt(a, c)`; and -* if `!lt(a, b) && !lt(b, c)`, then `!lt(a, c)` + The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed to + be stable since Julia 1.9. Previous versions had unstable edge cases when + sorting numeric arrays. + +## Alternate orderings + +By default, `sort` and related functions use [`isless`](@ref) to compare two +elements in order to determine which should come first. The +[`Base.Order.Ordering`](@ref) abstract type provides a mechanism for defining +alternate orderings on the same set of elements. Instances of `Ordering` define +a [total order](https://en.wikipedia.org/wiki/Total_order) on a set of elements, +so that for any elements `a`, `b`, `c` the following hold: + +* Exactly one of the following is true: `a` is less than `b`, `b` is less than + `a`, or `a` and `b` are equal (according to [`isequal`](@ref)). +* The relation is transitive - if `a` is less than `b` and `b` is less than `c` + then `a` is less than `c`. The [`Base.Order.lt`](@ref) function works as a generalization of `isless` to test whether `a` is less than `b` according to a given order. 
diff --git a/doc/src/manual/missing.md b/doc/src/manual/missing.md index 8c8e801ccac9a..9bddcdfbb2ac2 100644 --- a/doc/src/manual/missing.md +++ b/doc/src/manual/missing.md @@ -88,7 +88,7 @@ true ``` The [`isless`](@ref) operator is another exception: `missing` is considered -as greater than any other value. This operator is used by [`sort!`](@ref), +as greater than any other value. This operator is used by [`sort`](@ref), which therefore places `missing` values after all other values: ```jldoctest From 3848224a6ea52066c0bcc4314c5a8fce957bbbcd Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sun, 29 Jan 2023 13:34:31 -0600 Subject: [PATCH 5/5] Apply suggestions from code review Thanks @cormullion --- doc/src/base/sort.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 4af6c866bff46..41b7096391a04 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -146,12 +146,12 @@ By default, the `sort` family of functions uses stable sorting algorithms that a on most inputs. The exact algorithm choice is an implementation detail to allow for future performance improvements. Currently, a hybrid of `RadixSort`, `ScratchQuickSort`, `InsertionSort`, and `CountingSort` is used based on input type, size, and composition. -Implementation details are subject to change but currently availible in the extended help +Implementation details are subject to change but currently available in the extended help of `??Base.DEFAULT_STABLE` and the docstrings of internal sorting algorithms listed there. You can explicitly specify your preferred algorithm with the `alg` keyword (e.g. `sort!(v, alg=PartialQuickSort(10:20))`) or reconfigure the default sorting algorithm -for a custom types by adding a specialized method to the `Base.Sort.defalg` function. +for custom types by adding a specialized method to the `Base.Sort.defalg` function. 
For example, [InlineStrings.jl](https://github.com/JuliaStrings/InlineStrings.jl/blob/v1.3.2/src/InlineStrings.jl#L903) defines the following method: ```julia