
Conversation

rlschuller
Contributor

As defined by Aitchison 1986.

@codecov-commenter

codecov-commenter commented Sep 10, 2021

Codecov Report

Merging #20 (5f56023) into master (54c3f44) will decrease coverage by 14.28%.
The diff coverage is 0.00%.

❗ Current head 5f56023 differs from pull request most recent head 089918c. Consider uploading reports for the commit 089918c to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           master      #20       +/-   ##
===========================================
- Coverage   82.10%   67.82%   -14.28%     
===========================================
  Files           3        4        +1     
  Lines          95      115       +20     
===========================================
  Hits           78       78               
- Misses         17       37       +20     
Impacted Files Coverage Δ
src/matrices.jl 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 54c3f44...089918c. Read the comment docs.

Member

@juliohm left a comment

Very nice. This is a first round of review before we can do more optimizations. I would also review the index style: is IndexLinear() what we want? I have to review the interface myself, but I thought that Cartesian indexing was desired.

After you incorporate the suggestions and review the index style, please go ahead and add some tests so that we can see it in action.

src/matrices.jl Outdated
Comment on lines 6 to 7
UnitMatrix{T}(d)
UnitColumn{T}(d)
Member

Let's rename these to OnesMatrix and OnesColumn to remove ambiguity with the unit element.

Member

Actually, let's simply call it JMatrix and JColumn to match the notation. Otherwise we will have a difficult time finding good names for the other matrices below.

src/matrices.jl Outdated
Comment on lines 8 to 10
F{T}(d)
G{T}(N)
H{T}(d)
Member

Let's rename these to FMatrix, GMatrix and HMatrix. Then we create aliases F, G, and H for the types and export them. See the I example in the LinearAlgebra standard library.
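For illustration, a minimal sketch of the rename-plus-alias pattern being suggested (the names FMatrix and F, the field d, and the Int default are assumptions for this sketch, not the PR's final code):

```julia
# Hypothetical sketch of the suggested pattern: full name plus short alias.
struct FMatrix{T} <: AbstractMatrix{T}
  d::Int
end

# F contains -1 entries, so Bool cannot represent its values; Int is a
# narrow default that still promotes cleanly with other element types.
FMatrix(d::Int) = FMatrix{Int}(d)

# Short alias exported next to the full name, analogous to how
# LinearAlgebra exports `I` for its UniformScaling identity object.
const F = FMatrix

Base.size(A::FMatrix) = (A.d, A.d + 1)
Base.getindex(A::FMatrix{T}, i::Int, j::Int) where {T} = T((i == j) - (j == A.d + 1))
```

With these definitions, `F(2)` and `FMatrix(2)` construct the same 2×3 matrix.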

src/matrices.jl Outdated
Comment on lines 12 to 19
Elementary matrices as defined by Aitchison 1986.

## Examples

```julia
julia> UnitMatrix(5)
julia> UnitColumn{Int}(2)
julia> G(4)
julia> H{Float64}(2)
```

When the type of the elements isn't explicitly defined, `Float64` is used.
"""
Member

Notice that docstrings apply to a specific struct or function. Please split this string into multiple docstrings, each placed right before the struct defining the corresponding matrix. Also remove the blank line between the string and the struct so that Julia's doc system can find the association.
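The association being described looks like this (JColumn is the name proposed earlier in the review; the docstring text and Float64 default are illustrative, mirroring the PR's current code):

```julia
# Docstring placed immediately before the struct, with no blank line in
# between, so the Julia doc system binds it to JColumn.
"""
    JColumn{T}(d)

Column vector of `d` ones, as defined by Aitchison 1986.
When `T` is omitted it defaults to `Float64`.
"""
struct JColumn{T} <: AbstractVector{T}
  d::Int
end
JColumn(d::Int) = JColumn{Float64}(d)

Base.size(j::JColumn) = (j.d,)
Base.getindex(j::JColumn{T}, i::Int) where {T} = one(T)
```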

src/matrices.jl Outdated
struct UnitColumn{T} <: AbstractArray{T, 1}
d::Int
end
UnitColumn(d::Int) = UnitColumn{Float64}(d)
Member

We should default to Bool because then Julia will use its promotion rules with other types when needed. If you specify Float64 you are already forcing double precision and inhibiting some applications (e.g., GPU with Float32).
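The promotion behavior being referred to can be checked directly in the REPL; with a Bool operand, the result takes the other operand's type instead of being forced to Float64:

```julia
# Bool participates in Julia's promotion rules, adapting to the other operand:
a = true * 1.0f0   # Float32 is preserved
b = true * 2.0     # Float64
c = true + 3       # Int
```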

Contributor Author

Since my knowledge of Julia's compiler is limited, I'd like to document a few bullet points that influenced my decision to use Float64 as a default:

  • I saw that the identity matrix was implemented using Bool by default, but my idea was to avoid casting during operations, which can hinder AVX performance.

  • We also need to keep in mind that automatic promotion is non-associative and non-commutative, i.e.,

julia> typeof(Float32(1) * true/( true + true ))
Float32

julia> typeof(Float32(1) * (true/( true + true )))
Float64

julia> typeof( true/( true + true ) * Float32(1))
Float64

which means that automatic promotion at compile time probably isn't possible in all cases in Julia.

  • The default promotions in Julia are to Int64 and Float64, so for GPUs I'd stick to manual casting.

With those bullet points in mind, I see no problem in switching to Bool and Int whenever possible.

Member

  • I saw that the identity matrix was implemented using Bool by default, but my idea was to avoid casting during operations, which can hinder AVX performance.

For some of these special matrices we will optimize the matrix-vector multiplication in a way that doesn't even use the type stored in the struct. For example, multiplying the matrix J by any vector x is equivalent to producing a new vector where the entries are sum(x). That is the beauty of Julia. If you read the entire LinearAlgebra example with the identity matrix, you will see that multiplications and many other operations are completely erased at compile time. We will do the same here after the basic struct is ready for tests.
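The kind of erasure being described could be sketched like this (JMatrix is the name proposed above; the specialized method is an illustration, not the PR's implementation):

```julia
# Sketch: a specialized product for the all-ones matrix J. The element
# type stored in the struct never enters the computation; every entry of
# J*x is simply sum(x).
struct JMatrix{T} <: AbstractMatrix{T}
  d::Int
end
Base.size(J::JMatrix) = (J.d, J.d)
Base.getindex(J::JMatrix{T}, i::Int, j::Int) where {T} = one(T)

# Specialized matrix-vector product: no loop over the entries of J at all.
Base.:*(J::JMatrix, x::AbstractVector) = fill(sum(x), J.d)
```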

src/matrices.jl Outdated
UnitColumn(d::Int) = UnitColumn{Float64}(d)
Base.size(j::UnitColumn) = (j.d,)
Base.IndexStyle(::Type{UnitColumn}) = IndexLinear()
Base.getindex(j::UnitColumn, i::Int) = convert(eltype(j), 1)
Member

Suggested change
Base.getindex(j::UnitColumn, i::Int) = convert(eltype(j), 1)
Base.getindex(j::UnitColumn{T}, i) where {T} = one(T)

src/matrices.jl Outdated
Base.getindex(J::UnitMatrix, i::Int, j::Int) = convert(eltype(J), 1)


struct UnitColumn{T} <: AbstractArray{T, 1}
Member

Suggested change
struct UnitColumn{T} <: AbstractArray{T, 1}
struct UnitColumn{T} <: AbstractVector{T}

src/matrices.jl Outdated
F(d::Int) = F{Float64}(d)
Base.size(A::F) = (A.d, A.d+1)
Base.IndexStyle(::Type{F}) = IndexLinear()
Base.getindex(A::F, i::Int, j::Int) = convert(eltype(A), ifelse(i==j, 1, 0) - (j==(A.d+1)))
Member

Instead of using convert(eltype(A), x) you can use one(T) and zero(T) to get the unit and zero elements for the type T. This type T can be specified in the argument as A::F{T} where {T}. See the other suggestion above for a concrete example.

src/matrices.jl Outdated
Base.getindex(A::F, i::Int, j::Int) = convert(eltype(A), ifelse(i==j, 1, 0) - (j==(A.d+1)))

struct G{T} <: AbstractArray{T, 2}
N::Int
Member

Let's use 2 white spaces for indentation to follow the existing convention in the source.

src/matrices.jl Outdated
G(N::Int) = G{Float64}(N)
Base.size(A::G) = (A.N, A.N)
Base.IndexStyle(::Type{G}) = IndexLinear()
Base.getindex(A::G, i::Int, j::Int) = convert(eltype(A), ifelse(i==j, 1, 0) - 1.0/A.N)
Member

A similar comment applies: you can use one(T) and zero(T) to simplify the expression. Also, you can remove the ::Int annotation and just write i. That way people can index with other kinds of integers.

src/matrices.jl Outdated
H(d::Int) = H{Float64}(d)
Base.size(A::H) = (A.d, A.d)
Base.IndexStyle(::Type{H}) = IndexLinear()
Base.getindex(A::H, i::Int, j::Int) = convert(eltype(A), (ifelse(i==j, 2, 1)))
Member

Same comment applies.

@rlschuller
Contributor Author

rlschuller commented Sep 10, 2021

About the IndexStyle

The documentation about IndexStyle seems somewhat vague to me, since it doesn't explain the underlying implementations for AbstractArray specifically. I'll check the code to understand what's truly happening at the low level and update the PR as soon as possible.

Both definitions tell the user what the styles do, without addressing the all-important how. In particular, IndexCartesian() cites divisions, which doesn't make sense for contiguous arrays.

An extra multiplication happening in the registers every N columns is certainly cheaper than an extra memory access for every element access. However, if we want the convenience of using A[i, j] every time, it is possible that a properly implemented IndexCartesian() could spare the CPU a few multiplications (by not multiplying i*N on every access, but only once per row).

Usually in C/C++, the compiler is smart enough to optimize nested loops regardless of the indexing method, as long as the traversal order is the same. Below are the corresponding documentation entries, taken from the Julia documentation:

IndexLinear()

Subtype of IndexStyle used to describe arrays which are optimally indexed by one linear index.

A linear indexing style uses one integer index to describe the position in the array (even if it's a multidimensional array) and column-major ordering is used to efficiently access the elements. This means that requesting eachindex from an array that is IndexLinear will return a simple one-dimensional range, even if it is multidimensional.

A custom array that reports its IndexStyle as IndexLinear only needs to implement indexing (and indexed assignment) with a single Int index; all other indexing expressions — including multidimensional accesses — will be recomputed to the linear index. For example, if A were a 2×3 custom matrix with linear indexing, and we referenced A[1, 3], this would be recomputed to the equivalent linear index and call A[5] since 2*1 + 3 = 5.

IndexCartesian()

Subtype of IndexStyle used to describe arrays which are optimally indexed by a Cartesian index. This is the default for new custom AbstractArray subtypes.

A Cartesian indexing style uses multiple integer indices to describe the position in a multidimensional array, with exactly one index per dimension. This means that requesting eachindex from an array that is IndexCartesian will return a range of CartesianIndices.

A N-dimensional custom array that reports its IndexStyle as IndexCartesian needs to implement indexing (and indexed assignment) with exactly N Int indices; all other indexing expressions — including linear indexing — will be recomputed to the equivalent Cartesian location. For example, if A were a 2×3 custom matrix with cartesian indexing, and we referenced A[5], this would be recomputed to the equivalent Cartesian index and call A[1, 3] since 5 = 2*1 + 3.

It is significantly more expensive to compute Cartesian indices from a linear index than it is to go the other way. The former operation requires division — a very costly operation — whereas the latter only uses multiplication and addition and is essentially free. This asymmetry means it is far more costly to use linear indexing with an IndexCartesian array than it is to use Cartesian indexing with an IndexLinear array.

@juliohm
Member

juliohm commented Sep 10, 2021

Very good excerpts from the docs, thanks. Would you agree that the choice of index style is a function of how we intend to use these arrays in typical workflows? I am assuming that most algorithms we will implement with these matrices will be expressed in terms of Cartesian indices. If we choose IndexCartesian as the style, we get direct access to the 1s or -1s without any index arithmetic (a constant that the compiler can propagate). On the other hand, if we choose IndexLinear, then any access like J[i,j] inside algorithms will be performing basic arithmetic. If you agree with these observations, we should probably pick IndexCartesian.
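A sketch of what picking IndexCartesian would look like for one of these matrices (GMatrix per the earlier rename suggestion; the Float64 default mirrors the current PR code):

```julia
# Sketch: Cartesian indexing as the native style, so A[i, j] inside
# algorithms hits this getindex directly with no linear-index arithmetic.
struct GMatrix{T} <: AbstractMatrix{T}
  N::Int
end
GMatrix(N::Int) = GMatrix{Float64}(N)

Base.size(A::GMatrix) = (A.N, A.N)
Base.IndexStyle(::Type{<:GMatrix}) = IndexCartesian()
Base.getindex(A::GMatrix{T}, i::Int, j::Int) where {T} = (i == j ? one(T) : zero(T)) - one(T) / A.N
```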

@rlschuller
Contributor Author

We (now) agree on all of the suggestions.

I've done a few benchmarks (attached Pluto notebook), and the conclusion is that the choice of style doesn't affect performance very much. However, both CartesianIndices() and LinearIndices() in for loops should be avoided in performance-sensitive applications: they run between 1 and 2 orders of magnitude slower than for loops with more traditional syntax.

benchmark.jl.zip

@juliohm
Member

juliohm commented Sep 11, 2021

The results of your benchmark are very different for me. I would not use Pluto for that purpose. Please use a raw Julia script and use the BenchmarkTools.jl package to assess the distribution of times. All the cells of the notebook are around 100ms for me on the following machine:

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8

Also, you tend to focus too much on performance prematurely. Let's get the core functionality first, and then later we can optimize the code for performance.
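The kind of script being requested might look like this (function names and the matrix size are illustrative):

```julia
using BenchmarkTools

# Plain dense matrix; we compare linear and Cartesian iteration.
A = rand(1000, 1000)

function linear_sum(A)
  s = zero(eltype(A))
  for i in eachindex(A)
    s += A[i]
  end
  s
end

function cartesian_sum(A)
  s = zero(eltype(A))
  for I in CartesianIndices(A)
    s += A[I]
  end
  s
end

# $ interpolates the global variable so the benchmark doesn't measure
# global-variable access.
@btime linear_sum($A)
@btime cartesian_sum($A)
```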

@juliohm
Member

juliohm commented Sep 11, 2021

My suggestion is to first fix the name of the structs as suggested, then use IndexCartesian style for matrices and IndexLinear style for columns. After that, I suggest writing tests with basic matrix multiplications and identities from the chapters of the book.

Only then, after everything is working we can start thinking about performance.

@rlschuller
Contributor Author

rlschuller commented Sep 11, 2021

Pluto isn't the problem. Apparently it's a known bug (a huge overhead when using CartesianIndices and LinearIndices) that was fixed at the end of 2020.

I was using Ubuntu 21.04 with its default Julia version (1.5.3). Considering that this is a non-LTS version of Ubuntu, it's likely that most non-rolling-release distros will have similar performance problems with their package-managed versions of Julia.

While I agree that early optimization is a pretty bad idea, a few sanity performance checks can be pretty useful: both to catch bugs and to check if the compilers/interpreters are doing anything unusual.

@juliohm
Member

juliohm commented Sep 11, 2021

Pluto isn't the problem.

Pluto isn't the right place to share benchmarks. I had to open a Pluto session just to reproduce the times, which, by the way, vary across multiple runs of the notebook. As I mentioned, the correct way to benchmark code is with BenchmarkTools.jl's @benchmark macro. It is much more reliable than looking at Pluto cell times.

I was using Ubuntu 21.04, with the default Julia version (1.5.3). Considering that this is a non LTS version of Ubuntu, it's likely that most non rolling-release distros will have similar performance problems in their package managed versions of Julia.

And we keep discussing issues that are irrelevant to the task at hand. As you can see, the current version of the PR is unmergeable and doesn't have basic tests, yet we are discussing performance on a specific Ubuntu distribution with an old release of the language...

While I agree that early optimization is a pretty bad idea, a few sanity performance checks can be pretty useful: both to catch bugs and to check if the compilers/interpreters are doing anything unusual.

Do you agree that this is not the case here? You are trying to optimize something that is extremely low-level and likely irrelevant at this point in time. I agree with you that it is good to learn and be skilled in these low-level optimizations; it is just not the right time. I urge you to fight this tendency and focus on the task. That way we will be more efficient together, and I am sure we will do wonderful things. There will be plenty of opportunity to investigate performance if it ever blocks our work.

@juliohm
Member

juliohm commented Sep 11, 2021

Is there anything that I can do to help you in the process? What about writing down a list of steps you plan to achieve to clarify priorities? I always do that and it helps me a lot.

@juliohm
Member

juliohm commented Sep 11, 2021

And just to be clear, I really appreciate your skills. It is not easy to find someone with such great knowledge of performance optimizations and attention to low-level details. I strongly believe that you will outperform yourself by a lot with very minor tweaks here and there. Please don't bother if my comments sound harsh sometimes, I am doing it with the best intentions.

@rlschuller
Contributor Author

I'm learning a lot from the comments, and I appreciate your patience in going through things thoroughly. :)

@juliohm
Member

juliohm commented Sep 11, 2021

Looking forward to growing together 💯 Let me know when you have an updated PR ready for review.
