performance in sched_yield (multisocket) #900

@phirus2

Hi!
I'm using MUMPS with OpenBLAS 0.2.18, compiled from source with these options:

  • gcc 5.3.0
  • USE_THREAD = 1
  • NUM_THREADS = 128
  • NO_WARMUP = 1
  • #NO_AFFINITY = 1
  • #BIGNUMA = 1
  • MAX_STACK_ALLOC = 8128

I'm running my code on several multi-socket machines (2 or 4 sockets) with 24 to 128 cores.

The problem is that most of the run time is spent in the system (kernel). To be more accurate, it is not doing I/O; it is doing context switches. You can see this in the pictures below:
[Image: spin_time]
[Image: sched_time]

This doesn't happen on a single-socket machine, but if I force the application to use just one CPU (taskset -c 0-31 MyAPP), the performance is also poor.
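
For anyone who wants to reproduce the pinned run from inside a launcher script instead of with taskset, here is a minimal sketch. It assumes Linux and Python 3; `pin_to_cores` is a hypothetical helper name and is not part of MUMPS or OpenBLAS. It only restricts the process to a core range and changes nothing about how OpenBLAS schedules its threads.

```python
import os


def pin_to_cores(first, last):
    """Restrict the calling process to cores first..last (inclusive).

    Roughly what `taskset -c first-last` does, except that cores the
    process is not currently allowed to use are silently skipped.
    """
    requested = set(range(first, last + 1))
    allowed = os.sched_getaffinity(0)  # cores this process may use right now
    target = requested & allowed
    if not target:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(0, target)    # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Same core range as the taskset call above; child processes started
    # afterwards (for example via subprocess) inherit this affinity mask.
    print(sorted(pin_to_cores(0, 31)))
```

Because the affinity mask is inherited, launching the application from this script after the call should behave like the taskset run.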

What can I do to give you more information and help track this down?
