performance in sched_yield (multisocket)

Hi!
I'm using MUMPS with OpenBLAS 0.2.18, compiled from source with these options:
- gcc 5.3.0
- USE_THREAD = 1
- NUM_THREADS = 128
- NO_WARMUP = 1
- #NO_AFFINITY = 1
- #BIGNUMA = 1
- MAX_STACK_ALLOC = 8128

I'm running my code on several multi-socket machines (2 or 4 sockets) with 24 to 128 cores.

The problem is that most of the run time is spent in the system (kernel) rather than in my code. To be more accurate, it is not doing I/O; it is doing context switches. You can see this in the pictures below:
![spin_time](https://cloud.githubusercontent.com/assets/19668974/15677414/788d31e2-274a-11e6-9da3-493261b30eff.png)
![sched_time](https://cloud.githubusercontent.com/assets/19668974/15677415/78e247f4-274a-11e6-898a-f90372069c3a.png)
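In case numbers are more useful than the screenshots, here is a minimal sketch of how the same effect could be captured, assuming Linux and Python 3. The `measure` helper is a hypothetical name for illustration, not part of MUMPS or OpenBLAS; it runs a command as a child process and reports its user time, system time, and context-switch counts from `getrusage`.

```python
import resource
import subprocess
import sys


def measure(cmd):
    """Run cmd as a child process and return its CPU-time and context-switch totals.

    RUSAGE_CHILDREN accumulates usage of waited-for children, so the
    difference before/after covers only the command started here.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)  # waits for the child to finish
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "user_s": after.ru_utime - before.ru_utime,   # time in user code
        "sys_s": after.ru_stime - before.ru_stime,    # time in the kernel
        "voluntary_cs": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_cs": after.ru_nivcsw - before.ru_nivcsw,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical script name): python measure.py taskset -c 0-31 ./MyAPP
    for key, value in measure(sys.argv[1:]).items():
        print(f"{key}: {value}")
```

A high `sys_s` relative to `user_s`, together with a large context-switch count, would match what the pictures show.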

This doesn't happen on a single-socket machine, but if I force the run onto just one CPU (`taskset -c 0-31 MyAPP`), the performance is also poor.

What can I do to give you more information and help track this down?


performance in sched_yield (multisocket) #900

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

performance in sched_yield (multisocket) #900

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions