Further improvements to memory.c. #1625

Merged: 1 commit, Jun 20, 2018

Conversation

oon3m0oo
Contributor

  • Compiler TLS is now used only when the compiler supports it
  • If compiler TLS is unsupported, we use platform-specific TLS
  • Only one variable (an index) is now in TLS
  • We only access TLS once per alloc, and never when freeing
  • Allocation / release info is now stored within the allocation itself, by
    over-allocating; this saves having external structures do the bookkeeping, and
    reduces some of the redundant data that was being stored (such as addresses)
  • We never hit the alloc lock when not using SMP or when using OpenMP (that was
    my fault)
  • Now that there are fewer tracking structures I think this is a bit easier to
    read than before
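
The over-allocation bullet above can be illustrated with a minimal sketch. This is not the code from this PR: `sketch_header`, `sketch_alloc`, `sketch_free`, and the 64-byte header size are hypothetical names and values, and plain `malloc` stands in for the real backing allocators. It only shows the idea that the bookkeeping lives in a header placed directly in front of the buffer handed to the caller, so freeing needs no lookup in an external table and no TLS access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical space reserved in front of every buffer. */
#define SKETCH_HEADER_SIZE 64

/* Hypothetical bookkeeping record; not the PR's struct alloc_t. */
struct sketch_header {
    void (*release)(struct sketch_header *); /* how to give the block back */
    size_t user_size;                        /* bytes the caller asked for */
};

static void sketch_release_malloc(struct sketch_header *h) { free(h); }

/* Recover the header from the pointer the caller holds. */
static struct sketch_header *sketch_header_of(void *user) {
    return (struct sketch_header *)((unsigned char *)user - SKETCH_HEADER_SIZE);
}

static void *sketch_alloc(size_t size) {
    unsigned char *raw;
    struct sketch_header *h;

    assert(sizeof(struct sketch_header) <= SKETCH_HEADER_SIZE);
    /* Over-allocate: header first, caller's buffer right behind it. */
    raw = malloc(SKETCH_HEADER_SIZE + size);
    if (raw == NULL) return NULL;
    h = (struct sketch_header *)raw;
    h->release = sketch_release_malloc;
    h->user_size = size;
    return raw + SKETCH_HEADER_SIZE;
}

static void sketch_free(void *user) {
    struct sketch_header *h;
    if (user == NULL) return;
    h = sketch_header_of(user);
    h->release(h); /* the block itself says how it must be released */
}
```

A caller would pair `void *buf = sketch_alloc(n);` with `sketch_free(buf);` and never see the header.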

@fenrus75
Contributor

I get core dumps with this patch applied unfortunately

I'm not seeing an actual performance difference

@oon3m0oo
Contributor Author

It should be about the same as using TLS from before, but should be as fast for systems that don't support compiler-generated TLS. The delta between the two was about 15-20% before, now it should be around 1%. What are you building with (just USE_THREADS=1)?

@sandwichmaker
sandwichmaker commented Jun 18, 2018

This patch is not quite right. On ARM64 it leads to improvements for small matrices (even without threading), but large-matrix GEMM has gotten worse pretty uniformly (results on a Pixel 2).

Old: 3313e4b
New: This patch.

Since I am testing single threaded performance on ARM64, none of the other changes has any effect. These numbers are fairly repeatable too.


Benchmark                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------
BM_SGEMM/3            -0.1808         -0.1810           190           155           189           155
BM_SGEMM/4            -0.2120         -0.2124           167           132           167           131
BM_SGEMM/5            -0.1404         -0.1400           257           221           256           220
BM_SGEMM/6            -0.1252         -0.1264           284           249           284           248
BM_SGEMM/7            -0.0831         -0.0850           391           358           390           357
BM_SGEMM/8            -0.1230         -0.1255           293           257           292           255
BM_SGEMM/9            -0.0835         -0.0840           439           402           438           401
BM_SGEMM/10           -0.0785         -0.0785           503           463           501           462
BM_SGEMM/15           -0.0452         -0.0452          1301          1242          1298          1239
BM_SGEMM/16           -0.0419         -0.0423          1003           961          1000           958
BM_SGEMM/20           -0.0184         -0.0200          1889          1854          1885          1847
BM_SGEMM/24           -0.0179         -0.0182          2862          2810          2853          2802
BM_SGEMM/28           -0.0091         -0.0097          4566          4525          4556          4511
BM_SGEMM/31           -0.0034         -0.0043          6923          6899          6909          6879
BM_SGEMM/32           -0.0079         -0.0097          5805          5760          5793          5737
BM_SGEMM/40           -0.0116         -0.0113         11321         11190         11281         11154
BM_SGEMM/63           +0.0033         +0.0025         46383         46537         46241         46354
BM_SGEMM/64           +0.0080         +0.0069         42037         42374         41922         42212
BM_SGEMM/80           +0.0045         +0.0030         82953         83323         82768         83016
BM_SGEMM/100          +0.0015         +0.0026        165059        165309        164180        164608
BM_SGEMM/128          +0.0099         +0.0083        329035        332282        328190        330901
BM_SGEMM/150          +0.0083         +0.0105        554837        559435        552010        557790
BM_SGEMM/200          +0.0399         +0.0419       1277909       1328845       1269988       1323152
BM_SGEMM/256          +0.0437         +0.0416       2550165       2661637       2540819       2646626
BM_SGEMM/300          +0.1039         +0.1032       4314616       4762907       4294829       4737888
BM_SGEMM/400          +0.0521         +0.0536      10092941      10618822      10021265      10558817
BM_SGEMM/500          -0.0493         -0.0465      23088610      21950891      22896250      21830893
BM_SGEMM/600          +0.0186         +0.0227      35726644      36392627      35376506      36179018
BM_SGEMM/700          -0.0410         -0.0374      59584372      57142675      59039666      56832014
BM_SGEMM/800          -0.0314         -0.0317      87766727      85010907      87149617      84383114
BM_SGEMM/1000         +0.1304         +0.1296     167182816     188983717     166070287     187587058
BM_SGEMM/2000         +0.2073         +0.2093    1321452267    1595353440    1311520006    1586033481
BM_DGEMM/3            -0.1628         -0.1631           199           167           198           166
BM_DGEMM/4            -0.1890         -0.1869           194           157           193           157
BM_DGEMM/5            -0.1281         -0.1274           291           254           290           253
BM_DGEMM/6            -0.1152         -0.1145           333           295           332           294
BM_DGEMM/7            -0.0862         -0.0856           451           412           449           411
BM_DGEMM/8            -0.0796         -0.0851           406           373           403           368
BM_DGEMM/9            +0.0748         -0.0324           592           637           591           571
BM_DGEMM/10           -0.0184         -0.0187           691           678           688           675
BM_DGEMM/15           -0.0002         -0.0009          1846          1845          1838          1837
BM_DGEMM/16           +0.0071         +0.0141          1803          1816          1782          1807
BM_DGEMM/20           +0.0232         +0.0212          3260          3335          3250          3318
BM_DGEMM/24           +0.0432         +0.0393          5090          5310          5082          5282
BM_DGEMM/28           +0.0506         +0.0489          7968          8371          7952          8340
BM_DGEMM/31           +0.0511         +0.0499         11331         11910         11302         11867
BM_DGEMM/32           +0.0499         +0.0506         11319         11884         11255         11824
BM_DGEMM/40           +0.0855         +0.0813         21240         23056         21180         22901
BM_DGEMM/63           +0.0524         +0.0511         87471         92055         87180         91632
BM_DGEMM/64           +0.0574         +0.0617         89899         95059         89153         94655
BM_DGEMM/80           +0.0643         +0.0625        170626        181599        170160        180795
BM_DGEMM/100          +0.1211         +0.1199        332286        372528        331204        370918
BM_DGEMM/128          +0.2295         +0.2262        663002        815136        661287        810887
BM_DGEMM/150          +0.2387         +0.2409       1067990       1322946       1061155       1316794
BM_DGEMM/200          +0.2702         +0.2731       2502758       3179106       2485776       3164608
BM_DGEMM/256          +0.1450         +0.1475       5769953       6606508       5718306       6561558
BM_DGEMM/300          +0.0263         +0.0287       9568809       9820421       9482266       9754384
BM_DGEMM/400          +0.1879         +0.1885      20318045      24136138      20162808      23963155
BM_DGEMM/500          +0.1224         +0.1248      42359300      47544546      41982117      47220910
BM_DGEMM/600          +0.1883         +0.1905      68890194      81859604      68292610      81301967
BM_DGEMM/700          +0.1170         +0.1192     112883049     126094318     111896255     125232474
BM_DGEMM/800          +0.1932         +0.1956     161339586     192510553     159930901     191217982
BM_DGEMM/1000         +0.2783         +0.2815     320137741     409224234     317268806     406574772
BM_DGEMM/2000         +0.1939         +0.1981    2571993225    3070723483    2545479728    3049799579

@martin-frbg
Collaborator

Older compilers do not like the static _Atomic int memory_initialized (maybe still something wrong with the workaround for _Atomic in common.h), also I get an abort in the fork utest ("program will terminate because you started too many threads")

@oon3m0oo
Contributor Author

oon3m0oo commented Jun 18, 2018 via email

@martin-frbg
Collaborator

martin-frbg commented Jun 18, 2018

Just trying to redo the failed CI jobs locally to find the causes that are hidden by the QUIET_MAKE setting. (Removing that option only makes it hit the output limit in travis...). And the threads issue is probably related to what I warned sandwichmaker about in #1616 - NUM_BUFFERS is/was allocated as NUM_THREADS*2 to leave room for level3 blas functions to spawn their own threads (fork utest spawns cblas_dgemm jobs).

@martin-frbg
Collaborator

Seems my problem with the fork utest is directly related to the small NUM_THREADS chosen at build time (4 on the dual-core laptop I used yesterday, while the CI runs set NUM_THREADS=32). So perhaps it is this default that needs to be changed, although that test used to complete in seconds even with NUM_THREADS=4. Now, if I just increase the limits in memory.c to get past the "too many threads" warning, I see one thread each spinning in sched_yield with three others waiting on the lock in get_memory_table().

@oon3m0oo
Contributor Author

oon3m0oo commented Jun 19, 2018

The issue is actually that at fork() we need to reinitialize the memory table for the child process. Well, that and also increasing the max thread number.

I think I've also resolved the performance issue for large matrices (needed to pad the alloc_t structure for proper alignment). I'll update the PR soon.
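
One common POSIX way to run code in the child right after fork() is `pthread_atfork`. The sketch below shows that pattern as an illustration of the reinitialization described above. It is an assumption about how such a reset could be wired up, not the code in this PR: `sketch_table`, `sketch_next_slot`, `SKETCH_MAX_THREADS`, and all `sketch_*` functions are hypothetical stand-ins for the real memory table.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define SKETCH_MAX_THREADS 8 /* hypothetical limit */

/* Hypothetical stand-in for the per-thread memory table. */
static void *sketch_table[SKETCH_MAX_THREADS];
static int sketch_next_slot = 0;
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Before fork(): take the lock so the table is not copied mid-update. */
static void sketch_prepare(void) { pthread_mutex_lock(&sketch_lock); }

/* After fork(), in the parent: nothing changed, just release the lock. */
static void sketch_parent(void) { pthread_mutex_unlock(&sketch_lock); }

/* After fork(), in the child: only the forking thread exists, so every
   slot handed out in the parent is stale. Start from an empty table. */
static void sketch_child(void) {
    memset(sketch_table, 0, sizeof(sketch_table));
    sketch_next_slot = 0;
    pthread_mutex_unlock(&sketch_lock);
}

/* Register the three handlers once, e.g. during library init. */
static int sketch_install_fork_handlers(void) {
    return pthread_atfork(sketch_prepare, sketch_parent, sketch_child);
}

/* Hand out the next free slot, or -1 when the table is full. */
static int sketch_claim_slot(void *buffer) {
    int slot = -1;
    assert(buffer != NULL);
    pthread_mutex_lock(&sketch_lock);
    if (sketch_next_slot < SKETCH_MAX_THREADS) {
        slot = sketch_next_slot++;
        sketch_table[slot] = buffer;
    }
    pthread_mutex_unlock(&sketch_lock);
    return slot;
}
```

Holding the lock across fork() in the prepare handler is what makes it safe for the child handler to touch the table and then unlock.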

@oon3m0oo
Contributor Author

Ok updated. @sandwichmaker and I have both seen some small improvements now with large matrices on x86_64, and he's going to test with ARM hopefully later today. This also resolves the fork test (that's a very handy test, btw, I never would have realized we needed to reinitialize things without it).

@oon3m0oo
Contributor Author

Numbers from @sandwichmaker

x86 single threaded
Benchmark                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------
BM_SGEMM/3            -0.1402         -0.1402           130           112           130           112
BM_SGEMM/4            -0.1202         -0.1201           123           108           123           108
BM_SGEMM/5            -0.1094         -0.1094           190           169           190           169
BM_SGEMM/6            -0.0737         -0.0737           208           193           208           193
BM_SGEMM/7            -0.0592         -0.0592           283           266           283           266
BM_SGEMM/8            -0.0941         -0.0941           241           219           241           219
BM_SGEMM/9            -0.0830         -0.0835           277           254           277           253
BM_SGEMM/10           -0.0381         -0.0387           316           304           316           304
BM_SGEMM/15           -0.0209         -0.0214           717           702           717           702
BM_SGEMM/16           -0.0582         -0.0588           557           525           557           525
BM_SGEMM/20           -0.0409         -0.0417           775           743           775           743
BM_SGEMM/24           -0.0272         -0.0278          1131          1100          1131          1099
BM_SGEMM/28           -0.0352         -0.0359          1618          1561          1618          1560
BM_SGEMM/31           -0.0111         -0.0116          2959          2926          2958          2924
BM_SGEMM/32           -0.1132         -0.1138          2231          1979          2231          1977
BM_SGEMM/40           -0.0136         -0.0143          3492          3445          3492          3442
BM_SGEMM/63           -0.0011         -0.0019         14286         14269         14285         14258
BM_SGEMM/64           -0.0496         -0.0503         11087         10536         11086         10529
BM_SGEMM/80           -0.0108         -0.0113         18522         18322         18521         18312
BM_SGEMM/100          +0.0002         -0.0005         34678         34685         34676         34658
BM_SGEMM/128          -0.0054         -0.0060         64793         64446         64791         64404
BM_SGEMM/150          -0.0208         -0.0215        116206        113791        116204        113707
BM_SGEMM/200          -0.0359         -0.0366        232652        224291        232648        224132
BM_SGEMM/256          +0.0050         +0.0045        458916        461224        458892        460938
BM_SGEMM/300          +0.0183         +0.0176        769137        783204        769112        782611
BM_SGEMM/400          +0.0186         +0.0179       1679716       1710927       1679688       1709713
BM_SGEMM/500          +0.0163         +0.0155       3360271       3414975       3360116       3412349
BM_SGEMM/600          +0.0144         +0.0136       5631951       5713038       5631844       5708626
BM_SGEMM/700          +0.0066         +0.0059       9165324       9225406       9164832       9219038
BM_SGEMM/800          +0.0070         +0.0063      13293688      13386690      13293260      13376712
BM_SGEMM/1000         +0.0055         +0.0048      26125555      26269277      26123610      26250128
BM_SGEMM/2000         -0.0003         -0.0011     213193417     213136911     213187787     212943258
BM_DGEMM/3            -0.0964         -0.0970           116           105           116           104
BM_DGEMM/4            -0.0951         -0.0956           121           109           121           109
BM_DGEMM/5            -0.0448         -0.0453           156           149           156           149
BM_DGEMM/6            -0.0675         -0.0681           169           158           169           158
BM_DGEMM/7            +0.0189         +0.0182           235           239           235           239
BM_DGEMM/8            -0.1080         -0.1086           215           192           215           192
BM_DGEMM/9            -0.0519         -0.0526           293           278           293           278
BM_DGEMM/10           -0.0559         -0.0566           326           308           326           308
BM_DGEMM/15           -0.0357         -0.0363           761           734           761           734
BM_DGEMM/16           -0.0579         -0.0586           651           613           651           612
BM_DGEMM/20           -0.0357         -0.0363          1064          1026          1064          1025
BM_DGEMM/24           -0.0324         -0.0329          1513          1464          1513          1463
BM_DGEMM/28           -0.0414         -0.0420          2388          2289          2388          2288
BM_DGEMM/31           -0.0154         -0.0159          3678          3622          3678          3620
BM_DGEMM/32           -0.0132         -0.0138          3181          3139          3181          3137
BM_DGEMM/40           -0.0011         -0.0017          6458          6450          6457          6446
BM_DGEMM/63           +0.0082         +0.0075         21892         22070         21891         22054
BM_DGEMM/64           -0.0027         -0.0034         18912         18860         18911         18846
BM_DGEMM/80           +0.0096         +0.0090         35129         35467         35128         35444
BM_DGEMM/100          -0.0092         -0.0099         63729         63145         63728         63100
BM_DGEMM/128          +0.0251         +0.0244        127077        130271        127074        130169
BM_DGEMM/150          -0.0223         -0.0230        207717        203081        207711        202933
BM_DGEMM/200          -0.0048         -0.0055        429428        427363        429421        427052
BM_DGEMM/256          -0.0066         -0.0073        887551        881720        887529        881049
BM_DGEMM/300          -0.0152         -0.0159       1412681       1391145       1412572       1390162
BM_DGEMM/400          -0.0018         -0.0025       3145866       3140228       3145814       3137949
BM_DGEMM/500          +0.0109         +0.0108       6079168       6145724       6078689       6144121
BM_DGEMM/600          -0.0401         -0.0401      10510484      10088648      10509998      10088474
BM_DGEMM/700          -0.0480         -0.0480      16684748      15883419      16684307      15883138
BM_DGEMM/800          -0.0470         -0.0471      24636362      23477244      24635967      23476820
BM_DGEMM/1000         -0.0302         -0.0302      48059575      46610514      48056409      46606922
BM_DGEMM/2000         -0.0419         -0.0419     376962543     361167789     376956030     361154462

arm32
Benchmark                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------
BM_SGEMM/3            -0.1514         -0.1513           225           191           224           190
BM_SGEMM/4            -0.1873         -0.1877           209           170           208           169
BM_SGEMM/5            -0.1136         -0.1133           318           282           316           281
BM_SGEMM/6            -0.1236         -0.1230           376           329           375           329
BM_SGEMM/7            -0.0706         -0.0706           516           479           514           478
BM_SGEMM/8            -0.0912         -0.0916           424           385           422           383
BM_SGEMM/9            -0.0327         -0.0362           638           617           636           613
BM_SGEMM/10           -0.0363         -0.0359           738           711           736           709
BM_SGEMM/15           -0.0312         -0.0317          1964          1903          1957          1895
BM_SGEMM/16           -0.0230         -0.0231          1807          1765          1801          1759
BM_SGEMM/20           -0.0090         -0.0076          3291          3261          3277          3252
BM_SGEMM/24           -0.0110         -0.0104          5247          5189          5227          5173
BM_SGEMM/28           -0.0046         -0.0039          7972          7935          7942          7912
BM_SGEMM/31           -0.0095         -0.0079         12119         12003         12057         11962
BM_SGEMM/32           -0.0153         -0.0150         11622         11444         11577         11404
BM_SGEMM/40           +0.0035         +0.0036         21367         21442         21300         21377
BM_SGEMM/63           -0.0044         -0.0045         84768         84391         84486         84109
BM_SGEMM/64           -0.0076         -0.0072         83780         83140         83409         82808
BM_SGEMM/80           +0.0058         +0.0065        148103        148955        147468        148432
BM_SGEMM/100          +0.0015         +0.0015        284695        285123        283528        283949
BM_SGEMM/128          -0.0032         -0.0022        591203        589318        588843        587560
BM_SGEMM/150          +0.0050         +0.0048        967000        971841        963291        967925
BM_SGEMM/200          -0.0093         -0.0092       2224218       2203425       2215125       2194807
BM_SGEMM/256          -0.0076         -0.0081       4656337       4620919       4641226       4603836
BM_SGEMM/300          -0.0011         -0.0004       7507302       7499072       7471123       7467913
BM_SGEMM/400          +0.0085         +0.0071      18188756      18342734      18095715      18223895
BM_SGEMM/500          +0.0046         +0.0052      36179631      36346202      35968018      36154920
BM_SGEMM/600          +0.0146         +0.0148      62401929      63313860      62020056      62936028
BM_SGEMM/700          +0.0167         +0.0150      97957004      99589876      97461377      98925326
BM_SGEMM/800          +0.0062         +0.0050     145055233     145955254     144315651     145042267
BM_SGEMM/1000         +0.0110         +0.0090     287127684     290297295     285873309     288454658
BM_SGEMM/2000         +0.0043         +0.0042    2316450282    2326489398    2302788641    2312403125
BM_DGEMM/3            -0.1669         -0.1664           227           189           226           188
BM_DGEMM/4            -0.1879         -0.1881           226           184           225           183
BM_DGEMM/5            -0.1329         -0.1313           338           293           336           292
BM_DGEMM/6            -0.0906         -0.0911           389           354           388           352
BM_DGEMM/7            -0.0813         -0.0804           542           498           540           496
BM_DGEMM/8            -0.0815         -0.0822           478           439           477           438
BM_DGEMM/9            -0.0450         -0.0457           701           670           699           667
BM_DGEMM/10           -0.0409         -0.0391           813           780           809           777
BM_DGEMM/15           -0.0253         -0.0260          2168          2113          2162          2105
BM_DGEMM/16           -0.0197         -0.0195          2161          2119          2153          2111
BM_DGEMM/20           -0.0102         -0.0104          3919          3879          3906          3866
BM_DGEMM/24           -0.0032         -0.0031          6271          6251          6249          6230
BM_DGEMM/28           -0.0076         -0.0077          9817          9742          9783          9708
BM_DGEMM/31           +0.0027         +0.0023         13917         13954         13870         13901
BM_DGEMM/32           +0.0065         +0.0059         13882         13972         13836         13918
BM_DGEMM/40           -0.0089         -0.0092         26576         26339         26477         26233
BM_DGEMM/63           +0.0026         +0.0024        104861        105130        104466        104721
BM_DGEMM/64           -0.0057         -0.0052        104748        104152        104375        103830
BM_DGEMM/80           +0.0063         +0.0065        200605        201876        199859        201164
BM_DGEMM/100          +0.0078         +0.0065        387187        390199        385736        388242
BM_DGEMM/128          -0.0021         -0.0015        812949        811264        809869        808667
BM_DGEMM/150          -0.0014         -0.0011       1310261       1308367       1304496       1303104
BM_DGEMM/200          -0.0136         -0.0129       3084951       3043143       3071900       3032121
BM_DGEMM/256          +0.0070         +0.0056       6639244       6685952       6597562       6634618
BM_DGEMM/300          +0.0193         +0.0182      10835401      11044842      10767216      10962956
BM_DGEMM/400          +0.0087         +0.0075      26730971      26962244      26516950      26715629
BM_DGEMM/500          +0.0060         +0.0088      53069164      53387650      52551398      53015149
BM_DGEMM/600          -0.0029         -0.0025      90205543      89946877      89419416      89199421
BM_DGEMM/700          +0.0189         +0.0182     143591108     146309046     142260136     144855770
BM_DGEMM/800          +0.0125         +0.0104     217679865     220393234     215784205     218026011
BM_DGEMM/1000         -0.0038         -0.0018     424729287     423094287     420740836     419976307
BM_DGEMM/2000         -0.0283         -0.0257    3410817267    3314352882    3379507829    3292534611

ARM64
Benchmark                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------
BM_SGEMM/3            -0.1801         -0.1808           183           150           183           150
BM_SGEMM/4            -0.2221         -0.2223           165           129           165           128
BM_SGEMM/5            -0.1503         -0.1490           254           216           253           215
BM_SGEMM/6            -0.1148         -0.1148           278           246           277           245
BM_SGEMM/7            -0.1021         -0.0999           381           342           379           341
BM_SGEMM/8            -0.1308         -0.1309           286           249           285           248
BM_SGEMM/9            -0.0727         -0.0733           425           394           423           392
BM_SGEMM/10           -0.0625         -0.0634           487           457           486           455
BM_SGEMM/15           -0.0256         -0.0259          1255          1222          1251          1218
BM_SGEMM/16           -0.0312         -0.0310           972           942           969           939
BM_SGEMM/20           -0.0182         -0.0172          1842          1809          1836          1804
BM_SGEMM/24           -0.0324         -0.0318          2823          2731          2812          2722
BM_SGEMM/28           -0.0071         -0.0079          4431          4399          4418          4383
BM_SGEMM/31           -0.0127         -0.0119          6761          6675          6734          6654
BM_SGEMM/32           +0.0065         +0.0054          5660          5697          5641          5671
BM_SGEMM/40           +0.0095         +0.0095         10900         11004         10864         10968
BM_SGEMM/63           -0.0034         -0.0018         45639         45484         45326         45245
BM_SGEMM/64           +0.0112         +0.0110         40740         41195         40597         41045
BM_SGEMM/80           +0.0004         +0.0007         80835         80867         80559         80616
BM_SGEMM/100          -0.0068         -0.0062        160149        159052        159505        158521
BM_SGEMM/128          -0.0171         -0.0153        322682        317171        321263        316338
BM_SGEMM/150          +0.0064         +0.0067        528288        531688        526221        529752
BM_SGEMM/200          -0.0016         -0.0010       1205107       1203210       1199813       1198562
BM_SGEMM/256          -0.0020         -0.0020       2467486       2462612       2457177       2452213
BM_SGEMM/300          -0.0066         -0.0048       4163765       4136379       4137890       4117835
BM_SGEMM/400          +0.0073         +0.0068       9785710       9856740       9715929       9781537
BM_SGEMM/500          +0.0064         +0.0080      21277548      21412785      21093481      21261741
BM_SGEMM/600          +0.0184         +0.0189      34069150      34696921      33752403      34391422
BM_SGEMM/700          +0.0053         +0.0063      55902636      56200050      55408250      55759368
BM_SGEMM/800          +0.0494         +0.0443      80430766      84406525      79784959      83320575
BM_SGEMM/1000         +0.0184         +0.0182     158663518     161589574     157209684     160077747
BM_SGEMM/2000         -0.0451         -0.0465    1327357841    1267531897    1317230635    1255992878
BM_DGEMM/3            -0.1890         -0.1896           195           159           195           158
BM_DGEMM/4            -0.1946         -0.1941           188           152           187           151
BM_DGEMM/5            -0.1235         -0.1230           282           247           281           246
BM_DGEMM/6            -0.1058         -0.1062           322           288           321           287
BM_DGEMM/7            -0.0737         -0.0737           433           401           432           400
BM_DGEMM/8            -0.0879         -0.0872           395           360           393           359
BM_DGEMM/9            -0.0697         -0.0686           580           539           577           538
BM_DGEMM/10           -0.0582         -0.0578           676           636           673           634
BM_DGEMM/15           -0.0059         -0.0065          1768          1758          1763          1752
BM_DGEMM/16           -0.0028         -0.0043          1742          1737          1734          1727
BM_DGEMM/20           -0.0126         -0.0132          3180          3140          3171          3129
BM_DGEMM/24           -0.0048         -0.0065          5000          4976          4981          4949
BM_DGEMM/28           -0.0018         -0.0027          7760          7746          7737          7716
BM_DGEMM/31           -0.0068         -0.0066         10991         10916         10957         10885
BM_DGEMM/32           +0.0189         +0.0182         10979         11186         10940         11139
BM_DGEMM/40           +0.0222         +0.0219         21417         21894         21348         21815
BM_DGEMM/63           +0.0098         +0.0080         85733         86574         85345         86026
BM_DGEMM/64           -0.0036         -0.0029         87114         86803         86778         86530
BM_DGEMM/80           +0.0020         +0.0013        164618        164944        164113        164318
BM_DGEMM/100          -0.0118         -0.0115        323955        320116        322747        319024
BM_DGEMM/128          -0.0087         -0.0069        645043        639421        642025        637619
BM_DGEMM/150          -0.0034         -0.0030       1031996       1028522       1028342       1025276
BM_DGEMM/200          +0.0505         +0.0508       2404959       2526462       2392459       2514104
BM_DGEMM/256          +0.0247         +0.0228       5594583       5732863       5555938       5682435
BM_DGEMM/300          +0.0063         +0.0060       9169609       9227075       9086632       9141260
BM_DGEMM/400          +0.0225         +0.0216      19622516      20063782      19459674      19879953
BM_DGEMM/500          +0.0011         +0.0022      41092464      41137002      40658344      40749409
BM_DGEMM/600          +0.0015         +0.0017      66975611      67077867      66338838      66450146
BM_DGEMM/700          +0.0122         +0.0118     109909855     111246018     108909513     110195158
BM_DGEMM/800          +0.0074         +0.0079     157584755     158744162     155992702     157220261
BM_DGEMM/1000         +0.0039         +0.0047     308587635     309800266     305735100     307168773
BM_DGEMM/2000         -0.0019         -0.0018    2474572642    2469830610    2450644724    2446135023

@martin-frbg
Collaborator

Seems get_memory_table does not have an implementation on Windows yet, and next_memory_table_pos only exists in the COMPILER_TLS case (which breaks the xcode builds).
Compile-testing on mips32 now, not sure if I'll bother to run any benchmarks on that poor little router...

@oon3m0oo force-pushed the develop branch 2 times, most recently from 79ff1f8 to e9d6864 (June 19, 2018 16:31)
@oon3m0oo
Contributor Author

Whoops, I missed the ALLOC_MMAP ifdef; it should have been below the new code (this should fix Windows). I also fixed the ifdef. I should really get set up on more platforms....

void (*func)(struct alloc_t *);
/* Pad to 64-byte alignment */
#ifdef __64BIT__
char pad[48];

This seems fragile. Is there no way for the compiler to do the padding automatically, or to use a macro here that deduces the size and does the right thing?

Contributor Author

Done.
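
For reference, one way to let the compiler derive the padding is to compute the pad length from the sizes of the other members and verify the result at compile time. This is only a sketch: the struct name `demo_alloc_header`, the `used` field and `DEMO_HEADER_SIZE` are assumptions, and it assumes all function pointers have the same size, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HEADER_SIZE 64  /* assumed target size, taken from the "64-byte" comment */

struct demo_alloc_header {
    int used;                                   /* hypothetical field */
    int attr;
    void (*func)(struct demo_alloc_header *);
    /* Pad length is derived from the members above instead of a literal 48. */
    char pad[DEMO_HEADER_SIZE - 2 * sizeof(int) - sizeof(void (*)(void))];
};

/* Fails the build if a new member or unexpected compiler padding changes the size. */
static_assert(sizeof(struct demo_alloc_header) == DEMO_HEADER_SIZE,
              "demo_alloc_header must be exactly DEMO_HEADER_SIZE bytes");
```

On a 64-bit target the expression evaluates to 48, matching the hard-coded value, and on a 32-bit target it grows without needing a separate #ifdef.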

#endif
};

static const int allocation_block_size = BUFFER_SIZE + sizeof(struct alloc_t);

Add a comment.

Contributor Author

Done, and added more documentation for the struct and why it exists, too.
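
To illustrate why the block size is the buffer size plus the size of the struct, here is a minimal sketch of keeping the release information inside the allocation itself by over-allocating. It uses `malloc` as a stand-in for the real allocators, and every `demo_*` name and `DEMO_BUFFER_SIZE` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_BUFFER_SIZE 4096  /* stand-in for BUFFER_SIZE */

/* Bookkeeping stored at the start of every block. */
struct demo_header {
    int attr;
    void (*func)(struct demo_header *);
};

static int demo_release_count = 0;

/* Release routine recorded in the header at allocation time. */
static void demo_release(struct demo_header *header) {
    demo_release_count++;
    free(header);
}

/* Allocate header + buffer in one block and hand out the part after the header. */
static void *demo_alloc(void) {
    struct demo_header *header = malloc(sizeof(struct demo_header) + DEMO_BUFFER_SIZE);
    if (header == NULL) return NULL;
    header->attr = 0;
    header->func = demo_release;
    return (char *)header + sizeof(struct demo_header);
}

/* Step back from the user pointer to the header; no external table is consulted. */
static void demo_free(void *buffer) {
    struct demo_header *header =
        (struct demo_header *)((char *)buffer - sizeof(struct demo_header));
    header->func(header);
}
```

Because the header sits directly in front of the buffer, freeing needs neither a lookup nor a lock. In this sketch the header is not padded, so the buffer is only as aligned as `malloc` plus the header size allows; padding the header to a fixed size keeps the buffer alignment predictable.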


static const int allocation_block_size = BUFFER_SIZE + sizeof(struct alloc_t);

/* TLS is supported from clang 2.8, gcc 4.1, MSVC 2005, and XCode 8 */

This is a rather convoluted test. Perhaps break it into multiple define statements that are easier to check and modify?

Something along the lines of the Android test you have below, which is considerably easier to read.

Contributor Author

Done.
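
As an illustration of splitting one long condition into smaller defines, the sketch below first derives a version number per toolchain and then applies one short test per toolchain. The thresholds come from the code comment quoted above; the `DEMO_*` names are hypothetical, and treating XCode 8 as Apple clang 8.0 and MSVC 2005 as `_MSC_VER` 1400 are assumptions of this sketch.

```c
#include <assert.h>

/* Step 1: one version number per toolchain. */
#if defined(__clang__)
#  define DEMO_CLANG_VERSION (__clang_major__ * 100 + __clang_minor__)
#elif defined(__GNUC__)
#  define DEMO_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
#endif /* __clang__ / __GNUC__ */

/* Step 2: one short, separately editable test per toolchain. */
#if defined(__APPLE__) && defined(DEMO_CLANG_VERSION)
#  if DEMO_CLANG_VERSION >= 800      /* XCode 8 (assumed Apple clang 8.0) */
#    define DEMO_HAS_COMPILER_TLS 1
#  endif
#elif defined(DEMO_CLANG_VERSION)
#  if DEMO_CLANG_VERSION >= 208      /* clang 2.8 */
#    define DEMO_HAS_COMPILER_TLS 1
#  endif
#elif defined(DEMO_GCC_VERSION)
#  if DEMO_GCC_VERSION >= 401        /* gcc 4.1 */
#    define DEMO_HAS_COMPILER_TLS 1
#  endif
#elif defined(_MSC_VER)
#  if _MSC_VER >= 1400               /* MSVC 2005 */
#    define DEMO_HAS_COMPILER_TLS 1
#  endif
#endif /* toolchain checks */

/* Reports the outcome of the checks so the rest of the code tests one macro only. */
static int demo_has_compiler_tls(void) {
#if defined(DEMO_HAS_COMPILER_TLS)
    return 1;
#else
    return 0;
#endif /* DEMO_HAS_COMPILER_TLS */
}
```

Each toolchain now has its own two-line test, so adding or adjusting a threshold touches a single branch.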


/* Holds pointers to allocated memory */
#if defined(SMP) && !defined(USE_OPENMP)
#define MAX_ALLOCATING_THREADS MAX_CPU_NUMBER * 2 * MAX_PARALLEL_NUMBER

What is the logic behind this number?

Use indentation here to make the logic easier to read.

Contributor Author

Done. This is how we do it internally.

static DWORD local_storage_key;
#else
static pthread_key_t local_storage_key;
#endif

Since the #else and #endif lines are so deeply nested, consider adding a trailing comment, as in

#endif // foo

to indicate which block is ending.

Contributor Author

Done here and elsewhere.
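
A sketch of what annotated, indented conditionals can look like for the storage-key fallback quoted above, where a single index lives in thread-local storage. All `demo_*` names are hypothetical, and the compiler check is deliberately simplified to gcc and clang; this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_HAS_COMPILER_TLS 1
#endif /* __GNUC__ || __clang__ */

#if defined(DEMO_HAS_COMPILER_TLS)
static __thread intptr_t demo_index = 0;
#else /* !DEMO_HAS_COMPILER_TLS */
#  if defined(_WIN32)
#    include <windows.h>
static DWORD demo_key;
#  else /* !_WIN32 */
#    include <pthread.h>
static pthread_key_t demo_key;
#  endif /* _WIN32 */
#endif /* DEMO_HAS_COMPILER_TLS */

/* Call once before any thread uses the index; returns 0 on success. */
static int demo_index_init(void) {
#if defined(DEMO_HAS_COMPILER_TLS)
    return 0;
#elif defined(_WIN32)
    demo_key = TlsAlloc();
    return demo_key == TLS_OUT_OF_INDEXES ? -1 : 0;
#else /* pthreads */
    return pthread_key_create(&demo_key, NULL) == 0 ? 0 : -1;
#endif /* DEMO_HAS_COMPILER_TLS / _WIN32 / pthreads */
}

/* Store this thread's index. */
static void demo_index_set(intptr_t value) {
#if defined(DEMO_HAS_COMPILER_TLS)
    demo_index = value;
#elif defined(_WIN32)
    TlsSetValue(demo_key, (LPVOID)value);
#else /* pthreads */
    pthread_setspecific(demo_key, (void *)value);
#endif /* DEMO_HAS_COMPILER_TLS / _WIN32 / pthreads */
}

/* Read this thread's index; a thread that never set it sees 0 in all three variants. */
static intptr_t demo_index_get(void) {
#if defined(DEMO_HAS_COMPILER_TLS)
    return demo_index;
#elif defined(_WIN32)
    return (intptr_t)TlsGetValue(demo_key);
#else /* pthreads */
    return (intptr_t)pthread_getspecific(demo_key);
#endif /* DEMO_HAS_COMPILER_TLS / _WIN32 / pthreads */
}
```

The trailing comments make it possible to tell which condition an `#else` or `#endif` belongs to without scrolling back to the opening `#if`.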


}

static void *alloc_windows(void *address){
void *map_address;

map_address = VirtualAlloc(address,
- BUFFER_SIZE,
+ allocation_block_size,
MEM_RESERVE | MEM_COMMIT,
PAGE_READWRITE);

if (map_address == (void *)NULL) map_address = (void *)-1;

if (map_address != (void *)-1) {

document case.

Contributor Author

Still not sure what you mean here.


/* Memory allocation routine */
/* procpos ... indicates where it comes from */
/* 0 : Level 3 functions */
/* 1 : Level 2 functions */
/* 2 : Thread */

static void blas_memory_init(){

See the earlier notes about indentation and about better documenting the #else and #endif lines.

Contributor Author

Done.

- release_info[release_pos].attr = fd;
- release_info[release_pos].func = alloc_hugetlbfile_free;
- release_pos ++;
+ struct alloc_t *alloc_info = (struct alloc_t *)map_address;

This type of if statement and code pattern is repeated many times. Could it be made simpler, perhaps via a macro?

Contributor Author

Done.
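
One possible shape for such a macro is sketched below. It wraps the repeated "if the mapping succeeded, record the release information in the block header" pattern; the macro name, the struct `demo_alloc_t` and the release function are hypothetical and only mirror the `attr` and `func` fields visible in the quoted snippets.

```c
#include <assert.h>
#include <stddef.h>

struct demo_alloc_t {
    long attr;
    void (*func)(struct demo_alloc_t *);
};

/* Records release information at the start of a freshly mapped block.
   Does nothing when the mapping failed, i.e. address == (void *)-1. */
#define DEMO_STORE_RELEASE_INFO(address, attribute, release_func)             \
    do {                                                                       \
        if ((address) != (void *)-1) {                                         \
            struct demo_alloc_t *info_ = (struct demo_alloc_t *)(address);     \
            info_->attr = (attribute);                                         \
            info_->func = (release_func);                                      \
        }                                                                      \
    } while (0)

/* Placeholder release routine used only to have something to store. */
static void demo_noop_release(struct demo_alloc_t *alloc_info) {
    (void)alloc_info;
}
```

The `do { ... } while (0)` wrapper lets the macro be used like a single statement, including inside an unbraced `if`. A `static inline` function would give the same deduplication with type checking, at the cost of fixing the argument types.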

@@ -770,9 +845,9 @@ static void *alloc_devicedirver(void *address){

#ifdef ALLOC_SHM

- static void alloc_shm_free(struct release_t *release){
+ static void alloc_shm_free(struct alloc_t *release){

release -> alloc_info everywhere.

Contributor Author

Done.

- release_info[release_pos].attr = fd;
- release_info[release_pos].func = alloc_devicedirver_free;
- release_pos ++;
+ struct alloc_t *alloc_info = (struct alloc_t *)map_address;

Replace with a macro?

Contributor Author

Done.

Contributor Author

@oon3m0oo oon3m0oo left a comment

Not sure if this is going to work since I have to force update the branch, but I've resolved all the comments.

@oon3m0oo
Contributor Author

Hmm... looks like GitHub's review system doesn't work like gerrit or our internal tools; you can't diff with the previous push, since you have to overwrite the commit.

@martin-frbg
Collaborator

I believe one can still view changes relative to your original commit (by choosing that option in the "Changes from" dropdown on the diff page), and reviewers have an additional option to "view changes since your last review" there.

@oon3m0oo
Contributor Author

Aha. I think that will work if @sandwichmaker sends his comments as a review, then I can diff against that. Right now it doesn't have anything in "changes since last review."

@martin-frbg
Collaborator

Something is wrong with the Windows ::TlsGetValue/::TlsSetValue now, perhaps just a missing include?

@oon3m0oo
Contributor Author

oon3m0oo commented Jun 20, 2018

When I refactored the HAS_COMPILER_TLS block I left out the MSVC one to use __declspec(thread). Those functions should be defined in winbase.h, which comes in from windows.h, which is included through common.h. Either way, this should resolve it, though I wonder if we can just remove the Tls functions since it's doubtful anyone is still using MSVC pre-2005.

@sandwichmaker sandwichmaker left a comment

Thank you for making the changes. The code looks really nice to read.

One minor comment about MAX_ALLOCATING_THREADS, but otherwise good to go from me.

/* Holds pointers to allocated memory */
#if defined(SMP) && !defined(USE_OPENMP)
/* This is the number of threads than can be spawned by the server, which is the
server plus the number of threads in the thread pool */

I do not understand how the comment leads to this particular computation of number of threads.

@sandwichmaker

nice cleanup @oon3m0oo.

@martin-frbg dropping the ifdefs that come from MSVC pre 2005 is a good idea. Does that mean we will be able to get rid of the windows block for compilers without TLS?

@oon3m0oo
Contributor Author

If we dropped support for pre-2005 then yes, the windows block would go away, and we'd only have the pthreads part.

I believe the number is discussed in #1616, since this determines the number of threads.

@martin-frbg
Collaborator

I am always a bit sentimental about removing working code for old systems, there could be someone somewhere still relying on it. Twelve years may be a bit too old to be useful though (but it matches the ancient hardware still supported, Core2 would have been brand new back then).
In any case, removal should be done through a separate PR I think.

@martin-frbg
Collaborator

ready for merging ?

@oon3m0oo
Contributor Author

Hmm... before you merge this, I see some fortran test failures on my side with a couple of tests, let me see if that's this change.

@oon3m0oo
Contributor Author

Ok this was only in my internal branch that has lots of other changes to help build things internally, everything passes fine on my github repo.

@oon3m0oo
Contributor Author

oon3m0oo commented Jun 20, 2018

The issue I see is an illegal instruction in zgemv.c when using STACK_ALLOC: when the requested allocation size is larger than MAX_STACK_ALLOC, the code ends up creating an array of size 0, which is indeed illegal. This doesn't seem to be related to the memory.c changes, though.

Aha, the issue only appears in unoptimized builds. I'll send another PR to fix this.
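
To make the failure mode concrete: a variable-length array must have at least one element, so a "use the stack if small, otherwise the heap" scheme has to avoid declaring a zero-length array on the heap path. The sketch below shows one way to do that. It is not the STACK_ALLOC macro itself; `DEMO_MAX_STACK_ALLOC` and `demo_sum_of_ones` are hypothetical, and it assumes a compiler with variable-length array support.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_STACK_ALLOC 2048  /* bytes; stand-in for MAX_STACK_ALLOC */

/* Fills a scratch buffer of n doubles with 1.0 and returns their sum.
   Small requests use the stack, large ones fall back to the heap.
   Returns -1.0 if the heap allocation fails. */
static double demo_sum_of_ones(size_t n) {
    if (n == 0) return 0.0;

    /* 0 means "too large for the stack, use the heap instead". */
    size_t stack_count = (n <= DEMO_MAX_STACK_ALLOC / sizeof(double)) ? n : 0;

    /* Never declare a zero-length array: keep one dummy element on the heap path. */
    double stack_buffer[stack_count ? stack_count : 1];
    double *buffer = stack_count ? stack_buffer : malloc(n * sizeof(double));
    if (buffer == NULL) return -1.0;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) buffer[i] = 1.0;
    for (size_t i = 0; i < n; i++) sum += buffer[i];

    if (!stack_count) free(buffer);
    return sum;
}
```

Calling `demo_sum_of_ones(16)` stays on the stack, while `demo_sum_of_ones(100000)` exceeds the limit and takes the heap path with a one-element dummy array.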

@martin-frbg
Collaborator

Sounds unrelated but decidedly unhealthy, probably best to create a new issue for that

@oon3m0oo
Contributor Author

Sent out #1631.
