Update table rows in C #1552

jeromekelleher · 2021-07-05T10:39:38Z

This is the simplest way I could think of doing #1545

There's a lot of different ways this could be made more efficient, but I think it's probably not worth trying unless we find it's a genuine bottleneck. I had a go at doing it more efficiently and (a) it'd be a lot more core, and (b) there's all sorts of tricky subtleties around memcpy'ing potentially overlapping buffers and changing pointer values because of potential reallocs. This is as dumb as rocks, but should be solid.

Any thoughts @benjeffery?

benjeffery · 2021-07-05T11:09:28Z

Looks good, and pretty rock solid as it uses the existing API. I feel we're really not sure this a bottleneck so happy to proceed with this version. TSK_NO_EDGE_METADATA would be an easy exception if that was ever a bottleneck as that requires no reallocs.

jeromekelleher · 2021-07-05T12:34:36Z

OK, will tap away at this in my procrastination time.

petrelharp · 2021-07-05T14:36:21Z

There should probably be a comment in the docs that one should not use this method to change every row in a table if one is concerned about speed.

jeromekelleher · 2021-07-05T17:10:24Z

I changed the signature here to match add_row, as this seemed a bit more natural as I started writing tests. Also added in a special case for the (probably common?) case in which we're replacing same-size values. This would make things like setting rows in the middle of the edge table, or say, changing the flags value for a node, much more tolerable. I think it's a nice idiom to do:

tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)

so we should make it efficient to do this.

codecov · 2021-07-05T17:34:21Z

Codecov Report

Merging #1552 (d47dd78) into main (f83ee53) will decrease coverage by 0.12%.
The diff coverage is 85.19%.

❗ Current head d47dd78 differs from pull request most recent head daa252f. Consider uploading reports for the commit daa252f to get more accurate results

@@            Coverage Diff             @@
##             main    #1552      +/-   ##
==========================================
- Coverage   93.75%   93.62%   -0.13%     
==========================================
  Files          27       27              
  Lines       22647    22915     +268     
  Branches     1079     1076       -3     
==========================================
+ Hits        21232    21454     +222     
- Misses       1381     1427      +46     
  Partials       34       34

Flag	Coverage Δ
c-tests	`91.89% <85.19%> (-0.22%)`	⬇️
lwt-tests	`92.97% <ø> (ø)`
python-c-tests	`95.29% <ø> (-0.01%)`	⬇️
python-tests	`98.80% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
c/tskit/tables.c	`89.96% <85.19%> (-0.27%)`	⬇️
c/tskit/core.c	`97.07% <0.00%> (-0.02%)`	⬇️
c/tskit/trees.c	`94.84% <0.00%> (-0.02%)`	⬇️
python/_tskitmodule.c	`91.83% <0.00%> (-0.02%)`	⬇️
python/tskit/trees.py	`97.94% <0.00%> (-0.01%)`	⬇️
python/tskit/util.py	`100.00% <0.00%> (ø)`
python/tskit/tables.py	`98.90% <0.00%> (ø)`
python/tskit/__init__.py	`100.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f83ee53...daa252f. Read the comment docs.

jeromekelleher · 2021-07-06T09:46:06Z

It would be good to get some eyes on this now @benjeffery before I embark on the copy-paste-athon. Notes:

I change the name to "update_row" as this seemed clearer
I added the special case for equal sizes, as this seems important.
I realised that there was some nasty undefined behaviour lurking with our use of memcpy, if someone used the result of get_row as input (which seems a perfectly reasonable thing to do). This is also a possibility in the full rewrite case, so I thought it safer to change over to memmove there also. I'm sure the perf difference will be undetectable for our purposes.

Annoyingly, valgrind doesn't catch these undefined behaviours with memcpy on linux as it seems that under the hood somewhere memcpy is being replaced with memmove, and valgrind therefore doesn't see it. So, we'd have defined behaviour on some platforms, but not on others ☹️

So, my take is to change to memmove anywhere we're taking pointers as input just to be on the safe side.

grahamgower · 2021-07-06T10:00:28Z

Annoyingly, valgrind doesn't catch these undefined behaviours with memcpy on linux as it seems that under the hood somewhere memcpy is being replaced with memmove, and valgrind therefore doesn't see it. So, we'd have defined behaviour on some platforms, but not on others

This is probably compiler dependent. At least with GCC, memcpy can be inlined as an ordinary copying loop (e.g. using SSE instructions).

benjeffery

LGTM!
I'm happy with using memmove throughout - I doubt the perf gain of memcpy is significant.

c/tskit/tables.h

c/tskit/tables.c

jeromekelleher · 2021-07-12T06:00:45Z

This is ready to go now I think - there's a lot of code, so I think it's best to wait until @benjeffery gets back to merge.

benjeffery

LGTM, been through closely.

jeromekelleher marked this pull request as draft July 5, 2021 12:34

jeromekelleher force-pushed the set-row branch from d676d78 to b95e7de Compare July 5, 2021 17:12

jeromekelleher force-pushed the set-row branch from b95e7de to 85b83d5 Compare July 6, 2021 09:40

benjeffery reviewed Jul 6, 2021

View reviewed changes

c/tskit/tables.h Outdated Show resolved Hide resolved

c/tskit/tables.c Show resolved Hide resolved

c/tskit/tables.c Show resolved Hide resolved

jeromekelleher force-pushed the set-row branch from d47dd78 to 8be5efb Compare July 12, 2021 05:59

jeromekelleher marked this pull request as ready for review July 12, 2021 05:59

jeromekelleher changed the title ~~Proposal for Table set row~~ Update table rows in C Jul 12, 2021

jeromekelleher force-pushed the set-row branch from 8be5efb to 390bcde Compare July 12, 2021 10:25

jeromekelleher mentioned this pull request Jul 17, 2021

Support storing 64 bit offsets #1571

Merged

benjeffery approved these changes Jul 19, 2021

View reviewed changes

benjeffery added the AUTOMERGE-REQUESTED label Jul 19, 2021

C update row infrastructure

daa252f

jeromekelleher force-pushed the set-row branch from 390bcde to daa252f Compare July 19, 2021 12:32

mergify bot merged commit 12dfc49 into tskit-dev:main Jul 19, 2021

mergify bot removed the AUTOMERGE-REQUESTED label Jul 19, 2021

jeromekelleher deleted the set-row branch July 19, 2021 12:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update table rows in C #1552

Update table rows in C #1552

jeromekelleher commented Jul 5, 2021

Uh oh!

benjeffery commented Jul 5, 2021

Uh oh!

jeromekelleher commented Jul 5, 2021

Uh oh!

petrelharp commented Jul 5, 2021

Uh oh!

jeromekelleher commented Jul 5, 2021

Uh oh!

codecov bot commented Jul 5, 2021 •

edited

Loading

Uh oh!

jeromekelleher commented Jul 6, 2021

Uh oh!

grahamgower commented Jul 6, 2021

Uh oh!

benjeffery left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jeromekelleher commented Jul 12, 2021

Uh oh!

benjeffery left a comment

Uh oh!

Uh oh!

Update table rows in C #1552

Update table rows in C #1552

Conversation

jeromekelleher commented Jul 5, 2021

Uh oh!

benjeffery commented Jul 5, 2021

Uh oh!

jeromekelleher commented Jul 5, 2021

Uh oh!

petrelharp commented Jul 5, 2021

Uh oh!

jeromekelleher commented Jul 5, 2021

Uh oh!

codecov bot commented Jul 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jeromekelleher commented Jul 6, 2021

Uh oh!

grahamgower commented Jul 6, 2021

Uh oh!

benjeffery left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jeromekelleher commented Jul 12, 2021

Uh oh!

benjeffery left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Jul 5, 2021 •

edited

Loading