[Syntax] Use BumpPtrAllocator for Syntax node internals #2925

Merged
merged 2 commits into from
Jan 7, 2025

Conversation

rintaro
Member

@rintaro rintaro commented Dec 21, 2024

Rework Syntax internals. "Red" tree is now bump-pointer allocated and visited children are cached.

[Figure: overview diagram]
  • Syntax is a pair of the allocator (strong reference to SyntaxDataArena class) and a pointer to the allocated data (SyntaxData struct).
  • All the red-tree data (parent and positions) are bump-pointer-allocated and cached.
  • The arena and the tree are 1:1. Each mutation creates a new arena.
  • child(at:) is now O(1) because the red-tree children are cached in the arena.
  • root is now O(1) regardless of the depth because the arena holds the reference.
  • Simplify SyntaxChildren. SyntaxChildrenIndex is now just an Int.
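
The layout described in the list above can be illustrated with a much-simplified sketch. It is not the actual SyntaxDataArena implementation: `Green`, `RedData`, `RedArena`, and the fixed 4096-byte slab size are hypothetical names and values chosen for illustration, and the sketch is not thread-safe. It only shows the idea of bump-pointer allocating red nodes and caching each visited child in a slot owned by its parent.

```swift
/// Immutable "green" node; a stand-in for RawSyntax in this sketch.
final class Green {
  let name: String
  let children: [Green]
  init(_ name: String, _ children: [Green] = []) {
    self.name = name
    self.children = children
  }
}

typealias RedRef = UnsafeMutablePointer<RedData>

/// "Red" data stored in arena memory. Every field is a trivial type, so the
/// arena can free whole slabs without running per-node destructors.
struct RedData {
  var green: Unmanaged<Green>
  var parent: RedRef?
  var slots: UnsafeMutablePointer<RedRef?>?  // one cache slot per child
}

final class RedArena {
  let rootGreen: Green  // strong reference keeps the green tree alive
  private(set) var root: RedRef! = nil
  private var slabs: [UnsafeMutableRawBufferPointer] = []
  private var used = 0  // bytes consumed in the newest slab

  init(root green: Green) {
    rootGreen = green
    root = makeNode(green, parent: nil)
  }

  deinit { slabs.forEach { $0.deallocate() } }

  /// Bump-pointer allocation: align the offset, advance it, add a slab if full.
  private func allocate<T>(_: T.Type, count: Int) -> UnsafeMutablePointer<T> {
    let size = MemoryLayout<T>.stride * count
    let mask = MemoryLayout<T>.alignment - 1
    var start = (used + mask) & ~mask
    if slabs.isEmpty || start + size > slabs[slabs.count - 1].count {
      slabs.append(
        UnsafeMutableRawBufferPointer.allocate(byteCount: max(4096, size), alignment: 16)
      )
      start = 0
    }
    used = start + size
    let base = slabs[slabs.count - 1].baseAddress!
    return base.advanced(by: start).bindMemory(to: T.self, capacity: count)
  }

  private func makeNode(_ green: Green, parent: RedRef?) -> RedRef {
    let node = allocate(RedData.self, count: 1)
    var slots: UnsafeMutablePointer<RedRef?>? = nil
    if !green.children.isEmpty {
      let buffer = allocate(Optional<RedRef>.self, count: green.children.count)
      buffer.initialize(repeating: nil, count: green.children.count)
      slots = buffer
    }
    node.initialize(to: RedData(green: .passUnretained(green), parent: parent, slots: slots))
    return node
  }

  /// Returns the cached child if it was visited before; otherwise creates it.
  func child(of node: RedRef, at index: Int) -> RedRef {
    let slot = node.pointee.slots!.advanced(by: index)
    if let cached = slot.pointee { return cached }
    let green = node.pointee.green.takeUnretainedValue().children[index]
    let child = makeNode(green, parent: node)
    slot.pointee = child
    return child
  }
}

let tree = Green("file", [Green("func", [Green("name"), Green("body")]), Green("eof")])
let arena = RedArena(root: tree)
let fn = arena.child(of: arena.root, at: 0)
let body1 = arena.child(of: fn, at: 1)
let body2 = arena.child(of: fn, at: 1)  // second visit hits the cached slot
```

In this sketch, a repeated `child(of:at:)` call returns the same pointer without allocating, and the root is reachable directly from the arena rather than by walking parent links.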

Remove some hacks as they aren't needed anymore:

  • UnownedSyntax because SyntaxDataReference replaces it.
  • SyntaxNodeFactory because Syntax.Info is gone.

Additionally:

  • Flatten AbsoluteSyntaxInfo to simplify the data and clarify what advancedBySibling(_:) and advancedToFirstChild() do.
  • Remove RawSyntaxChildren and RawNonNilSyntaxChildren as they were implementation details of SyntaxChildren.
  • Remove AbsoluteRawSyntax and AbsoluteSyntaxPosition as nobody was really using them.
  • Rename Syntax.indexInParent to layoutIndexInParent because it was easily confused with SyntaxProtocol.indexInParent: SyntaxChildrenIndex.

Baseline (main)

Test Case '-[PerformanceTest.PerfTests testEmptyRewriter]' measured [Time, seconds] average: 1.653, relative standard deviation: 0.923%, values: [1.695273, 1.662036, 1.641838, 1.651738, 1.642226, 1.647752, 1.653183, 1.645211, 1.642044, 1.650028]
Test Case '-[PerformanceTest.PerfTests testEmptyVisitor]' measured [Time, seconds] average: 1.186, relative standard deviation: 0.702%, values: [1.208109, 1.183697, 1.176679, 1.191531, 1.179450, 1.184774, 1.185649, 1.180681, 1.186222, 1.183057]

Recreate SyntaxDataArena for each iteration.

Test Case '-[PerformanceTest.PerfTests testEmptyRewriter]' measured [Time, seconds] average: 1.216, relative standard deviation: 1.319%, values: [1.261088, 1.210089, 1.203140, 1.201663, 1.217956, 1.215719, 1.206626, 1.214545, 1.211392, 1.220302]
Test Case '-[PerformanceTest.PerfTests testEmptyVisitor]' measured [Time, seconds] average: 0.919, relative standard deviation: 1.333%, values: [0.955104, 0.914993, 0.910915, 0.915000, 0.914397, 0.919356, 0.913795, 0.914536, 0.913638, 0.921364]

Reuse fully populated SyntaxDataArena for all iterations.

Test Case '-[PerformanceTest.PerfTests testEmptyRewriter]' measured [Time, seconds] average: 0.847, relative standard deviation: 1.445%, values: [0.862360, 0.852487, 0.868413, 0.852756, 0.848216, 0.832601, 0.836978, 0.836306, 0.829240, 0.847472]
Test Case '-[PerformanceTest.PerfTests testEmptyVisitor]' measured [Time, seconds] average: 0.545, relative standard deviation: 1.391%, values: [0.560563, 0.547005, 0.532093, 0.545551, 0.548472, 0.546971, 0.535693, 0.538867, 0.547836, 0.548776]

Reuse fully populated SyntaxDataArena. Combined with #2924

Test Case '-[PerformanceTest.PerfTests testEmptyRewriter]' measured [Time, seconds] average: 0.471, relative standard deviation: 2.559%, values: [0.506535, 0.467759, 0.467274, 0.467793, 0.468687, 0.465028, 0.474181, 0.463959, 0.466232, 0.465126]
Test Case '-[PerformanceTest.PerfTests testEmptyVisitor]' measured [Time, seconds] average: 0.242, relative standard deviation: 4.694%, values: [0.273160, 0.237957, 0.236319, 0.232192, 0.235216, 0.234729, 0.237012, 0.239749, 0.245762, 0.247613]
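
To make the measurements above easier to compare, the following sketch computes the speedup of each configuration relative to the baseline, using only the reported averages. The `speedup` helper and the configuration labels are names introduced here for illustration.

```swift
/// Speedup factor: how many times faster `measured` is than `baseline`.
func speedup(baseline: Double, measured: Double) -> Double {
  return baseline / measured
}

// Average times in seconds, copied from the test output above.
let baselineRewriter = 1.653
let baselineVisitor = 1.186
let configurations: [(label: String, rewriter: Double, visitor: Double)] = [
  ("recreate arena for each iteration", 1.216, 0.919),
  ("reuse fully populated arena", 0.847, 0.545),
  ("reuse fully populated arena, combined with #2924", 0.471, 0.242),
]

for config in configurations {
  // Round to two decimal places for display.
  let rewriter = (speedup(baseline: baselineRewriter, measured: config.rewriter) * 100).rounded() / 100
  let visitor = (speedup(baseline: baselineVisitor, measured: config.visitor) * 100).rounded() / 100
  print("\(config.label): rewriter \(rewriter)x, visitor \(visitor)x")
}
```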

public subscript<RangeType: RangeExpression<Int>>(
  range: RangeType
) -> SyntaxArenaAllocatedBufferPointer<Element> {
  return SyntaxArenaAllocatedBufferPointer(UnsafeBufferPointer(rebasing: self.buffer[range]))

Member Author

Self as SubSequence was an anti-pattern. That violated "The subsequence shares indices with the original collection".
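
The index-sharing rule mentioned in this comment can be seen with plain standard-library types. In this small sketch (the function name `compareIndices` is made up for illustration), a slice of an `UnsafeBufferPointer` keeps the indices of the original buffer, while a buffer created with `UnsafeBufferPointer(rebasing:)` starts again at index 0, which is why a rebased buffer is not a conforming `SubSequence`.

```swift
/// Returns (slice start index, rebased start index, first slice element, first rebased element).
func compareIndices() -> (Int, Int, Int, Int) {
  let values = [10, 20, 30, 40]
  return values.withUnsafeBufferPointer { buffer -> (Int, Int, Int, Int) in
    // A Slice shares indices with its base collection.
    let slice = buffer[1..<3]
    // Rebasing produces a new buffer whose indices start at zero.
    let rebased = UnsafeBufferPointer(rebasing: slice)
    return (
      slice.startIndex,
      rebased.startIndex,
      slice[slice.startIndex],
      rebased[rebased.startIndex]
    )
  }
}

let (sliceStart, rebasedStart, sliceValue, rebasedValue) = compareIndices()
print("slice starts at \(sliceStart), rebased buffer starts at \(rebasedStart)")
```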

@rintaro rintaro force-pushed the syntax-data-arena branch 3 times, most recently from f9b6c7e to cf0292d Compare December 23, 2024 23:51

// If the buffer is already populated, return it.
if let baseAddress = baseAddressRef.pointee {
  return SyntaxDataReferenceBuffer(UnsafeBufferPointer(start: baseAddress, count: childCount))

Member Author

Probably we should not read the memory before the lock, or we should use atomic read/write. But does it really matter?

Contributor

Yeah, I think you either need to guard this with the lock, or use atomics. Another thread could potentially perform the writes out-of-order, so the buffer may not be fully initialized. SE-0282 requires concurrent read/write accesses to be done using atomic operations (a lock also works, as it would make them non-concurrent).

Member Author

Thanks for the reference! Changed to use _Atomic(const void *), and it doesn't seem to affect the performance numbers.

Contributor

@hamishknight hamishknight Jan 4, 2025

Nice! Could be worth seeing if this can be done purely using atomics to avoid needing PlatformMutex, e.g. compare-and-exchange a null pointer with 0x1 on the thread that computes the layout, with racing threads spinning until it becomes > 0x1.

Edit: Having thought about this a bit more, I'm not sure it's actually a good idea: we'd still need a mutex to guard the allocator, and a spinlock can potentially lead to deadlocks.
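
For reference, the simpler of the two options raised in this thread (guarding both the read and the write with the lock) could look roughly like the sketch below. This is not the code the PR ended up with, which according to the thread uses `_Atomic(const void *)` for the read. `LazyChildBuffer`, `populateCount`, and `runDemo` are hypothetical names, and an `[Int]` array stands in for the real layout buffer.

```swift
import Foundation
import Dispatch

/// Lazily populated buffer where every access goes through one lock, so no
/// thread can observe a partially initialized buffer.
final class LazyChildBuffer {
  private let lock = NSLock()
  private var buffer: [Int]? = nil
  private(set) var populateCount = 0  // only here to observe the behavior

  func children(count: Int) -> [Int] {
    lock.lock()
    defer { lock.unlock() }
    // The "already populated" check happens under the lock, not before it.
    if let existing = buffer { return existing }
    populateCount += 1
    let fresh = Array(0..<count)
    buffer = fresh
    return fresh
  }
}

/// Hits the same buffer from several threads and reports how often it was populated.
func runDemo() -> Int {
  let cache = LazyChildBuffer()
  DispatchQueue.concurrentPerform(iterations: 8) { _ in
    _ = cache.children(count: 4)
  }
  return cache.populateCount
}

let populated = runDemo()
print("populated \(populated) time(s)")
```

The trade-off is that every access pays for the lock, whereas an atomic read lets the already-populated fast path skip it.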

@rintaro rintaro force-pushed the syntax-data-arena branch 4 times, most recently from 10d9545 to fd35cc9 Compare December 24, 2024 17:00
@rintaro rintaro force-pushed the syntax-data-arena branch 2 times, most recently from 2132a4f to 2829606 Compare January 1, 2025 22:24
@rintaro rintaro marked this pull request as ready for review January 2, 2025 00:04
@rintaro rintaro requested a review from hamishknight January 2, 2025 00:11
@rintaro rintaro force-pushed the syntax-data-arena branch 2 times, most recently from 74beef9 to 6a5f0fe Compare January 2, 2025 23:27
Member Author

rintaro commented Jan 2, 2025

Rebased on main after #2924

@rintaro rintaro force-pushed the syntax-data-arena branch from 6a5f0fe to c53efa4 Compare January 2, 2025 23:31
Member Author

rintaro commented Jan 2, 2025

@swift-ci Please test

Member

@ahoppen ahoppen left a comment

Very nice. 🚀 Looks great to me overall, I left a few comments inline.

Comment on lines +450 to +463
/// Each node consumes `SyntaxData` size at least. Non-empty layout node tail
/// allocates a pointer storage for the base address of the layout buffer.
///
/// For layout buffers, each child element consumes a `SyntaxDataReference` in
/// the parent's layout. But for non-collection layout nodes, the layout is usually
/// sparse, so we can't calculate the exact memory size until we see the RawSyntax.
/// That being said, `SyntaxData` + 4 pointer size looks like a good enough estimate.

Member

I thought about this a bit and think that we should be a little more conservative with the memory that we allocate. My rationale is as follows (if you agree, I'll make this a doc comment).

We can’t know the size required to hold all SyntaxData for the entire tree until we have traversed it because totalNodes only counts the non-nil syntax nodes in the tree (which require the size of SyntaxData plus the size of AtomicPointer by which the SyntaxData is referenced from the parent node). Additionally, all layout children that are nil require the size of AtomicPointer but these nil-slots are not accounted for in totalNodes, so we need to estimate the ratio of nil children to non-nil children. The facts that we can base our estimate on are:

  • Every layout node with n children has n + 1 unexpected* children. These are nil in valid syntax trees, and valid trees account for the majority of syntax trees. We can thus expect at least 1 nil-child for every non-nil child.
  • Some layout nodes have optional children, which further increases the number of nil-children.
  • Collection nodes don’t have nil children, which offsets the optional children to some degree.

Based on that information, we can choose to over- or underestimate:

  • Under-estimating the slab size by a constant-ish factor of 2 or 3 is not a big deal because it means that we just need to allocate another 1 or 2 slabs if the user does a full-tree traversal. If the user doesn’t traverse the entire tree, we save a little memory.
  • Over-estimating the slab size if the full tree size is less than 4096 means that we allocate memory which we’ll never use.

We thus assume that every non-nil child has roughly one child that is nil.

Member Author

@rintaro rintaro Jan 3, 2025

I want to keep single-node trees minimal. Note that single-node trees are often created when rewriting TokenSyntax in SyntaxRewriter. MemoryLayout<SyntaxData>.stride (32 bytes) is enough for them, and I don't want to waste the space of extra pointers. So I think we should handle the root node specially.

As for the actual estimation, I measured it with our swift-syntax test suite:

  • with my estimation logic, 1,043/998,482 (0.1%) of the trees created with slabSize < 4096 overflowed the estimated slab size,
  • with your estimation logic, 54,601/1,047,632 (5.2%) of the trees created with slabSize < 4096 overflowed,
  • FWIW, (dataSize + pointerSize * 3) * nodeCount still overflowed in 28,417/1,020,774 (2.8%) trees.

And I think under-estimating the slab size is more problematic than over-estimating, because it wastes most of the second slab.
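
The estimation formulas compared in this thread can be written out as a small sketch. The 32-byte `SyntaxData` stride is the figure quoted above; the 8-byte pointer size (64-bit platform), the 4096-byte cap, the function name `estimatedSlabSize`, and the reading of the reviewer's proposal as "two pointers per node" are assumptions made for illustration, not the actual implementation.

```swift
let dataSize = 32       // MemoryLayout<SyntaxData>.stride, as quoted in the thread
let pointerSize = 8     // assumption: 64-bit platform
let maxSlabSize = 4096  // assumption: estimates are capped at this slab size

/// Estimated first-slab size for a tree with `nodeCount` non-nil nodes,
/// budgeting `pointersPerNode` pointer-sized layout slots per node.
func estimatedSlabSize(nodeCount: Int, pointersPerNode: Int) -> Int {
  // Single-node trees (e.g. a rewritten token) only need the node data itself.
  if nodeCount <= 1 { return dataSize }
  return min((dataSize + pointerSize * pointersPerNode) * nodeCount, maxSlabSize)
}

// For a 10-node tree:
let authorEstimate = estimatedSlabSize(nodeCount: 10, pointersPerNode: 4)    // SyntaxData + 4 pointers
let reviewerEstimate = estimatedSlabSize(nodeCount: 10, pointersPerNode: 2)  // one reference + one nil slot
let middleEstimate = estimatedSlabSize(nodeCount: 10, pointersPerNode: 3)
print(authorEstimate, reviewerEstimate, middleEstimate)
```

A larger per-node budget lowers the chance that a full traversal spills into a second slab, at the cost of reserving memory that a partial traversal never touches.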

Member

@ahoppen ahoppen Jan 6, 2025

I didn’t think about the single-node trees. That’s a good point and a worthwhile optimization. Maybe worth a comment.

Your analysis is correct under the assumption that the user visits the entire tree, which might not necessarily be true. And it might be worth measuring how much memory in the slabs is not used with the different estimation logics.

Why is the denominator of your measurements different for the different estimation logics? I would expect them to be the same because you should have created the same number of trees by running the test suite.

Member Author

@rintaro rintaro Jan 6, 2025

> Your analysis is correct under the assumption that the user visits the entire tree, which might not necessarily be true.

FWIW my measurements are based on the actual test cases we have. They might differ from typical use cases, but I just wanted to point that out.

It's true that users don't always visit the entire tree, but I don't think some overallocation is that problematic. (Of course, we want to avoid allocating memory that we know will never be used.)

> And it might be worth measuring how much memory in the slabs is not used with the different estimation logics.

That would be interesting 👍

> Why is the denominator of your measurements different for the different estimation logics? I would expect them to be the same because you should have created the same number of trees by running the test suite.

The denominator is the number of "trees created with slabSize < 4096", which differs between the estimation logics.

Member Author

We will revisit this.

@rintaro rintaro force-pushed the syntax-data-arena branch 2 times, most recently from 03ed9c5 to 8949f20 Compare January 4, 2025 05:26
Contributor

@hamishknight hamishknight left a comment

Nice!

return nil
}

- return SyntaxIdentifier(rootId: Self.rootId(of: root.raw), indexInTree: indexInTree)
+ return SyntaxIdentifier(rootId: UInt(rawID: root.raw.id), indexInTree: indexInTree)

Contributor

Not related to this PR, but out of curiosity, is there any reason rootId isn't RawSyntax.ID?

Member Author

@rintaro rintaro Jan 6, 2025

Maybe we thought it should be serializable. But at this point, I don't think there's a reason. Actually, I was thinking of making it just RawSyntax.ID in a follow-up :)

Member

Now that you mention serializing, that might have been the reason when we still transferred the syntax tree via XPC or JSON from the C++ parser in 2018-ish.

@rintaro rintaro force-pushed the syntax-data-arena branch from 8949f20 to 7a36dbb Compare January 6, 2025 17:34
Member Author

rintaro commented Jan 6, 2025

@swift-ci Please test

@rintaro rintaro force-pushed the syntax-data-arena branch from 7a36dbb to 8fff0de Compare January 7, 2025 19:09
Member Author

rintaro commented Jan 7, 2025

@swift-ci Please test

Member Author

rintaro commented Jan 7, 2025

@swift-ci Please test Windows

@rintaro rintaro enabled auto-merge January 7, 2025 22:06
@rintaro rintaro disabled auto-merge January 7, 2025 22:08
@rintaro rintaro merged commit 6a8b21a into swiftlang:main Jan 7, 2025
3 checks passed