Skip to content

Pass documentation context its only bundle during initialization #1057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Oct 24, 2024

Conversation

d-ronnqvist
Copy link
Contributor

Bug/issue #, if applicable: rdar://136208312

Summary

This is the fourth of many changes to redefine how documentation contexts are created.

This adds a new DocumentationContext initializer that takes a DocumentationBundle as a parameter. This change means that the context doesn't need a DocumentationContextDataProvider to provide it a bundle. It only needs a simplified (different) data provider to read the data for the bundle's files.

It also updates ConvertService and ConvertAction to create the context of documentation-to-build using this initializer. In doing so, they can't use DocumentationConverter anymore because it expects to register content with the context itself using its DocumentationWorkspace (which conforms to DocumentationContextDataProvider).

Changing DocumentationConverter to work this way would require source breaking changes—I know because I tried—so instead I looked into extracting its functionality into pieces that ConvertService and ConvertAction both could use. This investigation revealed that they share very little code. The following image shows the lines of DocumentationConverter.convert(outputConsumer:) highlighted depending on if it's

  • workspace registration code, no longer needed or wanted. (red)
  • only used by the convert action (blue)
  • only used by the convert service (yellow)
Screenshot 2024-10-14 at 20 05 13

As you can see, less than 10% of the code is shared between the convert action and the convert device. This realization led me to create distinct implementations for each, making the convert service implementation short enough that it could be inlined into ConvertService.convert(request:messageIdentifier:).

Lastly, now that ConvertAction no longer used DocumentationConverter I moved some properties around so that tests could inspect what used to be converter configuration on the action instead.

Dependencies

None.

Testing

Nothing in particular this isn't a user-facing change.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added Updated tests
  • Ran the ./bin/test script and it succeeded
  • [ ] Updated documentation if necessary A future PR in this series will update the contributor documentation when a few more changes are in place.

@d-ronnqvist d-ronnqvist added the code cleanup Code cleanup *without* user facing changes label Oct 14, 2024
@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

# Conflicts:
#	Sources/SwiftDocCUtilities/Action/Actions/Convert/ConvertAction.swift
Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting started trying to understand all of these changes. I'll keep working on this, but I've left a few questions for you in the meantime.

So far, seems like a great simplification! Nice work 👏

try await perform(logHandle: &logHandle).0
}

func perform(logHandle: inout LogHandle) async throws -> (ActionResult, DocumentationContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we returning this tuple here? It looks like the call site above only uses the ActionResult value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PreviewAction/convert() uses it here. This makes the convert action responsible for creating a context from the inputs and pass that context to the preview action. Since the preview action receives the context from the convert action, it never needs to create a context itself and never needs to store a context.


import Foundation

package enum ConvertActionConverter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this distinct from ConvertAction ? Just to extract code out into a separate file? Could we move the convert function into ConvertAction and reduce the number of classes/names we need to keep track of?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why is this an enum? I don't see any enum cases defined below anywhere. This seems like a Swift idiom I'm not familiar with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this distinct from ConvertAction?

Mainly because this code uses some types and API that are internal to SwiftDocC. ConvertAction is defined in a different module and can't access that internal API. We could make the necessary API package level, but I thought that it would be a smaller change to keep that code in SwiftDocC and let ConvertAction call it.

Also, why is this an enum? I don't see any enum cases defined below anywhere. This seems like a Swift idiom I'm not familiar with.

This pattern is sometimes used as a namespace in Swift. The reason for using an enum for this is that enum without any cases is "uninhabited" and can't be initialized which means that callers can only ever access its static API which limits the possibilities for state and mutation.

In this case the enum only has a single static function so the "namespace" doesn't provide much benefit compared to just a global function and since it's only package access we can change it freely in the future without needing to make deprecations.

/// - diagnosticEngine: The pre-configured engine that will collect problems encountered during compilation.
/// - configuration: A collection of configuration for the created context.
/// - Throws: If an error is encountered while registering a documentation bundle.
package init(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to retain the old initializer? ...and add the complexity around:

 private enum _Provider {
        case legacy(DocumentationContextDataProvider)
        case new(DocumentationBundleDataProvider)
    }

Is the problem that the old initializer is public? And so we can't remove it?

Should you deprecate the old initializer so we can one day, after the next major release possibly, remove the old initializer and the legacy data provider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Because the old initializer is public we need to keep it around and retain support for one full minor release. I didn't deprecate it in this PR because it's still used in tests so deprecating it would introduce warnings. #1059 deprecates it here.

Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more questions for you... I'll move on to review #1059 tomorrow.

workspace: DocumentationWorkspace = DocumentationWorkspace(),
context: DocumentationContext? = nil,
dataProvider: DocumentationWorkspaceDataProvider? = nil,
workspace _: DocumentationWorkspace = DocumentationWorkspace(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm familiar with using an underscore for the argument label, which means at the call site the label is optional. But I haven't seen the actual parameter name set to _ like this, while using an argument label workspace. What does this mean? Does it have anything to do with the default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binding the parameter to _ is a way of explicitly marking it as unused because nothing can access its value. Callers can still pass a value but the received doesn't (and can't) access it. I used it here to verify for myself that nothing was using the workspace.

The reason for not removing the parameter completely is that there were still tests that used it and I didn't want to make this PR bigger than it already was. Instead I removed this parameter here in #1059


let context = try DocumentationContext(bundle: bundle, dataProvider: dataProvider, diagnosticEngine: diagnosticEngine, configuration: configuration)
Copy link
Contributor

@patshaughnessy patshaughnessy Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a while I've been thinking that it's surprising this initializer performs so much work: Not only does it create the documentation context, but that includes reading in all of the assets, source markdown files, symbol graphs, etc. This seems like an important step in the compilation process that should be called out a bit more clearly.

Since we're refactoring this top level code now, I'd suggest splitting this up into two calls:

  • let context = DocumentationContext()
  • context.register

This would involve momentarily having an empty context, but that seems harmless and this code reflects what's happening here more clearly I think.

Copy link
Contributor Author

@d-ronnqvist d-ronnqvist Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're refactoring this top level code now, I'd suggest splitting this up into two calls:

  • let context = DocumentationContext()
  • context.register

No, at least not on the context itself. That would essentially put us back where we started with a two-step creation:

let workspace = DocumentationWorkspace()
let context = DocumentationContext(dataProvider: workspace, ...)
// other code
workspace.register(...)

There are three main problems with this pattern:

  • The context needs to be fully mutable since its won't have its "initial" values until register is called.
  • It's possible to keep modifying the context up until register but also after. Sometimes this led to bugs where a property was modified after registration and thus had no effect
  • There's nothing preventing register from running more than once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, one idea that I've been thinking about for the is to move all the register work into a new DocumentationContextBuilder type with

consuming func build() -> DocumentationContext {}

This would allow us to move the registration code out of DocumentationContext and doesn't require making/keeping the context fully mutable, while still preventing bugs where the builder is modified after creating the context and preventing build() from being called more than once (because the builder is consumed and can no longer be used after returning the built context).

One of the reasons why I've keep the new context initializer package access is so that we can make that type of changes without needing even more rounds of deprecations.

Unfortunately I think that it would be challenging to make that change before 6.2 and maintain compatible with the two-step workspace-style setup without duplicating the existing registration code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok makes sense - I didn't consider mutability or the possibility of someone calling register twice.

let analysisProblems: [Problem]
let conversionProblems: [Problem]
do {
(analysisProblems, conversionProblems) = try converter.convert(outputConsumer: outputConsumer)
conversionProblems = try ConvertActionConverter.convert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a mouthful: the word "convert" appears 3 times :) plus we have "conversion"

Can we think of a different name for ConvertActionConverter ? Maybe just Converter? Are there other converters we want to distinguish this from?

Maybe the other converter is the DocumentationContextConverter? The similar names make me wonder if we can move the ConvertActionConverter.convert function into DocumentationContextConverter. Do we really need two converters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's very repetitive but the one thing I like abut that name is that it signals that the API is currently only intended for one caller.

If it had a more general name then it would invite generalization that I don't think we have enough information about use-cases to do well.

The suggestion about DocumentationContextConverter is interesting but mostly because I don't see a strong reason for that type to exist in its current form.

@@ -84,7 +84,7 @@ public class DocumentationContextConverter {
/// - Parameters:
/// - node: The documentation node to convert.
/// - Returns: The render node representation of the documentation node.
public func renderNode(for node: DocumentationNode) throws -> RenderNode? {
public func renderNode(for node: DocumentationNode) -> RenderNode? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to throw? Was the Swift compiler returning a warning here all along?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't. It's allowed for a function to declare itself as raising errors without ever doing so in the implementation.

I was surprised that this wasn't a warning.

(bundle, dataProvider) = Self.makeBundleAndInMemoryDataProvider(request)
}

let context = try DocumentationContext(bundle: bundle, dataProvider: dataProvider, configuration: configuration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the other call site: Would be nice to know how much work happens under here when we create the documentation context.


let topLevelPages: [URL]
let provider: DocumentationBundleDataProvider
if moduleNames.count == 1, let moduleName = moduleNames.first, moduleName != info.displayName, let url = URL(string: "in-memory-data://\(moduleName).md") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "in-memory-data" protocol seems like a very different implementation than what we had before. What's the overall idea here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually very similar to how GeneratedDataProvider provides the synthesized markdown file which is what this replaces.

The way I had to divide up all these changes into "smaller" PRs meant that I sometimes needed to introduce temporary changes that a later PR addresses. Otherwise it would be easy to fall into the trap of "just one more change" until the PR is gigantic.

/// - symbolGraphTransformer: An optional closure that transforms the symbol graph after the loader decodes it.
init(
bundle: DocumentationBundle,
dataLoader: @escaping (URL, DocumentationBundle) throws -> Data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trick with the closure is a bit confusing. Are you trying to workaround the same problem as we have in DocumentationContext with the new and soon-to-be-deprecated initializers? Maybe we should follow the same pattern we have in DocumentationContext with the legacy/new enum and protocol? But for the data loader this time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

My reason for using a closure here instead of a legacy/new enum like in the context was that I was trying to isolate then legacy/new knowledge to the context. This code can be simplified after 6.2 is released so at the very least I should add a comment with that information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment saying that this code can be simplified after 6.2 in bf1d51c

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist merged commit e4e1d09 into swiftlang:main Oct 24, 2024
2 checks passed
@d-ronnqvist d-ronnqvist deleted the init-context-with-bundle branch October 24, 2024 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code cleanup Code cleanup *without* user facing changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants