Skip to content

Add table of contents support #980

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 7, 2020
Merged

Add table of contents support #980

merged 1 commit into from
Jan 7, 2020

Conversation

saschagrunert
Copy link
Member

This adds the possibility to add a generated table of contents before
the actual markdown, which will be later used for the correct assembling
of the overall CHANGELOG-x.y.md files.

Needs #978

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/release-eng Issues or PRs related to the Release Engineering subproject size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/release Categorizes an issue or PR as relevant to SIG Release. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 16, 2019
@saschagrunert saschagrunert changed the title WIP: Add table of contents support Add table of contents support Dec 22, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 22, 2019
@saschagrunert
Copy link
Member Author

Rebased on top of the latest master, ready for review.

@justaugustus justaugustus added this to the v1.18 milestone Jan 6, 2020
@justaugustus
Copy link
Member

/assign @hoegaarden @hasheddan

Copy link
Member

@cpanato cpanato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold for the assigners to review as well

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 6, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 6, 2020
Copy link
Contributor

@hasheddan hasheddan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @saschagrunert! One small comment but looks good otherwise :)

Copy link
Contributor

@hoegaarden hoegaarden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All just nits, except the error hiding and the thing with the backtick in the title.

pkg/notes/toc.go Outdated
}

func add(result io.StringWriter, title string, indent int, headers map[string]int) {
toReplace := strings.NewReplacer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

  • Technically, I believe we'd need to reimplement https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/toc_filter.rb#L41-L56 to be correct here. But I guess we are close enough? 🤷‍♂
  • toReplace is a bit of a misleading name. If we are happy not to fully / correctly implement the generation of the anchor names, we could use something like:
    link := strings.NewReplacer(
    	"!", "",
    	"#", "",
    	// all the other replacements
    	"~", "",
    	" ", "-", // move the replacement of the space also to the Replacer
    ).Replace(strings.ToLower(title))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it as suggested. I'm not sure which use-case we do not cover with the current implementation. Is there anything we missed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. multiple successive space should be replaced to one dash, I am not sure if all the chars we remove with the replacer are equal to /[^\w\- ]/, we don't do the equivalent of EscapeUtils.escape_html and maybe other (I just skimmed both this code and the github html pipeline code) ... but as said, I think we are close enough for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, yeah sounds reasonable to me.

pkg/notes/toc.go Outdated
Comment on lines 42 to 44
if strings.Contains(prevLine, "`") {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what that is about, TBH. Wouldn't that skip the 2nd of two successive headings, if the 1st one had a backtick in it's title?

# This has `some backticks`
## And this heading will be skipped now?

There is no test case for that either so I am not sure about the use-case / intention for that special case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, wouldn't it break if we'd had a header line with a backtick in it?

  • we find a header and set prevLine to it's content, coincidentally it contains a backtick
  • next time around the if clause matches and we continue, prevLine still is set to the thing with the backtick
  • next time around the if clause matches and we continue, prevLine still is set to the thing with the backtick
  • ...
  • eventually we are done scanning, but never really processed much except up until the first header with a backtick

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main purpose of this is to skip code blocks (do not generate any headers inside them, because they could contain #. :) I think we have to somehow verify if we're inside a code block by the amount of backticks we've seen. 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking the right amount of back ticks should fix the code block issue in a more general way, PTAL :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is working correctly. The current implementation would ignore a perfectly valid heading

# `markdown` solves all our problems, they said

Other issues might arise with documents like

something

``````
wow ... this actually seems to be valid md

# ````go
# hello := `hello`
# ````
``````

something else

where the number of seen backticks is even in every line. This example is contrived, however we are dealing with random user input here.

Honest questions:

  • How much effort do we want to spend here for correctly rendering the TOC?
  • How likely is it that a b0rken TOC blanks out or scrambles the rest of the document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first example should be covered by the current implementation, I verified that with an additional unit test:

func TestGenerateTOCBackTickInHeading(t *testing.T) {
toc, err := GenerateTOC("# `markdown` solves all our problems, they said")
require.Nil(t, err)
require.Equal(t, toc, "- [`markdown` solves all our problems, they said](#markdown-solves-all-our-problems-they-said)\n")
}

The second example is tricky, right. So what do you prefer? I could remove the overall code block check vs leaving it as it currently is.

pkg/notes/toc.go Outdated

var headerPattern = regexp.MustCompile("^(?P<indent>#+) ?(?P<title>.+)$")

func GenerateTOC(input string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

I feel weird implementing a (partial) Markdown parser and generator, but I checked really quick and I could not find a give-me-a-toc-for-this-markdown-doc do module ... So I guess we gotta do what we gotta do 😄

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 6, 2020
This adds the possibility to add a generated table of contents before
the actual markdown, which will be later used for the correct assembling
of the overall `CHANGELOG-x.y.md` files.

Signed-off-by: Sascha Grunert <[email protected]>
@hoegaarden
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 7, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cpanato, hoegaarden, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hoegaarden
Copy link
Contributor

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 7, 2020
@k8s-ci-robot k8s-ci-robot merged commit 484d258 into kubernetes:master Jan 7, 2020
@saschagrunert saschagrunert deleted the toc branch January 7, 2020 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/release Categorizes an issue or PR as relevant to SIG Release. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants