Closed
Labels: 👎 phase/no (Post cannot or will not be acted on), 🙅 no/wontfix (This is not (enough of) an issue for this project)
Description
Initial checklist
- I read the support docs
- I read the contributing guide
- I agree to follow the code of conduct
- I searched issues and couldn’t find anything (or linked relevant results below)
Problem
Heya, I would like to directly import/use the compiler, on a pre-created list of events.
I think this is currently not possible?
Obviously this is the key function provided by this package; `fromMarkdown` is just a wrapper around it and the upstream `preprocess`/`parse`/`postprocess` functions (all importable).
Solution
Allow for e.g.

```js
import {compiler} from 'mdast-util-from-markdown/lib/index'

compiler(options)(events)
```

I guess this just requires the addition of `export function compiler...`, and a small modification of `package.json`, like in micromark itself:

```json
{
  "exports": {
    ".": {
      "development": "./dev/index.js",
      "default": "./index.js"
    },
    "./lib/index": {
      "development": "./dev/lib/index.js",
      "default": "./lib/index.js"
    },
    "./lib/index.js": {
      "development": "./dev/lib/index.js",
      "default": "./lib/index.js"
    }
  }
}
```

Alternatives
Don't think so
Activity
wooorm commented on Mar 25, 2022
What’s the reason you have events? `compile` and events are all rather “internal” and not “pretty”.
chrisjsewell commented on Mar 25, 2022
To implement https://github.com/executablebooks/myst-spec, and replace our current markdown-it implementation: https://github.com/executablebooks/markdown-it-docutils, I need to be able to perform nested/incremental parsing:
Trust me, I know the "unprettiness" of Markdown parsing 😅, I'm also the author of https://github.com/executablebooks/markdown-it-py
Events and compilers are already documented as part of your core parsing architecture: https://github.com/micromark/micromark#architecture, so I would not necessarily say they are completely "internal" 😬
chrisjsewell commented on Mar 25, 2022
FYI, if we can get all this working, then we are hoping to utilise it as the core parsing architecture in products such as https://curvenote.com/, https://irydium.dev/ and https://github.com/agoose77/jupyterlab-markup 😄
wooorm commented on Mar 25, 2022
Can you expand on this? Markdown already allows for (a). What is the “context” you mean in (b)?
chrisjsewell commented on Mar 25, 2022
The context is:
- Initialising the parse with the correct initial position, so that all the node positions point to their correct places in the source file. You could do this retroactively, in a post-processing step, but it's nicer to do in one parse.
- Initialising the parser with known definition/footnote identifiers. This is the key point really: because CommonMark only parses definition references of known definitions (otherwise treating them as plain text), you have to have this context of "found" definitions.
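
To make the "retroactive, post-processing" option for positions concrete, here is a minimal sketch. The helpers `shiftPoint` and `shiftPositions` are hypothetical names, not part of mdast-util-from-markdown, and the sketch assumes the nested text is one contiguous slice of the outer file with nothing stripped from the start of its lines:

```javascript
// Hypothetical helpers (not part of mdast-util-from-markdown).
// `start` is the unist point in the outer file where the snippet begins:
// 1-indexed `line` and `column`, 0-indexed `offset`.
function shiftPoint(point, start) {
  return {
    line: point.line + start.line - 1,
    // Only points on the snippet's first line share a line with `start`,
    // so only those need their column moved.
    column: point.line === 1 ? point.column + start.column - 1 : point.column,
    offset: point.offset + start.offset
  }
}

// Walks the tree and rewrites every `position` in place.
function shiftPositions(node, start) {
  if (node.position) {
    node.position = {
      start: shiftPoint(node.position.start, start),
      end: shiftPoint(node.position.end, start)
    }
  }
  for (const child of node.children || []) shiftPositions(child, start)
  return node
}

// Hand-written stand-in for the tree of the two-line snippet "a\n*b*".
const snippetTree = {
  type: 'root',
  children: [],
  position: {
    start: {line: 1, column: 1, offset: 0},
    end: {line: 2, column: 4, offset: 5}
  }
}

shiftPositions(snippetTree, {line: 10, column: 5, offset: 120})
console.log(snippetTree.position)
// start becomes 10:5 (offset 120), end becomes 11:4 (offset 125)
```

Note that this only works when nothing is removed between the outer source and the nested text; it does not handle per-line prefixes.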
It would be great if CommonMark would just parse all `[x]` syntax as definition references, irrespective of what definitions are present, then allow the renderer to handle missing definitions, but such is life 😒.
wooorm commented on Mar 25, 2022
Why not integrate with micromark in an extension?
Extensions parse their thing and they can annotate that some stuff inside them should be parsed next
https://github.com/micromark/micromark/blob/fc5e2d8b83eb9c01c9bfd2f4b1ea4e42e6a7e224/packages/micromark-util-types/index.js#L20
chrisjsewell commented on Mar 25, 2022
Possibly, but it then means that "everything" has to be parsed in a single parse, and makes things a lot less "modular" and incremental
The idea with these directives is that you perform an initial parse, which just identifies the directives
which gets you to an intermediate AST
Then you perform a subsequent parse, which processes the directives and gets you to your final AST:
This makes it a lot easier than having to do everything at the micromark "level"
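
A rough sketch of what that second stage could look like at the mdast level. Everything here is an assumption for illustration: `processDirectives`, the `handlers` map, the `admonition` node type, and the `parseNested` callback are hypothetical names, and directives are assumed to arrive as fenced `code` nodes whose `lang` is `{name}`:

```javascript
// Hypothetical second stage: replace each fenced code node whose `lang` looks
// like `{name}` with the nodes returned by the matching directive handler.
function processDirectives(node, handlers, parseNested) {
  if (!node.children) return node
  const children = []
  for (const child of node.children) {
    const match =
      child.type === 'code' ? /^\{([\w-]+)\}$/.exec(child.lang || '') : null
    const known =
      match && Object.prototype.hasOwnProperty.call(handlers, match[1])
    if (known) {
      // The handler decides whether nested parsing is needed at all.
      children.push(...handlers[match[1]](child, parseNested))
    } else {
      children.push(processDirectives(child, handlers, parseNested))
    }
  }
  node.children = children
  return node
}

// Example handler: `note` wraps its nested-parsed content in a made-up
// `admonition` node type.
const handlers = {
  note(node, parseNested) {
    return [{type: 'admonition', kind: 'note', children: parseNested(node.value)}]
  }
}

// Stub nested parser so the sketch runs on its own. In real use this could
// call `fromMarkdown(value).children` instead.
const parseNested = (value) => [
  {type: 'paragraph', children: [{type: 'text', value}]}
]

// Hand-written stand-in for the intermediate AST of the first parse.
const intermediate = {
  type: 'root',
  children: [{type: 'code', lang: '{note}', value: 'Internal *markdown*'}]
}

const final = processDirectives(intermediate, handlers, parseNested)
console.log(final.children[0].type) // 'admonition'
```

The nested parse in this sketch has no access to the outer document's definitions or positions, which is the missing "context" discussed above.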
wooorm commented on Mar 25, 2022
the thing is that with tracking position (one thing) but importantly all the definition identifier stuff, you’re replicating a lot of the work.
Also note that the positional info is not going to be 100% if you have mdast for fenced code, and then parse its result, because a funky “indent”/exdent is allowed:
https://spec.commonmark.org/dingus/?text=%20%20%20%60%60%60%7Bnote%7D%0A%20%20Internal%0A%20*markdown*%0Amore%0A%60%60%60
Uhhh, this post is about juggling micromark internals to not have to make a micromark extension? How is that easier? 🤔 I don’t get it.
It sounds simpler to
micromark already does that? It has it built in. Why do you need separate stages?
wooorm commented on Mar 25, 2022
How are you using “incremental”?
chrisjsewell commented on Mar 25, 2022
Hmmm, I feel I'm not explaining directives properly to you; processing directive content is not just about parsing, it's about node generation. Directives need to be able to generate MDAST nodes, and these nodes do not necessarily relate directly to syntax in the source text.
Take the figure directive:
This:
needs to go to this:
How would you even go about getting a micromark extension to achieve this?
It is a lot easier to work at the MDAST node level than the micromark event level, when processing directives.
But you do need to have a way to perform nested parsing.
This is exactly how docutils/sphinx directives work; you are generating nodes, and only performing nested parsing when necessary: https://github.com/live-clones/docutils/blob/6548b56d9ea9a3e101cd62cfcd727b6e9e8b7ab6/docutils/docutils/parsers/rst/directives/images.py#L146
chrisjsewell commented on Mar 26, 2022
FYI, I also know of https://github.com/micromark/micromark-extension-directive, but these directives are quite different, in that their content is "interpreted" text, i.e. it might not be Markdown.
Take for example `csv-table`: https://docutils.sourceforge.io/docs/ref/rst/directives.html#csv-table-1
Here, the content will be converted into table nodes, which is not something that can be done in a micromark extension.
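
For illustration, a minimal sketch of such "interpreted" content, assuming a hypothetical `csvToTable` handler that receives the raw directive body. The CSV handling is deliberately naive (plain comma splitting, no quoting), so this is not how docutils implements `csv-table`:

```javascript
// Hypothetical handler: turn raw CSV text into an mdast `table` node.
// Naive CSV: splits on commas and newlines, with no support for quoting.
function csvToTable(value) {
  const rows = value
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => ({
      type: 'tableRow',
      children: line.split(',').map((cell) => ({
        type: 'tableCell',
        children: [{type: 'text', value: cell.trim()}]
      }))
    }))
  return {type: 'table', children: rows}
}

const table = csvToTable('name, stars\nmicromark, many')
console.log(table.children.length) // 2
console.log(table.children[0].children[1].children[0].value) // 'stars'
```

No Markdown parsing happens here at all: the nodes are generated directly from the directive content.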
wooorm commented on Mar 27, 2022
Thanks for expanding. I now understand the use case better, particularly why it’s a choice at the AST level, after the initial parse, to parse subdocuments.
I do find your earlier statements about wanting to reuse identifiers of “outer” definitions in these “inner” a bit weird. If they are really so separate and optional, it seems beneficial to have them “sandboxed” from the outer content, and in other words it seems to be at odds with your goal to reuse identifiers.
I don’t see why not? micromark can parse that syntax. Though micromark is a level under mdast. So micromark would parse the syntax. A utility would turn the events into that tree.
I am not suggesting to do the “Processing directives” part in micromark. As I understand it we both believe that that can happen in mdast.
I am suggesting to “perform nested parsing” in micromark. Because markdown does “nested” already: micromark has this builtin.
This issue is about `compile`, but you also mentioned:
- `options.startPoint` support to `micromark` (and: how to even handle indents?)
How important are these to you? Are there other subissues you perceive?
chrisjsewell commented on Mar 29, 2022
Yeh no worries
unicornware commented on Apr 11, 2024
@wooorm would you be open to adding `options.from` so that it can be passed to `document`? `from` can be passed to `createTokenizer` when working with `micromark`, but because the `compiler` function is not exported, i cannot make any use of the option without reimplementing the compiler myself.
wooorm commented on Apr 13, 2024
Hi Lex! Uhm, maybe, maybe not? Sounds like you want to increment positional info. I could see that not work the way you want. Can you elaborate more on your use case?
The reason I think it will not work, is that there are probably multiple gaps.
There’s a gap before `more` too. A similar problem occurs in MDX, where the embedded JS expressions can have markdown prefixes:
A better solution might be around https://github.com/vfile/vfile-location, similar to vfile/vfile-location#14, and the “stops” in mdxjs-rs: https://github.com/wooorm/markdown-rs/blob/60db8e5896be05d23137a6bdb806e63519171f9e/src/util/mdx_collect.rs#L24.
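
A minimal sketch of the "stops" idea, under stated assumptions: `collect` and `toOuterOffset` are hypothetical helpers written for this illustration, not the mdxjs-rs or vfile-location API. Each stop pairs an offset in the collected inner text with the offset in the outer source where that chunk began, so every gap gets its own correction instead of one global start point:

```javascript
// Hypothetical helpers illustrating "stops".
// `ranges` are [start, end) offsets of the markdown chunks in `source`.
function collect(source, ranges) {
  let value = ''
  const stops = []
  for (const [start, end] of ranges) {
    // Remember where this chunk begins, both in the inner and outer text.
    stops.push([value.length, start])
    value += source.slice(start, end)
  }
  return {value, stops}
}

// Map an offset in the collected text back to an offset in the outer source.
function toOuterOffset(stops, inner) {
  let index = stops.length
  while (index-- > 0) {
    const [innerStart, outerStart] = stops[index]
    if (innerStart <= inner) return outerStart + inner - innerStart
  }
  return undefined
}

// A doc comment whose lines each carry a ` * ` prefix (one gap per line).
const source = '/**\n * a *b*\n * more\n */'
const {value, stops} = collect(source, [
  [7, 13], // 'a *b*\n'
  [16, 20] // 'more'
])

console.log(JSON.stringify(value)) // "a *b*\nmore"
const inner = value.indexOf('more')
console.log(inner, toOuterOffset(stops, inner)) // 6 16
```

Outer offsets obtained this way can then be converted to line and column points, which is where a utility such as vfile-location could come in.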
unicornware commented on Jul 5, 2024
@wooorm i'm not sure i understand your example 😅
i'm working on an ast for docblocks that supports markdown in comments, so mdast node positions need to be relative to my comment nodes.
i ended up using `transforms` to apply my positioning logic, but feel it to be quite messy. based on some soft "tests", `options.from` would be more ideal
wooorm commented on Jul 5, 2024
There are several gaps. `from` only gives info for the start of the first line. There are multiple lines. If you want what you want, you’d need multiple `from`s. That doesn’t exist. I don’t think this does what you want.
`from` is this place:
Here your positional info is out of date again:
I recommend taking more time with my previous comment. Trying to grasp what it says. I think it describes the problem well, for your case, but also for MDX, and then shows how it is solved for MDX, which is what I believe you need to do too.
unicornware commented on Jul 5, 2024
@wooorm oh i see, but i actually do want the initial `from` so the root node doesn't start at `1:1`. i already have the logic to account for comment delimiters, if that's what you meant by gaps/multiple `from`s.
wooorm commented on Jul 5, 2024
My point is that you want that and more. Having just that is not enough for you.
wooorm commented on Jul 5, 2024
Please try to patch-package this issue, or edit it in your node_modules locally, and check if that works for you? I don't think it will.
unicornware commented on Jul 7, 2024
@wooorm i think that is where our disconnect is. i know `options.from` isn't enough by itself, but it would be useful for markdown "chunks" spanning one line (i.e. a one line description) because no shifting is needed. for chunks spanning more than one line, `options.from` is useful so i can start my calculations from the given start point instead of `1:1`.
i came to this conclusion because my soft "tests" included editing `node_modules` locally, lol.
wooorm commented on Jul 8, 2024
It could theoretically be useful for a hypothetical human. I’m not interested in adding things that might be useful to someone in the future, as I find that often, that future user practically wants something else.
Meanwhile, I believe you are helped with vfile/vfile-location#14 and stops from `mdx_collect`.
unicornware commented on Jul 9, 2024
@wooorm is that your suggested approach for pure markdown snippets as well?
additionally, from what i see, that issue is about max line length, which isn't what i'm looking for.
wooorm commented on Jul 9, 2024
That depends, is this use case a problem you have? From what you said before, I grasp that you don’t have that problem or need that solution.
That issue is a feature request for a feature. It was brought up for a particular lint rule. That lint rule deals with line length. There are other lint rules. There is also your case, which is helped by that issue. Please though, read not just the link, but also the rest of what I mentioned: