fix: Parsing/pasting on prosemirror-model: 1.25.1
#1661
This feels wrong. Could you give a write-up on why this change is needed?
It feels like it litters the code everywhere, though I may just be misunderstanding something here.
Yep you were right, I went a bit overboard to make sure parse rules weren't being triggered accidentally and pretty much all of the |
Co-authored-by: Matthew Lipski <[email protected]>
Issue
This PR addresses breaking changes to parsing introduced by `prosemirror-model: 1.25.1`. Prior to this update, the `DOMParser` would drop any parsed nodes that were not valid in the schema, based on the parent node. For example, take a look at the following HTML:

When parsing this before the update, the `li` element would get parsed as a `bulletListItem` node. Within it, the `p` element should also get parsed as a `paragraph` node. However, because the schema dictates that `paragraph` nodes can't be within `bulletListItem` nodes, the `paragraph` is dropped altogether and ignored.

After the update, the `paragraph` node will no longer get dropped, and the `DOMParser` will instead attempt to insert it somewhere so that it is valid in the schema. Since it can't be a child of the `bulletListItem` node, it instead gets wrapped in a `blockContainer` and `blockGroup`, which then gets inserted into the `bulletListItem`'s parent `blockContainer`. In this scenario, we actually want to ignore the `p` tag and just parse its content.

Overall, the changes in `prosemirror-model: 1.25.1` mean we have to be more diligent when writing parse rules. Paragraphs especially appear in many places, e.g. external HTML paragraphs, internal HTML paragraphs, internal HTML list items, and internal HTML table cells.

Parse rule changes
Internal HTML
After updating `prosemirror-model`, external HTML parse rules were being triggered when parsing `blockContent` nodes. This is because `blockContent` nodes can have all kinds of HTML inside, which we actually don't care about for parsing. To fix this, all default `blockContent` nodes have received the following change in their parse rules:

Before:
After:
This change was made following a suggestion by Marijn here.
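The before and after snippets are not reproduced here. As a rough sketch only, assuming the fix relies on ProseMirror's `contentElement` parse-rule option (the description does not name the option), a rule of this kind might look like the following. `ParseRuleSketch` and `blockContentParseRule` are hypothetical names used for illustration, and the `data-content-type` selector is an assumption about how `blockContent` elements are identified.

```typescript
// Local stand-in for the two ProseMirror parse-rule fields this sketch uses.
// The real type is exported by prosemirror-model; it is redeclared here only
// so the example is self-contained.
interface ParseRuleSketch {
  tag: string; // CSS selector the rule matches
  contentElement?: string; // CSS selector for the element holding the content
}

// Hypothetical helper: builds a parse rule for a blockContent node type.
// The data-content-type selector is an assumption, not taken from the PR.
function blockContentParseRule(blockType: string): ParseRuleSketch {
  return {
    tag: `div[data-content-type="${blockType}"]`,
    // Parse content only from the inline content wrapper, so other HTML
    // inside the blockContent element is not handed to the parser.
    contentElement: ".bn-inline-content",
  };
}

// Example: the rule a paragraph block might use under these assumptions.
const paragraphRule = blockContentParseRule("paragraph");
```

With a string `contentElement`, ProseMirror treats the value as a CSS selector and reads the node's content from the matching element rather than from the matched element's own children.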
When parsing a `blockContent` element, this now tells the `DOMParser` to ignore all descendant elements except the one with the `bn-inline-content` class, and only parse its content.

Additionally, any `bn-inline-content` elements with the `data-editable` attribute have had this attribute removed.

External HTML
Because of the new parsing behaviour, we've had to add additional logic to list items and table cells.
HTML `li` elements may have multiple block or inline elements in them, which is incompatible with our schema as `*ListItem` blocks can only contain inline content. By default, the new parsing behaviour lifts up all nodes that are incompatible with the schema, so elements such as `p` and `h1` within an `li` are parsed as separate blocks that become children of the `*ListItem` block. This has been modified to be more Notion-like, and you can find the logic for this explained in `getListItemContent.ts`.

The new default behaviour for table cells is the same as for list items, i.e. elements like `p` and `h1` get lifted as children of the parent `table` block. Unlike list items though, moving content that isn't compatible with the schema to the children doesn't really make sense, so we would rather drop it altogether. This is basically how it already worked before the `prosemirror-model` update. However, the content of each element is now appended on the same line, whereas before, content from block-level elements would be appended to a new line (we may want to look into this again in the future).

Additionally, there's a minor fix for media elements (`embed`, `img`, `audio`, and `video`) inside `figure` elements causing their respective `blockContent` nodes to be parsed twice.

Closes #1643
Closes #1645