fix: Parsing/pasting on `prosemirror-model: 1.25.1` #1661

matthewlipski · 2025-05-06T07:59:26Z

Issue

This PR addresses breaking changes to parsing introduced by prosemirror-model: 1.25.1. Prior to this update, the DOMParser would drop any parsed node that the schema did not allow inside its parent node. For example, take a look at the following HTML:

<ul>
  <li>
    <p>
      ...
    </p>
  </li>
</ul>

When parsing this before the update, the li element would get parsed as a bulletListItem node. Within it, the p element would normally also get parsed as a paragraph node. However, because the schema dictates that paragraph nodes can't be within bulletListItem nodes, the paragraph was dropped altogether.

After the update, the paragraph node no longer gets dropped, and the DOMParser instead attempts to insert it somewhere that is valid in the schema. Since it can't be a child of the bulletListItem node, it gets wrapped in a blockContainer and blockGroup, which are then inserted into the bulletListItem's parent blockContainer. In this scenario, we actually want to ignore the p tag and just parse its content.

Overall, the changes in prosemirror-model: 1.25.1 mean we have to be more diligent when writing parse rules. Paragraphs especially appear in many places, e.g. external HTML paragraphs, internal HTML paragraphs, internal HTML list items, and internal HTML table cells.

Parse rule changes

Internal HTML

After updating prosemirror-model, external HTML parse rules were being triggered when parsing blockContent nodes. This is because blockContent nodes can have all kinds of HTML inside, which we actually don't care about for parsing. To fix this, all default blockContent nodes have received the following change in their parse rules:

Before:

{
  tag: "div[data-content-type=" + this.name + "]",
},

After:

{
  tag: "div[data-content-type=" + this.name + "]",
  contentElement: ".bn-inline-content",
},

This change was made following a suggestion by Marijn here.

When parsing a blockContent element, this now tells the DOMParser to ignore all descendant elements except the one with the bn-inline-content class, and only parse its content.
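To make the effect of contentElement concrete, here is a minimal sketch of how such a rule could be built per block type. The BlockContentParseRule interface and the blockContentParseRule helper are hypothetical names used only for illustration. They are a local stand-in for the tag and contentElement fields of a ProseMirror parse rule, not BlockNote's actual code:

```typescript
// Local stand-in for the two parse rule fields used in this PR.
// (Hypothetical type for illustration; the real rule type comes from prosemirror-model.)
interface BlockContentParseRule {
  // CSS selector the DOM element must match for the rule to apply.
  tag: string;
  // CSS selector for the descendant whose content should be parsed.
  contentElement?: string;
}

// Hypothetical helper: builds the internal HTML parse rule for a block type.
function blockContentParseRule(name: string): BlockContentParseRule {
  return {
    tag: "div[data-content-type=" + name + "]",
    // Only the content of the .bn-inline-content descendant gets parsed.
    contentElement: ".bn-inline-content",
  };
}

const paragraphRule = blockContentParseRule("paragraph");
console.log(paragraphRule.tag);
console.log(paragraphRule.contentElement);
```

With a rule like this, wrapper elements between the blockContent div and the inline content container should no longer trigger the external HTML parse rules.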

Additionally, any bn-inline-content elements with the data-editable attribute have had this attribute removed.

External HTML

Because of the new parsing behaviour, we've had to add additional logic to list items and table cells.

HTML li elements may contain multiple block or inline elements, which is incompatible with our schema as *ListItem blocks can only contain inline content. By default, the new parsing behaviour lifts all nodes that are incompatible with the schema, so elements such as p and h1 within an li are parsed as separate blocks and become children of the *ListItem block. This has been modified to be more Notion-like, and the logic for this is explained in getListItemContent.ts.

The new default behaviour for table cells is the same as for list items, i.e. elements like p and h1 get lifted as children of the parent table block. Unlike list items though, moving content that isn't compatible with the schema to the children doesn't really make sense, so we would rather drop it altogether. This is basically how it already worked before the prosemirror-model update. However, the content of each element is now appended on the same line, whereas before, content from block-level elements would be appended to a new line (we may want to look into this again in the future).
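As a rough illustration of the same-line versus new-line difference, the toy model below flattens a cell's content with and without line breaks between block-level elements. It does not use the DOM or ProseMirror, and CellNode, BLOCK_TAGS, and cellText are made-up names, so treat it as a sketch of the behaviour described above rather than the actual implementation:

```typescript
// Toy tree standing in for the content of a table cell (assumption, not a DOM node).
type CellNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; children: CellNode[] };

// Tags treated as block-level in this sketch (illustrative subset only).
const BLOCK_TAGS = new Set(["p", "h1", "h2", "h3", "div", "li"]);

// Collects the text of a node. When blockBreaks is true, a newline is
// inserted before each block-level child that follows earlier content.
function cellText(node: CellNode, blockBreaks: boolean): string {
  if (node.kind === "text") {
    return node.text;
  }
  let out = "";
  for (const child of node.children) {
    const isBlock = child.kind === "element" && BLOCK_TAGS.has(child.tag);
    if (blockBreaks && isBlock && out.length > 0) {
      out += "\n";
    }
    out += cellText(child, blockBreaks);
  }
  return out;
}

const cell: CellNode = {
  kind: "element",
  tag: "td",
  children: [
    { kind: "element", tag: "p", children: [{ kind: "text", text: "Hello" }] },
    { kind: "element", tag: "h1", children: [{ kind: "text", text: "World" }] },
  ],
};

const sameLine = cellText(cell, false); // everything appended on one line
const newLines = cellText(cell, true); // block-level elements start a new line
console.log(JSON.stringify(sameLine), JSON.stringify(newLines));
```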

Additionally, there's a minor fix for media elements (embed, img, audio, and video) inside figure elements, which caused their respective blockContent nodes to be parsed twice.

Closes #1643
Closes #1645

vercel · 2025-05-06T07:59:30Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
blocknote	✅ Ready (Inspect)	Visit Preview	May 9, 2025 2:50pm
blocknote-website	🛑 Canceled (Inspect)		May 9, 2025 2:50pm

pkg-pr-new · 2025-05-06T09:56:03Z

@blocknote/ariakit

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/ariakit@1661

@blocknote/code-block

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/code-block@1661

@blocknote/core

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/core@1661

@blocknote/mantine

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/mantine@1661

@blocknote/react

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/react@1661

@blocknote/server-util

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/server-util@1661

@blocknote/shadcn

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/shadcn@1661

@blocknote/xl-docx-exporter

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/xl-docx-exporter@1661

@blocknote/xl-multi-column

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/xl-multi-column@1661

@blocknote/xl-odt-exporter

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/xl-odt-exporter@1661

@blocknote/xl-pdf-exporter

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/xl-pdf-exporter@1661

commit: 044831d

nperez0111

This feels wrong, could you give a write up on why this change is needed.

It feels like it litters the code everywhere, I may just be misunderstanding something here

matthewlipski · 2025-05-07T16:25:02Z

> This feels wrong, could you give a write up on why this change is needed.
>
> It feels like it litters the code everywhere, I may just be misunderstanding something here

Yep you were right, I went a bit overboard to make sure parse rules weren't being triggered accidentally and pretty much all of the getAttrs checks for element.closest("[data-content-type]") were redundant.
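For context, a guard of the kind described above could look roughly like the sketch below. In ProseMirror, a getAttrs function that returns false tells the parser that the rule does not match. The ElementLike interface and the externalOnlyGetAttrs name are assumptions made for illustration, not the code that was removed:

```typescript
// Minimal structural type so the sketch runs without a DOM (assumption).
interface ElementLike {
  closest(selector: string): object | null;
}

// Hypothetical getAttrs guard: skip the rule for elements that sit inside
// internal BlockNote HTML, i.e. inside an element with data-content-type.
function externalOnlyGetAttrs(element: ElementLike): false | Record<string, never> {
  if (element.closest("[data-content-type]")) {
    // Returning false means "this parse rule does not apply".
    return false;
  }
  // No attributes to set; the rule matches with its defaults.
  return {};
}

// Fake elements standing in for real DOM nodes.
const insideInternalHtml: ElementLike = { closest: () => ({}) };
const plainExternalHtml: ElementLike = { closest: () => null };

console.log(externalOnlyGetAttrs(insideInternalHtml));
console.log(externalOnlyGetAttrs(plainExternalHtml));
```

Since the internal HTML rules now restrict parsing to the inline content container, guards like this were found to be mostly redundant.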

Fixed parsing/pasting on prosemirror-model: 1.25.1

e58cbfe

vercel bot deployed to Preview – blocknote May 6, 2025 08:08 View deployment

vercel bot deployed to Preview – blocknote-website May 6, 2025 08:18 View deployment

Updated server-util snapshots

d20d1ed

vercel bot deployed to Preview – blocknote May 6, 2025 09:59 View deployment

vercel bot deployed to Preview – blocknote-website May 6, 2025 10:07 View deployment

nperez0111 reviewed May 7, 2025

nperez0111 assigned matthewlipski May 7, 2025

matthewlipski added 3 commits May 7, 2025 17:48

Removed unnecessary checks for external HTML parsing

3e1b2b1

Cleaned BlockContainer parse rules

eaa3555

Removed paragraph parse rule priority

1dccd07

vercel bot deployed to Preview – blocknote-website May 7, 2025 16:35 View deployment

vercel bot deployed to Preview – blocknote May 7, 2025 17:22 View deployment

matthewlipski added 2 commits May 7, 2025 20:16

Added comments

9bba365

Changed inline content detection to use [data-editable]

7afa596

vercel bot deployed to Preview – blocknote-website May 7, 2025 21:27 View deployment

vercel bot deployed to Preview – blocknote May 7, 2025 21:38 View deployment

chore: bump to breaking prosemirror version

aae6bf9

vercel bot deployed to Preview – blocknote-website May 8, 2025 12:13 View deployment

vercel bot had a problem deploying to Preview – blocknote May 8, 2025 12:13 Failure

matthewlipski added 2 commits May 8, 2025 16:49

Removed data-editable attribute from inline content container elements

7fba8e2

Merge branch 'pm-model-update-fixes' of github.com:TypeCellOS/BlockNote into pm-model-update-fixes

930b74f

vercel bot deployed to Preview – blocknote-website May 8, 2025 14:57 View deployment

vercel bot had a problem deploying to Preview – blocknote May 8, 2025 14:58 Failure

fix: list items & table cell content parsing

fc5aa02

Co-authored-by: Matthew Lipski <[email protected]>

nperez0111 approved these changes May 9, 2025

Merge branch 'main' into pm-model-update-fixes

ec982ab

fix: do not use instanceof

044831d

nperez0111 merged commit 3cdc8bf into main May 9, 2025
5 of 7 checks passed

nperez0111 deleted the pm-model-update-fixes branch May 9, 2025 14:24

vercel bot deployed to Preview – blocknote May 9, 2025 14:36 View deployment

vercel bot deployed to Preview – blocknote-website May 9, 2025 14:50 View deployment

This was referenced May 10, 2025

List from HTML Markdown being rendered weirdly #1678

Closed

unexpected copy-paste behavior of nested block #1651

Closed
