Skip to content

Conversation

yuninxia
Copy link
Contributor

@yuninxia yuninxia commented Jun 1, 2023

In this pull request, I've refined the visibility and interpretability of Quarto render status messages.

Key modifications include:

  • Cells with labels: The syntax has been updated from " Cell {0}/{1}..." to " Cell '<label>' {0}/{1}...".

  • Cells without labels: The format has been altered from " Cell {0}/{1}..." to " Cell '' {0}/{1}...".

These refinements boost the readability of the script and contribute to the overall maintainability and professionalism of our codebase. I look forward to your review.

@cderv cderv linked an issue Jun 1, 2023 that may be closed by this pull request
Comment on lines 162 to 171
# get label
label = ''
source_lines = cell.source.split('\n')
label_line = next((line for line in source_lines if line.startswith('#| label:')), None)
if label_line:
label = label_line.replace('#| label:', '').strip()

status(" Cell '{0}' {1}/{2}...".format(
label, current_code_cell- 1, total_code_cells - 1
))
Copy link
Collaborator

@mcanouil mcanouil Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

It should be more reliable to get the id straight from the cell.
Notebooks, will always have an identifier, but the id is replaced by label if any.
This would avoid to re-do the parsing of parameters, etc.

Notebook cell identifier. Note that if there is no cell id then label will be used as the cell id if it is present. See https://jupyter.org/enhancement-proposals/62-cell-id/cell-id.html for additional details on cell ids.

Suggested change
# get label
label = ''
source_lines = cell.source.split('\n')
label_line = next((line for line in source_lines if line.startswith('#| label:')), None)
if label_line:
label = label_line.replace('#| label:', '').strip()
status(" Cell '{0}' {1}/{2}...".format(
label, current_code_cell- 1, total_code_cells - 1
))
status(" Cell {0}/{1}: {2} ... ".format(
current_code_cell - 1,
total_code_cells - 1,
cell.id,
))

Copy link
Collaborator

@mcanouil mcanouil Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above suggested code produces the following output:

❯ quarto preview index.qmd --no-browser --no-watch-inputs

Starting python3 kernel...Done

Executing 'index.ipynb'
  Cell 1/2: test ... Done
  Cell 2/2: eb6da12e ... Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that cells with automatically generated IDs shouldn't show the label. I also think that we should measure for the max width of a cell label and pad the other output so all progress messages have the same width (i.e. the Done lines up across lines).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the suggestion to standardise the messages.

A 80 width string filled with spaces and "Done" at the end would do?

Also other messages would need to be "width standardised".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to measure the max width of labels at the outset (and therefore get perfect alignment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we ought to not emit a1886270 and other automatically-generated IDs. If at that point in the code it's too late to know which ID is automatically generated and which isn't, then we need to find a way to carry this information into the right part of the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bringing this to my attention! Yes, I agree with the suggestion that perhaps there needs to be a flag for whether the ID is automatically generated or not. The reason is, according to the documentation:

define: cell_id is the id in the output.

if "#| id":
   cell_id = cell.id
elif "#| label":
   cell_id = cell.id
else:
   cell_id = <auto-generated-id>

Since cell.id is automatically generated when neither "#| id" nor "#| label" exists, the above logic is equivalent to:

if is_auto_generated(cell.id) == False:
   cell_id = cell.id
else:
   cell_id = <auto-generated-id>

Therefore, once we have a flag to know if the id is auto-generated, we can set cell_id to "":

if is_auto_generated(cell.id) == False:
   cell_id = cell.id
else:
   cell_id = ""

We may need to modify this part of the code as well as some code that has called this function:
quarto-dev/quarto-cli@58a9462

Could you kindly provide some guidance on how the relevant code should be modified? Thank you!

ps. please ignore the latest commit (3cea7dc) in this branch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may actually want to marry this with some work that @dragonstyle has been doing on notebook embedding. There, we want people to be able to embed notebook cells by a unique id. We have an idiom there where we first take the label, then use the first cell tag (if any). Perhaps we need a few field that represents this embeddable identifier?

Copy link
Contributor Author

@yuninxia yuninxia Jun 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! While I was examining the code related to notebook embedding, I found the nb_cell_yaml_options() function. This led me to submit a commit (cc1c803) that might resolve the issue [FR] Replace "Cell 1/N" by label (if provided) when using Jupyter #5752. Furthermore, I've noticed a potential problem with the nb_cell_yaml_options() function. Here's the summary:

Summary of the Latest commit

This commit introduces a feature where the render status will now display the label when a user has specified "#| label" for a "code" cell. When running the "quarto render" command, the output will match the following example:

  Cell 1/3: 'fig-polar-1'......Done
  Cell 2/3: 'fig-polar-1024'...Done
  Cell 3/3: ''.................Done

Notably, even if a user writes "#| id" in the "code" cell, the id will not be visible in the render status.

This is because the return list from nb_cell_yaml_options() just includes cell attributes like label, classes, fig-cap, fig-subcap, and so on, as specified in cell-attributes.yml and yaml-intelligence-resources.json, and does not include cell attributes like "#| id" "#| tags" and "#| export". I'm uncertain if this is an issue that needs addressing or if there's a different approach to access these attributes. Any advice on this would be greatly appreciated!

Additional Note

I observed that in a recent PR (Support export: in jupyter cells #4958), we added support for export, but this attribute is currently absent from the official documentation Code Cells: Jupyter. I believe an update to the documentation to include this attribute might be beneficial.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then export syntax is a pre-processing directive used by nbdev so indeed not part of our schema or feature set. It (and the others ) can indeed be safely ignored here!

@yuninxia yuninxia marked this pull request as draft June 1, 2023 17:33
@yuninxia yuninxia force-pushed the bugfix/issue-5752 branch from 072be84 to 7147814 Compare June 1, 2023 20:19
@ghost ghost assigned cscheid Jun 2, 2023
@yuninxia yuninxia marked this pull request as ready for review June 3, 2023 15:12
@jjallaire
Copy link
Collaborator

Thanks for all of the iteration here! This is looking great save for one thing. We end up doing this twice for each cell (once to compute the max and once to present the label for progress):

nb_cell_yaml_options(client.nb.metadata.kernelspec.language, cell)

I wonder if we could just map the cells into a cell_labels array of strings at the top, compute the max, and then just index into that array during the cell-by-cell progress?

@yuninxia yuninxia force-pushed the bugfix/issue-5752 branch from cc1c803 to f7d7bdb Compare June 5, 2023 00:24
@yuninxia
Copy link
Contributor Author

yuninxia commented Jun 5, 2023

Thank you for your advice!

Summary of the Latest Commit

It now avoids calling nb_cell_yaml_options() twice. Here's the modified code:

# compute total code cells (for progress)
current_code_cell = 1
total_code_cells = sum(cell.cell_type == 'code' for cell in client.nb.cells)

# map cells to their labels
cell_labels = [nb_cell_yaml_options(client.nb.metadata.kernelspec.language, cell).get('label', '') for cell in client.nb.cells]

# find max label length (for progress)
max_label_len = max(len(label) for label in cell_labels)

Further optimization is possible here by iterating client.nb.cells just once to boost efficiency, especially with large data (like using Quarto for book writing):

current_code_cell = 1
total_code_cells = 0
cell_labels = []
max_label_len = 0

for cell in client.nb.cells:
   # compute total code cells (for progress)
   if cell.cell_type == 'code':
      total_code_cells += 1
   # map cells to their labels
   label = nb_cell_yaml_options(client.nb.metadata.kernelspec.language, cell).get('label', '')
   cell_labels.append(label)
   # find max label length
   max_label_len = max(max_label_len, len(label))

Future Work: The output information of R is different from Python and Julia. Can we change the output information of R to the same format as Python and Julia?

The current test results using Python and Julia are consistent, as shown below:

  Cell 1/3: 'fig-polar-1'......Done
  Cell 2/3: 'fig-polar-1024'...Done
  Cell 3/3: ''.................Done

When using R with quarto render hello-r.qmd --to html, the output format is as follows:

processing file: hello-r.qmd
1/4                   
2/4 [fig-airquality]  
3/4                   
4/4 [fig-airquality-2]
output file: hello-r.knit.md

Concerns with R's output format:

  1. It differs from Python/Julia.
  2. Each cell run leads to two counts. For example, with only one code cell, the output is redundant. There should be 1/1 [fig-airquality].
processing file: hello-r.qmd
1/2                   
2/2 [fig-airquality]  
output file: hello-r.knit.md
  1. The hello-r.knit.md file isn't found locally, yet the render status still displays output file: hello-r.knit.md.

@jjallaire
Copy link
Collaborator

The R output format is actually coming from a totally different stack (knitr) so we don't control it directly. In more recent versions of knitr it is actually a progress bar so even more different. I'm not worried about this divergence.

@jjallaire
Copy link
Collaborator

Thanks again so much for this (and for all of the iteration). Merging now!

@jjallaire jjallaire merged commit 84d4659 into quarto-dev:main Jun 5, 2023
@yuninxia
Copy link
Contributor Author

yuninxia commented Jun 5, 2023

Thank you for being so supportive!

@bryanwhiting
Copy link

Thanks for building this!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR] Replace "Cell 1/N" by label (if provided) when using Jupyter
5 participants