Render tables using html in notebooks. #713

Closed
@timsaucer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Many users, especially those trying DataFusion for the first time, work in notebooks such as Jupyter or Databricks. It would be a nice feature if DataFrames shown in these notebooks were rendered as HTML, as some other DataFrame libraries do.

Describe the solution you'd like

To do this, we need to implement _repr_html_ on the PyDataFrame object. It can operate in the same manner as show() and limit the output to a few rows. Additional enhancements could include config parameters that control how much data is shown.
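As a rough illustration of the config idea, here is a standalone sketch that caps the number of rendered rows. HtmlReprConfig, max_rows, and render_html_table are hypothetical names invented for this example, not existing DataFusion APIs, and the default of 10 rows is an assumption borrowed from the example further down. Rows are assumed to be already formatted as strings.

```rust
/// Hypothetical display settings; the struct and field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct HtmlReprConfig {
    /// Maximum number of data rows to render.
    pub max_rows: usize,
}

impl Default for HtmlReprConfig {
    fn default() -> Self {
        // Assumed default: 10 rows, the same limit the issue's example uses.
        Self { max_rows: 10 }
    }
}

/// Render a header and pre-formatted rows as an HTML table,
/// emitting at most `config.max_rows` data rows.
/// Cell text is inserted as-is; escaping is left out to keep the sketch short.
pub fn render_html_table(
    header: &[&str],
    rows: &[Vec<String>],
    config: HtmlReprConfig,
) -> String {
    let mut html = String::from("<table border='1'>\n");

    // Header row.
    let head: String = header.iter().map(|h| format!("<th>{}</th>", h)).collect();
    html.push_str(&format!("<tr>{}</tr>\n", head));

    // Data rows, truncated to the configured limit.
    for row in rows.iter().take(config.max_rows) {
        let cells: String = row.iter().map(|c| format!("<td>{}</td>", c)).collect();
        html.push_str(&format!("<tr>{}</tr>\n", cells));
    }

    html.push_str("</table>\n");
    html
}
```

Calling render_html_table(&["a"], &rows, HtmlReprConfig { max_rows: 2 }) on three rows would emit the header plus the first two rows only. In the real binding the limit would more likely be applied to the query itself, so that only the needed rows are collected.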

Describe alternatives you've considered

The alternative is to continue to use show() to inspect the data. Users can also convert the DataFrame to pandas and then use its rendering capability.

Additional context

Here is a minimal demonstrable version we could start with in PyDataFrame:

    fn _repr_html_(&self, py: Python) -> PyResult<String> {
        let mut html_str = "<table border='1'>\n".to_string();

        // Only collect the first 10 rows, as show() does for a preview.
        let df = self.df.as_ref().clone().limit(0, Some(10))?;
        let batches = wait_for_future(py, df.collect())?;

        if batches.is_empty() {
            html_str.push_str("</table>\n");
            return Ok(html_str);
        }

        let schema = batches[0].schema();

        let mut header = Vec::new();
        for field in schema.fields() {
            header.push(format!("<th>{}</th>", field.name()));
        }
        let header_str = header.join("");
        html_str.push_str(&format!("<tr>{}</tr>\n", header_str));

        for batch in batches {
            let formatters = batch
                .columns()
                .iter()
                .map(|c| ArrayFormatter::try_new(c.as_ref(), &FormatOptions::default()))
                .map(|c| c.map_err(|e| PyValueError::new_err(format!("Error: {}", e))))
                .collect::<Result<Vec<_>, _>>()?;

            for row in 0..batch.num_rows() {
                let mut cells = Vec::new();
                for formatter in &formatters {
                    cells.push(format!("<td>{}</td>", formatter.value(row)));
                }
                let row_str = cells.join("");
                html_str.push_str(&format!("<tr>{}</tr>\n", row_str));
            }
        }

        html_str.push_str("</table>\n");

        Ok(html_str)
    }

This produces the following example:
[Image: Screenshot 2024-05-22 at 3 02 07 PM]
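
One thing the minimal version does not handle is escaping: field names and cell values are written into the markup verbatim, so a value containing < or & could break the rendered table. Below is a minimal sketch of an escaping helper using only the standard library. The name escape_html is hypothetical; it is not part of the code above or of any DataFusion API.

```rust
/// Escape the characters that are significant in HTML text and attribute values.
/// Hypothetical helper for illustration only.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            // Every other character is copied through unchanged.
            _ => out.push(ch),
        }
    }
    out
}
```

Under that assumption, the header and cell lines would wrap their values before formatting, for example format!("<td>{}</td>", escape_html(&formatter.value(row).to_string())).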
