Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Many users, especially those trying out DataFusion for the first time, work in notebooks such as Jupyter or Databricks. It would be a nice feature to have dataframes rendered as HTML in these notebooks, as some other dataframe libraries do.
Describe the solution you'd like
In order to do this, we need to implement _repr_html_ on the PyDataFrame object. This can operate in the same manner as show() and limit the output to a few lines. Additional enhancements could include setting config parameters for how much data to show.
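As a rough sketch of what such a config parameter could look like, the snippet below uses a hypothetical `HtmlReprConfig` struct with a `max_rows` field. Neither name exists in DataFusion today, and plain string rows stand in for record batches so the example stays self-contained.

```rust
/// Hypothetical display settings; not an existing DataFusion type.
#[derive(Debug, Clone)]
pub struct HtmlReprConfig {
    /// Maximum number of data rows to render.
    pub max_rows: usize,
}

impl Default for HtmlReprConfig {
    fn default() -> Self {
        // 10 is an assumed default, chosen to match a "few lines" preview.
        Self { max_rows: 10 }
    }
}

/// Render a header and at most `config.max_rows` rows as an HTML table.
/// Rows are plain strings here to keep the sketch free of Arrow types.
pub fn render_table(header: &[&str], rows: &[Vec<String>], config: &HtmlReprConfig) -> String {
    let mut html = String::from("<table border='1'>\n");

    let head: String = header.iter().map(|h| format!("<th>{}</th>", h)).collect();
    html.push_str(&format!("<tr>{}</tr>\n", head));

    // `take` enforces the row limit without touching the remaining rows.
    for row in rows.iter().take(config.max_rows) {
        let cells: String = row.iter().map(|c| format!("<td>{}</td>", c)).collect();
        html.push_str(&format!("<tr>{}</tr>\n", cells));
    }

    html.push_str("</table>\n");
    html
}
```

In a real implementation the limit would more likely be pushed into the query itself, so that only the displayed rows are ever collected.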
Describe alternatives you've considered
The other alternative is to continue to use show() to inspect the data. Users can also convert the dataframe to pandas and then use its rendering capability.
Additional context
Here is a minimal working version we could start with in PyDataFrame:
fn _repr_html_(&self, py: Python) -> PyResult<String> {
    let mut html_str = "<table border='1'>\n".to_string();

    // Only collect the first 10 rows so large dataframes stay cheap to display.
    let df = self.df.as_ref().clone().limit(0, Some(10))?;
    let batches = wait_for_future(py, df.collect())?;

    if batches.is_empty() {
        html_str.push_str("</table>\n");
        return Ok(html_str);
    }

    // Header row built from the schema's field names.
    let schema = batches[0].schema();
    let mut header = Vec::new();
    for field in schema.fields() {
        header.push(format!("<th>{}</th>", field.name()));
    }
    let header_str = header.join("");
    html_str.push_str(&format!("<tr>{}</tr>\n", header_str));

    for batch in batches {
        // One formatter per column, reused for every row in the batch.
        let formatters = batch
            .columns()
            .iter()
            .map(|c| ArrayFormatter::try_new(c.as_ref(), &FormatOptions::default()))
            .map(|c| c.map_err(|e| PyValueError::new_err(format!("Error: {}", e))))
            .collect::<Result<Vec<_>, _>>()?;

        for row in 0..batch.num_rows() {
            let mut cells = Vec::new();
            for formatter in &formatters {
                cells.push(format!("<td>{}</td>", formatter.value(row)));
            }
            let row_str = cells.join("");
            html_str.push_str(&format!("<tr>{}</tr>\n", row_str));
        }
    }

    html_str.push_str("</table>\n");
    Ok(html_str)
}
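
One thing the minimal version does not handle is cell values or column names that contain HTML metacharacters such as `<` or `&`, which would be interpreted as markup by the notebook. A possible follow-up, sketched here with hypothetical helper names (`escape_html` and `td` are not existing DataFusion functions), is to escape each value before wrapping it in a tag:

```rust
/// Escape the characters that have special meaning in HTML text and attributes.
/// Hypothetical helper for illustration only.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Wrap an already-formatted cell value in a `<td>` element, escaping it first.
pub fn td(value: &str) -> String {
    format!("<td>{}</td>", escape_html(value))
}
```

With something like this in place, the cell-building line could call `td(&formatter.value(row).to_string())` instead of formatting the value directly.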