Skip to content

rustdoc: Fix generics generation in search index #88268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 30, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/librustdoc/formats/item_type.rs
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ crate enum ItemType {
ProcAttribute = 23,
ProcDerive = 24,
TraitAlias = 25,
Generic = 26,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are generic parameters treated as items?

Copy link
Member Author

@GuillaumeGomez GuillaumeGomez Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pure generic ones, like fn foo<T: Display + Debug>(a: T), in here, a cannot be both "Display" and "Debug", it needs to go through a level:

arg: [
  {
    name: "",
    generics: [
      {
        name: "Display"
        generics: []
      },
      {
        name: "Debug"
        generics: []
      }
    ]
  }
]

So in here, the argument with name "" is a "pure" generic, hence creating a new ItemType for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But value parameters like a: T don't have item types (at least from looking at the ItemType enum definition), so how are they represented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just with an empty name. If they are still present, might be interesting to remove them since they're useless for the search index.

}

impl Serialize for ItemType {
Expand Down Expand Up @@ -173,6 +174,7 @@ impl ItemType {
ItemType::ProcAttribute => "attr",
ItemType::ProcDerive => "derive",
ItemType::TraitAlias => "traitalias",
ItemType::Generic => "generic",
}
}
}
Expand Down
170 changes: 95 additions & 75 deletions src/librustdoc/html/render/cache.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use std::collections::hash_map::Entry;
use std::collections::BTreeMap;

use rustc_data_structures::fx::{FxHashMap, FxHashSet};
use rustc_data_structures::fx::FxHashMap;
use rustc_middle::ty::TyCtxt;
use rustc_span::symbol::Symbol;
use serde::ser::{Serialize, SerializeStruct, Serializer};
Expand Down Expand Up @@ -192,32 +192,24 @@ crate fn get_index_search_type<'tcx>(
item: &clean::Item,
tcx: TyCtxt<'tcx>,
) -> Option<IndexItemFunctionType> {
let (all_types, ret_types) = match *item.kind {
let (mut inputs, mut output) = match *item.kind {
clean::FunctionItem(ref f) => get_all_types(&f.generics, &f.decl, tcx),
clean::MethodItem(ref m, _) => get_all_types(&m.generics, &m.decl, tcx),
clean::TyMethodItem(ref m) => get_all_types(&m.generics, &m.decl, tcx),
_ => return None,
};

let inputs = all_types
.iter()
.map(|(ty, kind)| TypeWithKind::from((get_index_type(ty), *kind)))
.filter(|a| a.ty.name.is_some())
.collect();
let output = ret_types
.iter()
.map(|(ty, kind)| TypeWithKind::from((get_index_type(ty), *kind)))
.filter(|a| a.ty.name.is_some())
.collect::<Vec<_>>();
inputs.retain(|a| a.ty.name.is_some());
output.retain(|a| a.ty.name.is_some());
let output = if output.is_empty() { None } else { Some(output) };

Some(IndexItemFunctionType { inputs, output })
}

fn get_index_type(clean_type: &clean::Type) -> RenderType {
fn get_index_type(clean_type: &clean::Type, generics: Vec<TypeWithKind>) -> RenderType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove get_generics and change this function to take the generics as an argument? I don't see any changes that would make that necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because otherwise you need to set it into the returned type. With this, it cannot be forgotten.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean. I'd like this change to be as small and simple as possible since I haven't interacted with this part of rustdoc before. I'd really appreciate it if you could revert all changes in this PR other than the part that inlines bounds better, because it's really hard for me to understand what the changes are doing when they are all mixed together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the changes are about that, including this one...

RenderType {
name: get_index_type_name(clean_type, true).map(|s| s.as_str().to_ascii_lowercase()),
generics: get_generics(clean_type),
generics: if generics.is_empty() { None } else { Some(generics) },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why store an option for this? The caller can check if the Vec is empty, this means it has to check for both None and empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if it's not an option, it'll generate [], which is unneeded. Before to not generate anything then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment to make it clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused though - can't you just do this same check wherever it would generate an empty Vec? Why does it have to be specifically here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the other places... In this case, it's contained inside a vec, so if the vec length is 1 or 2, it's not really a problem. For the other places, it might be different.

}
}

Expand Down Expand Up @@ -246,23 +238,6 @@ fn get_index_type_name(clean_type: &clean::Type, accept_generic: bool) -> Option
}
}

/// Return a list of generic parameters for use in the search index.
///
/// This function replaces bounds with types, so that `T where T: Debug` just becomes `Debug`.
/// It does return duplicates, and that's intentional, since search queries like `Result<usize, usize>`
/// are supposed to match only results where both parameters are `usize`.
fn get_generics(clean_type: &clean::Type) -> Option<Vec<String>> {
clean_type.generics().and_then(|types| {
let r = types
.iter()
.filter_map(|t| {
get_index_type_name(t, false).map(|name| name.as_str().to_ascii_lowercase())
})
.collect::<Vec<_>>();
if r.is_empty() { None } else { Some(r) }
})
}

/// The point of this function is to replace bounds with types.
///
/// i.e. `[T, U]` when you have the following bounds: `T: Display, U: Option<T>` will return
Expand All @@ -272,27 +247,77 @@ crate fn get_real_types<'tcx>(
generics: &Generics,
arg: &Type,
tcx: TyCtxt<'tcx>,
recurse: i32,
res: &mut FxHashSet<(Type, ItemType)>,
) -> usize {
fn insert(res: &mut FxHashSet<(Type, ItemType)>, tcx: TyCtxt<'_>, ty: Type) -> usize {
if let Some(kind) = ty.def_id_no_primitives().map(|did| tcx.def_kind(did).into()) {
res.insert((ty, kind));
1
recurse: usize,
res: &mut Vec<TypeWithKind>,
) {
fn insert_ty(
res: &mut Vec<TypeWithKind>,
tcx: TyCtxt<'_>,
ty: Type,
mut generics: Vec<TypeWithKind>,
) {
let is_full_generic = ty.is_full_generic();

if is_full_generic && generics.len() == 1 {
// In this case, no need to go through an intermediate state if the generics
// contains only one element.
//
// For example:
//
// fn foo<T: Display>(r: Option<T>) {}
//
// In this case, it would contain:
//
// ```
// [{
// name: "option",
// generics: [{
// name: "",
// generics: [
// name: "Display",
// generics: []
// }]
// }]
// }]
// ```
//
// After removing the intermediate (unnecessary) full generic, it'll become:
//
// ```
// [{
// name: "option",
// generics: [{
// name: "Display",
// generics: []
// }]
// }]
// ```
//
// To be noted that it can work if there is ONLY ONE generic, otherwise we still
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it only work if there's a single generic argument? It seems weird to have an outer object that just has name: ""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's what I tried to explain but I failed I guess haha.

So basically, arguments can only have a type and generics. What happens in case the argument is a full generic which is Display + Debug + Write? Then we can't say it's one type (from a search perspective) but a generic type (with empty name) having three generics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just discussed this with Guillaume, and I think I can explain it now.

For purposes of search-indexing, fn foo<T: Display>(x: T) is treated as the type fn(x: Display). This makes a search like trait:Display -> show foo. (It turns out you can filter by item type in rustdoc searches, which is part of what ItemType is for.)

However, fn foo<T: Display + Clone>(x: T)'s argument cannot be reduced into an ItemType::Trait because there are two traits involved, the Display trait and the Clone trait. So instead it's represented as a ItemType::Generic, which means there was more than one bound involved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. :)

// need to keep it as is!
res.push(generics.pop().unwrap());
return;
}
let mut index_ty = get_index_type(&ty, generics);
if index_ty.name.as_ref().map(|s| s.is_empty()).unwrap_or(true) {
return;
}
if is_full_generic {
// We remove the name of the full generic because we have no use for it.
index_ty.name = Some(String::new());
res.push(TypeWithKind::from((index_ty, ItemType::Generic)));
} else if let Some(kind) = ty.def_id_no_primitives().map(|did| tcx.def_kind(did).into()) {
res.push(TypeWithKind::from((index_ty, kind)));
} else if ty.is_primitive() {
// This is a primitive, let's store it as such.
res.insert((ty, ItemType::Primitive));
1
} else {
0
res.push(TypeWithKind::from((index_ty, ItemType::Primitive)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this bit? How do you know it's a primitive? Can you add an assertion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a ty.is_primitive() check just like before. Did I miss something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I misread the code. Why did you remove the else branch? What did 0 represent?

Copy link
Member Author

@GuillaumeGomez GuillaumeGomez Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, it returned the number of items inserted to know if something needed to be added into the HashSet. Now we add all arguments (as long as they have a name).

So basically, the else branch did nothing and returned value confirming it did nothing.

}
}

if recurse >= 10 {
// FIXME: remove this whole recurse thing when the recursion bug is fixed
return 0;
return;
}
let mut nb_added = 0;

if let Type::Generic(arg_s) = *arg {
if let Some(where_pred) = generics.where_predicates.iter().find(|g| match g {
Expand All @@ -301,6 +326,7 @@ crate fn get_real_types<'tcx>(
}
_ => false,
}) {
let mut ty_generics = Vec::new();
let bounds = where_pred.get_bounds().unwrap_or_else(|| &[]);
for bound in bounds.iter() {
if let GenericBound::TraitBound(poly_trait, _) = bound {
Expand All @@ -309,41 +335,32 @@ crate fn get_real_types<'tcx>(
continue;
}
if let Some(ty) = x.get_type() {
let adds = get_real_types(generics, &ty, tcx, recurse + 1, res);
nb_added += adds;
if adds == 0 && !ty.is_full_generic() {
nb_added += insert(res, tcx, ty);
}
get_real_types(generics, &ty, tcx, recurse + 1, &mut ty_generics);
}
}
}
}
insert_ty(res, tcx, arg.clone(), ty_generics);
}
if let Some(bound) = generics.params.iter().find(|g| g.is_type() && g.name == arg_s) {
let mut ty_generics = Vec::new();
for bound in bound.get_bounds().unwrap_or(&[]) {
if let Some(path) = bound.get_trait_path() {
let ty = Type::ResolvedPath { did: path.def_id(), path };
let adds = get_real_types(generics, &ty, tcx, recurse + 1, res);
nb_added += adds;
if adds == 0 && !ty.is_full_generic() {
nb_added += insert(res, tcx, ty);
}
get_real_types(generics, &ty, tcx, recurse + 1, &mut ty_generics);
}
}
insert_ty(res, tcx, arg.clone(), ty_generics);
}
} else {
nb_added += insert(res, tcx, arg.clone());
if let Some(gens) = arg.generics() {
for gen in gens.iter() {
if gen.is_full_generic() {
nb_added += get_real_types(generics, gen, tcx, recurse + 1, res);
} else {
nb_added += insert(res, tcx, (*gen).clone());
}
let mut ty_generics = Vec::new();
if let Some(arg_generics) = arg.generics() {
for gen in arg_generics.iter() {
get_real_types(generics, gen, tcx, recurse + 1, &mut ty_generics);
}
}
insert_ty(res, tcx, arg.clone(), ty_generics);
}
nb_added
}

/// Return the full list of types when bounds have been resolved.
Expand All @@ -354,38 +371,41 @@ crate fn get_all_types<'tcx>(
generics: &Generics,
decl: &FnDecl,
tcx: TyCtxt<'tcx>,
) -> (Vec<(Type, ItemType)>, Vec<(Type, ItemType)>) {
let mut all_types = FxHashSet::default();
) -> (Vec<TypeWithKind>, Vec<TypeWithKind>) {
let mut all_types = Vec::new();
for arg in decl.inputs.values.iter() {
if arg.type_.is_self_type() {
continue;
}
let mut args = FxHashSet::default();
// FIXME: performance wise, it'd be much better to move `args` declaration outside of the
// loop and replace this line with `args.clear()`.
let mut args = Vec::new();
get_real_types(generics, &arg.type_, tcx, 0, &mut args);
if !args.is_empty() {
// FIXME: once back to performance improvements, replace this line with:
// `all_types.extend(args.drain(..));`.
all_types.extend(args);
} else {
if let Some(kind) = arg.type_.def_id_no_primitives().map(|did| tcx.def_kind(did).into())
{
all_types.insert((arg.type_.clone(), kind));
all_types.push(TypeWithKind::from((get_index_type(&arg.type_, vec![]), kind)));
}
}
}

let ret_types = match decl.output {
let mut ret_types = Vec::new();
match decl.output {
FnRetTy::Return(ref return_type) => {
let mut ret = FxHashSet::default();
get_real_types(generics, return_type, tcx, 0, &mut ret);
if ret.is_empty() {
get_real_types(generics, return_type, tcx, 0, &mut ret_types);
if ret_types.is_empty() {
if let Some(kind) =
return_type.def_id_no_primitives().map(|did| tcx.def_kind(did).into())
{
ret.insert((return_type.clone(), kind));
ret_types.push(TypeWithKind::from((get_index_type(return_type, vec![]), kind)));
}
}
ret.into_iter().collect()
}
_ => Vec::new(),
_ => {}
};
(all_types.into_iter().collect(), ret_types)
(all_types, ret_types)
Comment on lines -390 to +410
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the reason you changed the FxHashSets to Vecs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by "this" you mean the fact that a same type can be present multiple times into generics, then yes. ;)

}
3 changes: 2 additions & 1 deletion src/librustdoc/html/render/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,7 @@ crate struct IndexItem {
#[derive(Debug)]
crate struct RenderType {
name: Option<String>,
generics: Option<Vec<String>>,
generics: Option<Vec<TypeWithKind>>,
}

/// Full type of functions/methods in the search index.
Expand Down Expand Up @@ -2387,6 +2387,7 @@ fn item_ty_to_strs(ty: ItemType) -> (&'static str, &'static str) {
ItemType::ProcAttribute => ("attributes", "Attribute Macros"),
ItemType::ProcDerive => ("derives", "Derive Macros"),
ItemType::TraitAlias => ("trait-aliases", "Trait aliases"),
ItemType::Generic => unreachable!(),
}
}

Expand Down
37 changes: 24 additions & 13 deletions src/librustdoc/html/static/js/search.js
Original file line number Diff line number Diff line change
Expand Up @@ -299,10 +299,10 @@ window.initSearch = function(rawSearchIndex) {
var elems = Object.create(null);
var elength = obj[GENERICS_DATA].length;
for (var x = 0; x < elength; ++x) {
if (!elems[obj[GENERICS_DATA][x]]) {
elems[obj[GENERICS_DATA][x]] = 0;
if (!elems[obj[GENERICS_DATA][x][NAME]]) {
elems[obj[GENERICS_DATA][x][NAME]] = 0;
}
elems[obj[GENERICS_DATA][x]] += 1;
elems[obj[GENERICS_DATA][x][NAME]] += 1;
}
var total = 0;
var done = 0;
Expand Down Expand Up @@ -345,6 +345,7 @@ window.initSearch = function(rawSearchIndex) {
// Check for type name and type generics (if any).
function checkType(obj, val, literalSearch) {
var lev_distance = MAX_LEV_DISTANCE + 1;
var tmp_lev = MAX_LEV_DISTANCE + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this tmp_lev change? Is it really necessary for the generics fix? IIUC, the [NAME] additions below are necessary, but I don't see why the tmp_lev addition is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll revert this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, nevermind! So originally it was declared just below (here), but since it's now being used here as well, I needed to have the declaration at the top of the function.

Copy link
Member

@camelid camelid Sep 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why the other changes were made either ;)

Like with #88268 (comment), can you please have someone who's more knowledgeable about rustdoc's JS review this part?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll ask @Nemo157.

var len, x, firstGeneric;
if (obj[NAME] === val.name) {
if (literalSearch) {
Expand All @@ -354,10 +355,10 @@ window.initSearch = function(rawSearchIndex) {
var elems = Object.create(null);
len = obj[GENERICS_DATA].length;
for (x = 0; x < len; ++x) {
if (!elems[obj[GENERICS_DATA][x]]) {
elems[obj[GENERICS_DATA][x]] = 0;
if (!elems[obj[GENERICS_DATA][x][NAME]]) {
elems[obj[GENERICS_DATA][x][NAME]] = 0;
}
elems[obj[GENERICS_DATA][x]] += 1;
elems[obj[GENERICS_DATA][x][NAME]] += 1;
}

var allFound = true;
Expand All @@ -382,7 +383,7 @@ window.initSearch = function(rawSearchIndex) {
// If the type has generics but don't match, then it won't return at this point.
// Otherwise, `checkGenerics` will return 0 and it'll return.
if (obj.length > GENERICS_DATA && obj[GENERICS_DATA].length !== 0) {
var tmp_lev = checkGenerics(obj, val);
tmp_lev = checkGenerics(obj, val);
if (tmp_lev <= MAX_LEV_DISTANCE) {
return tmp_lev;
}
Expand All @@ -392,8 +393,8 @@ window.initSearch = function(rawSearchIndex) {
if ((!val.generics || val.generics.length === 0) &&
obj.length > GENERICS_DATA && obj[GENERICS_DATA].length > 0) {
return obj[GENERICS_DATA].some(
function(name) {
return name === val.name;
function(gen) {
return gen[NAME] === val.name;
});
}
return false;
Expand All @@ -404,17 +405,27 @@ window.initSearch = function(rawSearchIndex) {
// a levenshtein distance value that isn't *this* good so it goes
// into the search results but not too high.
lev_distance = Math.ceil((checkGenerics(obj, val) + lev_distance) / 2);
} else if (obj.length > GENERICS_DATA && obj[GENERICS_DATA].length > 0) {
}
if (obj.length > GENERICS_DATA && obj[GENERICS_DATA].length > 0) {
Comment on lines -407 to +409
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Why is this necessary? I'm not comfortable reviewing changes to the JS search code other than adding [NAME] as necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because before, if we found that the item had a "close enough" name to what we're looking for, it then ran checks on generics as well, otherwise it was going to look into the generics.

Now, in both cases we go look into generics and then we pick the best match out of the two.

And yes, sorry for the JS changes... If that can make you feel better, you can always confirm it's working as expected by running the JS search tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I interpret your description is that it's not necessary to bundle the JS change with the html/formats/cache.rs change; it's a separate bugfix. If so, can you please move this change to a separate PR and assign someone who's more knowledgeable about search.js on it? Or ask them to review the JS part of this PR?

And yes, sorry for the JS changes... If that can make you feel better, you can always confirm it's working as expected by running the JS search tests.

Rustdoc doesn't currently have very exhaustive test coverage, so I'd rather have someone more knowledgeable look over the JS change.

// We can check if the type we're looking for is inside the generics!
var olength = obj[GENERICS_DATA].length;
for (x = 0; x < olength; ++x) {
lev_distance = Math.min(levenshtein(obj[GENERICS_DATA][x], val.name),
lev_distance);
tmp_lev = Math.min(levenshtein(obj[GENERICS_DATA][x][NAME], val.name), tmp_lev);
}
if (tmp_lev !== 0) {
// If we didn't find a good enough result, we go check inside the generics of
// the generics.
for (x = 0; x < olength && tmp_lev !== 0; ++x) {
tmp_lev = Math.min(
checkType(obj[GENERICS_DATA][x], val, literalSearch),
tmp_lev
);
}
}
}
// Now whatever happens, the returned distance is "less good" so we should mark it
// as such, and so we add 1 to the distance to make it "less good".
return lev_distance + 1;
return Math.min(lev_distance, tmp_lev) + 1;
}

function findArg(obj, val, literalSearch, typeFilter) {
Expand Down
Loading