Skip to content

make UnmodifiableSetView compile time unmodifiable #853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stephane-archer opened this issue Jan 21, 2025 · 7 comments
Closed

make UnmodifiableSetView compile time unmodifiable #853

stephane-archer opened this issue Jan 21, 2025 · 7 comments
Labels
closed-not-planned Closed as we don't intend to take action on the reported issue package:collection

Comments

@stephane-archer
Copy link

UnmodifiableSetView only offers immutability by throwing exceptions on inappropriate function calls.
These checks are done at runtime, and my code can call add all day long; nothing will complain until I run the program.
UnmodifiableSetView or any alternative should remove these functions from the interface and provide compile time checks. my code should not be allowed to call UnmodifiableSetView.add because it should not exits in the first place.

mixin _UnmodifiableSetMixin<E> implements Set<E> {
  static Never _throwUnmodifiable() {
    throw UnsupportedError("Cannot change an unmodifiable set");
  }

  /// This operation is not supported by an unmodifiable set.
  bool add(E value) => _throwUnmodifiable();

  /// This operation is not supported by an unmodifiable set.
  void clear() => _throwUnmodifiable();

  /// This operation is not supported by an unmodifiable set.
  void addAll(Iterable<E> elements) => _throwUnmodifiable();

  /// This operation is not supported by an unmodifiable set.
  void removeAll(Iterable<Object?> elements) => _throwUnmodifiable();

  /// This operation is not supported by an unmodifiable set.
  void retainAll(Iterable<Object?> elements) => _throwUnmodifiable();

  /// This operation is not supported by an unmodifiable set.
  void removeWhere(bool test(E element)) => _throwUnmodifiable();

  /// This operation is not supported by an unmodifiable set.
  void retainWhere(bool test(E element)) => _throwUnmodifiable();

  /// This operation is not supported by an unmodifiable set.
  bool remove(Object? value) => _throwUnmodifiable();
}
@jakemac53
Copy link
Contributor

jakemac53 commented Jan 21, 2025

This is an intentional design choice, granted an unfortunate one for the reasons you describe, but it is necessary in order to implement the Set interface. The best this class can do is throw at runtime.

A different choice could have been made to only implement the Iterable interface (and not Set) - but if you want that you don't need a wrapper class at all, you can just assign the set directly to something which only implements Iterable. People could still cast it back to a Set (which would be modifiable) but that would have to be explicit.

With the knowledge of those two options, I would personally suggest the following, based on your use case:

  • Use the unmodifiable set view only if you need to pass the object to something which expects a Set, but want it to be unmodifiable. This is the best you can do, they asked for a Set, and Sets are modifiable (statically).
  • Otherwise, if this is for your own internal use just assign the set to a variable/field whose type is Iterable, and don't cast it back to a Set. This will give you a read only API with static guarantees.

@stephane-archer
Copy link
Author

This is an intentional design choice, granted an unfortunate one for the reasons you describe

Can we improve this?

@jakemac53 What is your opinion on defining a USet that has only the relevant function from the Set interface? I can't see any relevant use case for the current UnmodifiableSetView

@lrhn
Copy link
Member

lrhn commented Jan 22, 2025

The use-case for UnmodifiableSetView is to pass a Set that you own to other code, and being sure they won't mutate the set. The docs say:

An UnmodifiableSetView contains a Set,
and prevents that set from being changed through the view.

If you know and trust that the code you pass your set to won't mutate that set, you can just pass the set itself. If you don't trust that, and want to be safe, you can wrap it as an UnmodifiableSetView which throws if they try to mutate your set, by accident or by design.

Can we improve this?

Not without significant changes to ... almost everything.

A set with no mutating operations is not the Set interface. It should be a superclass of the interface that adds mutating members (or it could be unrelated, like the BuiltSet/SetBuilder distinction, but that'll be even harder to migrate to than just having a superclass/subclass relation, and a much bigger change in direction).

To add a non-mutable superclass of Set we can either:

  • remove mutating members from Set and introduce a MutableSet/SetBuilder/SetBuffer/whatnot subclass.
  • introduce a read-only superclass of Set, say Collection, which only has non-mutating members of Set, which is the Iterable members + lookup + union/intersection/difference.

(They're the same type hierarchy, the only distinction is what existing code using the word Set would refer to. And naming things, which is hard.)

In either case, we'd have to say which type a set literal creates. A const literal should be a non-mutable set. The other literal should probably stay mutable. We may want to introduce something like final {v1, v2} as syntax for a non-mutable set, which may need to be disambiguated with pattern declaration syntax (maybe not for sets, but if we do this for Set, we'll have do it for List and Map too).

Removing members from Set would break all code in the world that builds Sets. It would require either migrating all that code say SetBuilder instead, everywhere except the arguments to union/intersection/difference, or doing a clever migration that introduces SetBuilder only for code that actually mutates the set. That'd take some nice inter-method data flow analysis to automate.
In either case it's a big migration. If we take the easy migration of SetSetBuilder everywhere , it has a very low payback short-term when all code keeps doing the same thing, and there'll be a second manual conversion to actually start using Set. That'll likely be hellishly difficult to do (sets might not appear much in APIs, but lists and maps do, a lot). It'll probably be easier than Null-Safety migration.

Adding a non-mutable superclass would not break existing code.
It would require giving types to union/intersection/difference, which currently take a Set<E> (for union) or Set<Object?> (for intersection/difference) as argument and returns a Set<E>.
We could change Collection.union etc. to return a Collection<E>, with Set's overrides returning a Set<E>.
We can't change the parameter type to Collection<E> because then all existing Set implementations taking Set<E> as argument become invalid overrides. But then you can't pass a Collection, which is what you'd want to do,
so we will have to migrate all existing Set implementations to take Collection as argument.
(We could make the parameter covariant, but that's just a type error waiting to happen, which is what this entire exercise is about avoiding. Maybe we can make it temporarily deprecated-covariant, warning all code that uses that covariance. It'll be a bespoke analyzer hack just for this, unless we introduce @Deprecated.covariant() as an annotation.)
And then we need to do the same 'hellishly difficult" conversion as above to start using the Collection class. (And the Sequence/Vector superclass of List and ... Dictionary? Table? superclass of Map.)

If that's what we wanted, I wouldn’t start from here.

There is really no chance of introducing unmodifiable superclasses of collections today and getting them into common use any time soon.

If the unmodifiable class is a superclass, then you can't use types to ensure that you don't get a mutable set.
The mutable Set is-a Collection. You can only use an argument type of Collection<T> to say that you don't
plan to mutate the set that you get ... but you can downcast to Set if it is one, so there is no strong guarantee.
If you ask for a mutable Set, you'll have to be given a mutable set. If all the caller has is an unmodifiable Collection, they'll have to create the mutable set for you.

We could introduce such unmodifiable super-classes, not force any use, and see if people would start using them.

That would fail right out of the door because people won't be able to use an instance of the superclass for much unless the platform libraries will accept it as an argument.
So the argument type to Set.union should be the unmodifiable supertype, otherwise you can't use it for anything anyway. As mentioned, that breaks all existing Set implementations which only accept Set. (Or they'll become covariant.)

The dart:convert classes should start accepting unmodifiable Vector<int>s as input. That's again breaking to existing implementations.

Should there be unmodifiable typed-data lists? Uint8Vector as as superclass of Uint8List?

There should be toCollection and toVector (unmodifiable list) methods on Iterable. We can't add those without breaking every Iterable implementation. (Not without first introducing interface default methods, or a similar language feature). They'll at most be extension methods, which isn't great, or you'll have to use Collection.from(...).
(Or we should have literals for the unmodifiable collections, so you can do final [...someIterable] or similar.)

Etc.

So, not easy.

Also not a big benefit to the general user. If you keep using the mutable version everywhere, your code will work.
If we have both Collection and Set (and similar for lists and maps) as types, you should consider which type to use in your API. Every time you would return a mutable List, you should consider whether you should return an unmodifiable Vector instead. If you do, the receiver will have to do .toList() first if they want to mutate it. (If you have a List, because you built it inside the function, and you are returning a Vector, you can either return the List typed as Vector, or you can convert it to a Vector first. (That's not trivial to do efficiently on all platforms with a consisten
API, so you might do an extra linear copy.)
I would, as a general principle, recommend converting to Vector, because otherwise your caller might start
casting the value back to List because "it's always a List anyway", which can lock you in to not returning a Vector later. Definesive API implementation means never (consistently) returning more than you promise, because then someone might start depending on it.

Now we have extra complexity in API design, extra complexity in API implementation, the possibility of two APIs not being very compatible because one uses Set and the other Collection, and you can't pass a Collection into a function expecting a Set.

I do understand the drive to make every property of an object into a property of the type, so that static type checking can prevent runtime errors,
It's a trade-off, though, because it moves complexity into the type hierarchy, rather than keeping it in code, or conventions. Bigger type hierarchy makes APIs more complicated, to the point where you start having two versions of the same function, one Collection foo(Collection ...) and one Set fooSet(Set ...), so that you can be compatible with both variants, like C++ const and non-const functions (or you can use generics to do C foo<T extends Collection>(C ...) if the code really is the same).

The tradeoff that Dart has chosen, and it was done very early in the language and platform library design when the language was much more dynamic and type-unsound, was that using one (potentially-)mutable type for each of List, Set and Map, was the best for the most users. Individual instances could prohibit mutation (or for lists, growing), and it was up to the user to not create or pass such a collection where a mutable collection is needed.

The added complexity of having multiple versions of each would not pay for itself in benefits for actual users.
Most users are not API writers, if they just use List, Set, Map and the associated literals everywhere, their code will "just work".

Even in today's more soundly typed language, I actually believe that tradeoff is still correct.

It's incredibly rare that a function mutates a collection that it gets as argument, to the point where if it does so,
mutating that collection is most likely a primary intent of the function. If a function mutates an incoming collection,
it needs to clearly document that it does so.
And they do, operations that mutate incoming collections are generally documented and understandably named.
If a collection parameter is named accumulator, you know that things will be added to it.
If it's named foos, it'll just be read as a collection of Foos.

TL;DR: An unmodifiable collection should be a supertype of the mutable collection, otherwise API inheritance wont work. It won't be named UnmodifiableSet because then {} is UnmodifiableSet sends the wrong message. Having both would increase complexity of the type hierarchy, in API design and implementation, and in extra conversions between modifiable and unmodifable types. The tradeoff that Dart has chosen is to reduce complexity for most users in most case by making mutability the default, and non-mutability not part of the type-system. That does mean that if you make a mistake with an unmodifiable object, which someone has likely introduced in order to make sure shared values
don't change, you get a runtime error (rather than the shared values being changed).

One thing Dart could do is to introduce marker interfaces;

abstract interface class UnmodifiableCollection {}
abstract interface class FixedLengthList {}

and make every unmodifiable collection implement the former, and fixed-length lists implement the latter.
They're marker interfaces, they don't implement Iterable or List, so you can't usefully use them as parameter types.
(Otherwise it'd be a matter of time before someone write workWithList(FixedLengthList<T> flist) ..., reintroducing the complexity into the type hierarchy and API design. I know, every time we've introduced a type that wasn't intended to apper in APIs, someone used it almost immediately. (There were Sequence and EfficientLengthIterable, which we had to remove again to prevent such use. People still use HashMap and LinkedHashMap as types instead of just Map. If only Dart had a way to introduce a class that didn't introduce a type ... if that made sense.)

@lrhn lrhn closed this as completed Jan 22, 2025
@lrhn lrhn added the closed-not-planned Closed as we don't intend to take action on the reported issue label Jan 22, 2025
@stephane-archer
Copy link
Author

@lrhn thanks for your really detail answer.

An unmodifiable collection should be a supertype of the mutable collection, otherwise API inheritance wont work. [...] Having both would increase complexity of the type hierarchy, in API design and implementation.

What about having it separate? The use case I describe is, "give me a compile error if I call add on a Set", I describe something similar to UnmodifiableSetView, I don't care here if I can not use a UnmodifiableSetView everywhere a Set can be used. I want some compile errors to help me achieve immutability in my functions.

It's incredibly rare that a function mutates a collection that it gets as argument, to the point where if it does so,
mutating that collection is most likely a primary intent of the function.

if the function wraps the collection given as an argument in UnmodifiableSetView you can be sure it is not mutated. some compiler errors when modifying that function are better than some runtime errors.

@lrhn
Copy link
Member

lrhn commented Jan 22, 2025

I'd just cast the set to Iterable.
It has most of the members of Set and can't be mutated.
If you want to be absolutely certain it's not cast back to Setand changed, wrap it in an IterableView (I think such a class exists somewhere.)

You can't use union, intersection and difference, but those require having a Set already and creates a Set again.
You can just do .where(other.contains).toSet() instead of intersection, and something similar for the others.
Then there is only lookup, which can't be emulated efficiently.

@stephane-archer
Copy link
Author

stephane-archer commented Jan 22, 2025

@lrhn I changed a lot of type in my codebase for iterable as parameters type instead of set or list and I'm 99% already there.
I didn't knew iterable was imitable and you can inforce immutability that way

@eernstg
Copy link
Member

eernstg commented Jan 28, 2025

This is an amazing analysis, @lrhn!

Among other things, it illustrates how daunting the request of this issue is: Iterable, List, Set, Map and the associated literal forms are used absolutely everywhere, and it's in principle a massively breaking change to eliminate access to the mutating members of those interfaces in all those locations where they shouldn't be used.

The most important observation is probably the following:

It's incredibly rare that a function mutates a collection that it gets as argument, to the point where if it does so, mutating that collection is most likely a primary intent of the function. If a function mutates an incoming collection, it needs to clearly document that it does so.

This is the main reason why the proposal below most likely won't be very disruptive.

Dividing collection types into mutable and non-mutable ones

Assuming that this statement about being 'incredibly rare' is true, along with an assumption that the amount of code that needs to mutate a collection is generally much smaller than the amount of code that just needs read-only access to it, we could consider using the following approach:

  • Introduce use-site invariance. That is, we'll have support for types like List<exactly num> which is a possible static type of <num>[], but it can never be the static type of <int>[].
  • Introduce a lint avoid_mutating_covariant_collection that flags every invocation of a mutating member on a receiver whose type is a covariant collection type (such as List<T> for any type T).

The point is that we can now separate the two cases clearly, with support from the type checker:

  • A collection which is intended to be mutated should have a static type of the form List<exactly T> for some T when it's a list, and similarly for other kinds of collections (e.g., Map<exactly K, exactly V> for some K and V) and subtypes (e.g., LinkedList<exactly T> for some T).
  • A collection which is not intended to be mutated should have a static type that doesn't include exactly on the type parameters (that is, it should simply have the same type as it would have today).

The invariance property provides a compile-time guarantee that the mutating operations (like List.add) will not encounter any run-type type errors. In other words, this is the kind of static type a collection should have if you plan to mutate it.

Conversely, the types we're using today are always unsafe with respect to mutating members. For example, (<int>[] as List<num>).add(1.5) will compile without errors or warnings, but it will throw at run time because of a failed dynamic type check. In other words, a receiver with this type should not invoke any mutating members.

Note that List<exactly T> is a subtype of List<T>, which means that it is always possible to provide a "mutable collection" where a "non-mutable collection" is expected. It's directly assignable, without any cast or coercion.

Also note that this distinction is different from the distinction between modifiable and non-modifiable collections.

It is possible, and perhaps even useful (depending on the run-time cost associated with copying a potentially large number of collections) to pass a non-modifiable collection in each case where a collection is passed with a covariant type (like List<T>, not List<exactly T>). This means that the collection is characterized as non-mutable by its static type, and it also means that actual mutation will fail at run time.

However, if the protection offered by static typing is considered sufficient then there's no need to create new copies of various collection objects just so they can be unmodifiable. Presumably, non-modifiable collections will be used in fewer cases than today: It is the only kind of protection against mutation we have today; when List<T> and similar types are considered to mean "don't mutate this object!", the need for run-time non-modifiability may decrease.

Apart from the compile-time/run-time difference between the proposal in this comment and the use of unmodifiable collections, they also differ in that unmodifiability is an inherent property of a collection object whereas the protection based on the distinction between types like List<exactly num> and List<num> is based on the compile-time properties of the API which is used to access that object. For instance, it is perfectly possible for two pieces of code to have a reference to the same collection with the type List<exactly num> and List<num>. This just means that the former can modify it, safely, and the latter should not (and it's in any case unsafe).

Literal types

All collection literals will have invariant static types: <int>[1] will have the static type List<exactly int> rather than List<int>. The former is a subtype of the latter, which means that List<int> xs = [1]; will continue to work.

However, var xs = <int>[1]; will change the inferred type of xs: Today it gets the inferred type List<int>, but with use-site invariance it will have the inferred type List<exactly int>. This means that xs is automatically characterized as mutable at the point where it is created. This is probably what we want, based on the assumption that most collections are only modified by code which is near to the code that created this collection.

Migration

This approach does not give rise to any changes to the declarations of collection types. This implies that all the code that works fine today can just be left unchanged. Hence, migration can be done incrementally.

If a library L is selected for migration to a more strict handling of collection mutability, the lint avoid_mutating_covariant_collection is enabled for L. If there are no lint messages then you're done.

If there is one or more messages from avoid_mutating_covariant_collection then each flagged construct is an unsafe mutation of a collection.

This may turn out to be a bug, which is good: You just found it. Otherwise, the collection is really intended to be mutated at this point, so the static type of the collection should be made invariant. You can use standard techniques to enable mutating access to some amount of code, and non-mutating access to other parts of the code.

// Before.
class A {
  final List<num> xs = [];
  void foo() => xs.add(1.5); // Lint!
}

// After.
class A {
  final List<exactly num> xs = [];
  void foo() => xs.add(1.5); // OK.
}

// Alternative after, restricting mutating access to the list to the current library.
class A {
  final List<exactly num> _xs = []; // Mutating access granted privately.
  List<num> get xs => _xs; // Everybody else just gets the less-privileged `List<num>`.
  void foo() => _xs.add(1.5); // OK.
}

If some flagged mutation is performed on an imported entity foreignList from some library L2 then you may need to consider whether this mutation is appropriate or not. If it is indeed appropriate (according to the documentation of L2) then you may wish to put some pressure on the maintainers of L2 to migrate their library such that mutability is expressed clearly: If foreignList is intended to be modified in other libraries then it should be declared with type List<exactly T> for some T, not List<T>. On the other hand, if it's not intended to be modified then L should stop doing it (and communicate with the maintainers of L2 in order to learn how to achieve the same effect in a better way).

The maintainers of L2 may also respond that they aren't ready to migrate their code in the near future. In this case you can at least change your code such that it doesn't get flagged by the lint, and such that there will not be any run-time type errors:

import 'l2.dart'; // Which has `List<num> foreignList = ...;`

// Before.
void foo() {
  foreignList.add(1.5); // Lint!
}

// After, if `l2.dart` won't be migrated any time soon.
void foo() {
  var xs = foreignList; // Create a local variable: Enable promotion.
  if (xs is List<exactly num>) {
    xs.add(1.5); // OK.
  } else {
    ... // Error handling, presumably more graceful than throwing a `TypeError`.
  }
}

Steady state

It would be possible to migrate all Dart code to use the style where mutability of collections is expressed explicitly everywhere. However, it seems likely that this will not happen. The reason is that the dynamically checked covariance and the freely available access to mutating members that shouldn't be used don't create a huge amount of problems (apparently), and developers have lots of other things to do.

Perhaps the approach described here will only be used in code which is particularly critical (because it's used in a situation where a bug could cause serious real-world damage, or because the code is very widely used, and even smaller improvements of the overall correctness would affect a lot of people).

This implies that we're likely to have a situation where some parts of a system are migrated to use this approach, and other parts are not.

This could create some readability issues at the boundary. For example, foreignList in the previous section has type List<num>, but it is intended to be modified (from L, from L2, from anywhere). This conflicts with the interpretation that the type List<num>(in L) is taken to mean that it should not be mutated.

Why not generalize?

It could be claimed that if this approach is any good for collections then it should also be used for any other class which has unsafe mutating methods. The lint avoid_mutating_covariant_collection would then need to be generalized to flag all invocations of methods whose signature isn't covariant in the type parameters of the enclosing class (this is a more general statement of the main property which is used to select the corresponding collection members).

This would be possible, but collections are special in that they are used so pervasively, and they have literal forms, and it's such a daunting task to handle the issue at the root cause (which is that the type parameters weren't declared as invariant from day one).

There may be other classes that would benefit from a similar treatment as the collection classes. However, in general, other classes are much more amenable to other strategies, for instance, changing the declaration to use declaration-site invariance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-not-planned Closed as we don't intend to take action on the reported issue package:collection
Projects
None yet
Development

No branches or pull requests

4 participants