Skip to content

Union Types #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 24 commits into from
Closed

Union Types #1

wants to merge 24 commits into from

Conversation

nikic
Copy link
Member

@nikic nikic commented Sep 4, 2019

Proposal for the addition of union types T1|T2|.... There has been a previous proposal on this topic, which has been declined.

Final RFC and Vote

(Original Rendered Proposal)

nikic and others added 2 commits September 4, 2019 10:45
Co-Authored-By: Gabriel Caruso <[email protected]>
@carusogabriel
Copy link

Discussions about the previous RFC:

@AegirLeet
Copy link

I really like it. I've encountered many situations where I wanted a method to accept multiple types while retaining type safety. The solution so far was using docblock type hints + static analysis + runtime checks. Union types on a language level would be a much nicer solution!

In addition to the nullable (?Type) and iterable types mentioned in the introduction, isn't callable another union type, since it can be either an array (with certain criteria), a string (with certain criteria) or a Closure?

For this reason, support for the `false` pseudo-type is included in this proposal. A `true` pseudo-type is *not* part of the proposal, because similar historical reasons for its necessity do not exist.

The `false` pseudo-type cannot be used as a standalone type, it can only be used as part of a union.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some clarity on why true is not included as a pseudo-type is needed here.

Copy link

@asgrim asgrim Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A true pseudo-type is not part of the proposal, because similar historical reasons for its necessity do not exist.

it's already there ;)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. A bit weak, IMO, so if there isn't much complexity involved, I'd suggest introducing it to reduce the edge cases.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that true should be included as a pseudo type. Without reading the RFC one would assume it's included. It helps documentation & tutorials since no effort is needed to explain why it's not included.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. A bit weak, IMO, so if there isn't much complexity involved, I'd suggest introducing it to reduce the edge cases.

@mnapoli I noticed your downvote. Care to elaborate? (Personally, I value your opinion.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mnapoli I noticed your downvote. Care to elaborate?

@joshdifabio it's easier to simply downvote what you find on the internet rather than write a proper argument, I went with the lazy way 😄

Nikita summed it up nicely already, I don't have a strong opinion on this and I don't think my opinion matters much here, but: false would exist for legacy reasons. true would have no real reason to exist except consistency, and I would favor aiming for what's essential over consistency or completeness.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is sound; my one concern is that allowing false as an explicit return type seems to indicate that it's an OK/good thing to do. I think most agree these days that returning something|false is bad design, but PHP (and sadly much of userspace) is full of that bad design.

Is there some way, perhaps just in documentation, that we could call out that returning foo|false is supported but a bad idea? Or at least avoid giving the implication that it's a good idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think people should just fall back to Foo|bool for now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially now that the syntax for nullable type has switched to T|null, it should be made clear that not only false, but also both ?false and false|null, are unsupported.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding support for false in favor of bool without proper support for literal types helps to promote bad design decisions

@sebastianbergmann
Copy link

Appreciate the new RFC medium.

@akalongman
Copy link

akalongman commented Sep 4, 2019

This feature would be great! 😍

@nikic can this feature bring performance issues?

array_filter(array $array, callable $callback, ArrayFilterFlags $flag): array;
```

Proper enums are likely a better solution to this problem space, though depending on the implementation they may not be retrofitted to existing functions for backwards-compatibility reasons.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really love to see both custom types and enums, as separate language features, so hope we keep the door open to both. In particular, I am strongly of the opinion that an enum value should not be coercible to a plain type, because that leads to nonsense like MONDAY == JANUARY; as you say, that would rule out using them for existing function signatures. Custom types, meanwhile, could have more complex definitions - e.g. combining unions, intersections, constrained callables, and domains such as int{>=0,<100}.

class B extends A {}

class Test {
public A|B $prop;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should not be allowed at all? Since A type already includes B as well, using A|B type has no added value over just A.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted elsewhere in the RFC, this can't be known in the general case at compile-time, because classes A and B may not have been defined yet.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and type analysers can do the complaining for you (though at least in my case it'll just treat as public A $prop).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, makes sense. Yeah, I'm sure PHPStan will handle this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be moved to a separate section: Known problems or Limitations because while it should work as @enumag says, the engine doesn't make it possible ATM.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see there would be some value if the language was able to point this out, but there's plenty of ways of writing redundant code, so I don't see it as a "limitation" particularly. For instance, you can already write if ( $x instanceof A || $x instanceof B ), or even if ( $x instanceof A || $x instanceof A ), which is clearly redundant. A tool like PHPStan may well spot this for you, and can probably spot redundant union types as well, because it can analyse your whole project.

@tsantos84
Copy link

Type aliases is great.

@hikari-no-yume
Copy link

If an RFC is to be on Github and in Markdown anyway, I wonder if it couldn't be made as a pull request to the language specification repo instead. It could be beneficial to that spec for it not to be updated later after RFCs are passed, but instead the spec update could be the RFC. Perhaps this is too radical however :)

Copy link

@Ocramius Ocramius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

The comments above are mostly tiny details, I think the proposal is very much sound and very welcome too.

@freshp
Copy link

freshp commented Sep 4, 2019

I hope it is the right place to ask for something? Its the first RFC I read, thats why please be patient.
Excuse me too, if I misinterpreted the RFC but I dont find any example where I want to use this feature, besides nullability?

IMO many programmers are going to use it like a mixed-typehint with "strict_types = 1". (something like : ":int|float|string|bool") Also IMO PHP was getting typified, harded, easier to read and write since version 7. Thx for that, but this feature could go in another direction.

Maybe a realworld example could help me to understand, which sourcecode is getting easier to maintain or to write with this new feature?


What is likely intended instead is `?T`, which allows returning either `T` or `null`.

### `false` pseudo-type
Copy link

@dbrekelmans dbrekelmans Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO adding the false pseudo-type because of library implementation details is the wrong choice. Adding a language-level feature to cope with a historical design flaw is only adding to the pile. It's legacy as soon as it's implemented.

I absolutely understand your reasoning; I myself would prefer to know a function cannot return true. However, if typehinting |bool over |false is such an issue, it should be fixed at the root (return null instead of false) instead of adding a band-aid.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're getting at, but rather than "legacy as soon as it's implemented" it could also be seen as a pre-cursor for a more general feature, where you can specify a union of values rather than just types. See "Literal Types" in the Future Scope section for a brief discussion of this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to BC constraints, changes to return type like that have a devastating potential to break uncountable (read infinite) amount of code on the internet, affecting adoption rate for the upgrade. I don't think you are wrong to say it's legacy the moment it's implemented, but it's part of coping with the 20 years baggage of PHP.
I think discussing the merits of BC breaks is not worth anybody's time at this RFC as Internals have been debating that for several months already (that I know of).

Copy link

@dbrekelmans dbrekelmans Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deleugpn I'm certainly not proposing such a BC break and I agree let's avoid the BC discussion. I'd rather leave the wound exposed (typehint |bool where false is returned) than cover it up with this pseudo-type band-aid.

@IMSoP Looking at it that way, it makes more sense. :)

Copy link

@mattacosta mattacosta Sep 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with false is really one of meaning as opposed to technical requirement. With all the "standard" types, developers adopt a general definition of what they represent. The value false is generally defined similarly to:

// The two values of a logical condition.
enum Boolean {
  False,
  True,
}

As we can see, false was never meant to represent an error condition much in the same way that the string "apple" or the number 1234 don't generally represent error conditions either. Including false as a pseudo-type simply perpetuates this misrepresentation.

Furthermore, I would argue that may actually be detrimental to developers as its usage in this manner typically indicates an error.

  • In the case of a function or method that should return a value, what was likely meant was null or another type similar to Rust's Option<T> or Result<T, E> which are meant to represent error conditions.
    This also covers literal types, because they are generally used to represent a subset of some other type. Since true is invalid in this context and false should not be combined with anything (its supertype does not have any other values), false can no longer represent its intended meaning, one of two values.
  • In the case of a function or method that should not return a value, the correct result should actually be a thrown error instead. Using false in this second case is especially bad, as it may also prevent static analysis from narrowing the intended type.
class Map {
  function has(key: TKey): bool {}
  function get(key: TKey): T | false {}
}

function foo(): T | null {
  $map = new Map();
  if ($map->has("bar")) {
    // Here we expect `get()` to always return `T`, however because of our poor
    // design decision to not throw an exception, the type cannot be narrowed,
    // and an error would be reported here because `T | false` != `T | null`.
    return $map->get("bar");
  }
  return null;
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're getting at, but rather than "legacy as soon as it's implemented" it could also be seen as a pre-cursor for a more general feature, where you can specify a union of values rather than just types. See "Literal Types" in the Future Scope section for a brief discussion of this.

Maybe it would be better to leave false out of this proposal but flesh it out more as a part of literal types? If someone wants to use that paradigm, allowing type false = false would allow the end use to pull off those sort of Shenanigans without encouraging poor practices by specifically implementing them in code.


### Property types

Property types are invariant, which means that types must stay the same during inheritance. However, the "same" type may be expressed in different ways. Prior to union types, one such possibility was to have two aliased classes `A` and `B`, in which case a property type may legally change from `A` to `B` or vice versa.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because I had to think this through for a moment, I'd suggest explaining why property types are invariant during inheritance. Specifically that setting a property would allow for contravariance, but that reading would only allow for covariance. The intersection of those rules is invariance.

@robbertstevens
Copy link

I don't if it was already asked before, but why would you want this over method overloading? If a method accepts both a string and an integer wouldn't you want to have different behaviours in the method?

In which method overloading creates a much clearer code base than a method that both handles a string and an integer.

@beberlei
Copy link

beberlei commented Sep 4, 2019

@robbertstevens Dan has aggregated the current state of why method overloading is probably never going to happen in PHP here: https://github.com/Danack/RfcCodex/blob/master/method_overloading.md

@msarca
Copy link

msarca commented Sep 4, 2019

There are some issues I would like to address. Union types are necessary, but I believe that some improvements must be made to this proposal.

Type aliasing

Let's look at the following example.

use Foo\bar;

function test(bar $bar) {
    // is $bar an object? 
}

If the Foo\bar type exists, then $bar is definitely an object. This is how it's been for years. But now, with the introduction of exported types, $bar could be anything. And this incertitude could prove to be a PITA in the long run.

namespace Foo;
type bar = int|float;

// Usable as \Foo\bar from elsewhere

Proposal

Introduce use type. This way we could no longer make wrong assumptions.

use Foo\bar;
use type Foo\baz;
use type int|float|string|bool as scalar; // local type

function test1(baz $baz, scalar $scalar) {
    // Don't make assumptions, check type definition.
}

function test2(bar $bar) {
    // It's an object
}

Nullable types

Nullable types should not be allowed when defining a union type.

use type ?(T1|T2|T3) as foo;
use type bar;

// What's this? Nullable nullable type?
function test(?foo $val) {
}

// Is bar nullable?
function test2(bar $val) {
   // Should I check if $val is null?
   if (is_null($val)) {
   }
}

Proposal

Keep it simple. No nullable types inside unions

use type T1|T2|T3 as foo;

// Not nullable
function test(foo $val) {
    // $val can't be null
}

// Nullable
function test2(?foo $val) {
   // I should check if $val is null
   if (is_null($val)) {
   }
}

Keep the syntax clean

There should be a single way of declaring a union type. The in place way of declaring a union type should be removed.

use type float|int as number;

// Confusing..
function sum(float|int $a, int|float $b): number{
}

// ??
function example(int|float $value, int $flag = 2|4|8|16|32, string|object $item = null): float|int|false {
}

It's hard to read, hard to follow and the syntax is super ugly.

Proposal

Keep the syntax unchanged by forcing the user to either export a union type or declare it locally with the help of use type.

use type float|int as number;

// Makes sense now..
function sum(number $a, number $b): number{
}

This way the ReflectionUnionType is no longer necessary. Only the ReflectionType needs to be changed.

class ReflectionType {
    isUnionType(): bool;
    getUnionTypes(): array;
}

Putting all together..

type scalar=int|float|string|bool; // export union type
use type int|float as number; // local union type

class Foo
{
    private number $foo; // non-nullable property
    private ?number $bar; // nullable property
    
    // non-nullable arguments & return type
    function sum(number $a, number b):number {
    }
   
    // nullable arguments and return type
    function test(?scalar $value): ?scalar {
    }
}

@Ocramius
Copy link

@mattacosta @kocsismate in PHP, there is a concept of "unset object properties", which matches what you want here.

@nikic
Copy link
Member Author

nikic commented Sep 12, 2019

To add to what @Ocramius said, typed properties default to uninitialized, so you can simply use private ?Value $lazyProperty and check whether it's initialized in getLazyProperty().

Unfortunately, there is no easy way to distinguish between an uninitialized and a null property right now (only possible via reflection), so this still falls short in practice. But generally this would be the correct way to go about it, I think.

@kocsismate
Copy link
Member

kocsismate commented Sep 12, 2019

@nikic Yes, I agree to you, and I already missed an easy-to-use functionality to check uninitialized properties when I wanted to lazy-load other properties which had type declarations. As far as I remember I had to downgrade type declarations to PHPDoc at the end :/ .

Speaking about unset, I use PHPStan strict rules so I didn't even consider using unset properties. But this is probably another topic which falls out of the scope of the discussion about union types. :)

@IMSoP
Copy link

IMSoP commented Sep 12, 2019

I have always hated that typed properties added an extra "uninitialized" state, which can't be detected, and only exists in certain scenarios. I still think it was a bad design.

If you want more than one terminal value, then some kind of enum or algebraic data type is probably the ideal, and a union with specific values can emulate this a bit, e.g. string|INVALID_ENTRY|FILE_NOT_FOUND. Having both null and false allows you to write string|null|false, but you're still stuck if you want a third terminal value, or if you want a lazy boolean.

@kocsismate
Copy link
Member

I admit that my Value|null|false solution is not elegant at all, but currently this is the easiest viable option (which won't be reported by static analyzers) for such a trivial use-case I have (a builder class for unit testing). Of course, I would welcome a mean to detect uninitialized properties besides reflection or inspecting the var_dump() result :D - it would certainly eliminate the need to use my workaround.

P.S: I believe the uninitialized state is a "necessary evil", otherwise we had trouble declaring non-nullable class types for properties or we ended up in a situation where Java is in now (where classes are implicitly nullable).

@mattacosta
Copy link

The reason I assumed that your code could work with booleans is because if not, then the property could just as easily have been typed as T|null|bool, which would mean that once again, false does not provide anything meaningful.

All of that is irrelevant to my original question of if a restriction would be feasible though. Assuming it is, I ask because the primary argument for false, as specified in the RFC, is to support internal functions. If that argument is sound, then it would seem to be a better solution. If not, then shouldn't the RFC text be updated to include the reasoning behind including this "convenience" in actual code? There's nothing mentioning its potential impact, nor anything backing the special rules involved either.

As a side note: It also kind of sucks that the false-as-a-type discussion is in like five different places throughout this PR, some of which is hidden behind resolved change requests. Maybe the next RFC experiment could use the TC39 model of having each proposal be its own repository with actual issues, and this repo just be a directory of sorts. Of course then the repo would have to be transferred to the PHP organization at some point for preservation... so ugh.

@markushausammann
Copy link

Please make all RFC's available on Github like this from now on!

@kasparsklavins
Copy link

kasparsklavins commented Sep 24, 2019

A classical example is the strpos() family of functions, which returns int|false.

While it would be possible to model this less accurately as int|bool, this gives the false impression that the function can also return a true value, which makes this type information significantly less useful to humans and static analyzers both.

A similar argument could be said in favour of scalar types for, e.g., preg_match that returns 1|0|false.

@patrickallaert

This comment has been minimized.

@qbbr
Copy link

qbbr commented Oct 8, 2019

Proposal+, GH RFC - good practice! Thx @nikic.


The `null` type is supported as part of unions, such that `T1|T2|null` can be used to create a nullable union. The existing `?T` notation is considered a shorthand for the common case of `T|null`.

An earlier version of this RFC proposed to use `?(T1|T2)` for nullable union types instead, to avoid having two ways of expressing nullability in PHP. However, this notation is both rather awkward syntactically, and differs from the well-established `T1|T2|null` syntax used by phpdoc comments. The discussion feedback was overwhelmingly in favor of supporting the `T1|T2|null` notation.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just ?T1|T2?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ambiguous syntax.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have more than two types, like T1|T2|T3, then it's become complicated to place ce question marks (?T1|T2?|T3? or ?T1|?T2|T3?).
The T1|T2|T3|null syntax looks more comprehensive in this case.

Copy link

@mcneely mcneely Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthieu88160 The second question mark was outside as I was asking a question. More Types would look like this ?T1|T2|T3.

Copy link

@mcneely mcneely Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Majkl578 But is it really that ambiguous? The question mark already establishes to allow null. ?T1 which is current is the same as null|T1. Whether someone reads that as (null|T1)|T2|T3 or null|T1|T2|T3 isn't going to make a difference.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(null|T1)|T2|T3 and null|T1|T2|T3 is equivalent.

Please read the discussion in this thread where this choice was discussed: #1 (comment)

@nikic nikic added the Accepted RFC accepted label Nov 8, 2019
@nikic
Copy link
Member Author

nikic commented Nov 8, 2019

Voting has closed with 61 votes in favor and 5 against. As such, the RFC is accepted.

@nikic nikic closed this Nov 8, 2019
@rask
Copy link

rask commented Nov 8, 2019

Nice!

Meta: will RFCs be opened to broader discussion via GitHub in the future as well? Also would it make sense to actually merge the accepted RFCs into the repo, instead of just closing the PR?

@nikic
Copy link
Member Author

nikic commented Nov 8, 2019

@rask See https://externals.io/message/107747. The tl;dr for now is: RFCs can be initially submitted against GitHub, but have to move to the PHP wiki before voting.

@kalinin-k-a
Copy link

kalinin-k-a commented Nov 8, 2019

Sad but true.
Union types satisfies the worst programming practices. It makes interfaces unpredictable, it forces developers to write a lot of if(typecheck) {} ....

Nullable types are fine. I also understand that a lot of builtin function return false if error, so something like falseable is also good. For example ?int - falsable int, ??int - nullable int.

But the most of examples I see here https://gist.github.com/nikic/64ff90c5038522606643eac1259a9dae are just funny. RFC must prohibit such thigs:
Organization|Organization[]|Person|Person[]
\PHPUnit_Framework_MockObject_MockObject|\Payum\Paypal\ExpressCheckout\Nvp\Api
string|integer|array|Zend_Date
"Enum|null|bool|int|float|string|array
Awful!

Maybe some words as "scalar", "numeric" are acceptable, but such a full freedom in types definition is a.. is a... is a bottom|disaster|php-sunset|distrubance|upset|etc

@zzz6519003
Copy link

to save space?

@ZRayCC
Copy link

ZRayCC commented Nov 13, 2019

It's terrible feature. I can't accept that. I like that one variable has one type.
In a team, some people will make the code worst because of the union types. It a bad standard and hardly trace.

@php php locked as resolved and limited conversation to collaborators Nov 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Accepted RFC accepted
Projects
None yet
Development

Successfully merging this pull request may close these issues.