incompatibility of package-block and __DATA__ #22613

Closed
happy-barney opened this issue Sep 20, 2024 · 31 comments
Comments

@happy-barney

happy-barney commented Sep 20, 2024

When package block syntax and __DATA__ are used in a single file, __DATA__ belongs not to the package
but to main.

__DATA__ documentation states:

Text after __DATA__ may be read via the filehandle "PACKNAME::DATA",
    where "PACKNAME" is the package that was current when the __DATA__ token
    was encountered.

Technically this is correct: the __DATA__ token is not inside the package block (i.e., the package block prevents use of __DATA__).
But with code like the example below, that may be misleading.
It may also harm adoption of the syntax.

Example:

use strict;
use warnings;

package Foo {
    print while <DATA>;
}

__DATA__
hello
world

Output

Name "Foo::DATA" used only once: possible typo at example.pl line 7.
readline() on unopened filehandle DATA at example.pl line 7.

All Perls v5.12 .. v5.40 are affected.

Proposal:

a) support __DATA__ inside a package block (i.e., it will also be treated as the closing })
b) treat insignificant content following } as still belonging to the latest package block

@Tux
Contributor

Tux commented Sep 20, 2024

I do not see that warning when using __DATA__, but I do see it when using __END__

@mauke
Contributor

mauke commented Sep 20, 2024

I get

hello
world

as expected.

@happy-barney
Author

@mauke @Tux ouch, oops, I mentioned package block and typed example without it :-( ... editing issue

@mauke
Contributor

mauke commented Sep 20, 2024

That is working as expected. The whole point of the package Foo { ... } syntax is that everything outside the braces is not part of the Foo package.
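The scoping rule described here can be checked with a short script. This is a minimal sketch, assuming Perl 5.14 or later (where the package NAME BLOCK form is available); the @seen array is a hypothetical helper used only for illustration.

```perl
use strict;
use warnings;

# Record the value of __PACKAGE__ at three points in the file.
my @seen;

push @seen, __PACKAGE__;        # before the block: main

package Foo {
    push @seen, __PACKAGE__;    # inside the braces: Foo
}

push @seen, __PACKAGE__;        # after the closing brace: main again

print join(', ', @seen), "\n";  # prints "main, Foo, main"
```

A __DATA__ token placed after the closing brace is therefore seen while the current package is main, which matches the behaviour reported above.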

@happy-barney
Author

@mauke that I know; I mentioned it in the "technically" part.

I also mentioned the possible confusion and the fact that there is no way to attach __DATA__ to a package other than the old package syntax.

@Leont
Contributor

Leont commented Sep 20, 2024

there is no way to attach __DATA__ to a package other than the old package syntax

I get that you might not like that aesthetically, but when is that a practical problem?

@happy-barney
Author

@Leont bad wording, likely to confuse

The rest is only convenience: it forces inconsistent syntax across a code base or, in the case of class, weird syntax for a class (especially if there is still hope of increasing usage of Perl)

@iabyn
Contributor

iabyn commented Sep 22, 2024 via email

@shlomif
Contributor

shlomif commented Sep 22, 2024

In addition, I'd like to note I feel __DATA__ is too abused and overused. I recommend using here-docs or https://metacpan.org/pod/File::ShareDir rather than the global-variable-like-*DATA

@happy-barney
Author

@shlomif too doesn't mean that its usage isn't advocated as well

@Tux
Contributor

Tux commented Sep 23, 2024

In addition, I'd like to note I feel __DATA__ is too abused and overused. I recommend using here-docs or https://metacpan.org/pod/File::ShareDir rather than the global-variable-like-*DATA

I wholeheartedly disagree. I use __DATA__ and __END__ a lot and I have never in my life used File::ShareDir nor did I ever feel the need for that.

The main reason is that scripts with __END__/__DATA__ hold that data within and are easy to copy to other hosts, whereas external files need to be copied separately, with all the possible issues.

Here-docs are fine for a single-use short piece of data, but when dealing with a list or longer data structures or text pieces to apply tests to (like: does this parse), here-docs only blur the code

My € 2.00

@haarg
Contributor

haarg commented Oct 2, 2024

Everything here is working as expected, and I don't think there's anything worth changing. __DATA__ is inherently a file-based feature, and trying to make it interact with inner lexical scopes isn't going to work well. class isn't any different from package here, and can be written without an enclosing block.

If you need a block of data inside a scope, you can use a heredoc.
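A minimal sketch of the heredoc approach, assuming the data only needs to be read line by line as it would be from DATA. The names $text, lines and END_DATA are hypothetical; opening a filehandle on a scalar reference is core Perl behaviour.

```perl
use strict;
use warnings;

package Foo {
    # The data lives inside the block as a heredoc instead of after __DATA__.
    my $text = <<'END_DATA';
hello
world
END_DATA

    # Read the lines back through an in-memory filehandle opened on the string.
    sub lines {
        open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
        my @lines = <$fh>;
        close $fh;
        return @lines;
    }
}

print Foo::lines();   # prints "hello" and "world" on separate lines
```

Unlike DATA, this handle reads from a string held in memory, not from the script file on disk.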

@happy-barney
Author

@haarg I will repair your sentence: "Everything here is working as implemented", because this issue is about changing what is "expected".

@haarg
Contributor

haarg commented Oct 2, 2024

No, I chose my words intentionally. Based on how these features work, doing something other than how they work right now would be very strange.

@ap
Contributor

ap commented Apr 3, 2025

Yeah I don’t see the issue here. The package is not in scope outside the block so the __DATA__ doesn’t apply to it. Finding a way to make __DATA__ somehow get assigned to a package other than the one that’s in scope would be awful.

@happy-barney
Author

@ap issue is consistency of language. Currently you can have either package-block or __DATA__ but not both.

Look at it from the point of view of a newcomer to the language who is used to package-block syntax from other languages.

@ap
Contributor

ap commented Apr 3, 2025

There is no way to fix that (inasmuch as it even needs “fixing”). __END__ and __DATA__ by definition consume the rest of the file and package NAME BLOCK by definition needs a block-end marker. There is no way to put these features together, except by force of screwing up the design of at least one of them with some sort of craziness. This is a problem only if you impose on yourself a rule to always use package NAME BLOCK.

@happy-barney
Author

happy-barney commented Apr 3, 2025

@ap as they say "never say never" - I put two proposals into the issue

Possible implementation of proposals:

a) support __DATA__ inside package block

  • allow indented __DATA__
  • such data must end with text matching qr ( \} \s* $ )x, which the lexer will remove; the lexer will then call yyl_data_handle and return PERLY_BRACE_CLOSE before yyl_fake_eof

current perly.y

PERLY_BRACE_OPEN remember stmtseq PERLY_BRACE_CLOSE

new perly.y

PERLY_BRACE_OPEN remember stmtseq optdatablock PERLY_BRACE_CLOSE

b) treat insignificant content following } as content still belonging to the latest package block

I don't have an exact implementation, only an idea:

  • Perl_package currently sets PL_curstash
    • extend it with storing effective stash
  • at the end of block, effective stash will be preserved
  • yyl_data_handle currently uses current stash - will use effective stash instead
  • yylex, when returning token, will assign current stash to effective stash

@book
Contributor

book commented Apr 7, 2025

there is no way to attach __DATA__ to a package other than the old package syntax

Indeed, here's how one can tell Perl which package the __DATA__ section belongs to, all the while using package blocks:

use strict;
use warnings;
 
package Foo {
    print while <DATA>;
}

package Bar {
    # There's only one DATA handle per file,
    # and it's in the other package,
    # as we'll soon discover...
}

package Foo; # everything that follows belongs to package Foo
__DATA__
hello
world

Output:

hello
world

Currently you can have either package-block or __DATA__ but not both.

I think the above showed an example having both a package-block and a __DATA__ section.

package BLOCK declares the package of the block that follows; a package line declares the package of the lines that follow.

__DATA__ is inherently a line-based structure ("from this line on..."), like the old package statement and several others (__END__, # line...). So it doesn't feel very inconsistent to me to have to use some line-based syntax to ensure which package applies to it.

Look at it from the point of view of a newcomer to the language who is used to package-block syntax from other languages.

I don't know of other languages that have an equivalent of the __DATA__ feature. (That is likely a sign of my own ignorance; I'd be happy to learn about them!)

@happy-barney
Author

removed as still out of point of issue - issue is not about how it works now, but how it should work better

Look at it from the point of view of a newcomer to the language who is used to package-block syntax from other languages.

I don't know of other languages that have an equivalent of the __DATA__ feature. (That is likely a sign of my own ignorance; I'd be happy to learn about them!)

re other language with __DATA__ - for example Ruby has __END__, PHP has __halt_compiler

re newbie (explained):

  • Perl has strong concept 1 package 1 file (yes, you can mess ...)
  • pattern symbol-space BLOCK is more common across programming languages:
    • typescript: namespace Foo { }
    • java: class Foo { }
    • ruby: module Foo ... end
    • C++: namespace Foo { }; class Foo { }

@ap
Contributor

ap commented Apr 7, 2025

as they say "never say never" - I put two proposals into the issue

“Never” still applies. You didn’t put the features together, you proposed two possible ways to screw up the design of either one of them with some sort of craziness… just like I said.

The whole point of __DATA__ is that it gives you a file handle. If you make the feature do some kind of post-processing of the content so that the file handle no longer actually directly accesses the file on disk, then the feature is pointless and anything you would want to do with it can be done with just a heredoc. We have had indentable heredocs for a while. If that is what you are asking for then your answer is “use a heredoc for that”.
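For reference, an indented heredoc inside a package block might look like the sketch below. It assumes Perl 5.26 or later, where the <<~ form is available; words and END_DATA are hypothetical names chosen for illustration.

```perl
use v5.26;      # indented heredocs (<<~) need Perl 5.26 or later
use warnings;

package Foo {
    sub words {
        # <<~ strips the closing delimiter's indentation from every line,
        # so the data can be indented to match the surrounding code.
        my $text = <<~'END_DATA';
            hello
            world
            END_DATA
        return split /\n/, $text;
    }
}

say join ',', Foo::words();   # prints "hello,world"
```

The result is an ordinary string, so it covers the "block of data inside a scope" case but does not provide a handle on the script file the way DATA does.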

And “let’s break the basic concept of how scopes work just for this one obscure super-special case” is not a serious proposal. Your list of languages is nice and handy here, go ahead and show me a construct in any of them where a scope consumes things that come after the end of the scope.

You have a problem with the language because of a self-imposed stylistic preference. The solution is not to change the language but to fix your stylistic preference.

@ap
Contributor

ap commented Apr 7, 2025

re other language with __DATA__ - for example Ruby has __END__

Not a useful example. Ruby only has a single global DATA handle, nothing like Perl’s per-package handles. If you have __END__ in multiple files, only one of those will actually be accessible. It certainly doesn’t try to allow __END__ inside a scope.

PHP has __halt_compiler

This is even more primitive. It just causes a constant __COMPILER_HALT_OFFSET__ to be defined, and you get to open the file and seek to that offset yourself. (Though this design does make it even more explicit that the file handle is the entire point of the exercise.) I think this constant is per-file but I’m not sure. Anyway PHP doesn’t do anything special to address putting this directive inside a scope, either.

@happy-barney
Author

re other language with __DATA__ - for example Ruby has __END__

Not a useful example. Ruby only has a single global DATA handle, nothing like Perl’s per-package handles.

Doesn't what you say apply to the combination of package-block and __DATA__? Then it is always a single global main::DATA ... and that's the point raised by this issue.

@book
Contributor

book commented Apr 7, 2025

it is always a single global main::DATA

There's as many DATA handles as there are packages, and files to hold them.

@happy-barney
Author

it is always a single global main::DATA

There's as many DATA handles as there are packages, and files to hold them.

I admit I'm not a native speaker, but is this pair of sentences really so badly formulated?
(if so, I'd ask someone to play devil's advocate and rewrite my comments a little better)

@ap
Contributor

ap commented Apr 7, 2025

Doesn't what you say apply to the combination of package-block and __DATA__? Then it is always a single global main::DATA ... and that's the point raised by this issue.

Yes, if you write your multiple Perl code files in such a way that their __END__ markers are all in package main, then only one of them will become main::DATA.

So don’t do that.

@guest20

guest20 commented Apr 7, 2025

Since you can't stick a __DATA__ inside curlies, how about namespaced __DATA__ tokens?

package Foo {
  sub lines { <DATA> }
}

Foo::__DATA__
first line, index 0
second line, index 1 

@happy-barney
Author

happy-barney commented Apr 7, 2025

Since you can't stick a __DATA__ inside curlies, how about namespaced __DATA__ tokens?

That is an interesting idea as well.

There is also the possibility of a new handle, e.g. FILE_DATA,
or, even better, the capability to turn DATA into a lexical (... my earlier poc will become handy, #22850)

@Grinnz
Contributor

Grinnz commented Apr 7, 2025 via email

@haarg
Contributor

haarg commented Apr 7, 2025

  • package NAMESPACE BLOCK syntax reverts to the previous package after the block ends. The BLOCK must be properly terminated.
  • __DATA__ stops parsing the file and leaves the file handle being read from open and stores it in *DATA in the currently active package. Syntactically, it is the equivalent to the end of the file.

Neither of these things are going to change, even if it means these two features don't usefully interact with each other.

Something involving a lexical file handle would have to work significantly differently from __DATA__. You are welcome to propose something like that if you have a design for it.

@haarg closed this as completed Apr 7, 2025
@haarg closed this as not planned Apr 7, 2025
@guest20

guest20 commented Apr 7, 2025

@Grinnz that's a way better syntax for it than my suggestion ;)
