Skip to content

[5.6.2-to-be] PerlIO is EVIL #4165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
p5pRT opened this issue Jul 3, 2001 · 24 comments
Closed

[5.6.2-to-be] PerlIO is EVIL #4165

p5pRT opened this issue Jul 3, 2001 · 24 comments

Comments

@p5pRT
Copy link

p5pRT commented Jul 3, 2001

Migrated from rt.perl.org#7220 (status was 'rejected')

Searchable as RT7220$

@p5pRT
Copy link
Author

p5pRT commented Jul 3, 2001

From [email protected]

  #!perl -w

  open F, "$^X -wle 'binmode STDOUT; print q()' |" or die;
  my $c = getc F or warn;
  $c eq "\n" or warn ord $c;
  close F or die;

is silent on usual perl, and warns "255" on 5.6.2-to-be. No, I have
no idea why it is 255, I constructd this example to demonstrate a
*different* bug of PerlIO.

The *actual* problem is that char-by-char input requires DUPLICATE pressing
of ENTER key for this key to be seen by Perl. Debugging this problem
(via Term​::ReadKey test suite) shows the following logic​:

  pp_getc() calls is_eof() which does getc/ungetc
  calls getc()

  [BTW, I see no logic in this sequence of events.]

The problem is that ungetc() can't unget "\n" if this \n is the first char
in the buffer, and quietly drops "\n" to the floor. The (horrible) code
looks like this​:

  if (ch == '\n')
  {
  if (b->ptr - 2 >= b->buf)
  {

Tracing through perlio.c shows how much EVIL it is. All 80K of it. I
simply have no words for this code. Extremely inefficient and bloated,
doing enormous amount of work for trivial tasks - not even mentioning
how buggy it turns out to be! Apparently the principal task is to have
CRNL translation, and the principal usage is on Dosish systems - which
*already* do this with binary().

In the example of Term​::ReadKey, CRTL *knows* that TTYs should be opened
in the binary mode. PerlIO does not. There are miriads of such special
cases.

This code *duplicates* the work already done in the CRTL, thus lead to
double translations (as easy examples show). Not even mentioning that
this code is probably 1/10 as efficient as the code in CRTL (and 5 times
bulkier and sub-call-deeper) - in*****tent is the only word I could find
for perlio.c.

The task of CRNL translation is very quick and easy - with a proper
implementation. No need to extra copying, no nothing​: just keep an extra
pointer into the buffer. Whatever is before the pointer is translated,
whatever is after is not - and the char immediately after the pointer
should be ignored. On _fill() move the pointer to "\n" in the nearest
"\r\n" pair, and replace the first "\r" by "\n". That's all. ungetc()
modifies only the already-translated part, there is no need to untranslate
things (which is impossible anyway). Similar things may be done for all
the other needed translations.

[It is also very efficient to report only the part until this buffer as
available, if it is available. This allows simple inlining of getc().]

I do not even want to mention that in many cases all Perl needs is a
pointer to the buffer, but PerlIO insists on copying the stuff...

I do not think that PerlIO is ready for 5.6.2.

Hope this helps,
Ilya


Flags​:
  category=core
  severity=critical


Site configuration information for perl v5.7.1​:

Configured by vera at Mon Jun 25 01​:33​:44 PDT 2001.

Summary of my perl5 (revision 5.0 version 7 subversion 17) configuration​:
  Platform​:
  osname=os2, osvers=2.30, archname=os2
  uname='os2 ia-ia 2 2.30 i386 '
  config_args='-des -Dprefix=i​:/perllib -Dusedevel'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler​:
  cc='gcc', ccflags ='-Zomf -Zmt -DDOSISH -DOS2=2 -DEMBED -I. -D_EMX_CRT_REV_=63 -Wall',
  optimize='-O2 -fomit-frame-pointer -malign-loops=2 -malign-jumps=2 -malign-functions=2 -s',
  cppflags='-Zomf -Zmt -DDOSISH -DOS2=2 -DEMBED -I. -D_EMX_CRT_REV_=63 -Wall'
  ccversion='', gccversion='2.8.1', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
  alignbytes=4, usemymalloc=y, prototype=define
  Linker and Libraries​:
  ld='gcc', ldflags ='-Zexe -Zomf -Zmt -Zcrtdll -Zstack 32000 -Zlinker /e​:2'
  libpth=i​:/emx.add/lib i​:/emx/lib D​:/DEVTOOLS/OPENGL/LIB I​:/JAVA11/LIB i​:/emx/lib/mt
  libs=-lsocket -lm -lbsd
  perllibs=-lsocket -lm -lbsd
  libc=i​:/emx/lib/mt/c_import.lib, so=dll, useshrplib=true, libperl=libperl.lib
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
  cccdlflags='-Zdll', lddlflags='-Zdll -Zomf -Zmt -Zcrtdll -Zlinker /e​:2'

Locally applied patches​:
  DEVEL10906


@​INC for perl v5.7.1​:
  lib/os2
  lib
  i​:/perllib/lib/5.7.1/os2
  i​:/perllib/lib/5.7.1
  i​:/perllib/lib/site_perl/5.7.1/os2
  i​:/perllib/lib/site_perl/5.7.1
  i​:/perllib/lib/site_perl/5.00553/os2
  i​:/perllib/lib/site_perl/5.00553
  i​:/perllib/lib/site_perl
  .


Environment for perl v5.7.1​:
  HOME=j​:/home
  LANG=EN_US
  LANGUAGE (unset)
  LD_LIBRARY_PATH (unset)
  LOGDIR (unset)
  PATH=i​:\NETSCAPE\PROGRAM;d​:\OPENDOC\EPM;d​:\OPENDOC\OREXX;d​:\OPENDOC\BIN;i​:\DRV;i​:\TOOLKIT\BETA\BIN\PMEISTER;i​:\TOOLKIT\BETA\BIN;i​:\TOOLKIT\BIN\OSARED;i​:\TOOLKIT\SOM\COMMON;i​:\TOOLKIT\SOM\BIN;i​:\TOOLKIT\BIN;D​:\OS2;D​:\OS2\SYSTEM;i​:\VABASIC\SOM\BIN;D​:\OS2\INSTALL;D​:\;D​:\OS2\MDOS;D​:\OS2\APPS;i​:\BIN;i​:\UTILS;i​:\EMACS\19.33\BIN;i​:\emx.add\BIN;i​:\EMX\BIN;F​:\PBM;i​:\EMTEX\bin;i​:\ckermit;I​:\PAGETURN;d​:\tcpip\bin;D​:\TCPIP\UMAIL;D​:\VIEWER\BIN;I​:\XFree86\bin;i​:\VABASIC\BIN;i​:\SMART;I​:\Java11\rmi-iiop\bin;I​:\JAVA11\BIN;d​:\JAVAOS2\bin;D​:\MMOS2;I​:\JAVA11\ICATJAVA\BIN;I​:\gs\tools;I​:\gs\gs7.00\bin;
  PERLLIB_PREFIX=f​:/perllib;i​:/perllib
  PERL_BADLANG (unset)
  PERL_SH_DIR=i​:/bin
  SHELL (unset)

@p5pRT
Copy link
Author

p5pRT commented Jul 3, 2001

From [Unknown Contact. See original ticket]

Message RFC822:
Message-ID: 00c601c0d46f$3dbbf380$7b42983e@vad
From: Vadim Konovalov [email protected]
To: [email protected]
Cc: [email protected]
Subject: Re: perlio weak place
Date: Fri, 4 May 2001 09:52:53 +0200
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
List-Help: mailto:[email protected]
List-Unsubscribe: mailto:[email protected]
Content-Type: text/plain;
charset="iso-8859-1"

Nick,
thank you for clarifying.
I still have a few questions.

Once again I've faced a can of worms in perl built with BorlandC++
compiler
somewhere near perlio.

While I'm digging around perlio, I've noticed something that might look
dangerous.
Namely, function

PerlIO * PerlIO_allocate(pTHX)
{
/* Find a free slot in the table, allocating new table as necessary */
....

finds a slot in table, which will be filled further by non-zero, so this
slot will be marked as used.

PerlIO_allocate does NOT mark the slot as used - it leaves it as NULL.
When caller "pushes" a layer it will be marked as used.

yes, I see that, and I meant that a slot will be marked further by someone
else. In our case, it's during the "pushing" of a layer when that slot will
be marked as used.
So, it's just catastrophe to do something like:
ptr1=PerlIO_allocate(aTHX);
ptr2=PerlIO_allocate(aTHX);
/do something/
PerlIO_push(aTHX_ ptr1,self,mode,PerlIOArg);

PerlIO_push(aTHX_ ptr2,self,mode,PerlIOArg);
/okay, now ptr1 and ptr2 are intermixed! (actually they will point into
same memory chunk, and freeing them could result in freeing same chunk
twice)
/

Here works a rule that no-one should dare to break: if "PerlIO_allocate" is
called, that pointer must be pushed ASAP, or else unpredictable breakage
will occur.
This is not seen in any comments, just follows from program logic, so it's
easy to make an error here.

Anyway, I can live with that while things are working properly.

Just a question - why not to use normal standard perlish way of allocating
memory?
Why here is a need to implement Just Another Memory Alocator (JAMA -- means
"mountain" in Japanese, thus resembling a huge heap :) )?

Best wishes,
Vadim.

@p5pRT
Copy link
Author

p5pRT commented Jul 3, 2001

From [Unknown Contact. See original ticket]

Obviously, doing CRLF in Perl when CRTL supports it already is not a
good idea. What CRLF discipline should do on such an architecture is
set_mode(whatever) on the corresponding FD (with appropriate
flush()ing, of course), and store the info in the flag. That's all.

And the current state of FD should have been queried when PerlIO is
created (may be this is done already?).

Ilya

@p5pRT
Copy link
Author

p5pRT commented Jul 3, 2001

From @jhi

Ilya, the PerlIO subsystem has been in open and public development for
more than eight months now. It's too bad you weren't there to find
out and fix bugs since the shape PerlIO is in now is the shape it's
going out in 5.7.2 (not 5.6.2, mind, 5.6.2 will probably wait well
after 5.8.1) AND 5.8.0.

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

Ilya Zakharevich <ilya@​math.ohio-state.edu> writes​:

This is a bug report for perl from ilya@​ia-ia.uucp,
generated with the help of perlbug 1.33 running under perl v5.7.1.

#!perl -w

open F, "$^X -wle 'binmode STDOUT; print q()' |" or die;
my $c = getc F or warn;
$c eq "\n" or warn ord $c;
close F or die;

is silent on usual perl, and warns "255" on 5.6.2-to-be. No, I have
no idea why it is 255, I constructd this example to demonstrate a
*different* bug of PerlIO.

The *actual* problem is that char-by-char input requires DUPLICATE pressing
of ENTER key for this key to be seen by Perl. Debugging this problem
(via Term​::ReadKey test suite) shows the following logic​:

pp_getc() calls is_eof() which does getc/ungetc
calls getc()

[BTW, I see no logic in this sequence of events.]

It is legacy scheme - for systems where we cannot snoop the buffer
we ask (e.g.) stdio to get a char, if it succeeds we are not at EOF.
But then we need to put char back.

The problem is that ungetc() can't unget "\n" if this \n is the first char
in the buffer, and quietly drops "\n" to the floor. The (horrible) code
looks like this​:

  if \(ch == '\\n'\)
   \{
    if \(b\->ptr \- 2 >= b\->buf\)
     \{

Tracing through perlio.c shows how much EVIL it is. All 80K of it.

If the handle is in text mode then '\n' should have come from a CRLF pair,
so there should be room for it.

I
simply have no words for this code. Extremely inefficient and bloated,
doing enormous amount of work for trivial tasks - not even mentioning
how buggy it turns out to be! Apparently the principal task is to have
CRNL translation, and the principal usage is on Dosish systems - which
*already* do this with binary().

All the handles should be binary() at the OS level. The reason for crlf
layer is that various CRLF->'\n' logic in C runtime on Win32 is
even worse than what I have in perlio.c.

If OS2's CRLF logic is solid an efficent then it may make sense to
NOT use crlf layer but just use OS2's read/write directly.

In the example of Term​::ReadKey, CRTL *knows* that TTYs should be opened
in the binary mode. PerlIO does not.

So get Term​::ReadKey to tell PerlIO it should be binary.

There are miriads of such special
cases.

This code *duplicates* the work already done in the CRTL, thus lead to
double translations (as easy examples show).

That is NOT the intent. The intent is that crlf layer is used when​:
A. The OS CRTL does not do that (e.g. reading DOSISH files on UNIX).
B. When the CRTL is buggy and inefficient (e.g. Borland and VC++ C on Win32).
 

Not even mentioning that
this code is probably 1/10 as efficient as the code in CRTL (and 5 times
bulkier and sub-call-deeper) - in*****tent is the only word I could find
for perlio.c.

The task of CRNL translation is very quick and easy - with a proper
implementation.

The main pain is getting tell() and seek() to work right.

No need to extra copying, no nothing​: just keep an extra
pointer into the buffer. Whatever is before the pointer is translated,
whatever is after is not - and the char immediately after the pointer
should be ignored. On _fill() move the pointer to "\n" in the nearest
"\r\n" pair, and replace the first "\r" by "\n". That's all. ungetc()
modifies only the already-translated part, there is no need to untranslate
things (which is impossible anyway).

I agree it is "impossible" - but there is a _need_. If the handle was in
text mode and then switched back to binary mode then the CRLF need to be
"put back". (E.g. a CRLF MIME header on a file which then switches to embedded
JPEG or whatever.).
If the stream is seek-able then safe approach is to seek and re-read.

Similar things may be done for all
the other needed translations.

Patches welcome. I wrote the CRLF layer very quickly.
But crlf layer does not copy either, we use the CRLF layer as the buffer
layer.

[It is also very efficient to report only the part until this buffer as
available, if it is available. This allows simple inlining of getc().]

I do not even want to mention that in many cases all Perl needs is a
pointer to the buffer, but PerlIO insists on copying the stuff...

PerlIO does _NOT_ insist on copying - that is exactly the sort of thing
the Win32 CRTL do which I am trying to avoid.
There may be the odd spot where implementation is not optimal.

I do not think that PerlIO is ready for 5.6.2.

Hope this helps,
Ilya

---
Flags​:
category=core
severity=critical
---
Site configuration information for perl v5.7.1​:

Configured by vera at Mon Jun 25 01​:33​:44 PDT 2001.

Summary of my perl5 (revision 5.0 version 7 subversion 17) configuration​:
Platform​:
osname=os2, osvers=2.30, archname=os2
uname='os2 ia-ia 2 2.30 i386 '
config_args='-des -Dprefix=i​:/perllib -Dusedevel'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
Compiler​:
cc='gcc', ccflags ='-Zomf -Zmt -DDOSISH -DOS2=2 -DEMBED -I. -D_EMX_CRT_REV_=63 -Wall',
optimize='-O2 -fomit-frame-pointer -malign-loops=2 -malign-jumps=2 -malign-functions=2 -s',
cppflags='-Zomf -Zmt -DDOSISH -DOS2=2 -DEMBED -I. -D_EMX_CRT_REV_=63 -Wall'
ccversion='', gccversion='2.8.1', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
alignbytes=4, usemymalloc=y, prototype=define
Linker and Libraries​:
ld='gcc', ldflags ='-Zexe -Zomf -Zmt -Zcrtdll -Zstack 32000 -Zlinker /e​:2'
libpth=i​:/emx.add/lib i​:/emx/lib D​:/DEVTOOLS/OPENGL/LIB I​:/JAVA11/LIB i​:/emx/lib/mt
libs=-lsocket -lm -lbsd
perllibs=-lsocket -lm -lbsd
libc=i​:/emx/lib/mt/c_import.lib, so=dll, useshrplib=true, libperl=libperl.lib
Dynamic Linking​:
dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags='-Zdll', lddlflags='-Zdll -Zomf -Zmt -Zcrtdll -Zlinker /e​:2'

Locally applied patches​:
DEVEL10906

---
@​INC for perl v5.7.1​:
lib/os2
lib
i​:/perllib/lib/5.7.1/os2
i​:/perllib/lib/5.7.1
i​:/perllib/lib/site_perl/5.7.1/os2
i​:/perllib/lib/site_perl/5.7.1
i​:/perllib/lib/site_perl/5.00553/os2
i​:/perllib/lib/site_perl/5.00553
i​:/perllib/lib/site_perl
.

---
Environment for perl v5.7.1​:
HOME=j​:/home
LANG=EN_US
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=i​:\NETSCAPE\PROGRAM;d​:\OPENDOC\EPM;d​:\OPENDOC\OREXX;d​:\OPENDOC\BIN;i​:\DRV;i​:\TOOLKIT\BETA\BIN\PMEISTER;i​:\TOOLKIT\BETA\BIN;i​:\TOOLKIT\BIN\OSARED;i​:\TOOLKIT\SOM\COMMON;i​:\TOOLKIT\SOM\BIN;i​:\TOOLKIT\BIN;D​:\OS2;D​:\OS2\SYSTEM;i​:\VABASIC\SOM\BIN;D​:\OS2\INSTALL;D​:\;D​:\OS2\MDOS;D​:\OS2\APPS;i​:\BIN;i​:\UTILS;i​:\EMACS\19.33\BIN;i​:\emx.add\BIN;i​:\EMX\BIN;F​:\PBM;i​:\EMTEX\bin;i​:\ckermit;I​:\PAGETURN;d​:\tcpip\bin;D​:\TCPIP\UMAIL;D​:\VIEWER\BIN;I​:\XFree86\bin;i​:\VABASIC\BIN;i​:\SMART;I​:\Java11\rmi-iiop\bin;I​:\JAVA11\BIN;d​:\JAVAOS2\bin;D​:\MMOS2;I​:\JAVA11\ICATJAVA\BIN;I​:\gs\tools;I​:\gs\gs7.00\bin;
PERLLIB_PREFIX=f​:/perllib;i​:/perllib
PERL_BADLANG (unset)
PERL_SH_DIR=i​:/bin
SHELL (unset)

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

Additionally, there are several places where local temporary variables assigned just to make further assignment less verbose.
("buf" in PerlIOStdio_read function is an example).

That is not the reason. The reason is to get the function/macro calls out of the
inner loop.

IMHO while idea of perlio.c is good, and it allows cool things, it's implementation could be cleaned greatly.

Patches welcome.

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

Ilya Zakharevich <ilya@​math.ohio-state.edu> writes​:

On Tue, Jul 03, 2001 at 11​:20​:42AM +0200, Konovalov, Vadim Vladimirovich (Vadim) wrote​:

I've reported similar thoughts a couple of months ago - see attached message.

Additionally, there are several places where local temporary variables assigned just to make further assignment less verbose. ("buf" in PerlIOStdio_read function is an example).

IMHO while idea of perlio.c is good, and it allows cool things, it's implementation could be cleaned greatly.

Obviously, doing CRLF in Perl when CRTL supports it already is not a
good idea.

Try stepping through Borland C CRTL version and see if you still agree...
Also note that win32/*.c has ActiveState re-implemtation of parts of MSVC's
read() because there are bugs in MS's DLL.

What CRLF discipline should do on such an architecture is
set_mode(whatever) on the corresponding FD (with appropriate
flush()ing, of course), and store the info in the flag. That's all.

And the current state of FD should have been queried when PerlIO is
created (may be this is done already?).

Querying the state needs Configure-oid probing to see how the CRTL flags
that state. There isn't a clean API. Switching modes other than
just-after-open is not reliable on most CRTLs.

Ilya

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

This code *duplicates* the work already done in the CRTL,
thus lead to
double translations (as easy examples show).

That is NOT the intent. The intent is that crlf layer is used when​:
A. The OS CRTL does not do that (e.g. reading DOSISH files on UNIX).
B. When the CRTL is buggy and inefficient (e.g. Borland and
VC++ C on Win32).

(speaking of layers in general)
  C. To implement transparent pack/unpack/encode/crypt via something like PerlIO​::gzip layer.
  D. utf
There are probably more to add, by I'm not that knowledgable.

I want to add that things became worse​: I can't report a bug in the Borland CRT anymore :(
Looks like they've introduced something like chargeable bug support :((

And that bug makes perl compiled by Borland not to pass test suite ...

Best wishes,
<!ENTITY Vadim REALLIFE "Vadim V.Konovalov, St.Petersburg, Russia">
&Vadim;

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From @jhi

On Fri, Jul 06, 2001 at 02​:58​:31PM +0200, Konovalov, Vadim Vladimirovich (Vadim) wrote​:

This code *duplicates* the work already done in the CRTL,
thus lead to
double translations (as easy examples show).

That is NOT the intent. The intent is that crlf layer is used when​:
A. The OS CRTL does not do that (e.g. reading DOSISH files on UNIX).
B. When the CRTL is buggy and inefficient (e.g. Borland and
VC++ C on Win32).

(speaking of layers in general)
C. To implement transparent pack/unpack/encode/crypt via something like PerlIO​::gzip layer.
D. utf
There are probably more to add, by I'm not that knowledgable.

I want to add that things became worse​: I can't report a bug in the Borland CRT anymore :(
Looks like they've introduced something like chargeable bug support :((

That's insane if that means they will not even accept bug reports.
You are doing them a service by telling their product is broken, and
they want to charge you for that? Hey, you should be sending *them*
a bill...

(I could understand charging if there were a *support contract* of some
sort in place, e.g. they promise to do something about a reported bug
(e.g. a suppport engineer would answer, not necessarily *fix* the bug...)
within some time limit.)

Send bug reports from one-time anonymous hotmail accounts? :-)

And that bug makes perl compiled by Borland not to pass test suite ...

Best wishes,
<!ENTITY Vadim REALLIFE "Vadim V.Konovalov, St.Petersburg, Russia">
&Vadim;

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

Additionally, there are several places where local temporary
variables assigned just to make further assignment less verbose.
("buf" in PerlIOStdio_read function is an example).

That is not the reason. The reason is to get the
function/macro calls out of the
inner loop.

Sorry, I can't see any loops there​:

SSize_t
PerlIOStdio_read(PerlIO *f, void *vbuf, Size_t count)
{
dSYS;
FILE *s = PerlIOSelf(f,PerlIOStdio)->stdio;
SSize_t got = 0;
if (count == 1)
  {
  STDCHAR *buf = (STDCHAR *) vbuf;
  /* Perl is expecting PerlIO_getc() to fill the buffer
  * Linux's stdio does not do that for fread()
  */
  int ch = PerlSIO_fgetc(s);
  if (ch != EOF)
  {
  *buf = ch;
  got = 1;
  }
  }
else
  got = PerlSIO_fread(vbuf,1,count,s);
return got;
}

Anyway, that tiny snippet of code is not essential, so it does not worth much debating...

IMHO while idea of perlio.c is good, and it allows cool
things, it's implementation could be cleaned greatly.

Patches welcome.

Okay.
Here is a tiny-optimization patch. Although patch is obvious, I checked for perl to pass tests after applying it.

Inline Patch
--- d:\WORK\PerlCompile\perl@11148-orig\perlio.c	Sat Jun 30 22:41:57 2001
+++ d:\WORK\PerlCompile\perl@11148\perlio.c	Fri Jul  6 17:07:24 2001
@@ -2278,14 +2278,13 @@
  SSize_t got = 0;
  if (count == 1)
   {
-   STDCHAR *buf = (STDCHAR *) vbuf;
    /* Perl is expecting PerlIO_getc() to fill the buffer
     * Linux's stdio does not do that for fread()
     */
    int ch = PerlSIO_fgetc(s);
    if (ch != EOF)
     {
-     *buf = ch;
+     *((STDCHAR *)vbuf) = ch;
      got = 1;
     }
   }


I'm not sure that I am able to produce more essential patches, but I'll try.

Best wishes,
<!ENTITY Vadim REALLIFE "Vadim V.Konovalov, St.Petersburg, Russia">
&Vadim;

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From @doughera88

On Fri, 6 Jul 2001, Konovalov, Vadim Vladimirovich (Vadim) wrote​:

Additionally, there are several places where local temporary
variables assigned just to make further assignment less verbose.
("buf" in PerlIOStdio_read function is an example).

So? What's wrong with being less verbose? Clarity counts too,
though obviously different authors have different views of
clarity.

Okay.
Here is a tiny-optimization patch. Although patch is obvious, I checked for perl to pass tests after applying it.

[ patch snipped ]

This is rather pointless since any decent optimizer ought to do the same
for you anyway. I checked. Both gcc and Sun's cc produced identical code
for the two versions.

If you think the changed code is somehow "cleaner", that's fine and not
worth arguing about.

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

Additionally, there are several places where local temporary
variables assigned just to make further assignment less verbose.
("buf" in PerlIOStdio_read function is an example).

So? What's wrong with being less verbose? Clarity counts too,
though obviously different authors have different views of
clarity.

Okay.
Here is a tiny-optimization patch. Although patch is
obvious, I checked for perl to pass tests after applying it.

[ patch snipped ]

This is rather pointless since any decent optimizer ought to
do the same
for you anyway. I checked. Both gcc and Sun's cc produced
identical code
for the two versions.

agreed, and that's why I did not suggested that patch earlier. Precisely, I thought that compiler should optimize that stuff away, and that is why I did not included this (?​:pseudo-)?optimization patch into my 11079 one.

If you think the changed code is somehow "cleaner", that's
fine and not
worth arguing about.

It is obvious that perlio.c is written in a different style that other perl itself, and one could argue whether this is good or bad.
I accept "cleaner" or not style as it is.

Best wishes,
Vadim V.Konovalov, Software developer
ZAO Lucent Technologies, St.Petersburg branch
e-mail​: vkonovalov@​lucent.com
phone​: +7(812)3298522

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

Vadim Vladimirovich Konovalov <vkonovalov@​lucent.com> writes​:

Additionally, there are several places where local temporary
variables assigned just to make further assignment less verbose.
("buf" in PerlIOStdio_read function is an example).

That is not the reason. The reason is to get the
function/macro calls out of the
inner loop.

Sorry, I can't see any loops there​:

SSize_t
PerlIOStdio_read(PerlIO *f, void *vbuf, Size_t count)
{
dSYS;
FILE *s = PerlIOSelf(f,PerlIOStdio)->stdio;
SSize_t got = 0;
if (count == 1)
{
STDCHAR *buf = (STDCHAR *) vbuf;
/* Perl is expecting PerlIO_getc() to fill the buffer
* Linux's stdio does not do that for fread()
*/
int ch = PerlSIO_fgetc(s);
if (ch != EOF)
{
*buf = ch;
got = 1;
}
}
else
got = PerlSIO_fread(vbuf,1,count,s);
return got;
}

But s is used in both branches of the if - so factoring it out saves code
size.

Sorry this is my style, it isn't completely pointless - having "loads"
at the top of the function tends help instruction schedulers put them
in the "right place". GCC at least used to fail to put "loads" like
that in the delay slot of the jump that implements the if() - it did
not spot that there was a "common" instruction in the two basic blocks.

Anyway, that tiny snippet of code is not essential, so it does not worth much debating...

IMHO while idea of perlio.c is good, and it allows cool
things, it's implementation could be cleaned greatly.

Patches welcome.

Okay.
Here is a tiny-optimization patch. Although patch is obvious, I checked for perl to pass tests after applying it.

--- d​:\WORK\PerlCompile\perl@​11148-orig\perlio.c Sat Jun 30 22​:41​:57 2001
+++ d​:\WORK\PerlCompile\perl@​11148\perlio.c Fri Jul 6 17​:07​:24 2001
@​@​ -2278,14 +2278,13 @​@​
SSize_t got = 0;
if (count == 1)
{
- STDCHAR *buf = (STDCHAR *) vbuf;
/* Perl is expecting PerlIO_getc() to fill the buffer
* Linux's stdio does not do that for fread()
*/
int ch = PerlSIO_fgetc(s);
if (ch != EOF)
{
- *buf = ch;
+ *((STDCHAR *)vbuf) = ch;
got = 1;
}
}

Does that actually change the generated code at all?
IIRC there is at least one compiler which is used for perl which
does not like lvalue casts. I seem to remember doing the inverse of your
patch to fix that.

I'm not sure that I am able to produce more essential patches, but I'll try.

Best wishes,
<!ENTITY Vadim REALLIFE "Vadim V.Konovalov, St.Petersburg, Russia">
&Vadim;

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

That is not the reason. The reason is to get the
function/macro calls out of the
inner loop.

Sorry, I can't see any loops there​:

SSize_t
PerlIOStdio_read(PerlIO *f, void *vbuf, Size_t count)
{
dSYS;
FILE *s = PerlIOSelf(f,PerlIOStdio)->stdio;
SSize_t got = 0;
if (count == 1)
{
STDCHAR *buf = (STDCHAR *) vbuf;
/* Perl is expecting PerlIO_getc() to fill the buffer
* Linux's stdio does not do that for fread()
*/
int ch = PerlSIO_fgetc(s);
if (ch != EOF)
{
*buf = ch;
got = 1;
}
}
else
got = PerlSIO_fread(vbuf,1,count,s);
return got;
}

But s is used in both branches of the if - so factoring it out saves code
size.

... but "buf" is never used except inside "if" condition, so it'll be faster
to use it only when it is used.

Sorry this is my style, it isn't completely pointless - having "loads"
at the top of the function tends help instruction schedulers put them
in the "right place". GCC at least used to fail to put "loads" like
that in the delay slot of the jump that implements the if() - it did
not spot that there was a "common" instruction in the two basic blocks.

Nick, I really appreciate your perl/Tk work and am sure about high quality
of your code.

Patches welcome.

Okay.
Here is a tiny-optimization patch. Although patch is obvious, I checked
for perl to pass tests after applying it.

--- d​:\WORK\PerlCompile\perl@​11148-orig\perlio.c Sat Jun 30 22​:41​:57 2001
+++ d​:\WORK\PerlCompile\perl@​11148\perlio.c Fri Jul 6 17​:07​:24 2001
@​@​ -2278,14 +2278,13 @​@​
SSize_t got = 0;
if (count == 1)
{
- STDCHAR *buf = (STDCHAR *) vbuf;
/* Perl is expecting PerlIO_getc() to fill the buffer
* Linux's stdio does not do that for fread()
*/
int ch = PerlSIO_fgetc(s);
if (ch != EOF)
{
- *buf = ch;
+ *((STDCHAR *)vbuf) = ch;
got = 1;
}
}

Does that actually change the generated code at all?
IIRC there is at least one compiler which is used for perl which
does not like lvalue casts. I seem to remember doing the inverse of your
patch to fix that.

mmm... I suggested not that much patches, so I have a feeling that I
remember them all.
One patch - to rule them all, one to forget, :)

joke, forget this.

Being serious, I have a feeling that all compilers must be able to do lvalue
casts. If your experience says otherwise (I beleive your experience much
wider that mine), then I'll do nothing except I'll agree with your point.

Really warmest wishes,
<!ENTITY Vadim REALLIFE "Vadim V.Konovalov, St.Petersburg, Russia">
&Vadim;

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

On Fri, Jul 06, 2001 at 01​:39​:10PM +0100, Nick Ing-Simmons wrote​:

pp_getc() calls is_eof() which does getc/ungetc
calls getc()

[BTW, I see no logic in this sequence of events.]

It is legacy scheme - for systems where we cannot snoop the buffer
we ask (e.g.) stdio to get a char, if it succeeds we are not at EOF.
But then we need to put char back.

We *can* snoop the buffer with PerlIO (and without it too in my setup).

  if \(ch == '\\n'\)
   \{
    if \(b\->ptr \- 2 >= b\->buf\)
     \{

Tracing through perlio.c shows how much EVIL it is. All 80K of it.

If the handle is in text mode then '\n' should have come from a CRLF pair,
so there should be room for it.

Absolutely not. It can com from CRLF or from a lone LF.

All the handles should be binary() at the OS level.

This is unacceptable if the CRTL level CRLF handling is 10x as
efficient as PerlIO's (which I expect holds for EMX - and any other
sanely coded handler).

If OS2's CRLF logic is solid an efficent then it may make sense to
NOT use crlf layer but just use OS2's read/write directly.

Agreed...

In the example of Term​::ReadKey, CRTL *knows* that TTYs should be opened
in the binary mode. PerlIO does not.

There are miriads of such special
cases.

So get Term​::ReadKey to tell PerlIO it should be binary.

You propose to duplicate the CRTL's knowledge of the system's
specifics in all the modules?

The task of CRNL translation is very quick and easy - with a proper
implementation.

The main pain is getting tell() and seek() to work right.

This is just plain impossible without an enormous overhead of marking
each \n with whether it came from \n or \r\n.

A reasonable target is to make tell() work on the results of seek().
This is what EMX does, and what causes very few well-localized
complaints.

Ilya

@p5pRT
Copy link
Author

p5pRT commented Jul 6, 2001

From [Unknown Contact. See original ticket]

On Fri, Jul 06, 2001 at 01​:46​:05PM +0100, Nick Ing-Simmons wrote​:

And the current state of FD should have been queried when PerlIO is
created (may be this is done already?).

Querying the state needs Configure-oid probing to see how the CRTL flags
that state.

No need to Configure. There is win32ish.h etc.

There isn't a clean API. Switching modes other than
just-after-open is not reliable on most CRTLs.

Existence of non-reliable CRTLs should not make it not work anywhere.

Ilya

@p5pRT
Copy link
Author

p5pRT commented Jul 9, 2001

From [Unknown Contact. See original ticket]

Ilya Zakharevich <ilya@​math.ohio-state.edu> writes​:

On Fri, Jul 06, 2001 at 01​:46​:05PM +0100, Nick Ing-Simmons wrote​:

And the current state of FD should have been queried when PerlIO is
created (may be this is done already?).

Querying the state needs Configure-oid probing to see how the CRTL flags
that state.

No need to Configure. There is win32ish.h etc.

It is Configure-oid - because Borland, VC++ and Mingw32 all have different
header files - so you at least need a 3-way #ifdef nest.

There isn't a clean API. Switching modes other than
just-after-open is not reliable on most CRTLs.

Existence of non-reliable CRTLs should not make it not work anywhere.

Agreed.

Ilya

@p5pRT
Copy link
Author

p5pRT commented Jul 9, 2001

From [Unknown Contact. See original ticket]

On Mon, Jul 09, 2001 at 09​:10​:14AM +0100, Nick Ing-Simmons wrote​:

Querying the state needs Configure-oid probing to see how the CRTL flags
that state.

No need to Configure. There is win32ish.h etc.

It is Configure-oid - because Borland, VC++ and Mingw32 all have different
header files - so you at least need a 3-way #ifdef nest.

For two macros​: get-state/set-state? Since set-state is already
covered (for binmode()), all one needs is to add get-state to already
existing #ifdefs.

Even if no #ifdef nest already exists, it is trivial to add. It is
not that tens of different new Win32ish compilers will jump into
existence.

Ilya

@p5pRT
Copy link
Author

p5pRT commented Apr 13, 2010

@chorny - Status changed from 'open' to 'stalled'

@p5pRT
Copy link
Author

p5pRT commented Jul 5, 2012

From @doy

Does anything still need to be done here? I don't know if PerlIO has
gotten any less evil, but it does look like it's here to stay these
days. Are there still specific issues in this ticket that haven't been
addressed, or can it be closed?

-doy

@p5pRT
Copy link
Author

p5pRT commented Jul 5, 2012

The RT System itself - Status changed from 'stalled' to 'open'

@p5pRT
Copy link
Author

p5pRT commented Sep 22, 2012

From @jkeenan

No response in over two months to doy's request for feedback.

Closing ticket.

Thank you very much.
Jim Keenan

@p5pRT
Copy link
Author

p5pRT commented Sep 22, 2012

From [Unknown Contact. See original ticket]

No response in over two months to doy's request for feedback.

Closing ticket.

Thank you very much.
Jim Keenan

@p5pRT
Copy link
Author

p5pRT commented Sep 22, 2012

@jkeenan - Status changed from 'open' to 'rejected'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant