crypt() works randomly with utf-8 #4784

Closed
p5pRT opened this issue Jan 4, 2002 · 22 comments
Comments

@p5pRT

p5pRT commented Jan 4, 2002

Migrated from rt.perl.org#8180 (status was 'resolved')

Searchable as RT8180$

@p5pRT

p5pRT commented Jan 4, 2002

From [email protected]

Created by [email protected]

crypt in current devels behaves randomly because pp_crypt "decodes" the
password, but allocates a too small buffer​:

  Newz(688, t, len, char);

this doesn't leave space for the final 0 byte (which crypt needs). An
(untested) but "obviously" ;) correct fix is to replace the above line
with​:

  Newz(688, t, len + 1, char);

Have fun ;)
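
The off-by-one can be illustrated outside of Perl. This is a minimal sketch in plain C, assuming `calloc` as a stand-in for Perl's zero-filling `Newz` macro; the helper name `copy_for_crypt` is hypothetical and this is not the code in pp_crypt:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the actual pp_crypt code: copy `len` bytes of
 * `src` into a fresh buffer that a C string function such as crypt() can
 * read safely. calloc() stands in for Perl's zero-filling Newz(). */
char *copy_for_crypt(const char *src, size_t len)
{
    char *buf;

    assert(src != NULL);
    /* len + 1, not len: the extra zero-filled byte is the terminator. */
    buf = calloc(len + 1, sizeof(char));
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, len);
    return buf;   /* the caller frees the buffer */
}
```

With only `len` bytes allocated, the copy itself still fits, but nothing guarantees that the byte after the buffer is zero, so any function that scans for the terminator reads whatever happens to follow.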

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.7.2:

Configured by root at Mon Dec 17 04:14:59 CET 2001.

Summary of my perl5 (revision 5.0 version 7 subversion 2 patch 13722) configuration:
  Platform:
    osname=linux, osvers=2.4, archname=i686-linux
    uname='linux cerebro 2.4.8-ac9 #7 smp thu aug 30 00:15:46 cest 2001 i686 unknown '
    config_args=''
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='gcc-2.95.4', ccflags ='-fno-strict-aliasing -I/usr/local/include -I/opt/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-g -Os -march=pentium -mcpu=pentium -funroll-loops',
    cppflags='-fno-strict-aliasing -I/usr/local/include -I/opt/include -fno-strict-aliasing -I/usr/local/include -I/opt/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fno-strict-aliasing -I/usr/local/include -I/opt/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='2.95.4 20010319 (prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc-2.95.4', ldflags ='-L/usr/local/lib -L/opt/lib'
    libpth=/usr/local/lib /lib /usr/lib /opt/lib
    libs=-ldl -lm -lc -lcrypt
    perllibs=-ldl -lm -lc -lcrypt
    libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib -L/opt/lib'

Locally applied patches:
    DEVEL13686


@INC for perl v5.7.2:
    /opt/perl/lib/perl5
    /opt/perl/lib/perl5
    /opt/perl/lib/perl5
    /opt/perl/lib/perl5
    .


Environment for perl v5.7.2:
    HOME=/root
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=de_DE
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/root/s2:/root/s:/opt/qt/bin:/bin:/usr/bin:/usr/app/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/app/bin:/usr/app/sbin:/usr/X11/bin:/opt/jdk118/bin:/opt/bin:/opt/sbin:.:/root/cc/dejagnu/bin
    PERLDB_OPTS=ornaments=0
    PERL_BADLANG (unset)
    SHELL=/bin/bash


@p5pRT

p5pRT commented Jan 4, 2002

From @jhi

On Sat, Jan 05, 2002 at 12​:40​:39AM +0100, Marc Lehmann wrote​:

This is a bug report for perl from root@​cerebro.laendle,
generated with the help of perlbug 1.33 running under perl v5.7.2.

-----------------------------------------------------------------
[Please enter your report here]

crypt in current devels behaves randomly

Hmmm, not for any of my platforms (Linux, Solaris, HP-UX, Tru64).

because pp_crypt "decodes" the
password, but allocates a too small buffer​:

Newz(688, t, len, char);

this doesn't leave space for the final 0 byte (which crypt needs). An

Documented... where?

(untested) but "obviously" ;) correct fix is to replace the above line
with​:

Newz(688, t, len + 1, char);

But I can add that one byte there if it makes your crypt() happier.

--
$jhi++; # http​://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Jan 4, 2002

From @jhi

On Sat, Jan 05, 2002 at 12​:40​:39AM +0100, Marc Lehmann wrote​:

This is a bug report for perl from root@​cerebro.laendle,
generated with the help of perlbug 1.33 running under perl v5.7.2.

-----------------------------------------------------------------
[Please enter your report here]

crypt in current devels behaves randomly because pp_crypt "decodes" the
password, but allocates a too small buffer​:

Newz(688, t, len, char);

this doesn't leave space for the final 0 byte (which crypt needs). An
(untested) but "obviously" ;) correct fix is to replace the above line
with​:

Newz(688, t, len + 1, char);

Hmmm, the Newz() is relevant only for the DO_UTF8() branch. I hope
you realize that feeding UTF-8 to crypt() is highly unexpected by
any vendor :-) We just fake a somewhat reasonable behaviour, grabbing
the lowest byte of the Unicode characters and then crypt()ing that.
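
The "lowest byte" behaviour described here can be sketched in plain C. This is an illustration under assumptions, not the pp_crypt code: the names `decode_utf8` and `low_bytes_of_utf8` are made up for the example, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (assumed well-formed) and store
 * the code point in *cp. Returns the number of bytes consumed. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18)
        | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6)
        | (uint32_t)(s[3] & 0x3F);
    return 4;
}

/* Write the low byte of every code point in the UTF-8 string `in`
 * (in_len bytes) to `out`, followed by a terminating zero byte.
 * `out` must hold at least in_len + 1 bytes.
 * Returns the number of characters written. */
size_t low_bytes_of_utf8(const unsigned char *in, size_t in_len,
                         unsigned char *out)
{
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);
    while (i < in_len) {
        uint32_t cp;
        i += decode_utf8(in + i, &cp);
        out[n++] = (unsigned char)(cp & 0xFF);  /* keep only the low byte */
    }
    out[n] = '\0';
    return n;
}
```

Note that the output buffer is sized `in_len + 1`: the decoded text is never longer than the encoded input, and the extra byte holds the terminator that this ticket is about.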

--
$jhi++; # http​://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Jan 4, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 02​:34​:26AM +0200, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

crypt in current devels behaves randomly

Hmmm, not for any of my platforms (Linux, Solaris, HP-UX, Tru64).

This is wrong. It will fail on EACH and EVERY UNIX, unless malloc
does very funny things. Just like strcpy and strcmp and open expect a
zero-terminated string, so does crypt.

Newz(688, t, len, char);

Documented... where?

Does it need documentation? common sense tells me that crypt doesn't
magically guess password length for me without a trailing zero byte.

Newz(688, t, len + 1, char);
But I can add that one byte there if it makes your crypt() happier.

_my_ crypt? how does your crypt guess your password length when it gets
"xxyy\0" but the user only typed in "xx" and "yy\0" just happens to be
memory garbage after the allocated string?

On Sat, Jan 05, 2002 at 02​:37​:55AM +0200, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

Hmmm, the Newz() is relevant only for the DO_UTF8() branch. I hope
you realize that feeding UTF-8 to crypt() is highly unexpected by
any vendor :-)

Sorry, but who told me that the *internal* representation of strings does
not matter to the perl level? I have *no* way of knowing whether the utf8
flag has been set on the scalar in question.

And as long as perl randomly converts latin1 to utf-8 at various points
(concatenation etc.), crypt will sometimes get a utf-8 string, and perl
*must* do "appropriate" character conversion for me in this and any other
syscall or library function expecting a string. Appropriate might mean
"passing it through unprocessed".

It's perfectly safe to feed in utf-8 into *any* vendor's crypt as long as
all characters are <256 (as a matter of fact, crypt only takes the lowest
7 bits anyway).

It's even perfectly safe to feed utf-8 with >255 chars into crypt, the
behaviour is perfectly deterministic and documented.

We just fake a somewhat reasonable behaviour, grabbing the lowest byte
of the Unicode characters and then crypt()ing that.

You confuse Unicode with characters > 256. Characters below 256 exist in
unicode, and are perfectly safe to feed into any syscall.

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 4, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 02​:36​:22AM +0100, Marc Lehmann <pcg@​goof.com> wrote​:

It's perfectly safe to feed in utf-8 into *any* vendors crypt as long as
  ^^^^^^^^^ unicode

You confuse Unicode with characters > 256. Characters below 256 exist in
  ^^^^^^ 255

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 4, 2002

From @jhi

On Sat, Jan 05, 2002 at 02​:36​:22AM +0100, Marc A. Lehmann wrote​:

On Sat, Jan 05, 2002 at 02​:34​:26AM +0200, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

crypt in current devels behaves randomly

Hmmm, not for any of my platforms (Linux, Solaris, HP-UX, Tru64).

This is wrong. It will fail on EACH and EVERY UNIX, unless malloc
does very funny things. Just like strcpy and strcmp and open expect a
zero-terminated string, so does crypt.

Newz(688, t, len, char);

Documented... where?

Does it need documentation? common sense tells me that crypt doesn't
magically guess password length for me without a trailing zero byte.

Newz(688, t, len + 1, char);
But I can add that one byte there if it makes your crypt() happier.

_my_ crypt? how does your crypt guess your password length when it gets
"xxyy\0" but the user only typed in "xx" and "yy\0" just happens to be
memory garbage after the allocated string?

Okay, on this you are right.

On Sat, Jan 05, 2002 at 02​:37​:55AM +0200, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

Hmmm, the Newz() is relevant only for the DO_UTF8() branch. I hope
you realize that feeding UTF-8 to crypt() is highly unexpected by
any vendor :-)

Sorry, but who told me that the *internal* representation of strings does
not matter to the perl level? I have *no* way of knowing whether the utf8
flag has been set on the scalar in question.

And as long as perl randomly converts latin1 to utf-8 at various points
(concatenation etc.), crypt will sometimes get a utf-8 string, and perl
*must* do "appropriate" character conversion for me in this and any other
syscall or library function expecting a string. Appropriate might mean
"passing it through unprocessed".

It's perfectly safe to feed in utf-8 into *any* vendor's crypt as long as
all characters are <256 (as a matter of fact, crypt only takes the lowest
7 bits anyway).

It's even perfectly safe to feed utf-8 with >255 chars into crypt, the
behaviour is perfectly deterministic and documented.

We just fake a somewhat reasonable behaviour, grabbing the lowest byte
of the Unicode characters and then crypt()ing that.

You confuse Unicode with characters > 256. Characters below 256 exist in
unicode, and are perfectly safe to feed into any syscall.

--
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
-=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
The choice of a GNU generation |
|

--
$jhi++; # http​://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Jan 4, 2002

From @jhi

(1) The dynamic allocation with Newz() happens only in the case when
  the string is UTF-8 encoded. Rare case, and new code​: that's why
  we haven't met it much yet.
  (In the non-UTF-8 case the standard zero-terminated PVs are used.)

(2) Even if we had met the UTF-8 case more often, we would have been
  overly lucky by the fact that crypt() [1] only reads at most eight
  characters-- and malloc probably doesn't deal in much smaller blocks.
  So no buffer overflows (garbage, yes), by silly blind luck.
  But now the zero byte will be there, that's great.

(3) Trying to argue that crypt() would have any sensible meaning on the
  raw UTF-8 bytes is madness. The only other option I see is simply
  croak()ing if fed UTF-8 data. crypt() is defined to operate on
  the bits of bytes. Not characters. Not variable-length UTF-8
  encoded Unicode characters. I'm sorry but that's that, and that's
  about all on this matter I will care to argue.

[1] At least the old de facto standard crypt(). Yes, there are fancier
crypt() out there that read more input.

--
$jhi++; # http​://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

Marc Lehmann wrote​:
|> It's perfectly safe to feed in utf-8 into *any* vendor's crypt as long as
|> all characters are <256 (as a matter of fact, crypt only takes the lowest
|> 7 bits anyway).
|>
|> It's even perfectly safe to feed utf-8 with >255 chars into crypt, the
|> behaviour is perfectly deterministic and documented.
|...
|> You confuse Unicode with characters > 256. Characters below 256 exist in
|> unicode, and are perfectly safe to feed into any syscall.

One small nit about that -- one that doesn't really impact for or against
your argument, but just to clarify, if only for myself, about Perl/utf8....

Utf8 looks like ASCII or values < 128. From 128 through 255 (and beyond),
characters are represented by multiple bytes. So I think everywhere where
you wrote 255/256, you probably should have 127/128.
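
To make the 127/128 boundary concrete, here is a small C sketch of the UTF-8 encoding rule for code points below 0x800. The function name `utf8_encode_small` is made up for this illustration and covers only the one- and two-byte forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp (restricted to < 0x800 in this sketch) as UTF-8.
 * Returns the number of bytes written to out: 1 for ASCII, otherwise 2. */
size_t utf8_encode_small(uint32_t cp, unsigned char out[2])
{
    assert(cp < 0x800);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;             /* 0xxxxxxx */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* 10xxxxxx */
    return 2;
}
```

So a character such as U+00E9 has an ordinal below 256 and fits in one Latin-1 byte, but in UTF-8 it occupies the two bytes 0xC3 0xA9. Both numbers are right, they just answer different questions: 255/256 is about the character's ordinal, 127/128 is about where the byte representation changes.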

But then, if all characters of a string have ordinals <= 255, perl does
(or merely tries?) to keep the encoding native, and not utf8.

I guess that's the question. If a system's native encoding is byte-oriented,
and all characters can fit into bytes, is it *guaranteed* to be stored
in the native (syscall-ready) encoding, and not in utf8?

|> Sorry, but who told me that the *internal* representation of strings
|> does not matter to the perl level?

Jarkko Responds​:
|> The only other option I see is simply croak()ing if fed UTF-8 data. ...
|> I'm sorry but that's that, and that's about all on this matter I will
|> care to argue.

I've heard that NT syscalls, for example, accept strings as UTS-2 (16-bit
entities), and so on such systems, the internal representation must be
converted before it can be used, yes?

For 8-bit systems, I guess it hinges on the guarantee. If the guarantee
really is true (which I'd suspect, given Jarkko's reply), there's really no
way to present the data to the system call (crypt, or any other) in a way
that makes any sense. But if the guarantee isn't true, shouldn't a
conversion at least be attempted, to see if all characters can fit in the
native encoding?

  Jeffrey

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, 5 Jan 2002, Marc wrote​:

On Sat, Jan 05, 2002 at 02​:34​:26AM +0200, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

crypt in current devels behaves randomly

Hmmm, not for any of my platforms (Linux, Solaris, HP-UX, Tru64).

This is wrong. It will fail on EACH and EVERY UNIX, unless malloc
does very funny things. Just like strcpy and strcmp and open expect a
zero-terminated string, so does crypt.

Newz(688, t, len, char);

Documented... where?

Does it need documentation? common sense tells me that crypt doesn't
magically guess password length for me without a trailing zero byte.

The manpage for the crypt() on Linux says :

  By taking the lowest 7 bit of each character of the key, a
  56-bit key is obtained.

Which implies that it doesn't guess the length but rather assumes that it
is only eight characters - I believe this behaviour is sanctioned by the
appropriate standards.

/J\
--
Jonathan Stowe |
<http​://www.gellyfish.com> | This space for rent
  |

@p5pRT

p5pRT commented Jan 5, 2002

From @jhi

_my_ crypt? how does your crypt guess your password length when it gets
"xxyy\0" but the user only typed in "xx" and "yy\0" just happens to be
memory garbage after the allocated string?

Yes, you are right, a terminating zero byte needs to be there, and
I must have been hallucinating. Maybe it's because of my out-of-band
knowledge that crypt() works on max eight characters (knowledge which
is wrong if crypt() works on more characters than that, *cough*...)
And now the zero byte is in there, let's move on.

--
$jhi++; # http​://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 11​:04​:19AM +0000, Jonathan Stowe <gellyfish@​gellyfish.com> wrote​:

  56-bit key is obtained.

Which implies that it doesn't guess the length but rather assumes that it
is only eight characters - I believe this behaviour is sanctioned by the
appropriate standards.

If this were true, every login program would be in error, as would be perl.

No matter what, the fix is a required fix.

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 01​:02​:32AM -0800, Jeffrey Friedl <jfriedl@​yahoo.com> wrote​:

|> You confuse Unicode with characters > 256. Characters below 256 exist in
|> unicode, and are perfectly safe to feed into any syscall.

Utf8 looks like ASCII or values < 128. From 128 through 255 (and beyond),
characters are represented by multiple bytes. So I think everywhere where
you wrote 255/256, you probably should have 127/128.

I think 255/256 is right. All the world is not the US, and 8-bit characters
are widely used outside the US, and most syscalls, as a matter of fact,
under Unix, expect 8-bit.

Under linux, for example, syscalls are defined to use utf-8, but nobody uses
that, instead everybody just uses his own native encoding.

Now Jarkko claims that perl programmers must not know the internal
representation of characters, so of course perl *must* convert to the
native encoding deterministically, whether it's UCS-2, utf-8 or latin1 or
8-bit-bytes.

But then, if all characters of a string have ordinals <= 255, perl does
(or merely tries?) to keep the encoding native, and not utf8.

Perl tries, not very hard. I found the bug in crypt (which, again, is the
missing zero byte on the C string and not the character encoding, which
didn't matter) this way:

  $pass = "public";
  warn "pass is $pass\n";
  crypt $pass,

Since $pass was marked as utf-8, I triggered the (non-utf-8-related) bug in
crypt. I have no idea why Jarkko claims nonsense like C strings don't need
zero termination or whatever.

(I highly regard Jarkko as a great programmer, but his treatment of
perfectly valid bugs and fixes, as witnessed by private mails I got from him,
is unacceptable (the fix doesn't fix anything, but I'll apply it anyways)).

and all characters can fit into bytes, is it *guaranteed* to be stored
in the native (syscall-ready) encoding, and not in utf8?

I think not in all cases, but in most (so I agree mostly with
jarkko). This is why I am so puzzled about Jarkko switching sides and now
requiring perl programmers to call utf8_downgrade or uitf_8 manually
before calling crypt, which is entirely unreasonable, especially for ascii
strings (but also for 8-bit characters).

|> The only other option I see is simply croak()ing if fed UTF-8 data.
|> ... I'm sorry but that's that, and that's about all on this matter I
|> will care to argue.

So perl will croak randomly because some previous operation utf-8-ified
the string. You just made perl totally useless and non-deterministic.

Of course, you should also do this for all syscalls.

I've heard that NT syscalls, for example, accept strings as UTS-2
(16-bit entities), and so on such systems, the internal representation
must be converted before it can be used, yes?

I thought UCS-2 (and lately UTF-16, because UCS-2 can't do full unicode),
but yes, I would expect this to work and not croak randomly.

really is true (which I'd suspect, given Jarkko's reply), there's really
no way to present the data to the system call (crypt, or any other) in
a way that makes any sense.

If perl suddenly can't work with the files returned by readdir, I would
consider this the greatest breakage of all time.

And if perl stops working with ascii paths because they get marked as
utf-8, this is pure insanity.

However, the bug report was about a missing zero byte, which Jarkko also
claims to be unnecessary. I don't understand why Jarkko claims such
nonsense.

Why is it so difficult to get an obvious bugfix into perl?

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 06​:49​:51AM +0200, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

(1) The dynamic allocation with Newz() happens only in the case when
the string is UTF-8 encoded. Rare case, and new code​: that's why
we haven't met it much yet.

Sure, why aren't you happy because I found this obvious bug before you
released 5.8?

overly lucky by that fact the crypt() [1] only reads at most eight
characters-- and malloc probably doesn't deal in much smaller blocks.
So no buffer overflows (garbage, yes), by silly blind luck.

Well, the bytes behind malloc might well be at the end of the page, so it
could segfault.

Even more of a wonder that I found the bug.

(3) Trying to argue that crypt() would have any sensible meaning on the
raw UTF-8 bytes is madness.

I agree fully. That's why you *must* convert it to 8-bit-bytes.

The only other option I see is simply
croak()ing if fed UTF-8 data. crypt() is defined to operate on
the bits of bytes. Not characters.

Cool perl. Croaks depending on the phase of the moon.

[1] At least the old de facto standard crypt(). Yes, there are fancier
crypt() out there that read more input.

Like the one on your linux platform that you claimed not accept
non-terminated strings. Of course you just claimed that without thinking.
Good engineering.

*pissed off*

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

Let me add here that Jarkko and I seem to want to kill each other without
reason and without holding personal grudges against each other at all. Maybe
we should meet in person once...

In any case, the fix is/will be applied and perl works as I think it
should and as Jarkko thinks it should, so I consider the case settled.

Sorry for making p5p a small battlefield.

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 5, 2002

From @jhi

Since $pass was marked as utf-8, I triggered the (non-utf-8-related) bug in
crypt. I have no idea why Jarkko claims nonsense like C strings don't need
zero termination or whatever.

I have no idea why you haven't been reading my several public and
private messages where I acknowledge that you were right and I was
wrong.

I think not in all cases, but in most (so I agree mostly with
jarkko). This is why I am so puzzled about Jarkko switching sides and now
requiring perl programmers to call utf8_downgrade or uitf_8 manually

The downgrade happens automatically in pp_crypt() so I have no idea
what you are talking about.

Why is it so difficult to get an obvious bugfix into perl?

Your bugfix is now in Perl. Has been there since your bug report.
Could we all, like, calm down?

--
$jhi++; # http​://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 09​:40​:17PM +0200, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

I have no idea why you haven't been reading my several public and
private messages where I acknowledge that you were right and I was
wrong.

I am really sorry, I just answer mail in the order it arrives, and I
am neither awake nor reachable 24 hours a day. Mail via p5p is additionally
delayed, which adds to the problem. Usually this is not a problem.

I think not in all cases, but in most (so I agree mostly with
jarkko). This is why I am so puzzled about Jarkko switching sides and now
requiring perl programmers to call utf8_downgrade or uitf_8 manually

The downgrade happens automatically in pp_crypt() so I have no idea
what you are talking about.

I was just puzzled because of your remark that passing UTF-8 into crypt
was not a good idea, but since the perl level doesn't expose utf-8-ness
vs. bytes explicitly (as is your goal, "unicode characters", if I got that
right), crypt sometimes gets called with utf-8 strings and sometimes
not. Your remark implied (to me) that the utf8->bytes conversion was a
courtesy, and perl could instead just croak.

But since perl utf8_upgrades ascii scalars regularly and without any
documented rules (which is *correct* as I, as a perl programmer, mustn't
care whether a scalar is utf-8 or not since it's just "characters") the
idea of crypt croaking because a scalar is utf-8 (something which happens
for the same code in some perl versions but not in others) doesn't appeal
to me as useful.

But maybe you meant something completely different and we just
misinterpreted each other again and got angry.

Why is it so difficult to get an obvious bugfix into perl?
Your bugfix is now in Perl. Has been there since your bug report.
Could we all, like, calm down?

I hope this mail qualifies as objective and not angry again. If somebody
tells me this is not the case I am puzzled but will need to learn from
that.

Sorry for the confusion.

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 5, 2002

From @jhi

idea of crypt croaking because a scalar is utf-8 (something which happens
for the same code in some perl versions but not in others) doesn't appeal
to me as useful.

I think what I meant was that if the string cannot be downgraded to
all-less-than-256 characters, then we could croak: "Hey, you, why are
you passing wide characters to crypt()?" Now what we do is walking
through the Unicode characters and just fishing out the lowest bytes
of those, and then feeding those to crypt().
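
The alternative described above (downgrade when every character is below 256, otherwise complain) might look roughly like this in plain C. This is a sketch under assumptions: `downgrade_utf8` is a hypothetical name, it is not the code in pp_crypt(), and malformed input is simply treated the same as a wide character.

```c
#include <assert.h>
#include <stddef.h>

/* Convert UTF-8 `in` (in_len bytes) to one byte per character in `out`,
 * which must hold at least in_len + 1 bytes. Returns the number of
 * characters written, or -1 if any character is above 255 (the point
 * where a caller could croak instead of truncating). */
long downgrade_utf8(const unsigned char *in, size_t in_len,
                    unsigned char *out)
{
    size_t i = 0;
    long n = 0;

    assert(in != NULL && out != NULL);
    while (i < in_len) {
        unsigned char b = in[i];
        if (b < 0x80) {
            /* ASCII: copy through unchanged. */
            out[n++] = b;
            i += 1;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < in_len) {
            /* Lead bytes 0xC2 and 0xC3 cover exactly U+0080..U+00FF. */
            out[n++] = (unsigned char)(((b & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;  /* wide character or malformed sequence */
        }
    }
    out[n] = '\0';      /* the terminator this ticket is about */
    return n;
}
```

The design difference from taking the lowest byte is that a wide character is reported to the caller rather than silently mapped onto some unrelated 8-bit character.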

--
$jhi++; # http​://www.iki.fi/jhi/
  # There is this special biologist word we use for 'stable'.
  # It is 'dead'. -- Jack Cohen

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, 5 Jan 2002, Marc wrote​:

On Sat, Jan 05, 2002 at 11​:04​:19AM +0000, Jonathan Stowe <gellyfish@​gellyfish.com> wrote​:

  56-bit key is obtained.

Which implies that it doesn't guess the length but rather assumes that it
is only eight characters - I believe this behaviour is sanctioned by the
appropriate standards.

If this were true, every login program would be in error, as would be perl.

Er, this *is* true. Of course for shorter passwords it would be an error
yes.

/J\
--
Jonathan Stowe |
<http​://www.gellyfish.com> | This space for rent
  |

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 10​:00​:49PM +0200, Jarkko Hietaniemi <jhi@​iki.fi> wrote​:

all-less-than-256 characters, then we could croak: "Hey, you, why are
you passing wide characters to crypt()?" Now what we do is walking
through the Unicode characters and just fishing out the lowest bytes
of those, and then feeding those to crypt().

(As I wrote to you earlier, but not on this list), I am fine with croaking
or using the lowest 8 bits, or dancing around when there are >255
characters.

So everything (== crypt ;) is extraordinarily fine in perl right now ;)

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 08​:25​:31PM +0000, Jonathan Stowe <gellyfish@​gellyfish.com> wrote​:

Which implies that it doesn't guess the length but rather assumes that it
is only eight characters - I believe this behaviour is sanctioned by the
appropriate standards.

If this were true, every login program would be in error, as would be perl.

Er, this *is* true.

Nonsense. Show me a crypt that blindly reads eight characters, you will
find no such monster.

The standard (which is SUSv2 in this case) also doesn't claim such
nonsense, although one could misinterpret it that way. But even if the
standard wouldn't disallow such nonsense, it neither claims it, nor
sanctions it, nor would this invalidate each and every crypt in the world,
it would be an obvious bug in the standard.

Of course for shorter passwords it would be an error yes.

And it currently isn't an error, so you are wrong...

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Jan 5, Marc A. Lehmann said:

On Sat, Jan 05, 2002 at 08​:25​:31PM +0000, Jonathan Stowe <gellyfish@​gellyfish.com> wrote​:

Which implies that it doesn't guess the length but rather assumes that it
is only eight characters - I believe this behaviour is sanctioned by the
appropriate standards.

If this were true, every login program would be in error, as would be perl.

Er, this *is* true.

Nonsense. Show me a crypt that blindly reads eight characters, you will
find no such monster.

The crypt() I use, that Perl uses, and most login programs I've seen use,
either pads with null bytes or truncates so that the password length is 8
bytes.

I have a password that is more than 8 characters on a machine, and I only
need to type the first 8 characters of it.

See​:

  http​://www.art.net/~hopkins/Don/unix-haters/login.html
  http​://www.art.net/~hopkins/Don/unix-haters/password.html
  http​://www.art.net/~hopkins/Don/unix-haters/loggedin.html

--
Jeff "japhy" Pinyan japhy@​pobox.com http​://www.pobox.com/~japhy/
RPI Acacia brother #734 http​://www.perlmonks.org/ http​://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.

@p5pRT

p5pRT commented Jan 5, 2002

From [Unknown Contact. See original ticket]

On Sat, Jan 05, 2002 at 03​:52​:14PM -0500, Jeff 'japhy' Pinyan <jeffp@​crusoe.net> wrote​:

The crypt() I use, that Perl uses, and most login programs I've seen use,
either pads with null bytes or truncates so that the password length is 8
bytes.

And if my buffer is shorter than 8 bytes it sometimes segfaults? No, sorry,
this is just plain wrong (look at perl's crypt code, it does not do any
such padding, as that would be incorrect).

What you probably mean is that crypt treats shorter passwords as _if_ they
_were_ zero-padded, but it certainly doesn't pad them, nor does it blindly read

I have a password that is more than 8 characters on a machine, and I only
need to type the first 8 characters of it.

I am aware of this. Are you?

Please, I am angry enough at so many people claiming utter nonsense (if
you think I sound angry, I am). Please, for god's sake, read the standard,
read a crypt implementation, read the manpage and apply common sense to
it and you will see that all these claims of crypt blindly reading beyond
my buffer or poking zero bytes into unallocated memory space are utter
nonsense.

Basically, it works like it, in practise, works, and this is in harmony
with the standard (you can observe this by testing!). If you think
otherwise please show me code, but you won't find any in any non-obsolete
OS.

Please also note that Jarkko agrees with that, as probably does everybody
else.

(And take replies off-list, this is not perl-related in any way).

*sigh*

--
  -----==- |
  ----==-- _ |
  ---==---(_)__ __ ____ __ Marc Lehmann +--
  --==---/ / _ \/ // /\ \/ / pcg@​goof.com |e|
  -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
  The choice of a GNU generation |
  |
