Skip to content

question: on uint vs. int decoding. #134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
client9 opened this issue Oct 8, 2015 · 16 comments · Fixed by #215
Closed

question: on uint vs. int decoding. #134

client9 opened this issue Oct 8, 2015 · 16 comments · Fixed by #215

Comments

@client9
Copy link
Contributor

client9 commented Oct 8, 2015

I started getting some of these errors on decode:

attempted to decode type "uint" with method for "int"

The culprit was another msgpack implementation

func (e *msgpackEncDriver) EncodeInt(i int64) {
    if i >= 0 {
        e.EncodeUint(uint64(i))

I found a few other libraries doing this trick. Any suggestions here on how i can morph your library to read them?

update:

I'll spare you reading lua code, but
https://github.com/fperrad/lua-MessagePack/

also defaults to unsigned.

set_integer'unsigned'

update, and one version of python, basically the pattern is "if non-negative, write as unsigned, if negative write as signed"


# You may notice struct.pack("B", obj) instead of the simpler chr(obj) in the
# code below. This is to allow for seamless Python 2 and 3 compatibility, as
# chr(obj) has a str return type instead of bytes in Python 3, and
# struct.pack(...) has the right return type in both versions.

def _pack_integer(obj, fp):
    if obj < 0:
        if obj >= -32:
            fp.write(struct.pack("b", obj))
        elif obj >= -2**(8-1):
            fp.write(b"\xd0" + struct.pack("b", obj))
        elif obj >= -2**(16-1):
            fp.write(b"\xd1" + struct.pack(">h", obj))
        elif obj >= -2**(32-1):
            fp.write(b"\xd2" + struct.pack(">i", obj))
        elif obj >= -2**(64-1):
            fp.write(b"\xd3" + struct.pack(">q", obj))
        else:
            raise UnsupportedTypeException("huge signed int")
    else:
        if obj <= 127:
            fp.write(struct.pack("B", obj))
        elif obj <= 2**8-1:
            fp.write(b"\xcc" + struct.pack("B", obj))
        elif obj <= 2**16-1:
            fp.write(b"\xcd" + struct.pack(">H", obj))
        elif obj <= 2**32-1:
            fp.write(b"\xce" + struct.pack(">I", obj))
        elif obj <= 2**64-1:
            fp.write(b"\xcf" + struct.pack(">Q", obj))
        else:
            raise UnsupportedTypeException("huge unsigned int")
@client9
Copy link
Contributor Author

client9 commented Oct 9, 2015

Yarg, I forgot to say "thank you" in the first place for this project. I love it, and thats why Im trying to figure out how to morph it to work with legacy applications.

It appears, my case func ReadInt64Bytes(b []byte) (i int64, o []byte, err error) { is the one causing my issues.

Given the strong range checking in the subsequent int types e.g.

func ReadInt16Bytes(b []byte) (int16, []byte, error) {
    i, o, err := ReadInt64Bytes(b)
        if i > math.MaxInt16 || i < math.MinInt16 {

it appears they would handle the situation well. Then again you explicitly mentioned "sporadic failure" and how you are trying to avoid them.

Any thoughts on perhaps adding a global flag to allow uints to be read into ints (off by default)?

@glycerine
Copy link
Contributor

I don't understand the issue. Are you saying other msgpack implementations are varying the type they write depending on the value received? That sounds like a serious implementation problem that should stay isolated to their own little poor-practices universe. msgp should not cater to such poor habits.

@client9
Copy link
Contributor Author

client9 commented Oct 9, 2015

Hi glycerine,

Sorry if I wasn't clear. In many untyped language's implementation of the message pack spec, (and some typed ones), they default to using unsigned if the value is non-negative. The spec doesn't really have any guidance here, so they aren't wrong per-se (and they might not even have a concept of signed, or unsigned, nor any way of even indicating such a thing). However, this means if the client is using one of these libraries, a msgp server can not read them. This isn't a rare issue. In the space of a few minutes I found 3 versions, in three 3 languages that do this. Just doing another spot check I found 2 more, below. I suspect many more do this since it makes the programming of the msgpack algorithm a lot smaller.

Now I agree that msgp <--> msgp should keep doing exactly what it is doing as its nice and type safe and prevent all sorts of potential randoms.. My goal is to enable clients who are using common msgpack libraries emitting valid msgpack to be read by msgp (as mentioned, with optional flag of some type, or a code-gen option, and/or "unsafe" version, and still having error checking for overflow).

Does this help explain issue?

thanks all!

n

Javascript (first one I found):
https://github.com/creationix/msgpack-js/blob/master/msgpack.js

   // Integers
    if (value >=0) {
      // positive fixnum
      if (value < 0x80) {
        buffer[offset] = value;
        return 1;
      }
      // uint 8
      if (value < 0x100) {
        buffer[offset] = 0xcc;
        buffer[offset + 1] = value;
        return 2;
      }

and another one for JS but using V8 engine in C++.. if positive, use unsigned.

   } else if (v8obj->IsNumber()) {
        double d = v8obj->NumberValue();
        if (trunc(d) != d) {
            mo->type = MSGPACK_OBJECT_FLOAT;
            mo->via.f64 = d;
        } else if (d > 0) {
            mo->type = MSGPACK_OBJECT_POSITIVE_INTEGER;
            mo->via.u64 = static_cast<uint64_t>(d);
        } else {
            mo->type = MSGPACK_OBJECT_NEGATIVE_INTEGER;
            mo->via.i64 = static_cast<int64_t>(d);
        }

... later "if positive, do unsigned, if negative do signed"...

@philhofer
Copy link
Member

The 'official' work-around for this is to use msgp.Number, which will happily serialize and de-serialize any number (int or float) at the expense of one additional byte of memory. The rationale is that languages that don't draw distinctions between signed and unsigned integers typically (though not always) don't draw distinctions between floats or ints either (e.g. js). Obviously your structs don't look quite as nice when they're filled with msgp.Number, but it's a minimal conforming solution.

I'm happy to entertain other proposals, though, provided that the default behavior remain as-is.

@client9
Copy link
Contributor Author

client9 commented Oct 9, 2015

oh interesting @philhofer I'll take a look. (And note,, I like the default behavior).

@client9
Copy link
Contributor Author

client9 commented Oct 16, 2015

wow time flies... so the reason "everyone" does this somewhat odd "use uint-unless-negative, then use int" is because the c reference library does it this way.

Its a bit hard to find the code however this here's a sample. I find it a bit odd for a C library not to respect the sign that is being passed in. Anyways this why so many "other" libraries don't play nice with msgp since they want to byte identical to the reference imply.

I'm still taking a look at solutions to allow the best of both worlds here in msgp.

thx.

n

https://github.com/msgpack/msgpack-c/blob/master/include/msgpack/pack.hpp

template <typename Stream>
template <typename T>
inline void packer<Stream>::pack_imp_int16(T d)  <--- SIGNED IN
{
    if(d < -(1<<5)) {  // I.E IF NEGATIVE
        if(d < -(1<<7)) {
            /* signed 16 */
            char buf[3];
            buf[0] = static_cast<char>(0xd1u); _msgpack_store16(&buf[1], static_cast<int16_t>(d));
            append_buffer(buf, 3);
        } else {
            /* signed 8 */
            char buf[2] = {static_cast<char>(0xd0u), take8_16(d)};
            append_buffer(buf, 2);
        }
    } else if(d < (1<<7)) {  
        /* fixnum */
        char buf = take8_16(d);
        append_buffer(&buf, 1);
    } else {
        if(d < (1<<8)) {  // ELSE IF POSITIVE
            /* unsigned 8 */  /// <---- DO UNSIGNED
            char buf[2] = {static_cast<char>(0xccu), take8_16(d)};
            append_buffer(buf, 2);
        } else {
            /* unsigned 16 */   /// <---- DO UNSIGNED
            char buf[3];
            buf[0] = static_cast<char>(0xcdu); _msgpack_store16(&buf[1], static_cast<uint16_t>(d));
            append_buffer(buf, 3);
        }
    }
}

@glycerine
Copy link
Contributor

@client9 Fascinating. But painful still the same. Thanks for digging this up.

@client9
Copy link
Contributor Author

client9 commented Nov 30, 2015

Ok, back to this old one!

Given that automatic but sloppy conversations are not desired (inputs and outputs must match exact sign, and numeric type), there two options.

  • Adding a flag to code generator. I think this is too ugly, and it would mean a doubling of tests.
  • Using msg.Number which is kinda gross for the end-users, however unless the code is heavily numeric, not too bad in practice.

The current functions such as "Int()" and "Uint()" do a bit-wise cast to a value, and check to see if the type is conversion is even valid or not. (i..e if asking for Int() it returns true if the underlying value is a int, false otherwise... well more or less). There is no range checking. This makes is difficult to handle the wild variety of "correct value" but "wrong type" situations that occur with clients using different message pack library.

To make msg.Number easier to deal with I propose the following new methods to be added:

  • func (n Number) CastInt64() (int64, ok)
  • func (n Number) CastInt32() (int32, ok)
  • func (n Number) CastUint32() (uint32, ok)
  • func (n Number) CastUint64() (uint64, ok)

(others such as Int8,16 and Uint8, Uint16. CastFloat32 and CastFloat64 might require more thinking but conceptual the same).

These would make best attempt to covert the numeric value to the correct type. Returning the value, and true if success and in range, and false is out of range. This would allow users of the package to be "as sloppy as they want to be" with out writing a lot of boilerplate code. To do this logic outside of the code is a little tricky since the Number type field is not public.

correction: "To do this logic outside of the code is a little tricky since the Number type field is not public."

actually there is a method Type()

n

@philhofer
Copy link
Member

@client9 Sounds reasonable; I'm happy to review a PR if you have time to put one together.

@client9
Copy link
Contributor Author

client9 commented Dec 1, 2015

"if you have time to put one together." yeah, exactly the problem :-)

As a side note, I took at look at the RFC for CBOR which is maybe a msgpack2.0..

On numbers they say the following:

For the purposes of this specification, all number representations
   for the same numeric value are equivalent.  This means that an
   encoder can encode a floating-point value of 0.0 as the integer 0.
   It, however, also means that an application that expects to find
   integer values only might find floating-point values if the encoder
   decides these are desirable, such as when the floating-point value is
   more compact than a 64-bit integer.

which to me says the msg.Number plus cast functions is exactly the correct thing to do (or have them builtin and be sloppy with overflows etc)... when you are dealing with untrusted msgp streams.

@dwlnetnl
Copy link

To make msg.Number easier to deal with I propose the following new methods to be added:

  • func (n Number) CastInt64() (int64, ok)
  • func (n Number) CastInt32() (int32, ok)
  • func (n Number) CastUint32() (uint32, ok)
  • func (n Number) CastUint64() (uint64, ok)

(others such as Int8,16 and Uint8, Uint16. CastFloat32 and CastFloat64 might require more thinking but conceptual the same).

+1
I can implement at least some of it.

@client9
Copy link
Contributor Author

client9 commented Jan 27, 2016

@dwlnetnl go for it!

@pwaller
Copy link
Contributor

pwaller commented Jul 5, 2016

This wiki entry needs updating considering this issue.

It's fascinating that the C library would do this and msgp would barf on the output. This lead to quite a serious issue in our app where everything worked fine until you had 128 or more of something, and then it stopped working.

One thing I didn't see considered in this thread, is what about the code generators? Will there be a way to specify that the field should be coerced into the struct type?

@pwaller
Copy link
Contributor

pwaller commented Jul 5, 2016

I would love like to see this issue reconsidered. For better or worse, the msgpack ecosystem seems to have standardized on a different set of rules than msgp, and in this way, msgp does not seem to be a compatible implementation.

Please see the discussion in msgpack/msgpack#164 and msgpack/msgpack-c#247 for more detail.

The current setup means that decoding msgpack with tinylib/msgp will randomly fail on messages encoded with other libraries, depending on what values are encoded within. This means that something which appears to can simply break when it receives another value from a different implementation, which is difficult and surprising behaviour to test and detect.

"Rule 3: sign matters" currently gives the opposite understanding of how things work in the msgpack ecosystem than they do in reality, unfortunately.

I'm particularly concerned right now about getting the code generator to work, since it's not even possible to put a msgp.Number in there. (The code generator simply drops the field).

The fix I would like to see is that I could just specify a int64 on my struct and it would just work, unless a value is received which doesn't fit in the type. That would be consistent with what is going on in the "C & friends" ecosystem, which is what I am trying to communicate with.

antoniomo added a commit to antoniomo/msgp that referenced this issue May 30, 2017
Addresses tinylib#134 in a way that makes
it more interoperable with other msgpack libraries.

The actual code borrowed from librato#1
@aldencolerain
Copy link

aldencolerain commented Jan 27, 2018

@philhofer What is the status of this issue? I'm willing to make a pull request. Any thoughts on librato#1 / antoniomo#1 ?

@dchenk
Copy link
Contributor

dchenk commented Jan 29, 2018

@aldencolerain and @philhofer and @client9 it's true that apparently all other MessagePack implementations by default encode non-negative integers as unsigned integers (uint8, uint16, uint32, or uint64). I think the best way to add interoperability with all those libraries is to extend ReadInt64() to be able to read any unsigned integer just as it is now able to read any signed integer (no matter how compactly it's encoded). Any thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants