Skip to content

Support custom model object keys #155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
orf opened this issue Nov 19, 2015 · 33 comments · Fixed by #1127
Closed

Support custom model object keys #155

orf opened this issue Nov 19, 2015 · 33 comments · Fixed by #1127

Comments

@orf
Copy link

orf commented Nov 19, 2015

Hey, thanks for your work on this project. I was investigating integrating it with our EmberJS app but I ran into an issue. We have an Employee model that has a normal integer primary key, but also a username attribute. I'd like to only expose the username as the primary key, using DRF this is simple:

class EmployeeSerializer(serializers.ModelSerializer):
    id = serializers.ReadOnlyField(source="user.username")
    ...

However in build_json_resource_obj it just uses instance.pk:

def build_json_resource_obj(fields, resource, resource_instance, resource_name):
    resource_data = [
        ('type', resource_name),
        ('id', encoding.force_text(resource_instance.pk) if resource_instance else None),
        ...

Can you use the serializers ID field if available, and fallback to the instances PK?

@jerel
Copy link
Member

jerel commented Nov 19, 2015

So your Employee model has an integer primary key but you're overwriting it on output to use another field's (username) value? That sounds like a rocky road to go down as I'm pretty sure relationships will fail since you're masquerading a non primary key value as a primary key.

Would it be possible for you to change your model's username field to primary_key=True and get rid of id or better yet change your id field to a uuid? (the latter is what I do, Django 1.8 introduced core support for UUID pks).

I'm a little hesitant to change build_json_resource_obj right off because, at least in my mind, the id should always be a primary key. I'm open to a discussion on the matter though

@orf
Copy link
Author

orf commented Nov 20, 2015

I agree it sounds rocky, but in the context of the app it makes sense. We have an app with a django user model that is backed by LDAP, so the username field is the 'true' primary key and the integer id is an implementation detail for that application. The Emberjs app pulls in data from this system and two others, so exposing the primary key seems risky as its just the ID for this particular row, the username is much safer to share between the systems.

Also we view exposing the user id as a security risk, it opens the system up to easy enumeration attacks whereas the username is not susceptible without a large brute forcing effort. Upgrading the central app would be a large undertaking, and we've been using this method for a while without any issues with the DRF ember package. Upgrading the app to use the username as the primary key would be awesome, but it's a fairly critical application and that would be a large/costly undertaking with already stretched resources.

Surely it's not too much to just use the id field specified in the serializer, it seems a weird limitation to have. We're also all consenting adults here, if someone is exposing a custom ID then they hopefully know what they are doing 👍 (and if not then it's their fault, not you or this packages).

@jsenecal
Copy link
Member

How would you deal with the relations then? What is the ID used in other objects ForeignKeys?

@orf
Copy link
Author

orf commented Nov 20, 2015

There are two or three models being served from the same application, for those foreign keys we just use a CharField with source='user_relation.username'. For the relations stored on other systems we just store the username in a charfield, then the Ember app issues a lookup for /api/employees/user.id/.

Honestly, its also a bit about enumeration. With ~500 employees (past and present) in the database exposing the integer userID makes it super super easy to gather a big list of everyone who has ever worked for us. From this perspective exposing the username is a much much better approach.

@jsenecal
Copy link
Member

@orf This behavior is completely out of the scope of this library.
Also, security by obscurity is not a good way to protect your data. Also like @jerel mentioned switching to uuids could meet both ends and solve this for you.

@orf
Copy link
Author

orf commented Nov 27, 2015

Upgrading a huge, critical LOB application to use GUIDs (which is only more secure by virtue of being a larger keyspace) is a huge undertaking, whereas this is a genuine use case and what I think is a somewhat simple fix. There are many cases where you may not want to expose the internal ID of an object but instead use another unique field, and I really think it would be a good idea to support this rather than just assuming everyone wants the pk exposed and offering absolutely no way to change it (other than monkeypatching, which is icky).

But it's your library and your choice. I'm more than willing to make a MR with the changes and have another discussion based on that?

@jsenecal
Copy link
Member

@orf we are always open to PRs and I did not want to offend you in any ways. it's just that with my knowledge of the internals of DRF, using another field as the ID for a relation, just seems to be a lot of work and one would need to rewrite a lot of things for a very specific edge-case.
My 0.02$.

Again feel free to open a PR if you come up with a solution.

@leifurhauks
Copy link

I think I'm misunderstanding something about your use case. Isn't supporting alternative primary keys what the pk object on django model instances is for in the first place? So that we don't have to hardcode references to instance.id? For example, the following model uses a CharField as a primary key:

class MyModel(models.Model):
    foo = CharField(primary_key=True)
    # ... other fields here

Now instance.pk is an alias for instance.foo on instances of MyModel.

If your username field is unique anyway, what's the barrier to simply setting primary_key=True on the username field?

Then third-party code like rest_framework_json_api and any other third-party django apps you might use would have no problem retrieving your user instances by the username field instead of the id field, without having to modify the code at all.

@leifurhauks
Copy link

Oh, I realize that the 'username' field is on a different model. Assuming the username field is unique on the user model, and if no two Employees can have a foreign key to the same user (i.e. the foreign key to user on Employee is unique), you could have the Employee.user foreign key reference user.username (instead of user.id) and then set that foreign key field as Employee's primary key.

So Employee.user would look like:

user = models.ForeignKey(to_field='username', primary_key=True)

And if employee is in an instance of Employee, employee.pk would return the username.

This would involve one schema migration (to adjust the foreign key constraints) and no data migrations.

@orf
Copy link
Author

orf commented Nov 27, 2015

Hey, yeah in our case the app is still using the old Django profile system rather than a custom user model, but that's not entirely relevant to the issue I'm reporting. I don't think I explained this properly above, sorry!

DRF allows you to expose any attribute you feel fit as the 'id' in a serializer. By default it uses the 'pk' attribute of the model, but this can be changed (see below). All I would like is for this functionality to be exposed by this library, currently it's hard-coded to the pk of the model. In merge request #165 I've taken a stab at it by simply grabbing the pk field from the serializers output, which it currently ignores and just uses model.pk.

class ThingSerializer(ModelSerializer):
    id = serializers.CharField(source='attribute')

    class Meta:
        fields = ('id',)
        model = Thing

class RelatedThing(ModelSerializer):
    # produces a list of the charfield attributes defined above, rather than integer PK's
    some_relation = ThingSerializer(many=True)  
    ...

My particular use case is a bit convoluted, but the core of it is that this library naively assumes that you only want to expose the pk of a model as is, when DRF allows you to expose anything and as long as your serializers are wired up correctly then all the CRUD actions work fine.

rpatterson added a commit to rpatterson/django-rest-framework-json-api that referenced this issue Feb 21, 2017
TODO Test coverage

There are so many places where PK is assumed in
djangorestframework-jsonapi and so many places/ways to change IDs/URLs
in DRF itself, that getting coverage on all the edge cases will be quite
a lot of work.  Some of that work may be better spent re-factoring
things to make the code easier to understand and reduce the number of
edge cases in the first place.
@rpatterson
Copy link
Contributor

If anyone's interested, I've added support for defering to lookup_field/custom serializer id fields such that the DB can still use the defaul/some other pk field/column, while the API can use another field, such as UUID, throughout.

@orf
Copy link
Author

orf commented Feb 22, 2017

Thank you!

@orf
Copy link
Author

orf commented Mar 1, 2017

Hey @rpatterson are you able to make a MR with that change?

@lucacorti
Copy link
Contributor

Hi everyone,

What is the status of this? Is it going to be merged eventually? It would be great to be able to use custom fields for API lookups and model PKs for model relationships.

@orf
Copy link
Author

orf commented May 12, 2017 via email

@lucacorti
Copy link
Contributor

@orf I share your views on the matter. This is probably the result of too much coupling between models and serializers. Direct mapping of resources to database models is a convenient but limited approach IMHO. The ability to use serializers to do this kind of mapping is really important.

@orf
Copy link
Author

orf commented May 12, 2017 via email

@mblayman
Copy link
Collaborator

mblayman commented May 12, 2017

@orf please remember that maintainers are fallible human beings who volunteer their time to work on this project. You appear to be frustrated and the best I can do is apologize. I hope you don't take the lack of feedback personally. I believe that all the original maintainers have moved to a mostly inactive role for many months so all issues and PRs have seen a low volume of feedback. I picked up the maintainer torch at the end of February and have slowly been working through PR queue to get things going again.

I have extremely limited time for DJA and will prioritize on PR work and work through issues when I can.

As for this actual issue, I'm happy to consider a PR if DJA is blocking something that is possible with vanilla DRF. That sounds like a bug to me as well. If a previous PR was closed because of conflicts, it's only because I don't have the knowledge to resolve the conflicts or time to fix the conflicts myself.

Wearing the new maintainer hat, I'm really looking to open up DJA to all contributors/contributions. That's not to say that the other maintainers didn't, but my focus is on a permissive model where I indirectly help by shaping incoming work from other contributors. I'm most interested in the following goals:

  1. Support the JSON API spec to compliance
  2. Be as compatible with DRF as possible
  3. Have sane defaults to be as easy to pick up as possible
  4. Be solid and tested with good coverage (DJA should ideally be a very quiet component in the stack)
  5. Be performant.

I hope that motivates you enough to put your work back up for consideration in a new PR. I'd love to talk through some code. 😄

@lucacorti
Copy link
Contributor

lucacorti commented May 15, 2017

@mblayman I'm sorry if I am causing a dispute over this while trying to make a technical point.
Your work is much appreciated.

Actually, I think you are right about this being a bug in DRF-JA since DRF has a different behaviour as pointed out by @orf

I'm interested in getting this in as I'm currently working on something which needs this capability, so I'm willing to spend some time on getting a PR together if you would consider merging this.

AFAIU the most updated version of the PR is currently on @rpatterson fork. Any help in getting up to speed on the original PR would be great.

@mblayman
Copy link
Collaborator

I am overjoyed to keep this discussion technical. 😄

My last comment was to provide an update about what's going on with the maintainers so everyone has context of the project's status. I'm happy to get back on track with the problem reported in this issue.

@ghost
Copy link

ghost commented Jul 18, 2017

Thanks for the above suggestions, but how then multiple lookup_fields would be handled http://www.django-rest-framework.org/api-guide/generic-views/#creating-custom-mixins ?

see http://discuss.jsonapi.org/t/composite-id-inside-the-resource-object/367/3

sliverc pushed a commit to sliverc/django-rest-framework-json-api that referenced this issue Jul 24, 2018
@kdeloach
Copy link

Being able to select which field to serialize for the primary key would be very useful.

The application I'm working on must maintain the original primary keys for backwards compatibility with the old API while we transition to using UUIDs for the new API. We do not want to expose the original integer primary keys at all.

I almost got this to work by extending JSONRenderer. However, for relations, I would have to copy & paste the entire extract_relationships method because of how the pk field is hard-coded.

class CustomJSONRenderer(rest_framework_json_api.renderers.JSONRenderer):
    @classmethod
    def extract_attributes(cls, fields, resource):
        obj = super(CustomJSONRenderer, cls).extract_attributes(fields, resource)
        obj.pop('uuid')
        return obj

    @classmethod
    def build_json_resource_obj(cls, fields, resource, resource_instance, *args, **kwargs):
        obj = super(CustomJSONRenderer, cls).build_json_resource_obj(fields, resource, resource_instance, *args, **kwargs)
        obj['id'] = str(resource_instance.uuid)
        return obj

@mynameistechno
Copy link

This would also be useful for model-less serializers.

@sliverc
Copy link
Member

sliverc commented Oct 2, 2019

This issue only concerns ModelSerializer in case someone wants to use a different id than the primary key.

For a use case where there is no model you can simply use a Serializer and it should work.

@sliverc sliverc changed the title Support custom object keys Support custom model object keys Oct 2, 2019
@mynameistechno
Copy link

mynameistechno commented Oct 2, 2019

This issue only concerns ModelSerializer in case someone wants to use a different id than the primary key.

I experienced this issue using the plain Serializer and a regular class as the instance. I'd have to go back to confirm but something like this

class Foo:
    def __init__(self):
        # pk instead of id required here if not it barfs
        self.pk = '1'

class FooSerializer(serializers.Serializer):
    id = serializers.CharField(read_only=True)

serializer = FooSerializer(instance=Foo())

Am i doing something wrong?

@sliverc
Copy link
Member

sliverc commented Oct 2, 2019

@mynameistechno Here are some serializers which are working. They use a custom queryset but would be the same as when using a python plain object with a pk set.
In case this doesn't work please open a new issue as it should work and doesn't concern this specific issue.

@mynameistechno
Copy link

I still experience this issue. "object has no attribute pk'

Exception on this line here:

('id', encoding.force_str(resource_instance.pk) if resource_instance else None),

My object is not an instance of a model. I have an id attribute set on the object however. I am using vanilla Serializer, not ModelSerializer. The only way (that I know) to get around this is to set a pk attribute instead of an id on your instance object.

As far as I can tell the renderer (build_json_resource_obj) assumes a pk is present on the object.

@sliverc
Copy link
Member

sliverc commented Mar 31, 2020

@mynameistechno
Yes this is true and I have also stated this in my #155 (comment) above (although it might not have been really clear) that a pk attribute/property needs to be available on plain object to be able to use generic serializer with DJA. This can be a simple read-only property on the object though which simply returns the id.

To be able to configure the attribute/property to be used as id resp. primary key a solution for this issue would need to be implemented.

@daveseff
Copy link

daveseff commented Jun 6, 2020

Ok I'm stuck on this issue.
I have to implement a django rest_framework with JSON API output with pagination. Seems simple enough. I've done it many times before. But this time I don't have an underlying model to work from.

My data comes from a third-party SDK which just returns a list of dictionaries. It's complex data with nested lists and such, but in the end it's just a list of dicts.

I'm posting my snippets here changing the name to not expose the 3rd party provider.

Here is the serializer I have so far:

from rest_framework import serializers

class MySerializer(serializers.Serializer):
    def to_representation(self, instance):
        return instance

This is my view:

class InstanceViewSet(ModelViewSet):
    queryset = sdk.get_all_the_things()
    serializer_class = MySerializer
    pagination_class = CustomPagination

sdk.get_all_the_things() returns a list of dicts, similar to:

[{ 
  "foo": "something",
  "bar": "more stuff", 
  ...
 },
 {
  ...
 },


and so on....
]

I have no control over this input directly and the underlying dict structure could change at any time. Again the format of the dict isn't all that critical.

This just gives me the dreaded 'dict' objects has no attribute 'pk'. Because my 'dict' object doesn't have one. So given that django-rest-framework-json-api is requiring one, Is there something I can do in my code to satisfy the 'pk' attribute?

@sliverc
Copy link
Member

sliverc commented Jun 6, 2020

@daveseff
before you return the objects in get_all_the_things you could use setattr on each dict of the list calculating the primary key depending on the dict content.

setattr(obj_dict, 'pk', 'value')

@mynameistechno
Copy link

@daveseff

before you return the objects in get_all_the_things you could use setattr on each dict of the list calculating the primary key depending on the dict content.

setattr(obj_dict, 'pk', 'value')

Yeah I ended up addressing this in a similar fashion. I take the underlying data dict and create an object instance from a simple class definition. The class has a read only pk property method that returns the id.

@daveseff
Copy link

daveseff commented Jun 7, 2020

Thank you kindly @sliverc and @mynameistechno . I managed to get this working by creating a dummy object and populating a list with those objects. I'm posting my solution here in case any one else comes across this and needs a solution:

class DummyObject(object):
    def __init__(self, **kwargs):
        for field in ('id', 'data'):
            setattr(self, field, kwargs.get(field, None))

Then had to manipulate the list of dicts. This is somewhat ugly code but worked:

resultset = sdk.get_all_the_things()
index = 0
instances = []

for instance in resultset:
    temp_obj = DummyObject()
    temp_obj.data = instance
    temp_obj.pk = index
    instances.append(temp_obj)
    index += 0
return instances

Then Changed my serializer to serialize the object data:

from . import DummyObject

class MySerializer(serializers.Serializer):
    pk = serializers.IntegerField(read_only=True)
    data = serializers.DictField()

    def create(self, validated_data):
        return DummyObject(pk=None, **validated_data)

    class Meta:
        resource_name = 'instances'

@wakingyeung
Copy link

If the fields parameter of build_json_resource_obj method contains an id field, use this id field, otherwise use the default implementation. I think this is very flexible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.