Skip to content

Unique validation doesn't verify uniqueness between data items when many=True #6395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
6 tasks done
maribedran opened this issue Jan 7, 2019 · 15 comments
Closed
6 tasks done
Labels
Milestone

Comments

@maribedran
Copy link

maribedran commented Jan 7, 2019

Checklist

  • I have verified that that issue exists against the master branch of Django REST framework.
  • I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • This is not a usage question. (Those should be directed to the discussion group instead.)
  • This cannot be dealt with as a third party library. (We prefer new functionality to be in the form of third party libraries where possible.)
  • I have reduced the issue to the simplest possible case.
  • I have included a failing test as a pull request. (If you are unable to do so we can still accept the issue.)

Steps to reproduce

  • Create a model with an unique field
  • Create a serializer for the model
  • Instantiate the serializer with many=True and a data argument containing items with non unique values for the unique field
  • Call .is_valid() method - it will return True
  • Call .save() method - will raise IntegrityError

Example

class UniquenessModel(models.Model):
    username = models.CharField(unique=True, max_length=100)

class UniquenessSerializer(serializers.ModelSerializer):
    class Meta:
        model = UniquenessModel
        fields = '__all__'

data = [{'username': 'non-existing'}, {'username': 'non-existing'}]
serializer = UniquenessSerializer(data=data, many=True)
serializer.is_valid()   # True
serializer.save()  # django.db.utils.IntegrityError: UNIQUE constraint failed: tests_uniquenessmodel.username

Expected behavior

The serializer should validate the uniqueness between data items and the existing database entries as well as between themselves.

Actual behavior

The serializer only validates uniqueness between individual data items and database entries.

@maribedran
Copy link
Author

I'd like to try solving this issue if it's considered valid, but I'm not sure what would be the best design choice to address it. The test I added is together with the UniqueValidator's tests, but looking at the class' code made me realize it's in the wrong place. This class only deals with the value of one entry and passing the values from the other entries to it looks to be complicated and seems like a bad smell.
I thought of creating a new validator class and adding it to ListSerializer's validators when it's child is a ModelSerializer with unique fields. Does that look reasonable? If so, what would be the correct place to do it?

@xordoquy
Copy link
Collaborator

xordoquy commented Jan 8, 2019

My thoughts on this:

  • documentation should be improved so it makes it explicit that it applies against the DB content
  • a "batch" and correct unique validator will need to be set as a ListSerializer validator and most likely would be a 3rd party (might already exist)

@tomchristie
Copy link
Member

Evidently it is a valid issue yeah, tho it's more a case of "is there a simple enough pull request that would resolve this, or do we just accept it as a constraint and document it appropriately."

I'm slightly surprised to not see this issue raised previously. There's a couple of similar cases that consider it from the POV of race requests with unique values. #3876, #5760 - Tho I think the answer to that is "use ATOMIC_REQUESTS = True" so that everything runs within a database transaction always.

[...] Does that look reasonable? If so, what would be the correct place to do it?

Not sure, sounds fiddly either way around. The best way to make some progress towards this would probably be to consider how it'd be resolved with a plain old serializer class first, rather than with a modelserializer.

If we start instead with this:

class UniquenessModel(models.Model):
    username = models.CharField(unique=True, max_length=100)

class UniquenessSerializer(serializers.Serializer):
    username = models.CharField(max_length=100, validators=[
        UniqueValidator(queryset=UniquenessModel.objects.all())
    ])

Then what's the best resolution for ensuring that your example case in #6396 would pass?

@maribedran
Copy link
Author

On my real life case I just created a list serializer and added this logic to it's validate method. Maybe just documenting this would be enough for most of the cases, but I still wonder if there is a more generic solution. For now I can see two ways around this, but both look complicated.

  1. UniqueValidator has access to it's field's parent serializer and therefore to it's parent's parent when it's a many=True case, so in the set_context method we could get the values passed to the unique field in the other items of the list data. Going up two levels to access other items values looks conceptually wrong, so at first I wouldn't choose that.

  2. Creating a validator that's able to compare items from the hole data set and add it to the list serializer's validators. This should probably be done when the parent is instantiated in many_init, from what I could understand from the code.

@maribedran
Copy link
Author

maribedran commented Jan 8, 2019

The shortest solution I could think of for now was adding this valitator to the list serializer:

class ListUniqueValidator(object):
    message = _('This field must be unique.')

    def __init__(self, unique_field_names):
        self.unique_field_names = unique_field_names

    @staticmethod
    def has_duplicates(counter):
        return any([count for count in counter.values() if count > 1])

    def __call__(self, value):
        from collections import Counter
        field_counters = {
            field_name: Counter(
                item[field_name] for item in value
                if field_name in item
            )
            for field_name in self.unique_field_names
        }
        has_duplicates = any([
            ListUniqueValidator.has_duplicates(counter)
            for counter in field_counters.values()
        ])
        if has_duplicates:
            errors = []
            for item in value:
                error = {}
                for field_name in self.unique_field_names:
                    counter = field_counters[field_name]
                    if counter[item.get(field_name)] > 1:
                        error[field_name] = self.message
                errors.append(error)
            raise ValidationError(errors)

    def __repr__(self):
        return unicode_to_repr('<%s(unique_field_names=%s)>' % (
            self.__class__.__name__,
            self.unique_field_names,
        ))


class UniquenessListSerializer(serializers.ListSerializer):
    validators = [ListUniqueValidator(unique_field_names=['username'])]


class UniquenessSerializer(serializers.ModelSerializer):
    class Meta:
        list_serializer_class = UniquenessListSerializer
        model = UniquenessModel
        fields = '__all__'
data = [{'username': 'non-existing'}, {'username': 'non-existing'}]
serializer = UniquenessSerializer(data=data, many=True)
assert not serializer.is_valid()
assert serializer.errors == {'non_field_errors': [
    {'username': 'This field must be unique.'},
    {'username': 'This field must be unique.'},
]}

The error message formatting is probably wrong, but that'd be the general idea. To make this generic to all list serializers with child model serializers, the validator's set_context method could get the unique field names from the model serializer it receives as parameter. What I'm not sure is whether ther's a reasonable way of doing that through the many methods that add fields and validators to model serializers.

@xordoquy
Copy link
Collaborator

xordoquy commented Jan 9, 2019

@tomchristie this has not be risen earlier because it's a bit out of the scope of the default generic views which don't do batch operation on their resources.
I faced that one once which I've solved with a specific validator and didn't rise it here because it required more validation than what a default single resource creation would require.
Still not convinced it's really an issue with the core and inclined to add a warning about it in the documentation

@zar777
Copy link

zar777 commented Feb 10, 2019

I'm having the same issue with the Serializer.is_valid method, any luck?

@harshavardhanpillimitla
Copy link

harshavardhanpillimitla commented Nov 9, 2023

now , i think this issue is resolved ,

class UniquenessModel(models.Model):
    username = models.CharField(unique=True, max_length=100)

class UniquenessSerializer(serializers.ModelSerializer):
    class Meta:
        model = UniquenessModel
        fields = '__all__'

data = [{'username': 'non-existing'}, {'username': 'non-existing'}]
serializer = UniquenessSerializer(data=data, many=True)
serializer.is_valid()   # True (now it is returns False)

@auvipy
Copy link
Member

auvipy commented Nov 10, 2023

now , i think this issue is resolved ,

class UniquenessModel(models.Model):
    username = models.CharField(unique=True, max_length=100)

class UniquenessSerializer(serializers.ModelSerializer):
    class Meta:
        model = UniquenessModel
        fields = '__all__'

data = [{'username': 'non-existing'}, {'username': 'non-existing'}]
serializer = UniquenessSerializer(data=data, many=True)
serializer.is_valid()   # True (now it is returns False)

thanks. based on this I think we can close the issue for now. please report back if anything break again

@auvipy auvipy closed this as completed Nov 10, 2023
@auvipy auvipy added this to the 3.15 milestone Nov 10, 2023
@auvipy
Copy link
Member

auvipy commented Nov 10, 2023

now , i think this issue is resolved ,

class UniquenessModel(models.Model):
    username = models.CharField(unique=True, max_length=100)

class UniquenessSerializer(serializers.ModelSerializer):
    class Meta:
        model = UniquenessModel
        fields = '__all__'

data = [{'username': 'non-existing'}, {'username': 'non-existing'}]
serializer = UniquenessSerializer(data=data, many=True)
serializer.is_valid()   # True (now it is returns False)

what do you think about this PR https://github.com/encode/django-rest-framework/pull/6396/files ? should we add the test cases to see that it is actually working as per your report?

@FatemeKhodayari
Copy link

FatemeKhodayari commented Dec 11, 2023

Hi. I'm facing this issue as well. Since it's been first mentioned in 2019 and it seems that it was finally fixed about a month ago, will there be a release shipping this fix any time soon?

@auvipy @harshavardhanpillimitla

@harshavardhanpillimitla

Which version of django you are using . I tried with latest version last time
Check once with latest version
@FatemeKhodayari

@FatemeKhodayari
Copy link

FatemeKhodayari commented Dec 11, 2023

Which version of django you are using . I tried with latest version last time Check once with latest version @FatemeKhodayari

Django 4.2 .8
DRF 3.14.0

It seems DRF has no newer version yet.

@G-Kemp101
Copy link

Hi @auvipy @harshavardhanpillimitla , I don't think this issue should be closed as fixed (is it not planned instead?). The case #6395 (comment) still fails on master -serializer.is_valid() is still True and can be shown its still failing in the tests written in #6396.

@tomchristie tomchristie reopened this Apr 30, 2024
@tomchristie tomchristie closed this as not planned Won't fix, can't repro, duplicate, stale Apr 30, 2024
@tomchristie
Copy link
Member

Correct. It was closed as fixed, and then reverted.

#9310 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants