Skip to content

Remember device for 30 days - Part 1 #13166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ristomcgehee
Copy link
Contributor

@ristomcgehee ristomcgehee commented Mar 9, 2023

I am breaking this feature up into 2 PRs to make reviewing easier. This PR adds the new db table and user_service functions for saving devices. Part 2 of this feature can be found in ristomcgehee#2 in case you want to see how this code is used, but you shouldn't need to review Part 2 to review this PR. This PR can be merged to main independently of Part 2.

Part of #5867

Comment on lines +689 to +696
def _generate_device_id(self, user: User) -> str:
"""
Generates a unique device id for the given user.
"""
attempts_max = 1000
attempts = 0
while attempts < attempts_max:
device_id = secrets.token_urlsafe(DEVICE_ID_BYTES)
if not any(device.device_id == device_id for device in user.user_devices):
return device_id
attempts += 1
raise RuntimeError("Failed to generate device id")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we need a device ID and a device secret? My first thought is that it should be sufficient to generate a random token, store it in a cookie, and store the hash of that token along with a reference to the user in the database.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could even just model it as

class UserDevice(db.Model):
     __tablename__ = "user_devices"

     user_id = Column(
         UUID(as_uuid=True), ForeignKey("users.id"), nullable=False, index=True
     )
     device_id = Column(
          UUID(as_uuid=True), unique=True, nullable=False, index=True
     )
     saved_date = Column(
         DateTime(timezone=False), nullable=False, server_default=sql.func.now()
     )
     device_info = # ....maybe a JSONB column?

And instead of storing a secret at all, use the TokenService to send a signed token containing {"username": User.username, "device_id": UserDevice.id}.

We can then just see if the signed token validates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought it would be important to use a slow hash with a salt for the device secret, so that if an attacker gained access to the database, they would have a very hard time reversing the hash and finding the original secret. Using a salt means we have to hash the secret for each value we want to compare in the database. Since we're using a slow hash, I didn't want to have to hash the secret multiple times; if a user had several devices, that could add noticeable time. This is why I introduced the device ID.

Now, I'll admit, I might be going a bit overkill with the security here; a slow hash with a salt might not be necessary with a randomly-generated 128-bit string. It could be fine storing a plain SHA256 hash of the secret in the database. But since this is a highly security sensitive area, I figured I would rather err on the side of too much security.

In regards to sending a signed token instead, if we could store all the necessary information in the token and not need a new database table, I think that would be the better option. However, since we will probably want to give the user the ability to revoke devices, I think a database table is necessary. I think using a signed token or a hash of a secret are both equally good approaches. I'm inclined to leave it as a hash of a secret since that's what I already have, but if there are advantages to the signed token that I'm not thinking of, I am willing to switch to that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree keeping a record in the DB for the Device is necessary for revocation. Only reason I think I prefer the token approach is that it makes the data in the table a bit more transparent, I'm not committed to it. Definitely welcome other thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I heard you on the brute-force side of things, but I'm still leaning towards using a TokenService rather than storing the hashed secret in the database.

Not to rely on tokens themselves, but just for the handling. When a device is saved we still retain the data in the DB for revocation purposes, we just send a signed token to the client that they can provide at login for warehouse to validate.

Any thoughts from the other @pypi/warehouse-admins?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change this to use a TokenService instead. I'm thinking that to complete this issue (#5867), we won't need to add the database table yet. When we do the followup work to add the revocation, we can add the database table then. For now, we would just store the cookie in the user's browser and then verify it on the server side. Does that sound reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ewdurbin What are your thoughts on my previous question?

Copy link
Contributor Author

@ristomcgehee ristomcgehee Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll plan on using a signed token that is verified server side without a database table at the current time. I'm going to close this PR and create a new one since this is a significant rework.

My reasoning for not creating the database table is because revocation will not be a part of this current phase, so we shouldn't add code and data for functionality that won't actually be used. It's possible that we might not get around to implementation the revocation phase for a while. If you think I should include revocation in this current phase or I should include the database table in this current phase in preparation for future revocation, let me know, and I'm happy to adjust my plans.

Comment on lines +689 to +696
def _generate_device_id(self, user: User) -> str:
"""
Generates a unique device id for the given user.
"""
attempts_max = 1000
attempts = 0
while attempts < attempts_max:
device_id = secrets.token_urlsafe(DEVICE_ID_BYTES)
if not any(device.device_id == device_id for device in user.user_devices):
return device_id
attempts += 1
raise RuntimeError("Failed to generate device id")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I heard you on the brute-force side of things, but I'm still leaning towards using a TokenService rather than storing the hashed secret in the database.

Not to rely on tokens themselves, but just for the handling. When a device is saved we still retain the data in the DB for revocation purposes, we just send a signed token to the client that they can provide at login for warehouse to validate.

Any thoughts from the other @pypi/warehouse-admins?

Adding new db table and user_service functions for saving devices
@ristomcgehee
Copy link
Contributor Author

Closing to rework this to use a signed token as ewdurbin suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants