Reconsider current BERT models #28


Closed
lefnire opened this issue Oct 1, 2020 · 0 comments
Labels
🤖AI All the ML issues (NLP, XGB, etc) discussion Questions, feedback, discussion help wanted Extra attention is needed

Comments

@lefnire
Collaborator

lefnire commented Oct 1, 2020

I'm using the following models (code). See the SOTA model leaderboards for ideas on what to try next.

  • Embeddings: sentence-transformers/roberta-base-nli-stsb-mean-tokens. This is actually really good, and the recommended model for cosine similarity (since that's how it's trained). But we'll likely want to tone it down computationally; see distillation.
  • Question-answering: allenai/longformer-large-4096-finetuned-triviaqa. Really solid results (better than any I've played with), but a computational non-starter; I need to get off this ASAP and find something that performs as well with less compute.
  • Summarization: facebook/bart-large-cnn. Works great actually, but maybe someone knows of something better?
  • Emotions: mrm8488/t5-base-finetuned-emotion. Absolutely horrendous; none of the predicted emotions match my entries. But it's the only model I've found that outputs actual emotions; the others are just positive/negative.
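For reference, the embeddings model above is trained for cosine similarity, so that's how entry vectors get compared. A minimal sketch of the math, using tiny toy vectors as stand-ins for real sentence-transformers output (real embeddings from roberta-base-nli-stsb-mean-tokens are 768-dim):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot(a, b) / (||a|| * ||b||), in [-1, 1]; 1.0 = same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim vectors standing in for model-produced embeddings.
entry_a = [1.0, 0.0, 1.0]
entry_b = [1.0, 0.0, 1.0]
entry_c = [0.0, 1.0, 0.0]

print(cosine_similarity(entry_a, entry_b))  # identical -> 1.0
print(cosine_similarity(entry_a, entry_c))  # orthogonal -> 0.0
```

In practice the library's own similarity util does the same thing over batches; this is just the scoring rule the model was trained against.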

Considerations:

  • Should we train/fine-tune models on entries? What would be the labels? Is that why I'm getting such poor results? Do some models perform better off-the-shelf (no fine-tuning) than others? We'd need to consider the Privacy Policy re: training models on entries.
  • For assessment, I've found model performance in their whitepapers to be irrelevant compared to subjective performance on my / my wife's entries. The models above are what I landed on after trying out tons of models, so we'd want some way to compare candidates subjectively?
  • I stopped trying new models as of transformers<3.1.0, so the release notes since then would be useful to see what's new: 3.1.0 (Pegasus; keep an eye out for an fp16 version); 3.2.0; 3.3.0; 3.3.1 (facebook/pag for QA?)
  • We prefer fast over accurate, but not by too much. We want models that can crunch as much as possible at once (e.g. Longformer's 4096 tokens is great), since that captures more context together vs. separate chunks. So the priority order is (1) compute, (2) accuracy, (3) batch efficiency; but of course, something that balances all 3 well (in that order).
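On that last point (fitting as much together as possible vs. separate chunks), the packing step can be sketched as greedy chunking of whole entries up to a token budget. This is a hypothetical illustration, not code from the repo; it uses whitespace word counts as a stand-in for a real tokenizer's token counts:

```python
def chunk_entries(entries, max_tokens=4096):
    """Greedily pack whole entries into chunks that fit a token budget,
    so the model sees as much related text together as possible.
    Note: a single entry longer than max_tokens still becomes its own chunk."""
    chunks, current, current_len = [], [], 0
    for entry in entries:
        n = len(entry.split())  # stand-in for a real tokenizer's count
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(entry)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks

entries = ["one two three", "four five", "six seven eight nine"]
print(chunk_entries(entries, max_tokens=5))
# -> ['one two three four five', 'six seven eight nine']
```

With Longformer's 4096-token window the budget is large enough that many journal entries fit in one pass; smaller-context models would force more chunks and lose cross-entry context.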
@lefnire lefnire added help wanted Extra attention is needed discussion Questions, feedback, discussion 🤖AI All the ML issues (NLP, XGB, etc) labels Oct 1, 2020
@lefnire lefnire moved this to Beta in Gnothi Nov 6, 2022
@lefnire lefnire added this to Gnothi Nov 6, 2022
@lefnire lefnire closed this as completed Jun 24, 2023
@github-project-automation github-project-automation bot moved this from Next to Done in Gnothi Jun 24, 2023