Build new priority score for embeddings #12458

@mattkrick

Now that we're targeting embeddings again, it's time to come up with a better priority score.

I propose the following factors:

  • user: which tier are they on (enterprise, team, or free), and how heavily have they been hitting our embeddings lately?
  • job type
    • user interaction waiting on result (e.g. search query): 4
    • user interaction not waiting on result (e.g. moving to the discuss phase and wanting to see related discussions): 3
    • update corpus (e.g. add/edit/remove a page): 2
    • historical data (we built a new model or added a new object type): 1
  • job age
  • ad-hoc manual jobs (for future use)

Now we just need coefficients for each of these.

Something like: jobType * 1_000_000 + userRequestType * 100_000 - jobAge - throttlePenalty - failureCountPenalty
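To make the formula concrete, here is a minimal TypeScript sketch that mirrors it term by term. The job-type weights come from the list above; everything else (the `JobType` names, the `PriorityInput` shape, the unit of `jobAge`, and how `userRequestType` is derived from the user's tier) is an assumption for illustration, not something this issue defines.

```typescript
// Job-type weights copied from the list in this issue.
type JobType = 'userWaiting' | 'userNotWaiting' | 'updateCorpus' | 'historical'

const JOB_TYPE_WEIGHT: Record<JobType, number> = {
  userWaiting: 4, // e.g. search query
  userNotWaiting: 3, // e.g. related discussions
  updateCorpus: 2, // e.g. add/edit/remove a page
  historical: 1 // e.g. new model or new object type
}

// Hypothetical input shape; field names and units are assumptions.
interface PriorityInput {
  jobType: JobType
  userRequestType: number // assumed: small integer derived from the user's tier
  jobAge: number // assumed unit: seconds since the job was enqueued
  throttlePenalty: number
  failureCountPenalty: number
}

// Mirrors the proposed formula exactly, signs included.
const getPriorityScore = (input: PriorityInput): number => {
  return (
    JOB_TYPE_WEIGHT[input.jobType] * 1_000_000 +
    input.userRequestType * 100_000 -
    input.jobAge -
    input.throttlePenalty -
    input.failureCountPenalty
  )
}

// Example: a search query from a user with userRequestType 2, enqueued 30 seconds ago.
const example = getPriorityScore({
  jobType: 'userWaiting',
  userRequestType: 2,
  jobAge: 30,
  throttlePenalty: 0,
  failureCountPenalty: 0
})
console.log(example) // 4199970
```

With these coefficients, job type decides the ordering only while `userRequestType * 100_000` and the subtracted terms stay well under 1_000_000, so whatever ranges we pick for the penalties need to respect that.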

By adding the failureCount penalty, we could do away with the two-step query that fetches queued jobs first and then failed jobs.
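A rough sketch of that idea: once failures are folded into the score, a single ordering covers both fresh and previously failed jobs. The `QueuedJob` shape, the `FAILURE_PENALTY_PER_RETRY` value, and the convention that a higher score is picked first are all assumptions here, and a real version would presumably be an `ORDER BY` in the queue query rather than an in-memory sort.

```typescript
// Hypothetical job shape; not the actual queue schema.
interface QueuedJob {
  id: number
  baseScore: number // score before the failure penalty is applied
  failureCount: number
}

// Hypothetical penalty per failed attempt.
const FAILURE_PENALTY_PER_RETRY = 50_000

// One ordering for all jobs: failed jobs sink in the list instead of
// being fetched by a separate follow-up query.
const pickNextJobs = (jobs: QueuedJob[], limit: number): QueuedJob[] => {
  const score = (job: QueuedJob) =>
    job.baseScore - job.failureCount * FAILURE_PENALTY_PER_RETRY
  // Assumes a higher score is processed first; copy the array to avoid mutating the input.
  return [...jobs].sort((a, b) => score(b) - score(a)).slice(0, limit)
}

const next = pickNextJobs(
  [
    {id: 1, baseScore: 2_000_000, failureCount: 0},
    {id: 2, baseScore: 2_000_000, failureCount: 2},
    {id: 3, baseScore: 4_000_000, failureCount: 1}
  ],
  2
)
console.log(next.map((job) => job.id)) // [3, 1]
```

Job 3 still wins despite one failure because its job type outweighs the penalty, while job 2 drops behind the otherwise identical job 1.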

If we can speed up the job queue to the point where we can poll more frequently, we won't have to get Redis involved, so I'll try for that.
