Skip to content

Conversation

@driusan
Copy link
Collaborator

@driusan driusan commented Sep 23, 2024

BulkLoadInstanceData currently returns an array of instruments for each candidate. This is inefficient because every instrument needs to be loaded into memory, when the results are only ever used in a foreach loop. This replaces the array_map with a generator that yields the instrument, and uses an unbuffered database connection.

This uses significantly less memory for large datasets as now only one CommentID needs to be loaded into memory at a time.

@driusan
Copy link
Collaborator Author

driusan commented Sep 23, 2024

Blocked by #9344 (as it uses getNewDatabaseConnection)

*
* @param $commentIDs string[] A list of commentIDs to load in bulk.
*
* @return \NDB_BVL_Instrument[]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type shouldn't have to change but phan isn't great at understanding generators and doesn't recognize them as sometype[] compatible.

@driusan driusan added the State: Blocked PR or issue awaiting an external event such as the merge or another PR to proceed label Sep 23, 2024
@driusan driusan removed the State: Blocked PR or issue awaiting an external event such as the merge or another PR to proceed label Sep 24, 2024
BulkLoadInstanceData currently returns an array of instruments
for each candidate. This is inefficient because every instrument
needs to be loaded into memory, when the results are only ever
used in a foreach loop. This replaces the array_map with a generator
that yields the instrument, and uses an unbuffered database connection.

This uses significantly less memory for large datasets as now only one
CommentID needs to be loaded into memory at a time.
@kongtiaowang kongtiaowang added the Passed manual tests PR has been successfully tested by at least one peer label Sep 25, 2024
Copy link
Contributor

@kongtiaowang kongtiaowang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@driusan driusan merged commit 9951a20 into aces:main Sep 25, 2024
ZhichGaming pushed a commit to ZhichGaming/Loris that referenced this pull request Nov 25, 2024
BulkLoadInstanceData currently returns an array of instruments for each
candidate. This is inefficient because every instrument needs to be
loaded into memory, when the results are only ever used in a foreach
loop. This replaces the array_map with a generator that yields the
instrument, and uses an unbuffered database connection.

This uses significantly less memory for large datasets as now only one
CommentID needs to be loaded into memory at a time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Passed manual tests PR has been successfully tested by at least one peer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants