Greetings,
Love the application and UX!
I noticed llama.cpp running on my M1 was flushing the model from memory during and after each generation, causing slower-than-expected outputs.
This can be fixed by passing the `--mlock` argument, which massively boosts Mac M1 performance by locking the model in memory.
LlamaChat currently has the same issue, and I believe it can be fixed by passing that same `--mlock` option. In fact, I suggest leaving it ON by default for a seamless beginner experience on M1 Macs.
Please also consider an advanced setting that lets users change these parameters.
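For background on why `--mlock` helps: it pins the model's pages in physical RAM so the OS can't evict them between generations. A minimal Python sketch of the underlying POSIX `mlock(2)` call — the buffer and names below are illustrative stand-ins, not llama.cpp's or LlamaChat's actual code:

```python
# Sketch of what llama.cpp's --mlock flag does under the hood: pin pages
# in physical RAM via POSIX mlock(2) so the kernel can't page them out.
import ctypes

libc = ctypes.CDLL(None, use_errno=True)  # libc symbols on Unix

def lock_buffer(buf: bytearray) -> bool:
    """Pin buf's pages in RAM; returns True on success."""
    addr = (ctypes.c_char * len(buf)).from_buffer(buf)
    if libc.mlock(addr, len(buf)) != 0:
        # Commonly fails with ENOMEM/EPERM when RLIMIT_MEMLOCK is small --
        # one reason an app might want this off by default.
        return False
    libc.munlock(addr, len(buf))
    return True

weights = bytearray(16 * 1024)  # 16 KiB stand-in for model weights
print("locked" if lock_buffer(weights) else "mlock failed (resource limit?)")
```

Note that `mlock` is subject to `RLIMIT_MEMLOCK`, which is why locking a multi-gigabyte model can fail on some configurations even when it succeeds for small buffers.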
Thanks @xISSAx. You're right, LlamaChat always sets the mlock parameter to false, since mmap loading was touted as a big performance improvement over the previous versions (which for large models I think is true).
I need to do some more investigation into this, but I was definitely thinking of adding a switch for it. Perhaps you're right that it should be enabled by default for a good FTUE, but configurable if people need it.
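On the mmap side of the tradeoff: mmap'd weights are paged in lazily and remain evictable, which is exactly what mlock counteracts. A small self-contained sketch of the two loading strategies — the temp file here is a stand-in for a model file, not the real loader:

```python
import mmap
import os
import tempfile

# Create a small stand-in "model file".
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 65536)
os.close(fd)

# Option A: full load -- the entire file is copied into process
# memory up front, so generation never waits on disk.
with open(path, "rb") as f:
    full = f.read()

# Option B: mmap -- pages are faulted in only when touched, and the
# kernel may evict them later; llama.cpp pairs this with mlock to pin
# them when the user asks for it.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = mm[0]  # touching a byte faults its page in
    mm.close()

os.remove(path)
print(len(full), first)
```

The switch being discussed is essentially a choice between these two strategies, plus whether to pin the result.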
alexrozanski changed the title from "Mac M1 Memory Flush - Llama cpp" to "Support configuring whether to load the entire model into memory or use mmap" on Apr 17, 2023