Ollama is the easiest way to run large language models on your computer, whether it's a Mac, Windows, or Linux machine. And Hugging Face hosts the largest collection of models available on the Internet. It has always been easy to grab a GGUF file, or models in many other formats, from Hugging Face and import them into Ollama, but now Hugging Face has made it even easier to get GGUF models hosted there installed and up and running on Ollama.
Let's take a look at how to do it. Find a GGUF page on Hugging Face. Maybe you found a link in Discord or on a webpage for a model you want to try. Make sure the model is in GGUF format or none of this will work. Here is one from Arcee AI called SuperNova Medius. Arcee is a company working on some neat ideas, and you can find out more about them at arcee.ai. Come up to the top of the page and click the copy button next to the name. Now, in a terminal, run `ollama run hf.co/` and paste in the Hugging Face repo name. Then press enter.
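For this model, the full command looks like this, with the repo name exactly as copied from the page:

```
ollama run hf.co/arcee-ai/SuperNova-Medius-GGUF
```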
Now if the model creator has done the right thing and populated the GGUF metadata for tokenizer.chat_template, or created a template file in the repository using a Go template, then you should be all set. In fact, finding the right template used to be the hardest part of creating a model.
Now it's possible, and more common than I'd like, that the model had one template, then was fine-tuned on a new template that was never saved back to the model, or that the template never got saved to tokenizer.chat_template at all. There is no requirement for it to be there. So if you get all sorts of weird output, you will need to create a modelfile that corrects the template. But let's hope you don't hit that case.
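If you do hit it, the corrective modelfile can be small. Here is a minimal sketch, assuming the model expects a ChatML-style prompt; swap in whatever template the model was actually trained on:

```
FROM hf.co/arcee-ai/SuperNova-Medius-GGUF
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}"""
```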
So that's pretty cool. If the repo has multiple quantizations, then in most cases you can just add a colon to the end of the name and type in the standard quantization label.
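For example, assuming the repo publishes a Q4_K_M file (check the repo's file list to see which quantization labels actually exist):

```
ollama run hf.co/arcee-ai/SuperNova-Medius-GGUF:Q4_K_M
```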
And you can use this model name that starts with hf.co or huggingface.co anywhere you used model names before. Want to remove that new model? Run `ollama rm hf.co/` plus the repo name we pasted in before. Want to create a new model that uses SuperNova but adds a new system prompt or temperature? Create a modelfile, and for your FROM line, specify hf.co/arcee-ai/SuperNova-Medius-GGUF. Then add your system prompt and any other details you want to change. Now run `ollama create sn -f supernova.modelfile`, or whatever you called that modelfile, and you have a new model called sn.
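Put together, that modelfile might look something like this; the system prompt and temperature here are just placeholder values:

```
FROM hf.co/arcee-ai/SuperNova-Medius-GGUF
SYSTEM "You are a helpful assistant that keeps answers short and direct."
PARAMETER temperature 0.3
```

After the create step, `ollama run sn` works just like any other local model.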
And there is nothing special about the Ollama CLI here. Any UI that uses the Ollama API should work as is. I just tried it in PageAssist, a cool UI that runs as a Chrome extension, and it worked perfectly well. All the UI devs just got a great new feature yesterday without lifting a finger.
So what is different about these models? Nothing really. The files still have to be downloaded and they still run locally. But if we take a look at the filesystem we can see one change. I'll go into the .ollama directory in my home folder, then models, then manifests, and you can see I have a new registry. If you used hf.co in your model name, you will see hf.co here; if you used huggingface.co instead, you will see that here. Go into the directory and you will see arcee-ai, then supernova-medius-gguf, and in there is latest. latest is the manifest that describes each of the layers of the model. If you used a different quantization, that label will be the name of the file instead of latest. You can see there are no parameters defined in this model, just the model weights and the template. The filenames are the sha256 strings. You can find those in .ollama/models/blobs.
So if later on you run `ollama run huggingface.co/arcee-ai/SuperNova-Medius-GGUF`, it will download a new manifest stored in the huggingface.co registry folder, but the layers will have the same sha256 values, so nothing else will need to be downloaded, unless of course there is an update to the model weights.
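Here is a rough sketch of that layout; the digests are shortened placeholders, and the real hashes will differ on your machine:

```
~/.ollama/models/
  manifests/
    hf.co/arcee-ai/supernova-medius-gguf/latest
    huggingface.co/arcee-ai/supernova-medius-gguf/latest
  blobs/
    sha256-3a4d...   # model weights, shared by both manifests
    sha256-9f2c...   # template
```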
There is one circumstance that will not work, though. If the model authors on Hugging Face require some sort of login, or accepting a license, or it's a private model, then you won't be able to use this new process and will instead have to follow the older process.
The challenge there is that you have to find the template to use. Once you do, you can download the GGUF file, point the FROM directive at that file, then add the template to the modelfile. Hugging Face seems to have come up with a great way to convert Jinja2 templates into the format Ollama uses. Hopefully that will find its way into Ollama as well, so we don't have to specify a template when it is defined in the model weights file. But looking around, many models don't have the template defined in the model anyway.
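That older process looks roughly like this; the filename and model name are placeholders for whichever GGUF file you downloaded, and the template is the same ChatML-style sketch from earlier:

```
FROM ./SuperNova-Medius-Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}"""
```

Then `ollama create supernova-local -f supernova-local.modelfile`, and it runs like any other model.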
So how do we see if the template is defined in the weights file? Let's take a look at SuperNova. Next to every GGUF file is a little button with two g's. Click it, scroll down, and you will find tokenizer.chat_template. That is the template, but it's probably going to be in Jinja2 format, whereas Ollama uses Go templates. It's a different format. Optionally, the maker of the model can add a template file with the template in Go template format, and it will get added to the model. They can also add a file called system, which sets the system prompt, and one called params, which sets the parameters.
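For reference, a typical tokenizer.chat_template value looks something like this generic ChatML-style Jinja2 template; the exact template varies from model to model:

```
{% for message in messages %}<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
{% endif %}
```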
I am assuming that, to make this work, Hugging Face has created a new Ollama-compatible registry that serves the files hosted on Hugging Face. It's a really cool implementation and I am super excited by it. A huge number of issues in the Ollama GitHub repo are from folks asking for a model to be added to the Ollama registry, and this probably solves a lot of them right away.
What do you think? Is there a model you are looking forward to playing with using this new process? Share it with me in the comments below. Thanks so much for watching. Goodbye.