Ollama is a convenient tool for demoing large language models, but sometimes we need to change its configuration (for example, how long a model stays resident on the GPU). First, run:
ollama serve -h
This lists the environment variables that can be set:
Environment Variables:
OLLAMA_DEBUG Show additional debug information (e.g. OLLAMA_DEBUG=1)
OLLAMA_HOST IP Address for the ollama server (default 127.0.0.1:11434)
OLLAMA_KEEP_ALIVE The duration that models stay loaded in memory (default "5m")
OLLAMA_MAX_LOADED_MODELS Maximum number of loaded models (default 1)
OLLAMA_MAX_QUEUE Maximum number of queued requests
OLLAMA_MODELS The path to the models directory
OLLAMA_NUM_PARALLEL Maximum number of parallel requests (default 1)
OLLAMA_NOPRUNE Do not prune model blobs on startup
OLLAMA_ORIGINS A comma separated list of allowed origins
OLLAMA_TMPDIR Location for temporary files
To change the residency time, modify OLLAMA_KEEP_ALIVE. What units does this environment variable accept? See this page: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately
As the FAQ explains, the value can be specified in several ways:
a duration string (such as "10m" or "24h")
a number in seconds (such as 3600)
any negative number which will keep the model loaded in memory (e.g. -1 or "-1m")
'0' which will unload the model immediately after generating a response
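On Linux or macOS these variables can simply be exported in the environment of the server process before it starts; a minimal sketch (the one-hour residency and the parallelism of 2 are just example values):

```shell
# Keep models loaded for one hour and allow two parallel requests.
# The variables must be visible to the `ollama serve` process itself.
export OLLAMA_KEEP_ALIVE=1h
export OLLAMA_NUM_PARALLEL=2
ollama serve
```

If Ollama runs as a systemd service, the same variables go into the service unit as `Environment=` lines instead of a shell export.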
For example, in the Windows environment variable settings we can change OLLAMA_KEEP_ALIVE to 1h and OLLAMA_NUM_PARALLEL to 2; the server will then accept two concurrent requests and keep models resident for one hour (ollama ps will show the remaining time, e.g. "59 minutes"). That is all for this short note.
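The FAQ linked above also documents a per-request override: the generate and chat API endpoints accept a keep_alive field, which takes precedence over the environment variable for that model. A sketch, assuming a local server on the default port and a model named llama3 that has already been pulled:

```shell
# Keep this model loaded indefinitely (-1), overriding OLLAMA_KEEP_ALIVE.
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'

# Unload the model immediately after this request completes.
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'
```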
One more note: I found that on Windows these environment variables only actually take effect after restarting the system.
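Instead of the GUI dialog, the variables can also be persisted from a Windows terminal with the built-in setx command; a sketch (setx writes user-level variables, so a new terminal session — and, per the note above, possibly a reboot — is needed before ollama serve picks them up):

```shell
:: Persist the variables for the current user (cmd.exe syntax).
setx OLLAMA_KEEP_ALIVE 1h
setx OLLAMA_NUM_PARALLEL 2
```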
Article link: https://hqyman.cn/post/9542.html