Package llama.cpp: Information
Source package: llama.cpp
Version: 5753-alt1
Build time: Aug 16, 2025, 11:56 PM
Category: Sciences/Computer science
Home page: https://github.com/ggerganov/llama.cpp
License: MIT
Summary: LLM inference in C/C++
Description:
Plain C/C++ implementation of inference for many LLM models, without dependencies.

Features:
- AVX, AVX2, AVX512, and AMX support for x86 architectures.
- Mixed F16/F32 precision.
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use.
- CPU, GPU, and hybrid CPU+GPU inference.

Supported models: LLaMA models, Mistral 7B, Mixtral MoE, Falcon, Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2, Vigogne (French), Koala, Baichuan 1 & 2 + derivations, Aquila 1 & 2, Starcoder models, Refact, Persimmon 8B, MPT, Bloom, Yi models, StableLM models, Deepseek models, Qwen models, PLaMo-13B, Phi models, GPT-2, Orion 14B, InternLM2, CodeShell, Gemma, Mamba, Grok-1, Xverse, Command-R models, SEA-LION, GritLM-7B + GritLM-8x7B, OLMo, GPT-NeoX + Pythia, Snowflake-Arctic MoE, Smaug, Poro 34B, Bitnet b1.58 models, Flan T5, Open Elm models, ChatGLM3-6b + ChatGLM4-9b + GLMEdge-1.5b + GLMEdge-4b, SmolLM, EXAONE-3.0-7.8B-Instruct, FalconMamba models, Jais, Bielik-11B-v2.3, RWKV-6, QRWKV-6, GigaChat-20B-A3B, Trillion-7B-preview, Ling models.

Multimodal models: LLaVA 1.5 models, BakLLaVA, Obsidian, ShareGPT4V, MobileVLM 1.7B/3B models, Yi-VL, Mini CPM, Moondream, Bunny, GLM-EDGE, Qwen2-VL.

NOTE 1: For the data-format conversion scripts to work, you will need to run:
  pip3 install -r /usr/share/llama.cpp/requirements.txt

NOTE 2: MODELS ARE NOT PROVIDED. You'll need to download them from the original sites (or the Hugging Face Hub).

Overall this is all raw and EXPERIMENTAL; no warranty, no support.
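The notes above can be tied together into a typical workflow: install the conversion-script dependencies, convert downloaded Hugging Face weights to GGUF, quantize, then run inference. This is a hedged sketch that only prints the commands it would run; the model directory and file names are placeholders, and `convert_hf_to_gguf.py`, `llama-quantize`, and `llama-cli` are the upstream llama.cpp tool names, which may be packaged under different paths on your system.

```shell
# Sketch of the usual model-preparation pipeline; paths are placeholders.
MODEL_DIR=./my-model              # directory with downloaded HF weights (placeholder)
GGUF=./my-model-f16.gguf          # full-precision GGUF output (placeholder)
QUANT=./my-model-q4_k_m.gguf      # quantized GGUF output (placeholder)

# 1. Install conversion-script dependencies (NOTE 1 above).
echo "pip3 install -r /usr/share/llama.cpp/requirements.txt"

# 2. Convert HF weights to GGUF with the upstream conversion script.
echo "python3 convert_hf_to_gguf.py $MODEL_DIR --outfile $GGUF"

# 3. Quantize to 4-bit (Q4_K_M) for reduced memory use.
echo "llama-quantize $GGUF $QUANT Q4_K_M"

# 4. Run CPU inference on the quantized model.
echo "llama-cli -m $QUANT -p 'Hello' -n 32"
```

Replace the `echo`s with the real commands once the weights are downloaded; remember that models themselves are not provided by this package (NOTE 2).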
List of RPM packages built from this SRPM:
libllama (e2kv6, e2kv5, e2kv4, e2k)
libllama-debuginfo (e2kv6, e2kv5, e2kv4, e2k)
libllama-devel (e2kv6, e2kv5, e2kv4, e2k)
llama.cpp (e2kv6, e2kv5, e2kv4, e2k)
llama.cpp-cpu (e2kv6, e2kv5, e2kv4, e2k)
llama.cpp-cpu-debuginfo (e2kv6, e2kv5, e2kv4, e2k)
llama.cpp-vulkan (e2kv6, e2kv5, e2kv4, e2k)
llama.cpp-vulkan-debuginfo (e2kv6, e2kv5, e2kv4, e2k)
Maintainer: Vitaly Chikunov
Last changed
June 25, 2025 Vitaly Chikunov 1:5753-alt1
- Update to b5753 (2025-06-24).
- Install an experimental RPC backend and server. The RPC code is a proof-of-concept, fragile, and insecure.
May 10, 2025 Vitaly Chikunov 1:5332-alt1
- Update to b5332 (2025-05-09), with vision support in llama-server.
- Enable Vulkan backend (for GPU) in the llama.cpp-vulkan package.
March 10, 2025 Vitaly Chikunov 1:4855-alt1
- Update to b4855 (2025-03-07).
- Enable CUDA backend (for NVIDIA GPUs) in the llama.cpp-cuda package.
- Disable BLAS backend (issues/12282).
- Install bash completions.