Package llama.cpp: Information

    Source package: llama.cpp
    Version: 5753-alt1
    Build time: Aug 16, 2025, 11:56 PM
    License: MIT
    Summary: LLM inference in C/C++
    Description: 
    Plain C/C++ implementation of inference for many LLM models, without
    dependencies. AVX, AVX2, AVX512, and AMX support for x86 architectures.
    Mixed F16/F32 precision. 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and
    8-bit integer quantization for faster inference and reduced memory use.
    Supports CPU, GPU, and hybrid CPU+GPU inference.
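    A typical workflow with the quantized formats might look like this (a
    sketch: the llama-quantize and llama-cli tools ship with this package;
    the model filenames are only placeholders):

      # Convert an F16 GGUF model to 4-bit quantization (Q4_K_M is a
      # common quality/size trade-off).
      llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

      # Run inference on CPU with the quantized model.
      llama-cli -m model-q4_k_m.gguf -p "Hello" -n 64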
    
    Supported models:
    
       LLaMA models, Mistral 7B, Mixtral MoE, Falcon, Chinese LLaMA /
       Alpaca and Chinese LLaMA-2 / Alpaca-2, Vigogne (French), Koala,
       Baichuan 1 & 2 + derivations, Aquila 1 & 2, Starcoder models, Refact,
       Persimmon 8B, MPT, Bloom, Yi models, StableLM models, Deepseek models,
       Qwen models, PLaMo-13B, Phi models, GPT-2, Orion 14B, InternLM2,
       CodeShell, Gemma, Mamba, Grok-1, Xverse, Command-R models, SEA-LION,
       GritLM-7B + GritLM-8x7B, OLMo, GPT-NeoX + Pythia, Snowflake-Arctic
       MoE, Smaug, Poro 34B, Bitnet b1.58 models, Flan T5, Open Elm models,
       ChatGLM3-6b + ChatGLM4-9b + GLMEdge-1.5b + GLMEdge-4b, SmolLM,
       EXAONE-3.0-7.8B-Instruct, FalconMamba Models, Jais, Bielik-11B-v2.3,
       RWKV-6, QRWKV-6, GigaChat-20B-A3B, Trillion-7B-preview, Ling models
    
    Multimodal models:
    
       LLaVA 1.5 models, BakLLaVA, Obsidian, ShareGPT4V, MobileVLM 1.7B/3B
       models, Yi-VL, Mini CPM, Moondream, Bunny, GLM-EDGE, Qwen2-VL
    
    NOTE 1: For the data format conversion scripts to work, you will need to:
    
      pip3 install -r /usr/share/llama.cpp/requirements.txt
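    After installing the dependencies, a conversion run might look like
    this (a sketch: convert_hf_to_gguf.py is the upstream llama.cpp
    conversion script; its installed location and the model directory
    below are assumptions):

      # Convert a downloaded Hugging Face model directory to a GGUF file
      # (script path may differ on your system).
      python3 /usr/share/llama.cpp/convert_hf_to_gguf.py ./my-model-dir \
          --outfile my-model-f16.gguf --outtype f16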
    
    NOTE 2:
      MODELS ARE NOT PROVIDED. You'll need to download them from the original
      sites (or Hugging Face Hub).
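    One common way to fetch a model is the huggingface-cli tool from the
    huggingface_hub Python package (a sketch: the repository and file
    names below are only placeholders):

      pip3 install huggingface_hub

      # Download a single GGUF file from a Hugging Face repository.
      huggingface-cli download some-org/some-model-GGUF \
          some-model.Q4_K_M.gguf --local-dir ./models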
    
    Overall, this is all raw and EXPERIMENTAL: no warranty, no support.

    List of RPM packages built from this SRPM:
    libllama (e2kv6, e2kv5, e2kv4, e2k)
    libllama-debuginfo (e2kv6, e2kv5, e2kv4, e2k)
    libllama-devel (e2kv6, e2kv5, e2kv4, e2k)
    llama.cpp (e2kv6, e2kv5, e2kv4, e2k)
    llama.cpp-cpu (e2kv6, e2kv5, e2kv4, e2k)
    llama.cpp-cpu-debuginfo (e2kv6, e2kv5, e2kv4, e2k)
    llama.cpp-vulkan (e2kv6, e2kv5, e2kv4, e2k)
    llama.cpp-vulkan-debuginfo (e2kv6, e2kv5, e2kv4, e2k)

    Maintainer: Vitaly Chikunov

    List of contributors:
    Vitaly Chikunov

    Build dependencies:

      1. gcc-c++
      2. gcc12-c++
      3. glslc
      4. cmake
      5. ctest
      6. libcurl-devel
      7. libgomp-devel
      8. rpm-macros-cmake
      9. libstdc++-devel-static
      10. libvulkan-devel
      11. nvidia-cuda-devel-static
      12. tinyllamas-gguf

    Last changed


    June 25, 2025 Vitaly Chikunov 1:5753-alt1
    - Update to b5753 (2025-06-24).
    - Install an experimental rpc backend and server. The rpc code is a
      proof-of-concept, fragile, and insecure.
    May 10, 2025 Vitaly Chikunov 1:5332-alt1
    - Update to b5332 (2025-05-09), with vision support in llama-server.
    - Enable Vulkan backend (for GPU) in llama.cpp-vulkan package.
    March 10, 2025 Vitaly Chikunov 1:4855-alt1
    - Update to b4855 (2025-03-07).
    - Enable CUDA backend (for NVIDIA GPU) in llama.cpp-cuda package.
    - Disable BLAS backend (issues/12282).
    - Install bash-completions.