llama.cpp

llama.cpp is an LLM inference framework written in C/C++. Its goal is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, both locally and in the cloud.

InternLM2, InternLM2.5 and InternLM3 can be deployed with llama.cpp by following the instructions below:

Refer to this guide to build llama.cpp from source.
Convert the InternLM model to a GGUF model and run it according to the guide.
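
The two steps above can be sketched as a small Python helper that assembles the commands without running them. This is a minimal sketch under assumptions, not the official procedure: the function name, the local directory names and the output file name are hypothetical, and the script and binary names (`convert_hf_to_gguf.py`, `build/bin/llama-cli`) reflect recent llama.cpp checkouts and may differ in your version, so check them against the guide.

```python
import shlex
from pathlib import Path


def internlm_llamacpp_commands(llama_cpp_dir, hf_model_dir, gguf_path, prompt="Hello"):
    """Return the build, convert and run commands as argument lists.

    Nothing is executed here; all paths are placeholders (assumptions).
    """
    repo = Path(llama_cpp_dir)
    build_dir = repo / "build"
    return [
        # 1. Configure and build llama.cpp from source with CMake.
        ["cmake", "-S", str(repo), "-B", str(build_dir)],
        ["cmake", "--build", str(build_dir), "--config", "Release"],
        # 2. Convert the Hugging Face InternLM checkpoint to a GGUF file.
        ["python", str(repo / "convert_hf_to_gguf.py"), str(hf_model_dir),
         "--outfile", str(gguf_path), "--outtype", "f16"],
        # 3. Run the converted model, generating up to 128 tokens.
        [str(build_dir / "bin" / "llama-cli"),
         "-m", str(gguf_path), "-p", prompt, "-n", "128"],
    ]


if __name__ == "__main__":
    # Hypothetical local paths: a llama.cpp checkout and a downloaded model directory.
    for cmd in internlm_llamacpp_commands("llama.cpp", "internlm3-8b-instruct", "internlm3.gguf"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything; to execute them, each list could be passed to `subprocess.run(cmd, check=True)`.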