llama.cpp
llama.cpp is an LLM inference framework written in C/C++. Its goal is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, both locally and in the cloud.
InternLM2, InternLM2.5, and InternLM3 can be deployed with llama.cpp by following the instructions below: