====== Backends ======

By itself, an [[models|LLM]] is just a block of data. The software required to perform inference with that data is typically called a "backend". Sometimes the [[interfaces|interface]] includes a backend (e.g. [[interfaces:LocalTavern]]); other times it is strictly a "frontend" that requires a separate backend to perform inference (e.g. [[interfaces:SillyTavern]]). Finally, most [[interfaces]] can be connected to 3rd-party APIs, which provide backend capability for you. This can be incredibly useful when you want to run models that require more hardware than you have locally available.

===== Local Backends =====

These backends run locally. As a result, their capabilities are directly tied to the quality of the hardware you have available. Additionally, each of these requires a separate [[models|model]] to operate.

==== Inference Engines ====

Each of the following performs inference without needing additional software. User-friendliness is not the first priority for these utilities.

^ Name ^ Notes ^
| **Oft-Recommended Engines** ||
| [[https://github.com/ggml-org/llama.cpp|llama.cpp]] | Reference backend. Invented the GGUF format. |
| [[https://github.com/LostRuins/koboldcpp|koboldcpp]] | Based on llama.cpp with an RP focus. |
| **Other Engines** ||
| [[https://github.com/turboderp-org/exllamav3|ExLlamaV3]] | Created the exl3 format; focused on GPU performance. |
| [[https://github.com/ikawrakow/ik_llama.cpp|ik_llama]] | Improved CPU performance. |

==== Engine Managers ====

These tools provide a user-friendly layer that handles backend needs and engine management simultaneously. If you're not sure what to pick, this is a good place to start.

^ Name ^ Notes ^
| **Oft-Recommended Managers** ||
| [[https://github.com/oobabooga/text-generation-webui|text-generation-webui (Oobabooga)]] | Offers all other engines here and more. |
| [[https://github.com/LostRuins/koboldcpp|koboldcpp]] | Good UI, koboldcpp engine only. |
| **Other Managers** ||
| [[https://localai.io/|LocalAI]] | Provides an OpenAI-compatible API. |
| [[https://ollama.com/|ollama]] | Wraps llama.cpp. |
| [[https://github.com/theroyallab/tabbyAPI|tabbyAPI]] | Official API server for the ExLlama engines. |

===== 3rd-Party API Providers =====

These are essentially remote backends. Everything you send and receive is, at minimum, available to the provider(s). Censorship is often encountered to varying degrees.

^ Name ^ Notes ^
| [[https://aihorde.net/|AI Horde]] | Free, with limited performance and models. |
| [[https://openrouter.ai/|OpenRouter]] | Large model selection. Low(er) cost. |
| [[https://mancer.tech/|mancer]] | Low/no censorship. Free tier available. |
| [[https://novelai.net/|NovelAI]] | Low/no censorship. |
| [[https://pollinations.ai/|Pollinations]] | Free tier available, with ads. |

Additionally, most commercial APIs can be used, such as ChatGPT, Claude, Perplexity, etc.

===== Additional Resources =====

SillyTavern's page on [[https://docs.sillytavern.app/usage/api-connections/|API Connections]].
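
Many of the local backends and 3rd-party providers above expose an OpenAI-compatible API, which is what most frontends use to connect to them. A minimal sketch of what such a chat-completion request looks like, in Python; the base URL and model name are placeholders you would replace with your own backend's address and loaded model:

```python
import json

# Placeholder: substitute the address of your local backend
# (LocalAI, a llama.cpp server, etc.) or a 3rd-party provider,
# plus an API key header if the provider requires one.
BASE_URL = "http://localhost:8080/v1"

payload = {
    "model": "local-model",  # placeholder; some local backends ignore this
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 128,    # cap on the number of generated tokens
    "temperature": 0.7,   # sampling randomness
}

# POST this as JSON to f"{BASE_URL}/chat/completions" with any HTTP client;
# the generated text comes back in choices[0].message.content.
body = json.dumps(payload)
print(body)
```

The same request shape works against nearly every backend and provider listed on this page, which is why switching between a local engine and a remote API is usually just a matter of changing the URL and key in your frontend's connection settings.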