====== Backends ======

By itself, an [[models|LLM]] is just a block of data. The software required to perform inference with that data is typically called a "backend". Sometimes the [[interfaces|interface]] includes a backend (e.g. [[interfaces:LocalTavern]]); other times it is strictly a "frontend" that requires a separate backend to perform inference (e.g. [[interfaces:SillyTavern]]). Finally, most [[interfaces]] can be connected to 3rd-party APIs, which provide backend capability for you. This can be incredibly useful when you want to run models that require more hardware than you have locally available.

===== Local Backends =====

These backends run locally. As a result, their capabilities are directly tied to the quality of the hardware you have available. Additionally, each of these requires a separate [[models|model]] to operate.

==== Inference Engines ====

Each of the following performs inference without needing additional software. User-friendliness is not the first priority for these utilities.

^ Name ^ Notes ^
| **Oft-Recommended Engines** ||
| [[https://github.com/ggml-org/llama.cpp|llama.cpp]] | Reference backend. Invented the GGUF format. |
| [[https://github.com/LostRuins/koboldcpp|koboldcpp]] | Based on llama.cpp with an RP focus. |
| **Other Engines** ||
| [[https://github.com/turboderp-org/exllamav3|ExLlamaV3]] | Created the exl3 format; focused on GPU performance. |
| [[https://github.com/ikawrakow/ik_llama.cpp|ik_llama]] | Improved CPU performance. |

==== Engine Managers ====

These tools provide a user-friendly layer that handles backend needs and engine management simultaneously. If you're not sure what to pick, this is a good place to start.

^ Name ^ Notes ^
| **Oft-Recommended Managers** ||
| [[https://github.com/oobabooga/text-generation-webui|text-generation-webui (Oobabooga)]] | Offers all other engines here and more. |
| [[https://github.com/LostRuins/koboldcpp|koboldcpp]] | Good UI, koboldcpp engine only. |
| **Other Managers** ||
| [[https://localai.io/|LocalAI]] | Provides an OpenAI-compatible API. |
| [[https://ollama.com/|ollama]] | Wraps llama.cpp. |
| [[https://github.com/theroyallab/tabbyAPI|tabbyAPI]] | Official API server for the ExLlama engines. |

===== 3rd-Party API Providers =====

These are essentially remote backends. Everything you send and receive is, at minimum, available to the provider(s). Censorship is often encountered to varying degrees.

^ Name ^ Notes ^
| [[https://aihorde.net/|AI Horde]] | Free, with limited performance and models. |
| [[https://openrouter.ai/|OpenRouter]] | Large model selection. Low(er) cost. |
| [[https://mancer.tech/|mancer]] | Low/no censorship. Free tier available. |
| [[https://novelai.net/|NovelAI]] | Low/no censorship. |
| [[https://pollinations.ai/|Pollinations]] | Free tier available, with ads. |

Additionally, most commercial APIs can be used, such as ChatGPT, Claude, Perplexity, etc.

===== Additional Resources =====

SillyTavern's page on [[https://docs.sillytavern.app/usage/api-connections/|API Connections]].
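
Many of the local backends and 3rd-party providers above expose an OpenAI-compatible API, which is what most frontends use to connect to them. A minimal sketch of what such a chat-completion request looks like, in Python; the base URL and model name are placeholders you would replace with your own backend's address and loaded model:

```python
import json

# Placeholder: substitute the address of your local backend
# (LocalAI, a llama.cpp server, etc.) or a 3rd-party provider,
# plus an API key header if the provider requires one.
BASE_URL = "http://localhost:8080/v1"

payload = {
    "model": "local-model",  # placeholder; some local backends ignore this
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 128,    # cap on the number of generated tokens
    "temperature": 0.7,   # sampling randomness
}

# POST this as JSON to f"{BASE_URL}/chat/completions" with any HTTP client;
# the generated text comes back in choices[0].message.content.
body = json.dumps(payload)
print(body)
```

The same request shape works against nearly every backend and provider listed on this page, which is why switching between a local engine and a remote API is usually just a matter of changing the URL and key in your frontend's connection settings.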