Z AI Character Card Wiki

The Last Letter in Personalized Artificial Intelligence

====== Models ======
An "AI" lives in the Model - a large data file loaded by a [[backends|backend]] or [[interfaces|interface]] which is then able to perform inference. There are thousands upon thousands of models in existence, each with their own unique behavior.
  
For a number of reasons, an exhaustive list of models cannot be provided. Instead, this page aims to provide a basic understanding of model types and guidance on selection.

===== Hardware Requirements =====
The most important question in model selection is: can your hardware run it? A rough rule of thumb is that at 4-bit quantization (Q4*), a model needs about 1GB of RAM per billion parameters. Performing inference on the GPU (and therefore in VRAM) is strongly recommended, as inference on the CPU (in system RAM) is far slower.

On mobile devices and other unified-memory architectures, a better estimate is 1.5GB of RAM per billion parameters.

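The rule of thumb above can be expressed in a few lines of Python. This is only a back-of-the-envelope sketch; the function name and structure are illustrative, not taken from any particular tool:

```python
def estimate_ram_gb(params_billions: float, unified_memory: bool = False) -> float:
    """Rough RAM needed for a Q4-quantized model, per the rule of thumb:
    ~1GB per billion parameters on a discrete GPU, ~1.5GB on unified memory."""
    gb_per_billion = 1.5 if unified_memory else 1.0
    return params_billions * gb_per_billion

# Mistral Nemo 12B at Q4 on a discrete GPU:
print(estimate_ram_gb(12))                       # 12.0 (GB)
# The same model on a unified-memory device (phone, Apple silicon):
print(estimate_ram_gb(12, unified_memory=True))  # 18.0 (GB)
```

Note that this covers the model weights only; context (the KV cache) requires additional memory on top of this estimate.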
===== Quantization =====
Roughly explained, models are a collection of numbers (weights). When first created, these are 16- or 32-bit floating point numbers. They can be reduced to fewer bits (commonly 2-8), shrinking the memory footprint and increasing inference speed at the cost of some quality.

Fortunately, these trade-offs are non-linear: 4-bit quantization is considered the "sweet spot", exchanging a minimal loss in model quality for large reductions in size and memory requirements. Currently, Q4_K_M is frequently recommended as an ideal quantization format.

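To make the idea concrete, here is a toy sketch of symmetric 4-bit quantization in Python. Real formats such as Q4_K_M work block-wise with per-block scales and are considerably more sophisticated; this only illustrates the round-off error that quantization introduces:

```python
def quantize_4bit(weights):
    """Map floats onto 4-bit signed integers in [-8, 7] using one shared scale.
    (Toy example; real Q4 formats quantize in blocks, each with its own scale.)"""
    scale = max(abs(w) for w in weights) / 7
    quants = [max(-8, min(7, round(w / scale))) for w in weights]
    return quants, scale

def dequantize(quants, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quants]

weights = [0.12, -0.95, 0.33, 0.07, -0.41]
quants, scale = quantize_4bit(weights)
restored = dequantize(quants, scale)
# `restored` is close to `weights`, but each value can be off by up to half a step
```

Each weight now occupies 4 bits plus a share of the scale factor, rather than 16 or 32 bits, which is where the memory savings come from.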
===== Oft-Recommended Models =====
The most popular recommendation is Mistral Nemo 12B Instruct, with a 4-bit quantization available [[https://huggingface.co/starble-dev/Mistral-Nemo-12B-Instruct-2407-GGUF?show_file_info=Mistral-Nemo-12B-Instruct-2407-Q4_K_M.gguf|here]].

models.1773729615.txt.gz · Last modified: by tys