====== Models ======
For a number of reasons, an exhaustive list of models cannot be provided. Instead, this page aims to give a basic understanding of model types and some orientation for model selection.

===== Hardware Requirements =====
The most important question in model selection is: can your hardware run it? A rough rule of thumb is that, at a 4-bit quantization (Q4*), a model needs about 1GB of memory per billion parameters. It is strongly recommended to run inference on the GPU (and therefore from VRAM), as inference on the CPU (from system RAM) is far slower.

On mobile devices and other unified memory architectures, the CPU and GPU share a single pool of memory, so the model must fit in that shared pool alongside the operating system and other running applications.

===== Quantization =====
Roughly explained, a model is a large collection of numbers (weights). When a model is first created, these are stored as 16- or 32-bit floating-point numbers. They can be reduced to fewer bits (commonly 2-8), which shrinks the memory footprint and increases inference speed at the cost of some output quality.

Fortunately, pre-quantized versions of most popular models are readily available for download, so you rarely need to perform the quantization yourself.

===== Oft-Recommended Models =====
The most popular recommendation is Mistral Nemo 12B Instruct, with a 4-bit quantization available [[https://
models.1773729625.txt.gz · Last modified: by tys
