Tekoäly omalla koneella

embedded · 21.08.2024

En löytänyt tekoälykeskustelua täältä!?
Ajatteko tekoälyä omalla koneella ja miksi tai mitä siitä hyötyy ?
Minua kiinnostaisi sellainen tekoäly jolle voisi antaa materiaalina vaikka 400-sivuisen pdf:n / tekstitiedoston tai vaikka .epubin jonka se sitten lukisi pysyvään muistiin, ei vain nykyiseen sessioon.
Tämän jälkeen tekstiin liittyen voisi kysyä kysymyksiä.

Tiedättekö onko tällaista jo ? omalla koneella ajettavaa, ei mitään kk-maksullista pilvipalvelua. Koneesta löytyy kyllä muistia ram+gpu ja kaupastahan sitä saa lisää

Tuomass · 21.08.2024

Itsellä käytössä Ollama (Llama3) sekä Fabric AI – Copilot for all your apps, clouds and files. dokkareille. Nuo ei laskentatehoa hirveästi vaadi, joten suositten erillistä serveriä, pyörii vaikka RasPilla

Tinke-80 · 21.08.2024

LM Studio - Local AI on your computer

Run local AI models like gpt-oss, Llama, Gemma, Qwen, and DeepSeek privately on your computer.

lmstudio.ai

Vahva suositus.

totallynotrobot · 21.08.2024

Tämä kannattaa myös tsekata, jos löytyy sopiva kone: NVIDIA ChatRTX mitään suurempaa kokemusta ei itsellä tuosta ole, mutta vähän olen kokeillut ja on mm. juuri kuvailemaasi tarkoitukseen sopiva.

Jumi · 22.08.2024

Liippaa sen verta läheltä aihetta, että mainostan PC:llä paikallisesti toimivaa OpenAI Whisper neuroverkkoa. Subtitle Edit käyttää sitä ja tekee (litteroi ja kääntää) kätevästi tekstitykset leffaan ku leffaan. Isoin vika on, että japanista kääntäessä sinä ja minä on aina väärinpäin, mutta en valita. Toimii tarpeeksi hyvin. Ukraina ja Venäjä taipuu kuulemma myös hyvin.

PTohtori · 24.08.2024

totallynotrobot sanoi:
Tämä kannattaa myös tsekata, jos löytyy sopiva kone: NVIDIA ChatRTX mitään suurempaa kokemusta ei itsellä tuosta ole, mutta vähän olen kokeillut ja on mm. juuri kuvailemaasi tarkoitukseen sopiva.

Itsellä myös on käytössä tämä ja asennuksen sekä käytön helppous on juuri omiin pikku testailuihin hyvä. Toki täytyy ottaa huomioon että Suomi ei oikein taivu, mutta englanninkielistä aineistoa ja prompteja tämä hallitsee mielestäni yllättävän hyvin. Jos löytyy valmiina RTX30-sarjalainen tai parempi 8gigan muistilla niin eikun kokeilemaan!

takomo · 14.12.2025

Olipa lähellä uuden ketjun perustaminen, mutta olihan täällä tällainen viriili ketju...
Millaisia kokemuksia on tekoälymallien ajamisesta paikallisesti, varsinkin jos rauta ei ole ihan parasta mahdollista?

Asentelin tässä llama.cpp:n ja parin hutilatauksen jälkeen löytyi oikeanlainen mallikin ajoon (llama.cpp pitää GGUF-formaatissa olevista malleista). Niitä on sitten löytynyt useampiakin.
https://huggingface.co/models näyttää olevan hyvä sivu etsiä laajalti erilaisia malleja. Onkohan vahvoja suosituksia, mitkä olisivat hyviä?

Kokeiluun valikoitui lähinnä erilaisia ranskalaisen mistral.ai:n malleja. Kun odotukset eivät ole korkealla, niin jopa 2-bittinen Mistral-7B (näemmä vanha kuin taivas) tuottaa yllättävän järkeviä vastauksia

Koodi:

~/src/llama.cpp/build$ bin/llama-cli -ngl 24  -cmoe  -m /opt/ai/mistral/7B-Instruct-v0.3/Q2_K-00001-of-00001.gguf --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Q2_K-00001-of-00001.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent GPUs you know about? How they perform in AI applications?

 As of my last training (2021-03-23), here are some of the most recent GPUs and their performance in AI applications:

1. NVIDIA Ampere Series (2020):
   - NVIDIA GeForce RTX 3080: This GPU has 10GB of GDDR6X memory, 10,240 CUDA cores, and 320 Tensor cores. It's designed for high-performance gaming and AI applications.
   - NVIDIA GeForce RTX 3090: This GPU has 24GB of GDDR6X memory, 10,240 CUDA cores, and 320 Tensor cores. It's designed for high-performance gaming, AI research, and data science.
   - NVIDIA A100: This GPU is part of NVIDIA's data center GPU series. It has 40GB of HBM2 memory, 72 Turing-SoCs, and 480 Tensor cores. It's designed for AI research, data science, and high-performance computing.

In terms of AI performance, these GPUs are designed to accelerate AI training and inference. They support NVIDIA's AI and data science acceleration technologies like Tensor Cores, CUDA-X, and NVIDIA's software-defined architecture. These technologies enable faster AI training, faster data science, and better performance in AI applications.

For example, the NVIDIA A100 delivers up to 28 TFLOPS of AI performance, 102 TFLOPS of mixed-precision Turing-SP performance, and 204 TFLOPS of mixed-precision Turing-FP16 performance. This makes it ideal for AI research, data science, and high-performance computing.

The NVIDIA GeForce RTX 3080 and 3090 are also great for AI applications, offering faster AI training and inference compared to previous generations. They're also great for high-performance gaming.

[ Prompt: 75.4 t/s | Generation: 23.0 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =  597 + (2397 =  1836 +     384 +     177) +         906 |
llama_memory_breakdown_print: |   - Host                   |                 2740 =  2596 +     128 +      16                |

Joulukuussa julkaistu Ministral 3 on paremmin ajan hermolla mutta selvästi hitaampi. Tämän raudan kyvyt alkavat tulla vastaan.

Koodi:

~/src/llama.cpp/build$ bin/llama-cli -ngl 15 -m /opt/ai/mistral/Ministral-3-8B-Instruct-2512-Q4_K_M.gguf -c 4096 --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Ministral-3-8B-Instruct-2512-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent consumer GPUs you know about? How they perform in AI applications?

As of my last knowledge update in **October 2023**, here are some of the most recent and high-performance **consumer-grade GPUs** (as of early 2024) and their relevance to **AI applications**, particularly for tasks like deep learning, inference, and training:

---

### **1. NVIDIA GPUs (Dominant in AI)**
NVIDIA continues to lead in AI acceleration with its **Hopper (H100)** architecture (primarily in data center) and **Ada Lovelace (RTX 40-series)** for consumer/enthusiast GPUs.

#### **Latest Consumer GPUs (2023–2024):**
- **RTX 4090**
  - **Architecture**: Ada Lovelace (4th-gen DLSS, improved Tensor Cores).
  - **Key Features**:
    - **24GB GDDR6X** (huge for AI workloads).
    - **76B Tensor Cores** (for mixed-precision AI training/inference).
    - **DLSS 3** (AI upscaling, useful for generative AI pipelines).
    - **PCIe 5.0** (faster data transfer).
  - **AI Performance**:
    - **Blazing fast for inference** (e.g., LLMs, transformers).
    - Supports **FP8/FP16/BF16** for efficient training.
    - **NVENC/H.266** for AI-generated video.
    - **CUDA Cores**: 16,384 (great for parallel workloads).
  - **Use Cases**: LLM fine-tuning, real-time AI rendering, generative AI (Stable Diffusion, etc.).

- **RTX 4080 Super / RTX 4080**
  - **Architecture**: Ada Lovelace (slightly less VRAM than 4090).
  - **Key Features**:
    - **16GB/12GB GDDR6X** (still strong for AI).
    - **DLSS 3**, **Tensor Cores**, and **NVENC**.
  - **AI Performance**:
    - Good for **inference** (e.g., running LLMs like Llama 2 on local machines).
    - Slightly slower than 4090 in training due to less VRAM.

- **RTX 4070 Ti Super**
  - **Architecture**: Ada Lovelace.
  - **Key Features**:
    - **16GB GDDR6X**.
    - **DLSS 3**, **Tensor Cores**, but fewer CUDA cores (~7,168).
  - **AI Performance**:
    - Decent for **lightweight AI tasks** (e.g., running smaller models like Mistral-7B with quantization).

- **RTX 4070**
  - **Architecture**: Ada Lovelace.
  - **Key Features**:
    - **12GB GDDR6X**.
    - Still supports **DLSS 3** and **Tensor Cores**.
  - **AI Performance**:
    - Limited by VRAM for large models (e.g., Llama 70B may require offloading).

#### **NVIDIA’s AI-Specific GPUs (Data Center):**
While not consumer-grade, these are relevant for AI workloads:
- **NVIDIA H100** (Hopper, **80GB HBM3**):
  - **Best for AI training** (FP8 support, **3x faster than A100** in some cases).
  - Used in **AI supercomputers** (e.g., NVIDIA GH200).
- **L40/L40S** (for enterprise AI inference).

---

### **2. AMD GPUs (Growing in AI Support)**
AMD’s **RDNA 3** and **CDNA 3** architectures are improving, but NVIDIA still dominates in AI acceleration.

#### **Latest Consumer GPUs (2023–2024):**
- **RX 7900 XTX / RX 7900 XT**
  - **Architecture**: RDNA 3.
  - **Key Features**:
    - **24GB/20GB GDDR6**.
    - **FSR 3** (AMD’s alternative to DLSS).
    - **No native Tensor Cores** (but supports **ROCm** for AI).
  - **AI Performance**:
    - **Not optimized for AI** like NVIDIA (no dedicated Tensor Cores).
    - Can run **ROCm-based AI frameworks** (e.g., PyTorch, TensorFlow on Linux).
    - Better for **general computing** than pure AI.

- **RX 7800 XT**
  - **Architecture**: RDNA 3.
  - **Key Features**:
    - **16GB GDDR6**.
    - **FSR 3**, but weaker AI capabilities than NVIDIA.

#### **AMD’s AI GPUs (Data Center):**
- **MI300X** (CDNA 3, **128GB HBM3**):
  - **Designed for AI training** (supports FP8, **10x faster than A100** in some benchmarks).
  - Used in **AI cloud providers** (AWS, Azure).

---

### **3. Intel GPUs (Emerging in AI)**
Intel’s **Arc Alchemist** (GPU) and **Gaudi** (AI accelerator) are gaining traction.

#### **Latest Consumer GPU:**
- **Arc A770 / A750**
  - **Architecture**: Alchemist.
  - **Key Features**:
    - **16GB GDDR6**.
    - **XMX (Xe Matrix Extensions)** for AI (but **not as mature as NVIDIA’s Tensor Cores**).
    - **No native AI acceleration** yet (early-stage support).
  - **AI Performance**:
    - **Not recommended for serious AI work** (yet).
    - May improve with **oneAPI** and **ROCm** support.

#### **Intel’s AI Accelerators:**
- **Gaudi 2** (for data centers):
  - **Specialized for AI training** (used by **Meta, Microsoft**).
  - **No consumer version yet**.

---

### **Performance Comparison for AI Workloads**
| GPU               | VRAM  | Tensor Cores | AI Optimization | Best For                          |
|-------------------|-------|--------------|------------------|-----------------------------------|
| **RTX 4090**      | 24GB  | 76B          | Excellent        | LLM training/inference, generative AI |
| **RTX 4080 Super**| 16GB  | 68B          | Very Good        | Inference, smaller models         |
| **RTX 4070 Ti**   | 16GB  | 58B          | Good             | Lightweight AI tasks              |
| **RX 7900 XTX**   | 24GB  | None         | Poor (ROCm)      | General computing, not AI         |
| **Arc A770**      | 16GB  | XMX (early)  | Limited          | Not for AI (yet)                  |
| **NVIDIA H100**   | 80GB  | 160B         | Best (FP8)       | Enterprise AI training             |

---

### **Key Takeaways for AI Applications**
1. **For AI Training/Inference**:
   - **NVIDIA RTX 4090** is the **best consumer GPU** for AI (LLMs, Stable Diffusion, etc.).
   - **RTX 4080 Super** is a good alternative if you need slightly less VRAM.
   - **NVIDIA H100** (data center) is **unmatched** for large-scale training.

2. **For Inference (Running Models)**:
   - **Tensor Cores** (NVIDIA) + **DLSS 3** (for AI-generated content) make RTX 40-series ideal.
   - **FP16/BF16/FP8** support helps with efficiency.

3. **For General AI (Non-NVIDIA)**:
   - **AMD RX 7900 XTX** can run **ROCm-based AI** but is **not optimized** like NVIDIA.
   - **Intel Arc** is **not recommended** for AI yet.

4. **Future Trends**:
   - **NVIDIA’s Blackwell (B100)** (expected late 2024) will push AI performance further.
   - **AMD’s CDNA 4** and **Intel’s future GPUs** may close the gap.

---
### **Recommendations**
- **For serious AI work (training/inference)**: **RTX 4090** (if budget allows) or **RTX 4080 Super**.
- **For lightweight AI (inference only)**: **RTX 4070 Ti** or **RTX 4070**.
- **For general computing + some AI**: **RX 7900 XTX** (but expect slower AI performance).
- **For enterprise AI**: **NVIDIA H100** or **AMD MI300X**.

Would you like benchmarks for specific AI tasks (e.g., running Llama 2, Stable Diffusion)?

[ Prompt: 302.5 t/s | Generation: 9.9 t/s ]

Kun mallin koko kasvaa moninkertaiseksi GPU:n muistiin nähden, suorituskyky hiipuu merkittävästi. Tulokset voivat kyllä olla tarkempia kuin pienemmällä mallilla.

~/src/llama.cpp/build$ bin/llama-cli -ngl 6 -cmoe -m /opt/ai/mistral/Devstral-Small-2-24B-Instruct-2512-Q4_1.gguf --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model...

▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀

build : b7372-482211438
model : Devstral-Small-2-24B-Instruct-2512-Q4_1.gguf
modalities : text

available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file

> What are the most recent consumer GPUs you know about? How they perform in AI applications?

As of my last knowledge update in October 2023, the most recent consumer GPUs available were from NVIDIA's RTX 40 series and AMD's RX 7000 series. However, since my knowledge cutoff is 2023-10-01, I don't have information on any GPUs released after that date. For the most up-to-date information, I would recommend checking the latest releases from NVIDIA and AMD.

### Recent Consumer GPUs (as of 2023):
1. **NVIDIA RTX 40 Series**:
- **RTX 4090**: Flagship model with excellent performance in both gaming and AI tasks.
- **RTX 4080**: High-end performance, suitable for AI workloads.
- **RTX 4070**: Mid-range, still capable for AI tasks but with some limitations.
- **RTX 4060**: Entry-level for AI, better suited for lighter tasks.

2. **AMD RX 7000 Series**:
- **RX 7900 XTX**: Competitive with NVIDIA's offerings in raw performance.
- **RX 7900 XT**: Slightly lower performance but still strong.
- **RX 7800 XT**: Mid-range, good for AI but not as powerful as the higher-end models.

### Performance in AI Applications:
- **NVIDIA GPUs** are generally preferred for AI tasks due to their CUDA cores and strong support for AI frameworks like TensorFlow and PyTorch. The RTX 40 series, in particular, offers significant improvements in AI performance thanks to features like Tensor Cores and DLSS (Deep Learning Super Sampling).
- **AMD GPUs** have made strides in AI performance with their ROCm (Radeon Open Compute) platform, but they still lag behind NVIDIA in terms of software support and optimization for AI workloads.

For the most recent GPUs released after October 2023, I would need to use tools to fetch the latest information. Would you like me to do that?

[ Prompt: 104.1 t/s | Generation: 2.9 t/s ]

> /exit

Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - CUDA0 (GTX 1650 SUPER) | 3901 = 57 + ( 2874 = 1987 + 96 + 791) + 968 |
llama_memory_breakdown_print: | - Host | 12750 = 12188 + 544 + 18 |

Joistakin malleista on myös pohdiskelevia, "Reasoning"-versioita. Kannattaa ehkä harkita, haluaako sellaista ainakaan yleiskäyttöön. Tulokset on ehkä tarpeettomankin monisanaisia, ja niitä saa odottaa kauan ellei raudassa riitä voima.

Koodi:

~/src/llama.cpp/build$ bin/llama-cli -ngl 10 -m /opt/ai/mistral/Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf -c 4096 --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model... 


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> Please count from 1 to 10

Okay, the user wants me to count from 1 to 10. That's straightforward. I need to list the numbers in order from 1 to 10. I'll start by writing down the numbers one by one, ensuring that I don't skip any or go out of order. I'll double-check to make sure I've included all numbers correctly.

Let me start:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

I think that's correct. I'll format this in Markdown to make it clear and easy to read. I'll also ensure that the response is concise and to the point, as the user only asked for a simple count. I won't add any unnecessary explanations or fluff, but I'll make sure it's clear and accurate.

Here's how I'll phrase it:

"1, 2, 3, 4, 5, 6, 7, 8, 9, 10."

But perhaps it's better to list them on separate lines for clarity, especially if the user wants to see each number distinctly. I'll try that:

1
2
3
4
5
6
7
8
9
10

Yes, that looks better. It's more readable, especially if the user wants to see each number on its own line. I'll go with that format. I'll also make sure that the response is in a code block to maintain the formatting, but since it's a simple list, it might not be necessary. However, using a code block can sometimes make it clearer that this is the exact output.

But since it's a simple count, perhaps just listing them with line breaks is sufficient. I'll proceed with that. I'll also double-check that I haven't missed anything or made any mistakes in the sequence.

Let me verify the sequence:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Yes, that's correct. I'll write it out with line breaks now.1
2
3
4
5
6
7
8
9
10

[ Prompt: 127.6 t/s | Generation: 5.8 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =  233 + (2736 =  1785 +     160 +     791) +         931 |
llama_memory_breakdown_print: |   - Host                   |                 6562 =  6064 +     480 +      18                |

Onkohan raadilla näkemyksiä miten pakattua dataa malleissa kannattaa käyttää kotioloissa? Näissä data oli pakattu 4-bittiseksi paitsi ensimmäinen oli 2-bittinen.

Komentorivi on vähän karu käyttöliittymä. Olisiko tähän jotain hyviä GUI-pohjaisia ehdotuksia?

mailbag · 15.12.2025

takomo sanoi:

Olipa lähellä uuden ketjun perustaminen, mutta olihan täällä tällainen viriili ketju...
Millaisia kokemuksia on tekoälymallien ajamisesta paikallisesti, varsinkin jos rauta ei ole ihan parasta mahdollista?

Asentelin tässä llama.cpp:n ja parin hutilatauksen jälkeen löytyi oikeanlainen mallikin ajoon (llama.cpp pitää GGUF-formaatissa olevista malleista). Niitä on sitten löytynyt useampiakin.
https://huggingface.co/models näyttää olevan hyvä sivu etsiä laajalti erilaisia malleja. Onkohan vahvoja suosituksia, mitkä olisivat hyviä?

Kokeiluun valikoitui lähinnä erilaisia ranskalaisen mistral.ai:n malleja. Kun odotukset eivät ole korkealla, niin jopa 2-bittinen Mistral-7B (näemmä vanha kuin taivas) tuottaa yllättävän järkeviä vastauksia

Koodi:

~/src/llama.cpp/build$ bin/llama-cli -ngl 24  -cmoe  -m /opt/ai/mistral/7B-Instruct-v0.3/Q2_K-00001-of-00001.gguf --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Q2_K-00001-of-00001.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent GPUs you know about? How they perform in AI applications?

 As of my last training (2021-03-23), here are some of the most recent GPUs and their performance in AI applications:

1. NVIDIA Ampere Series (2020):
   - NVIDIA GeForce RTX 3080: This GPU has 10GB of GDDR6X memory, 10,240 CUDA cores, and 320 Tensor cores. It's designed for high-performance gaming and AI applications.
   - NVIDIA GeForce RTX 3090: This GPU has 24GB of GDDR6X memory, 10,240 CUDA cores, and 320 Tensor cores. It's designed for high-performance gaming, AI research, and data science.
   - NVIDIA A100: This GPU is part of NVIDIA's data center GPU series. It has 40GB of HBM2 memory, 72 Turing-SoCs, and 480 Tensor cores. It's designed for AI research, data science, and high-performance computing.

In terms of AI performance, these GPUs are designed to accelerate AI training and inference. They support NVIDIA's AI and data science acceleration technologies like Tensor Cores, CUDA-X, and NVIDIA's software-defined architecture. These technologies enable faster AI training, faster data science, and better performance in AI applications.

For example, the NVIDIA A100 delivers up to 28 TFLOPS of AI performance, 102 TFLOPS of mixed-precision Turing-SP performance, and 204 TFLOPS of mixed-precision Turing-FP16 performance. This makes it ideal for AI research, data science, and high-performance computing.

The NVIDIA GeForce RTX 3080 and 3090 are also great for AI applications, offering faster AI training and inference compared to previous generations. They're also great for high-performance gaming.

[ Prompt: 75.4 t/s | Generation: 23.0 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =  597 + (2397 =  1836 +     384 +     177) +         906 |
llama_memory_breakdown_print: |   - Host                   |                 2740 =  2596 +     128 +      16                |

Joulukuussa julkaistu Ministral 3 on paremmin ajan hermolla mutta selvästi hitaampi. Tämän raudan kyvyt alkavat tulla vastaan.

Koodi:

~/src/llama.cpp/build$ bin/llama-cli -ngl 15 -m /opt/ai/mistral/Ministral-3-8B-Instruct-2512-Q4_K_M.gguf -c 4096 --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Ministral-3-8B-Instruct-2512-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> What are the most recent consumer GPUs you know about? How they perform in AI applications?

As of my last knowledge update in **October 2023**, here are some of the most recent and high-performance **consumer-grade GPUs** (as of early 2024) and their relevance to **AI applications**, particularly for tasks like deep learning, inference, and training:

---

### **1. NVIDIA GPUs (Dominant in AI)**
NVIDIA continues to lead in AI acceleration with its **Hopper (H100)** architecture (primarily in data center) and **Ada Lovelace (RTX 40-series)** for consumer/enthusiast GPUs.

#### **Latest Consumer GPUs (2023–2024):**
- **RTX 4090**
  - **Architecture**: Ada Lovelace (4th-gen DLSS, improved Tensor Cores).
  - **Key Features**:
    - **24GB GDDR6X** (huge for AI workloads).
    - **76B Tensor Cores** (for mixed-precision AI training/inference).
    - **DLSS 3** (AI upscaling, useful for generative AI pipelines).
    - **PCIe 5.0** (faster data transfer).
  - **AI Performance**:
    - **Blazing fast for inference** (e.g., LLMs, transformers).
    - Supports **FP8/FP16/BF16** for efficient training.
    - **NVENC/H.266** for AI-generated video.
    - **CUDA Cores**: 16,384 (great for parallel workloads).
  - **Use Cases**: LLM fine-tuning, real-time AI rendering, generative AI (Stable Diffusion, etc.).

- **RTX 4080 Super / RTX 4080**
  - **Architecture**: Ada Lovelace (slightly less VRAM than 4090).
  - **Key Features**:
    - **16GB/12GB GDDR6X** (still strong for AI).
    - **DLSS 3**, **Tensor Cores**, and **NVENC**.
  - **AI Performance**:
    - Good for **inference** (e.g., running LLMs like Llama 2 on local machines).
    - Slightly slower than 4090 in training due to less VRAM.

- **RTX 4070 Ti Super**
  - **Architecture**: Ada Lovelace.
  - **Key Features**:
    - **16GB GDDR6X**.
    - **DLSS 3**, **Tensor Cores**, but fewer CUDA cores (~7,168).
  - **AI Performance**:
    - Decent for **lightweight AI tasks** (e.g., running smaller models like Mistral-7B with quantization).

- **RTX 4070**
  - **Architecture**: Ada Lovelace.
  - **Key Features**:
    - **12GB GDDR6X**.
    - Still supports **DLSS 3** and **Tensor Cores**.
  - **AI Performance**:
    - Limited by VRAM for large models (e.g., Llama 70B may require offloading).

#### **NVIDIA’s AI-Specific GPUs (Data Center):**
While not consumer-grade, these are relevant for AI workloads:
- **NVIDIA H100** (Hopper, **80GB HBM3**):
  - **Best for AI training** (FP8 support, **3x faster than A100** in some cases).
  - Used in **AI supercomputers** (e.g., NVIDIA GH200).
- **L40/L40S** (for enterprise AI inference).

---

### **2. AMD GPUs (Growing in AI Support)**
AMD’s **RDNA 3** and **CDNA 3** architectures are improving, but NVIDIA still dominates in AI acceleration.

#### **Latest Consumer GPUs (2023–2024):**
- **RX 7900 XTX / RX 7900 XT**
  - **Architecture**: RDNA 3.
  - **Key Features**:
    - **24GB/20GB GDDR6**.
    - **FSR 3** (AMD’s alternative to DLSS).
    - **No native Tensor Cores** (but supports **ROCm** for AI).
  - **AI Performance**:
    - **Not optimized for AI** like NVIDIA (no dedicated Tensor Cores).
    - Can run **ROCm-based AI frameworks** (e.g., PyTorch, TensorFlow on Linux).
    - Better for **general computing** than pure AI.

- **RX 7800 XT**
  - **Architecture**: RDNA 3.
  - **Key Features**:
    - **16GB GDDR6**.
    - **FSR 3**, but weaker AI capabilities than NVIDIA.

#### **AMD’s AI GPUs (Data Center):**
- **MI300X** (CDNA 3, **128GB HBM3**):
  - **Designed for AI training** (supports FP8, **10x faster than A100** in some benchmarks).
  - Used in **AI cloud providers** (AWS, Azure).

---

### **3. Intel GPUs (Emerging in AI)**
Intel’s **Arc Alchemist** (GPU) and **Gaudi** (AI accelerator) are gaining traction.

#### **Latest Consumer GPU:**
- **Arc A770 / A750**
  - **Architecture**: Alchemist.
  - **Key Features**:
    - **16GB GDDR6**.
    - **XMX (Xe Matrix Extensions)** for AI (but **not as mature as NVIDIA’s Tensor Cores**).
    - **No native AI acceleration** yet (early-stage support).
  - **AI Performance**:
    - **Not recommended for serious AI work** (yet).
    - May improve with **oneAPI** and **ROCm** support.

#### **Intel’s AI Accelerators:**
- **Gaudi 2** (for data centers):
  - **Specialized for AI training** (used by **Meta, Microsoft**).
  - **No consumer version yet**.

---

### **Performance Comparison for AI Workloads**
| GPU               | VRAM  | Tensor Cores | AI Optimization | Best For                          |
|-------------------|-------|--------------|------------------|-----------------------------------|
| **RTX 4090**      | 24GB  | 76B          | Excellent        | LLM training/inference, generative AI |
| **RTX 4080 Super**| 16GB  | 68B          | Very Good        | Inference, smaller models         |
| **RTX 4070 Ti**   | 16GB  | 58B          | Good             | Lightweight AI tasks              |
| **RX 7900 XTX**   | 24GB  | None         | Poor (ROCm)      | General computing, not AI         |
| **Arc A770**      | 16GB  | XMX (early)  | Limited          | Not for AI (yet)                  |
| **NVIDIA H100**   | 80GB  | 160B         | Best (FP8)       | Enterprise AI training             |

---

### **Key Takeaways for AI Applications**
1. **For AI Training/Inference**:
   - **NVIDIA RTX 4090** is the **best consumer GPU** for AI (LLMs, Stable Diffusion, etc.).
   - **RTX 4080 Super** is a good alternative if you need slightly less VRAM.
   - **NVIDIA H100** (data center) is **unmatched** for large-scale training.

2. **For Inference (Running Models)**:
   - **Tensor Cores** (NVIDIA) + **DLSS 3** (for AI-generated content) make RTX 40-series ideal.
   - **FP16/BF16/FP8** support helps with efficiency.

3. **For General AI (Non-NVIDIA)**:
   - **AMD RX 7900 XTX** can run **ROCm-based AI** but is **not optimized** like NVIDIA.
   - **Intel Arc** is **not recommended** for AI yet.

4. **Future Trends**:
   - **NVIDIA’s Blackwell (B100)** (expected late 2024) will push AI performance further.
   - **AMD’s CDNA 4** and **Intel’s future GPUs** may close the gap.

---
### **Recommendations**
- **For serious AI work (training/inference)**: **RTX 4090** (if budget allows) or **RTX 4080 Super**.
- **For lightweight AI (inference only)**: **RTX 4070 Ti** or **RTX 4070**.
- **For general computing + some AI**: **RX 7900 XTX** (but expect slower AI performance).
- **For enterprise AI**: **NVIDIA H100** or **AMD MI300X**.

Would you like benchmarks for specific AI tasks (e.g., running Llama 2, Stable Diffusion)?

[ Prompt: 302.5 t/s | Generation: 9.9 t/s ]

Kun mallin koko kasvaa moninkertaiseksi GPU:n muistiin nähden, suorituskyky hiipuu merkittävästi. Tulokset voivat kyllä olla tarkempia kuin pienemmällä mallilla.

~/src/llama.cpp/build$ bin/llama-cli -ngl 6 -cmoe -m /opt/ai/mistral/Devstral-Small-2-24B-Instruct-2512-Q4_1.gguf --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model...

▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀

build : b7372-482211438
model : Devstral-Small-2-24B-Instruct-2512-Q4_1.gguf
modalities : text

available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file

> What are the most recent consumer GPUs you know about? How they perform in AI applications?

As of my last knowledge update in October 2023, the most recent consumer GPUs available were from NVIDIA's RTX 40 series and AMD's RX 7000 series. However, since my knowledge cutoff is 2023-10-01, I don't have information on any GPUs released after that date. For the most up-to-date information, I would recommend checking the latest releases from NVIDIA and AMD.

### Recent Consumer GPUs (as of 2023):
1. **NVIDIA RTX 40 Series**:
- **RTX 4090**: Flagship model with excellent performance in both gaming and AI tasks.
- **RTX 4080**: High-end performance, suitable for AI workloads.
- **RTX 4070**: Mid-range, still capable for AI tasks but with some limitations.
- **RTX 4060**: Entry-level for AI, better suited for lighter tasks.

2. **AMD RX 7000 Series**:
- **RX 7900 XTX**: Competitive with NVIDIA's offerings in raw performance.
- **RX 7900 XT**: Slightly lower performance but still strong.
- **RX 7800 XT**: Mid-range, good for AI but not as powerful as the higher-end models.

### Performance in AI Applications:
- **NVIDIA GPUs** are generally preferred for AI tasks due to their CUDA cores and strong support for AI frameworks like TensorFlow and PyTorch. The RTX 40 series, in particular, offers significant improvements in AI performance thanks to features like Tensor Cores and DLSS (Deep Learning Super Sampling).
- **AMD GPUs** have made strides in AI performance with their ROCm (Radeon Open Compute) platform, but they still lag behind NVIDIA in terms of software support and optimization for AI workloads.

For the most recent GPUs released after October 2023, I would need to use tools to fetch the latest information. Would you like me to do that?

[ Prompt: 104.1 t/s | Generation: 2.9 t/s ]

> /exit

Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - CUDA0 (GTX 1650 SUPER) | 3901 = 57 + ( 2874 = 1987 + 96 + 791) + 968 |
llama_memory_breakdown_print: | - Host | 12750 = 12188 + 544 + 18 |

Joistakin malleista on myös pohdiskelevia, "Reasoning"-versioita. Kannattaa ehkä harkita, haluaako sellaista ainakaan yleiskäyttöön. Tulokset on ehkä tarpeettomankin monisanaisia, ja niitä saa odottaa kauan ellei raudassa riitä voima.

Koodi:

~/src/llama.cpp/build$ bin/llama-cli -ngl 10 -m /opt/ai/mistral/Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf -c 4096 --temp 0.15
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1650 SUPER, compute capability 7.5, VMM: yes

Loading model...


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7372-482211438
model      : Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> Please count from 1 to 10

Okay, the user wants me to count from 1 to 10. That's straightforward. I need to list the numbers in order from 1 to 10. I'll start by writing down the numbers one by one, ensuring that I don't skip any or go out of order. I'll double-check to make sure I've included all numbers correctly.

Let me start:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

I think that's correct. I'll format this in Markdown to make it clear and easy to read. I'll also ensure that the response is concise and to the point, as the user only asked for a simple count. I won't add any unnecessary explanations or fluff, but I'll make sure it's clear and accurate.

Here's how I'll phrase it:

"1, 2, 3, 4, 5, 6, 7, 8, 9, 10."

But perhaps it's better to list them on separate lines for clarity, especially if the user wants to see each number distinctly. I'll try that:

1
2
3
4
5
6
7
8
9
10

Yes, that looks better. It's more readable, especially if the user wants to see each number on its own line. I'll go with that format. I'll also make sure that the response is in a code block to maintain the formatting, but since it's a simple list, it might not be necessary. However, using a code block can sometimes make it clearer that this is the exact output.

But since it's a simple count, perhaps just listing them with line breaks is sufficient. I'll proceed with that. I'll also double-check that I haven't missed anything or made any mistakes in the sequence.

Let me verify the sequence:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Yes, that's correct. I'll write it out with line breaks now.1
2
3
4
5
6
7
8
9
10

[ Prompt: 127.6 t/s | Generation: 5.8 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (GTX 1650 SUPER) |  3901 =  233 + (2736 =  1785 +     160 +     791) +         931 |
llama_memory_breakdown_print: |   - Host                   |                 6562 =  6064 +     480 +      18                |

Onkohan raadilla näkemyksiä miten pakattua dataa malleissa kannattaa käyttää kotioloissa? Näissä data oli pakattu 4-bittiseksi paitsi ensimmäinen oli 2-bittinen.

Komentorivi on vähän karu käyttöliittymä. Olisiko tähän jotain hyviä GUI-pohjaisia ehdotuksia?

llama.cpp on kotikäyttäjälle ihan hyvä, itse pyysin vain tekoälyä luomaan minulle wrapperi sen ympärille niin voin käyttää sitä selaimessa ja vain niillä ominaisuuksilla mitä tarvitsen, ollut pitkään jo käytössä ja suosittelen. Q4 tarkkuus on itselleni huonoin millä suostun noita malleja käyttämään, yleensä yritän pysyä Q5-Q8 välillä riippuen minkä kokoinen malli että mahtuu kokonaan VRAM:iin.

Reddit - The heart of the internet

www.reddit.com

Jos sensuroimaton roolipelaaminen kiinnostaa niin täältä viikottain päivittyvästi threadista saa yleensä hyvät suositukset uusista custom malleista.

takomo · 18.12.2025

mailbag sanoi:
llama.cpp on kotikäyttäjälle ihan hyvä, itse pyysin vain tekoälyä luomaan minulle wrapperi sen ympärille niin voin käyttää sitä selaimessa ja vain niillä ominaisuuksilla mitä tarvitsen, ollut pitkään jo käytössä ja suosittelen. Q4 tarkkuus on itselleni huonoin millä suostun noita malleja käyttämään, yleensä yritän pysyä Q5-Q8 välillä riippuen minkä kokoinen malli että mahtuu kokonaan VRAM:iin.

Ei tarvinnut edes tehdä wrapperia, vaan lukea mitä kaikkea llama.cpp tekee automaattisesti:

Koodi:

llama-server --port xyz -m...

on simppeli web-käyttöliittymä, joka aukeaa porttiin xyz osoitteessa 127.0.0.1. Tämä on ilmeisesti aika tuore lisäys.

Pikaisella kokeilulla Q4 vaikuttaa yleisesti hyvinkin asialliselta - mutta niitäkin on puolen tusinaa eri varianttia. Missä bittimäärän kasvattaminen näkyy konkreettisesti, paitsi muistintarpeessa? Kun VRAMia on rajallisesti tulee aina vastaan kompromissi enemmän bittejä, pienempi malli vai vähemmän bittejä ja suurempi malli. Kontekstillekin olisi kiva jäädä tilaa.

finWeazel · 18.12.2025

takomo sanoi:
Pikaisella kokeilulla Q4 vaikuttaa yleisesti hyvinkin asialliselta - mutta niitäkin on puolen tusinaa eri varianttia. Missä bittimäärän kasvattaminen näkyy konkreettisesti, paitsi muistintarpeessa? Kun VRAMia on rajallisesti tulee aina vastaan kompromissi enemmän bittejä, pienempi malli vai vähemmän bittejä ja suurempi malli. Kontekstillekin olisi kiva jäädä tilaa.

Malli on alunperin opetettu ja optimoitu jollekin tietylle bittimäärälle. Bittimäärää karsimalla malli nopeutuu ja vie vähemmän muistia mutta tarkkuus vähenee. Aika moni uusi malli on jo lähtöönsä optimoitu fp4:een oli se sitten lokaalisti tai konesalissa ajossa.

Ollama on hyvä, jos haluaa helpon UI:n ja lokaalimallit Ollama Käytän ollamaa sekä windowsissa että mac os:ssa. Toimii toki myös linuxissa.

takomo · 18.12.2025

finWeazel sanoi:
Malli on alunperin opetettu ja optimoitu jollekin tietylle bittimäärälle. Bittimäärää karsimalla malli nopeutuu ja vie vähemmän muistia mutta tarkkuus vähenee. Aika moni uusi malli on jo lähtöönsä optimoitu fp4:een oli se sitten lokaalisti tai konesalissa ajossa.

Aika monet mallit näyttävät olevan natiivisti FP16/BP16, ja näistä on sitten kvantisoitu tiiviimpiä malleja. Ilmeisesti kvantisoinnin voi tehdä hyvin tai huonosti - mutta informaatiota hukataan aina. Se, miten paljon tällä on lopulta merkitystä on hyvä kysymys.

Ajatuksena kai on, että usein on merkittävää onko luku 1, 2 vai 3 mutta sillä onko luku 1,65 vai 1,68 on harvemmin väliä. Jos haet kaupasta kahden kilon kalaa, niin kilon tai kolmen kilon kala ei käy, mutta ei kalan tarvitse olla grammalleen 2 kiloa. Pitkät formaatit käyttävät paljon tilaa turhan kohinan säilömiseen.

finWeazel · 19.12.2025

takomo sanoi:
Aika monet mallit näyttävät olevan natiivisti FP16/BP16, ja näistä on sitten kvantisoitu tiiviimpiä malleja. Ilmeisesti kvantisoinnin voi tehdä hyvin tai huonosti - mutta informaatiota hukataan aina. Se, miten paljon tällä on lopulta merkitystä on hyvä kysymys.

Vanhat mallit usein fp16. Uusista esim openai optimoi suoraan fp4:lle https://openai.com/index/introducing-gpt-oss/ Todennäköisesti paras open source koodausneuroverkko optimoitu fp8 ja fp4 https://venturebeat.com/ai/mistral-launches-powerful-devstral-2-coding-model-including-open-source Hyvä FP4 tuki sen verran uusi asia gpu-raudassa(50x0 sarja nvidialta) ettei vanhoissa malleissa ollut järkeä edes miettiä ratkaisua missä pääasiassa optimoitaisiin fp4:lle. Esim. nvidian uusin gb300 konesaliraudan iso juttu oli, että vaikka muu suorituskyky säilyi ennallaan niin fp4 formaattiin saatiin 40% lisänopeutta versus GB200. Vanha hopper arkkitehtuuri ei edes tukenut fp4:sta. GB200:en ollut konesaleissa alle vuoden ja GB300:en kuukausia.

16bitillä voi esittää numerot 0-65535, 8 bitillä 0-255 ja 4 bitilla 0-15. Laskentatarkkuus väistämättä vähenee kun bittejä napsitaan pois, paljonko tällä on merkitystä riippuu miten malli on suunniteltu ja opetettu alunperin. Liukuluvuilla vähän vaikeampi kuin kokonaisluvuilla sanoa mikä tarkkuus on, mutta bittien vähyys vaikuttaa hyvin samalla tapaa kuin helpommin ymmärrettävillä kokonaisluvuilla.

Uutena juttuna mitä kukaan ei mun käsityksen mukaan käytä(ei rautatukea) niin logaritmisia numeroita käyttämällä olisi mahdollista saada samoilla bittimäärillä parempi laskentatarkkuus.

embedded · 19.12.2025

Aikaa on vierehtäny ja mikään ei edellisekerralla toiminut.

Eli miten saan ison tekstidokumentin esim juurikin raamatun omalla koneella (16GB näyttis) niin ettei siltä voi kysyä kysymyksiä eikä se hallusinoi.
Chatgpt väittää tietävänsä raamatun, mutta joka kerta keksii kuitenkin tyhjästä vastauksia, ihan mielettömiä jopa

finWeazel · 19.12.2025

embedded sanoi:
Aikaa on vierehtäny ja mikään ei edellisekerralla toiminut.

Eli miten saan ison tekstidokumentin esim juurikin raamatun omalla koneella (16GB näyttis) niin ettei siltä voi kysyä kysymyksiä eikä se hallusinoi.
Chatgpt väittää tietävänsä raamatun, mutta joka kerta keksii kuitenkin tyhjästä vastauksia, ihan mielettömiä jopa

Varmaankin privategpt + lokaali backend kuten ollama: GitHub - zylon-ai/private-gpt: Interact with your documents using the power of GPT, 100% privately, no data leaks ja Quickstart | PrivateGPT | Docs En tiedä riittääkö 16GB kortti, riippunee mallista.

Haluatko antaa esimerkin kysymyksestä mihin chatgpt antaa hallusinoivan vastauksen? Tuskin lokaalit pärjäävät maksulliselle chatgpt/gemini3/... Oma taikansa toki siinä miten promptaa mallia, voi olla tarpeen asettaa kaiteita ja rajotteita ja muutenkin hanskata promptin kontekstia että saa AI:n pysymään raiteillaan. Yksi tapa prompata tän tyylistä olisi pyytää tyyliin "Please cross reference the source book for correctness, quote and link the relevant places from book inside the answer". Sekin että pakottaa kyselyt kalliimmalle(paremmalle) mallill voi auttaa, halvin/ilmainen == sitä saa mistä maksaa.

https://www.perplexity.ai/ voi toimia paremmin

takomo · 19.12.2025

finWeazel sanoi:
Vanhat mallit usein fp16. Uusista esim openai optimoi suoraan fp4:lle https://openai.com/index/introducing-gpt-oss/ Todennäköisesti paras open source koodausneuroverkko optimoitu fp8 ja fp4 https://venturebeat.com/ai/mistral-launches-powerful-devstral-2-coding-model-including-open-source

Jos käy alkuperäislähteellä, niin Devstral 2:n osalta puhutaan painoissa BF16, FP8 ja FP4 -muistintarpeesta. (Devstral 2 - Mistral AI). Avoimessa jaossa se näyttää olevan FP8:na (mistralai/Devstral-2-123B-Instruct-2512 · Hugging Face). Ilmeisesti tämän pohjalta on sitten saatavilla muiden tekemiä tiiviimpiä kvantisointeja. Onko Mistralin FP4-versio jossain saatavilla avoimena?

Olettaisin, että malli on tehty 16-bittisenä ja se on tiivistetty 8/4-bittiseksi avoimeen jakeluun. Luultavasti mallia on optimoitu lisää 8/4-bittisenä, eikä vain pudotettu bittejä pois.

finWeazel sanoi:
16bitillä voi esittää numerot 0-65535, 8 bitillä 0-255 ja 4 bitilla 0-15. Laskentatarkkuus väistämättä vähenee kun bittejä napsitaan pois, paljonko tällä on merkitystä riippuu miten malli on suunniteltu ja opetettu alunperin. Liukuluvuilla vähän vaikeampi kuin kokonaisluvuilla sanoa mikä tarkkuus on, mutta bittien vähyys vaikuttaa hyvin samalla tapaa kuin helpommin ymmärrettävillä kokonaisluvuilla.

Jos muistista ei ole pulaa ja suorituskyky riittää, niin tietenkin kannattaa käyttää 16-bitin formaattia. Tämmöisessä "omalla koneella"-aiheessa VRAM sen sijaan on kortilla. Silloin olennainen kysymys on, että onko yksi 16-bittinen parametri hyödyllisempi kuin kaksi 8-bittistä tai neljä 4-bittistä (vaiko peräti 8 2-bittistä). Olen käsittänyt, että 4 bittiä on hyvä kompromissi tiukan muistibudjetin käyttöön. Enemmistä biteistä voi joskus olla hyötyä mutta useammin niihin säilötään vain turhaa kohinaa.

finWeazel sanoi:
Uutena juttuna mitä kukaan ei mun käsityksen mukaan käytä(ei rautatukea) niin logaritmisia numeroita käyttämällä olisi mahdollista saada samoilla bittimäärillä parempi laskentatarkkuus.

IQ4_NL on kai askel tähän suuntaan: epälineaarinen 4-bittinen kvantisointi.

finWeazel · 19.12.2025

takomo sanoi:
Jos käy alkuperäislähteellä, niin Devstral 2:n osalta puhutaan painoissa BF16, FP8 ja FP4 -muistintarpeesta. (Devstral 2 - Mistral AI). Avoimessa jaossa se näyttää olevan FP8:na (mistralai/Devstral-2-123B-Instruct-2512 · Hugging Face). Ilmeisesti tämän pohjalta on sitten saatavilla muiden tekemiä tiiviimpiä kvantisointeja. Onko Mistralin FP4-versio jossain saatavilla avoimena?

Olettaisin, että malli on tehty 16-bittisenä ja se on tiivistetty 8/4-bittiseksi avoimeen jakeluun. Luultavasti mallia on optimoitu lisää 8/4-bittisenä, eikä vain pudotettu bittejä pois.

Jos muistista ei ole pulaa ja suorituskyky riittää, niin tietenkin kannattaa käyttää 16-bitin formaattia. Tämmöisessä "omalla koneella"-aiheessa VRAM sen sijaan on kortilla. Silloin olennainen kysymys on, että onko yksi 16-bittinen parametri hyödyllisempi kuin kaksi 8-bittistä tai neljä 4-bittistä (vaiko peräti 8 2-bittistä). Olen käsittänyt, että 4 bittiä on hyvä kompromissi tiukan muistibudjetin käyttöön. Enemmistä biteistä voi joskus olla hyötyä mutta useammin niihin säilötään vain turhaa kohinaa.

IQ4_NL on kai askel tähän suuntaan: epälineaarinen 4-bittinen kvantisointi.

ollaman kirjastossa on jaossa 4bit versio: Tags · devstral-2 fp4:sta ei näytä olevan. En osaa sanoa minne fp4 versio on kadonnut, mistral ja julkaisun aikaan kirjoitetut artikkelit kuitenkin puhuvat fp4 mallista. Liekö sitten panttaavat fp4:sta ja tarjolla vain jossain maksullisessa pilvessä tms.

Jos haluaa hyvän fp4 mallin niin mallin arkkitehtuuri ja opetus on lähtöönsä suunniteltava pienellä tarkkuduella toimivaksi. Hyvä fp4 malli ei synny vain lopuksi kvantisoimalla.

Iso syy fp4:een pilvessäkin on se, että uusimmalla nvidian raudalla fp4 malli on puolet halvempi ajaa kuin fp8 ja 1/4 osa fp16 mallin hinnasta. fp4 antaa paremman tarkkuuden kuin int4(vrt. kuvat alhaalla). Hinta per token ja tietenkin myös konesalikapasiteetin riittävyys iso kisa openai vs. google vs. anthropic jne. Nopeesti paljon kilpailuetua jos saat puolet tai 3/4 osaa tokenin hinnasta pois ja konesalikapasiteetin riittämään paremmin. GB300:ssa on erityisesti lisätty fp4 laskennan nopeutta(40%) siinä missä muut formaatit eivät nopeudu versus gb200. Tämä fp4 juttu lienee nvidian toimesta tehty koska asiakkaat ovat pyytäneet, muuten lienee olisivat ripotelleet parannuksen tasan kaikkien formaattien kesken.

Logaritmista ei pysty tekeen kunnolla ilman, että raudassa on tuki. Asiasta on esim. nvidian tutkimuspuolen johtaja puhunut useaan otteeseen. Allelinkattu video selittää asian jos se kiinnostaa.

Linkki: https://youtu.be/4u8iMr3iXR4?t=2567

takomo · 21.12.2025

finWeazel sanoi:
ollaman kirjastossa on jaossa 4bit versio: Tags · devstral-2 fp4:sta ei näytä olevan. En osaa sanoa minne fp4 versio on kadonnut, mistral ja julkaisun aikaan kirjoitetut artikkelit kuitenkin puhuvat fp4 mallista. Liekö sitten panttaavat fp4:sta ja tarjolla vain jossain maksullisessa pilvessä tms.

Onko kuluttajaraudassa (ja -softassa) tukea FP4:lle? Tarjoaako noin lyhyt liukuluku olennaista etua verrattuna skaalattuun kokonaislukukvantifiointiin?

Ei ehkä olisi kovin yllättävää, jos kaupalliset toimijat eivät pistä vapaaseen jakoon ihan kaikkia versioita parhaista malleistaan. Isossa Devstralissa näkyy muuten olevan lisenssiehdot, joihin on syytä tutustua, jos sillä aikoo tehdä mitään oikeaa.

finWeazel sanoi:
Iso syy fp4:een pilvessäkin on se, että uusimmalla nvidian raudalla fp4 malli on puolet halvempi ajaa kuin fp8 ja 1/4 osa fp16 mallin hinnasta. fp4 antaa paremman tarkkuuden kuin int4(vrt. kuvat alhaalla). Hinta per token ja tietenkin myös konesalikapasiteetin riittävyys iso kisa openai vs. google vs. anthropic jne. Nopeesti paljon kilpailuetua jos saat puolet tai 3/4 osaa tokenin hinnasta pois ja konesalikapasiteetin riittämään paremmin. GB300:ssa on erityisesti lisätty fp4 laskennan nopeutta(40%) siinä missä muut formaatit eivät nopeudu versus gb200.

Miten konesaliteema kytkeytyy ketjun aiheeseen: "omalla koneella"?

finWeazel · 21.12.2025

takomo sanoi:
Onko kuluttajaraudassa (ja -softassa) tukea FP4:lle? Tarjoaako noin lyhyt liukuluku olennaista etua verrattuna skaalattuun kokonaislukukvantifiointiin?

Vastaukset noihin oli jo mun edeltävissä postauksissa. Nvidialla blackwell sekä kuluttaja että konesaliversiona tukee fp4:sta. Muista valmistajista en tiedä, google kertoo jos asia kiinnostaa. Tarkkuusetuun katso edellisen postauksen slaidista fp4 vs. int4 virhe laskutoimituksissa. Virhe kertautuu kun lasketaan enemmän ja enemmän.

Ei ole mitään kotikonemalleja, on vain pienempiä ja isoja malleja joista pienemmät myös toimivat kotikoneissa konesalin lisäksi. Useimmat mallit suunniteltu raskaalle konesaliraudalle ja niistä distilloidaan pienempi versio mitä voi ajaa myös kuluttajaraudalla. Yhtälailla kotikone kuin konesali hyötyy kun fp4:lla saadaan tupla suorituskyky versus fp8 ja 4x suorituskyky versus fp16. Konesali ajaa kehitystä eteenpäin. Konesalista valuu pienempiä malleja kotikoneisiin.

takomo · 21.12.2025

finWeazel sanoi:
Tarkkuusetuun katso edellisen postauksen slaidista fp4 vs. int4 virhe laskutoimituksissa. Virhe kertautuu kun lasketaan enemmän ja enemmän.

Kun lasketaan paljon, virheet suurelta osin kumoavat toisensa. Kun yksi on liian suuri ja toinen liian pieni niin summa on melko lähellä oikeaa. Teoreettisen edun vaikutus käytännössä voi jäädä yllättävän pieneksi. Koko tiukka kvantisointi perustuu siihen, että vaikka yksittäisissä laskuissa tulee isoja virheitä, se ei paljon heiluta biljoonien laskujen kokonaisuutta.

Kokonaislukuesitys on huonompi, jos samalla rivillä olevien kertoimien suuruusluokassa on isoja eroja. Jos ne on samaa luokkaa, niin skaalaamalla ne vaikka 10 keskiarvoon pienetkin erot voi esittää melko tarkasti (~5% virheellä). Tässä taitaa int4 olla parempi?

finWeazel · 21.12.2025

takomo sanoi:
Kun lasketaan paljon, virheet suurelta osin kumoavat toisensa. Kun yksi on liian suuri ja toinen liian pieni niin summa on melko lähellä oikeaa. Teoreettisen edun vaikutus käytännössä voi jäädä yllättävän pieneksi. Koko tiukka kvantisointi perustuu siihen, että vaikka yksittäisissä laskuissa tulee isoja virheitä, se ei paljon heiluta biljoonien laskujen kokonaisuutta.

Kokonaislukuesitys on huonompi, jos samalla rivillä olevien kertoimien suuruusluokassa on isoja eroja. Jos ne on samaa luokkaa, niin skaalaamalla ne vaikka 10 keskiarvoon pienetkin erot voi esittää melko tarkasti (~5% virheellä). Tässä taitaa int4 olla parempi?

Vapaasti saa olla mitä mieltä haluaa. Mä en tee muuta kuin seuraa mitä isot pelaajat miljardibudjeteillaan ja top promille tutkijoineen tekevät. Fp4 uusi juttu, int4 oli nvidialla jo vanhemmassakin ada-raudassa. Se nvidian ketjuun linkattu slaidi tai vaikka

What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware

A Blog post by Rakshit Aralimatti on Hugging Face

huggingface.co

Miten esim. nvidian tensoriytimet toimii niin siellä sisässä on jännää toteutusta esim. sisäisten laskutoimitusten suhteen voi olla parempaa tarkkuutta kuin mitä input-output formaatti antaisi olettaa.

takomo · 21.12.2025

finWeazel sanoi:
Vapaasti saa olla mitä mieltä haluaa. Mä en tee muuta kuin seuraa mitä isot pelaajat miljardibudjeteillaan ja top promille tutkijoineen tekevät. Fp4 uusi juttu, int4 oli nvidialla jo vanhemmassakin ada-raudassa. Se nvidian ketjuun linkattu slaidi tai vaikka

On varmasti hyödyllistä, että myös FP4-koodaus on olemassa ja on hienoa, että sille on suora rautatuki, mutta en mitenkään usko, että se mikään vallankumouksellinen formaatti olisi. Neljä bittiä on kuitenkin vain neljä bittiä eikä niillä kovin monta erilaista asiaa pysty ilmaisemaan uniikisti.

Tässäkin on käsillä jo kaksi eri FP4-koodausta. Tässä kuvassa puhutaan E2M1(+etumerkki)-koodista ja aiemmassa kuvassa oli E2M2-koodi, jossa on kaksinkertainen määrä numeroarvoja mutta vain positiiviset luvut.
[EDIT] E2M1-koodilla pystyy esittämään esim. luvut +-[0, 0.5, 1, 1.5, 2, 3, 4, 6]

finWeazel sanoi:
Miten esim. nvidian tensoriytimet toimii niin siellä sisässä on jännää toteutusta esim. sisäisten laskutoimitusten suhteen voi olla parempaa tarkkuutta kuin mitä input-output formaatti antaisi olettaa.

Erilaiset skaalaukset on käsittääkseni arkea (ja edellytys) kokonaislukukertoimille. Juuri niiden ansiosta matalabittiset mallit toimivat niinkin hyvin kuin toimivat.

finWeazel · 22.12.2025

takomo sanoi:
On varmasti hyödyllistä, että myös FP4-koodaus on olemassa ja on hienoa, että sille on suora rautatuki, mutta en mitenkään usko, että se mikään vallankumouksellinen formaatti olisi. Neljä bittiä on kuitenkin vain neljä bittiä eikä niillä kovin monta erilaista asiaa pysty ilmaisemaan uniikisti.

Sitä blockihärveliä mikä fp4:en kyljessä tulee ja auttaa tarkkuuden lisäämisessä ei ole olemassa esim. int4 formaatille. Jos tämä asia oikeasti sua kiinnostaa niin niissä aikaisemmissa linkeissä asia on selitetty oikein hyvin. Jos nyt vielä alleviivaten aikaisemmista linkeistä

This structure lets MXFP4 efficiently represent the wide dynamic range found in modern AI models even with only 4 bits per value while keeping storage overhead low. It’s a radical departure from uniform quantization.

ja

This initiative, backed by tech giants including AMD, NVIDIA, Microsoft, Meta, and OpenAI, set out to lower the hardware and compute barriers to cutting-edge AI.

ja

These innovations enabled direct training of massive models in MXFP4 no more need to pre-train in high precision.

Mutta kuten siinä eka viestissäni jo sanoin näitä hyötyjä ei saa täysimääräisesti vain kvantisoimalla fp4:een. Pitää suunnitella ja toteuttaa malli sopivasti että saa maksimaalisen suorituskyvyn. Vanhat mallit eivätkä näitä tietenkään tue mxfp4:sta(tai nvidian termeillä fp4:sta) koska teknologiaa eikä rautaa ollut tovi sitten. Uusistakaan malleista kaikki eivät ole vielä kerenneet mukaan.

takomo · 22.12.2025

finWeazel sanoi:
Sitä blockihärveliä mikä fp4:en kyljessä tulee ja auttaa tarkkuuden lisäämisessä ei ole olemassa esim. int4 formaatille. Jos tämä asia oikeasti sua kiinnostaa niin niissä aikaisemmissa linkeissä asia on selitetty oikein hyvin.

Aika pinnallisesti oli selitetty eikä ollenkaan verrattu int4-pohjaisiin kvantisointeihin. Se, että skaalauksen pystyy tekemään raudalla on kai uutta mutta muuten vastaava metodi on käytössä ainakin Qn_K-kvantisoinneissa:

https://medium.com/@paul.ilvez/demystifying-llm-quantization-suffixes-what-q4-k-m-q8-0-and-q6-k-really-mean-0ec2770f17d3

finWeazel sanoi:
Mutta kuten siinä eka viestissäni jo sanoin näitä hyötyjä ei saa täysimääräisesti vain kvantisoimalla fp4:een. Pitää suunnitella ja toteuttaa malli sopivasti että saa maksimaalisen suorituskyvyn.

Kova paine toki on, että mallit pystyisi kouluttamaan suoraan matalabittisinä mutta onko tässä onnistuttu käytännössäkin?

BoomerX · 24.12.2025

amd adrenalinestakin näköjään löytyy "chat" ominaisuus. joku llama 8b malli.
Tosin en saanut toimimaan. valittelee että ei ole aktiivinen ja seuraavaksi valittaa ettei ole käynnissä.

Eikä tuo taida hyväksyä kuvia ollenkaan liitteeksi. Pelkästää dokumentteja.

esim screenshotit benchmarkeista liitteeksi ja kysyy tekemään lukemista fps graafin vertailua varten ei onnistunut ollenkaan.

Mukava · 26.12.2025

LM Studion kautta tullut ajettue gpt-oss-120b:tä esim Zed:ssä ja VS Codessa. Toiminut varsin mallikkaasti.

Halpuuttaja · 26.12.2025

takomo sanoi:
Olettaisin, että malli on tehty 16-bittisenä ja se on tiivistetty 8/4-bittiseksi avoimeen jakeluun. Luultavasti mallia on optimoitu lisää 8/4-bittisenä, eikä vain pudotettu bittejä pois.

Alkuperäiset Mistralit treenattiin kai 16-bittisenä joo. Sen sijaan Mistral -> Devstral jatkotreenaus on tehty FP8 formaatissa*.

Eli tosiaan Devstralin puristaminen FP4 formaattiin heikentäisi mallia enemmän tai vähemmän. Ja 16-bittiseksi upcastaaminen ei tekisi siitä parempaa.

*) mistralai/Devstral-Small-2-24B-Instruct-2512 · Devstral 2 BF16 weight

takomo · 28.12.2025

Halpuuttaja sanoi:
Alkuperäiset Mistralit treenattiin kai 16-bittisenä joo. Sen sijaan Mistral -> Devstral jatkotreenaus on tehty FP8 formaatissa*.

Eli tosiaan Devstralin puristaminen FP4 formaattiin heikentäisi mallia enemmän tai vähemmän. Ja 16-bittiseksi upcastaaminen ei tekisi siitä parempaa.

*) mistralai/Devstral-Small-2-24B-Instruct-2512 · Devstral 2 BF16 weight

Niinpä näkyy.

Ainahan bittien vähentäminen heikentää mallia. Onkohan näkynyt vertailuja toimiiko FP4 paremmin kuin hyvä Q4-kokonaislukumalli?

finWeazel · 29.12.2025

Nvidia julkaisi uudet nemotron3 mallit. Super ja Ultra versiot opetettu nvfp4:lla. Tällä hetkellä ladattavissa vain pienin nano versio mallista.

Nemotron 3 family consists of three models: Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities

We are releasing the model weights, training recipe, and all the data for which we hold redistribution rights.

These environments and RL datasets are being made available, alongside NeMo Gym, for those interested in using the environments to train their own models.

NVIDIA Nemotron 3 Family of Models

research.nvidia.com

Nemotron 3 introduces several innovations that directly address the needs of agentic systems:

A hybrid Mamba-Transformer MoE backbone for superior test-time efficiency and long-range reasoning.

Multi-environment reinforcement learning designed around real-world agentic tasks.

A 1M-token context length supporting deep multi-document reasoning and long-running agent memory.

An open, transparent training pipeline, including data, weights, and recipes.

Immediate availability of Nemotron 3 Nano with ready-to-use cookbooks. Super and Ultra to follow.

Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate | NVIDIA Technical Blog

Agentic AI systems increasingly rely on collections of cooperating agents—retrievers, planners, tool executors, verifiers—working together across large contexts and long time spans.

developer.nvidia.com

----------------

Edittiä. Testailin 5090:lla nemotron3:sta. On erittäin nopea, vie muistia noin 24GB. Lienee tehty niin että mahtuu 3090/4090:en muistiin eikä vaadi 5090:sta. Ratkaisi adventofcode 2025 luukut 1 ja 2 python koodilla. Luukku3:een teki kurantin näköisen ratkaisun, mutta ei pääse yli algoritmillisesta aidasta niin toteutus liian hidas. Ei riitä aika universumissa vastauksen saavuttamiseksi. Eka malli mikä osoittanut mun kokeiluissa sellaista suorituskykyä ja taitoa että voisi lokaalisti ajatella käyttävänsä pienemmissä koodauskysymyksissä.

finWeazel · tiistaina klo 13:48

Nvidia hypettää tammikuussa tulevaa päivitystä millä peligpu:t saavat lisää suorituskykyä lokaaleita llm:ia ajaessa. Mielenkiintoinen juttu tuo automaattinen keskusmuistiin swappaaminen mistä voi olla paljonkin iloa MoE mallien kanssa jotka eivät mahdu kokonaan muistiin. Toinen kiva juttu nvfp4 tuki kuvagenerointiin.

There are two parts of this update, first is faster LLM performance, offering up to 40% higher performance in LLMs such as GPT-OSS, Nemotron Nano V2, and Sque 3 308

The second part enables native NVFP4 support in ComfyUI Flux.1, Flux.2, and Quen Image. This yields up to a 4.6x gain in performance

NVIDIA Boosts RTX AI PCs With 35% Faster LLM & 3x Faster Creative AI Performance, NVFP4 To Reduce VRAM Usage

NVIDIA continues to add more performance to its RTX AI PCs with features such as NVFP4 and further AI/RTX optimizations.

wccftech.com

Halpuuttaja · tiistaina klo 14:57

finWeazel sanoi:
Mielenkiintoinen juttu tuo automaattinen keskusmuistiin swappaaminen mistä voi olla paljonkin iloa MoE mallien kanssa jotka eivät mahdu kokonaan muistiin.

Tossa slaidissa, jossa mainitaan tämä "RTX-optimized offload to sys mem" on pelkästään kuvamalleja. Ja toi ominaisuus tuli ComfyUI:hin Flux.2 julkkarin yhteydessä marraskuussa.

Veikkaan että Wccftech sotkee asioita yhdistellessään tämän asian LLM:iin.

Kiinnostavaa kyllä nähdä miten NVFP4 formaatin kuva- ja videomallit suoriutuu.

Edit: jep, alkuperäisen lähteen kun lukee niin selviää että Wccftech oli ymmärtänyt väärin, CPU offload asia koski nimenomaan ComfyUI:ta.

Open-Source AI Tool Upgrades Speed Up LLM and Diffusion Models on NVIDIA RTX PCs | NVIDIA Technical Blog

AI developer activity on PCs is exploding, driven by the rising quality of small language models (SLMs) and diffusion models, such as FLUX.2, GPT-OSS-20B, and Nemotron 3 Nano. At the same time…

developer.nvidia.com

takomo · eilen klo 17:25

Tulipa käytyä rautakaupassa, ja 4 GB GTX 1650 Super on nyt päivittynyt 16GB RX 9070 -korttiin. Tällähän alkaa jo tehdä jotain. 14B mallit toimivat 8-bitin kvantisoinnillakin kohtuullisen sujuvasti ja tiukemmalla saa kontekstia kasvatettua ilman, että joutuu turvautumaan GPU-CPU-hybridimalliin.

"What are key problems in EU DSA regarding freedom of speech and other human rights in ECHR context?"

The Digital Services Act (DSA) of the European Union, which entered into full force in February 2024, is a landmark regulation aimed at creating a safer digital space by imposing stricter rules on online platforms regarding illegal content, disinformation, and user rights. However, its implementation raises significant concerns regarding freedom of speech, human rights protections, and compliance with the European Convention on Human Rights (ECHR).

Here are the key problems in the EU DSA concerning freedom of speech and other human rights under the ECHR framework:

1. Overbroad Content Moderation & Chilling Effects on Speech

Problem:

The DSA requires very large online platforms (VLOPs) (e.g., Meta, X/Twitter, TikTok) to remove illegal content (e.g., hate speech, terrorism, disinformation) within 24 hours (or justify non-removal).
Proactive measures (e.g., AI-based content filtering) are encouraged, but they risk false positives—legitimate speech being mistakenly flagged and removed.
Chilling effect: Platforms may over-censor to avoid legal risks, leading to self-censorship of controversial but legal content (e.g., political debate, satire, scientific discussion).

ECHR Concerns:

Article 10 (Freedom of Expression): The ECHR requires restrictions on speech to be proportionate, necessary, and precisely defined.
- The DSA’s broad definitions (e.g., "manipulative behavior," "disinformation") risk vague enforcement, leading to arbitrary takedowns.
- No prior judicial review before removals (unlike some EU Member States, which require court orders for certain speech restrictions).
Article 6 (Right to a Fair Trial): Users have no effective appeal mechanism in many cases, violating due process rights.

Examples of Controversy:

Satire & Parody: Platforms may remove content that could be interpreted as "hate speech" (e.g., dark humor about sensitive topics).
Political Speech: Criticism of governments or institutions could be labeled "disinformation" (e.g., climate denialism debates, migration narratives).
Scientific & Academic Discourse: Controversial but evidence-based views (e.g., on gender, COVID-19 origins) may be suppressed.

2. Disproportionate Liability for Hosting User-Generated Content

Problem:

The DSA shifts liability from users to platforms, making them legally responsible for third-party content.
No "safe harbor" protection (unlike the EU’s eCommerce Directive, which allowed platforms to avoid liability if they acted as neutral intermediaries).
Risk of over-compliance: Platforms may preemptively remove content to avoid fines (up to 6% of global revenue for VLOPs).

ECHR Concerns:

Article 10 (Freedom of Expression): Platforms acting as publishers (rather than neutral hosts) may be obliged to censor more aggressively.
Article 11 (Protection of Personal Data): The DSA requires mass surveillance-like monitoring of users, raising privacy concerns (GDPR vs. DSA tensions).

3. Disinformation & "Manipulative Behavior" – Blurring Legal & Ethical Lines

Problem:

The DSA defines "manipulative behavior" (e.g., dark patterns, algorithmic amplification of divisive content) but lacks clear legal standards.
Fact-checking & labeling requirements may lead to subjective enforcement (e.g., what counts as "misinformation"?).
Risk of political bias: Governments or fact-checkers may weaponize disinformation rules against opponents.

ECHR Concerns:

Article 10 (Freedom of Expression): The ECHR allows restrictions only if necessary in a democratic society.
- Disinformation bans must be narrowly tailored—broad definitions risk suppressing legitimate debate.
Article 19 (Prohibition of Abuse of Rights): States cannot use laws to silence dissent under the guise of "protecting democracy."

Examples:

Russian & Chinese State Media: Could be labeled "disinformation" for spreading government narratives (e.g., on Ukraine war).
Journalistic Whistleblowing: Investigative reports (e.g., on corporate or government wrongdoing) might be flagged as "manipulative."

4. Lack of Transparency & Algorithmic Accountability

Problem:

The DSA requires transparency reports on content moderation but does not mandate transparency in algorithmic decisions.
No independent oversight of AI-based moderation systems, leading to opaque censorship.

ECHR Concerns:

Article 8 (Right to Privacy): Users have no way to challenge automated decisions affecting their speech.
Article 13 (Right to Effective Remedy): If content is removed, users often cannot know why or appeal effectively.

5. Conflict with Fundamental Rights in Specific Cases

Problem:

Hate Speech vs. Free Speech: The DSA aligns with Article 20 of the EU Charter of Fundamental Rights (prohibiting hate speech), but national interpretations vary.
- Some EU countries (e.g., Germany, France) have stricter hate speech laws than others (e.g., Ireland, Netherlands).
- Risk of "forum shopping"—platforms may apply the most restrictive rules globally to avoid fines.
Encryption & End-to-End Security: The DSA does not ban encryption, but some interpretations could pressure platforms to weaken security for "moderation purposes."

ECHR Concerns:

Article 10 (Freedom of Expression) vs. Article 17 (Prohibition of Abuse): Courts must balance these rights—the DSA does not provide clear guidance on how to do so.
Article 52 (Proportionality): Restrictions must be the least intrusive means—the DSA’s one-size-fits-all approach may fail this test.

6. Enforcement & Judicial Oversight Gaps

Problem:

The European Commission (not national courts) has primary enforcement power, raising concerns about political influence.
No pre-removal judicial review in most cases (unlike some EU Member States).
Fines are imposed by the Commission, not independent bodies, leading to perceived lack of impartiality.

ECHR Concerns:

Article 6 (Right to a Fair Trial): Users and platforms should have effective legal recourse before content is removed.
Article 13 (Right to an Effective Remedy): The DSA’s administrative enforcement (without court involvement) may violate this right.

7. Impact on Journalism & Public Debate

Problem:

Independent journalism may be deplatformed if accused of "disinformation" (e.g., investigative outlets critical of governments).
Citizen journalism & activism (e.g., protests, whistleblowing) could be suppressed under broad definitions of "illegal content."

ECHR Concerns:

Article 10 (Freedom of Expression): Journalists have special protections under the ECHR.
Article 11 (Freedom of Assembly & Association): Online spaces are crucial for public discourse—restrictions must be justified.

Comparison with ECHR Standards

DSA Requirement	Potential ECHR Violation	Possible Justification
24-hour removal of illegal content	Article 10 (No prior judicial review)	Necessary to prevent harm (e.g., hate speech, terrorism)
Proactive content moderation (AI filtering)	Article 8 (Privacy), Article 10 (Chilling effect)	Risk of false positives and over-censorship
Disinformation & "manipulative behavior" bans	Article 10 (Overbroad restrictions)	Must be narrowly defined to avoid suppressing debate
Transparency reports (but no algorithmic transparency)	Article 13 (No effective remedy)	Users cannot challenge automated decisions
Fines up to 6% of revenue	Article 6 (No fair trial for platforms)	Enforcement by Commission (not courts)

Possible Solutions & Ongoing Debates

Stronger Judicial Oversight:
- Require court approval before removing sensitive content (e.g., political speech).
- Allow independent appeals mechanisms for users.
Narrower Definitions of "Illegal Content":
- Avoid vague terms like "manipulative behavior" that can be weaponized.
- Ensure proportionality in disinformation rules.
Algorithmic Transparency:
- Mandate explainability of AI moderation decisions.
- Allow third-party audits of content moderation systems.
Harmonization with ECHR Standards:
- Ensure DSA enforcement aligns with Article 10 (freedom of expression) and Article 6 (fair trial).
- Provide clear legal tests for when speech can be restricted.
Exemptions for Journalism & Whistleblowing:
- Protect investigative journalism from disproportionate takedowns.

Conclusion

While the DSA aims to combat illegal online activity, its broad definitions, lack of judicial safeguards, and risk of over-censorship pose serious threats to freedom of speech and human rights under the ECHR. The European Court of Human Rights (ECtHR) may soon be called upon to interpret whether the DSA complies with fundamental rights, particularly in cases involving:

Political censorship (e.g., suppression of opposition voices).
False positives in content moderation (e.g., legitimate speech removed).
Lack of effective remedies for users whose content is taken down.

Ongoing litigation (e.g., cases before the ECtHR or EU Court of Justice) will be crucial in shaping the DSA’s human rights compatibility.

Would you like a deeper analysis on any specific aspect (e.g., disinformation, hate speech, or judicial oversight)?

Koodi:

prompt eval time =     369.80 ms /   558 tokens (    0.66 ms per token,  1508.93 tokens per second)
       eval time =   39545.67 ms /  2365 tokens (   16.72 ms per token,    59.80 tokens per second)

"What are key problems in EU DSA regarding freedom of speech and other human rights in ECHR context?"
The Digital Services Act (DSA), adopted by the European Union in 2022 and fully applicable from February 2024, aims to regulate online platforms (e.g., social media, marketplaces) to ensure safety, transparency, and accountability. While the DSA introduces important safeguards, it has sparked debates—particularly regarding freedom of speech (Article 10 ECHR) and other human rights under the European Convention on Human Rights (ECHR). Below are the key problems and tensions in this context:

1. Overbroad Moderation Powers & Chilling Effects on Speech

Problem:

The DSA grants very large online platforms (VLOPs, e.g., Meta, X/Twitter, TikTok) significant discretion in content moderation, including the removal of legal but "harmful" content (e.g., disinformation, hate speech, or "manipulative" content).
Article 3(2) DSA requires platforms to act against "illegal content" (e.g., incitement to violence, child abuse) but also encourages them to remove "harmful" but legal content (e.g., "misleading" political ads, "toxic" debates).
Risk: Platforms may over-censor to avoid liability, leading to a "chilling effect" on legitimate speech (e.g., satire, political dissent, scientific debate).

ECHR Context:

Article 10 (Freedom of Expression) requires proportionality in restrictions. The ECHR has ruled that overly broad moderation policies (e.g., vague terms like "hateful conduct") can violate free speech (e.g., Verein gegen Tierfabriken v. Switzerland, Egenberger v. Austria).
Problem: The DSA does not clearly define what constitutes "harmful" content, leaving room for arbitrary enforcement by platforms or regulators.

2. Lack of Clear Legal Standards for "Systemic Risks"

Problem:

Article 54 DSA allows the European Commission to designate platforms as posing a "systemic risk" (e.g., due to disinformation, manipulation, or harm to minors). Once designated, these platforms face even stricter obligations, including:
- Independent audits of their algorithms.
- Transparency reports on content moderation.
- Risk mitigation measures (e.g., limiting viral spread of controversial content).
Risk:
- The term "systemic risk" is vague—could it apply to platforms hosting political debates or controversial but legal speech?
- Over-reach: If a platform is labeled as a "systemic risk," it may preemptively censor to avoid penalties, stifling debate.

ECHR Context:

The ECHR has emphasized that restrictions on speech must be "necessary in a democratic society" (Handyside v. UK).
Problem: The DSA’s systemic risk framework lacks clear, objective criteria, risking disproportionate interference with free expression.

3. Transparency vs. Privacy Concerns (Articles 8 & 10 ECHR)

Problem:

Article 38 DSA requires platforms to disclose more about content moderation decisions, including:
- Transparency reports on removals and appeals.
- Explanations for algorithmic recommendations (e.g., why certain content is amplified).
Risk:
- Privacy concerns (Article 8 ECHR): If platforms must disclose user-specific moderation decisions, individuals could face reputational harm or harassment.
- Chilling effect: Users may self-censor if they fear their moderation cases will be publicly exposed.

ECHR Context:

The ECHR balances freedom of expression (Art. 10) with privacy (Art. 8) (Peep Show Productions v. UK).
Problem: The DSA does not sufficiently address how to protect users from misuse of transparency data.

4. Appeal Mechanisms & Due Process (Article 6 ECHR - Right to Fair Trial)

Problem:

Article 14 DSA requires platforms to provide appeal mechanisms for content removals.
Risk:
- Lack of independence: Many platforms use internal review boards, which may lack judicial independence (a key requirement under Article 6 ECHR for fair proceedings).
- Delays & inefficiency: Some platforms have backlogged appeals, leaving users without recourse for weeks or months.

ECHR Context:

The ECHR has ruled that automated content removal without appeal violates Article 6 (SAS Institute v. Germany).
Problem: The DSA does not mandate external, independent oversight for appeals, risking unfair enforcement.

5. Disproportionate Impact on Marginalized Voices & Political Speech

Problem:

Article 25 DSA requires platforms to combat "disinformation" and "manipulative" content, including:
- Fact-checking labels on political content.
- Reducing virality of "controversial" posts.
Risk:
- Bias in enforcement: Smaller or less-resourced voices (e.g., independent journalists, activists) may be disproportionately affected by moderation policies.
- Political bias concerns: If platforms favor mainstream narratives, it could stifle dissent (e.g., during elections or protests).

ECHR Context:

The ECHR protects political speech as a core part of democracy (Lingens v. Austria).
Problem: The DSA’s anti-disinformation measures could be weaponized to suppress legitimate but unpopular views.

6. Conflict with Fundamental Rights in National Implementations

Problem:

Member states (e.g., Germany, France) have national laws (e.g., NetzDG, Avia Law) that go beyond DSA requirements, such as:
- Mandatory removal of "hate speech" within 24 hours (Germany).
- Fines for platforms if they fail to comply with national definitions of "illegal" content.
Risk:
- Inconsistent standards across the EU could lead to forum shopping (users/expressions being censored in one country but not another).
- Over-criminalization: Some national laws criminalize speech (e.g., "denying Holocaust" in some countries), which may conflict with Article 10 ECHR.

ECHR Context:

The ECHR has struck down overly broad hate speech laws (Ezelin v. France).
Problem: The DSA delegates enforcement to member states, risking fragmentation and human rights violations.

7. Lack of Strong Safeguards for Journalists & Public Watchdogs

Problem:

Article 40 DSA requires platforms to protect journalists and independent media from deplatforming.
Risk:
- Weak enforcement: Many platforms still suspend journalists (e.g., for "misinformation" claims).
- No clear definition of what constitutes a "journalist" in the digital age (e.g., citizen journalists, fact-checkers).

ECHR Context:

The ECHR has strong protections for press freedom (Goodwin v. UK).
Problem: The DSA does not explicitly shield investigative journalism from over-moderation.

Potential Solutions & Ongoing Debates

Stronger Judicial Oversight:
- The European Court of Human Rights (ECtHR) could clarify DSA’s compatibility with Articles 6 (fair trial), 8 (privacy), and 10 (free speech).
- Independent appeals bodies (not just platform-run) should be mandated.
Narrower Definitions of "Harmful" Content:
- The EU should avoid vague terms like "manipulative" or "toxic" and instead focus on clearly illegal content (e.g., incitement to violence).
Proportionality Safeguards:
- Risk assessments before designating a platform as a "systemic risk" should be transparent and subject to judicial review.
Harmonization of National Laws:
- The EU should preempt overly restrictive national laws (e.g., Germany’s NetzDG) to avoid fragmentation.
Protection for Marginalized Voices:
- Amplification policies should ensure diverse perspectives are not silenced by algorithmic bias.

Conclusion

While the DSA is a step toward regulating harmful online behavior, its vague definitions, broad moderation powers, and lack of strong judicial safeguards pose significant risks to freedom of speech and other ECHR rights. The European Commission, ECtHR, and EU member states must ensure that the DSA’s implementation respects proportionality, transparency, and due process to avoid unintended censorship and human rights violations.

Would you like a deeper analysis on any specific aspect (e.g., case law, national implementations)?

Koodi:

prompt eval time =     366.68 ms /   558 tokens (    0.66 ms per token,  1521.78 tokens per second)
       eval time =   57894.45 ms /  2092 tokens (   27.67 ms per token,    36.13 tokens per second)

finWeazel sanoi:
Edittiä. Testailin 5090:lla nemotron3:sta. On erittäin nopea, vie muistia noin 24GB. Lienee tehty niin että mahtuu 3090/4090:en muistiin eikä vaadi 5090:sta. Ratkaisi adventofcode 2025 luukut 1 ja 2 python koodilla. Luukku3:een teki kurantin näköisen ratkaisun, mutta ei pääse yli algoritmillisesta aidasta niin toteutus liian hidas. Ei riitä aika universumissa vastauksen saavuttamiseksi. Eka malli mikä osoittanut mun kokeiluissa sellaista suorituskykyä ja taitoa että voisi lokaalisti ajatella käyttävänsä pienemmissä koodauskysymyksissä.

Kokeilinpa nemotron3:a myös RX9070:lla. 4-bittisenä (IQ4_XS) se ei aivan mahdu VRAMiin mutta hybridinä toimii yllättävän hyvin. Isosta kontekstista ei varmaan tarvitse haaveilla, tässä 13k kuten myös edellisissä esimerkeissä.

"What are key problems in EU DSA regarding freedom of speech and other human rights in ECHR context?"

Quick Take‑away

Issue	Why it matters for freedom of expression & other ECHR rights	How the DSA (as drafted/interpreted) can clash with the ECHR	What the ECHR‑compatible safeguard would look like
Broad “risk‑assessment” obligations	Platforms must assess systemic risks of illegal content, disinformation, etc. The assessment is pre‑emptive and can force over‑cautious removal.	May lead to prior censorship – removing lawful speech to avoid a “risk” that may never materialise.	Require a strict proportionality test and an independent audit before any removal, with a clear “least‑restrictive‑means” standard.
Notice‑and‑take‑down (NTC) without robust due‑process	The DSA expands NTC to “illegal content” that is not necessarily illegal under national law, but under the platform’s own policies.	Chilling effect on legitimate expression; users can be silenced before they can contest a removal.	Mandatory judicial‑style review (right to a reasoned decision, right to appeal, right to restore content) within a short, fixed period.
Algorithmic transparency & “systemic risk” mitigation	Platforms must disclose how their recommendation engines work and how they moderate content.	The requirement can be interpreted as a gate‑keeping tool that forces platforms to expose trade‑ secrets or to adopt “safe‑by‑design” filters that may over‑filter.	Targeted transparency (only for high‑risk services) and independent verification of risk‑mitigation measures, with a carve‑out for legitimate trade‑secret protection.
Liability of “intermediaries” for user‑generated content	Platforms can be held liable for any illegal content they host, even if they acted diligently.	Over‑deterrence: platforms may err on the side of removal to avoid fines, curtailing lawful speech.	Safe‑harbour that is conditional on prompt removal after a court or independent tribunal decision, not merely a notice.
User‑right to an effective remedy (Art. 13 ECHR)	The DSA provides for “redress mechanisms” but they are largely administrative and may lack independence.	If the redress body is not truly independent, users cannot effectively challenge censorship.	Independent, impartial tribunals (or an EU‑wide “Digital Ombudsman”) with binding powers to order reinstatement and compensation.
Data‑protection & privacy (Art. 8 ECHR) interplay	Platforms must process personal data to enforce content rules; the DSA references GDPR but adds new obligations.	The cross‑border flow of data for moderation can clash with national privacy safeguards, leading to surveillance‑type profiling.	Data‑minimisation and purpose‑limitation clauses that tie moderation data strictly to the removal of illegal content, with strict retention limits.
Cross‑border enforcement & “European content” definition	The DSA seeks a single EU‑wide approach to illegal content, but national definitions of “illegal” differ.	A platform may be forced to remove content that is lawful in one Member State but illegal in another, creating a de‑facto censorship across borders.	Harmonised definition of “illegal” that respects national diversity, coupled with a “least restrictive” removal principle and a right to judicial review in the user’s home Member State.

1. The Legal Landscape: DSA + ECHR

Element	Description
Digital Services Act (DSA)	EU regulation (applicable from 2024) that sets a “due‑diligence” framework for online intermediaries – from “mere conduit” to “very large online platforms” (VLOPs). It imposes transparency, risk‑assessment, content‑moderation, and redress obligations.
European Convention on Human Rights (ECHR)	Binding treaty for all Council of Europe members; the EU is a party via the EU Charter of Fundamental Rights (Article 10 = freedom of expression; Article 8 = right to respect for private/family life; Article 6 = right to a fair trial; Article 13 = right to an effective remedy).
EU Charter of Fundamental Rights	Gives the same protections as the ECHR but is directly enforceable in the EU. The Charter explicitly states that “any limitation on the exercise of the freedoms and rights recognised in this Charter must be subject to the principles of proportionality and necessity.” (Article 52).

Key point: The DSA must be interpreted and applied in a manner that respects the Charter/ECHR. The European Court of Justice (ECJ) has repeatedly held that EU law must be read in line with fundamental rights (see Digital Rights Ireland, Google Spain etc.).

2. Core Problems for Freedom of Speech & Other Human Rights

2.1. Over‑broad “risk‑assessment” and “systemic risk” obligations

What the DSA asks: Very Large Online Platforms (VLOPs) must conduct systemic risk assessments and mitigation plans for illegal content, disinformation, manipulation, etc.
Why it threatens free speech:
- The assessment is pre‑emptive – platforms must anticipate potential risks and may over‑remove lawful content to avoid penalties.
- The language is vague (“risk of manipulation”, “disinformation”) which can be interpreted to cover politically controversial but lawful speech.
ECHR clash: Article 10 protects expression unless it is “necessary in a democratic society”. A blanket risk‑mitigation approach can be disproportionate and not narrowly tailored.

2.2. Notice‑and‑Take‑Down (NTC) without robust procedural safeguards

Current DSA model: Platforms must act on “illegal content” notices, but the definition of “illegal” can be self‑determined by the platform or by a national authority that may lack independence.
Rights impact:
- Users can be silenced instantly, with only a short window to contest.
- The process often lacks transparent reasoning, right to appeal, or effective judicial review.
ECHR clash: Article 10 requires any restriction to be prescribed by law and necessary. Administrative removal without a meaningful hearing fails the “prescribed by law” and “effective remedy” (Art. 13) tests.

2.3. Lack of effective, independent redress

DSA’s redress mechanism: National “Digital Services Coordinators” (DSCs) and “online platforms” must provide a “right to an effective remedy” (Art. 33).
Problem: Many DSCs are administrative bodies that may not be independent or judicially reviewable.
ECHR clash: Article 13 obliges states to provide an effective remedy for violations of Convention rights. If the remedy is administrative and non‑binding, it may be deemed insufficient.

2.4. Algorithmic opacity & “systemic risk” mitigation

DSA requirement: Platforms must disclose how their recommendation and content‑moderation algorithms work and mitigate systemic risks.
Free‑speech concern:
- The need for full transparency can force platforms to reveal trade secrets or to implement blunt filters that over‑block lawful content.
- Conversely, over‑transparency could enable state surveillance of user behaviour under the guise of “risk mitigation”.
ECHR clash: Article 8 (privacy) and Article 10 (expression) both require any interference to be necessary and proportionate. Blanket algorithmic disclosure may not meet that test.

2.5. Liability of “intermediaries” for user‑generated content

DSA’s liability regime: Platforms can be held jointly liable for illegal content they host, even if they acted diligently.
Free‑speech impact: The fear of massive fines (up to 6 % of global turnover) pushes platforms to over‑censor to avoid liability.
ECHR clash: The “chilling effect” caused by the threat of disproportionate sanctions can infringe Article 10, especially when the standard of proof is low (e.g., “reasonable belief” of illegality).

2.6. Cross‑border enforcement & divergent national definitions of “illegal”

DSA’s aim: Create a single EU‑wide rulebook for illegal content.
Reality: Member States retain different criminal codes (e.g., hate‑speech thresholds, defamation limits).
Risk: A platform may be forced to remove content that is perfectly lawful in its home country but deemed illegal elsewhere, leading to forum shopping and over‑censorship.
ECHR clash: The principle of proportionality requires that any restriction be prescribed by law that is accessible and foreseeable. A patchwork of national definitions can make the legal basis uncertain, violating the “prescribed by law” requirement.

2.7. Data‑protection & surveillance concerns

DSA & GDPR interaction: Platforms must process personal data to enforce content rules (e.g., storing IP addresses, user‑generated metadata).
Human‑rights angle:
- Mass data collection for moderation can amount to surveillance (Art. 8).
- Lack of purpose limitation may lead to function creep – data collected for “illegal content” being later used for other purposes (e.g., political profiling).
ECHR clash: Any interference with privacy must be necessary in a democratic society. Blanket data‑retention for moderation is rarely proportionate.

3. How the ECHR Framework Can Be Used to Shape DSA Interpretation

ECHR Principle	DSA‑relevant Issue	Judicial/Policy Remedy
Article 10 – Freedom of Expression	Over‑broad removal, vague “risk” language	Proportionality test: any restriction must be prescribed by law, necessary, and the least restrictive means. Require judicial review of removal decisions.
Article 13 – Effective Remedy	Administrative NTC, limited appeal	Establish independent tribunals (or an EU‑wide “Digital Ombudsman”) with binding powers to order reinstatement and award damages.
Article 8 – Right to Private Life	Mass data collection for moderation	Impose strict data‑minimisation, purpose limitation, and limited retention (e.g., delete moderation logs after 30 days unless needed for a criminal investigation).
Article 6 – Right to a Fair Trial	Platform‑driven “court‑like” decisions	Ensure that any de‑facto adjudication (e.g., removal of content) is subject to independent judicial oversight or at least subject to an independent review body.
Principle of Legal Certainty (Charter Art. 49)	Vague definitions of “illegal content”	Require clear, accessible definitions at EU level, with explicit reference to national law and transparent criteria for removal.
Principle of Proportionality (Charter Art. 52)	Systemic risk mitigation	Mandate risk‑based, tiered obligations: only high‑risk services must conduct full assessments; smaller platforms have lighter duties.

Case‑law illustration:
*The European Court of Human Rights (ECtHR) in M.S.S. v. Belgium (2011) held that automatic blocking of a website without prior judicial review violated Article 10. The same reasoning can be applied to automatic removal of user posts under the DSA if no prior independent review is provided.

4. Concrete Recommendations for Aligning the DSA with ECHR/Freedom‑of‑Speech Safeguards

Introduce a “Judicial‑Style Review” Requirement
- Every removal of “illegal content” must be accompanied by a reasoned decision that can be appealed to an independent tribunal within a strict time‑frame (e.g., 48 hours).
- The decision must include a proportionality analysis showing that the removal is the least restrictive means to achieve the legitimate aim.
Define “Illegal Content” with Precision
- The DSA should reference the specific national criminal provisions that render content illegal, and require explicit citation in removal notices.
- A centralised EU database of “illegal content” definitions could reduce divergent interpretations, but only if it respects national legal autonomy.
Strengthen Independence of Redress Bodies
- Require statutory independence, budgetary autonomy, and judicial oversight for Digital Services Coordinators (DSCs) and any “appeal panels”.
- Provide a binding right to compensation for users whose lawful content is wrongly removed.
Limit the Scope of Systemic‑Risk Assessments
- Apply tiered obligations: only VLOPs and “high‑risk” services must conduct full risk assessments; smaller platforms have lighter reporting duties.
- Require that risk‑mitigation measures be proportionate and subject to independent audit before implementation.
Algorithmic Transparency with Safeguards
- Mandate targeted transparency: disclose only the high‑level functioning of recommendation systems that affect content visibility, not the full source code.
- Allow trade‑secret exemptions but require independent verification that the disclosed information is sufficient to assess compliance with fundamental‑rights obligations.
Data‑Protection Limits for Moderation
- Impose a maximum retention period (e.g., 30 days) for moderation‑related data, unless a judicial order justifies longer storage.
- Require purpose‑specific processing: data collected for “illegal content” removal may not be used for advertising, profiling, or any other secondary purpose.
Cross‑Border Consistency Mechanism
- Create a “mutual‑recognition” protocol: a removal decision made by a competent authority in one Member State is recognised in others only if the underlying legal basis is identical and the removal complies with EU‑wide proportionality standards.
- Provide a “right to be heard” in the user’s home Member State before a removal is enforced across borders.
Periodic Independent Audits
- Require annual audits by accredited independent bodies (e.g., national data‑protection authorities, civil‑society watchdogs) to verify that platforms’ content‑moderation practices respect ECHR standards.
- Audit results must be publicly disclosed (with redactions for legitimate confidentiality).

5. Summary Diagram

text

Koodi:

                +----------------------------+
                |   DIGITAL SERVICES ACT (DSA) |
                +----------------------------+
                         |
   +----------------------+----------------------+
   |                      |                      |
   v                      v                      v
Freedom of Expression  Right to Effective      Right to Privacy &
   (Art.10 ECHR)        Remedy (Art.13)          Data‑Protection (Art.8)
   +-----------+        +-----------+        +-----------+
   | Over‑broad |        | No real   |        | Massive   |
   | risk‑assess|        | independent|       | data‑     |
   | / NTC      |        | redress    |        | collection|
   +-----------+        +-----------+        +-----------+
   +-----------------------------------------------------------+
   |                ECHR‑compatibility checks                  |
   +-----------------------------------------------------------+
   |  • Proportionality & necessity tests                     |
   |  • Independent judicial review of removals                |
   |  • Clear, accessible definitions of “illegal” content    |
   |  • Limited, purpose‑specific data retention               |
   |  • Tiered obligations (VLOPs vs. smaller platforms)       |
   +-----------------------------------------------------------+

6. Bottom Line

The DSA’s ambition to create a coherent EU framework for online platforms is commendable, but **its current draft contains several provisions that can unduly restrict freedom of expression and other ECHR‑protected rights.
The core tension is between the need to curb illegal content and systemic manipulation and **the requirement that any restriction be prescribed by law, necessary, and proportionate—the very standards set out in Article 10, 13, and 8 of the ECHR.
To bring the DSA into compliance with the ECHR, the regulation must embed robust procedural safeguards (independent review, clear definitions, limited data retention), ensure effective and independent remedies, and apply proportional, tiered obligations that respect the diversity of national legal systems while preserving a coherent EU‑wide rights‑based approach.

By tightening these safeguards, the EU can protect the digital public sphere without sacrificing the fundamental freedoms that the ECHR guarantees.

Koodi:

prompt eval time =     188.03 ms /    39 tokens (    4.82 ms per token,   207.42 tokens per second)
       eval time =   90977.15 ms /  4332 tokens (   21.00 ms per token,    47.62 tokens per second)

GPU:lle meni 44 kaikkiaan 52 LLM-kerroksesta:

Koodi:

load_tensors: offloading 44 repeating layers to GPU
load_tensors: offloaded 45/53 layers to GPU
load_tensors:   CPU_Mapped model buffer size =  2436.64 MiB
load_tensors:        ROCm0 model buffer size = 14883.13 MiB

Kun GPU toimii myös näyttökorttina, niin 16GB-kortilta ei oikein voi paljon yli 15GB käyttää LLM:lle, koska VRAMia pitää riittää myös grafiikalle.

finWeazel · eilen klo 17:56

5090:lla nemotron3:ssa(ollama) python koodia luodessa kuvan nopeus. Tämä on oikeasti nopea ja suhteellisen kyvykäs. Ehkä 2024 loppuvuoden pilven tasoa, about samoilla tonteilla kuin claude sonnet 3.5. Käytössä nemotron-3-nano:30b b725f1117407 24 GB

MacBook Pro m4 max nemotron3 nopeus alla. 5090:en melkein 4x nopeampi, tosin varmaan 10x enempi imee virtaa 5090:en.

takomo · eilen klo 19:09

finWeazel sanoi:
5090:lla nemotron3:ssa(ollama) python koodia luodessa kuvan nopeus. Tämä on oikeasti nopea ja suhteellisen kyvykäs. Ehkä 2024 loppuvuoden pilven tasoa, about samoilla tonteilla kuin claude sonnet 3.5. Käytössä nemotron-3-nano:30b b725f1117407 24 GB

Mikä kvantisointi tuossa on? Nopeus on kyllä ihan hyvä. Advent of Code #1:n ratkaisussa 9070 tikutteli vastausta n. 46 token/s.
Jos muistiin laittaa pari kerrosta enemmän, kontekstin voi kasvattaa 64k:ksi mutta nopeus laskee 36 token/s.

Miten nopeasti Devstral-Small-2 toimii 5090:lla? Vaikka se sopii 4-bittisenä 16GB VRAMiin, se on hitaampi kuin Nemotron3 (20-30 token/s) mutta se kuroo eroa kiinni merkittävästi kompaktimmilla vastauksilla. Nemotron3 vaikuttaa olevan turhankin innokas tarinoimaan.

Tuli huomattua, että Devstral-Small-2 tiesi llama.cpp:n käyttämien 4-bittisten kvantisointien erot ja ominaisuudet mutta Nemotron3 alkoi arvailla ja arvaili väärin.

finWeazel · eilen klo 19:50

takomo sanoi:
Mikä kvantisointi tuossa on? Nopeus on kyllä ihan hyvä. Advent of Code #1:n ratkaisussa 9070 tikutteli vastausta n. 46 token/s.
Jos muistiin laittaa pari kerrosta enemmän, kontekstin voi kasvattaa 64k:ksi mutta nopeus laskee 36 token/s.

Miten nopeasti Devstral-Small-2 toimii 5090:lla? Vaikka se sopii 4-bittisenä 16GB VRAMiin, se on hitaampi kuin Nemotron3 (20-30 token/s) mutta se kuroo eroa kiinni merkittävästi kompaktimmilla vastauksilla. Nemotron3 vaikuttaa olevan turhankin innokas tarinoimaan.

Tuli huomattua, että Devstral-Small-2 tiesi llama.cpp:n käyttämien 4-bittisten kvantisointien erot ja ominaisuudet mutta Nemotron3 alkoi arvailla ja arvaili väärin.

Oli se 4bit 24GB menevä malli. about 280token/s 5090:en ja macbook pro m4 max 76token/s

Tekoäly omalla koneella

1. Overbroad Content Moderation & Chilling Effects on Speech​

Problem:​

ECHR Concerns:​

Examples of Controversy:​

2. Disproportionate Liability for Hosting User-Generated Content​

Problem:​

ECHR Concerns:​

3. Disinformation & "Manipulative Behavior" – Blurring Legal & Ethical Lines​

Problem:​

ECHR Concerns:​

Examples:​

4. Lack of Transparency & Algorithmic Accountability​

Problem:​

ECHR Concerns:​

5. Conflict with Fundamental Rights in Specific Cases​

Problem:​

ECHR Concerns:​

6. Enforcement & Judicial Oversight Gaps​

Problem:​

ECHR Concerns:​

7. Impact on Journalism & Public Debate​

Problem:​

ECHR Concerns:​

Comparison with ECHR Standards​

Possible Solutions & Ongoing Debates​

Conclusion​

1. Overbroad Moderation Powers & Chilling Effects on Speech​

Problem:​

ECHR Context:​

2. Lack of Clear Legal Standards for "Systemic Risks"​

Problem:​

ECHR Context:​

3. Transparency vs. Privacy Concerns (Articles 8 & 10 ECHR)​

Problem:​

ECHR Context:​

4. Appeal Mechanisms & Due Process (Article 6 ECHR - Right to Fair Trial)​

Problem:​

ECHR Context:​

5. Disproportionate Impact on Marginalized Voices & Political Speech​

Problem:​

ECHR Context:​

6. Conflict with Fundamental Rights in National Implementations​

Problem:​

ECHR Context:​

7. Lack of Strong Safeguards for Journalists & Public Watchdogs​

Problem:​

ECHR Context:​

Potential Solutions & Ongoing Debates​

Conclusion​

Quick Take‑away​

1. The Legal Landscape: DSA + ECHR​

2. Core Problems for Freedom of Speech & Other Human Rights​

2.1. Over‑broad “risk‑assessment” and “systemic risk” obligations​

2.2. Notice‑and‑Take‑Down (NTC) without robust procedural safeguards​

2.3. Lack of effective, independent redress​

2.4. Algorithmic opacity & “systemic risk” mitigation​

2.5. Liability of “intermediaries” for user‑generated content​

2.6. Cross‑border enforcement & divergent national definitions of “illegal”​

2.7. Data‑protection & surveillance concerns​

3. How the ECHR Framework Can Be Used to Shape DSA Interpretation​

4. Concrete Recommendations for Aligning the DSA with ECHR/Freedom‑of‑Speech Safeguards​

5. Summary Diagram​

6. Bottom Line​

Uutiset

Statistiikka

Hinta.fi

Arvostamme yksityisyyttäsi

1. Overbroad Content Moderation & Chilling Effects on Speech

Problem:

ECHR Concerns:

Examples of Controversy:

2. Disproportionate Liability for Hosting User-Generated Content

Problem:

ECHR Concerns:

3. Disinformation & "Manipulative Behavior" – Blurring Legal & Ethical Lines

Problem:

ECHR Concerns:

Examples:

4. Lack of Transparency & Algorithmic Accountability

Problem:

ECHR Concerns:

5. Conflict with Fundamental Rights in Specific Cases

Problem:

ECHR Concerns:

6. Enforcement & Judicial Oversight Gaps

Problem:

ECHR Concerns:

7. Impact on Journalism & Public Debate

Problem:

ECHR Concerns:

Comparison with ECHR Standards

Possible Solutions & Ongoing Debates

Conclusion

1. Overbroad Moderation Powers & Chilling Effects on Speech

Problem:

ECHR Context:

2. Lack of Clear Legal Standards for "Systemic Risks"

Problem:

ECHR Context:

3. Transparency vs. Privacy Concerns (Articles 8 & 10 ECHR)

Problem:

ECHR Context:

4. Appeal Mechanisms & Due Process (Article 6 ECHR - Right to Fair Trial)

Problem:

ECHR Context:

5. Disproportionate Impact on Marginalized Voices & Political Speech

Problem:

ECHR Context:

6. Conflict with Fundamental Rights in National Implementations

Problem:

ECHR Context:

7. Lack of Strong Safeguards for Journalists & Public Watchdogs

Problem:

ECHR Context:

Potential Solutions & Ongoing Debates

Conclusion

Quick Take‑away

1. The Legal Landscape: DSA + ECHR

2. Core Problems for Freedom of Speech & Other Human Rights

2.1. Over‑broad “risk‑assessment” and “systemic risk” obligations

2.2. Notice‑and‑Take‑Down (NTC) without robust procedural safeguards

2.3. Lack of effective, independent redress

2.4. Algorithmic opacity & “systemic risk” mitigation

2.5. Liability of “intermediaries” for user‑generated content

2.6. Cross‑border enforcement & divergent national definitions of “illegal”

2.7. Data‑protection & surveillance concerns

3. How the ECHR Framework Can Be Used to Shape DSA Interpretation

4. Concrete Recommendations for Aligning the DSA with ECHR/Freedom‑of‑Speech Safeguards

5. Summary Diagram

6. Bottom Line