Ultra-Small Local LLM (backup No. 3)

Backup list
- 1 (2026-01-19 (Mon) 00:05:44)
- 2 (2026-01-19 (Mon) 01:04:26)
- 3 (2026-01-19 (Mon) 06:55:04)

Contents

LFM2.5 - The Definitive Ultra-Small Local LLM
- Spec overview
Trying it out
Using it with a ChatGPT-style GUI
Links

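LFM2.5 - The Definitive Ultra-Small Local LLM

This article was written in January 2026.

Believe it or not, an ultra-small LLM called LFM2.5 has been released.

https://www.youtube.com/watch?v=FyN7bPVTJ5M

Free and fast! In other words, for uses such as local monitoring, you can run AI as much as you want.

The quality is reasonably good, too! This is seriously impressive.

On top of that, a Japanese-specialized version, "LFM2.5-1.2B-JP", has also been released!

A ray of light for strictly confidential personal projects!

It is more open than OpenAI. Somebody, please donate to this company.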

Spec overview

About 1.2 billion parameters (very light)
Trained on 28 trillion tokens
Runs comfortably even on a GTX 1660 (6 GB VRAM)
Reportedly about 2x the speed of Qwen3 and Llama 3.2
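
As a rough sanity check on the 6 GB VRAM claim, the weight footprint can be estimated from the parameter count. This is only a sketch: it assumes exactly 1.2 billion parameters and 2 bytes per parameter (float16), and it ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name `weight_size_gb` is made up for this example.

```python
def weight_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 1.2e9  # assumption: "about 1.2 billion" taken as exactly 1.2e9

fp16_gb = weight_size_gb(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
fp32_gb = weight_size_gb(NUM_PARAMS, 4)  # float32: 4 bytes per parameter
print(f"float16: {fp16_gb:.1f} GB, float32: {fp32_gb:.1f} GB")
```

Roughly 2.4 GB in float16 is consistent with the download size reported in the command-line section and leaves headroom on a 6 GB card.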

Trying it out

Confirmed working on Windows with a GTX 1660.

Installing the required libraries

pip install transformers accelerate gradio --user
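
To confirm that the install actually landed in the Python environment you are running, a small check like the following may help. It is a sketch using only the standard library; the helper name `installed_versions` is invented for this example, and the package list simply mirrors the pip command above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["transformers", "accelerate", "gradio"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package shows as not installed even though pip reported success, pip and python are probably pointing at different interpreters.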

Installing GPU-enabled PyTorch

First, check whether the GPU is recognized.

nvidia-smi

If a CUDA Version is shown, you are good to go.
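
The same check can be scripted. The sketch below calls nvidia-smi with its CSV query options and returns an empty list when the tool is missing or fails. The helper names `parse_gpu_csv` and `query_gpus` are made up for this example, and the parser assumes one "name, memory" pair per line.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory' CSV lines into (name, memory) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        name, memory = [part.strip() for part in line.rsplit(",", 1)]
        gpus.append((name, memory))
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


print(query_gpus() or "No NVIDIA GPU detected by nvidia-smi")
```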

If the CPU-only build of PyTorch is installed, reinstall it:

pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 --user

Verify:

python
>>> import torch
>>> print(torch.cuda.is_available())
True

If True is printed, the GPU has been recognized!
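
The scripts in the following sections hardcode "cuda" and "float16". As a hedged alternative, the REPL check above can be turned into a helper that picks a device and dtype name. `pick_device` is a hypothetical name, and the CPU fallback to float32 is an assumption of this sketch, not something the original setup tested.

```python
import importlib.util


def pick_device():
    """Return (device, dtype_name), or None if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None  # install PyTorch first
    import torch

    if torch.cuda.is_available():
        return "cuda", "float16"  # GPU build with a usable GPU
    return "cpu", "float32"  # assumption: fall back to full precision on CPU


choice = pick_device()
if choice is None:
    print("PyTorch is not installed")
else:
    device, dtype_name = choice
    print(f"device={device}, dtype={dtype_name}")
```

The returned values could then be passed as device_map and torch_dtype when loading the model.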

First, try it from the command line

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the Japanese model onto the GPU in half precision.
model_id = "LiquidAI/LFM2.5-1.2B-JP"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="float16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Print tokens to the console as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "日本の首都はどこですか？"  # "What is the capital of Japan?"

# Wrap the prompt in the model's chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

# Sample with a low temperature; the streamer prints the reply as it arrives.
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    max_new_tokens=512,
    streamer=streamer,
)

The first run takes a few minutes because the model (about 2.4 GB) has to be downloaded.

Result:

日本の首都は**東京**です。

It worked!
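
To put a number on "fast" for your own hardware, generation can be timed. The sketch below is deliberately model-free so it runs anywhere: `tokens_per_second` and `fake_generate` are hypothetical names, and the fake generator merely sleeps instead of calling the model.

```python
import time


def tokens_per_second(generate_fn):
    """Time one call to generate_fn, which must return the number of new tokens."""
    start = time.perf_counter()
    new_tokens = generate_fn()
    elapsed = time.perf_counter() - start
    return new_tokens / elapsed if elapsed > 0 else float("inf")


def fake_generate():
    """Stand-in for model.generate: pretends to emit 50 tokens in about 0.05 s."""
    time.sleep(0.05)
    return 50


print(f"{tokens_per_second(fake_generate):.0f} tokens/s")
```

With the real model, the callable would run model.generate(...) and return output.shape[1] - input_ids.shape[1], the number of newly generated tokens.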

Using it with a ChatGPT-style GUI

With Gradio, you can chat in the browser.

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model once at startup so every chat turn reuses it.
model_id = "LiquidAI/LFM2.5-1.2B-JP"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="float16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def chat(message, history):
    # Rebuild the conversation; Gradio may pass dicts or (user, assistant) pairs.
    messages = []
    for h in history:
        if isinstance(h, dict):
            # Keep only the keys the chat template needs.
            messages.append({"role": h["role"], "content": h["content"]})
        else:
            messages.append({"role": "user", "content": str(h[0])})
            if h[1]:
                messages.append({"role": "assistant", "content": str(h[1])})
    messages.append({"role": "user", "content": message})

    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    output = model.generate(input_ids, max_new_tokens=512, temperature=0.3, do_sample=True)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)

gr.ChatInterface(chat, title="LFM2.5-JP Chat (GPU)").launch()

When you run it, a ChatGPT-style screen opens at http://127.0.0.1:7860.
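
The history handling inside chat can be pulled out into a pure function so it can be tested without loading the model or starting Gradio. This is a sketch, not Gradio's own API: `build_messages` is a hypothetical name, and it assumes the history is either a list of role/content dicts or a list of (user, assistant) pairs, the two shapes the script above handles.

```python
def build_messages(history, message):
    """Convert chat history plus the new user message into chat-template messages."""
    messages = []
    for turn in history:
        if isinstance(turn, dict):
            # Dict format: keep only the keys the chat template needs.
            messages.append({"role": turn["role"], "content": str(turn["content"])})
        else:
            # Pair format: (user_text, assistant_text)
            user_text, assistant_text = turn
            messages.append({"role": "user", "content": str(user_text)})
            if assistant_text:
                messages.append({"role": "assistant", "content": str(assistant_text)})
    messages.append({"role": "user", "content": message})
    return messages


# Both history shapes produce the same message list.
pair_history = [("Hello", "Hi there")]
dict_history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there"},
]
print(build_messages(pair_history, "How are you?") == build_messages(dict_history, "How are you?"))
```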

Because it runs entirely locally, you can use it with confidence even on projects that contain confidential information!

Links

Introducing LFM2.5: The Next Generation of On-Device AI | Liquid AI
- https://www.liquid.ai/blog/introducing-lfm2-5-the-next-generation-of-on-device-ai
LiquidAI/LFM2.5-1.2B-JP · Hugging Face
- https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP
LiquidAI/LFM2.5-1.2B-Instruct · Hugging Face
- https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
GIGAZINE article
- https://gigazine.net/gsc_news/en/20260107-lfm2-5-on-device-ai/