Ultra-small local LLMs

* Contents [#s735551b]
#contents

* LFM2.5 - The definitive ultra-small local LLM [#mbcc441f]

This article was written in January 2026.

Believe it or not, an ultra-small LLM called LFM2.5 has been released.

https://www.youtube.com/watch?v=FyN7bPVTJ5M

Free and fast!
That means you can use AI as much as you like, for example for monitoring tasks that run locally.

The quality is pretty decent, too! This is huge.

On top of that, a Japanese-specialized version, "LFM2.5-1.2B-JP", has also been released!!!

A ray of light for strictly confidential personal projects!!!

They are more open than OpenAI. Somebody, please donate to this company.

** Spec overview [#n5e35271]

- About 1.2 billion parameters (very lightweight)
- Trained on 28 trillion tokens
- Runs comfortably even on a GTX 1660 (6GB VRAM)
- About twice as fast as Qwen3 and Llama 3.2
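You can sanity-check the 6GB claim with a rough estimate: the weight footprint is the parameter count times the bytes per parameter. This is a minimal sketch that assumes the ~1.2 billion figure above. It counts the weights only. The KV cache, activations and CUDA context add some overhead on top.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Size of the model weights alone, in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 1.2e9  # "about 1.2 billion parameters", as stated above

for dtype_name, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype_name}: {weight_memory_gb(NUM_PARAMS, nbytes):.1f} GB")
```

This prints 4.8 GB for float32 and 2.4 GB for float16. The float16 figure matches the download size mentioned later and leaves headroom on a 6GB card.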

* Trying it out [#dfd0f2b4]

Tested on Windows with a GTX 1660.

** Installing the required libraries [#daf37885]

 pip install transformers accelerate gradio --user
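To confirm that the installs went through, you can list the installed versions with Python's standard library. This is a minimal sketch. The package list is an assumption, and torch only shows up after the next step.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages used in this article (torch is installed in the next step).
PACKAGES = ["transformers", "accelerate", "gradio", "torch"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```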

** Installing GPU-enabled PyTorch [#zfc41e34]

First, check that the GPU is recognized.

 nvidia-smi

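If "CUDA Version" appears in the output, the driver is working. This number is the highest CUDA version the installed driver supports.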
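You can also get the GPU details in a machine-readable form through nvidia-smi's query mode. This small sketch shells out to it. The helper names `parse_gpu_line` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Split one CSV line from nvidia-smi into (name, total memory, driver version)."""
    name, memory_total, driver_version = (field.strip() for field in line.split(","))
    return name, memory_total, driver_version


def query_gpus():
    """Return one tuple per GPU, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    return [parse_gpu_line(line) for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(query_gpus())
```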

If the CPU-only build of PyTorch is installed, replace it with the CUDA build:

 pip uninstall torch torchvision torchaudio -y
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 --user

Check:

 python
 >>> import torch
 >>> print(torch.cuda.is_available())
 True

If it prints True, PyTorch can see the GPU!
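A more detailed version of this check also tells you why it printed False. This is a minimal sketch, and `cuda_summary` is a hypothetical helper name. If `torch.version.cuda` is None, you have a CPU-only build and need the reinstall above. If it is set but `cuda_available` is False, look at the driver instead.

```python
import importlib.util


def cuda_summary():
    """Collect what PyTorch knows about CUDA, without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built with; None means a CPU-only build.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        info["total_vram_gb"] = round(props.total_memory / 1e9, 1)
    return info


if __name__ == "__main__":
    for key, value in cuda_summary().items():
        print(f"{key}: {value}")
```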

** Trying it from the command line first [#i27bcff4]

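 from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
 
 model_id = "LiquidAI/LFM2.5-1.2B-JP"
 # Load the weights in float16 onto the GPU (device_map requires accelerate).
 # Turing GPUs such as the GTX 1660 have no native bfloat16, so float16 fits here.
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
     device_map="cuda",
     torch_dtype="float16",
 )
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 # Print tokens to the console as they are generated, without echoing the prompt.
 streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
 
 prompt = "日本の首都はどこですか？"  # "What is the capital of Japan?"
 
 # Wrap the prompt in the model's chat template and tokenize it.
 input_ids = tokenizer.apply_chat_template(
     [{"role": "user", "content": prompt}],
     add_generation_prompt=True,
     return_tensors="pt",
     tokenize=True,
 ).to(model.device)
 
 # Sample a reply; the streamer prints it while it is being generated.
 output = model.generate(
     input_ids,
     do_sample=True,
     temperature=0.3,
     max_new_tokens=512,
     streamer=streamer,
 )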

The first run downloads the model (about 2.4GB), which takes a few minutes.

Result (meaning "The capital of Japan is Tokyo."):
 日本の首都は**東京**です。

It works!
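To check the speed claim on your own hardware, time a generate call and divide by the number of new tokens. This is a minimal sketch, and `tokens_per_second` is a hypothetical helper. The usage lines assume `model` and `input_ids` from the script above. Run it more than once, because the first call includes warm-up. Compare models under the same settings.

```python
import time


def tokens_per_second(generate_fn):
    """Time generate_fn, which must return the number of newly generated tokens."""
    start = time.perf_counter()
    n_new_tokens = generate_fn()
    elapsed = time.perf_counter() - start
    rate = n_new_tokens / elapsed if elapsed > 0 else float("inf")
    return n_new_tokens, elapsed, rate


# Usage with `model` and `input_ids` from the script above (greedy decoding, no streamer):
# n, sec, tps = tokens_per_second(
#     lambda: model.generate(input_ids, max_new_tokens=256, do_sample=False).shape[1]
#     - input_ids.shape[1]
# )
# print(f"{n} tokens in {sec:.2f} s = {tps:.1f} tokens/s")
```

Counting `shape[1] - input_ids.shape[1]` measures the tokens actually produced. This matters when generation stops early at an end-of-sequence token.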

* Using it through a ChatGPT-style GUI [#g5a3d0e2]

With Gradio, you can chat with the model in your browser.

 import gradio as gr
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 model_id = "LiquidAI/LFM2.5-1.2B-JP"
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
     device_map="cuda",
     torch_dtype="float16",
 )
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 
 def chat(message, history):
     # Rebuild the whole conversation in chat-template form. Depending on the
     # Gradio version/settings, history is a list of {"role", "content"} dicts
     # or a list of [user, assistant] pairs, so handle both.
     messages = []
     for h in history:
         if isinstance(h, dict):
             # Keep only the keys the chat template uses (Gradio may add extra ones).
             messages.append({"role": h["role"], "content": h["content"]})
         else:
             messages.append({"role": "user", "content": str(h[0])})
             if h[1]:
                 messages.append({"role": "assistant", "content": str(h[1])})
     messages.append({"role": "user", "content": message})
 
     input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
     output = model.generate(input_ids, max_new_tokens=512, temperature=0.3, do_sample=True)
     # Decode only the newly generated tokens, not the prompt.
     return tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
 
 gr.ChatInterface(chat, title="LFM2.5-JP Chat (GPU)").launch()

When you run it, Gradio serves a ChatGPT-style chat page at http://127.0.0.1:7860. Open that URL in your browser.
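The fiddly part of `chat()` is the history handling. Depending on the Gradio version and settings, history arrives either as [user, assistant] pairs or as role/content dicts. Pulled out into a pure function, it can be tested without loading the model or starting Gradio. This is a sketch, and the name `to_chat_messages` is hypothetical.

```python
def to_chat_messages(history, message):
    """Turn Gradio ChatInterface history plus the new message into chat-template messages."""
    messages = []
    for turn in history:
        if isinstance(turn, dict):
            # Dict style: {"role": ..., "content": ...}; drop any extra keys.
            messages.append({"role": turn["role"], "content": turn["content"]})
        else:
            # Pair style: [user_text, assistant_text_or_None].
            user_text, assistant_text = turn[0], turn[1]
            messages.append({"role": "user", "content": str(user_text)})
            if assistant_text:
                messages.append({"role": "assistant", "content": str(assistant_text)})
    messages.append({"role": "user", "content": message})
    return messages


tuples_history = [["Hello", "Hi! How can I help?"]]
dict_history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?", "metadata": None},
]
# Both history styles should produce the same message list.
assert to_chat_messages(tuples_history, "Thanks") == to_chat_messages(dict_history, "Thanks")
print(to_chat_messages(tuples_history, "Thanks"))
```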

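Inference runs entirely on your own machine, and only the initial model download needs the internet. So you can use it with confidence even on projects that involve confidential information!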
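Once the model is cached, you can tell both libraries through environment variables not to reach out over the network. This is a sketch that assumes the model has already been downloaded once. `HF_HUB_OFFLINE` makes huggingface_hub/transformers use only the local cache, and `GRADIO_ANALYTICS_ENABLED` turns off Gradio's usage analytics. Put these lines at the very top of the script, before the imports.

```python
import os

# Set these before importing transformers / huggingface_hub / gradio;
# huggingface_hub reads HF_HUB_OFFLINE when it is imported.
os.environ["HF_HUB_OFFLINE"] = "1"  # use only the local model cache, never contact the Hub
os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"  # disable Gradio's usage analytics

# ...then the rest of the GUI script as above:
# import gradio as gr
# from transformers import AutoModelForCausalLM, AutoTokenizer
```

Also leave `launch()` without `share=True`. By default, Gradio only listens on 127.0.0.1.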

* Links [#i04ed8c9]

- Introducing LFM2.5: The Next Generation of On-Device AI | Liquid AI
-- https://www.liquid.ai/blog/introducing-lfm2-5-the-next-generation-of-on-device-ai
- LiquidAI/LFM2.5-1.2B-JP · Hugging Face
-- https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP
- LiquidAI/LFM2.5-1.2B-Instruct · Hugging Face
-- https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
- GIGAZINE article
-- https://gigazine.net/gsc_news/en/20260107-lfm2-5-on-device-ai/


* UI-TARS [#yd99aac0]
ByteDance, the company behind TikTok, has released an AI agent.

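- An AI agent that operates your PC automatically
- Handles almost any PC task: configuring VS Code, booking hotels, working with files, and more
- Supports Windows / macOS / Linux
- Fully open source under Apache 2.0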

https://github.com/bytedance/UI-TARS-desktop

https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B