TL;DR: Got Immich running with CLIP-based semantic search on a Raspberry Pi 5 using the AXera AX8850 NPU. Chinese language search works surprisingly well thanks to the ViT-L-14-336-CN model. Setup took about 30 minutes once I figured out the ML server configuration.
What is Immich?
Immich is an open-source, self-hosted photo and video management platform. Think Google Photos, but you control the data. It supports automatic backup, intelligent search, and cross-device access.
Why This Setup?
I wanted to test AI-accelerated image search on edge hardware. The AXera AX8850 NPU on our M5Stack development board provides hardware acceleration for the CLIP models, making semantic search actually usable on a Pi.
Hardware Setup
Raspberry Pi 5
M5Stack AX8850 AI Module (provides NPU acceleration)
Standard Pi power supply and storage
Step-by-Step Deployment
1. Download the Pre-built Package
Grab the optimized Immich build from HuggingFace:
git clone https://huggingface.co/AXERA-TECH/immich
Note: You'll need Git LFS installed before cloning; the Docker image tarball is stored as a Git LFS object, so a plain clone without it only fetches small pointer files.
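If Git LFS is missing, installing it on Raspberry Pi OS (Debian-based) looks like this; the git-lfs package name is standard on Debian:
sudo apt update && sudo apt install -y git-lfs
git lfs install
# If you cloned before installing LFS, fetch the real files afterwards:
# cd immich && git lfs pull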
What you get:
m5stack@raspberrypi:~/rsp/immich $ ls -lh
total 421M
drwxrwxr-x 2 m5stack m5stack 4.0K Oct 10 09:12 asset
-rw-rw-r-- 1 m5stack m5stack 421M Oct 10 09:20 ax-immich-server-aarch64.tar.gz
-rw-rw-r-- 1 m5stack m5stack 0 Oct 10 09:12 config.json
-rw-rw-r-- 1 m5stack m5stack 7.6K Oct 10 09:12 docker-deploy.zip
-rw-rw-r-- 1 m5stack m5stack 104K Oct 10 09:12 immich_ml-1.129.0-py3-none-any.whl
-rw-rw-r-- 1 m5stack m5stack 9.4K Oct 10 09:12 README.md
-rw-rw-r-- 1 m5stack m5stack 177 Oct 10 09:12 requirements.txt
2. Load the Docker Image
cd immich
docker load -i ax-immich-server-aarch64.tar.gz
If Docker isn't installed, you'll need to set that up first.
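If you do need Docker, the official convenience script works on Raspberry Pi OS (this is Docker's generic installer, nothing specific to this package), and a quick listing confirms the load succeeded:
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER   # optional: run docker without sudo (re-login required)
docker image ls | grep immich   # the loaded ax-immich-server image should appear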
3. Configure the Environment
unzip docker-deploy.zip
cp example.env .env
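I won't reproduce this package's example.env verbatim, but upstream Immich's .env covers roughly these knobs, so it's worth a quick review before starting the stack (variable names below are from upstream Immich and may differ here):
cat .env   # review before bringing the stack up
# Typical upstream entries:
# UPLOAD_LOCATION=./library   # where photos/videos land on the host
# DB_PASSWORD=postgres        # Postgres credentials for immich_postgres
# DB_USERNAME=postgres
# DB_DATABASE_NAME=immich
# TZ=Asia/Shanghai            # timezone for timestamps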
4. Start the Core Services
docker compose -f docker-compose.yml -f docker-compose.override.yml up -d
Success looks like this:
[+] Running 3/3
✔ Container immich_postgres Started 1.0s
✔ Container immich_redis Started 0.9s
✔ Container immich_server Started 0.9s
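Before moving on, a quick sanity check helps; the ping path below is upstream Immich's health endpoint and 2283 is upstream's default port mapping, so treat both as assumptions for this build:
docker compose ps                            # all three containers should be "running"
docker logs immich_server --tail 20          # watch for migration or startup errors
curl http://localhost:2283/api/server/ping   # upstream Immich replies {"res":"pong"}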
5. Set Up the ML Service (The Interesting Part)
The ML service handles the AI-powered image search. It runs directly on the host (outside Docker) so it can talk to the NPU.
Create and activate a virtual environment:
python -m venv mich
source mich/bin/activate
Install dependencies:
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
pip install -r requirements.txt
pip install immich_ml-1.129.0-py3-none-any.whl
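Before launching, it's worth confirming the NPU runtime imports cleanly; the module name axengine comes from the pyaxengine wheel installed above:
python -c "import axengine; print('axengine OK')"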
Launch the ML server:
python -m immich_ml
You should see:
[10/10/25 09:50:12] INFO Listening at: http://[::]:3003 (8698)
[INFO] Available providers: ['AXCLRTExecutionProvider']
[10/10/25 09:50:16] INFO Application startup complete.
The AXCLRTExecutionProvider confirms the NPU is being used.
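You can also poke the ML server directly. Upstream Immich's machine-learning service exposes a /ping health route, which I'm assuming this build keeps:
curl http://localhost:3003/ping   # expected reply: "pong"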
Web Interface Configuration
Initial Setup
Navigate to the Immich web UI at http://<your-pi-ip>:2283 (the default port mapping from Immich's compose file, e.g., http://192.168.20.27:2283). Note that port 3003 is the ML server, not the web UI.
The first visit prompts you to create an admin account; credentials are stored locally.
<img src="https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/linux/ax8850_card/images/immich1.png" width="95%" />
Configure the ML Server
This step is critical: the web interface needs to know where your ML service is running.
Go to Settings → Machine Learning
Set the URL to your Pi's IP and port 3003: http://192.168.20.27:3003
Choose your CLIP model based on language:
Chinese search: ViT-L-14-336-CN__axera
English search: ViT-L-14-336__axera
<img src="https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/linux/ax8850_card/images/immich4.png" width="95%" />
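One gotcha: immich_server runs inside a container, so localhost there is the container, not the Pi; always use the Pi's LAN IP in this URL. If searches fail, you can check reachability from the container's point of view (assuming curl is available inside the image):
docker exec immich_server curl -s http://192.168.20.27:3003/ping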
First-Time Index
Important: You need to manually trigger the initial indexing.
Go to Administration → Jobs
Find "SMART SEARCH"
Click "Run Job" to process your uploaded images
<img src="https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/linux/ax8850_card/images/immich6.png" width="95%" />
Testing Image Search
Upload some photos, wait for indexing to complete, then try semantic searches:
<img src="https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/linux/ax8850_card/images/immich7.png" width="95%" />
The search is concept-based: you can search for "sunset" or "dogs playing" and it will surface relevant images even if those words appear nowhere in the filenames or metadata.
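If you'd rather bulk-upload from the command line than drag files into the browser, Immich's official CLI handles it (requires Node.js; generate the API key under Account Settings → API Keys; the server URL assumes the 2283 default port):
npm install -g @immich/cli
immich login http://192.168.20.27:2283/api YOUR_API_KEY
immich upload --recursive ~/Pictures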
Technical Notes
The NPU acceleration makes CLIP inference fast enough for interactive search
Chinese language support is genuinely good with the CN model
The ML server runs independently, so you can restart it without affecting the main Immich service; see the systemd sketch below for keeping it running across reboots
Docker handles PostgreSQL and Redis automatically
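A minimal systemd unit keeps the ML server alive across reboots; the paths below assume the venv from step 5 lives at /home/m5stack/rsp/immich/mich, so adjust User and the paths to your layout:
sudo tee /etc/systemd/system/immich-ml.service >/dev/null <<'EOF'
[Unit]
Description=Immich ML server (AXera AX8850 NPU)
After=network.target

[Service]
User=m5stack
WorkingDirectory=/home/m5stack/rsp/immich
ExecStart=/home/m5stack/rsp/immich/mich/bin/python -m immich_ml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now immich-ml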
Why M5Stack in This Stack?
The AX8850 NPU module provides the hardware acceleration that makes this practical on a Pi. Without it, running CLIP inference would be too slow for interactive use. We're working on more edge AI applications that leverage this acceleration - this Immich setup is a good real-world test case.
Questions about the setup or the NPU integration? Happy to dig into specifics.