Marvis Turns Your Phone Into a Second Computer — And It Actually Understands Your Files

By 苍何
Original article on juejin.cn

Marvis represents a shift from remote desktop tools to intent-driven device agents. For developers who work across multiple machines or travel frequently, it collapses the gap between mobile and desktop workflows — and its local-first architecture addresses the privacy concerns that usually block adoption of such tools.

Summary

Tencent's Marvis is an AI agent for remote computer access. Rather than only mirroring a screen, it lets users operate a home or office computer from a phone through voice or text input: executing commands, running apps such as Codex, transferring files, and even generating PPT slide decks.

What sets Marvis apart from other device-control agents is its local intelligence layer. After authorization, it analyzes files on the computer — categorizing images by content, recognizing faces, sorting documents by type — and builds a local knowledge base that runs entirely on-device. Users can then find files by describing what they want, not by remembering folder paths.
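
To make "find files by describing what you want" concrete, here is a minimal sketch of a local index queried by description. It is not Marvis's implementation: the article says Marvis uses on-device models, whereas this sketch swaps in plain keyword overlap, and the names `FileRecord` and `LocalKnowledgeBase`, along with the sample paths and descriptions, are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str         # where the file lives on disk
    description: str  # text attached to the file (assumed to come from a local model)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LocalKnowledgeBase:
    """In-memory inverted index mapping each token to the paths that mention it."""

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = {}

    def add(self, record: FileRecord) -> None:
        """Index a file under every token of its description."""
        for token in tokenize(record.description):
            self._index.setdefault(token, set()).add(record.path)

    def search(self, query: str) -> list[str]:
        """Return paths ranked by how many query tokens they match."""
        scores: dict[str, int] = {}
        for token in tokenize(query):
            for path in self._index.get(token, ()):
                scores[path] = scores.get(path, 0) + 1
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(scores, key=lambda p: (-scores[p], p))
```

Everything stays in process memory, which mirrors the local-only idea: indexing a file described as "quarterly sales report for Q3" and then searching for "sales report" returns that file's path without any network call. A real system would replace the keyword overlap with model-generated descriptions or embeddings.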

Privacy is handled with on-device models and a local-only mode for sensitive work, though that mode requires significant hardware (16+ CPU cores, 32 GB RAM, 16+ GB VRAM). The agent supports cross-platform connections: Android to Mac, iOS to Windows.

Takeaways
Marvis lets users control a remote computer from a phone via voice or text commands, with real-time desktop preview.
It can execute tasks such as running Codex, installing plugins, transferring files, and generating PPT slide decks.
File recognition runs on local, on-device models; no data leaves the computer unless explicitly authorized.
A local-only mode runs all computation on-device but requires 16+ CPU cores, 32 GB RAM, 16+ GB VRAM, and driver version 535.0 or later.
Supports cross-platform connections: Android to Mac, iOS to Windows.
Built-in skills include system health checks, desktop cleanup, and file search by description.
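
The local-only hardware floor listed above can be expressed as a simple check. The sketch below is illustrative only: the thresholds come from the article, but `MachineSpec` and `unmet_requirements` are hypothetical names, the specs are passed in by hand rather than detected, and the article does not say which driver the 535.0 figure refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSpec:
    cpu_cores: int
    ram_gb: float
    vram_gb: float
    driver_version: tuple[int, int]  # (major, minor), e.g. (535, 0)


# Minimums for local-only mode as listed in the article.
MIN_CPU_CORES = 16
MIN_RAM_GB = 32
MIN_VRAM_GB = 16
MIN_DRIVER = (535, 0)


def unmet_requirements(spec: MachineSpec) -> list[str]:
    """Return a readable list of the requirements this machine fails."""
    problems: list[str] = []
    if spec.cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU cores: {spec.cpu_cores} < {MIN_CPU_CORES}")
    if spec.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {spec.ram_gb} GB < {MIN_RAM_GB} GB")
    if spec.vram_gb < MIN_VRAM_GB:
        problems.append(f"VRAM: {spec.vram_gb} GB < {MIN_VRAM_GB} GB")
    # Tuples compare element by element, so (550, 54) >= (535, 0).
    if spec.driver_version < MIN_DRIVER:
        found = ".".join(str(part) for part in spec.driver_version)
        required = ".".join(str(part) for part in MIN_DRIVER)
        problems.append(f"driver: {found} < {required}")
    return problems
```

An empty list means the machine clears every listed minimum; a typical 8-core laptop would come back with a single "CPU cores" entry even if its memory and GPU were sufficient.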
Conclusions

Marvis's key innovation is decoupling user location from device location; most agents still require the user to sit at the computer.

The local knowledge base transforms file retrieval from 'remembering where things are' to 'describing what you want,' which is a fundamental UX shift.

Privacy-first architecture (on-device models, ephemeral cloud data) could be a competitive advantage as enterprise users grow wary of cloud-only agents.

The hardware requirements for local mode are steep — this limits adoption to high-end machines and signals that truly private on-device AI still demands serious compute.

Marvis blurs the line between remote desktop and AI assistant, suggesting a future where the OS itself becomes an agent interface.

Concepts & terms
On-device AI model
An AI model that runs entirely on the user's local hardware rather than in the cloud. Because inference happens locally, data does not have to leave the device, which provides stronger privacy guarantees.
Local knowledge base
A searchable index of files and their contents built and stored on the local machine. Allows users to find files by describing content rather than navigating folders.
Vibe coding
A term coined by Andrej Karpathy in 2025 for a relaxed, flow-state approach to coding in which the developer describes intent and an AI model handles the implementation.