Every time I open the ChatGPT app on my iPhone, I have a low-key anxious feeling that everything I type in the app will be saved in perpetuity and could be handed over to the authorities.
Sam Altman even mentioned on a Theo Von podcast:
“So if you go talk to ChatGPT about your most sensitive stuff, and then there is … a lawsuit or whatever, … we could be required to produce that.”
As I discussed before, I don’t want to self-censor in any way, as it changes how and what I feel free to think.
So imagine the feeling I had today when I found out that I can run a local LLM on my iPhone that is somewhat comparable to the frontier labs’ models. I want to keep my conversations to myself as much as possible, so I was immediately intrigued. On-device inference without any API calls gives me a feeling of control: no external server or cloud logging.
First I had to download an app that makes these models available on my iPhone. The app is “Locally AI” in the App Store.

After downloading the app I saw the Apple Foundation model: an on-device model by Apple, pre-installed on the iPhone and used by Siri to answer simple questions.
But there was also a large list of open-source models that could be downloaded, from the newest Qwen 3.5 models in several variants to Google’s Gemma 3n models.
I started by downloading the Qwen 3.5 model with 2 billion parameters, which should run smoothly on the iPhone 17 Pro.
After downloading the model I could start asking questions. I entered a basic “Hello.” The model replied quickly: “Hello! How can I help you today?”
I asked it: “What is a good way to test the quality of a good local LLM?” and the answer was pretty much what I expected from an LLM response to this question.

I am not going to compare quality here, as that is not what this experiment is about. The experiment is whether, over the next two weeks, a local model on my iPhone can answer about 80% of the quick factual questions I ask during the day. Not coding or brainstorming; that might come later.
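The 80% target is easy to keep honest with a simple tally: log each quick factual question, note whether the local model’s answer was good enough, and compute the hit rate at the end. A minimal sketch, assuming a hypothetical `ExperimentLog` helper (nothing here comes from the Locally AI app itself):

```python
from dataclasses import dataclass, field

# Hypothetical tally for the two-week experiment: record each quick
# factual question and whether the local model's answer sufficed.
@dataclass
class ExperimentLog:
    # Each entry is a (question, answered_locally) pair.
    entries: list = field(default_factory=list)

    def record(self, question: str, answered_locally: bool) -> None:
        self.entries.append((question, answered_locally))

    def local_hit_rate(self) -> float:
        # Fraction of questions the local model handled well enough.
        if not self.entries:
            return 0.0
        hits = sum(1 for _, ok in self.entries if ok)
        return hits / len(self.entries)

    def meets_target(self, target: float = 0.8) -> bool:
        # The experiment succeeds if ~80% of questions stay on-device.
        return self.local_hit_rate() >= target

log = ExperimentLog()
log.record("What is the capital of Australia?", True)
log.record("How many grams in an ounce?", True)
log.record("Which year did the Berlin Wall fall?", False)
print(f"Local hit rate: {log.local_hit_rate():.0%}")  # prints "Local hit rate: 67%"
```

Even a note on paper would do; the point is simply to count rather than trust the feeling that the local model is "mostly fine."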
And so far, so good. Of course, only time will tell whether muscle memory takes me back to ChatGPT, or how much lower answer quality I accept as okay, knowing I run the model locally on my device without any privacy concerns.
Key insight
I hold strong principles. As long as the performance of a local LLM is comparable, holding on to them is easy. When the quality gap widens, my privacy becomes negotiable. Dependence on online models is not forced. It is chosen when the alternatives feel inferior.