Angus Hally — AI Product Management

Every founder I meet with an AI product is convinced their hard part is the model. By the time they have shipped past the demo, they have discovered that the hard part is everything around it — the clinical advisors who decide whether a response is safe, the app-store reviewer who decides whether the screenshot is misleading, the pricing decision that decides whether a number implies a medical claim, and the version pipeline that decides whether last Tuesday’s release is better than this Tuesday’s. The model is twenty percent of the work and eighty percent of the conversation.

I am the co-founder and COO of HeyLina, an emotionally intelligent AI mobile product on iOS and Android. I do the eighty percent. My co-founder builds the model and the application; I make sure everything around it — including the company itself — compounds week on week. This page is the working paper on that practice, partly so I remember what I have learned, and partly so people running comparable products can recognise themselves in it and write to me.

— Fig. 01 · The four boxes I run

The AI-PM stack at HeyLina

Clinical & safety advisors. I run the clinical advisor relationships directly, adjudicating ambiguous responses against a documented safety posture on a regular cadence.
App-store operations. Versioned screenshots, response-policy docs, and the dialogue we have with reviewers every release across two app stores.
Compliance & eval trail. Lina Lab — our prompt-evaluation engine — gives an audit-ready record: LLM-as-judge with full provenance (judge type, model, prompt version, rater id), multi-scope rubrics, version-pinned prompts, and a promotion pipeline.
Pricing & positioning. The single biggest lever on whether users perceive us as wellness or as medical — the highest-stakes framing decision we make.

— Each box has its own cadence, ritual, and document. The model is only one input to any of them.
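As a rough illustration of what an audit-ready judgement record with full provenance could look like, here is a minimal Python sketch. Only the four provenance fields (judge type, model, prompt version, rater id) and the four scope names come from this page; the class names, the `rubric_item` and `score` fields, and the JSON serialisation are hypothetical choices made for the example, not a description of Lina Lab's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Scope(Enum):
    """Evaluation scopes named on this page."""
    MESSAGE = "message"
    TURN = "turn"
    CONVERSATION = "conversation"
    VARIANT = "variant"


@dataclass(frozen=True)
class JudgementRecord:
    """One rubric judgement plus the provenance needed to audit it later.

    Hypothetical structure: field names are assumptions for illustration.
    """
    scope: Scope
    rubric_item: str      # assumed: which rubric line was scored
    score: float          # assumed: numeric score for that line
    judge_type: str       # e.g. "llm" or "human"
    model: str            # judge model identifier
    prompt_version: str   # pinned version of the prompt under test
    rater_id: str         # who or what produced the judgement

    def to_audit_line(self) -> str:
        """Serialise to one JSON line suitable for an append-only log."""
        data = asdict(self)
        data["scope"] = self.scope.value  # store the plain string, not the Enum
        return json.dumps(data, sort_keys=True)


record = JudgementRecord(
    scope=Scope.MESSAGE,
    rubric_item="safety",
    score=0.9,
    judge_type="llm",
    model="example-judge-model",
    prompt_version="v12",
    rater_id="judge-01",
)
audit_line = record.to_audit_line()
```

Freezing the dataclass is a deliberate choice in this sketch: a record that cannot be mutated after creation is easier to defend as an audit trail.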

— § 01 The case for an AI-PM role at all

AI products do not fail because the model is wrong. They fail because the company around the model is not yet built. The PM role I am describing is closer to the early role of a regulatory-affairs lead at a biotech, or a head of operations at a clinical-trials site, than it is to anything in consumer software. It is a job that did not exist three years ago and that almost nobody is writing about in public — partly because the people doing it are too busy doing it.

The discipline I draw on most is not product management as it is taught at FAANGs. It is the operating-cadence work I did at Accenture and the data-valuation work I did at Anmut — both of which are really about turning ambiguous information into a decision a team can act on in a given week. That muscle transfers almost without modification to AI product work, because the modal problem of an AI PM is exactly that: “what should we decide on Monday given a model whose behaviour we only partially understand?”

— § 02 What I actually do in a week

Monday is the pipeline review — last week’s release notes, this week’s fixture additions, anything the eval harness has flagged. The middle of the week is clinical-safety work and app-store ops: flagged transcripts adjudicated against the rubric, and whatever the regulator-of-the-week has decided to write about. The back half is the long-horizon work — pricing experiments, advisor relationships, and the compliance documents nobody asks for until they suddenly do.

There is no calendar slot for “make the product better.” The product gets better because the four boxes above each get one degree more rigorous every week. That, and a measurable eval loop underneath it, is the whole game.

— § 03 Why the eval engine is the credential

Most AI products are run on vibes; this one isn’t. Underneath HeyLina sits Lina Lab, a prompt-evaluation engine I personally architected and ship into: a versioned prompt catalog, a variant-comparison runtime, and a multi-scope eval framework (message, turn, conversation, variant). It exists because the commercial case for iteration velocity required it, not because engineering taste alone demanded it. The dev work is the credential: I read PRs, write PRs, and ship PRs across mobile, backend, and the eval layer.
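To make the variant-comparison and promotion idea concrete, here is a minimal Python sketch. It assumes each prompt variant has been scored as a list of `(scope, score)` pairs, and that a candidate is promoted only if its mean score is at least as good as the baseline on every scope both were scored on. The function names, the `min_gain` parameter, and the promotion rule itself are illustrative assumptions, not Lina Lab's actual logic.

```python
from statistics import mean


def mean_by_scope(scores):
    """Group (scope, score) pairs and return the mean score per scope."""
    grouped = {}
    for scope, score in scores:
        grouped.setdefault(scope, []).append(score)
    return {scope: mean(values) for scope, values in grouped.items()}


def should_promote(baseline, candidate, min_gain=0.0):
    """Hypothetical promotion gate.

    Promote only if the candidate beats the baseline by at least
    `min_gain` on every scope that both variants were scored on.
    """
    base = mean_by_scope(baseline)
    cand = mean_by_scope(candidate)
    shared = base.keys() & cand.keys()
    if not shared:
        return False  # nothing comparable, so never promote blindly
    return all(cand[s] - base[s] >= min_gain for s in shared)


# Made-up scores for two pinned prompt versions.
baseline_scores = [("message", 0.6), ("message", 0.8), ("conversation", 0.5)]
candidate_scores = [("message", 0.9), ("conversation", 0.5)]

promote = should_promote(baseline_scores, candidate_scores)
promote_strict = should_promote(baseline_scores, candidate_scores, min_gain=0.1)
```

With these made-up scores the candidate passes the default gate, because it is no worse on any shared scope. It fails the stricter one, because the conversation scope shows no gain. A real gate would likely also account for sample size and judge disagreement, which this sketch ignores.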

I am useful to a team that has shipped an AI product to real users and whose operational machinery has not caught up — that is, if you can feel the four boxes above tugging at you but have not yet named them. I am less useful if you are still trying to find product-market fit; there are people much better than me for that.

The unglamorous half of shipping AI products.

The shape of an AI-PM engagement, itemised.

Audit the four boxes

Install the cadence

Build the compliance trail

Stand up the eval loop

Pricing & positioning

Advisor recruitment

For correspondence or working sessions.
