A. Hally — Field notes on AI product
Vol. III · Issue 02 · 2026, Q2
— Practice paper · AI product management

The unglamorous half of shipping AI products.

A working paper on what AI product management actually looks like when the model is not the hard part (clinical advisors, compliance, versioning, app-store ops, pricing), written from inside an emotionally intelligent AI product that ships weekly.

Every founder I meet with an AI product is convinced their hard part is the model. By the time they have shipped past the demo, they have discovered that the hard part is everything around it — the clinical advisors who decide whether a response is safe, the app-store reviewer who decides whether the screenshot is misleading, the pricing decision that decides whether a number implies a medical claim, and the version pipeline that decides whether last Tuesday’s release is better than this Tuesday’s. The model is twenty percent of the work and eighty percent of the conversation.

I am the co-founder and COO of HeyLina, an emotionally intelligent AI mobile product on iOS and Android. I do the eighty percent. My co-founder builds the model and the application; I make sure everything around it — including the company itself — compounds week on week. This page is the working paper on that practice, partly so I remember what I have learned, and partly so people running comparable products can recognise themselves in it and write to me.

— § 01 · The case for an AI-PM role at all

AI products do not fail because the model is wrong. They fail because the company around the model is not yet built. The PM role I am describing is closer to the early role of a regulatory-affairs lead at a biotech, or a head of operations at a clinical-trials site, than it is to anything in consumer software. It is a job that did not exist three years ago and that almost nobody is writing about in public — partly because the people doing it are too busy doing it.

The discipline I draw on most is not product management as it is taught at FAANGs. It is the operating-cadence work I did at Accenture and the data-valuation work I did at Anmut — both of which are really about turning ambiguous information into a decision a team can act on in a given week. That muscle transfers almost without modification to AI product work, because the modal problem of an AI PM is exactly that: “what should we decide on Monday given a model whose behaviour we only partially understand?”

— § 02 · What I actually do in a week

Monday is the pipeline review — last week’s release notes, this week’s fixture additions, anything the eval harness has flagged. The middle of the week is clinical-safety work and app-store ops: flagged transcripts adjudicated against the rubric, and whatever the regulator-of-the-week has decided to write about. The back half is the long-horizon work — pricing experiments, advisor relationships, and the compliance documents nobody asks for until they suddenly do.

There is no calendar slot for “make the product better.” The product gets better because the four boxes above each get one degree more rigorous every week. That, and a measurable eval loop underneath it, is the whole game.

— § 03 · Why the eval engine is the credential

Most AI products are run on vibes; this one isn’t. Underneath HeyLina sits Lina Lab, a prompt-evaluation engine I personally architected and ship into: a versioned prompt catalog, a variant-comparison runtime, and a multi-scope eval framework (message, turn, conversation, variant). It exists because the commercial case for iteration velocity required it, not because engineering taste alone demanded it. The dev work is the credential: I read PRs, write PRs, and ship PRs across mobile, backend, and the eval layer.
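
As a rough illustration of those three pieces, and not Lina Lab's actual code, a versioned prompt and a score tagged with its eval scope might be modelled as below. Every name here (`Scope`, `PromptVersion`, `EvalScore`, `mean_by_scope`) is hypothetical, and the 0.0 to 1.0 score range is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class Scope(Enum):
    """The four eval scopes named in the text."""
    MESSAGE = "message"
    TURN = "turn"
    CONVERSATION = "conversation"
    VARIANT = "variant"


@dataclass(frozen=True)
class PromptVersion:
    """One immutable entry in a versioned prompt catalog (hypothetical shape)."""
    prompt_id: str
    version: int
    text: str


@dataclass(frozen=True)
class EvalScore:
    """A single rubric score, tagged with the scope it was judged at."""
    prompt: PromptVersion
    scope: Scope
    rubric: str
    score: float  # assumed to lie in 0.0 to 1.0


def mean_by_scope(scores):
    """Average scores per (prompt_id, version, scope) so variants can be compared."""
    buckets = defaultdict(list)
    for s in scores:
        key = (s.prompt.prompt_id, s.prompt.version, s.scope)
        buckets[key].append(s.score)
    return {key: mean(values) for key, values in buckets.items()}
```

Keeping the scope on every score is the design choice worth noting: a prompt version can look fine message by message and still fail at conversation level, and a single blended number would hide that.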

I am useful to a team that has shipped an AI product to real users and whose operational machinery has not caught up — that is, if you can feel the four boxes above tugging at you but have not yet named them. I am less useful if you are still trying to find product-market fit; there are people much better than me for that.

— § 04 · Work together

The shape of an AI-PM engagement, itemised.

i. Audit the four boxes

Map the existing operating ritual against the four-box model, identify the two that are most under-built, and write a memo with concrete next steps.

ii. Install the cadence

Weekly pipeline review, structured advisor session, release-notes discipline. Paired with the existing PM so the muscle transfers.

iii. Build the compliance trail

The audit-ready document set: fixture provenance, rubric versions, model-change log, advisor adjudication record. The regulator will eventually ask.

iv. Stand up the eval loop

LLM-as-judge with provenance, variant comparison, multi-scope rubrics, and a promotion pipeline: the Lina Lab pattern, adapted to your stack so iteration stops being guesswork.

v. Pricing & positioning

An evidence-based working session on the most consequential framing decision of your product, informed by data-valuation work. Plus a written recommendation.

vi. Advisor recruitment

Identifying, interviewing, and contracting an initial clinical or domain panel, using a structured interview script.
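
To make items iii and iv concrete, here is a minimal sketch of a promotion gate that compares two prompt variants on the fixtures both were scored on, and returns a record that carries its own provenance (judge model, rubric version, fixture ids). It is an assumption-laden illustration rather than the Lina Lab implementation: `JudgedResult`, `promotion_record`, and the `min_gain` threshold of 0.02 are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class JudgedResult:
    fixture_id: str      # which test conversation was scored
    variant: str         # prompt variant that produced the reply
    score: float         # judge's rubric score, assumed 0.0 to 1.0
    judge_model: str     # provenance: which judge produced the score
    rubric_version: str  # provenance: which rubric text the judge applied


def promotion_record(results, baseline, candidate, min_gain=0.02):
    """Compare two variants on shared fixtures and return an audit-ready record."""

    def by_fixture(variant):
        return {r.fixture_id: r.score for r in results if r.variant == variant}

    base, cand = by_fixture(baseline), by_fixture(candidate)
    # Only fixtures scored for both variants give a like-for-like comparison.
    shared = sorted(base.keys() & cand.keys())
    if not shared:
        raise ValueError("no fixtures scored for both variants")

    gain = mean(cand[f] - base[f] for f in shared)
    return {
        "baseline": baseline,
        "candidate": candidate,
        "fixtures": shared,
        "mean_gain": round(gain, 4),
        "promote": gain >= min_gain,
        "judges": sorted({r.judge_model for r in results}),
        "rubric_versions": sorted({r.rubric_version for r in results}),
    }
```

Appending each returned record to a log would give one possible form of the model-change log: every promotion decision stays tied to the fixtures, judge, and rubric version that justified it.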

— References & further reading

  i. Hally, A. Harness engineer: notes on the runtime around the LLM. /harness
  ii. Hally, A. Developer: can he ship? /dev
  iii. Hally, A. Data strategist: data valuation that survives engineering reality. /strategist
  iv. Anthropic. Constitutional AI and the operationalisation of safety. anthropic.com · 2023.
  v. FDA. Artificial Intelligence/Machine Learning-Based Software as a Medical Device (SaMD) Action Plan. 2021.
  vi. HeyLina: emotionally intelligent AI, iOS & Android. heylina.ai
  vii. Hally, A. How I got here. /about
  viii. BSc Philosophy & Economics, First Class, London School of Economics, 2013–2016.

For correspondence or working sessions.

I read everything that arrives, and reply within two working days. The most useful thing you can send is a one-page note of where you are stuck.

Letters  ·  angus.hally@gmail.com
LinkedIn  ·  angus-hally
