Cookpit v3.2 — Source-Content Categorisation Rules

The chef-detective handles five distinct source-content categories consistently across every v3.2 file. This document publishes the categorisation rules so the AI Chef does not have to re-derive them per recipe.

Referenced by bundle/v3.2/prompt.md (in the deductive working order) and by bundle/v3.2/rules.md (as part of source-faithful handling under rules A2 / K1 / O).


1. Why this document exists

Source recipes routinely carry non-method content: recipe tips, sponsored adverts, paywall chrome, source typos, make-ahead notes, and post-cook hints. The chef-detective has so far handled each kind as it arose. The handling was consistent, but the categorisation rule existed only in the chef-detective's head. This document publishes the rule.

The rule applies in stage C of the canonical fingerprint normalisation (filtering before tokenisation) and in the prompt's phase-1 resource-selection step (deciding what enters recipeInstructions[], what enters prereqs, and what is silently filtered).


2. The five categories

Every piece of source content not in the explicit method body falls into one of five categories. The handling differs per category.

2.1 Sponsored content / brand placement

Examples seen in the active corpus:

  • Try our app (BBC Good Food sources — carbonara, boeuf bourguignon)
  • BECOMEAMEMBER (Great British Chefs paywall — pork-fillet-braised-cheeks-and-pork-belly)

Handling:

  • Filter from recipeInstructions[] entirely. Sponsored content is not a culinary instruction; including it misleads schema.org consumers into treating advertising as part of the recipe.
  • Filter from the active-number sequence in stage C of fingerprint normalisation.
  • Document the exclusion in a prerequisites.notes[] item so the detective's filter is auditable.

Detection heuristic: Match the source line against the published v3.2.0 sponsored-content allowlist (see canonical-fingerprint-normalisation.md §5):

BECOMEAMEMBER
Try our app
You have <N> remaining read(s) today
Finish Ultimate Plus

Future revisions extend this list. Implementations SHOULD allow configuration with brand-specific patterns.
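
A minimal sketch of the allowlist match, assuming whole-line, case-sensitive matching and assuming <N> stands for a single token (a digit or a number word such as "two"). The names SPONSORED_PATTERNS and is_sponsored are illustrative and not part of the v3.2 bundle; the authoritative list lives in canonical-fingerprint-normalisation.md §5.

```python
import re

# Patterns transcribed from the v3.2.0 list above. Treating <N> as one
# whitespace-free token is an assumption, not something the rules state.
SPONSORED_PATTERNS = [
    r"BECOMEAMEMBER",
    r"Try our app",
    r"You have \S+ remaining reads? today",
    r"Finish Ultimate Plus",
]


def is_sponsored(line, extra_patterns=()):
    """Return True when the stripped line fully matches a listed pattern.

    extra_patterns is a hypothetical hook for brand-specific configuration.
    """
    text = line.strip()
    patterns = list(SPONSORED_PATTERNS) + list(extra_patterns)
    return any(re.fullmatch(pattern, text) for pattern in patterns)


print(is_sponsored("You have two remaining reads today"))  # True
print(is_sponsored("Try our apple tart"))                  # False
```

Whole-line matching is a deliberately narrow choice here: a method step that merely mentions an app or a membership is left alone.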


2.2 Paywall chrome / website navigation

Examples seen in the active corpus:

  • You have two remaining reads today (Great British Chefs — pork-fillet-braised-cheeks-and-pork-belly)
  • Page-number/nav references that bleed into the extracted text
  • Timestamp metadata captured by PDF extraction in some sources

Handling: identical to sponsored content — filter from recipeInstructions[] and from the fingerprint sequence; document the exclusion in a prereq note.

Detection heuristic: lines that are clearly UI text rather than recipe content. The patterns vary per source platform. A productised detector should:

  • match against the published v3.2.0 chrome-pattern allowlist
  • accept user-configured allowlists for new sources
  • err on the side of preservation when ambiguous (a false positive removes culinary content; a false negative only leaves harmless noise in place)
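
The three requirements above could be sketched as follows. This is an illustration under stated assumptions, not the productised detector: ChromeDetector and its field names are hypothetical, the patterns are supplied by the caller because the published chrome-pattern list is not reproduced in this document, and "ambiguous" is read here as "anything short of a whole-line match".

```python
import re
from dataclasses import dataclass, field


@dataclass
class ChromeDetector:
    """Hypothetical detector: published patterns plus per-source additions."""

    published: list = field(default_factory=list)     # v3.2.0 chrome patterns
    user_defined: list = field(default_factory=list)  # patterns for new sources

    def should_filter(self, line):
        text = line.strip()
        for pattern in self.published + self.user_defined:
            # Only a whole-line match filters. A partial match is treated
            # as ambiguous, so the line is preserved.
            if re.fullmatch(pattern, text):
                return True
        return False

    def split(self, lines):
        """Return (kept, filtered); filtered lines can feed a prereq note."""
        kept, filtered = [], []
        for line in lines:
            (filtered if self.should_filter(line) else kept).append(line)
        return kept, filtered


detector = ChromeDetector(
    published=[r"You have \S+ remaining reads? today"],
    user_defined=[r"Page \d+ of \d+"],  # example pattern for a new source
)
kept, filtered = detector.split([
    "You have two remaining reads today",
    "Page 2 of 5",
    "Turn to page 2 of the oven manual",
    "Sweat the onions",
])
```

Returning the filtered lines alongside the kept ones keeps the exclusion auditable, since each filtered line can be quoted in the prereq note.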

2.3 Culinary explanation tip

Examples seen in the active corpus:

  • "The classic red wine to use in beef bourguignon is a burgundy (pinot noir), but any dry red that you would happily drink works." (boeuf)
  • "Beef bourguignon benefits from getting good-quality, well-marbled meat from the butcher's shop." (boeuf)

Handling:

  • Preserve verbatim in recipeInstructions[] as a 'Recipe tip: <text>' HowToStep. The 'Recipe tip:' prefix distinguishes these steps from regular method steps for downstream consumers.
  • No structural encoding — these are advisory, not actionable.
  • Filter from the active-number sequence if they appear in a tips-block segment (per stage C of normalisation).
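
As a sketch, the 'Recipe tip:' wrapping might look like this. The helper names are hypothetical, and the step is shown as a bare schema.org HowToStep with only a text property; a real v3.2 file may carry further fields.

```python
TIP_PREFIX = "Recipe tip: "


def recipe_tip_step(source_text):
    """Wrap a source tip as a HowToStep, keeping the source text verbatim."""
    return {"@type": "HowToStep", "text": TIP_PREFIX + source_text}


def is_recipe_tip(step):
    """Let a downstream consumer separate tips from method steps."""
    return step.get("text", "").startswith(TIP_PREFIX)


tip = recipe_tip_step(
    "Beef bourguignon benefits from getting good-quality, "
    "well-marbled meat from the butcher's shop."
)
method = {"@type": "HowToStep", "text": "Brown the beef in batches."}
```

Here is_recipe_tip(tip) is True and is_recipe_tip(method) is False, so a consumer that wants only actionable steps can drop the tips without parsing their content.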

2.4 Structurally-actionable tip (make-ahead, leadTime, post-cook hint)

Examples seen in the active corpus:

  • "Press the belly with the garlic, parsley and salt for 8 hours between two trays" (pork-fillet-braised-cheeks-and-pork-belly) → encoded as prereq.ingredients[].leadTime: "PT8H", and the 8-hour press itself is realised as a cookpit.prepCook phase

Handling:

  • Encode structurally wherever the schema offers a primitive (prereq.leadTime, the cookpit.preCook phase block, the cookpit.prepCook phase block).
  • Preserve in recipeInstructions[] as a 'Recipe tip: <text>' HowToStep so the source's own words are auditable.
  • Filter from the active-number sequence as tips-block content.
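
The leadTime value in the example above is an ISO 8601 duration. A small sketch of deriving it from the source wording, assuming the time is written as digits followed by "hour(s)" or "minute(s)"; lead_time_from_text is a hypothetical helper, and deciding whether a given time really is a lead time remains the chef-detective's call.

```python
import re

# ISO 8601 time-unit designators for the two units this sketch handles.
_UNIT_CODES = {"hour": "H", "minute": "M"}


def lead_time_from_text(text):
    """Return an ISO 8601 duration for the first '<n> hours/minutes', else None."""
    match = re.search(r"(\d+)\s*(hour|minute)s?\b", text, flags=re.IGNORECASE)
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower()
    return f"PT{amount}{_UNIT_CODES[unit]}"


source = "Press the belly with the garlic, parsley and salt for 8 hours between two trays"
print(lead_time_from_text(source))  # PT8H
```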

2.5 Source typo

Examples seen in the active corpus:

  • "sweat of the carrots, onions and onions is a large saucepan" (pork-fillet-braised-cheeks-and-pork-belly, Step 1) — three errors: "of" should be omitted, "onions and onions" duplicates, "is" should be "in"

Handling:

  • Silently correct in tasks[].action — the chef-detective's job is to deduce CULINARY truth, not perpetuate source defects.
  • Preserve verbatim in recipeInstructions[] — the source's words are kept for source-faithful pass-through.
  • Document the correction in a prerequisites.notes[] item so the silent correction is auditable.
  • For numeric typos (none observed in the active corpus, but the pattern matters): the active-number sequence reflects what the source SAYS, not what the chef thinks the source MEANT. Numeric typos enter the fingerprint as written.
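
One way to picture the three destinations of a single correction is sketched below. TypoCorrection, apply_correction and the output key names are hypothetical, and the full corrected sentence is an assumption derived from the three errors listed above rather than a quotation from the active corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypoCorrection:
    verbatim: str   # kept unchanged in recipeInstructions[]
    corrected: str  # used in tasks[].action
    reason: str     # recorded in a prerequisites.notes[] item


def apply_correction(correction):
    """Fan one correction out to the three places this section names."""
    return {
        "recipeInstruction": {"@type": "HowToStep", "text": correction.verbatim},
        "taskAction": correction.corrected,
        "prereqNote": (
            f'Source typo corrected: "{correction.verbatim}" -> '
            f'"{correction.corrected}" ({correction.reason})'
        ),
    }


result = apply_correction(TypoCorrection(
    verbatim="sweat of the carrots, onions and onions is a large saucepan",
    corrected="sweat the carrots and onions in a large saucepan",  # assumed wording
    reason="stray 'of', duplicated 'onions', 'is' for 'in'",
))
```

The fingerprint is deliberately absent from this fan-out: it is computed from the verbatim text, so a numeric typo would enter it as written.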

3. Decision tree

Source content not in the explicit method body
│
├─ Is it advertising or brand placement?
│    YES → 2.1 sponsored content
│          [filter from recipeInstructions, fingerprint;
│           document in prereq notes]
│
├─ Is it website chrome / paywall / nav text?
│    YES → 2.2 paywall chrome
│          [filter; document]
│
├─ Is it a typo (semantic or factual)?
│    YES → 2.5 source typo
│          [correct in tasks[].action; preserve verbatim in
│           recipeInstructions[]; document in prereq note]
│
├─ Is it actionable as make-ahead / leadTime / post-cook?
│    YES → 2.4 structurally-actionable tip
│          [encode structurally; preserve verbatim in
│           recipeInstructions[]; filter from fingerprint]
│
└─ Otherwise (advisory culinary explanation, ratings, attribution etc.)
     → 2.3 culinary explanation tip
       [preserve verbatim in recipeInstructions[];
        no structural encoding;
        filter from fingerprint if in tips-block]
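
The tree is an ordered sequence of yes/no questions in which the first YES wins. A sketch of that ordering, assuming the four answers have already been reached editorially; the Category enum and categorise function are illustrative names only.

```python
from enum import Enum


class Category(Enum):
    SPONSORED = "2.1"
    CHROME = "2.2"
    EXPLANATION_TIP = "2.3"
    ACTIONABLE_TIP = "2.4"
    TYPO = "2.5"


# Categories whose source text is kept verbatim in recipeInstructions[].
PRESERVED_IN_INSTRUCTIONS = {
    Category.TYPO,
    Category.ACTIONABLE_TIP,
    Category.EXPLANATION_TIP,
}


def categorise(is_advertising, is_chrome, is_typo, is_actionable):
    """Apply the tree's questions in order; the first YES decides."""
    if is_advertising:
        return Category.SPONSORED
    if is_chrome:
        return Category.CHROME
    if is_typo:
        return Category.TYPO
    if is_actionable:
        return Category.ACTIONABLE_TIP
    return Category.EXPLANATION_TIP  # the "Otherwise" branch


print(categorise(False, False, True, True))  # Category.TYPO
```

Because the order is fixed, a line that is both a typo and actionable lands in 2.5, matching the position of the typo question above the make-ahead question.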

4. Worked example: pork-fillet-braised-cheeks-and-pork-belly

The Stephen Crane source (pork_three_ways.pdf) contains:

| Source content | Category | Handling |
|---|---|---|
| BECOMEAMEMBER (×3, paywall) | 2.1 sponsored | Filtered from recipeInstructions; documented in prereq note q82e211144f |
| You have two remaining reads today (×3, paywall) | 2.2 chrome | Filtered; documented |
| sweat of the carrots, onions and onions is a large saucepan (Step 1 typo) | 2.5 source typo | Corrected to "carrots and onions in a large saucepan" in tasks[].action; verbatim in recipeInstructions[]; documented in prereq note q23389da5bd |
| Press the belly with the garlic, parsley and salt for 8 hours (Step 2) | 2.4 structurally-actionable | Encoded as prereq.ingredients[].leadTime: "PT8H" (q15141f2498); verbatim in recipeInstructions[]; documented in prereq note q44554d5ffd |

(no culinary explanation tips in the pork source)

Each prereq note in the active corpus documents its own filter decision; this rule consolidates the categorisation so future files follow the same pattern without re-derivation.


5. Conformance

A v3.2 file conforms to this rule when:

  1. Sponsored content and paywall chrome are absent from recipeInstructions[].
  2. Source typos are silently corrected in tasks[].action and preserved verbatim in recipeInstructions[].
  3. Each filter / correction is documented in a prerequisites.notes[] item.
  4. Structurally-actionable tips use the schema's existing primitives (prereq.leadTime, the cookpit.prepCook and cookpit.preCook phase blocks) rather than free-text prose.
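  5. The cookpit-active-number-sequence-v3.2.0 fingerprint omits sponsored content, paywall chrome, and tips-block content (sections 2.1 to 2.4); numeric source typos enter the fingerprint as written (section 2.5).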

The validator's V-LEX-FORBIDDEN and V-LEX-PERSONA-DRIFT checks catch the lexicon side; the categorisation itself is editorial and not (yet) machine-checkable beyond the published allowlists.
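
The allowlist-checkable part of item 1 could be sketched as below. This is not the validator's implementation: find_leaked_lines is a hypothetical helper, the patterns are passed in by the caller, and the steps are assumed to be HowToStep objects with a text property.

```python
import re


def find_leaked_lines(recipe_instructions, patterns):
    """Return step texts that still contain a filter pattern (item 1).

    Uses a substring search, so it may over-report; a human still makes
    the editorial call on each hit.
    """
    leaked = []
    for step in recipe_instructions:
        text = step.get("text", "")
        if any(re.search(pattern, text) for pattern in patterns):
            leaked.append(text)
    return leaked


steps = [
    {"@type": "HowToStep", "text": "Sweat the carrots and onions."},
    {"@type": "HowToStep", "text": "BECOMEAMEMBER"},
]
leaks = find_leaked_lines(steps, [r"BECOMEAMEMBER", r"Finish Ultimate Plus"])
print(leaks)  # ['BECOMEAMEMBER']
```

An empty result is consistent with item 1 but does not prove conformance, since content outside the published lists is not detected.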