Cookpit v3.2 — Source-Content Categorisation Rules
The chef-detective handles five distinct source-content categories consistently across every v3.2 file. This document publishes the categorisation rules so the AI Chef does not have to re-derive them per recipe.
Referenced by
bundle/v3.2/prompt.md (in the deductive working order) and by bundle/v3.2/rules.md (as part of source-faithful handling under rules A2 / K1 / O).
1. Why this document exists
Source recipes routinely carry non-method content (recipe tips, sponsored adverts, paywall chrome, source typos, make-ahead notes, and post-cook hints) that the chef-detective has so far handled case by case. The handling has been consistent, but the categorisation rule lived only in the chef-detective's head. This document publishes the rule.
The rule applies in stage C of the canonical fingerprint
normalisation (filtering before tokenisation) and in the prompt's
phase-1 resource-selection step (deciding what enters
recipeInstructions[], what enters prereqs, and what is silently
filtered).
2. The five categories
Every piece of source content not in the explicit method body falls into one of five categories. The handling differs per category.
2.1 Sponsored content / brand placement
Examples seen in the active corpus:
- Try our app (BBC Good Food sources: carbonara, boeuf bourguignon)
- BECOMEAMEMBER (Great British Chefs paywall: pork-fillet-braised-cheeks-and-pork-belly)
Handling:
- Filter from recipeInstructions[] entirely. Sponsored content is not a culinary instruction; including it misleads schema.org consumers into treating advertising as part of the recipe.
- Filter from the active-number sequence in stage C of fingerprint normalisation.
- Document the exclusion in a prerequisites.notes[] item so the detective's filter is auditable.
Detection heuristic: Match the source line against the published v3.2.0 sponsored-content allowlist (see canonical-fingerprint-normalisation.md §5):
BECOMEAMEMBER
Try our app
You have <N> remaining read(s) today
Finish Ultimate Plus
Future revisions extend this list. Implementations SHOULD allow configuration with brand-specific patterns.
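A minimal sketch of the allowlist match, assuming whole-line matching after whitespace trimming and a regular-expression reading of the <N> and read(s) placeholders. The names SPONSORED_PATTERNS and is_sponsored are hypothetical. Accepting either digits or a number word for <N> is an assumption, made so that the corpus line "You have two remaining reads today" matches.

```python
import re

# Hypothetical encoding of the four published v3.2.0 patterns.
# Assumption: <N> may be digits or a number word; "read(s)" means "read" or "reads".
SPONSORED_PATTERNS = [
    re.compile(r"BECOMEAMEMBER"),
    re.compile(r"Try our app"),
    re.compile(r"You have (\d+|[a-z]+) remaining reads? today", re.IGNORECASE),
    re.compile(r"Finish Ultimate Plus"),
]


def is_sponsored(line: str, extra_patterns=()) -> bool:
    """Return True when the whole trimmed line matches an allowlist pattern."""
    text = line.strip()
    # Brand-specific patterns supplied by configuration are appended to the published list.
    patterns = list(SPONSORED_PATTERNS) + [re.compile(p) for p in extra_patterns]
    return any(p.fullmatch(text) for p in patterns)
```

Whole-line matching keeps a method step that merely mentions an app or a membership from being filtered by accident.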
2.2 Paywall chrome / website navigation
Examples seen in the active corpus:
- You have two remaining reads today (Great British Chefs: pork-fillet-braised-cheeks-and-pork-belly)
- Page-number/nav references that bleed into the extracted text
- Timestamp metadata captured by PDF extraction in some sources
Handling: identical to sponsored content — filter from
recipeInstructions[] and from the fingerprint sequence; document the
exclusion in a prereq note.
Detection heuristic: lines that are clearly UI text rather than recipe content. The patterns vary per source platform. A productised detector should:
- match against the published v3.2.0 chrome-pattern allowlist
- accept user-configured allowlists for new sources
- err on the side of preservation when ambiguous (false-positive filtering removes culinary content; false-negative filtering only adds harmless noise)
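The three detector requirements above can be sketched as follows. This is an illustration only: split_chrome is a hypothetical name, the example patterns are invented rather than taken from the published chrome-pattern allowlist, and treating any line that is not a whole-line match as "ambiguous" is an assumption about how preservation would be applied.

```python
import re
from typing import Iterable, Sequence


def split_chrome(lines: Iterable[str],
                 published: Sequence[str],
                 user: Sequence[str] = ()):
    """Split lines into (kept, filtered); only whole-line matches are filtered."""
    patterns = [re.compile(p, re.IGNORECASE) for p in [*published, *user]]
    kept, filtered = [], []
    for line in lines:
        text = line.strip()
        if any(p.fullmatch(text) for p in patterns):
            filtered.append(line)
        else:
            # Partial or uncertain matches count as ambiguous and are preserved.
            kept.append(line)
    return kept, filtered


kept, filtered = split_chrome(
    [
        "You have two remaining reads today",
        "You have two hours to rest the meat today",
        "Page 2 of 4",
    ],
    published=[r"You have \w+ remaining reads? today"],  # invented pattern
    user=[r"Page \d+ of \d+"],                           # invented user pattern
)
```

Here the middle line resembles the paywall text but is culinary content, so it stays in kept.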
2.3 Culinary explanation tip
Examples seen in the active corpus:
- "The classic red wine to use in beef bourguignon is a burgundy (pinot noir), but any dry red that you would happily drink works." (boeuf)
- "Beef bourguignon benefits from getting good-quality, well-marbled meat from the butcher's shop." (boeuf)
Handling:
- Preserve verbatim in recipeInstructions[] as a 'Recipe tip: <text>' HowToStep. The Recipe tip: prefix distinguishes these tips from regular method steps for downstream consumers.
- No structural encoding: these are advisory, not actionable.
- Filter from the active-number sequence if they appear in a tips-block segment (per stage C of normalisation).
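As a small illustration of the prefix convention, the sketch below wraps a tip as a HowToStep and lets a consumer tell tips from method steps. It assumes steps are serialised as JSON-LD style dictionaries with "@type" and "text" keys; the helper names tip_step and is_tip_step are hypothetical.

```python
TIP_PREFIX = "Recipe tip: "


def tip_step(source_text: str) -> dict:
    """Wrap a source tip, verbatim, as a prefixed HowToStep."""
    return {"@type": "HowToStep", "text": TIP_PREFIX + source_text}


def is_tip_step(step: dict) -> bool:
    """Downstream check: True for tip steps, False for regular method steps."""
    return step.get("text", "").startswith(TIP_PREFIX)


step = tip_step(
    "Beef bourguignon benefits from getting good-quality, "
    "well-marbled meat from the butcher's shop."
)
```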
2.4 Structurally-actionable tip (make-ahead, leadTime, post-cook hint)
Examples seen in the active corpus:
- "Press the belly with the garlic, parsley and salt for 8 hours between two trays" (pork-fillet-braised-cheeks-and-pork-belly) → encoded as prereq.ingredients[].leadTime: "PT8H", and the 8-hour press itself is realised as a cookpit.prepCook phase
Handling:
- Encode structurally wherever the schema offers a primitive (prereq.leadTime, the cookpit.preCook phase block, the cookpit.prepCook phase block).
- Preserve in recipeInstructions[] as a 'Recipe tip: <text>' HowToStep so the source's own words are auditable.
- Filter from the active-number sequence as tips-block content.
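The leadTime encoding can be sketched as a conversion from the source wording to an ISO 8601 duration. This is a rough illustration, assuming only whole hours and minutes written as digits; lead_time and encode_actionable_tip are hypothetical names, and the returned dictionary is a stand-in for the real schema rather than its actual shape.

```python
import re
from typing import Optional

# Assumption: only whole hours and minutes are handled in this sketch.
_DESIGNATORS = {"hour": "H", "minute": "M"}


def lead_time(text: str) -> Optional[str]:
    """Return an ISO 8601 duration for the first 'N hours' or 'N minutes' found."""
    match = re.search(r"(\d+)\s*(hour|minute)s?\b", text, re.IGNORECASE)
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2).lower()
    return f"PT{amount}{_DESIGNATORS[unit]}"


def encode_actionable_tip(source_text: str) -> dict:
    """Pair the structural encoding with the verbatim, prefixed tip step."""
    return {
        "leadTime": lead_time(source_text),
        "instruction": {"@type": "HowToStep", "text": "Recipe tip: " + source_text},
    }


encoded = encode_actionable_tip(
    "Press the belly with the garlic, parsley and salt for 8 hours between two trays"
)
# encoded["leadTime"] is "PT8H"
```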
2.5 Source typo
Examples seen in the active corpus:
- "sweat of the carrots, onions and onions is a large saucepan" (pork-fillet-braised-cheeks-and-pork-belly, Step 1) — three errors: "of" should be omitted, "onions and onions" duplicates, "is" should be "in"
Handling:
- Silently correct in tasks[].action: the chef-detective's job is to deduce CULINARY truth, not perpetuate source defects.
- Preserve verbatim in recipeInstructions[]: the source's words are kept for source-faithful pass-through.
- Document the correction in a prerequisites.notes[] item so the silent correction is auditable.
- For numeric typos (none observed in the active corpus, but the pattern matters): the active-number sequence reflects what the source SAYS, not what the chef thinks the source MEANT. Numeric typos enter the fingerprint as written.
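The four handling rules can be pictured as one function that emits each artefact side by side. This is a sketch under stated assumptions: handle_source_typo and its dictionary keys are hypothetical, the note wording is invented, and the digit-matching regular expression is a crude stand-in for the real active-number extraction defined in the normalisation document.

```python
import re


def handle_source_typo(source_text: str, corrected_action: str) -> dict:
    """Produce the artefacts the handling rules call for."""
    return {
        # tasks[].action carries the corrected wording.
        "action": corrected_action,
        # recipeInstructions[] keeps the source's words untouched.
        "instruction": {"@type": "HowToStep", "text": source_text},
        # prerequisites.notes[] records the correction (wording invented here).
        "note": f"Source typo corrected in tasks[].action; source reads: {source_text!r}",
        # Numbers are read from the source text, never from the correction.
        "numbers": re.findall(r"\d+(?:\.\d+)?", source_text),
    }


result = handle_source_typo(
    "sweat of the carrots, onions and onions is a large saucepan",
    "sweat the carrots and onions in a large saucepan",
)
```

With an invented numeric typo, for example a source reading "Roast for 400 minutes" corrected to "Roast for 40 minutes", the numbers entry would still be taken from the source text.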
3. Decision tree
Source content not in the explicit method body
│
├─ Is it advertising or brand placement?
│ YES → 2.1 sponsored content
│ [filter from recipeInstructions, fingerprint;
│ document in prereq notes]
│
├─ Is it website chrome / paywall / nav text?
│ YES → 2.2 paywall chrome
│ [filter; document]
│
├─ Is it a typo (semantic or factual)?
│ YES → 2.5 source typo
│ [correct in tasks[].action; preserve verbatim in
│ recipeInstructions[]; document in prereq note]
│
├─ Is it actionable as make-ahead / leadTime / post-cook?
│ YES → 2.4 structurally-actionable tip
│ [encode structurally; preserve verbatim in
│ recipeInstructions[]; filter from fingerprint]
│
└─ Otherwise (advisory culinary explanation, ratings, attribution etc.)
  → 2.3 culinary explanation tip
  [preserve verbatim in recipeInstructions[];
  no structural encoding;
  filter from fingerprint if in tips-block]
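The tree above is an ordered chain of checks, which the following sketch makes explicit. The Signals fields stand for the editorial yes/no answers at each branch; how those answers are obtained is outside the sketch, and all names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SPONSORED = "2.1"
    CHROME = "2.2"
    EXPLANATION_TIP = "2.3"
    ACTIONABLE_TIP = "2.4"
    SOURCE_TYPO = "2.5"


@dataclass
class Signals:
    """Editorial answers to the four questions in the tree."""
    is_advertising: bool = False
    is_chrome: bool = False
    is_typo: bool = False
    is_actionable: bool = False


def categorise(s: Signals) -> Category:
    # Branch order mirrors the tree: the first YES wins.
    if s.is_advertising:
        return Category.SPONSORED
    if s.is_chrome:
        return Category.CHROME
    if s.is_typo:
        return Category.SOURCE_TYPO
    if s.is_actionable:
        return Category.ACTIONABLE_TIP
    # Otherwise: advisory explanation, ratings, attribution and similar.
    return Category.EXPLANATION_TIP
```

Because the first matching branch wins, content that is both a typo and actionable is categorised as a source typo under this ordering.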
4. Worked example: pork-fillet-braised-cheeks-and-pork-belly
The Stephen Crane source (pork_three_ways.pdf) contains:
| Source content | Category | Handling |
|---|---|---|
| BECOMEAMEMBER (×3, paywall) | 2.1 sponsored | Filtered from recipeInstructions; documented in prereq note q82e211144f |
| You have two remaining reads today (×3, paywall) | 2.2 chrome | Filtered; documented |
| sweat of the carrots, onions and onions is a large saucepan (Step 1 typo) | 2.5 source typo | Corrected to "carrots and onions in a large saucepan" in tasks[].action; verbatim in recipeInstructions[]; documented in prereq note q23389da5bd |
| Press the belly with the garlic, parsley and salt for 8 hours (Step 2) | 2.4 structurally-actionable | Encoded as prereq.ingredients[].leadTime: "PT8H" (q15141f2498); verbatim in recipeInstructions[]; documented in prereq note q44554d5ffd |
| (no culinary explanation tips in the pork source) | – | – |
Each prereq note in the active corpus documents its own filter decision; this rule consolidates the categorisation so future files follow the same pattern without re-derivation.
5. Conformance
A v3.2 file conforms to this rule when:
- Sponsored content and paywall chrome are absent from recipeInstructions[].
- Source typos are silently corrected in tasks[].action and preserved verbatim in recipeInstructions[].
- Each filter / correction is documented in a prerequisites.notes[] item.
- Structurally-actionable tips use the schema's existing primitives (prereq.leadTime, the cookpit.prepCook and cookpit.preCook phase blocks) rather than free-text prose.
- The cookpit-active-number-sequence-v3.2.0 fingerprint omits all five non-method categories.
The validator's V-LEX-FORBIDDEN and V-LEX-PERSONA-DRIFT checks
catch the lexicon side; the categorisation itself is editorial and
not (yet) machine-checkable beyond the published allowlists.
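The allowlist-backed part of conformance could be checked along these lines. This is a sketch, not the validator: leaked_filter_lines is a hypothetical helper, the file is assumed to expose recipeInstructions as a list of dictionaries with a "text" key, and the patterns passed in are examples only.

```python
import re
from typing import Sequence


def leaked_filter_lines(recipe: dict, patterns: Sequence[str]) -> list:
    """Return recipeInstructions[] texts that still match a filter pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    offenders = []
    for step in recipe.get("recipeInstructions", []):
        text = step.get("text", "").strip()
        if any(p.fullmatch(text) for p in compiled):
            offenders.append(text)
    return offenders


recipe = {
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "BECOMEAMEMBER"},
        {"@type": "HowToStep", "text": "Sweat the carrots and onions in a large saucepan."},
    ]
}
offenders = leaked_filter_lines(recipe, [r"BECOMEAMEMBER", r"Try our app"])
```

An empty result only shows that no allowlisted line leaked through; whether each item was categorised correctly remains an editorial judgement.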