Tool‑Call Parsing: Safe Multi‑Format Fallback (Tags / JSON / Brackets)

In production, models often intend to call tools but emit non-standard syntax. Strict parsers undercount capability; permissive parsers cause accidental execution. This playbook gives a fail-closed, auditable, multi-format parser design.

Opinionated stance

Your parser is part of your model.

If your evaluation says “the model can’t do tool calling,” first confirm you’re not measuring a brittle parser.

The recent community benchmark explicitly calls out parser sensitivity: some models looked “bad” until the parser recognized their tool syntax; others looked “better” because parsing was overly forgiving. See: https://www.reddit.com/r/LocalLLaMA/comments/1r4ie8z/i_tested_21_small_llms_on_toolcalling_judgment/


Goals

  • Support a primary tool-call format (what you require)
  • Provide safe fallbacks for common non-standard outputs
  • Fail closed: never execute tools unless a candidate passes strict validation
  • Make behavior measurable via telemetry

Two-phase architecture (do not skip)

  1. Extract candidates (string parsing only; no execution)
  2. Validate & normalize (allowlist tool names + schema validation)

Only phase (2) can produce an executable call.


Normalized tool call contract

Everything should normalize into:

{
  "name": "tool_name",
  "arguments": { "k": "v" }
}

Supported formats

Primary: explicit tags

Example:

<tool name="search_web">{"query":"..."}</tool>

Rule: tag payload must be JSON.

Fallback A: strict JSON block

Example:

{"tool":"search_web","arguments":{"query":"..."}}

Rules:

  • output must be exactly a JSON object after trimming
  • parse JSON → validate schema

Fallback B: bracket/function style

Example:

[get_weather(city="Antwerp")]

Rule: treat as candidate only; validate against schema.


Fallback gates (must all pass)

Enable fallbacks only if:

  1. The model explicitly indicates tool intent (need_tool: yes or an internal token like CALL_TOOL).
  2. Candidate snippet is small (e.g., ≤2KB).
  3. Tool name is allowlisted.
  4. Args pass strict schema validation (no guessing).
  5. If multiple candidates are found → fail and request re-emit in primary format.

This is the practical way to balance compatibility with safety.


Telemetry (mandatory)

Log these fields for every tool call:

  • parse_mode: primary|json|bracket|none
  • fallback_used: boolean
  • candidate_count: integer
  • schema_validation: pass|fail + reason
  • tool_result_status: ok|empty|error

Without this, parser improvements will be mistaken for model improvements.


Common pitfalls

  1. Regex-grabbing JSON from long text (high false positives)
  2. Auto-filling missing args (creates “ghost successes”)
  3. No tool allowlist (expands execution surface)
  4. Fallback parsers with no telemetry (destroys interpretability)

References