IMICostPulse
Insight. Driving. Profit.

Overview

AI token usage & cost intelligence.

Tracked metrics: Total Cost (last 30 days), Events, Asks, Input Tokens, Output Tokens, Error Rate (avg 0 ms).

Daily Cost (USD)

No data yet — start sending events to the ingest API
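A minimal sketch of sending a usage event to the ingest API, assuming a JSON endpoint with bearer-token auth; the URL, payload fields, and header are placeholders, not the documented IMICostPulse API:

    import requests

    # Hypothetical usage event; field names mirror the dashboard tiles above.
    event = {
        "pipeline_step": "interpret-narrative",
        "model": "claude-sonnet-4-6",
        "input_tokens": 3200,
        "output_tokens": 450,
        "latency_ms": 1840,
        "cost_usd": 0.0163,
    }

    # Assumed endpoint and auth scheme; substitute your real ingest URL and API key.
    resp = requests.post(
        "https://costpulse.example.com/api/v1/events",
        json=event,
        headers={"Authorization": "Bearer <API_KEY>"},
        timeout=10,
    )
    resp.raise_for_status()

Once events flow in, the Daily Cost and Top Models panels populate from the same data.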

Top Models by Cost

No data

Pipeline Optimization Recommendations

Ranked by estimated cost savings across your Pulse AI pipeline

Critical: interpret-narrative (claude-sonnet-4-6), est. savings 92%

Accounts for 68% of total pipeline spend. Sonnet 4.6 at $0.003/$0.015 per 1K tokens is your dominant cost driver — 3.2K tokens per invocation.

Recommended Action: Fine-tune a 7B local model (Mistral / Qwen) on gold narrative examples. Deploy via llama.cpp on local hardware (see the sketch below). Target cost: $0/call.
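As an illustration of the llama.cpp deployment path, a minimal sketch using the llama-cpp-python bindings; the GGUF filename, prompt, and generation settings are assumptions to adapt to your fine-tuned Mistral/Qwen checkpoint:

    from llama_cpp import Llama

    # Load a quantized fine-tuned checkpoint; the filename is a placeholder.
    # n_ctx is sized to cover the ~3.2K-token invocations noted above.
    llm = Llama(
        model_path="narrative-7b-q4_k_m.gguf",
        n_ctx=4096,
        n_gpu_layers=-1,  # offload all layers when a GPU is present
    )

    # One narrative generation call; replace the prompt with your gold-example format.
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You write concise cost narratives."},
            {"role": "user", "content": "Summarize this month's spend drivers."},
        ],
        max_tokens=512,
        temperature=0.2,
    )
    print(out["choices"][0]["message"]["content"])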

Critical: vanna-sql (gpt-4o), est. savings 95%

gpt-4o at $0.005/$0.015 per 1K tokens, with 4K+ tokens of context per call, makes this the highest per-event cost in the heavy pipeline variant.

Recommended Action: Replace with SQLCoder-7B fine-tuned on your schema and query history (see the sketch below). Local inference eliminates this cost entirely. Phase 1 priority.
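A rough sketch of the same local-inference pattern for text-to-SQL, assuming a quantized SQLCoder-7B GGUF and a schema-plus-question prompt; the filename, table definition, and prompt template are illustrative only:

    from llama_cpp import Llama

    sql_llm = Llama(model_path="sqlcoder-7b-q4_k_m.gguf", n_ctx=8192)  # placeholder file

    # Example schema; in practice this comes from your schema and query-history fine-tune.
    schema = """CREATE TABLE usage_events (
        id BIGSERIAL PRIMARY KEY,
        pipeline_step TEXT,
        model TEXT,
        input_tokens INT,
        output_tokens INT,
        cost_usd NUMERIC,
        created_at TIMESTAMPTZ
    );"""

    prompt = (
        f"### Database schema\n{schema}\n\n"
        "### Question\nTotal cost per model over the last 30 days?\n\n"
        "### SQL\n"
    )

    result = sql_llm(prompt, max_tokens=256, temperature=0.0, stop=[";"])
    print(result["choices"][0]["text"].strip() + ";")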

High: methodology (claude-sonnet-4-6), est. savings 82%

Second-largest cost step at 2K+ input tokens. Strong candidate for immediate model downgrade before full local replacement in Phase 2.

Recommended Action: Migrate to Haiku 4.5 or gpt-4o-mini now — 83% cost reduction with no fine-tuning (see the routing sketch below). Queue local replacement after the narrative baseline is proven.
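One low-effort way to stage the downgrade is a per-step model routing table, so the methodology step can be pointed at a cheaper model without touching code elsewhere; the dict below is a hypothetical sketch, with step names taken from this report:

    # Hypothetical per-step model routing; only the mapping values change as each
    # phase lands (local replacements would swap in a local backend, not an API model).
    MODEL_ROUTES = {
        "interpret-narrative": "claude-sonnet-4-6",  # Phase 2: local fine-tune
        "vanna-sql": "gpt-4o",                       # Phase 1: SQLCoder-7B locally
        "methodology": "claude-haiku-4-5",           # downgraded from claude-sonnet-4-6
        "matcher-dispatch": "gpt-4o-mini",           # keep, add semantic cache
    }

    def model_for(step: str) -> str:
        """Resolve which model a pipeline step should call."""
        return MODEL_ROUTES.get(step, "gpt-4o-mini")  # cheap default for unknown steps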

Medium: matcher-dispatch (gpt-4o-mini), est. savings 40%

Many queries are semantically similar. A pgvector similarity cache at a 0.92 cosine threshold could intercept 40–60% of calls before they hit the API.

Recommended Action: Add a semantic response caching layer using pgvector (see the sketch below). Eliminates redundant API calls on repeat brand methodology dispatches.
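A minimal sketch of the pgvector cache lookup, assuming a response_cache table with an embedding column and psycopg; table, column, and function names are hypothetical. Note that pgvector's <=> operator returns cosine distance, so similarity is 1 minus that value:

    import psycopg

    SIMILARITY_THRESHOLD = 0.92  # cosine similarity; <=> returns cosine distance

    def cached_response(conn: psycopg.Connection, query_embedding: list[float]) -> str | None:
        """Return a cached response if a semantically similar query was already answered."""
        vec = "[" + ",".join(str(x) for x in query_embedding) + "]"  # pgvector text format
        row = conn.execute(
            """
            SELECT response, 1 - (embedding <=> %s::vector) AS similarity
            FROM response_cache
            ORDER BY embedding <=> %s::vector
            LIMIT 1
            """,
            (vec, vec),
        ).fetchone()
        if row is not None and row[1] >= SIMILARITY_THRESHOLD:
            return row[0]  # cache hit: skip the gpt-4o-mini call
        return None        # cache miss: call the API, then insert the new embedding + response

On a miss, the normal gpt-4o-mini call runs and its embedding plus response are inserted, so subsequent near-duplicate dispatches hit the cache.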