I’ve just published an Opencode plugin I wrote because I couldn’t find one that covered my exact use case. I’m posting it here in case it’s useful for someone else too.

When working with local models I want to know why nothing seems to be happening (besides the blue cylon-bar), but the plugins I found only showed token generation data once something was passed into Opencode. That meant that during prefills lasting more than a minute there was no output at all.

This plugin uses the /slots endpoint (enabled by default) in llama-server to deduce whether the server is currently generating tokens or processing the prompt, along with the current tokens per second (tps) for that activity. Now I can just run llama-server as a daemon and no longer feel the need to inspect its output just to see what’s up.
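For anyone curious how this kind of deduction can work, here is a minimal TypeScript sketch. It is not the plugin’s actual code. The field names (`is_processing`, `next_token.n_decoded`) are assumptions about what a llama-server build returns from /slots, and the helper names `classifySlot`, `generationTps` and `fetchSlots` are made up for illustration. The idea is that a busy slot which hasn’t decoded any tokens yet is probably still in prefill, and that the generation rate can be estimated from two samples of the decoded-token counter. Prefill speed would need a different counter, which is not shown here.

```typescript
// Hypothetical, trimmed-down view of one entry in the /slots response.
// Field names are assumptions; check the JSON your llama-server build returns.
interface SlotSnapshot {
  id: number;
  is_processing: boolean;
  next_token?: { n_decoded?: number };
}

type Activity = "idle" | "prefill" | "generating";

// A busy slot with no decoded tokens yet is treated as still doing prompt processing.
function classifySlot(slot: SlotSnapshot): Activity {
  if (!slot.is_processing) return "idle";
  const decoded = slot.next_token?.n_decoded ?? 0;
  return decoded > 0 ? "generating" : "prefill";
}

// Estimate generation speed from two samples of the same slot's decoded-token counter.
function generationTps(prevDecoded: number, currDecoded: number, elapsedMs: number): number {
  // A shrinking counter means a new request started; report 0 rather than a negative rate.
  if (elapsedMs <= 0 || currDecoded < prevDecoded) return 0;
  return ((currDecoded - prevDecoded) * 1000) / elapsedMs;
}

// Fetch the slot list from a llama-server instance, e.g. "http://localhost:8080".
async function fetchSlots(baseUrl: string): Promise<SlotSnapshot[]> {
  const res = await fetch(`${baseUrl}/slots`);
  if (!res.ok) throw new Error(`GET /slots failed: ${res.status}`);
  return (await res.json()) as SlotSnapshot[];
}
```

Calling `fetchSlots` on a timer and running each entry through `classifySlot` gives a per-slot status, which should also cover setups with several parallel slots.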

It’s likely only useful in a single-user scenario, but it has been tested with both single and multiple parallel slots.

Installation:

opencode plugin @troed/oc-ls-stats@latest --global

  • Schilling2304
    1 day ago

    Yesterday I needed this. Will install this. Thanks.

    May I ask: have you noticed whether the prompt processing speeds shown in llama-bench are vastly different from those in llama-server? I see a difference of hundreds of tokens per second.