• 2 Posts
  • 105 Comments
Joined 2 years ago
Cake day: June 8, 2023


  • What I’ve ultimately converged to without any rigorous testing is:

    • using Q6 if it fits in VRAM+RAM (anything higher is a waste of memory and compute for barely any gain), otherwise either some small quant (rarely) or ignoring the model altogether;
    • not really using IQ quants - as far as I recall they depend on a dataset, and I don’t want the model’s behaviour to be affected by some additional dataset;
    • other than the Q6 rule, in any trade-off between speed and quality I choose quality - my usage volumes are low, and I’d rather wait for a good result;
    • I load as much as I can into VRAM, leaving 1-3GB for the system and context.
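    The heuristic above can be sketched roughly like this (a toy illustration, not a real tool: the function name, the bits-per-weight ratios, and all sizes are my own assumptions, not measured figures):

    ```python
    # Hypothetical sketch of the "Q6 if it fits, else small quant, else skip" rule.
    def pick_quant(model_size_fp16_gb, vram_gb, ram_gb, reserve_gb=2):
        """Pick a quant level given total memory; reserve_gb is left for system/context."""
        budget = vram_gb + ram_gb - reserve_gb
        # Very rough size estimates as a fraction of fp16 weights
        # (assumed ~6.5 and ~4.5 bits per weight for Q6/Q4-class quants).
        q6_size = model_size_fp16_gb * 6.5 / 16
        q4_size = model_size_fp16_gb * 4.5 / 16
        if q6_size <= budget:
            return "Q6"          # anything higher is barely any gain
        if q4_size <= budget:
            return "Q4"          # small quant, used rarely
        return "skip this model"

    print(pick_quant(model_size_fp16_gb=14, vram_gb=8, ram_gb=16))  # 7B-class model
    ```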


  • My intuition:

    • There are “genuine” instances of hapax legomena which probably carry some semantic sense, e.g. a rare concept, a wordplay, an artistic invention, an ancient inside joke.
    • There’s various noise: because somebody let their cat walk on the keyboard, because OCR software failed in one small spot, because somebody was copying data over a noisy channel without error correction, because somebody had a headache and couldn’t be bothered, because whatever.
    • Once a dataset is too big to be manually reviewed by experts, the amount of general noise is far, far larger than what you’re looking for, and you can’t differentiate between the two using statistics alone. And if it has been manually reviewed, the experts have probably published their findings, or at least told a few colleagues.
    • Transformers are VERY data-hungry. They need enormous datasets.

    So I don’t think this approach will help you much even for finding words and phrases. Everything I’ve said extends to semantic noise too, so your extended question also seems like a hopeless endeavour when approached specifically with LLMs or big-data analysis of text.
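    A minimal illustration of the "statistics alone can't differentiate" point (the corpus here is made up: `qwfpgj` stands in for keyboard noise, the long word for a genuine rare term):

    ```python
    from collections import Counter

    # Both the noise token and the genuine rare word occur exactly once,
    # so a frequency-based hapax filter treats them identically.
    corpus = "the cat sat on the mat qwfpgj honorificabilitudinity the cat".split()
    counts = Counter(corpus)
    hapaxes = sorted(w for w, n in counts.items() if n == 1)
    print(hapaxes)  # noise and genuine rarities land in the same bucket
    ```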








  • Because we have tons of ground-level sensors, but not a lot in the upper layers of the atmosphere, I think?

    Why is this important? Weather processes are usually modelled as a set of differential equations, and you need to know the boundary conditions in order to solve them and obtain the state of the entire atmosphere. The atmosphere has two boundaries: the lower, which is the planet’s surface, and the upper, which is where the atmosphere ends. Since we don’t have much data from the upper layers, the quality of all predictions suffers.
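    A toy analogue of this (my own sketch, not an atmospheric model): a 1D heat equation solved by explicit finite differences. The interior update rule is identical in both runs; only the imposed boundary values differ, and they drag the whole solution with them.

    ```python
    import numpy as np

    # Toy 1D heat equation u_t = u_xx on [0, 1], explicit finite differences.
    # dt / dx^2 ≈ 0.24 < 0.5, so the explicit scheme is stable.
    def solve(lower_bc, upper_bc, n=50, steps=2000, dt=1e-4):
        x = np.linspace(0.0, 1.0, n)
        dx = x[1] - x[0]
        u = np.zeros(n)
        for _ in range(steps):
            u[0], u[-1] = lower_bc, upper_bc      # impose the boundary conditions
            u[1:-1] += dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
        return u

    # Same equation, same interior scheme - only the upper boundary changes,
    # yet the value in the middle of the domain moves substantially.
    print(solve(1.0, 0.0)[25], solve(1.0, 1.0)[25])
    ```

    Getting the upper boundary wrong in this toy already shifts the entire interior; in a real forecast model the effect compounds through every coupled equation.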