For Tolkien’s work, there is the twelve-volume “The Complete History of Middle-earth”, which is about as inside baseball as you can get for Tolkien.
I’d replace HoME with Parma Eldalamberon, Vinyar Tengwar and other journals publishing his early materials here.
My intuition:
So I don’t think this approach will help you much, even for finding words and phrases. Everything I’ve said extends to semantic noise too, so your extended question also seems a hopeless endeavour when approached specifically with LLMs or big-data analysis of text.