I don’t think AI is actually that good at summarizing. It doesn’t understand the text and is prone to hallucinate. I wouldn’t trust an AI summary for anything important.
Also, AI search just seems like overkill. If I type in “population of London”, I just want to be taken to a reputable site like Wikipedia. I don’t want a guessing machine to tell me.
Other use cases maybe. But there are so many poor uses of AI, it’s hard to take any of it seriously.
This right here. Whenever I’ve tried using an LLM to summarize, I spent more time fact-checking it (and finding the inevitable misunderstandings and outright hallucinations—they’re always there for anything of substance!) than I’d spend writing my own damned summary.
There is, however, one use case I’ve found where LLMs work better than the alternatives … provided you do due diligence. To put it bluntly, Google Translate and the similar slop from Bing, Baidu, etc. suck. They are god-awful at translating anything but straightforward technical writing or the most tediously dull prose. LLMs are far better translators (and can be instructed to highlight cultural artifacts, possible transcription errors, etc.) …
… as long as you back-translate in a separate session to check for hallucination.
Oh, and Google Translate-style translators really suck at Classical Chinese. LLMs do much better (provided you do the back-translation check for hallucination).
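For the curious, the back-translation check is roughly: translate in one session, then open a completely fresh session and translate the output back, so the second pass can’t lean on the original. Here’s a minimal sketch, assuming an OpenAI-compatible chat client; the model name, prompt wording, and example sentence are placeholders of mine, not a prescribed recipe.

```python
# Minimal sketch of translate-then-back-translate. Two independent calls
# (fresh context each time) so the back-translation can't "peek" at the source.
# Assumes an OpenAI-compatible chat client; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()
MODEL = "your-llm-of-choice"  # placeholder

def translate(text: str, source: str, target: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": f"Translate from {source} to {target}. "
                        "Note cultural references or likely transcription errors in [brackets]."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content

original = "學而時習之，不亦說乎？"  # example Classical Chinese line
forward = translate(original, "Classical Chinese", "English")
backward = translate(forward, "English", "Classical Chinese")  # fresh call, no access to `original`
print(forward)
print(backward)  # compare against the original by hand to spot hallucinations
```

If the round trip comes back saying something materially different from what you started with, that’s your hallucination flag.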
If I understand how AI works (predictive models), it kinda seems perfectly suited for translating text. That’s also exactly how I’ve been using Gemini: translating all the memes in ich_iel 🤣. Unironically it works really well, and the only ones that aren’t understandable are cultural, not linguistic.
I guess this really depends on the solution you’re working with.
I’ve built a voting system that relays the same query to multiple online and offline LLMs and uses a consensus to complete a task. I chunk a task into smaller, more manageable components and pass those through the system, so one abstract, complex query becomes a series of simpler asks with a higher chance of success.

Is this system perfect? No, but I’m not relying on a single LLM to complete it. Deficiencies in one LLM are usually made up for by at least one other, so the system works pretty well. I’ve also reduced the possible kinds of queries down to a much more limited subset, so testing and evaluation of results is easier / possible.

This system needs to evaluate the topic and sensitivity of millions of websites, which isn’t something I can do manually in any reasonable amount of time. A human will review websites we flag under very specific conditions, but this cuts down on a lot of manual review work.
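A rough sketch of what that fan-out-and-vote shape could look like; the backend callables, the label set, and the majority-vote threshold are all assumptions on my part, since the actual consensus rule isn’t spelled out above.

```python
# Illustrative sketch of fanning one small sub-task out to several LLMs and
# taking a majority vote. The backend callables are hypothetical stand-ins for
# whatever online/offline models are actually wired in.
from collections import Counter

def classify_with_consensus(chunk: str, backends: list, min_agreement: int = 2):
    """Send the same constrained query to every backend and keep the answer
    only if enough of them agree; otherwise return None for human review."""
    answers = []
    for ask in backends:
        try:
            # Each backend returns one label from a fixed, limited set,
            # e.g. "sensitive" / "not-sensitive" / "unknown".
            answers.append(ask(f"Classify the topic and sensitivity of: {chunk}"))
        except Exception:
            continue  # one flaky backend shouldn't sink the whole vote
    if not answers:
        return None
    label, votes = Counter(answers).most_common(1)[0]
    # No consensus -> escalate to the human-review queue instead of guessing.
    return label if votes >= min_agreement else None
```

The point is just that a constrained label set plus a vote makes a single model’s bad day survivable, and anything without consensus can fall through to the human-review queue.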
When I said search, I meant offline document search, like “find all software patents related to fly-by-wire aircraft embedded control systems” from a folder of patents. Something like Elasticsearch would usually work well here too, but then I can dive further and get it to reason about the results surfaced from the first query. I absolutely agree that AI-powered search is a shitshow.
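For the patent-folder case specifically, the two-stage flow could look something like this sketch, with a naive keyword scorer standing in for Elasticsearch and a hypothetical llm() callable for the second, reasoning pass.

```python
# Rough sketch of retrieve-then-reason over a local folder of patent text files.
# Stage 1 is ordinary keyword retrieval (Elasticsearch would do the same job
# better); stage 2 hands the top hits to an LLM. The llm() helper is hypothetical.
from pathlib import Path

def keyword_search(folder: str, terms: list, top_k: int = 20):
    """Naive stand-in for Elasticsearch: rank files by how many terms they contain."""
    scored = []
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(errors="ignore").lower()
        score = sum(text.count(t.lower()) for t in terms)
        if score:
            scored.append((score, path))
    return [p for _, p in sorted(scored, reverse=True)[:top_k]]

def analyze(folder: str, question: str, llm) -> str:
    hits = keyword_search(folder, ["fly-by-wire", "embedded", "control system"])
    excerpts = "\n\n".join(p.read_text(errors="ignore")[:2000] for p in hits)
    # Second pass: let the model reason only over what the first pass surfaced.
    return llm(f"{question}\n\nRelevant excerpts:\n{excerpts}")
```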
It really depends on the type and size of text you want it to summarize.
For instance, it’ll only give you a very, very simplistic overview of a large research paper that uses technical terms, but if you want it to compress down a bullet-point list, or take one paragraph and turn it into some bullet points, it’ll usually do that without any issues.
Edit: I truly don’t understand why I’m getting downvoted for this. LLMs are actually relatively good at summarizing small, low-context pieces of information into bullet points. They’re quite literally built as code that predicts the likelihood of text based on an input. Giving it a small amount of text to rewrite or recontextualize is one of its best strengths. That’s why it was originally mostly implemented as a tool to reword small, isolated sections in articles, emails, and papers, before the technology was improved.
It’s when they get to larger pieces of information, like meetings, books, Wikipedia articles, etc., that they begin to break down, due to the nature of the technology itself (context windows, and the lack of external resources that humans can integrate into their writing but LLMs can’t fully incorporate on the same level).
But if the text you’re working on is small, you could just do it yourself. You don’t need an expensive guessing machine.
Like, if I built a Rube Goldberg machine using twenty rubber ducks, a diesel engine, and a blender to tie my shoes, and it gets it right most of the time, that’s impressive. But it’s also kind of a stupid waste, because I could’ve just tied them with my hands.
Personally, I think that wholly depends on the context.
For example, if someone’s having part of their email rewritten because they feel the tone was a bit off, they’re usually doing that because their own attempts to do so weren’t working for them, and they wanted a secondary… not exactly opinion, since it’s a machine obviously, but at least an attempt that’s outside whatever their brain might currently be locked into trying to do.
I know I’ve gotten stuck for way too long wondering why my writing felt so off, only to have someone give me a quick suggestion that cleared it all up, so I can see how this would be helpful, while also not always being something they can easily or quickly do themselves.
Also, there are legitimately many use cases where an application can use an LLM to parse small pieces of data better than a simple regex can, for instance.
For example, Linkwarden, a popular open-source link management tool, uses LLMs (on an opt-in basis) to automatically tag your links based on the contents of the page. When I’m importing thousands of bookmarks for the first time, each individual task is quick (just look at the link and assign the proper tags) and doesn’t take significant mental effort on its own, but I don’t want to do it thousands of times if the LLM will get it done much faster with accuracy that’s good enough for my use case.
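To make the “better than a regex” point concrete, here’s a toy sketch (emphatically not Linkwarden’s actual code; the tag vocabulary and the llm() callable are made up):

```python
# Toy sketch of LLM-based bookmark tagging vs. a regex baseline.
# A regex can only match literal patterns; the LLM call can map arbitrary page
# text onto a small fixed tag vocabulary. llm() is a hypothetical callable.
import re

ALLOWED_TAGS = {"programming", "news", "recipes", "self-hosting", "music"}

def tag_with_regex(page_text: str) -> set:
    # Brittle: only finds tags whose names literally appear in the text.
    return {t for t in ALLOWED_TAGS if re.search(rf"\b{re.escape(t)}\b", page_text, re.I)}

def tag_with_llm(page_text: str, llm) -> set:
    answer = llm(
        "Pick the best matching tags for this page, comma-separated, "
        f"choosing only from {sorted(ALLOWED_TAGS)}:\n\n{page_text[:4000]}"
    )
    # Keep only tags from the allowed set so hallucinated labels get dropped.
    return {t.strip().lower() for t in answer.split(",")} & ALLOWED_TAGS
```

Constraining the output to a fixed tag set is what keeps the “accuracy that’s good enough” claim honest: anything the model invents outside the vocabulary just gets dropped.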
I can definitely agree with you in a broader sense though, since at this point I’ve seen people write two-sentence emails and short comments using AI, with prompts even longer than the output, and that I can 100% agree is entirely pointless.
Even there it will hallucinate. Or it will get confused by some complicated sentences and reverse the conclusion.
It can, but I don’t see that happen often in most places I see it used, at least by the average person. I will say I’ve deliberately insulated myself a bit from the very AI-bro type of people who use it regularly throughout their day, and I mostly interact with people who use it occasionally while researching an assignment, rewriting part of an email, etc., so I recognize that my opinion here might just be influenced by the kinds of uses I personally see.
In my experience, when it’s used to summarize, say, 4-6 sentences of general-audience text (i.e. not a research paper in a journal) that doesn’t rely on a high level of context from the rest of the document, it seems to do pretty well, especially within the confines of an existing conversation about the topic where the intent and context have already been established. (A paragraph from a news article would be a bad candidate, since it leans on information the model doesn’t have, whereas instructions on how to use a tool are closer to general knowledge.)
For example, a couple months back, I was having a hard time understanding subnetting, but I decided to give it a shot, and by giving it a bit of context on what was tripping me up, it was successfully able to reword and re-explain the topic in such a way that I was able to better understand it, and could then continue researching it.
Broad topic that’s definitely in the training data + doesn’t rely on lots of extra context for the specific example = reasonably good output.
But again, I also don’t frequently interact with the kind of people that like having AI in everything, and I’m mostly around very casual users that don’t use it for anything high-stakes or complex. I’m quite sure that anything more than extremely simple summaries of basic information or very well-known topics would probably have a lot of hallucinations.
See, when I have 4-6 sentences to summarize, I don’t see the value-add of a machine doing the summarizing for me.
(Note: the above sentence is literally a summary of about a dozen sentences I wrote elsewhere that contained more details.)
Oh, I completely understand; I don’t often see it as useful either. I’m just saying that a lot of people I see using LLMs occasionally are usually just shortening their own replies to things, converting a text-based list of steps into a numbered list for readability, or rewording a concept because the original writer didn’t word it in a way their brain could process well, etc.
Things that don’t necessarily require a huge amount of effort on their part, but still save them a little bit of time, which, in my conversations with them, seems to prove valuable to them, even if it’s in a small way.
I feel like letting your skills in reading and communicating in writing atrophy is a poor choice. And skills do atrophy without use. I used to be able to read a book and write an essay critically analyzing it. If I tried to do that now, it would be a rough start.
I don’t think people are going to just up and forget how to write, but I do think they’ll get even worse at it if they don’t do it.
I definitely agree.
However, I think there’s certainly a point at which the usage of a given tool is too small to meaningfully impact your actual retention of a skill. When these people are just, say, occasionally firing off an email and feel the tone is a bit off, having it partially rewritten could possibly even help them do better in the future at changing their tone on their own, so personally I think it’s a bit of a mixed bag.
But of course, when I look at all the people forgoing things like learning programming languages to ask ChatGPT to just vibe code everything for them, then talking about how they’re gonna get a job in tech… yeah, that’s 100% past the point of skills atrophying, in my opinion.
Our plant manager likes to use it (Copilot) to summarize meetings. It in fact does not summarize to a bullet-point list in any useful way. It breaks the notes into a header for each topic, then bullet points. The header is a brief summary. The bullet points? The exact same summary, just broken into individual points by sentence. Truly stunning work. Even better with a “Please review the meeting transcript yourself as AI might not be 100% accurate” disclaimer.
Truly worthless.
That being said, I’ve got a few vision systems using an “AI” to recognize product that doesn’t meet the pre-taught pattern. It’s very good at this.
This is precisely why I don’t think anybody should be using it for meeting summaries. I know someone who does at his job, and even he only uses it for the boring, never-acted-upon meetings that everyone thinks are unnecessary but the managers think should be held anyways, because it just doesn’t work well enough to justify using it on anything even remotely important.
Even just from a purely technical standpoint, the context windows of LLMs are so small relative to the scale of meetings that they will almost never be able to summarize one in its entirety without repeating points, over-explaining some topics and under-explaining others because they don’t have enough external context to judge importance, etc.
But if you give it a single small paragraph from an article, it will probably summarize that small piece of information relatively well, and if you give it something already formatted like bullet points, it can usually combine points without losing much context, because it’s inherently summarizing a small, contextually isolated piece of information.
I think your manager has a skill issue if his output is being badly formatted like that. I’d tell him to include a formatting guideline in his prompt. It won’t solve his issues but I’ll gain some favor. Just gotta make it clear I’m no damn prompt engineer. lol
I didn’t think we should be using it at all, from a security standpoint. Let’s run potentially business-critical information through the plagiarism machine that Microsoft has unrestricted access to. So I’m not going to attempt to help make its use better at all. Hopefully, if it’s trash enough, it’ll blow over once no reasonable person uses it. Besides, the man’s derided by production operators and the non-Kool-Aid-drinking salaried folk. He can keep it up. Lol
Okay, then self host an open model. Solves all of the problems you highlighted.
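For the security concern specifically, the self-hosted setup can be as small as pointing an OpenAI-compatible client at a model running on your own hardware. A minimal sketch, assuming a local server such as Ollama on its default port; the model name is just an example:

```python
# Minimal sketch: point an OpenAI-compatible client at a locally hosted model
# so the meeting text never leaves the building. Host/port and model name are
# examples; any local server exposing the OpenAI chat API works the same way.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="llama3",  # example local model name
    messages=[
        {"role": "system", "content": "Summarize the meeting transcript as short bullet points."},
        {"role": "user", "content": open("meeting_transcript.txt").read()},
    ],
)
print(resp.choices[0].message.content)
```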
Or, you know, don’t use LLMs. That solves all those problems too, costs less, and won’t hallucinate your way into lawsuits or whatever.
Nobody is a “prompt engineer”. There is no such job, for all practical purposes, and can’t be one given that the degenerative AI pushers change their models more often than healthy people change their underwear.
Right, I just don’t want him to think that, or he’d have me tailor the prompts for him and give him an opportunity to micromanage me.