• 2 Posts
  • 925 Comments
Joined 2 years ago
Cake day: July 14, 2023

  • I think the best way to handle this would be to just encode everything and upload all files. If I wanted some amount of history, I’d use some file system with automatic snapshots, like ZFS.

    If I wanted to do what you’ve outlined, I would probably use rclone with filtering for the extension types or something along those lines.

    If I wanted to do this with Git specifically, though, this is what I would try first:

    First, add lossless extensions (*.flac, *.wav) to my repo’s .gitignore

    Second, schedule a job on my local machine that:

    1. Watches for changes to the local file system (e.g., with inotifywait or fswatch)
    2. For any new lossless file, checks whether there is already an accompanying lossy file (i.e., one that is collocated and has the exact same filename, sans extension, with an accepted extension, e.g., .mp3 or .ogg; possibly also confirming that the codec is up to my standards with a call to ffprobe, avprobe, mediainfo, exiftool, or something similar) and, if there isn’t one, encodes the file to my preferred lossy format
    3. Uses git status --porcelain to check whether there have been any changes
    4. If so, runs git add --all && git commit --message "Automatic commit" && git push
    5. Optionally, automatically crafts a better commit message by checking which files have been changed, generating text like Added album: "Satin Panthers - EP" by Hudson Mohawke or Removed album: "Brat" by Charli XCX; Added album "Brat and it's the same but there's three more songs so it's not" by Charli XCX

    Third, schedule a job on my remote server that runs git pull at regular intervals.
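
For what it's worth, the local job from the second step could be sketched roughly like this in Python. This is only a sketch under assumptions: the helper names (`missing_lossy`, `encode_command`, `has_changes`, `sync`) are made up for illustration, the extension sets are just the examples mentioned above, ffmpeg is assumed to be installed and left to pick the codec from the output extension, and the file watching (inotifywait or fswatch) is left out, so `sync` would be called by whatever triggers the job.

```python
import subprocess
from pathlib import Path

# Assumed extension sets, taken from the examples above.
LOSSLESS = {".flac", ".wav"}  # the extensions listed in .gitignore
LOSSY = {".mp3", ".ogg"}      # accepted lossy extensions


def missing_lossy(root: Path) -> list[Path]:
    """Return lossless files with no collocated lossy file of the same name."""
    missing = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in LOSSLESS:
            continue
        # Simplification: only lowercase lossy extensions are checked.
        if not any(path.with_suffix(ext).exists() for ext in LOSSY):
            missing.append(path)
    return missing


def encode_command(src: Path, ext: str = ".ogg") -> list[str]:
    """Build an ffmpeg command; -n refuses to overwrite an existing output."""
    return ["ffmpeg", "-n", "-i", str(src), str(src.with_suffix(ext))]


def has_changes(porcelain: str) -> bool:
    """git status --porcelain prints nothing when the working tree is clean."""
    return bool(porcelain.strip())


def sync(repo: Path) -> None:
    """Encode missing lossy files, then commit and push if anything changed."""
    for src in missing_lossy(repo):
        subprocess.run(encode_command(src), check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    if has_changes(status):
        for cmd in (
            ["git", "add", "--all"],
            ["git", "commit", "--message", "Automatic commit"],
            ["git", "push"],
        ):
            subprocess.run(cmd, cwd=repo, check=True)
```

Keeping the file matching and the status check as small pure functions means they can be tried out without touching a real repository; only `sync` actually shells out to ffmpeg and git.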

    One issue with this approach is that if you delete a file (as opposed to moving it), the space is not recovered on your local machine or on your server, because the file still lives in the Git history. If space on your server is a concern, you could work around that by running something like the answer here (adjusting the depth to an appropriate amount for your use case):

    git fetch --depth=1
    git reflog expire --expire-unreachable=now --all
    git gc --aggressive --prune=all
    

    Another potential issue is that what I described above involves having an intermediary Git remote to push to and pull from, e.g., one running on a hosted Git forge like GitHub or Codeberg. That could result in copyright complaints or something along those lines, though.

    Alternatively, you could use your server as the git server (or check out forgejo if you want a Git forge as well), but then you can’t use the above trick to prune file history and save space from deleted files (on the server, at least - you could on your local, I think). If you then check out your working copy in a way such that Git can use hard links, you should at least be able to avoid needing to store two copies on your server.

    The other thing to check out, if you take this approach, is git lfs. EDIT: Actually, I take that back - you probably don’t want to use Git LFS.




  • It was already known before the whistleblower that:

    1. Siri inputs (all STT at that time, really) were processed off device
    2. Siri had false activations

    The “sinister” thing that we learned was that Apple was reviewing those activations to see if they were false, with the stated intent (as confirmed by the whistleblower) of using them to reduce false activations.

    There are also black box methods to verify that data isn’t being sent and that particular hardware (like the microphone) isn’t being used, and there are people who look for vulnerabilities as a hobby. If the microphones on the most/second most popular phone brand (iPhone, Samsung) were secretly recording all the time, evidence of that would be easy to find and would be a huge scoop - why haven’t we heard about it yet?

    Snowden and Wikileaks dumped a huge amount of info about governments spying, but nothing in there involved always on microphones in our cell phones.

    To be fair, an individual phone is a single compromise away from actually listening to you, so it still makes sense to avoid having sensitive conversations within earshot of a wirelessly connected microphone. But generally that’s not the concern most people should have.

    Advertising tracking is much more sinister and complicated and harder to wrap your head around than “my phone is listening to me” and as a result makes for a much less glamorous story, but there are dozens, if not hundreds or thousands, of stories out there about how invasive advertising companies’ methods are, about how they know too much, etc… Think about what LLMs do with text. The level of prediction that they can do. That’s what ML algorithms can do with your behavior.

    If you’re misattributing what advertisers know about you to the phone listening and reporting back, then you’re not paying attention to what they’re actually doing.

    So yes - be vigilant. Just be vigilant about the right thing.


  • proven by a whistleblower from apple

    Assuming you have an iPhone. And even then, the whistleblower you’re referencing was part of a team who reviewed utterances by users with the “Hey Siri” wake word feature enabled. If you had Siri disabled entirely or had the wake word feature disabled, you weren’t impacted at all.

    This may have been limited to impacting only users who also had some option like “Improve Siri and Dictation” enabled, but it’s not clear. Today, the Privacy Policy explicitly says that Apple can have employees review your interactions with Siri and Dictation (my understanding is the reason for the settlement is that they were not explicit that human review was occurring). I strongly recommend disabling that setting, particularly if you have a wake word enabled.

    If you have wake words enabled on your phone or device, your phone has to listen to be able to react to them. At that point, of course the phone is listening. Whether it’s sending the info back somewhere is a different story, and there isn’t any evidence that I’m aware of that any major phone company does this.


  • Sure - Wikipedia says it better than I could hope to:

    As English-linguist Larry Andrews describes it, descriptive grammar is the linguistic approach which studies what a language is like, as opposed to prescriptive, which declares what a language should be like.[11]: 25  In other words, descriptive grammarians focus analysis on how all kinds of people in all sorts of environments, usually in more casual, everyday settings, communicate, whereas prescriptive grammarians focus on the grammatical rules and structures predetermined by linguistic registers and figures of power. An example that Andrews uses in his book is fewer than vs less than.[11]: 26  A descriptive grammarian would state that both statements are equally valid, as long as the meaning behind the statement can be understood. A prescriptive grammarian would analyze the rules and conventions behind both statements to determine which statement is correct or otherwise preferable. Andrews also believes that, although most linguists would be descriptive grammarians, most public school teachers tend to be prescriptive.[11]: 26










  • From the Slashdot comments, by Rei:

    Or, you can, you know, not fall for clickbait. This is one of those…

    Ultimately, we found that the common understanding of AI’s energy consumption is full of holes.

    “Everyone Else Is Wrong And I Am Right” articles, which starts out with…

    The latest reports show that 4.4% of all the energy in the US now goes toward data centers.

    without bothering to mention that AI is only a small percentage of data centre power consumption (Bitcoin alone is an order of magnitude higher), and…

    In 2017, AI began to change everything. Data centers started getting built with energy-intensive hardware designed for AI, which led them to double their electricity consumption by 2023.

    What a retcon. AI was *nothing* until the early 2020s. Yet datacentre power consumption did start skyrocketing in 2017 - having nothing whatsoever to do with AI. Bitcoin was the big driver.

    At that point, AI alone could consume as much electricity annually as 22% of all US households.

    Let’s convert this from meaningless hype numbers to actual numbers. First off, notice the fast one they just pulled - global AI usage to just the US, and just households. US households use about 1500 TWh of the world’s 24400 TWh/yr, or about 6%. 22% of 6% is ~1,3% of electricity (330 TWh/yr). Electricity is about 20% of global energy, so in this scenario AI would be 0,3% of global energy. We’re just taking at face value their extreme numbers for now (predicting an order of magnitude growth from today’s AI consumption), and ignoring that even a single AI application alone could entirely offset the emissions of all AI combined. Let’s look first at the premises behind what they’re arguing for this 0,3% of global energy usage (oh, I’m sorry, let’s revert to scary numbers: “22% OF US HOUSEHOLDS!”):

    • It’s almost all inference, so that simplifies everything to usage growth
    • But usage growth is offset by the fact that AI efficiency is simultaneously improving at faster than Moore’s Law on three separate axes, which are multiplicative with each other (hardware, inference, and models). You can get what used to take insanely expensive, server-and-power-hungry GPT-4 performance (1,5T parameters) on a model small enough to run on a cell phone that, run on efficient modern servers, finishes its output in a flash. So you have to assume not just one order of magnitude of inference growth (due to more people using AI), but many orders of magnitude of inference growth.
      * You can try to Jevon at least part of that away by assuming that people will always want the latest, greatest, most powerful models for their tasks, rather than putting the efficiency gains toward lower costs. But will they? I mean, to some extent, sure. LRMs deal with a lot more tokens than non-LRMs, AI video is just starting to take off, etc. But at the same time, for example, today LRMs work in token space, but in the future they’ll probably just work in latent space, which is vastly more efficient. To be clear, I’m sure Jevon will eat a lot of the gains - but all of them? I’m not so sure about that.
      * You need the hardware to actually consume this power. They’re predicting by - three years from now - to have an order of magnitude more hardware out there than all the AI servers combined to this point. Is the production capacity for that huge level of increase in AI silicon actually in the works? I don’t see it.
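
To make the conversion in the quoted comment easier to check, here is a minimal sketch of the same arithmetic in Python. All four input figures are taken at face value from the comment (they are not independently verified here), and the variable names are just for illustration.

```python
# Figures as stated in the quoted comment; not independently verified.
US_HOUSEHOLD_TWH = 1500           # US household electricity use, TWh/yr
WORLD_ELECTRICITY_TWH = 24400     # world electricity use, TWh/yr
AI_SHARE_OF_US_HOUSEHOLDS = 0.22  # the article's "22% of all US households"
ELECTRICITY_SHARE_OF_ENERGY = 0.20

# US households as a share of world electricity.
household_share = US_HOUSEHOLD_TWH / WORLD_ELECTRICITY_TWH

# Projected AI electricity use, in absolute terms and as a world share.
ai_twh = AI_SHARE_OF_US_HOUSEHOLDS * US_HOUSEHOLD_TWH
ai_share_of_electricity = ai_twh / WORLD_ELECTRICITY_TWH

# Scale down again because electricity is only part of total energy use.
ai_share_of_energy = ai_share_of_electricity * ELECTRICITY_SHARE_OF_ENERGY

print(f"US households: {household_share:.1%} of world electricity")
print(f"AI: {ai_twh:.0f} TWh/yr, {ai_share_of_electricity:.1%} of world electricity")
print(f"AI: {ai_share_of_energy:.1%} of world energy")
```

Without the intermediate rounding to 6%, the electricity share comes out at about 1.4% rather than the ~1,3% in the comment; the 330 TWh/yr and roughly 0.3% of global energy figures match.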

  • There’s a difference between a tool being available to you and a tool being misused by your students.

    That said, I wouldn’t trust AI assessments of students to determine if they’re on track right now, either. Whatever means the AI would use needs to be better than grading quizzes, homework, etc., and while I’m not a teacher, I would be very surprised if it were better than any halfway competent teacher’s assessments (thinking in terms of high school and younger, at least - in university IME the expectation is that you self assess during the term and it’s up to you to seek out learning opportunities outside class if you need them, like going to office hours for your prof or TA).

    AI isn’t useless, though! It’s just being used wrong. For example, AI can improve OCR, making it more feasible for students to hand in submissions that can be automatically graded, or improving accessibility for graders. But for that to actually be helpful, we need better options on the hardware front and better integration of those options into grading systems, like affordable batch scanners that you can just drop a stack of 50 assignments into, each a variable number of pages, with software that will automatically sort the results by assignment and submitter and file them in the same place as all the digital submissions.




  • Though… If a computer has a real biological brain in it doing the thinking, is it artificial intelligence?

    The person who came up with the Chinese Room Argument argued that if a brain was completely synthetic, even if it were a perfect simulation of a real brain, it would not think - it would not have a genuine understanding of anything, only a simulation of an understanding. I don’t agree (though I would still say it’s “artificial”), but I’ll let you draw your own conclusions.

    From section 4.3:

    Consider a computer that operates in quite a different manner than an AI program with scripts and operations on sentence-like strings of symbols. The Brain Simulator reply asks us to suppose instead the program parallels the actual sequence of nerve firings that occur in the brain of a native Chinese language speaker when that person understands Chinese – every nerve, every firing. Since the computer then works the very same way as the brain of a native Chinese speaker, processing information in just the same way, it will understand Chinese. Paul and Patricia Churchland have set out a reply along these lines, discussed below.

    In response to this, Searle argues that it makes no difference. He suggests a variation on the brain simulator scenario: suppose that in the room the man has a huge set of valves and water pipes, in the same arrangement as the neurons in a native Chinese speaker’s brain. The program now tells the man which valves to open in response to input. Searle claims that it is obvious that there would be no understanding of Chinese. (Note however that the basis for this claim is no longer simply that Searle himself wouldn’t understand Chinese – it seems clear that now he is just facilitating the causal operation of the system and so we rely on our Leibnizian intuition that water-works don’t understand (see also Maudlin 1989).) Searle concludes that a simulation of brain activity is not the real thing.

    However, following Pylyshyn 1980, Cole and Foelber 1984, and Chalmers 1996, we might wonder about gradually transitioning cyborg systems. Pylyshyn writes:

    If more and more of the cells in your brain were to be replaced by integrated circuit chips, programmed in such a way as to keep the input-output function each unit identical to that of the unit being replaced, you would in all likelihood just keep right on speaking exactly as you are doing now except that you would eventually stop meaning anything by it. What we outside observers might take to be words would become for you just certain noises that circuits caused you to make.

    These cyborgization thought experiments can be linked to the Chinese Room. Suppose Otto has a neural disease that causes one of the neurons in his brain to fail, but surgeons install a tiny remotely controlled artificial neuron, a synron, alongside his disabled neuron. The control of Otto’s artificial neuron is by John Searle in the Chinese Room, unbeknownst to both Searle and Otto. Tiny wires connect the artificial neuron to the synapses on the cell-body of his disabled neuron. When his artificial neuron is stimulated by neurons that synapse on his disabled neuron, a light goes on in the Chinese Room. Searle then manipulates some valves and switches in accord with a program. That, via the radio link, causes Otto’s artificial neuron to release neuro-transmitters from its tiny artificial vesicles. If Searle’s programmed activity causes Otto’s artificial neuron to behave just as his disabled natural neuron once did, the behavior of the rest of his nervous system will be unchanged. Alas, Otto’s disease progresses; more neurons are replaced by synrons controlled by Searle. Ex hypothesi the rest of the world will not notice the difference; will Otto? If so, when? And why?

    Under the rubric “The Combination Reply”, Searle also considers a system with the features of all three of the preceding: a robot with a digital brain simulating computer in its aluminum cranium, such that the system as a whole behaves indistinguishably from a human. Since the normal input to the brain is from sense organs, it is natural to suppose that most advocates of the Brain Simulator Reply have in mind such a combination of brain simulation, Robot, and Systems or Virtual Mind Reply. Some (e.g. Rey 1986) argue it is reasonable to attribute intentionality to such a system as a whole. Searle agrees that it would indeed be reasonable to attribute understanding to such an android system – but only as long as you don’t know how it works. As soon as you know the truth – it is a computer, uncomprehendingly manipulating symbols on the basis of syntax, not meaning – you would cease to attribute intentionality to it.