• JakenVeina@lemm.ee
    English
    arrow-up
    28
    arrow-down
    2
    ·
    7 days ago

    It’s far more often stored in a word, so 32 or 64 bits, depending on the target architecture. At least in most languages.

    • timhh@programming.dev
      arrow-up
      5
      ·
      edit-2
      6 days ago

      No it isn’t. All statically typed languages I know of use a byte. Which languages store it in an entire 32 bits? That would be unnecessarily wasteful.

      • JakenVeina@lemm.ee
        English
        arrow-up
        1
        ·
        10 hours ago

        C, C++, C#, to name the main ones. And quite a lot of languages are compiled similarly to these.

        To be clear, there are a lot of caveats to the statement, and it depends on architecture as well, but at the end of the day, it’s rare for a byte or bool to be mapped directly to a single byte in memory.

        Say, for example, you have this function…

        public void Foo()
        {
            bool someFlag = false;
            int counter = 0;
        
            ...
        }
        

        The someFlag and counter variables are getting allocated on the stack, and (depending on architecture) that probably means each one is aligned to a 32-bit or 64-bit word boundary, since many CPUs require that for whole-word load and store instructions, or only support a stack pointer that increments in whole words. If the function were to have multiple byte or bool variables allocated, it might be able to pack them together, if the CPU supports single-byte load and store instructions, but the next int variable that follows might still need some padding space in front of it, so that it aligns on a word boundary.

        A very similar concept applies to most struct and object implementations. A single byte or bool field within a struct or object will likely result in a whole word being allocated, so that other variables can be word-aligned, or so that the whole object meets some optimal word-aligned size. But if you have multiple less-than-a-word fields, they can be packed together. C# does this, for sure, and has some mechanisms by which you can customize field packing.

        • timhh@programming.dev
          arrow-up
          1
          arrow-down
          1
          ·
          3 hours ago

          No, in C and C++ a bool is a byte.

          > since many CPUs require that for whole-word load and store instructions

          All modern architectures (ARM, x86, RISC-V) support byte load/store instructions.

          > or only support a stack pointer that increments in whole words

          IIRC the stack pointer is usually incremented in 16-byte units. That’s irrelevant though. If you store a single bool on the stack it would be 1 byte for the bool and 15 bytes of padding.

          > A single byte or bool field within a struct or object will likely result in a whole word being allocated, so that other variables can be word-aligned

          Again, no. I think you’ve sort of heard about this subject but haven’t really understood it.

          The requirement is that fields are naturally aligned (up to the machine word size). So a byte needs to be byte-aligned, a 2-byte value needs to be 2-byte aligned, etc.

          Padding may be inserted to achieve that, but that is padding: it doesn’t change the size of the actual bool, and it isn’t part of the bool.
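
To make the natural-alignment point concrete, here is a minimal C sketch. The struct and function names are hypothetical, and the numbers in `check_layout` assume a mainstream ABI (for example x86-64 or AArch64 with GCC or Clang) where `bool` is 1 byte and `int32_t` is 4-byte aligned; the C standard itself leaves those sizes implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, for illustration only. */
struct flag_then_counter {
    bool    flag;     /* occupies sizeof(bool) bytes at offset 0 */
    int32_t counter;  /* naturally aligned, so padding may precede it */
};

/* Padding bytes the compiler inserted between flag and counter. */
size_t padding_after_flag(void) {
    return offsetof(struct flag_then_counter, counter) - sizeof(bool);
}

/* Assumption: a mainstream ABI where bool is 1 byte and int32_t is
   4-byte aligned. The padding is counted separately from the bool. */
void check_layout(void) {
    assert(sizeof(bool) == 1);
    assert(offsetof(struct flag_then_counter, counter) == 4);
    assert(padding_after_flag() == 3);
    assert(sizeof(struct flag_then_counter) == 8);
}
```

Calling `check_layout()` from `main` should pass silently under those assumptions: the struct is 8 bytes, but the bool itself is still 1 byte and the other 3 bytes are padding that belongs to the struct.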

          > But if you have multiple less-than-a-word fields, they can be packed together.

          They will be, if it fits the alignment requirements. Create a struct with 8 bools. It will take up 8 bytes no matter what your packing setting is. They even give an example:

          > If you specify the default packing size, the size of the structure is 8 bytes. The two bytes occupy the first two bytes of memory, because bytes must align on one-byte boundaries.

          They used byte here but it’s the same for bool because a bool is one byte.
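
The same layout can be sketched in C. This is an analogue under stated assumptions, not the C# example from the docs: the struct names are made up, and the expected sizes assume a mainstream ABI where `bool` and `uint8_t` are 1 byte and `int32_t` is 4-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Eight 1-byte fields: each only needs 1-byte alignment, so no padding. */
struct eight_flags {
    bool f0, f1, f2, f3, f4, f5, f6, f7;
};

/* Two bytes followed by a 4-byte int, mirroring the quoted layout. */
struct two_bytes_then_int {
    uint8_t b1;  /* offset 0 */
    uint8_t b2;  /* offset 1, directly after b1 */
    int32_t i;   /* needs 4-byte alignment, so 2 padding bytes precede it */
};

size_t eight_flags_size(void) {
    return sizeof(struct eight_flags);
}

size_t int_offset(void) {
    return offsetof(struct two_bytes_then_int, i);
}

/* Assumption: mainstream ABI (e.g. x86-64 or AArch64 with GCC or Clang). */
void check_packing(void) {
    assert(eight_flags_size() == 8);
    assert(offsetof(struct two_bytes_then_int, b2) == 1);
    assert(int_offset() == 4);
    assert(sizeof(struct two_bytes_then_int) == 8);
}
```

Under those assumptions `check_packing()` passes: the two single-byte fields sit next to each other, and only the int pulls in padding.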

          I’m really surprised how common this misconception is.

      • Aux@feddit.uk
        English
        arrow-up
        1
        ·
        5 days ago

        It’s not wasteful, it’s faster. You can’t read one byte, you can only read one word. Every decent compiler will turn booleans into words.

        • timhh@programming.dev
          arrow-up
          1
          arrow-down
          1
          ·
          2 days ago

          > You can’t read one byte

          lol what. You can absolutely read one byte: https://godbolt.org/z/TeTch8Yhd

          On ARM it’s ldrb (load register byte), and on RISC-V it’s lb (load byte).

          > Every decent compiler will turn booleans into words.

          No compiler I know of does this. I think you might be getting confused because they’re loaded into registers which are machine-word sized. But in memory a bool is always one byte.
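
A rough C sketch of the register-versus-memory distinction. The function names are hypothetical, and the stride of 1 assumes a mainstream ABI where `sizeof(bool)` is 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reads one flag. On mainstream targets this is typically a single byte
   load whose result is widened into a word-sized register; the value
   in memory stays one byte wide. */
int read_flag(const bool *flags, size_t index) {
    return flags[index];
}

/* Distance in bytes between two adjacent bool elements in memory. */
size_t bool_stride(void) {
    bool flags[2] = { false, true };
    return (size_t)((const char *)&flags[1] - (const char *)&flags[0]);
}

/* Assumption: sizeof(bool) == 1, as on x86-64 and AArch64 with GCC/Clang. */
void check_bool_array(void) {
    bool flags[16] = { false };
    flags[5] = true;
    assert(sizeof flags == 16);   /* 16 bools, 16 bytes, no per-element padding */
    assert(bool_stride() == 1);
    assert(read_flag(flags, 5) == 1);
    assert(read_flag(flags, 4) == 0);
}
```

The return type of `read_flag` is a full `int`, which is the "word-sized register" part; the array it reads from is still packed one byte per element.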

              • Aux@feddit.uk
                English
                arrow-up
                1
                ·
                5 hours ago

                Internally it will still read a whole word. Because the CPU cannot read less than a word. And if you read the ARM article you linked, it literally says so.

                Thus any compiler worth their salt will align all byte variables to words for faster memory access. Unless you specifically disable such behaviour. So yeah, RTFM :)

                • timhh@programming.dev
                  arrow-up
                  1
                  ·
                  4 hours ago

                  Wrong again. It depends on the CPU. They can absolutely read a single byte and they will do if you’re reading from non-idempotent memory.

                  If you’re reading from idempotent memory they won’t read a byte or a word. They’ll likely read a whole cache line (usually 64 bytes).

                  > And if you read the ARM article you linked, it literally says so.

                  Where?

                  > Thus any compiler worth their salt will align all byte variables to words for faster memory access.

                  No they won’t because it isn’t faster. The CPU will read the whole cache line that contains the byte.
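
A small C sketch of the cache-line argument, assuming 64-byte lines (the usual size, but not universal) and treating addresses as plain integers. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/* Index of the cache line that a byte address falls in. */
uintptr_t cache_line_of(uintptr_t addr) {
    return addr / CACHE_LINE_SIZE;
}

/* Number of cache lines touched by an access of len bytes starting at addr. */
size_t lines_touched(uintptr_t addr, size_t len) {
    if (len == 0) {
        return 0;
    }
    return (size_t)(cache_line_of(addr + len - 1) - cache_line_of(addr)) + 1;
}

void check_cache_lines(void) {
    /* A bool at a word-aligned address and one at an odd address in the
       same line both cost exactly one line fetch. */
    assert(lines_touched(0x1000, 1) == 1);
    assert(lines_touched(0x1001, 1) == 1);
    assert(cache_line_of(0x1000) == cache_line_of(0x103F));
    /* The next byte starts a new line. */
    assert(cache_line_of(0x1040) == cache_line_of(0x1000) + 1);
}
```

To try it on a real variable, pass `(uintptr_t)&some_flag` as the address. A single byte can never straddle two lines, so word-aligning it doesn't reduce the number of lines fetched.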

                  > RTFM

                  Well, I would but no manual says that because it’s wrong!