• ViatorOmnium@piefed.social
      7 days ago

      Especially when two of the named languages (German and French) rank only around 20th by number of L1 speakers.

      I’m also interested in knowing how they decide what language a URL is in when lots of languages share words, even more so when you remove diacritics, as is common in URIs. For example, is something like https://example.org/noticia/n-12345.html a Portuguese or a Spanish URL?

      • emb@lemmy.world
        7 days ago

        I wonder that too. How do you separate cross-language homonyms and nonsense words in URLs?

        For any individual page, I guess you’d base it on the page content if the URL language is ambiguous. Like anything with language, it feels like it’d be fuzzy and hard to determine.
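        To make the ambiguity concrete, here is a rough sketch of what a naive URL-only guess might look like. Everything in it is an assumption for illustration: the tiny `WORDS` lists, the function names, and the tie-breaking rule are made up, and this is not how any particular statistics provider is known to do it.

```python
import re
import unicodedata
from urllib.parse import urlparse

# Hypothetical, tiny word lists with diacritics already stripped.
# A real system would need far larger vocabularies.
WORDS = {
    "pt": {"noticia", "noticias", "sobre", "contato", "voce"},
    "es": {"noticia", "noticias", "sobre", "contacto", "usted"},
}


def strip_diacritics(text):
    """Drop combining marks, e.g. 'notícia' becomes 'noticia'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def url_tokens(url):
    """Split the URL path into lowercase alphabetic tokens longer than one letter."""
    path = strip_diacritics(urlparse(url).path.lower())
    return [t for t in re.split(r"[^a-z]+", path) if len(t) > 1]


def guess_url_language(url):
    """Return a language code, or None when the URL alone is ambiguous."""
    tokens = url_tokens(url)
    scores = {
        lang: sum(t in vocab for t in tokens) for lang, vocab in WORDS.items()
    }
    best = max(scores.values())
    winners = sorted(lang for lang, s in scores.items() if s == best)
    if best == 0 or len(winners) > 1:
        return None  # no match or a tie: fall back to the page content
    return winners[0]


print(guess_url_language("https://example.org/noticia/n-12345.html"))
print(guess_url_language("https://example.org/contato/"))
```

        With these toy lists, the `noticia` URL ties between Portuguese and Spanish and comes back as `None`, which is exactly the case where you would have to look at the page content instead.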

        Not that I necessarily doubt someone has collected the data, I’m just not sure how internet statistics like these are figured out.