Most commonly used languages for URLs compared with their share of native speakers worldwide (2025)

Innerworld@lemmy.world · 7 days ago

emb@lemmy.world · 7 days ago

Reminds me of the stuff on this wiki page: https://en.wikipedia.org/wiki/Languages_used_on_the_Internet

Idk anything about how the data is collected here or there, but it seems like basing it on URLs alone amplifies how over-represented English is.

davidgro@lemmy.world · 7 days ago

Another one with much too much Other.

ViatorOmnium@piefed.social · 7 days ago

Especially when two of the named languages (German and French) are around 20th in L1 speakers.

I’m also interested in knowing how they decide what language a URL is in when lots of languages share words, even more so when you remove diacritics, as is common in URIs. For example, is something like https://example.org/noticia/n-12345.html a Portuguese or Spanish URL?
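
To make the ambiguity concrete, here is a rough sketch of a naive word-list matcher. It is not how the chart's data was produced (I don't know that). The tiny `WORDS` lists and the `candidate_languages` helper are made up for illustration. Once diacritics are stripped, Portuguese "notícia" and Spanish "noticia" become the same token, so the example URL matches both languages equally.

```python
import re
import unicodedata
from urllib.parse import unquote, urlparse

# Hypothetical, deliberately tiny word lists (illustration only).
WORDS = {
    "pt": {"notícia", "esporte"},
    "es": {"noticia", "deporte"},
}


def strip_diacritics(text):
    """Decompose characters and drop combining marks (í -> i)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Normalize the word lists the same way URL tokens get normalized.
NORMALIZED = {
    lang: {strip_diacritics(word) for word in words}
    for lang, words in WORDS.items()
}


def candidate_languages(url):
    """Return every language tied for the most matching path tokens."""
    path = strip_diacritics(unquote(urlparse(url).path).lower())
    tokens = [t for t in re.split(r"[^a-z]+", path) if t]
    scores = {
        lang: sum(token in vocab for token in tokens)
        for lang, vocab in NORMALIZED.items()
    }
    best = max(scores.values())
    if best == 0:
        return []  # nothing recognizable in the path
    return sorted(lang for lang, score in scores.items() if score == best)


print(candidate_languages("https://example.org/noticia/n-12345.html"))
```

With these word lists the call returns both `es` and `pt`, so the URL alone can't settle it.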

emb@lemmy.world · 7 days ago

I wonder that too. How to separate cross-language homonyms and nonsense words in URLs?

For any individual page, I guess you base it on the page content if the URL language is ambiguous. Like anything with language, it feels like it’d be fuzzy and hard to determine.
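
A minimal sketch of that fallback idea, assuming you already have a list of candidate languages from the URL and the page text in hand. The `STOPWORDS` lists and the `resolve_language` helper are hypothetical and far too small for real use. They only show the shape of "use the content as a tiebreaker, and give up if it's still a tie".

```python
import re

# Hypothetical stopword lists, chosen so the two sets don't overlap.
STOPWORDS = {
    "pt": {"não", "uma", "com", "em", "são", "você"},
    "es": {"el", "los", "una", "con", "en", "las"},
}


def resolve_language(url_candidates, page_text):
    """Pick a language, using page text only when the URL is ambiguous."""
    if len(url_candidates) == 1:
        return url_candidates[0]  # the URL was already unambiguous

    # No URL signal at all: consider every language we have stopwords for.
    pool = url_candidates or list(STOPWORDS)
    words = re.findall(r"\w+", page_text.lower())
    scores = {
        lang: sum(word in STOPWORDS.get(lang, set()) for word in words)
        for lang in pool
    }
    best = max(scores.values())
    winners = [lang for lang, score in scores.items() if score == best]
    # Still fuzzy: no stopword hits, or a tie between languages.
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]


print(resolve_language(["es", "pt"], "Não há uma resposta simples em português."))
```

Returning `None` for ties is a deliberate choice here: it keeps the "can't tell" cases visible instead of quietly assigning them to one language.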

Not that I necessarily doubt someone has collected the data, just not sure how internet statistics are figured out.