Machine Learning

machinelearning@lemmy.ml

Shamar@feddit.it · English · 5 days ago

A community statement supporting the Open Source Definition (OSD)

osd.fyi

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago

How ‘Embeddings’ Encode What Words Mean

www.quantamagazine.org

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

New AI model “learns” how to simulate Super Mario Bros. from video footage

arstechnica.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o)

huggingface.co

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI

www.lifeiscomputation.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

The Difference Between Speaking and Thinking

www.theatlantic.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

Diffusion Models Are Real-Time Game Engines

gamengen.github.io

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%.

github.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

Transformer Explainer

poloclub.github.io

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

Alibaba claims no. 1 spot in AI math models with Qwen2-Math

venturebeat.com

yboutros@infosec.pub · English · 3 months ago

How to convert a positionally encoded predicted embedding from a decoder to its matching token?

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

New Open-Source AI Image Generator Beats Midjourney, SD3 and Auraflow

decrypt.co

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

AI models collapse when trained on recursively generated data

www.nature.com

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago

RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing

lmsys.org

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago

Alibaba's Qwen LLM model leading open source rankings

huggingface.co

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 5 months ago

By using the same techniques Google used to solve Go (MCTS and backprop), Llama8B gets 96.7% on math benchmark GSM8K. That’s better than GPT-4, Claude and Gemini, with 200x fewer parameters!

arxiv.org

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 5 months ago

Mixture of Agents (MoA) leverages several open-source LLM agents to achieve a score of 65.1% on AlpacaEval 2.0

www.together.ai

ylai@lemmy.ml · English · 5 months ago

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

huggingface.co

keepthepace@slrpnk.net · 5 months ago

Torrent tracker for open models

aitracker.art

wargreymon@sh.itjust.works · 5 months ago

Can gpt generate a gpt model?
