r/mlscaling

7.3K members

r/mlscaling is a medium-sized subreddit with 7.3K members.

ML/AI/DL research on approaches using large models, datasets, and compute: "more is different"

Popular Topics in r/mlscaling

: "AlphaGeometry: An Olympiad-level AI system for geometry"

17 posts

: "MambaByte: Token-free Selective State Space Model"

1 post

Hardware, FB

: "Zuckerberg: "...[W]e're building massive compute infrastructure to support our future roadmap, including 350k H100s by the end of this year -- and overall almost 600k H100s equivalents of compute if you include other GPUs""

1 post

OP, Hist, Hardware, RL

: "Minsky on abandoning DL in 1952: "I decided either this was a bad idea or it'd take thousands/millions of neurons to make it work, & I couldn’t afford to try to build a machine like that.""

1 post

T, Econ, Emp

: ""Estimating efficiency improvements in LLM pre-training", Daan (how much do all the improvements to GPT-3-style LLM training stack up to? >400x?)"

1 post

R, T, RL

: "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models"

1 post

N, RNN, Code, MD

: "🦅 Eagle 7B : Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages"

1 post

Smol

: "Chess-GPT, 1000x smaller than GPT-4, plays 1500 ELO chess. We can visualize its internal board state, and it accurately estimates the ELO rating of the players in a game."

1 post

R, Emp, Code, MD

: "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model"

1 post

#10

R, T, Emp, Data

: ""I am a Strange Dataset: Metalinguistic Tests for Language Models", Thrush et al 2024 (only GPT-4 beats chance; parameter-scaling)"

1 post

#11

Hist

: "Two very interesting articles by Yuxi Liu on historical resistance to connectionism and scaling"

1 post

#12

: ""When Might AI Outsmart Us? It Depends Who You Ask", TIME"

1 post

#13

T, R, Emp

: ""Large Language Models Struggle to Learn Long-Tail Knowledge", Kandpal et al 2022 (BLOOM models show smooth log-scaling of memorization of long-tail knowledge & larger models more sample-efficient)"

1 post

#14

R, Theory

: ""What's Hidden in a Randomly Weighted Neural Network?", Ramanujan et al 2019 (even random nets contain, with increasing probability in size, an accurate sub-net)"

1 post

#15

OP, Forecast, RL

: ""My AI Timelines Have Sped Up (Again) [since 2020]", Alex Irpan"

1 post

#16

Emp, R, T, OA

: ""The Effect of Sampling Temperature on Problem Solving in Large Language Models", Renze & Guven 2024 (Johns Hopkins) (changes in temperature in the range 0.0 to 1.0 do not have a statistically significant impact on LLM performance for problem-solving tasks)"

1 post

#17

Data, R

: ""TabLib: A Dataset Of 627 Million Tables With Context", Eggert et al 2023 (69TB + 0.87t tokens descriptions)"

1 post

#18

R, T, Emp, MD, Code

: "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data - ("We highlight the value of data scaling-up of massive, cheap, and diverse unlabeled images for MDE.")"

1 post

#19

R, Theory, Forecast

: ""AI capabilities can be significantly improved without expensive retraining" - survey and analysis of post-training enhancements"

1 post

#20

RL, T, Safe, Theory, Emp, Code

: "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

1 post

#21

R, T, A, RL, Safe

: ""Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", Hubinger et al 2024 (larger models better at hiding backdoors from safety training)"

1 post

#22

R, T, Emp

: ""MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts", Lu et al., 2023"

1 post

#23

R, T, MD, Code, FB

: "[N] Meta open-sourced a wav2vec2 model pre-trained on 4.5M hours (Barrault et al 2023)"

1 post

#24

R, T, Emp, Bio, Smol

: ""Are Vision Transformers More Data Hungry Than Newborn [Chick] Visual Systems?", Pandey et al 2023"

1 post

#25

RL, Emp, Code, MD

: ""TD-MPC2: Scalable, Robust World Models for Continuous Control" - Scaling behavior again noted in model-based reinforcement learning"

1 post

#26

OP, N, D

: ""Learning human actions on computer applications" {rabbit} - ("We share the view that the scaling law continues to permeate all aspects of neural systems research .... We hope to continue this trend with our action model ....")"

1 post

#27

Smol, Code, Hist, MLP

: ""Neural Network on a Commodore 64", Walker 1987"

1 post

#28

MD, Emp, R, T, MLP

: "Scalable Pre-training of Large Autoregressive Image Models"

1 post

#29

Hist, R, MLP, Hardware

: ""Large-scale Deep Unsupervised Learning using Graphics Processors", Raina et al 2009"

1 post

#30

R, T, MLP, Emp, G

: ""The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers", Li et al 2022 (MLP/Transformer increasingly sparse with scale)"

1 post

#31

: "What's the largest existing LLM that an individual can feasibly run privately?"

1 post

#32

Hardware

: "Fastest implementation of Mixtral 8x7b-32k"

1 post

#33

N, Hardware

: ""China's military and government acquire [a very few] Nvidia chips despite US ban""

1 post

#34

Forecast, R, Econ

: ""Thousands of AI Authors on the Future of AI", Grace et al 2024 {AIImpacts}"

1 post

#35

Forecast

: "What do you think about Yann Lecun's controversial opinions about ML?"

1 post

Member Growth in r/mlscaling

Daily

+3 members (0.0%)

Monthly

+285 members (4.0%)

Yearly

+4K members (154.6%)

Similar Subreddits to r/mlscaling

r/aipromptprogramming

r/artificial

r/singularity

r/LocalLLaMA

r/Automate

r/ArtificialInteligence

r/LanguageTechnology

r/GPT3

r/LLMDevs

r/MachineLearning