r/mlscaling

7.3K members
r/mlscaling is a medium-sized subreddit with 7.3K members.
ML/AI/DL research on approaches using large models, datasets, and compute: "more is different"

Popular Topics in r/mlscaling

#1 (no flair): "AlphaGeometry: An Olympiad-level AI system for geometry" (17 posts)
#2 [R]: "MambaByte: Token-free Selective State Space Model" (1 post)
#3 [Hardware, FB]: "Zuckerberg: "...[W]e're building massive compute infrastructure to support our future roadmap, including 350k H100s by the end of this year -- and overall almost 600k H100s equivalents of compute if you include other GPUs"" (1 post)
#4 [OP, Hist, Hardware, RL]: "Minsky on abandoning DL in 1952: "I decided either this was a bad idea or it'd take thousands/millions of neurons to make it work, & I couldn’t afford to try to build a machine like that."" (1 post)
#5 [T, Econ, Emp]: ""Estimating efficiency improvements in LLM pre-training", Daan (how much do all the improvements to GPT-3-style LLM training stack up to? >400x?)" (1 post)
#6 [R, T, RL]: "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (1 post)
#7 [N, RNN, Code, MD]: "🦅 Eagle 7B: Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages" (1 post)
#8 [Smol]: "Chess-GPT, 1000x smaller than GPT-4, plays 1500 ELO chess. We can visualize its internal board state, and it accurately estimates the ELO rating of the players in a game." (1 post)
#9 [R, Emp, Code, MD]: "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model" (1 post)
#10 [R, T, Emp, Data]: ""I am a Strange Dataset: Metalinguistic Tests for Language Models", Thrush et al 2024 (only GPT-4 beats chance; parameter-scaling)" (1 post)
#11 [Hist]: "Two very interesting articles by Yuxi Liu on historical resistance to connectionism and scaling" (1 post)
#12 [OP]: ""When Might AI Outsmart Us? It Depends Who You Ask", TIME" (1 post)
#13 [T, R, Emp]: ""Large Language Models Struggle to Learn Long-Tail Knowledge", Kandpal et al 2022 (BLOOM models show smooth log-scaling of memorization of long-tail knowledge & larger models more sample-efficient)" (1 post)
#14 [R, Theory]: ""What's Hidden in a Randomly Weighted Neural Network?", Ramanujan et al 2019 (even random nets contain, with increasing probability in size, an accurate sub-net)" (1 post)
#15 [OP, Forecast, RL]: ""My AI Timelines Have Sped Up (Again) [since 2020]", Alex Irpan" (1 post)
#16 [Emp, R, T, OA]: ""The Effect of Sampling Temperature on Problem Solving in Large Language Models", Renze & Guven 2024 (Johns Hopkins) (changes in temperature in the range 0.0 to 1.0 do not have a statistically significant impact on LLM performance for problem-solving tasks)" (1 post)
#17 [Data, R]: ""TabLib: A Dataset Of 627 Million Tables With Context", Eggert et al 2023 (69TB + 0.87t tokens descriptions)" (1 post)
#18 [R, T, Emp, MD, Code]: "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data ("We highlight the value of data scaling-up of massive, cheap, and diverse unlabeled images for MDE.")" (1 post)
#19 [R, Theory, Forecast]: ""AI capabilities can be significantly improved without expensive retraining" - survey and analysis of post-training enhancements" (1 post)
#20 [RL, T, Safe, Theory, Emp, Code]: "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (1 post)
#21 [R, T, A, RL, Safe]: ""Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", Hubinger et al 2024 (larger models better at hiding backdoors from safety training)" (1 post)
#22 [R, T, Emp]: ""MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts", Lu et al 2023" (1 post)
#23 [R, T, MD, Code, FB]: "[N] Meta open-sourced a wav2vec2 model pre-trained on 4.5M hours (Barrault et al 2023)" (1 post)
#24 [R, T, Emp, Bio, Smol]: ""Are Vision Transformers More Data Hungry Than Newborn [Chick] Visual Systems?", Pandey et al 2023" (1 post)
#25 [RL, Emp, Code, MD]: ""TD-MPC2: Scalable, Robust World Models for Continuous Control" - scaling behavior again noted in model-based reinforcement learning" (1 post)
#26 [OP, N, D]: ""Learning human actions on computer applications" {rabbit} ("We share the view that the scaling law continues to permeate all aspects of neural systems research .... We hope to continue this trend with our action model ....")" (1 post)
#27 [Smol, Code, Hist, MLP]: ""Neural Network on a Commodore 64", Walker 1987" (1 post)
#28 [MD, Emp, R, T, MLP]: "Scalable Pre-training of Large Autoregressive Image Models" (1 post)
#29 [Hist, R, MLP, Hardware]: ""Large-scale Deep Unsupervised Learning using Graphics Processors", Raina et al 2009" (1 post)
#30 [R, T, MLP, Emp, G]: ""The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers", Li et al 2022 (MLP/Transformer increasingly sparse with scale)" (1 post)
#31 [D]: "What's the largest existing LLM that an individual can feasibly run privately?" (1 post)
#32 [Hardware]: "Fastest implementation of Mixtral 8x7b-32k" (1 post)
#33 [N, Hardware]: ""China's military and government acquire [a very few] Nvidia chips despite US ban"" (1 post)
#34 [Forecast, R, Econ]: ""Thousands of AI Authors on the Future of AI", Grace et al 2024 {AIImpacts}" (1 post)
#35 [Forecast]: "What do you think about Yann Lecun's controversial opinions about ML?" (1 post)

Member Growth in r/mlscaling

Daily: +3 members (0.0%)
Monthly: +285 members (4.0%)
Yearly: +4K members (154.6%)

Similar Subreddits to r/mlscaling

r/aipromptprogramming: 20K members, 5.15% / mo
r/artificial: 709K members, 4.65% / mo
r/singularity: 2M members, 7.52% / mo
r/LocalLLaMA: 117K members, 11.42% / mo
r/Automate: 131K members, 0.66% / mo
r/ArtificialInteligence: 415K members, 10.10% / mo
r/LanguageTechnology: 46K members, 0.81% / mo
r/GPT3: 709K members, 6.00% / mo
r/LLMDevs: 5K members, 12.50% / mo
r/MachineLearning: 3M members, 0.30% / mo