Temporal patterns examine how model training resources have changed over time, how efficiently those resources are used, and how public access to a model relates to compute as a training resource. First, how have training compute, parameter count, power draw, training cost, and training time scaled over time, and how do these patterns differ across domains, countries, and organization types?
Models have clearly scaled over time across all of the attributes examined, particularly training compute, number of trainable parameters, and training time. Patterns over time are more difficult to examine for power draw and training cost, as these metrics were generally not recorded until after 2010.
The deep learning era (2010-2025) contains the highest concentration of model releases, as expected, and also shows growth across the metrics examined. This underscores the deep learning era as an inflection point in AI development, driven in part by factors such as the adoption of graphics processing units (GPUs) for general-purpose parallel computing.
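As a rough illustration of how such growth trends can be quantified, the sketch below fits a log-linear trend to training compute against publication year and converts the slope into a doubling time. The column names and example values are placeholders, not the dataset's actual schema or figures.

```python
import numpy as np
import pandas as pd

# Placeholder records; the real analysis would use the full model dataset.
df = pd.DataFrame({
    "publication_year": [2012, 2015, 2018, 2020, 2022, 2024],
    "training_compute_flop": [5e17, 1e19, 3e21, 3e23, 2e24, 5e25],
})

# Fit log10(compute) = slope * year + intercept, i.e. an exponential growth trend.
years = df["publication_year"].to_numpy(dtype=float)
log_compute = np.log10(df["training_compute_flop"].to_numpy(dtype=float))
slope, intercept = np.polyfit(years, log_compute, deg=1)

# The slope is orders of magnitude per year; convert it to a doubling time in months.
doubling_time_months = 12 * np.log10(2) / slope
print(f"{slope:.2f} orders of magnitude per year; doubling roughly every {doubling_time_months:.1f} months")
```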
With an understanding of the overall scaling of key training resources, we then examine: how has model resource efficiency (i.e., cost per FLOP, cost per parameter, compute per dollar, compute per watt) evolved over time, and do these metrics exhibit diminishing returns?
Even as overall model scale has grown, the use of the resources required to train AI models has generally become more efficient. In many domains, scaling increases expenses on a per-unit basis; in an algorithmic complexity sense, for example, more sophisticated algorithms often carry worse asymptotic costs. The fact that larger AI models yield more efficient resource usage, e.g. lower cost per parameter and lower cost per FLOP, thus runs counter to traditional intuitions about scaling.
In other words, while most technologies show diminishing returns at high scale, there is no apparent saturation point yet at the scales current large AI models have reached. This suggests that scaling up may remain an efficient path to improving models.
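One way to make this concrete is to regress log training cost on log training compute: a slope below one means cost grows sub-linearly with compute, i.e. cost per FLOP falls as models scale up, while a slope above one would indicate rising per-unit costs. The sketch below uses hypothetical column names and values; it illustrates the check rather than the exact methodology used here.

```python
import numpy as np
import pandas as pd

# Hypothetical (compute, cost) pairs; real values would come from the dataset.
df = pd.DataFrame({
    "training_compute_flop": [1e21, 1e22, 1e23, 1e24, 1e25],
    "training_cost_usd": [3e4, 2e5, 1.5e6, 1e7, 6e7],
})

log_compute = np.log10(df["training_compute_flop"])
log_cost = np.log10(df["training_cost_usd"])

# Fit cost ~ compute^k on a log-log scale.
k, _ = np.polyfit(log_compute, log_cost, deg=1)

# k < 1: cost per FLOP falls with scale; k > 1: per-unit cost rises with scale.
cost_per_flop = df["training_cost_usd"] / df["training_compute_flop"]
print(f"log-log slope k = {k:.2f}")
print("cost per FLOP (USD):", [f"{c:.2e}" for c in cost_per_flop])
```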
Considering compute as a key training resource, how have model accessibility levels changed over time, and how does this correlate with training compute?
Models have changed in how the public can access and use them: more institutions now offer open weights, i.e. publicly shared trained parameters, and accessibility does not appear to be a significant factor in the scale of a model's training compute. As a result, some of the largest-compute models trained to date offer some level of transparency and accessibility. However, given this scale, users would still need substantial computational resources to actually run these weights, which may not be feasible.
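A simple way to examine this relationship is to compare the distribution of training compute across accessibility categories and compute a rank correlation between an ordinal openness encoding and log compute. The categories, encoding, and values below are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical records pairing an accessibility label with training compute.
df = pd.DataFrame({
    "accessibility": ["open weights", "api only", "open weights",
                      "closed", "api only", "open weights"],
    "training_compute_flop": [5e24, 2e25, 1e24, 8e24, 3e25, 2e25],
})

# Compare log10(compute) distributions across accessibility levels.
df["log_compute"] = np.log10(df["training_compute_flop"])
print(df.groupby("accessibility")["log_compute"].agg(["median", "count"]))

# Rank correlation between an ordinal openness encoding and compute;
# the encoding below is an assumption made for this sketch.
openness_rank = {"closed": 0, "api only": 1, "open weights": 2}
df["openness"] = df["accessibility"].map(openness_rank)
print("Spearman correlation:", df["openness"].corr(df["log_compute"], method="spearman"))
```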