How Many GPU Hours Does It Take to Train a Large Language Model (LLM)?

2025-03-21 01:16:40

In 2023, language models are taking center stage. Many new models have been developed from scratch, and thousands of variants have been fine-tuned on domain-specific datasets and released to the public. Generative AI is a hot topic, and more and more companies are interested in creating their own models. One way to do this is to fine-tune an existing open model such as BLOOM, LLaMA, Llama 2, or Mistral; the other is to train a new model from scratch. However, training from scratch can be expensive. To give an idea of how much compute is required, we have gathered the publicly available information about the training time of several LLMs.

The chart below shows the total GPU training time for each model. Total GPU time is the number of GPUs multiplied by the wall-clock training time: a model trained on 100 GPUs for 2 hours counts as 100 * 2 = 200 GPU hours.
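The counting rule can be written down in a few lines of Python. This is only a minimal sketch of the definition; the function name `total_gpu_hours` is chosen here for illustration and does not come from any library.

```python
def total_gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Every GPU contributes one GPU hour per hour of wall-clock training."""
    return num_gpus * wall_clock_hours


# The example from the text: 100 GPUs running for 2 hours.
example = total_gpu_hours(num_gpus=100, wall_clock_hours=2)
print(example)  # 200
```

All of the per-model figures in this post are variations of this product, sometimes with an extra unit conversion (months or days to hours, or TPUs to GPUs).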

The chart includes only models whose training time has been publicly documented. It does not cover ChatGPT, GPT-4, or Mistral. If you know of a source that describes the training time of another model, please share it in the comments.

BLOOM (2022, July)

384 (NVIDIA A100 80GB GPUs)[1] * 3.5 (months) = 999,936 GPU hours, counting each month as 31 days (3.5 * 31 * 24 = 2,604 hours per GPU)

https://huggingface.co/blog/bloom-megatron-deepspeed
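The BLOOM total depends on how a "month" is converted to hours. The sketch below assumes that the 999,936 figure was obtained with 31-day months (that assumption reproduces the number exactly) and shows how much the result moves with 30-day months. The helper name `gpu_hours_from_months` is hypothetical.

```python
HOURS_PER_DAY = 24


def gpu_hours_from_months(num_gpus: int, months: float, days_per_month: int) -> float:
    """Convert a training duration in months to total GPU hours.

    days_per_month is an explicit assumption, since "3.5 months" is not
    an exact duration.
    """
    return num_gpus * months * days_per_month * HOURS_PER_DAY


bloom_31 = gpu_hours_from_months(384, 3.5, days_per_month=31)
bloom_30 = gpu_hours_from_months(384, 3.5, days_per_month=30)

print(f"31-day months: {bloom_31:,.0f} GPU hours")  # 999,936
print(f"30-day months: {bloom_30:,.0f} GPU hours")  # 967,680
```

The two conventions differ by about 3%, which is small next to the differences between models but worth keeping in mind when comparing figures from different sources.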

PaLM (2022, April)

6144 (TPU v4 chips)[1] * 1200 (hours)[1] * 1.2 (TPU-to-GPU conversion factor)[2] = 8,847,360 GPU hours
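For PaLM the training ran on TPUs, so the TPU hours are scaled by a conversion factor to get GPU-hour equivalents. A minimal sketch of that step follows; the factor 1.2 is the one used in the text, and the function name `tpu_to_gpu_hours` is an illustrative choice, not an established API.

```python
def tpu_to_gpu_hours(num_tpus: int, hours: float, tpu_per_gpu_factor: float) -> int:
    """Scale TPU hours to GPU-hour equivalents with a fixed conversion factor.

    The factor is a rough equivalence between one TPU chip and one GPU,
    not a measured benchmark result.
    """
    tpu_hours = num_tpus * hours
    # round() guards against floating-point noise from the fractional factor.
    return round(tpu_hours * tpu_per_gpu_factor)


palm_gpu_hours = tpu_to_gpu_hours(num_tpus=6144, hours=1200, tpu_per_gpu_factor=1.2)
print(f"{palm_gpu_hours:,} GPU hours")  # 8,847,360 GPU hours
```

Because the final number is directly proportional to the factor, any uncertainty in the TPU-to-GPU equivalence carries straight through to the PaLM estimate.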

LLaMA 1 (2023, February)

LLaMA-7B: 82,432 GPU hours [1]
LLaMA-13B: 135,168 GPU hours [1]
LLaMA-33B: 530,432 GPU hours [1]
LLaMA-65B: 1,022,362 GPU hours [1]

https://arxiv.org/abs/2302.13971
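GPU hours can also be read in the other direction: dividing a total by a cluster size gives the wall-clock time the run would take. The sketch below does this for the LLaMA 1 figures above. The cluster size of 1,024 GPUs is a hypothetical value picked for illustration, and the calculation assumes perfect scaling with no downtime or restarts.

```python
HOURS_PER_DAY = 24

# GPU hours as listed above for LLaMA 1.
LLAMA_1_GPU_HOURS = {
    "LLaMA-7B": 82_432,
    "LLaMA-13B": 135_168,
    "LLaMA-33B": 530_432,
    "LLaMA-65B": 1_022_362,
}

# Hypothetical cluster size, not taken from the source.
HYPOTHETICAL_NUM_GPUS = 1024


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Days of continuous training implied by a GPU-hour total on num_gpus GPUs."""
    return gpu_hours / num_gpus / HOURS_PER_DAY


for name, hours in LLAMA_1_GPU_HOURS.items():
    days = wall_clock_days(hours, HYPOTHETICAL_NUM_GPUS)
    print(f"{name}: {days:.1f} days on {HYPOTHETICAL_NUM_GPUS} GPUs")
```

On this hypothetical cluster the largest model would need roughly six weeks of uninterrupted training, while the smallest would finish in a few days.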

BloombergGPT (2023, May)

512 (NVIDIA A100 40GB GPUs)[2] * 49 (days)[2] = 1,300,000 GPU hours (as listed; the direct product 512 * 49 * 24 is 602,112 GPU hours, so the listed total does not follow from these two numbers alone)

Llama 2 (2023, July)

Llama 2 7B: 184,320 GPU hours [1]
Llama 2 13B: 368,640 GPU hours [1]
Llama 2 34B: 1,038,336 GPU hours [1]
Llama 2 70B: 1,720,320 GPU hours [1]

https://arxiv.org/abs/2307.0928
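To compare the models side by side, the figures listed in this post can be collected and ranked. The sketch below prints a simple text bar chart. The numbers are copied from the sections above as given, and the bar width of 40 characters is an arbitrary choice.

```python
# GPU hours as listed in this post, in order of appearance.
GPU_HOURS = {
    "BLOOM": 999_936,
    "PaLM": 8_847_360,
    "LLaMA-7B": 82_432,
    "LLaMA-13B": 135_168,
    "LLaMA-33B": 530_432,
    "LLaMA-65B": 1_022_362,
    "BloombergGPT": 1_300_000,
    "Llama 2 7B": 184_320,
    "Llama 2 13B": 368_640,
    "Llama 2 34B": 1_038_336,
    "Llama 2 70B": 1_720_320,
}

BAR_WIDTH = 40  # characters used for the longest bar

# Sort from most to least GPU time.
ranked = sorted(GPU_HOURS.items(), key=lambda item: item[1], reverse=True)
largest = ranked[0][1]

for name, hours in ranked:
    # Scale each bar relative to the largest entry, with a minimum of one mark.
    bar = "#" * max(1, round(BAR_WIDTH * hours / largest))
    print(f"{name:<13} {hours:>10,}  {bar}")
```

In this ranking PaLM sits far above every other entry, and the smallest LLaMA models use roughly a hundredth of its GPU time.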

Permalink: https://hqyman.cn/post/9553.html

Author: hqy | Category: Programming & AI
