
How to enable BigDL-LLM on Intel® Arc GPU

Posted on 2026-2-3 21:11:40
Last edited by rafavi on 2026-2-3 21:15

Enabling BigDL-LLM on Intel® Arc™ graphics is a crucial step toward unlocking their full potential. To do so, you'll need the following:


Choose an Intel® Arc GPU: Ensure your server is equipped with an Intel® Arc™ GPU, such as the Intel® Arc™ A770, or request one from the Intel Developer Cloud. These GPUs are designed to handle complex AI workloads and are ideal for accelerating LLMs.

Prepare your environment: Review the recommended requirements, install the Intel® oneAPI Base Toolkit, and configure the oneAPI environment variables along with the other required environment variables.
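The environment preparation above typically boils down to a few shell commands. A minimal sketch, assuming a default oneAPI installation path; the exported variable values are illustrative recommendations for Arc GPUs, so consult the BigDL-LLM installation guide for the exact settings for your system:

```shell
# Activate the oneAPI environment (default installation path assumed)
source /opt/intel/oneapi/setvars.sh

# Runtime variables commonly recommended for Intel Arc GPUs
export USE_XETLA=OFF
export SYCL_CACHE_PERSISTENT=1
```

These exports only affect the current shell session; add them to your shell profile if you want them applied automatically.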

Install the BigDL-LLM library: BigDL-LLM can be installed with a single pip command:

  pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

Accelerating large models with BigDL-LLM on a server is as straightforward as using it on an Intel® laptop — read more about it here, or see the example of a LLaMA 2 model. It uses the BigDL-LLM Transformers-style API, which only changes the model-loading step; everything afterwards is identical to the native Transformers API. To load a model with the BigDL-LLM API, users only need to change the import statement and set load_in_4bit=True in the from_pretrained parameters. You can also use the load_in_low_bit parameter to select other low-bit types.


  # Load a Hugging Face Transformers model with int4 optimizations
  from bigdl.llm.transformers import AutoModelForCausalLM

  model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True).to("xpu")
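If a precision other than the default int4 is desired, the load_in_low_bit parameter mentioned above replaces load_in_4bit. A minimal sketch, assuming an Arc GPU is available as the "xpu" device; the model path is a placeholder and "sym_int8" is one example of a supported low-bit type:

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# Load with an explicit low-bit data type instead of the default int4
model = AutoModelForCausalLM.from_pretrained(
    '/path/to/model/',
    load_in_low_bit="sym_int8",  # e.g. symmetric int8 quantization
).to("xpu")
```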

BigDL-LLM converts the model into 4-bit precision during model loading and optimizes its execution using various software and hardware acceleration techniques in the subsequent inference process.


  # Run generation on the GPU, then move the result back to the CPU
  output = model.generate(input_ids.to("xpu"))
  output = output.cpu()
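Putting the loading and generation snippets together, a complete inference round trip might look like the following. This is a sketch, not the post's verbatim example: the model path and prompt are placeholders, max_new_tokens is an illustrative choice, and an Arc GPU is assumed to be available as the "xpu" device:

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = '/path/to/model/'  # placeholder: point this at a local model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True).to("xpu")

# Tokenize the prompt and move the input tensors to the GPU
input_ids = tokenizer("What is BigDL-LLM?", return_tensors="pt").input_ids.to("xpu")

# Generate on the GPU, then bring the result back to the CPU for decoding
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output.cpu()[0], skip_special_tokens=True))
```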
