Last edited by rafavi on 2026-2-3 21:15
Enabling BigDL-LLM on Intel® Arc™ graphics is a crucial step in unlocking their full potential. To do so, you’ll need the following:
1. Choose an Intel® Arc™ GPU: Ensure your server is equipped with an Intel® Arc™ GPU, such as the Intel® Arc™ A770 GPU, or apply for one from Intel Developer Cloud. These GPUs are designed to handle complex AI workloads and are ideal for accelerating LLMs.
2. Prepare your environment: Review the recommended requirements, install the Intel® oneAPI Base Toolkit, and configure the oneAPI environment variables along with the other required environment variables.
3. Install the BigDL-LLM library: BigDL-LLM can be installed with a single command:

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
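As a sketch of the environment-preparation step, the oneAPI variables are typically configured by sourcing setvars.sh from the toolkit's install location. The path below assumes the default system-wide install and may differ on your machine:

```shell
# Configure oneAPI environment variables (path assumes the default
# system-wide install; adjust if you installed the toolkit elsewhere).
source /opt/intel/oneapi/setvars.sh

# Install BigDL-LLM with XPU (Intel GPU) support.
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```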
Accelerating large models with BigDL-LLM is as straightforward as using it on an Intel® laptop; read more about it here, or see the LLaMa 2 example. The BigDL-LLM Transformer-style API only changes the model-loading step, and the subsequent usage is identical to the native Transformers API. Loading a model with the BigDL-LLM API is almost the same as with the Transformers API: users only need to change the import statement and set load_in_4bit=True in the from_pretrained parameters. You can also use the load_in_low_bit parameter to select other low-bit types.
# Load Hugging Face Transformers model with int4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True).to("xpu")
BigDL-LLM converts the model to 4-bit precision during loading, then optimizes its execution with various software and hardware acceleration techniques during subsequent inference.
output = model.generate(input_ids.to("xpu"))