Last edited by rafavi on 2026-2-3 21:15
Enabling BigDL-LLM on Intel® Arc™ graphics is a crucial step in unlocking their full potential. To do so, you’ll need the following:
1. Choose an Intel® Arc™ GPU: Ensure your server is equipped with an Intel® Arc™ GPU, such as the Intel® Arc™ A770 GPU, or apply for one from Intel Developer Cloud. These GPUs are designed to handle complex AI workloads and are ideal for accelerating LLMs.
2. Prepare your environment: Review the recommended requirements, install the Intel® oneAPI Base Toolkit, and configure the oneAPI environment variables along with the other required environment variables.
3. Install the BigDL-LLM library: BigDL-LLM can be installed with a single command:

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
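As a sketch of the environment-preparation step, the oneAPI variables are typically configured by sourcing setvars.sh from the toolkit's install location. The path below assumes the default system-wide install and may differ on your machine:

```shell
# Configure oneAPI environment variables (path assumes the default
# system-wide install; adjust if you installed the toolkit elsewhere).
source /opt/intel/oneapi/setvars.sh

# Install BigDL-LLM with XPU (Intel GPU) support.
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```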
Accelerating large models with BigDL-LLM is as straightforward as using it on an Intel® laptop; read more about it here, or see the LLaMa 2 example. The BigDL-LLM Transformer-style API only changes the model-loading step, and the subsequent usage is identical to the native Transformers API. Loading a model with the BigDL-LLM API is almost the same as with the Transformers API: users only need to change the import statement and set load_in_4bit=True in the from_pretrained parameters. You can also use the load_in_low_bit parameter to select other low-bit types.
# Load Hugging Face Transformers model with int4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True).to("xpu")
BigDL-LLM converts the model to 4-bit precision during loading, then optimizes its execution with various software and hardware acceleration techniques during subsequent inference.
output = model.generate(input_ids.to("xpu"))