The high-end 34B version requires significant VRAM: 80 GB or more per GPU for full fine-tuning.
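The VRAM figure is consistent with a common rule of thumb for full fine-tuning with mixed-precision Adam. A minimal sketch of that arithmetic (the ~16 bytes/parameter breakdown and the eight-GPU sharding are illustrative assumptions, not figures from this document):

```python
def full_finetune_vram_gb(params_billion: float, bytes_per_param: float = 16) -> float:
    """Rough VRAM needed for full fine-tuning with mixed-precision Adam:
    fp16 weights (2 B) + fp16 gradients (2 B) + fp32 Adam moments (8 B)
    + fp32 master weights (4 B) ~= 16 bytes per parameter.
    Excludes activations and framework overhead."""
    return params_billion * bytes_per_param

total_gb = full_finetune_vram_gb(34)   # ~544 GB in aggregate for 34B params
per_gpu_gb = total_gb / 8              # sharded across eight GPUs -> ~68 GB each
```

Under these assumptions the per-GPU share alone approaches the capacity of an 80 GB accelerator, which is why full fine-tuning at this scale is typically multi-GPU.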

It is highly optimized for instruction following in both English and Chinese.

It matches GPT-3.5 quality while remaining more cost-effective for developers.

It is available in 4-bit and 8-bit quantized versions that can run on consumer hardware such as local GPUs.
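Weights-only arithmetic shows why quantization makes consumer GPUs viable. A sketch under the assumption that only weight memory is counted (the KV cache and activations add to this at inference time):

```python
def weights_vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the model weights alone at a given precision, in decimal GB."""
    return params_billion * bits_per_param / 8

weights_vram_gb(34, 16)  # 68.0 GB at fp16: beyond any consumer card
weights_vram_gb(34, 8)   # 34.0 GB at 8-bit
weights_vram_gb(34, 4)   # 17.0 GB at 4-bit: fits on a 24 GB consumer GPU
```

Halving the bits per parameter halves the weight footprint, which is what brings a 34B model from datacenter territory down to a single high-end consumer GPU.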