ByteTransformer

High-Performance Transformer Inference with Optimized Execution Techniques

Product Description

ByteTransformer provides efficient inference for BERT-like models through Python and C++ APIs, built on architectural optimizations for the transformer encoder. It supports both fixed-length and variable-length inputs, and its performance advantages over other transformer inference frameworks are detailed in a paper published at IEEE IPDPS 2023. Developed and deployed at ByteDance, it outperforms PyTorch and TensorFlow on NVIDIA GPUs. Setup is straightforward, requiring CUDA 11.6, CMake 3.13+, and PyTorch 1.8+.
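To make the Python API concrete, the following is a minimal sketch of what a BERT-encoder forward pass might look like. The module name `bytetransformer` and the `BertEncoder` class, constructor parameters, and call signature are assumptions for illustration only, not the library's confirmed interface; consult the project's own examples for the actual API.

```python
# Hypothetical usage sketch: the module and class names below
# (`bytetransformer`, `BertEncoder`) are assumptions, not the
# library's confirmed Python API.
import torch
import bytetransformer  # assumed name of the Python binding

batch, seq_len, hidden = 8, 64, 768

# Half-precision tensors on GPU, since the library targets NVIDIA GPUs.
hidden_states = torch.randn(batch, seq_len, hidden,
                            dtype=torch.float16, device="cuda")
# Per-token mask; variable-length batches would zero out padding positions.
attention_mask = torch.ones(batch, seq_len,
                            dtype=torch.float16, device="cuda")

# Assumed constructor mirroring a standard BERT-base encoder config.
encoder = bytetransformer.BertEncoder(num_layers=12,
                                      num_heads=12,
                                      hidden_size=hidden)

with torch.no_grad():
    output = encoder(hidden_states, attention_mask)

print(output.shape)  # expected: (8, 64, 768)
```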
Project Details