attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model from "Attention Is All You Need" (Vaswani et al., 2017). The Transformer dispenses with convolutional and recurrent layers entirely, relying on self-attention, and achieved state-of-the-art results on the WMT 2014 English-German translation task. The project supports training the model and translating with it; byte pair encoding support is under development. It is suitable for anyone interested in the Transformer architecture, and contributions and suggestions are welcome.
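For readers new to the architecture, the core building block is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below is a minimal PyTorch illustration of that formula, not code from this repository; the function name and toy tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Minimal sketch of softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep the softmax in a well-conditioned range.
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    if mask is not None:
        # Masked positions (e.g. padding or future tokens) get -inf
        # so they receive zero attention weight after the softmax.
        scores = scores.masked_fill(mask == 0, float('-inf'))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v), attn

# Toy usage: batch of 2 sequences, length 5, model dimension 8.
q = k = v = torch.randn(2, 5, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 5, 8]) torch.Size([2, 5, 5])
```

In the full model this operation runs in several heads in parallel (multi-head attention), with learned linear projections of the queries, keys, and values per head.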