---
references:
- id: bahdanau_neural_2015
  accessed:
    - year: 2017
      month: 9
      day: 6
  author:
    - family: Bahdanau
      given: Dzmitry
    - family: Cho
      given: Kyunghyun
    - family: Bengio
      given: Yoshua
  citation-key: bahdanau_neural_2015
  container-title: arXiv:1409.0473 [cs, stat]
  event-title: ICLR
  issued:
    - year: 2015
  source: arXiv.org
  title: Neural Machine Translation by Jointly Learning to Align and Translate
  type: paper-conference
  URL: http://arxiv.org/abs/1409.0473
- id: BrownLanguage2020
  accessed:
    - year: 2020
      month: 6
      day: 7
  author:
    - family: Brown
      given: Tom B.
    - family: Mann
      given: Benjamin
    - family: Ryder
      given: Nick
    - family: Subbiah
      given: Melanie
    - family: Kaplan
      given: Jared
    - family: Dhariwal
      given: Prafulla
    - family: Neelakantan
      given: Arvind
    - family: Shyam
      given: Pranav
    - family: Sastry
      given: Girish
    - family: Askell
      given: Amanda
    - family: Agarwal
      given: Sandhini
    - family: Herbert-Voss
      given: Ariel
    - family: Krueger
      given: Gretchen
    - family: Henighan
      given: Tom
    - family: Child
      given: Rewon
    - family: Ramesh
      given: Aditya
    - family: Ziegler
      given: Daniel M.
    - family: Wu
      given: Jeffrey
    - family: Winter
      given: Clemens
    - family: Hesse
      given: Christopher
    - family: Chen
      given: Mark
    - family: Sigler
      given: Eric
    - family: Litwin
      given: Mateusz
    - family: Gray
      given: Scott
    - family: Chess
      given: Benjamin
    - family: Clark
      given: Jack
    - family: Berner
      given: Christopher
    - family: McCandlish
      given: Sam
    - family: Radford
      given: Alec
    - family: Sutskever
      given: Ilya
    - family: Amodei
      given: Dario
  citation-key: BrownLanguage2020
  container-title: arXiv:2005.14165 [cs]
  issued:
    - year: 2020
      month: 6
      day: 1
  source: arXiv.org
  title: Language Models are Few-Shot Learners
  type: article-journal
  URL: http://arxiv.org/abs/2005.14165
- id: CelikyilmazScaffolding2017
  author:
    - family: Celikyilmaz
      given: Asli
    - family: Deng
      given: Li
    - family: Li
      given: Lihong
    - family: Wang
      given: Chong
  citation-key: CelikyilmazScaffolding2017
  container-title: arXiv:1702.08653 [cs]
  issued:
    - year: 2017
      month: 2
      day: 28
  source: arXiv.org
  title: Scaffolding Networks for Teaching and Learning to Comprehend
  type: article-journal
  URL: http://arxiv.org/abs/1702.08653
- id: ChoyUniversal2016
  author:
    - family: Choy
      given: Christopher B
    - family: Gwak
      given: JunYoung
    - family: Savarese
      given: Silvio
    - family: Chandraker
      given: Manmohan
  citation-key: ChoyUniversal2016
  container-title: Advances in Neural Information Processing Systems 29
  editor:
    - family: Lee
      given: D. D.
    - family: Sugiyama
      given: M.
    - family: Luxburg
      given: U. V.
    - family: Guyon
      given: I.
    - family: Garnett
      given: R.
  issued:
    - year: 2016
  page: 2406–2414
  publisher: Curran Associates, Inc.
  source: Neural Information Processing Systems
  title: Universal Correspondence Network
  type: paper-conference
  URL: http://papers.nips.cc/paper/6487-universal-correspondence-network.pdf
- id: FreemanHow2019
  accessed:
    - year: 2020
      month: 3
      day: 28
  author:
    - family: Freeman
      given: Alexandra L J
  citation-key: FreemanHow2019
  container-title: Drug and Therapeutics Bulletin
  container-title-short: DTB
  DOI: 10.1136/dtb.2019.000008
  ISSN: 0012-6543, 1755-5248
  issue: '8'
  issued:
    - year: 2019
      month: 8
  language: en
  page: 119-124
  source: DOI.org (Crossref)
  title: How to communicate evidence to patients
  type: article-journal
  volume: '57'
- id: HuangMusic2018
  accessed:
    - year: 2019
      month: 1
      day: 22
  author:
    - family: Huang
      given: Cheng-Zhi Anna
    - family: Vaswani
      given: Ashish
    - family: Uszkoreit
      given: Jakob
    - family: Shazeer
      given: Noam
    - family: Simon
      given: Ian
    - family: Hawthorne
      given: Curtis
    - family: Dai
      given: Andrew M.
    - family: Hoffman
      given: Matthew D.
    - family: Dinculescu
      given: Monica
    - family: Eck
      given: Douglas
  citation-key: HuangMusic2018
  issued:
    - year: 2018
      month: 9
      day: 12
  language: en
  source: arxiv.org
  title: Music Transformer
  type: article-journal
  URL: https://arxiv.org/abs/1809.04281v3
- id: HuangMusic2018a
  accessed:
    - year: 2019
      month: 9
      day: 26
  author:
    - family: Huang
      given: Cheng-Zhi Anna
    - family: Vaswani
      given: Ashish
    - family: Uszkoreit
      given: Jakob
    - family: Simon
      given: Ian
    - family: Hawthorne
      given: Curtis
    - family: Shazeer
      given: Noam
    - family: Dai
      given: Andrew M.
    - family: Hoffman
      given: Matthew D.
    - family: Dinculescu
      given: Monica
    - family: Eck
      given: Douglas
  citation-key: HuangMusic2018a
  issued:
    - year: 2018
      month: 9
      day: 27
  source: openreview.net
  title: 'Music Transformer: Generating Music with Long-Term Structure'
  title-short: Music Transformer
  type: article-journal
  URL: https://openreview.net/forum?id=rJe4ShAcF7
- id: KatharopoulosTransformers2020
  accessed:
    - year: 2020
      month: 9
      day: 16
  author:
    - family: Katharopoulos
      given: Angelos
    - family: Vyas
      given: Apoorv
    - family: Pappas
      given: Nikolaos
    - family: Fleuret
      given: François
  citation-key: KatharopoulosTransformers2020
  container-title: arXiv:2006.16236 [cs, stat]
  issued:
    - year: 2020
      month: 8
      day: 31
  source: arXiv.org
  title: >-
    Transformers are RNNs: Fast Autoregressive Transformers with Linear
    Attention
  title-short: Transformers are RNNs
  type: article-journal
  URL: http://arxiv.org/abs/2006.16236
- id: LiTrain2020
  accessed:
    - year: 2020
      month: 3
      day: 28
  author:
    - family: Li
      given: Zhuohan
    - family: Wallace
      given: Eric
    - family: Shen
      given: Sheng
    - family: Lin
      given: Kevin
    - family: Keutzer
      given: Kurt
    - family: Klein
      given: Dan
    - family: Gonzalez
      given: Joseph E.
  citation-key: LiTrain2020
  container-title: arXiv:2002.11794 [cs]
  issued:
    - year: 2020
      month: 2
      day: 26
  source: arXiv.org
  title: >-
    Train Large, Then Compress: Rethinking Model Size for Efficient Training and
    Inference of Transformers
  title-short: Train Large, Then Compress
  type: article-journal
  URL: http://arxiv.org/abs/2002.11794
- id: OrtegaShaking2021
  accessed:
    - year: 2021
      month: 10
      day: 26
  author:
    - family: Ortega
      given: Pedro A.
    - family: Kunesch
      given: Markus
    - family: Delétang
      given: Grégoire
    - family: Genewein
      given: Tim
    - family: Grau-Moya
      given: Jordi
    - family: Veness
      given: Joel
    - family: Buchli
      given: Jonas
    - family: Degrave
      given: Jonas
    - family: Piot
      given: Bilal
    - family: Perolat
      given: Julien
    - family: Everitt
      given: Tom
    - family: Tallec
      given: Corentin
    - family: Parisotto
      given: Emilio
    - family: Erez
      given: Tom
    - family: Chen
      given: Yutian
    - family: Reed
      given: Scott
    - family: Hutter
      given: Marcus
    - family: Freitas
      given: Nando
      non-dropping-particle: de
    - family: Legg
      given: Shane
  citation-key: OrtegaShaking2021
  container-title: arXiv:2110.10819 [cs]
  issued:
    - year: 2021
      month: 10
      day: 20
  language: en
  source: arXiv.org
  title: >-
    Shaking the foundations: delusions in sequence models for interaction and
    control
  title-short: Shaking the foundations
  type: article-journal
  URL: http://arxiv.org/abs/2110.10819
- id: PhuongFormal2022
  accessed:
    - year: 2022
      month: 8
      day: 5
  author:
    - family: Phuong
      given: Mary
    - family: Hutter
      given: Marcus
  citation-key: PhuongFormal2022
  issued:
    - year: 2022
      month: 7
      day: 19
  language: en
  number: arXiv:2207.09238
  publisher: arXiv
  source: arXiv.org
  title: Formal Algorithms for Transformers
  type: article
  URL: http://arxiv.org/abs/2207.09238
- id: RadfordLanguage2019
  author:
    - family: Radford
      given: Alec
    - family: Wu
      given: Jeffrey
    - family: Child
      given: Rewon
    - family: Luan
      given: David
    - family: Amodei
      given: Dario
    - family: Sutskever
      given: Ilya
  citation-key: RadfordLanguage2019
  issued:
    - year: 2019
  language: en
  page: '24'
  source: Zotero
  title: Language Models are Unsupervised Multitask Learners
  type: article-journal
- id: RamsauerHopfield2020
  accessed:
    - year: 2020
      month: 8
      day: 13
  author:
    - family: Ramsauer
      given: Hubert
    - family: Schäfl
      given: Bernhard
    - family: Lehner
      given: Johannes
    - family: Seidl
      given: Philipp
    - family: Widrich
      given: Michael
    - family: Gruber
      given: Lukas
    - family: Holzleitner
      given: Markus
    - family: Pavlović
      given: Milena
    - family: Sandve
      given: Geir Kjetil
    - family: Greiff
      given: Victor
    - family: Kreil
      given: David
    - family: Kopp
      given: Michael
    - family: Klambauer
      given: Günter
    - family: Brandstetter
      given: Johannes
    - family: Hochreiter
      given: Sepp
  citation-key: RamsauerHopfield2020
  container-title: arXiv:2008.02217 [cs, stat]
  issued:
    - year: 2020
      month: 7
      day: 16
  source: arXiv.org
  title: Hopfield Networks is All You Need
  type: article-journal
  URL: http://arxiv.org/abs/2008.02217
- id: VaswaniAttention2017
  accessed:
    - year: 2017
      month: 9
      day: 6
  author:
    - family: Vaswani
      given: Ashish
    - family: Shazeer
      given: Noam
    - family: Parmar
      given: Niki
    - family: Uszkoreit
      given: Jakob
    - family: Jones
      given: Llion
    - family: Gomez
      given: Aidan N.
    - family: Kaiser
      given: Lukasz
    - family: Polosukhin
      given: Illia
  citation-key: VaswaniAttention2017
  container-title: arXiv:1706.03762 [cs]
  issued:
    - year: 2017
      month: 6
      day: 12
  source: arXiv.org
  title: Attention Is All You Need
  type: article-journal
  URL: http://arxiv.org/abs/1706.03762
- id: YangFeature2020
  accessed:
    - year: 2021
      month: 1
      day: 4
  author:
    - family: Yang
      given: Greg
    - family: Hu
      given: Edward J.
  citation-key: YangFeature2020
  container-title: arXiv:2011.14522 [cond-mat]
  issued:
    - year: 2020
      month: 11
      day: 29
  source: arXiv.org
  title: Feature Learning in Infinite-Width Neural Networks
  type: article-journal
  URL: http://arxiv.org/abs/2011.14522
- id: ZhangUnveiling2022
  accessed:
    - year: 2022
      month: 6
      day: 22
  author:
    - family: Zhang
      given: Yi
    - family: Backurs
      given: Arturs
    - family: Bubeck
      given: Sébastien
    - family: Eldan
      given: Ronen
    - family: Gunasekar
      given: Suriya
    - family: Wagner
      given: Tal
  citation-key: ZhangUnveiling2022
  DOI: 10.48550/arXiv.2206.04301
  issued:
    - year: 2022
      month: 6
      day: 9
  number: arXiv:2206.04301
  publisher: arXiv
  source: arXiv.org
  title: 'Unveiling Transformers with LEGO: a synthetic reasoning task'
  title-short: Unveiling Transformers with LEGO
  type: article
  URL: http://arxiv.org/abs/2206.04301
...