Seq2seq
Seq2seq (sequence-to-sequence) is a family of machine learning approaches used for natural language processing that maps an input sequence of tokens to an output sequence.[1] Applications include language translation, image captioning, conversational models, and text summarization.[2]
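At its core, a seq2seq model pairs an encoder network, which compresses the input sequence into a context representation, with a decoder network that generates the output sequence from that context one token at a time. The following is a minimal sketch of that encoder-decoder pattern, written in PyTorch; the vocabulary sizes and dimensions are hypothetical, and it illustrates the general architecture rather than any particular published implementation.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: a GRU encoder compresses the source
    sequence into a context vector; a GRU decoder generates the target
    sequence one token at a time, conditioned on that context.
    (Illustrative sketch; sizes and names are assumptions.)"""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: the final hidden state serves as the context vector.
        _, context = self.encoder(self.src_emb(src))
        # Decode: condition on the context; with teacher forcing, the
        # ground-truth target tokens are fed as decoder inputs.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), context)
        return self.out(dec_out)  # logits over the target vocabulary

# Toy usage: a batch of 2 source sequences (length 5) mapped to
# logits for target sequences of length 7.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))
tgt = torch.randint(0, 1200, (2, 7))
logits = model(src, tgt)  # shape: (2, 7, 1200)
```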
History
The approach was developed by Google for use in machine translation.[2]
Similar earlier work includes Tomáš Mikolov's 2012 PhD thesis.[3]
In 2019, Facebook announced its use in symbolic integration and in solving differential equations, claiming that it could solve complex equations more rapidly and with greater accuracy than commercial systems such as Mathematica, MATLAB and Maple. First, the equation is parsed into a tree structure, which sidesteps notational idiosyncrasies. An LSTM neural network then applies its standard pattern-recognition capabilities to process the tree.[4]
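The serialization step can be illustrated as follows. This is only a sketch under assumed conventions (the `Node` class and the operator token names are hypothetical, not Facebook's code): the expression tree is flattened into prefix (Polish) notation, which needs no parentheses, and the resulting token sequence is what the network consumes.

```python
# Illustrative sketch: serialize an expression tree into prefix tokens
# so a seq2seq model can process it like any other token sequence.
class Node:
    def __init__(self, label, children=()):
        self.label = label          # operator ("add", "mul") or leaf ("x", "2")
        self.children = children

def to_prefix(node):
    """Flatten the tree depth-first into prefix (Polish) notation.
    Prefix order is unambiguous, so no parentheses are needed and the
    notational idiosyncrasies of infix math disappear."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens

# 2*x + cos(x)  ->  ['add', 'mul', '2', 'x', 'cos', 'x']
expr = Node("add", (
    Node("mul", (Node("2"), Node("x"))),
    Node("cos", (Node("x"),)),
))
print(to_prefix(expr))
```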
In 2020, Google released Meena, a 2.6-billion-parameter seq2seq-based chatbot trained on a 341 GB data set. Google claimed that the chatbot had 1.7 times the model capacity of OpenAI's GPT-2.[5] GPT-2's May 2020 successor, the 175-billion-parameter GPT-3, was trained on a "45TB dataset of plaintext words (45,000 GB) that was ... filtered down to 570 GB".[6]
In 2022, Amazon introduced AlexaTM 20B, a moderate-sized (20-billion-parameter) seq2seq language model. It uses an encoder-decoder architecture to accomplish few-shot learning: the encoder outputs a representation of the input, which the decoder uses as input to perform a specific task, such as translating the input into another language. Training mixes a denoising objective (reconstructing spans of missing text in a string) with causal language modeling (plausibly extending an input text), an approach that allows capabilities to be added across languages without massive training workflows. AlexaTM 20B outperforms the much larger GPT-3 on language translation and summarization, and achieved state-of-the-art few-shot performance across all Flores-101 language pairs.[7]
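The two training objectives can be illustrated with a toy example of how training pairs might be constructed (a hypothetical sketch; the function names and masking convention are assumptions, not Amazon's actual pipeline):

```python
import random

def make_denoising_example(tokens, span=2, mask="<MASK>"):
    """Denoising objective: hide a contiguous span of the input and
    train the model to reconstruct the missing text."""
    start = random.randrange(len(tokens) - span)
    corrupted = tokens[:start] + [mask] + tokens[start + span:]
    target = tokens[start:start + span]
    return corrupted, target

def make_causal_example(tokens, prefix_len=3):
    """Causal-LM objective: given a prefix, train the model to
    extend it with the tokens that actually follow."""
    return tokens[:prefix_len], tokens[prefix_len:]

sentence = "the cat sat on the mat".split()
print(make_denoising_example(sentence))  # e.g. (['the', 'cat', '<MASK>', 'the', 'mat'], ['sat', 'on'])
print(make_causal_example(sentence))     # (['the', 'cat', 'sat'], ['on', 'the', 'mat'])
```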
Related software
Software adopting similar approaches includes OpenNMT (Torch), Neural Monkey (TensorFlow) and NEMATUS (Theano).[8]
References
- Sutskever, Ilya; Vinyals, Oriol; Le, Quoc Viet (2014). "Sequence to sequence learning with neural networks". arXiv:1409.3215 [cs.CL].
- Wadhwa, Mani (2018-12-05). "seq2seq model in Machine Learning". GeeksforGeeks. Retrieved 2019-12-17.
- Mikolov, Tomáš (2012). Statistical Language Models Based on Neural Networks (PhD thesis). Brno University of Technology. p. 94. https://www.fit.vut.cz/study/phd-thesis-file/283/283.pdf, https://www.fit.vut.cz/study/phd-thesis-file/283/283_o2.pdf
- "Facebook has a neural network that can do advanced math". MIT Technology Review. December 17, 2019. Retrieved 2019-12-17.
- Mehta, Ivan (2020-01-29). "Google claims its new chatbot Meena is the best in the world". The Next Web. Retrieved 2020-02-03.
- Gage, Justin. "What's GPT-3?". Retrieved August 1, 2020.
- Rodriguez, Jesus. "🤘Edge#224: AlexaTM 20B is Amazon's New Language Super Model Also Capable of Few-Shot Learning". thesequence.substack.com. Retrieved 2022-09-08.
- "Overview - seq2seq". google.github.io. Retrieved 2019-12-17.
External links
- "A ten-minute introduction to sequence-to-sequence learning in Keras". blog.keras.io. Retrieved 2019-12-19.
- Dugar, Pranay (2019-11-24). "Attention — Seq2Seq Models". Medium. Retrieved 2019-12-19.
- Nag, Dev (2019-04-24). "seq2seq: the clown car of deep learning". Medium. Retrieved 2019-12-19.
- Adiwardana, Daniel; Luong, Minh-Thang; So, David R.; Hall, Jamie; Fiedel, Noah; Thoppilan, Romal; Yang, Zi; Kulshreshtha, Apoorv; Nemade, Gaurav; Lu, Yifeng; Le, Quoc V. (2020-01-31). "Towards a Human-like Open-Domain Chatbot". arXiv:2001.09977 [cs.CL].