Vision Transformers in 2022: An Update on Tiny ImageNet

AI-generated keywords: Vision Transformer, Tiny ImageNet, Transfer Learning, Swin Transformers, Accuracy

AI-generated Key Points

  • Image transformers have made significant progress in closing the gap between traditional CNN architectures and modern transformer models.
  • Tiny ImageNet, a subset of ImageNet-1k with 100,000 images and 200 classes, is often overlooked by researchers.
  • This study evaluates the performance of four popular transformer models (ViT, DeiT, CaiT, and Swin) on Tiny ImageNet using transfer learning techniques.
  • Swin Transformers outperformed the other three models with a validation accuracy of 91.35%, beating the state-of-the-art result at the time of submission.
  • The code used in this study is available at https://github.com/ehuynh1106/TinyImageNet-Transformers.

Authors: Ethan Huynh

License: CC BY 4.0

Abstract: The recent advances in image transformers have shown impressive results and have largely closed the gap with traditional CNN architectures. The standard procedure is to train on large datasets like ImageNet-21k and then finetune on ImageNet-1k. After finetuning, researchers will often consider the transfer learning performance on smaller datasets such as CIFAR-10/100 but have left out Tiny ImageNet. This paper offers an update on vision transformers' performance on Tiny ImageNet. I include Vision Transformer (ViT), Data Efficient Image Transformer (DeiT), Class Attention in Image Transformer (CaiT), and Swin Transformers. In addition, Swin Transformers beats the current state-of-the-art result with a validation accuracy of 91.35%. Code is available here: https://github.com/ehuynh1106/TinyImageNet-Transformers

Submitted to arXiv on 21 May 2022


Results of the summarizing process for the arXiv paper: 2205.10660v1

In recent years, image transformers have largely closed the accuracy gap with traditional convolutional neural network (CNN) architectures. The standard procedure for training these models is to pre-train on a large dataset such as ImageNet-21k and then fine-tune on ImageNet-1k. Researchers then typically report transfer learning performance on smaller datasets such as CIFAR-10/100, but often overlook Tiny ImageNet, a subset of ImageNet-1k with 100,000 images and 200 classes. This paper addresses that gap by reporting the accuracy of four popular transformer models on Tiny ImageNet: Vision Transformer (ViT), Data Efficient Image Transformer (DeiT), Class Attention in Image Transformer (CaiT), and Swin Transformers.

The ViT paper demonstrated that transformers could be applied to image classification, but its models were pre-trained on Google's internal dataset of 300 million images. DeiT addressed the data-hungry nature of transformers by using a rigorous training schedule and knowledge distillation to train a vision transformer using only ImageNet-1k. Subsequent image transformers such as CaiT and Swin closely followed DeiT's blueprint. Lee et al. proposed modifications to vision transformers to improve their accuracy when trained from scratch on Tiny ImageNet. However, transfer learning is the more common technique and generally yields higher accuracy.

This study reports the accuracy of ViT, DeiT, CaiT, and Swin Transformer models fine-tuned on Tiny ImageNet using transfer learning. Swin Transformers outperformed the other three models with a validation accuracy of 91.35%, beating the state-of-the-art result at the time of submission. The code used in this study is available at https://github.com/ehuynh1106/TinyImageNet-Transformers.

In conclusion, this study fills a gap in modern research by evaluating popular vision transformer models on Tiny ImageNet, and it shows that Swin Transformers achieve the highest accuracy among the four models tested on this benchmark.
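
The summary reports a single validation accuracy figure without saying how it is computed. As a minimal sketch, assuming the figure is ordinary top-1 accuracy (the fraction of validation images whose highest-scoring class matches the label), the calculation looks like the following. The function name `top1_accuracy` and the toy scores are hypothetical illustrations and are not taken from the paper or its repository.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the fraction of samples whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy example with 3 classes and 4 samples (made-up scores, not paper data).
toy_logits = [
    [2.0, 0.1, -1.0],  # predicted class 0
    [0.3, 0.2, 1.5],   # predicted class 2
    [0.0, 0.9, 0.1],   # predicted class 1
    [1.2, 1.1, 0.4],   # predicted class 0
]
toy_labels = [0, 2, 1, 1]
accuracy = top1_accuracy(toy_logits, toy_labels)
print(f"top-1 accuracy: {accuracy:.2%}")  # 3 of 4 correct -> 75.00%
```

For Tiny ImageNet each row of scores would have 200 entries, one per class, and the same function would apply unchanged.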
Created on 11 May 2023
