Readings for the ICDAR2019 Deep Learning Tutorial

Original Convolutional Networks

1995-lecun-convolutional
- convolutional networks, sigmoid, average pooling
- precursor of RCNN for multi-object recognition
- digits and handwriting

Convolutional Networks on GPUs

2013-krizhevsky-imagenet
- ReLU, GPU training, local response normalization, pooling layers, dropout
- ImageNet dataset
2014-srivastava-dropout
- dropout interpreted as an ensemble of networks
- intended to prevent overtraining, improve generalization
- standard test cases (CIFAR, MNIST, etc.)
2014-simonyan-maxpool-very-deep
- 19 weight layers, multicrop evaluation, "VGG team" ILSVRC-2014 challenge
2015-ioffe-batch-normalization
- introduces batch normalization for faster training
2015-szegedy-rethinking-inception
- label smoothing, separable convolutions
2015-szegedy-going-deeper
- "inception modules", modular construction
2016-szegedy-inception
- "inception modules", modular construction
2015-he-resnet
- introduces the ResNet architecture
2015-jaderberg-spatial-transformer
- adds spatial transformations/distortions to learnable primitives
2017-dai-deformable
- adds deformable convolutions to learnable primitives
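
As a small illustration of the label smoothing listed under 2015-szegedy-rethinking-inception, the sketch below builds a smoothed target distribution: most of the probability mass stays on the true class and the remainder is spread uniformly over all classes. The function name `smooth_labels` and the default `epsilon=0.1` are assumptions made for this example, not taken from the reading list.

```python
def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a smoothed one-hot target as a list of probabilities.

    Each class receives epsilon / num_classes, and the true class
    additionally receives 1 - epsilon, so the entries sum to 1.
    """
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index must be in [0, num_classes)")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    uniform_share = epsilon / num_classes
    distribution = [uniform_share] * num_classes
    distribution[target_index] += 1.0 - epsilon
    return distribution


if __name__ == "__main__":
    # Hypothetical 4-class problem with true class 2.
    print(smooth_labels(2, 4, epsilon=0.1))
```

With `epsilon=0` this reduces to the ordinary one-hot target, which makes it easy to switch smoothing off when comparing training runs.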

OCR:

2013-breuel-high-performance-ocr-lstm
- LSTM for printed OCR
2013-goodfellow-multidigit
- Google SVHN digits, 200k numbers with bounding boxes
- 8 layer convnet, ad-hoc sequence modeling
2017-breuel-lstm-ocr
- comparison of different convnet+LSTM architectures for OCR

Segmentation, Superresolution with Convolutional Networks

2015-dong-superresolution
- explicit upscaling of images
2015-ronneberger-unet
- general U-net architecture for image-to-image mappings
2015-byeon-mdlstm-segmentation
- MDLSTM for image segmentation
2015-stollenga-pyramid-lstm
- pyramid LSTM architecture
2015-long-convnet-semantic-segmentation
- semantic segmentation with convolutional networks
2015-girshick-rich-feature-hierarchies
- semantic segmentation with convolutional networks (multitask)
2015-noh-deconvolutional-networks
- deconvolution (transposed convolution) and unpooling layers for semantic segmentation
2017-blogpost-semantic-segmentation
- survey of semantic segmentation architectures
2016-chen-deeplab
2017-chen-deeplab-atrous
2017-chen-rethinking-atrous
- adds atrous convolutions to learnable primitives; DeepLab v3
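
The atrous (dilated) convolutions used in the DeepLab entries can be sketched in one dimension: the kernel taps are spaced `rate` samples apart, which enlarges the receptive field without adding weights. This is a minimal pure-Python illustration; the function name `atrous_conv1d` and the "valid" (no padding) boundary handling are assumptions for the example.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous correlation without padding.

    Kernel tap i is applied to signal[start + i * rate], so a kernel of
    length K covers (K - 1) * rate + 1 input samples.
    """
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    span = (len(kernel) - 1) * rate + 1  # effective receptive field
    output = []
    for start in range(len(signal) - span + 1):
        value = sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        output.append(value)
    return output


if __name__ == "__main__":
    ramp = [1, 2, 3, 4, 5, 6, 7]
    print(atrous_conv1d(ramp, [1, 0, -1], rate=1))  # taps 1 sample apart
    print(atrous_conv1d(ramp, [1, 0, -1], rate=2))  # taps 2 samples apart
```

With `rate=1` this is an ordinary correlation; increasing `rate` widens the context each output sees while the number of kernel weights stays the same.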

OCR:

2015-afzal-binarization-mdlstm
- MDLSTM for binarization (image-to-image transformation)
2017-breuel-mdlstm-layout
- layout analysis with MDLSTM
2017-chen-convnet-page-segmentation
- layout analysis with convolutional networks
2017-he-semantic-page-segmentation
- layout analysis with convolutional networks
2018-mohan-layout-error-correction-using-dnn
- layout analysis with convolutional networks

RCNN and Overfeat

2014-lecun-overfeat
- convolutional network, generic feature extraction
- sliding window at multiple scales across image
- regression network
2015-liu-multibox
- input image and ground truth boxes
2015-ren-faster-rcnn-v3
- region proposal network (object/not object, box coords at each loc)
- translation invariant anchors

OCR:

2014-jaderberg-convnet-ocr-wild
- convnet, R-CNN, bounding box regression
- synthetic, ICDAR scene text, IIIT Scene Text, IIIT 5k words, IIIT Sports-10k, BBC News
- no bounding boxes in general; initial detector trained on positive word samples, negative images
- 10k proposals per image

Saliency, Attention, Visualization

2014-jiang-saliency
- explicit computation of salience
2015-zhou-class-attention-mapping
- class activation mapping of class-related features via global average pooling
2016-selvaraju-gradient-mapping
- gradient-based mapping of class-related features
2013-zeiler-visualizing-cnns
- learns inverses to layers via unpooling, transposed convolutions
2016-yu-visualizing-vgg
- applied to VGG16
2018-li-pyramid-attention
- combines multiresolution and attention

LSTM, CTC, GRU

1999-gers-lstm
- introduces the LSTM architecture
2005-graves-bdlstm
- introduces bidirectional LSTM
2006-graves-ctc
- introduces CTC alignment (a kind of forward-backward algorithm)
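
To make the forward-backward remark under 2006-graves-ctc concrete, the sketch below implements only the forward pass of CTC: it sums the probability of all frame-level paths that collapse to a given label sequence. It is a minimal, unoptimized illustration working on plain probabilities rather than log-probabilities; the function name `ctc_forward` and the convention that index 0 is the blank are assumptions for the example.

```python
def ctc_forward(probs, labels, blank=0):
    """Probability of `labels` under CTC, given per-frame symbol probabilities.

    probs[t][k] is the probability of symbol k at frame t.
    """
    # Extended sequence: blanks before, between, and after the labels.
    extended = [blank]
    for label in labels:
        extended += [label, blank]
    size = len(extended)

    # Initialization: a path may start with the leading blank or the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][extended[0]]
    if size > 1:
        alpha[1] = probs[0][extended[1]]

    for t in range(1, len(probs)):
        updated = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            updated[s] = total * probs[t][extended[s]]
        alpha = updated

    # Valid paths end on the last label or on the trailing blank.
    result = alpha[size - 1]
    if size > 1:
        result += alpha[size - 2]
    return result


if __name__ == "__main__":
    # Two frames, two symbols (0 = blank, 1 = a hypothetical character).
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print(ctc_forward(uniform, [1]))
```

In practice this recursion is run in log space, and a matching backward pass supplies the gradients, but the structure of the sum over alignments is the same.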

OCR:

2012-elaguni-ocr-in-video
- manually labeled training data on small dataset
- multiscale, convnet features, BLSTM, CTC
2014-bluche-comparison-sequence-trained
- HMM, GMM-HMM, MLP-HMM, LSTM
- Rimes, IAM; decoding with Kaldi (ASR toolkit)
2016-he-reading-scene-text
- large CNN, Maxout units, LSTM, CTC
- Street View Text, IIIT 5k-word, PhotoOCR, etc., using bounding boxes for training
2017-wang-gru-ocr

2D LSTM

2009-graves-multidimensional
- applies LSTM to multidimensional problems
2014-byeon-supervised-texture
- supervised image segmentation using multidimensional LSTM
2016-visin-reseg
- separable multidimensional LSTMs for image segmentation
2015-sonderby-convolutional
- convolutional LSTM architecture and attention
2016-shi-convolutional-lstm
- convolutional LSTM architecture

OCR:

2015-visin-renet
- separable multidimensional LSTMs for OCR

Seq2Seq, Attention

2012-graves-sequence-transduction
- introduces sequence transduction as an alternative to CTC
2015-bahdanau-attention
- content-based attention mechanisms for sequence to sequence tasks
2015-zhang-character-level-convnets-text
- simple use of convolutional networks as alternatives to n-grams, sequence models
2016-chorowski-better-decoding
- label smoothing and beam search
2017-vaswani-attention-is-all-you-need
- high performance sequence-to-sequence with attention
- masked, multi-head attention
2017-prabhavalkar-s2s-comparison
- a comparison of different sequence-to-sequence approaches
2017-gehring-convolutional-s2s
- purely convolutional sequence-to-sequence with attention
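
The attention mechanisms in this section share one core step: score each encoder position against a query, normalize the scores with a softmax, and return the weighted sum of the values. The sketch below shows the scaled dot-product form associated with 2017-vaswani-attention-is-all-you-need for a single query, without masking or multiple heads; 2015-bahdanau-attention instead scores with a small additive network. Function names are assumptions for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Attend from one query vector over a list of key/value vectors.

    Returns (context, weights), where weights sum to 1 and context is the
    weighted average of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
    return context, weights


if __name__ == "__main__":
    context, weights = scaled_dot_product_attention(
        [1.0, 0.0],
        [[1.0, 0.0], [0.0, 1.0]],
        [[10.0, 0.0], [0.0, 10.0]],
    )
    print(weights, context)
```

The key that matches the query more closely receives the larger weight, so the context vector leans toward its value.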

OCR:

2015-sahu-s2s-ocr
- standard seq2seq encoder/decoder approach
- TSNE visualizations of encoded word images
- word images from scanned books

Visual Attention

2017-nam-dual-attention
- joint visual and text attention networks

OCR:

2016-bluche-end-to-end-hw-mdlstm-attention
- full paragraph handwriting recognition without explicit segmentation
- MDLSTM plus attention, tracking, etc.
- IAM database, pretraining LSTM+CTC, curriculum learning
2016-lee-recursive-recurrent-attention-wild
- recursive convolutional layers, tied weights, followed by attention, character-level modeling
- ICDAR 2003, 2013, SVT, IIIT5k, Synth90k using bounding boxes for training
