SIGVID

The reading list for the Special Interest Group on Visual Information Description

Image Captioning

Level 0

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)[ICLR 2015 Oral]
Sequence to Sequence -- Video to Text[ICCV 2015]
What value do explicit high level concepts have in vision to language problems?[CVPR 2016]

Level 1

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images[ICCV 2015]
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention[ICML 2015]
DenseCap: Fully Convolutional Localization Networks for Dense Captioning[CVPR 2016 Oral]
Image Captioning with Deep Bidirectional LSTMs[ACMMM 2016 Oral]

Video Captioning

Early Embedding and Late Reranking for Video Captioning[ACMMM 2016 Grand Challenge Award]
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks[CVPR 2016 Oral]
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation[best in MSR Video to Language Challenge]

Visual Question Answering

VQA: Visual Question Answering[ICCV 2015]

Miscellaneous

Spatial Transformer Networks[NIPS 2015]

Theories of DNN

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization[NIPS 2014]
The loss surfaces of multilayer networks[JMLR 2015]
On the expressive power of deep neural networks[ML/AI arXiv 2016]

Appendix

Other Reading Lists

Project Demo

NeuralTalk

DenseCap

Deeper LSTM + normalized CNN for Visual Question Answering

Faster RCNN

anchor_target_layer
