The 20th ACM Symposium on Document Engineering

September 29, 2020 to October 1, 2020
San Jose, CA, USA


DocEng will be a virtual event on BlueJeans and Slack. No installation is required, apart from a Web browser. See for information on how to access the virtual events.

The dates and times of the individual paper presentations will be provided soon. Papers will be presented as pre-recorded videos followed by a live Q&A. Please see Presentation Guidelines for Authors below for more details.

All times shown are in Pacific Daylight Time (PDT).

Monday, September 28th (Tutorial: Machine Interpretation of Sketched Documents)

Drawings have been used as a means of communication for as long as people have made any form of writing. In this tutorial, we look at the use of sketches in the product design and manufacturing industry. Specifically, we will look at the difficulties associated with the machine interpretation of sketches. We then describe a sketch-based interface which allows us to create a 3D model from a 2D, annotated drawing. Next, we describe how Gabor filters can be used to obtain clean drawings from rough, over-sketched line strokes. We will then look at how a convolutional neural network can be used to attain similar results.

Throughout the tutorial, we will use Jupyter Notebook examples, which are available on the Slack channel.

Time (PDT) | Session | Authors
7:00am ‑ 10:00am | Tutorial: Machine Interpretation of Sketched Documents | Alexandra Bonnici, Kenneth Camilleri

Tuesday, September 29th

Time (PDT) | Paper ID | Session | Authors
7:00am ‑ 7:15am | | Welcome |
7:15am ‑ 8:00am | | Keynote: OCR and Document Understanding | Cha Zhang
8:00am ‑ 8:05am | | - break - |
Session 1: OCR and Image Processing Session Chair: Cha Zhang
8:05am ‑ 8:20am | 1.1 | Direct Sampling of Multiview Line Drawings for Document Retrieval | Cristopher Flagg and Ophir Frieder
8:20am ‑ 8:35am | 1.2 | Cardinal Graph Convolution Framework for Document Information Extraction | Rinon Gal, Shai Ardazi and Roy Shilkrot
8:35am ‑ 8:45am | 1.3 | HTR-Flor++: A handwritten text recognition system based on a pipeline of optical and language models | Arthur Flor de Sousa Neto, Byron Leite Dantas Bezerra, Alejandro Héctor Toselli and Estanislau Baptista Lima
8:45am ‑ 8:55am | 1.4 | The Old Bailey and OCR: Benchmarking AWS, Azure, and GCP with 180,000 Page Images | William Ughetta and Brian Kernighan
8:55am ‑ 9:00am | | - break - |
Session 2: COVID Documents and Data Session Chair: Steve Simske
9:00am ‑ 9:10am | 2.1 | COVID-19 Kaggle Literature Organization | Nick Solovyev, Maksim Eren, Charles Nicholas, Edward Raff and Ben Johnson
9:10am ‑ 9:20am | 2.2 | COVIDSeer: Extending the CORD-19 Dataset | Shaurya Rohatgi, Zeba Karishma, Jason Chhay, Sai Raghav Reddy Keesara, Jian Wu, Cornelia Caragea and C. Lee Giles
9:20am ‑ 9:30am | | Open discussion |

Wednesday, September 30th

Time (PDT) | Paper ID | Session | Authors
7:00am ‑ 7:10am | | Time-Quality Competition on Binarizing Photographed Documents | Rafael Dueire Lins, Rafael Ferreira de Mello and Steven Simske
7:10am ‑ 7:20am | | Competition on Extractive Text Summarization | Rafael Dueire Lins, Rafael Ferreira de Mello and Steven Simske
7:20am ‑ 7:30am | | Tutorial on Machine Interpretation of Sketched Documents | Alexandra Bonnici and Kenneth Camilleri
Session 3: Text Processing Session Chair: Alexandra Bonnici
7:30am ‑ 7:45am | 3.1 | An Assessment of Sentence Simplification Methods in Extractive Text Summarization | Rafaella Vale, Rafael Lins and Rafael Ferreira Mello
7:45am ‑ 7:55am | 3.2 | Short Text Stream Clustering via Frequent Word Pairs and Reassignment of Outliers to Clusters | Md Rashadul Hasan Rakib, Norbert Zeh and Evangelos Milios
7:55am ‑ 8:05am | 3.3 | Improving query expansion strategies with word embeddings | Alfredo Silva and Marcelo Mendoza
8:05am ‑ 8:15am | 3.4 | Assessing Causality Structures learned from Digital Text Media | Mariano Maisonnave, Fernando Delbianco, Fernando Tohmé, Ana G. Maguitman and Evangelos E. Milios
8:15am ‑ 8:20am | | - break - |
Session 4: Document Changes Session Chair: Angelo Di Iorio
8:20am ‑ 8:35am | 4.1 | Change Detection on JATS Academic Articles: An XML Diff Comparison Study | Milos Cuculovic, Frédéric Fondement, Maxime Devanne, Jonathan Weber and Michel Hassenforder
8:35am ‑ 8:45am | 4.2 | Interactive and Scalable visualization framework for Version-aware XML documents | Ahmed Shatnawi and Ethan Munson
8:45am ‑ 8:55am | 4.3 | A Framework for Extracted View Maintenance | Besat Kassaie and Frank Tompa
8:55am ‑ 9:00am | | - break - |
9:00am ‑ 9:15am | | ACM Town Hall |

Thursday, October 1st

Time (PDT) | Paper ID | Session | Authors
7:00am ‑ 7:25am | | BoF report | Charles Nicholas
7:25am ‑ 7:30am | | - break - |
Session 5: Markup Languages and Standards Session Chair: Charles Nicholas
7:30am ‑ 7:40am | 5.1 | Automatic Generation of Electrical Plan Documents from Architectural Data | Melissa Cote, Alireza Rezvanifar and Alexandra Branzan Albu
7:40am ‑ 7:50am | 5.2 | Parsing a markup language that supports overlap and discontinuity | Ronald Haentjens Dekker, Bram Buitendijk and Elli Bleeker
7:50am ‑ 7:55am | | - break - |
Session 6: Document Analysis Session Chair: Tamir Hassan
7:55am ‑ 8:10am | 6.1 | PDF2LaTeX: A Deep Learning System to Convert Mathematical Documents from PDF to LaTeX | Zelun Wang and Jyh-Charn Liu
8:10am ‑ 8:25am | 6.2 | Order out of Chaos: Construction of Knowledge Models from PDF Textbooks | Isaac Alpizar Chacon and Sergey Sosnovsky
8:25am ‑ 8:35am | 6.3 | A Framework to Evaluate Webpage Segment Recognizers | Nicola Raffaele Di Matteo and James Blustein
8:35am ‑ 8:45am | 6.4 | ServiceMarq: Extracting Service Contributions from Call for Papers | Tian Shi, Abhinav Ramesh Kashyap and MinYen Kan
8:45am ‑ 8:50am | | - break - |
8:50am ‑ 9:15am | | Closing |

Birds of a Feather

Charles Nicholas will serve as chair of this year's Birds of a Feather (BoaF) session. We invite you to go to the bof-general Slack channel or message Charles Nicholas on Slack with ideas or suggestions for discussion. The only constraint is that the topic must have some (perhaps dubious) relationship to Document Engineering. These suggestions will be boiled down to a few specific topics and shared with the participants a day or so before the conference. We'll use the BlueJeans system to set up meeting spaces for those who want to take part in one or more BoaF discussions. Each session will have the chance to give a brief (two-minute) presentation on the last day of the conference.

Presentation Guidelines for Authors

DocEng will hold three live sessions. To keep the schedule on time, we are requiring authors to pre-record their presentations. Long papers will have 10 minutes for their presentation recording with 5 minutes for a live Q&A. Short papers will have 7 minutes for their presentation recording with 3 minutes for a live Q&A. While we do not provide strict guidelines on which tools should be used to record the presentations, we recommend using a built-in screen recording tool to record over the presentation of the slides. We do not require a video stream of the author/presenter to appear in this video.

Videos should be submitted one week before the presentation date. Further upload instructions will be provided soon.

Please contact if you have any questions.