The 21st ACM Symposium on Document Engineering

August 24, 2021 to August 27, 2021
Limerick, Ireland


This year DocEng will be a virtual event using Zoom for the video conference and Slack for instant messaging/collaboration.



Please use the Channel Browser in Slack to view all the available channels; there is one for each paper or presentation. Feel free to join the channel for any paper that you want to discuss with others.

The video presentations for each paper will be made available in each paper's channel as the conference progresses.

Technical support

Please contact the organizers via Slack on the #general-networking channel or by email.

Final Programme

The full programme is available for download in PDF format:


The proceedings for DocEng2021 are available at:


All times below are IST (Irish Standard Time or UTC+1)

Tuesday, 24th August, 2021 - Tutorials

Session Chair: Rafael Dueire Lins (Universidade Federal de Pernambuco, Brazil)

Time | Session | Abstract | Presented by
3:00 PM | Domain-specific Modelling in Document Engineering | Domain-Specific Modelling raises the level of abstraction beyond current programming languages by specifying the solution directly using problem domain concepts. The final artifacts needed, such as code, tests, reports, protocols, configuration files and documents, are then generated from these high-level specifications. This automation is possible because both the modelling language and the generators need only fit the requirements of a narrow domain, often within a single company. This tutorial describes the Domain-Specific Modelling approach and demonstrates the practical benefits of applying it to modelling business/automation processes where documents are used. It describes the process of creating your own modelling solution, along with examples. | Verislav Djukic and Juha-Pekka Tolvanen
3:00 PM | Document Engineering Issues in Malware Analysis | We present an overview of the field of malware analysis with emphasis on issues related to document engineering. We will introduce the field with a discussion of the types of malware, including executable binaries, malicious PDFs, polymorphic malware, ransomware, and exploit kits. We will conclude with our view of important research questions in the field. This is an updated version of tutorials presented in previous years, with more information about newly available tools. More info at: | Charles Nicholas, Robert Joyce, Steven Simske

Wednesday, 25th August, 2021

Time | Session | Presented by
3:00 PM | Welcome note | Patrick Healy, Mihai Bilauca
3:15 PM | Keynote I: Searching Harsh Documents | Ophir Frieder
4:15 PM | Break
S1 Document Content Analysis | Chair: Besat Kassaie
4:35 PM | Efficient Clustering of Short Text Streams using Online-Offline Clustering | Md Rashadul Hasan Rakib, Norbert Zeh and Evangelos Milios
4:50 PM | Efficient Sparse Spherical k-Means for Document Clustering | Johannes Knittel, Steffen Koch and Thomas Ertl
5:00 PM | Small-step Pipelines Reduce the Complexity of XSLT/XPath Programs | Marcel Schaeben and Gioele Barabucci
5:10 PM | MTLV: A Library for Building Deep Multi-task Learning Architectures | Fatemeh Rahimi, Evangelos E. Milios and Stan Matwin
5:20 PM | ELSKE: Efficient Large-Scale Keyphrase Extraction | Johannes Knittel, Steffen Koch and Thomas Ertl
5:30 PM | Break
S2 Generation, Manipulation and Presentation | Chair: Steven Bagley
5:50 PM | Ordering Sentences and Paragraphs with Pre-trained Encoder-Decoder Transformers and Pointer Ensembles | Rémi Calizzano, Malte Ostendorff and Georg Rehm
6:05 PM | SlideGen: An Abstractive Section-Based Slide Generator for Scholarly Documents | Athar Sefid, Prasenjit Mitra and C. Lee Giles
6:15 PM | Engineering of An Artificial Intelligence Safety Data Sheet Document Processing System for Environmental, Health, and Safety Compliance | Kevin Fenton and Steven Simske
6:25 PM | The DocEng Book Series | Steve Simske and Nicki Dennis
6:35 PM | Birds of a Feather | Charles Nicholas
7:35 PM | End of Day

Thursday, 26th August, 2021

Time | Session | Presented by
3:00 PM | Keynote II: 20 Years of Physical Document and Product Protection Using Digital Methods | Justin Picard
4:00 PM | Break
S3 Security and Sensitive Documents | Chair: Charles Nicholas
4:20 PM | A Novel Approach on the Joined De-Identification of Textual and Relational Data with a Modified Mondrian Algorithm | Fabian Singhofer, Aygul Garifullina, Mathias Kern and Ansgar Scherp
4:35 PM | Pornographic Content Classification Using Deep-Learning | Andre Tabone, Kenneth Camilleri, Alexandra Bonnici, Stefania Cristina, Reuben Farrugia and Mark Borg
4:50 PM | Counterfeit Detection with QR Codes | Justin Picard, Paul Landry and Michael Bolay
5:00 PM | Trustworthiness of Spam Email Addresses Using Machine Learning | Francisco Jáñez-Martino, Rocio Alaiz-Rodríguez, Víctor González-Castro and Eduardo Fidalgo
5:10 PM | Break
S4 Applications and User Experiences | Chair: Dick Bulterman
5:30 PM | Recognizing Creative Visual Design: Multiscale Design Characteristics in Free-Form Web Curation Documents | Ajit Jain, Andruid Kerne, Nic Lupfer, Gabriel Britain, Aaron Perrine, Yoonsuck Choe, John Keyser and Ruihong Huang
5:45 PM | Rescuing Historical Climate Observations to Support Hydrological Research: A Case Study of Solar Radiation Data | Odunayo Ogundepo, Naveela Sookoo, Gautam Bathla, Anthony Cavallin, Bhaleka Persaud, Kathy Szigeti, Philippe Van Cappellen and Jimmy Lin
5:55 PM | ALiBERT - Improved Automated List Inspection (ALI) with BERT | Rajkumar Ramamurthy, Maren Pielka, Robin Stenzel, Christian Bauckhage, Rafet Sifa, Tim Khameneh, Uli Warning, Bernd Kliem and Rüdiger Loitz
6:05 PM | A Large-Scale Exploration of Terms of Service Documents on the Web | Soundarya Nurani Sundareswara, Mukund Srinath, Shomir Wilson and C. Lee Giles
6:15 PM | Metadata-Driven Eye Tracking for Real-Time Applications | Yasith Jayawardana, Gavindya Jayawardena, Andrew Duchowski and Sampath Jayarathna
6:25 PM | ACM Town hall | Peter Brusilovsky
6:35 PM | Networking/Free Chat
7:35 PM | End of Day

Friday, 27th August, 2021

Time | Session | Presented by
3:00 PM | DocEng 2022 | Matthew Hardy, Curtis Wigington
S5 Systems for Visual Document Analysis | Chair: Tamir Hassan
3:10 PM | Table-structure Recognition Method Using Neural Networks for Implicit Ruled Line Estimation and Cell Estimation | Manabu Ohta, Ryoya Yamada, Teruhito Kanazawa and Atsuhiro Takasu
3:25 PM | Evaluating Deep Neural Networks for Image Document Enhancement | Lucas Kirsten, Ricardo Piccoli and Ricardo Ribani
3:35 PM | Towards Extraction of Theorems and Proofs in Scholarly Articles | Shrey Mishra, Lucas Pluvinage and Pierre Senellart
3:45 PM | A Comparative Study on Methods and Tools for Handwritten Mathematical Expression Recognition | Daniela Costa, Carlos Mello and Marcelo d'Amorim
3:55 PM | Short Break
4:00 PM | Text line extraction using deep learning and minimal sub seams | Adi Azran, Alon Schclar and Raid Saabni
4:10 PM | Direct Binarisation: A Quality-and-Time Efficient Binarisation Strategy | Rafael Lins, Rodrigo B. Bernardino, Ricardo Barboza and Zanoni Lins
4:20 PM | Challenges in Chart Image Classification: A Comparative Study of Different Deep Learning Methods | Jennil Thiyam, Sanasam Ranbir Singh and Prabin K. Bora
4:30 PM | Break
S6 Collections, Systems and Management | Chair: Angelo Di Iorio
4:45 PM | On Minimizing Cost in Legal Document Review Workflows | Eugene Yang, David Lewis and Ophir Frieder
5:00 PM | Heuristic Stopping Rules For Technology-Assisted Review | Eugene Yang, David Lewis and Ophir Frieder
5:15 PM | Shock Wave: a Graph Layout Algorithm for Text Analyzing | Maxime Cauz, Julien Albert, Anne Wallemacq, Isabelle Linden and Bruno Dumas
5:25 PM | COVID-19 Multidimensional Kaggle Literature Organization | Maksim Eren, Nick Solovyev, Chris Hamer, Renee McDonald, Boian Alexandrov and Charles Nicholas
5:35 PM | Break
5:55 PM | Binarisation challenge summary | Steve Simske
6:05 PM | Birds of a Feather presentations | Charles Nicholas
6:15 PM | Best paper awards
6:25 PM | Closing remarks
6:35 PM | Networking/Free Chat
6:45 PM | End of Symposium

Birds of a Feather

Charles Nicholas will serve as chair of this year's Birds of a Feather (BoaF) session. We invite you to go to the bof-general Slack channel or message Charles Nicholas on Slack with ideas or suggestions for discussion. The only constraint is that the topic must have some relationship to Document Engineering. These suggestions will be boiled down to a few specific topics and shared with the participants a day or so before the conference. We'll use Zoom to set up meeting spaces for those who want to take part in one or more BoaF discussions. Each group will have the chance to give a brief (two-minute) presentation on the last day of the conference.

A message from Charles:

Greetings, DocEng 2021 Participants!

It is once again my privilege to serve as chair of this year's Birds of a Feather (BoaF) session. As such, I invite you to email me (Charles Nicholas) with ideas or suggestions for discussion.

We already have two BoaF topics:

  1. How to get a book published in Document Engineering
  2. Tensor methods in Document Engineering

Other topics are welcome up to the designated time, near the end of the first day of the conference, as shown in the conference program. Each group will have the chance to give a brief (two-minute) presentation on the last day of the conference.

So, put your thinking caps on! Let me hear from you!

Thanks! Charles

Presentation Guidelines for Authors

DocEng'21 will hold four live sessions on each day of the conference, starting with the Tutorials on the 24th of August.

To keep the schedule on time, we require authors to pre-record their presentations. Long papers will have 10 minutes for the recorded presentation, followed by 5 minutes of live Q&A. Short papers will have 7 minutes for the recorded presentation, followed by 3 minutes of live Q&A.

Several video conferencing tools make it easy to record a presentation. With this approach, you can show your face via webcam and display your slides as you talk. You can use any meeting software as long as you get a good-quality recording and your final file is in MP4 format. Here are some links to instructions on recording a meeting on common platforms:

So that videos can be verified by the technical program committee, please upload your video using EasyChair no later than Friday, 13th August 2021.

Video Specifications:

Please contact the organizers if you have any questions.