Can generative AI create accessible web code? A benchmark analysis of AI-generated HTML against accessibility standards
Authors
Abu Doush, Iyad
Kassem, Reem
Issue Date
2025-07-31
Type
Article
Abstract
This study investigates how well four leading generative AI (GAI) models, ChatGPT 4o, Copilot Pro, Claude 3.7 Sonnet, and Grok 3, generate web components that conform to the WCAG 2.1 accessibility standard. Eleven components were tested: skip to main content, negative tab index, select-only combo box, two-state checkbox, accordion user interface, login form, sign-up form, table with one header, table with multiple headers, images of text, and complex images. For each component, every GAI model was first given a foundation prompt containing detailed context and accessibility requirements, followed by a main prompt specifying the component to be generated. Once the model produced the full code, the resulting web page was manually tested using keyboard navigation and a screen reader, and compared against ideal reference code developed according to accessibility standards. If any issues were identified, a follow-up prompt was issued that clearly described the problems to be addressed. The results show that although all models can produce semantically valid code, they frequently fall short of full accessibility compliance without further prompting and human intervention. Claude 3.7 Sonnet and ChatGPT 4o performed best, with fewer violations and less reliance on follow-up instructions. Follow-up prompting significantly improved results, highlighting the importance of prompt engineering. The findings confirm that GAI can support accessible web development, but it requires guidance and human oversight to ensure compliance with standards.
Publisher
Springer Verlag
Volume
24