A new approach to defining a JSON schema

The representation of data has undergone many evolutions. The development of the relational database was clearly a vital technological revolution, but the relational model of data remains severely limited. While procedural languages allowed the use of more sophisticated data structures, the advent of Object Orientation (OO) gave programmers an importantly different perspective on how data really “works”. In the late Nineties, XML added to the way we thought about representing data, and the early Noughties saw the advent of XML Schema as a way of formally defining the syntax of XML documents.

In parallel with the evolution of XML, the mid-Nineties saw the beginnings of JavaScript as a way of adding dynamic functionality to static web pages. Although JavaScript is only tenuously related to Java, its simplicity, perhaps coupled with its ability to mix procedural and object-oriented paradigms, has led to its widespread adoption as both a client-side and a server-side programming language. In the early days of JavaScript development, client-server interaction was generally implemented using XML through AJAX (Asynchronous JavaScript and XML) calls, in which client-side JavaScript could receive and process XML documents to dynamically change the DOM.

While developers could easily recognise the benefits of both JavaScript and XML, it also became apparent that the native representation of data in JavaScript was much simpler and less cumbersome than XML (e.g. no need for verbose closing tags). Furthermore, XML inherited some challenges as a format for representing structured data from its document-markup heritage in SGML, which it shares with HTML: the ambiguity over when to use an attribute rather than an element, and the concept of mixed content. This led to the evolution of JSON as a much more consistent and lightweight format for exchanging structured data in a web context. Very quickly, the JSON format, in conjunction with the development of “Representational State Transfer” (REST) web services as an architectural style for defining APIs, drove a switch to JSON being the dominant format for data exchange in distributed APIs.
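The attribute-versus-element ambiguity can be made concrete with a small sketch. The customer record, its field names and the two helper functions below are hypothetical, chosen only for illustration; the parsing uses Python's standard `xml.etree.ElementTree` and `json` modules.

```python
import json
import xml.etree.ElementTree as ET

# Two equally legitimate XML encodings of the same (hypothetical) record.
xml_with_attributes = '<customer id="42" name="Ada"/>'
xml_with_elements = "<customer><id>42</id><name>Ada</name></customer>"

# One natural JSON encoding of the same record.
json_text = '{"id": 42, "name": "Ada"}'


def read_attributes(text):
    """Read the record when the author chose attributes."""
    root = ET.fromstring(text)
    return {"id": int(root.get("id")), "name": root.get("name")}


def read_elements(text):
    """Read the record when the author chose child elements."""
    root = ET.fromstring(text)
    return {"id": int(root.findtext("id")), "name": root.findtext("name")}


record_from_attributes = read_attributes(xml_with_attributes)
record_from_elements = read_elements(xml_with_elements)
record_from_json = json.loads(json_text)
```

All three values end up as the same dictionary, but the XML consumer needs to know in advance which of the two styles the producer chose, and has to convert the text `"42"` to a number itself. The JSON consumer needs neither piece of knowledge.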

Part of the attraction of JSON was that there was no need to define a schema before creating structured data and, in the spirit of Jon Postel’s robustness principle (“be liberal in what you accept, and conservative in what you send”), the assumption was that the code should deal with any problems in the data. While this was largely fine for teams of web developers working together within a tightly defined context, the inevitable spread of JSON and REST into enterprise integration patterns, and the need for such APIs to be consumed by teams in different contexts, brought back the need for some of the formalities of API definition in the RESTful world. In response to the need for formal RESTful API definitions, the Swagger format (now the OpenAPI Specification) was created with schema definition for JSON documents at its core. Its schema language is based on JSON Schema, which began as a separate effort and remains the only widely recognised format for defining the syntax of JSON documents.

Too often in the world of computer science (and more generally), religious divides open up that polarise perspectives. XML is now typically associated with the unfairly maligned SOA and, as such, an appalling chapter of legacy IT. The WS-* standards certainly didn’t help matters but, as so often happens, the underlying intent of “modern” SOA and the principle of Service Orientation was conflated with the complexity of its implementation. This has led to a level of blindness to some of the positive aspects of XML and XML Schema that have not been successfully translated into the world of JSON and JSON Schema.

While JSON is, fundamentally, a format for describing the state of “Objects”, JSON Schema follows a “subtractive” approach (i.e. through progressive restriction) to schema definition that is incompatible with an OO approach to defining data. XML Schema, by contrast, follows an “additive” approach (i.e. through progressive extension) that is rooted in the principles of Object Orientation (notably inheritance and polymorphism). In the “real world”, data is much more closely aligned to the principles of OO than it is to purely relational models, and this means that the lack of OO support within JSON Schema is a significant impediment to its general utility.
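A minimal sketch may help to show the difference between the two approaches. The `validate` function below is a toy written for this illustration, not a JSON Schema implementation: it understands only `required`, `properties` with a `type`, `additionalProperties: false` and `allOf`, which I believe it treats in the same way as JSON Schema does. The `Animal` and `Dog` names are hypothetical.

```python
from dataclasses import dataclass

# Toy mapping from schema type names to Python types (assumption: only these two).
TYPES = {"string": str, "number": (int, float)}


def validate(instance, schema):
    """Return True if the dict `instance` satisfies the toy `schema`."""
    # Every keyword can only reject; nothing can widen what is accepted.
    for sub_schema in schema.get("allOf", []):
        if not validate(instance, sub_schema):
            return False
    for name in schema.get("required", []):
        if name not in instance:
            return False
    properties = schema.get("properties", {})
    for name, rule in properties.items():
        if name in instance and "type" in rule:
            if not isinstance(instance[name], TYPES[rule["type"]]):
                return False
    # Only properties declared in this same schema object count as known.
    if schema.get("additionalProperties") is False:
        if any(name not in properties for name in instance):
            return False
    return True


# Subtractive: the empty schema accepts everything, each keyword removes cases.
EMPTY = {}
ANIMAL = {"properties": {"name": {"type": "string"}}, "required": ["name"]}
CLOSED_ANIMAL = {**ANIMAL, "additionalProperties": False}

# An attempt to "extend" the closed base with a new property via allOf.
DOG = {
    "allOf": [CLOSED_ANIMAL],
    "properties": {"breed": {"type": "string"}},
    "required": ["breed"],
}

rex = {"name": "Rex", "breed": "Collie"}


# Additive: the OO equivalent, where a subtype adds state to its base type.
@dataclass
class Animal:
    name: str


@dataclass
class Dog(Animal):
    breed: str


rex_object = Dog(name="Rex", breed="Collie")

accepted_by_empty = validate(rex, EMPTY)
accepted_by_open_base = validate(rex, ANIMAL)
accepted_by_dog_schema = validate(rex, DOG)
is_an_animal = isinstance(rex_object, Animal)
```

Here `accepted_by_dog_schema` is `False`: once the base schema is closed, the “derived” schema can never accept an instance that carries the extra `breed` property, because combining schemas can only restrict further. The OO version has no such difficulty, and `rex_object` remains usable wherever an `Animal` is expected.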

It is in this context that I seek to propose an alternative to JSON Schema that embraces the concept of OO rather than rejecting it. While a mere name should really be insignificant, there is the slight challenge that the term “JSON Schema” very accurately describes the basic need. Accordingly, I will tentatively advance the concept of “JSON Syntax Definition” (JSD) as an alternative to JSON Schema.
