TE 2.0 – JSON

René Reitsma; Kevin Krueger

JavaScript Object Notation (JSON) – The New XML

The previous chapter discussed how XML made possible a great variety of programmatic uses of web content. The new kid on the block, however, is an alternative, lighter-weight data interchange format known as JSON (pronounced: Jay-son).

Although JSON’s history goes back almost as far as XML’s, its recent rise as an alternative to XML stems from several factors:

It is lightweight in that it has less overhead than XML (just take this for granted right now; we will explain this later).
It is often ―although not necessarily― less verbose (less ‘wordy’) than XML and, therefore, faster to transfer across networks.
It is tightly linked with JavaScript, which has seen very rapid growth as the programming language for web browser-based processing.
A growing number of databases support the storage and retrieval of data as JSON.

Before we consider each of these, let us first look at JSON as a means of representing information.

JSON, like XML, is a way of hierarchically representing data: it uses the same tree-like structure to represent information in nested form. Table 1 shows identical information in both XML and JSON (example taken from Wikipedia’s JSON page).

Table 1. Identical data represented both in XML (first listing) and JSON (second listing)

<person>
    <firstName>John</firstName>
    <lastName>Smith</lastName>
    <age>25</age>
    <address>
        <streetAddress>21 2nd Street</streetAddress>
        <city>New York</city>
        <state>NY</state>
        <postalCode>10021</postalCode>
    </address>
    <phoneNumbers>
        <phoneNumber>
            <type>home</type>
            <number>212 555-1234</number>
        </phoneNumber>
        <phoneNumber>
            <type>fax</type>
            <number>646 555-4567</number>
        </phoneNumber>
    </phoneNumbers>
    <gender>
        <type>male</type>
    </gender>
</person>

{"person":
  { "firstName": "John", 
    "lastName": "Smith", 
    "age": 25, 
    "address": 
      { "streetAddress": "21 2nd Street", 
        "city": "New York", 
        "state": "NY", 
        "postalCode": "10021" }, 
    "phoneNumber": [ 
      { "type": "home", 
        "number": "212 555-1234" }, 
      { "type": "fax", 
        "number": "646 555-4567" } ], 
    "gender": { "type": "male" } } }

Consider the JSON. It is a data structure which consists of a single complex element (person) containing six sub elements: firstName, lastName, age, address, phoneNumber, and gender. Of these, address and gender are once again complex. phoneNumber is a list containing two complex elements.

The JSON representation looks very much like a JavaScript data structure. In fact, it actually is just such a structure, which we can illustrate with the following exercise.

Exercise 4.1
Store the following content in a file foo.html and pick it up with your browser.

<html>
<script language="javascript">
var foo = {
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ],
  "gender": {
    "type": "male"
  }
}
alert(foo.gender.type);
</script>
</html>

What did we just do? We declared a JavaScript variable foo and assigned it the JSON structure. Next, we passed foo’s gender.type to the JavaScript alert() method, which pops it up in your browser. So apparently, the JSON structure in Table 1 is perfect JavaScript code all by itself. Hence, it fits seamlessly and without parsing or special processing in a JavaScript program.

If we compare this code with the exercise in the previous chapter where we used JavaScript to parse XML, the advantage of using JSON over XML when working in JavaScript becomes clear. Since JSON is just JavaScript ―at least on the data side of things― when working in JavaScript, no parsing or special processing of JSON is needed. We can just grab it, store it in a variable and we are ready to go.

Of course, the JSON need not be embedded in the JavaScript as in our example above; we can also retrieve it from an external source. In that case it arrives as a string, which a single call to JSON.parse() converts into a JavaScript structure. In the next exercise we do the latter.

Exercise 4.2
Point your browser at https://classes.business.oregonstate.edu/reitsma/person.html and note how it results in John Smith being echoed in your browser. Now look at the person.html source code:

<html>

<p id="person"></p>

<script>
var request = new XMLHttpRequest();
request.overrideMimeType("application/json");
request.onreadystatechange = extract;
request.open("GET", 
             "https://classes.business.oregonstate.edu/reitsma/person.json", 
             true);
request.send();

function extract()
{
  if (request.readyState == 4 && request.status == 200)
  {
    var person = JSON.parse(request.responseText);
    document.getElementById("person").innerHTML = 
        person.firstName + " " + person.lastName;
  }
}
</script>

</html>

Notice how this is pretty much exactly what we did in the exercise in the previous chapter when we retrieved XML from a web source and parsed it. Here we retrieved JSON from a web source (https://classes.business.oregonstate.edu/reitsma/person.json) and echoed some of its content.

Also note the call to JSON.parse(). The method takes in a string returned from the external web source and tries parsing it into a JavaScript structure. If the string represents valid JSON ―as in our case― that will work just fine. We then assign that structure to the variable person:

var person = JSON.parse(request.responseText);

As we did in the XML variant of this, we then extract information from that person and substitute it for the content of the HTML tag with id="person":

document.getElementById("person").innerHTML =
    person.firstName + " " + person.lastName;

JSON in Python

Whereas JSON is particularly efficient to use in a JavaScript context, it can, just like XML, be used in other contexts as well. In fact, JSON has become such a common format for exposing and exchanging data across the web that many programming languages other than JavaScript can be used to consume or generate JSON. Let us, once again, extract John Smith from the external JSON web source, but this time using Python (3.*).

Exercise 4.3
Run the following Python (3.*) code:

import json
import sys

import requests

# Request the JSON over HTTP
try:
    response = requests.get(
        "https://classes.business.oregonstate.edu/reitsma/person.json",
        timeout=30)
    # Treat HTTP error codes (404, 500, ...) as download failures too
    response.raise_for_status()
except requests.RequestException as err:
    print("Error downloading JSON...\n\n", err)
    sys.exit(1)

# Parse the retrieved JSON text into a Python data structure
json_data = json.loads(response.text)

# Since JSON objects become Python dictionaries, we can index into them by name.
first_name = json_data["firstName"]
last_name = json_data["lastName"]
street_address = json_data["address"]["streetAddress"]

print(first_name, last_name, "\n", street_address)
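
Exercise 4.3 depends on a live web source. As a minimal offline sketch of the same idea, the code below parses the person record from Exercise 4.1 from a string, indexes into its nested object and list, and shows what happens when the text is not valid JSON. The variable names are ours and serve only as illustration.

```python
import json

# The person record from Exercise 4.1, held as a plain string,
# which is the form in which JSON arrives from a web source.
person_text = """
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ],
  "gender": {"type": "male"}
}
"""

person = json.loads(person_text)

# JSON objects become dictionaries; JSON arrays become lists.
full_name = person["firstName"] + " " + person["lastName"]
fax_numbers = [p["number"] for p in person["phoneNumber"] if p["type"] == "fax"]
print(full_name, fax_numbers)

# Text that is not valid JSON (here: a missing closing brace) raises an error.
try:
    json.loads('{"firstName": "John"')
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("Invalid JSON:", err)
```

Catching json.JSONDecodeError is the Python counterpart of JSON.parse() failing in JavaScript when the string does not represent valid JSON.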

DTDs or XSDs for JSON: JSON Schema

In the previous chapter we discussed how DTDs and XSDs are used to declare the syntax of XML documents. We also discussed document validation as one of the functions of these specifications. You may therefore wonder whether or not a similar standard exists for JSON documents. Indeed, there is such a standard, namely JSON Schema, sponsored by the Internet Engineering Task Force (IETF).

JSON Schema is heavily based on the approach taken by XML Schema. Just as XSDs are written in XML, JSON Schema is written in JSON and, just like XML Schema, JSON Schema is self-describing.

To provide a flavor of JSON Schema, we use the example from json-schema.org (2016) of a simple product catalog. Here is the JSON for the catalog (only two products included^[1]):

[
  {
    "id": 2,
    "name": "An ice sculpture",
    "price": 12.50,
    "tags": ["cold", "ice"],
    "dimensions": {
      "length": 7.0,
      "width": 12.0,
      "height": 9.5
    },
    "warehouseLocation": {
      "latitude": -78.75,
      "longitude": 20.4
    }
  },
  {
    "id": 3,
    "name": "A blue mouse",
    "price": 25.50,
    "dimensions": {
      "length": 3.1,
      "width": 1.0,
      "height": 1.0
    },
    "warehouseLocation": {
      "latitude": 54.4,
      "longitude": -32.7
    }
  }
]

Pretty straightforward so far (note: the product catalog is a list ([…]) ). Now let us take a look at its JSON Schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Product set",
  "type": "array",
  "items": {
    "title": "Product",
    "type": "object",
    "properties": {
      "id": {
        "description": "The unique identifier for a product",
        "type": "number"
      },
      "name": {
        "type": "string"
      },
      "price": {
        "type": "number",
        "minimum": 0,
        "exclusiveMinimum": true
      },
      "tags": {
        "type": "array",
        "items": {
          "type": "string"
        },
        "minItems": 1,
        "uniqueItems": true
      },
      "dimensions": {
        "type": "object",
        "properties": {
          "length": {"type": "number"},
          "width": {"type": "number"},
          "height": {"type": "number"}
        },
        "required": ["length", "width", "height"]
      },
      "warehouseLocation": {
        "description": "Coordinates of the warehouse with the product",
        "$ref": "http://json-schema.org/geo"
      }
    },
    "required": ["id", "name", "price"]
  }
}

Studying this data structure, one quickly notices its ‘programming’ or ‘programmatic’ orientation. For instance, it declares variables to be of traditional data types such as string, array or number. Also, the variable required is a string array containing the strings “id”, “name” and “price”. This programmatic orientation makes JSON structures a little easier to parse than XML strings and, as mentioned before, since JSON is essentially JavaScript, makes JSON integrate seamlessly into JavaScript programs.
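
To make the validation idea concrete, here is a minimal, hand-written sketch of what some of the schema's rules amount to when checked in Python. It is not a JSON Schema implementation: it hard-codes only the required, type, minimum and tags rules for a product, and it skips dimensions and warehouseLocation. The function names and the third, deliberately faulty product are our own assumptions for illustration.

```python
import json

def is_number(value):
    # JSON numbers map to int or float; bool is excluded on purpose
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def check_product(product):
    """Return a list of problems found in one product (empty if none)."""
    if not isinstance(product, dict):
        return ["product is not an object"]
    problems = []
    for key in ("id", "name", "price"):           # the schema's "required" list
        if key not in product:
            problems.append("missing required property: " + key)
    if "id" in product and not is_number(product["id"]):
        problems.append("id is not a number")
    if "name" in product and not isinstance(product["name"], str):
        problems.append("name is not a string")
    if "price" in product:
        price = product["price"]
        if not is_number(price) or price <= 0:    # minimum 0, exclusive
            problems.append("price is not a number greater than 0")
    if "tags" in product:
        tags = product["tags"]
        if (not isinstance(tags, list) or len(tags) < 1
                or not all(isinstance(t, str) for t in tags)
                or len(set(tags)) != len(tags)):
            problems.append("tags is not a non-empty array of unique strings")
    return problems

# The first two products are abbreviated from the catalog above;
# the third is a hypothetical faulty entry added for illustration.
catalog = json.loads("""
[
  {"id": 2, "name": "An ice sculpture", "price": 12.50, "tags": ["cold", "ice"]},
  {"id": 3, "name": "A blue mouse", "price": 25.50},
  {"id": 4, "price": 0}
]
""")

for product in catalog:
    print(product.get("id"), check_product(product))
```

A real validator reads these rules from the schema document itself instead of hard-coding them, which is precisely what makes the schema reusable across programs and languages.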

Discussion: XML vs. JSON; XSD vs. JSON Schema; Is This Just the Next Cycle?

In the previous section we introduced JSON Schema as a way to specify the syntax of a JSON document, just as XSD is a way to specify the syntax of an XML document. So one might ask: if JSON is essentially like XML and if it too requires a validation meta layer (JSON Schema), then what is its real advantage over XML, if any? Let us reconsider the (alleged) advantages of JSON mentioned at the start of this chapter:

JSON is lightweight: it has less overhead than XML. It is less verbose than XML and, therefore, faster to transfer across networks.
It is tightly linked with JavaScript which is the de facto programming language for web browser-based computing.
A growing number of databases support the storage and retrieval of data as JSON.

The tight linkage with JavaScript and the availability of fast databases which store data as JSON are clear JSON advantages. Using ‘raw’ JSON directly in our (JavaScript) programs eliminates a parsing step. Similarly, because JSON is so closely related to object-oriented data representation, JSON structures are easily (de)serializable in object-oriented programming languages other than JavaScript. With (de)serialization we mean the conversion of a JSON string into an object in memory (deserialization) or vice versa (serialization).
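
As a minimal sketch of (de)serialization outside JavaScript, the Python code below converts part of the person record into objects in memory and back into a JSON string. The Person and Address classes and the two helper functions are hypothetical names chosen for this illustration; they are not part of any TeachEngineering code.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Address:
    streetAddress: str
    city: str
    state: str
    postalCode: str

@dataclass
class Person:
    firstName: str
    lastName: str
    age: int
    address: Address

def person_from_json(text):
    """Deserialization: JSON string -> Person object in memory."""
    data = json.loads(text)
    data["address"] = Address(**data["address"])
    return Person(**data)

def person_to_json(person):
    """Serialization: Person object in memory -> JSON string."""
    return json.dumps(asdict(person))

text = ('{"firstName": "John", "lastName": "Smith", "age": 25, '
        '"address": {"streetAddress": "21 2nd Street", "city": "New York", '
        '"state": "NY", "postalCode": "10021"}}')

john = person_from_json(text)
print(john.address.city)

# Serializing and deserializing again yields an equal object
round_trip = person_from_json(person_to_json(john))
print(round_trip == john)
```

Because the JSON member names line up with the attribute names of the classes, the conversion in either direction takes only a line or two.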

The availability of JSON databases is another important factor in the attractiveness of JSON. If we can just ‘throw’ JSON structures into a database and then have the database software search those structures for certain data elements, that can make life nice and easy, especially if we are willing and able to relax on the ‘normal form’ and integrity constraints we are so accustomed to in the relational world. There are, of course, XML databases as well, yet somehow, JSON seems to be the ‘new kid in town,’ quickly either replacing XML or providing an additional format for data exchange.

What about the ‘lightweight’ argument, though? It is true that XML seems more verbose. After all, in XML we must embed data in tags whereas in JSON there is no such requirement. To gain a rough idea of the relative sizes of XML vs. JSON data sets, we compared the sizes of a small series of data sets randomly collected (scout’s honor!) from www.data.gov, available in both XML and JSON (Table 2). Except for the smallest of data sets, the XML sets are, on average, almost twice the size of the corresponding JSON sets.

Table 2: Comparison of XML and JSON data sets found at www.data.gov.
www.data.gov data set^[2]	XML (bytes)	JSON (bytes)	XML/JSON
data.consumerfinance.gov/api/views.xml{json}	132003	143515	0.920
data.cdc.gov/api/views/ebbj-sh54/rows.xml?accessType=DOWNLOAD	36055	51102	0.706
data.cdc.gov/api/views/w9j2-ggv5/rows.xml?accessType=DOWNLOAD	566948	352678	1.608
data.cdc.gov/api/views/fwns-azgu/rows.xml?accessType=DOWNLOAD	5178244	2353920	2.200
data.montgomerycountymd.gov/api/views/4mse-ku6q/rows.xml?accessType=DOWNLOAD	1513181656	677424751	2.234
data.illinois.gov/api/views/t224-vrp2/rows.xml?accessType=DOWNLOAD	630175	362823	1.737
data.oregon.gov/api/views/kgdq-26yj/rows.xml?accessType=DOWNLOAD	1268018	707931	1.791
data.oregon.gov/api/views/c5a8-vfhd/rows.xml?accessType=DOWNLOAD	500160	324444	1.542
data.ny.gov/api/views/rsxa-xf6b/rows.xml?accessType=DOWNLOAD	140290178	71363706	1.966
data.ny.gov/api/views/e8ky-4vqe/rows.xml?accessType=DOWNLOAD	875873994	329161433	2.661
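
The ratios in Table 2, and the "almost twice the size" summary, can be recomputed from the byte counts with a few lines of Python. This is just a sketch of the arithmetic; the byte counts are copied from the table and the variable names are ours.

```python
# (XML bytes, JSON bytes) for the ten data sets, in the order of Table 2
sizes = [
    (132003, 143515),
    (36055, 51102),
    (566948, 352678),
    (5178244, 2353920),
    (1513181656, 677424751),
    (630175, 362823),
    (1268018, 707931),
    (500160, 324444),
    (140290178, 71363706),
    (875873994, 329161433),
]

ratios = [xml_size / json_size for xml_size, json_size in sizes]

# Only the two smallest data sets have a ratio below 1 (XML smaller than JSON)
larger = [r for r in ratios if r > 1]
mean_larger = sum(larger) / len(larger)

for r in ratios:
    print(f"{r:.3f}")
print(f"mean XML/JSON ratio of the {len(larger)} larger sets: {mean_larger:.2f}")
```

For the eight data sets where XML is the larger format, the mean ratio comes out just under 2.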

Another dimension of ‘weight’ is the overhead: the extra load or burden associated with working with XML vs. JSON datasets. To be clear, neither XML nor JSON mandates the use of a schema and, hence, one should not count the existence of extensive XML schemas and the relative absence of JSON schemas as a JSON advantage. However, since XML is the older of the two technologies, it has a deeper penetration in organizational, business and governmental computing and, hence, its ecosystem of protocols for standardization and validation is quite encompassing. Examples of these are XSD but also protocols such as SOAP, WSDL and the now defunct UDDI, which aimed to make XML into a general and overarching data representation and data exchange mechanism. Elegant and general as these might be, they often resulted in highly complex and arcane data structures and made computing harder by adding more ‘regulation’ in the form of additional hurdles to clear. These protocols can be experienced as constraints or ‘overkill’ by those who wish to rapidly develop an application without being encumbered by those ‘governance’ protocols.

Although the absence of these protocols from much of the (current) JSON ecosystem should, perhaps, not be considered a proper argument in the XML vs. JSON debate, the relative laissez faire climate of the JSON world does seem to encourage developers to move away from XML and toward JSON. Will this trend continue? That remains to be seen. As JSON becomes more entrenched in organizational, business and governmental computing and data exchanges, the desire for validation, translation and specification of rich and complex data structures will likely increase. This could then well drive ‘regulation’ in the form of protocol specification, and this implies programming overhead in pretty much the same way it occurred for XML. Still, JSON’s footprint in terms of byte size advantage, its database advantage and its (de)serialization advantage are real.

Perhaps the most likely outcome is that JSON and XML both remain relevant and complementary technologies, with JSON being most prevalent in gluing together modern applications and XML being used for marking-up content and in data exchange scenarios where rigorous validation (XSD) and transformation (XSLT) are called for.

Exercise 4.4: Who is putting out JSON services?

As with XML web services, JSON services do not typically show up in people’s web browsers. The reason, of course, is that those services are meant to be consumed by machines (programs), not people. Still, with a little googling it is pretty easy to find some of the many JSON services currently in operation.

Examples include many of the data sets offered by the US government on sites such as data.gov, cdc.gov, or census.gov.

Ask the API of the US Census Bureau for information on an address in Corvallis, Oregon:

https://geocoding.geo.census.gov/geocoder/locations/address?street=120+NW+4th+Street&city=Corvallis&state=OR&zip=97330&benchmark=Public_AR_Census2020&format=json

Request US life expectancy data from the CDC:

https://data.cdc.gov/api/views/w9j2-ggv5/rows.json
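
Since these services are meant for programs, here is a minimal sketch of how a Python program might assemble the Census Bureau request shown above and retrieve its result, using only the standard library. The helper names build_url and fetch_json are our own. Calling fetch_json requires network access, and we make no assumptions here about the layout of the returned JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODER = "https://geocoding.geo.census.gov/geocoder/locations/address"

def build_url(base, params):
    """Append URL-encoded query parameters to a base URL."""
    return base + "?" + urlencode(params)

def fetch_json(url, timeout=30):
    """Retrieve a URL and parse the response body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# The same parameters as in the Census Bureau URL above
url = build_url(GEOCODER, {
    "street": "120 NW 4th Street",
    "city": "Corvallis",
    "state": "OR",
    "zip": "97330",
    "benchmark": "Public_AR_Census2020",
    "format": "json",
})
print(url)
```

With a network connection, fetch_json(url) returns the service's answer as nested Python dictionaries and lists, which can then be explored in the same way as the person record in Exercise 4.3.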

TeachEngineering too exposes some of its data as JSON. For instance, to see a complete list of all of TeachEngineering’s resources in JSON, point your browser to
```
https://www.teachengineering.org/api/standards/VisualizationExport
```

Notice that, if your web browser is set up for this, it recognizes the returned results as JSON and renders them accordingly. If your browser puts out the returned JSON as one long, unformatted string, it has not been set up to recognize JSON. There are several ways to make this string more readable though. Some of these are web browser specific, such as the JSONView plugin for the Firefox browser or JSON Viewer for the Chrome browser. A browser-agnostic, on-line service is available at http://jsonviewer.stack.hu/. Simply enter any JSON string in the site’s Text tab and click on the Viewer tab.
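
Another way to make a long, unformatted JSON string readable is to re-indent it programmatically. The following sketch uses Python's standard json module; the compact string is a shortened version of the person record used earlier in this chapter.

```python
import json

# A JSON document delivered as one long, unformatted string
compact = ('{"firstName":"John","lastName":"Smith",'
           '"phoneNumber":[{"type":"home","number":"212 555-1234"}]}')

# Parse the string, then serialize it again with two-space indentation
pretty = json.dumps(json.loads(compact), indent=2)
print(pretty)
```

The same module can also be run from the command line as python -m json.tool, which reads JSON and writes an indented version of it.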

The data sets listed in Table 2 show that static JSON sets are also increasingly available. One only needs to peruse the thousands of data sets available through www.data.gov to notice that increasingly, JSON is one of the formats offered for data retrieval.

TeachEngineering (TE 2.0) Resources as JSON structures

In the previous chapter on XML we saw that TE 1.0 resources were stored as XML structures. This worked fine and served some valuable purposes such as resource rendering, document validation and metadata provisioning. In case you must refresh your memory on TE 1.0 XML, refer back to the TE 1.0 example.

In TE 2.0, however, we switched from XML to JSON. The switch was motivated by the reasons mentioned in the opening paragraph of this chapter: JSON is lightweight, quick to transport and, perhaps most important, supported by a recent generation of JSON-based databases which allow for fast storage and retrieval of JSON-based data.

Take a look at the (partial and abbreviated!!) JSON representation of the same ‘intraocular’ resource we looked at in XML:

{
  "Header":"<p><img data-url=
  \"mis_/activities/mis_eyes/mis_eyes_lesson01_activity1_image1web.jpg\"
  data-rights=\"Apple Valley Eye Care. Used with permission\" 
  data-caption=\"As seen in the image, irreparable vision loss can occur 
  in persons with glaucoma.\"
  alt=\"A photograph of two young girls looking at a camera. 
  The edges of the image have a black vignette—a loss in clarity towards 
  the corners and sides of an image—which portrays what is seen when 
  damage to the optic nerve has occurred due to the effects of glaucoma.\"
  /></p>",
  "Dependencies":[
    {
      "Url":"mis_eyes_lesson01",
      "Description":null,
      "Text":"These Eyes!",
      "LinkType":"Lesson"
    }
  ],
  "Time":{
    "TotalMinutes":350,
    "Details":"<p>(seven 50-minute class periods)</p>"
  },
  "GroupSize":3,
  "Cost":{
    "Amount":0.3,
    "Details":"<p>Students use online web quest (free), 3D modeling app (free) 
  and a 3D printer (or modeling clay) to design and create prototypes.</p>"
  },
  "EngineeringConnection":"<p>Biomedical engineers rely on modeling to design 
  and create prototypes for devices that may not yet be approved for testing. 
  In order to prepare for the cost of manufacturing a device, careful consideration 
  goes into the potential constraints of that device. Using various software programs, 
  engineers design and visualize the device they wish to create in order to determine 
  whether the future device is worth the effort, time and expense. 
  Mirroring real-world engineers, in this activity, students play the role of 
  engineers challenged to create intraocular pressure sensor prototypes to measure 
  pressure within the eyes of people with glaucoma.</p>",
  "EngineeringCategoryType":"Category2EngineeringAnalysisOrPartialDesign",
  "Keywords":[
    "3D printer",
    "3D printing",
    "at-scale modeling",
    "biomedical"
  ],
  "EducationalStandards":[
    {
      "Id":"http://asn.jesandco.org/resources/S113010D",
      "StandardsDocumentId":"http://asn.jesandco.org/resources/D1000332",
      "Jurisdiction":"Michigan",
      "Subject":"Science",
      "ListId":null,
      "Description":[
        "Science Processes",
        "Reflection and Social Implications",
        "K-7 Standard S.RS: Develop an understanding that claims and evidence for 
  their scientific merit should be analyzed. Understand how scientists decide 
  what constitutes scientific knowledge. Develop an understanding of the importance 
  of reflection on scientific knowledge and its application to new situations to 
  better understand the role of science in society and technology.",
        "Reflecting on knowledge is the application of scientific knowledge to new 
  and different situations. Reflecting on knowledge requires careful analysis of 
  evidence that guides decision-making and the application of science throughout 
  history and within society.",
        "Design solutions to problems using technology."
      ],
      "GradeLowerBound":7,
      "GradeUpperBound":7,
      "StatementNotation":"S.RS.07.16",
      "AlternateStatementNotation":"S.RS.07.16"
    },
    {
      "Id":"http://asn.jesandco.org/resources/S114173E",
      "StandardsDocumentId":"http://asn.jesandco.org/resources/D10003E9",
      "Jurisdiction":"International Technology and Engineering Educators Association",
      "Subject":"Technology",
      "ListId":"E.",
      "Description":[
        "Design",
        "Students will develop an understanding of the attributes of design.",
        "In order to realize the attributes of design, students should learn that:",
        "Design is a creative planning process that leads to useful products and systems."
      ],
      "GradeLowerBound":6,
      "GradeUpperBound":8,
      "StatementNotation":null,
      "AlternateStatementNotation":null
    },

Compared with the TE 1.0 XML, things look quite similar. However, there are a few important differences:

Whereas the XML version of the resources kept only a reference to an educational standard, such as S113010D or S114173E, the JSON version stores not only the identifiers but all the properties of the standard (description, grade levels, etc.) as well. To anyone trained in relational database modeling and so-called ‘normal form’ this raises a big red flag, as it implies a potential for a lot(!) of data duplication: each time a standard appears in a resource, its entire content is stored in that resource. How likely is this to happen? Table 3 shows a tally of only the ten most-referenced standards and the number of times they occur. Just for these ten standards this results in 1,121 duplications. Add to that the fact that in TeachEngineering more than 1,200 different standards are used more than once, and we can see why relationally trained system designers frown when they notice this.

Interesting observation: proponents of non-relational (NoSQL) databases refer to this practice of duplicating data as ‘hydrating.’ Those proponents would label the above case (each resource contains the entire data of each of its standards, regardless of how many other resources share that same data) as being ‘fully hydrated.’

Table 3: Ten most referenced K-12 education standards in TeachEngineering and their number of occurrences.
Standard	Number of occurrences in TeachEngineering
S11416DD	176
S11434D3	140
S2454468	127
S2454533	125
S2454534	117
S11416DA	107
S2454469	92
S11416D0	83
S1143549	81
S114174D	73
Total number of duplicates in the ten most referenced standards	1,121

It is important, however, to realize that this difference between the TE 1.0 XML representation and the TE 2.0 JSON representation is not at all related to differences in how XML and JSON represent information. After all, the designers of TE 2.0 could have easily chosen to include only the standard references in the JSON representation and leave out the standards’ contents. Choosing to include the standards’ contents in the resources and hence having to accept its consequences in the form of quite extensive data duplication, therefore, was entirely an architectural decision. We discuss this decision in the next chapter on document databases.
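
The difference between storing references and storing ‘fully hydrated’ standards can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the two resources (resource_a, resource_b) are hypothetical, the standards carry just a few of the properties from the example above, and the short identifiers stand in for the full Id URLs.

```python
import json

# Standards stored once, keyed by identifier (properties abbreviated)
standards = {
    "S113010D": {"Jurisdiction": "Michigan", "Subject": "Science",
                 "GradeLowerBound": 7, "GradeUpperBound": 7},
    "S114173E": {"Jurisdiction": "International Technology and Engineering "
                                 "Educators Association",
                 "Subject": "Technology",
                 "GradeLowerBound": 6, "GradeUpperBound": 8},
}

# Reference-only resources: each keeps just the identifiers of its standards
resources = [
    {"Title": "resource_a", "EducationalStandards": ["S113010D", "S114173E"]},
    {"Title": "resource_b", "EducationalStandards": ["S114173E"]},
]

def hydrate(resource, standards):
    """Return a copy of the resource with each reference replaced by the full standard."""
    hydrated = dict(resource)
    hydrated["EducationalStandards"] = [
        {"Id": ref, **standards[ref]} for ref in resource["EducationalStandards"]
    ]
    return hydrated

hydrated_resources = [hydrate(r, standards) for r in resources]

# Standard S114173E is now stored twice, once in each resource
referenced_size = len(json.dumps(resources))
hydrated_size = len(json.dumps(hydrated_resources))
print(referenced_size, hydrated_size)
```

The hydrated version is larger and repeats data, but every resource is self-contained: rendering it requires no lookup in a separate collection of standards.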

A second difference between the XML and JSON representation is that certain members of the JSON representation seem to contain explicit HTML. For instance, the text of the Header section of the activity JSON above contains HTML’s <p> and <img> tags. On first inspection, this may seem strange as in the previous chapter we celebrated the value of text-based web services such as the ones based on XML (and hence, JSON), because they liberated developers from the use of HTML, a language meant for formatting rather than content description. Why then, one may ask, introduce formatting instructions in the content description? Interestingly, when we take a look at the TE 1.0 XML content of that same header, we see something similar:
```
<header>
  <text_section>
    <text_block format="text">
      <text_element>
        <image description="A photograph of two young girls 
            looking at a camera. The edges of the image have
            a black vignette—a loss in clarity towards the
            corners and sides of an image—which portrays what
            is seen when damage to the optic nerve has
            occurred due to the effects of glaucoma."
          url="mis_eyes_lesson01_activity1_image1web.jpg"
          rights="Apple Valley Eye Care. Used with permission.
http://aveyecare.com/photolibrary_rf_photo_of_glaucoma_vision.jpg"
          caption="As seen in the image, irreparable vision
          loss can occur in persons with glaucoma."
          />
      </text_element>
    </text_block>
  </text_section>
</header>
```
Clearly, in both cases we see formatting instructions included in the content descriptions. In the JSON case, the instructions are pure HTML whereas in the XML case they are XML-based elements conveying the same information. So what is going on here? Why this re-mixing of content and formatting in the XML and JSON after all the work in the late 1990s and early 2000s to separate them? The reason is subtle but not uncommon. When looking at TeachEngineering pages, we see that all resources of the same kind (lessons, activities, sprinkles) have the same basic layout. Yet not all resources of the same type are precisely the same. For instance, some resources have more images than others and some center certain sections of text whereas others do not. Since the curriculum authors have some freedom to lay out contents within the structural constraints of the collection, the formatting stored in both the XML and JSON representations is that specified by the resource authors and must be considered intrinsic to the resource’s content.

Up to this point we have seen some of the differences and similarities between XML and JSON. On the face of it, the differences may seem hardly significant enough to warrant a wholesale switch from XML to JSON. Sure, JSON is perhaps a little faster and is perhaps easier to work with in JavaScript. The perspective changes quite dramatically, however, when we consider the integration of JSON and the new generation of NoSQL document databases, especially the JSON-based ones. That is the topic of our next chapter.

References

JSON-schema.org (2016) Example data. http://json-schema.org/example1.html. Accessed: 12/2016 (no longer available)

[1] In this example we ignore details such as the units on dimensional numbers. For instance, are product dimensions in feet, inches, centimeters? And how about product weight? Similarly, any warehouse would likely have some identifier associated with it rather than just longitude and latitude. Still, as an example of JSON Schema, this works fine.
[2] All URLs refer to the XML version of the data sets (*.xml). To retrieve the JSON versions, replace .xml with .json.

License

Tale of Two Systems 2E Copyright © 2022 by René Reitsma and Kevin Krueger is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.