Learn Computing from the Experts | The Rheinwerk Computing Blog

What Are the CSV, XML, and JSON Data Formats?

Written by Rheinwerk Computing | Sep 2, 2024 1:00:00 PM

In this blog post, let’s discuss several data formats that are used primarily for exchanging data, for example, when data is sent from a server to a client (or vice versa) or between different web services.

For example, let’s say you want to implement a web application for managing contacts. Since users should be able to create new contacts as well as query and update existing ones, contact data (such as first name and last name) must somehow get from the client to the server when a contact is created and, vice versa, from the server to the client when contacts are queried.

In such use cases, data formats (also called exchange or interchange formats) help you structure the data.

CSV

One of the simplest data formats is Comma-Separated Values (CSV). This format is especially suitable for exchanging simply structured tabular data. By default, individual records (rows) are separated by line breaks, and the eponymous comma separates the individual data fields (columns) within a record. (Other characters can be specified as separators for records and data fields.) Optionally, the first row can contain the column names to label the data fields.

Our example contact data in CSV format defines a total of three records (contacts). Each contact consists of a first name, last name, telephone number, and email address, separated by commas; the first row contains the column names.

firstname,lastname,phone,email
John,Doe,01234567,john.doe@example.com
Paula,Doe,01234567,paula.doe@example.com
Peter,Doe,3456789,peter.doe@example.com
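
The format is simple enough that a basic reader fits in a few lines of JavaScript. The following is a minimal sketch, not a complete CSV implementation: it assumes the simple case shown above (one record per line, a header row, and no quoted fields), and parseCsv is a hypothetical helper name. Real-world CSV, as described in RFC 4180, also allows quoted fields that contain commas or line breaks, so for production use a dedicated library is the safer choice.

```javascript
// Minimal CSV reader (sketch): assumes one record per line, a header row,
// and NO quoted fields (commas or line breaks inside values are not supported).
function parseCsv(text, separator = ",") {
  // Split into records, ignoring empty lines (e.g., a trailing newline).
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const [header, ...rows] = lines.map((line) => line.split(separator));
  // Turn each row into an object keyed by the column names from the header row.
  return rows.map((fields) =>
    Object.fromEntries(header.map((column, i) => [column, fields[i]]))
  );
}

const csv = `firstname,lastname,phone,email
John,Doe,01234567,john.doe@example.com
Paula,Doe,01234567,paula.doe@example.com
Peter,Doe,3456789,peter.doe@example.com`;

const contacts = parseCsv(csv);
console.log(contacts.length);       // 3
console.log(contacts[2].firstname); // "Peter"
console.log(contacts[0].phone);     // "01234567" (a string, not a number)
```

Note that every value comes out as a string, including the phone number with its leading zero: CSV itself carries no type information.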

As you can easily imagine, CSV is not suitable for data with a more complex structure, for example, nested data. For such data, you should use one of the data formats covered next.

XML

One of the most important exchange formats on the web is the Extensible Markup Language (XML). XML is a markup language that can structure data hierarchically, and its syntax is quite similar to HTML (which, after all, is also a markup language). As in HTML, XML documents consist of elements and attributes, here called XML elements and XML attributes.

Unlike with HTML, however, with XML you’re free to decide which elements you use within an XML document and which attributes you use within an element. Thus, XML can be extended to meet your requirements.

The listing below shows a typical XML document. The first line contains information about the XML version and the encoding used; the rest of the document represents the actual content. The <contacts> element in this example is the root element. (As in HTML, a document can have only one root element.) Below <contacts> are several <contact> child elements, each of which in turn has child elements for the various data fields of a contact. In contrast to CSV, XML also allows complex hierarchies or structures through the nesting of elements. Thus, elements can be logically grouped together, such as the <address> element in our example, which groups together address data (street, street number, ZIP code, and city).

<?xml version="1.0" encoding="UTF-8"?>
<contacts>
   <contact>
      <firstname>John</firstname>
      <lastname>Doe</lastname>
      <phone type="cell">01234567</phone>
      <email>john.doe@example.com</email>
      <address>
         <street>Sample Street</street>
         <number>99</number>
         <code>12345</code>
         <city>Sample City</city>
      </address>
   </contact>
   <contact>
      <firstname>Paula</firstname>
      <lastname>Doe</lastname>
      <phone type="cell">01234567</phone>
      <email>paula.doe@example.com</email>
      <address>
         <street>Sample Street</street>
         <number>99</number>
         <code>12345</code>
         <city>Sample City</city>
      </address>
   </contact>
   <contact>
      <firstname>Peter</firstname>
      <lastname>Doe</lastname>
      <phone type="landline">3456789</phone>
      <email>peter.doe@example.com</email>
      <address>
         <street>Sample Street</street>
         <number>200</number>
         <code>12345</code>
         <city>Sample City</city>
      </address>
   </contact>
</contacts>

XML Parsers

If you want to process XML, you’ll need an XML parser, which is a component that converts XML code into a model suitable for further processing in the relevant programming language. XML parsers are available for many programming languages, so fortunately, you don’t have to reinvent the wheel and can simply rely on existing libraries.

Several types of XML parsers exist, two of which are particularly important; they are described in a bit more detail next.

XML-DOM Parsers

An XML-DOM parser converts an XML document into a tree-like data structure called the Document Object Model (DOM), or DOM tree for short. Within a program, you then access this tree via the DOM application programming interface (API).

Parsing with a DOM parser is suitable only for small to medium-sized XML documents because the complete DOM tree must be kept in memory. For large XML documents, you should instead use the second well-known type of XML parser, which I cover next.
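
To make the idea of a DOM tree concrete, here’s a minimal sketch in JavaScript. In a browser, the built-in DOMParser (new DOMParser().parseFromString(xml, "application/xml")) does this job for real; Node.js, however, has no built-in XML parser, so there you’d typically use a library. The sketch below has no dependencies, but it only handles simple, well-formed XML (no comments, CDATA sections, DOCTYPE, or entity references), and the function parseToTree as well as the node fields name, attributes, children, and text are hypothetical; they are not the standard DOM API.

```javascript
// Minimal DOM-style XML parser (sketch): builds the complete tree in memory.
// Assumption: simple, well-formed XML without comments, CDATA, DOCTYPE, or entities.
function parseToTree(xml) {
  // Alternatives: XML declaration | end tag | start or empty tag | text run
  const token =
    /<\?[\s\S]*?\?>|<\/([\w:.-]+)\s*>|<([\w:.-]+)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*(\/?)>|([^<]+)/g;
  const documentNode = { name: "#document", attributes: {}, children: [], text: "" };
  const stack = [documentNode]; // open elements; the last one is the current parent
  let match;
  while ((match = token.exec(xml)) !== null) {
    const [, endName, startName, attrText, selfClosing, text] = match;
    const parent = stack[stack.length - 1];
    if (startName) {
      const node = { name: startName, attributes: {}, children: [], text: "" };
      for (const [, key, value] of attrText.matchAll(/([\w:.-]+)\s*=\s*"([^"]*)"/g)) {
        node.attributes[key] = value;
      }
      parent.children.push(node);
      if (!selfClosing) stack.push(node); // <tag/> has no children, so don't open it
    } else if (endName) {
      if (parent.name !== endName) throw new Error(`Unexpected </${endName}>`);
      stack.pop();
    } else if (text !== undefined) {
      parent.text += text.trim(); // whitespace between tags adds nothing
    }
  }
  if (stack.length !== 1) throw new Error("Unclosed element");
  return documentNode.children[0]; // the single root element
}

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<contacts>
   <contact>
      <firstname>John</firstname>
      <phone type="cell">01234567</phone>
   </contact>
   <contact>
      <firstname>Peter</firstname>
      <phone type="landline">3456789</phone>
   </contact>
</contacts>`;

const contacts = parseToTree(xml);
// Helper: first child element with the given name
const child = (node, name) => node.children.find((c) => c.name === name);

for (const contact of contacts.children) {
  const phone = child(contact, "phone");
  console.log(child(contact, "firstname").text, phone.attributes.type, phone.text);
}
// John cell 01234567
// Peter landline 3456789
```

Because the complete tree is in memory, the program can navigate it freely in any direction; that convenience is paid for with the memory cost described above.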

XML-SAX Parsers

An XML-SAX parser, which is based on the Simple API for XML (SAX), goes through an XML document step by step without building an object model and keeping it in memory. Instead, it uses events to report the elements, attributes, text, and so on that it encounters while traversing the document. Within a program, you register handlers for these events and respond to them.
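
The event-driven approach can be sketched in JavaScript as well. The function saxParse and the handler names startElement, endElement, and characters are hypothetical (they loosely follow the callback names of the classic Java SAX API), and the sketch only supports simple, well-formed XML (no comments, CDATA sections, DOCTYPE, or entity references). Also note that a real SAX parser reads its input incrementally, for example, from a stream; this sketch takes a complete string for brevity, so it illustrates the event-based programming model but not the memory savings.

```javascript
// Minimal SAX-style XML reader (sketch): reports events instead of building a tree.
// Assumption: simple, well-formed XML without comments, CDATA, DOCTYPE, or entities.
function saxParse(xml, handlers) {
  // Alternatives: XML declaration | end tag | start or empty tag | text run
  const token =
    /<\?[\s\S]*?\?>|<\/([\w:.-]+)\s*>|<([\w:.-]+)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*(\/?)>|([^<]+)/g;
  let match;
  while ((match = token.exec(xml)) !== null) {
    const [, endName, startName, attrText, selfClosing, text] = match;
    if (startName) {
      const attributes = {};
      for (const [, key, value] of attrText.matchAll(/([\w:.-]+)\s*=\s*"([^"]*)"/g)) {
        attributes[key] = value;
      }
      handlers.startElement?.(startName, attributes);
      if (selfClosing) handlers.endElement?.(startName);
    } else if (endName) {
      handlers.endElement?.(endName);
    } else if (text !== undefined && text.trim() !== "") {
      handlers.characters?.(text.trim());
    }
    // Nothing is kept by the parser after an event has been reported.
  }
}

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<contacts>
   <contact><firstname>John</firstname><phone type="cell">01234567</phone></contact>
   <contact><firstname>Paula</firstname><phone type="cell">01234567</phone></contact>
   <contact><firstname>Peter</firstname><phone type="landline">3456789</phone></contact>
</contacts>`;

// Register handlers: collect first names and count cell phone numbers.
const firstNames = [];
let cellPhones = 0;
let currentElement = null;
saxParse(xml, {
  startElement(name, attributes) {
    currentElement = name;
    if (name === "phone" && attributes.type === "cell") cellPhones++;
  },
  endElement() {
    currentElement = null;
  },
  characters(text) {
    if (currentElement === "firstname") firstNames.push(text);
  },
});
console.log(firstNames.join(", ")); // John, Paula, Peter
console.log(cellPhones);            // 2
```

The program itself decides what to remember (here, only the first names and a counter); everything else is discarded as soon as the corresponding event has been handled.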

Note: XML parsers of both types are available for a wide variety of platforms and programming languages, usually as packages that are easy to install.

XML Schemas

You can define how an XML document should be structured in what’s called an XML schema or, alternatively, in a document type definition (DTD). With XML schemas and DTDs, you can define rules such as which elements may be used in an XML document, which child elements an element may have, which attributes may (or must) be used for an element, and much more.

Note: In the following, we’ll discuss only XML schemas because they are more modern than DTDs and are used more frequently.

Based on an XML schema, you can use XML validators to check whether a given XML document conforms to the structure specified in the schema. For example, if you implement a web service that receives data in XML format, you can verify that the received data is well-formed XML and adheres to the specified schema.

The next listing shows the XML schema for the XML code from before. Note that an XML schema is itself an XML document. For example, you can use the <xs:element> element and its name attribute to define which elements your XML may contain. You can use the type attribute to specify the type of an element’s content, for instance, whether the element contains a string (in our example, the <firstname>, <lastname>, and <code> elements, among others) or an integer (in our example, the <number> element for the street number).

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="contacts">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="contact" maxOccurs="unbounded" minOccurs="0">
               <xs:complexType>
                  <xs:sequence>
                     <xs:element type="xs:string" name="firstname"/>
                     <xs:element type="xs:string" name="lastname"/>
                     <!-- Phone numbers are strings so that leading zeros are kept -->
                     <xs:element name="phone">
                        <xs:complexType>
                           <xs:simpleContent>
                              <xs:extension base="xs:string">
                                 <xs:attribute type="xs:string" name="type" use="optional"/>
                              </xs:extension>
                           </xs:simpleContent>
                        </xs:complexType>
                     </xs:element>
                     <xs:element type="xs:string" name="email"/>
                     <xs:element name="address">
                        <xs:complexType>
                           <xs:sequence>
                              <xs:element type="xs:string" name="street"/>
                              <xs:element type="xs:int" name="number"/>
                              <xs:element type="xs:string" name="code"/>
                              <xs:element type="xs:string" name="city"/>
                           </xs:sequence>
                        </xs:complexType>
                     </xs:element>
                  </xs:sequence>
               </xs:complexType>
            </xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>
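
Part of what an XML validator does with the type attribute can be sketched in a few lines. The following JavaScript sketch assumes that the text content of an <address> element has already been extracted into a plain object mapping element names to text; it then checks each value against the type declared for it in the schema above. Only xs:string and xs:int are covered, ordering and occurrence rules are ignored, and the names xsdTypeChecks and checkLeafTypes are hypothetical. For real validation, use an established XML Schema validator.

```javascript
// Sketch: checking text values against two XML Schema built-in types.
// Assumption: only xs:string and xs:int are supported (a real validator
// implements the complete XML Schema specification).
const xsdTypeChecks = {
  "xs:string": () => true, // any character data is a valid string
  "xs:int": (text) => {
    // Lexical form: optional sign followed by digits; value must fit in 32 bits.
    if (!/^[+-]?\d+$/.test(text.trim())) return false;
    const value = Number(text);
    return value >= -2147483648 && value <= 2147483647;
  },
};

// Types declared for the children of <address> in the schema above.
const addressTypes = { street: "xs:string", number: "xs:int", code: "xs:string", city: "xs:string" };

// Returns a list of error messages (an empty list means the values are valid).
function checkLeafTypes(values, types) {
  const errors = [];
  for (const [name, type] of Object.entries(types)) {
    if (!(name in values)) {
      errors.push(`<${name}> is missing`);
    } else if (!xsdTypeChecks[type](values[name])) {
      errors.push(`<${name}> is not a valid ${type}: "${values[name]}"`);
    }
  }
  return errors;
}

const valid = checkLeafTypes(
  { street: "Sample Street", number: "99", code: "12345", city: "Sample City" },
  addressTypes
);
const invalid = checkLeafTypes(
  { street: "Sample Street", number: "99a", code: "12345", city: "Sample City" },
  addressTypes
);
console.log(valid.length); // 0
console.log(invalid[0]);   // <number> is not a valid xs:int: "99a"
```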

XML Namespaces: The xs prefix used above in the element names is a namespace prefix. It indicates which namespace an element belongs to; in the schema above, the xmlns:xs attribute binds the prefix xs to the namespace http://www.w3.org/2001/XMLSchema. Namespaces allow you to identify elements uniquely and, for example, to use elements with the same name (but from different namespaces) within a single XML document.

JSON

The JavaScript Object Notation (JSON) format is characterized above all by its simple structure and its easy integration into JavaScript applications. Like XML, JSON is suitable for defining structured data and is commonly used as a data exchange format. In contrast to XML, however, JSON is much leaner and can be processed much more easily within JavaScript code.

An essential feature of the JSON format is the curly brackets that delimit individual objects. Object properties (also called keys) are written in double quotes and separated from their values by a colon; multiple properties are separated by commas.

{
   "message": "Hello World"
}

As values, you can use strings, numbers, Boolean values (true or false), null, arrays, or other objects, and the syntax is very similar to that of JavaScript object literals. The next listing shows the structure of a JSON document that contains the same data as the XML document from the previous section. Compared to XML, JSON is much leaner, mainly because it has no opening and closing tags.

{
   "contacts": [
      {
         "firstname": "John",
         "lastname": "Doe",
         "phone": {
            "type": "cell",
            "number": "01234567"
         },
         "email": "john.doe@example.com",
         "address": {
            "street": "Sample Street",
            "number": 99,
            "code": "12345",
            "city": "Sample City"
         }
      },
      {
         "firstname": "Paula",
         "lastname": "Doe",
         "phone": {
            "type": "cell",
            "number": "01234567"
         },
         "email": "paula.doe@example.com",
         "address": {
            "street": "Sample Street",
            "number": 99,
            "code": "12345",
            "city": "Sample City"
         }
      },
      {
         "firstname": "Peter",
         "lastname": "Doe",
         "phone": {
            "type": "landline",
            "number": "3456789"
         },
         "email": "peter.doe@example.com",
         "address": {
            "street": "Sample Street",
            "number": 200,
            "code": "12345",
            "city": "Sample City"
         }
      }
   ]
}
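
To see how the JSON value types arrive in a program, here’s a short JavaScript sketch that uses the built-in JSON.parse() method (covered in the next section) on a small document containing one value of each type; the property names are arbitrary examples. It also shows one plausible reason why the listing stores phone numbers as strings: a JSON number must not start with a leading zero, so 01234567 is not valid as a number at all, whereas a string preserves the leading zero.

```javascript
// One value of each JSON type (property names are arbitrary examples).
const parsed = JSON.parse(`{
   "text": "Sample Street",
   "integer": 99,
   "flag": true,
   "nothing": null,
   "list": ["cell", "landline"],
   "object": { "code": "12345" }
}`);

// typeof reports "object" for null and arrays, so those are checked first.
const jsonType = (value) =>
  value === null ? "null" : Array.isArray(value) ? "array" : typeof value;

for (const [key, value] of Object.entries(parsed)) {
  console.log(`${key}: ${jsonType(value)}`);
}
// text: string
// integer: number
// flag: boolean
// nothing: null
// list: array
// object: object

// A JSON number must not have a leading zero, so this input is a SyntaxError...
let rejected = false;
try {
  JSON.parse('{ "number": 01234567 }');
} catch (error) {
  rejected = error instanceof SyntaxError;
}
console.log(rejected); // true
// ...whereas a string keeps the leading zero.
console.log(JSON.parse('{ "number": "01234567" }').number); // "01234567"
```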

JSON Parsers

To process JSON documents, you’ll need a suitable JSON parser. As with XML, JSON libraries are available for many programming languages.

In JavaScript, JSON parsing is even built into the language, so you won’t need any external libraries. Instead, you can convert a JSON string directly into a JavaScript object using the JSON.parse() method.

// A JSON string, for example, as received from a server
const jsonString = `{
   "firstname": "John",
   "lastname": "Doe",
   "phone": {
      "type": "cell",
      "number": "01234567"
   },
   "email": "john.doe@example.com",
   "address": {
      "street": "Sample Street",
      "number": 99,
      "code": "12345",
      "city": "Sample City"
   }
}`;

// Convert the JSON string into a JavaScript object
const person = JSON.parse(jsonString);
console.log(person.firstname);       // "John"
console.log(person.lastname);        // "Doe"
console.log(person.phone.type);      // "cell"
console.log(person.phone.number);    // "01234567"
console.log(person.email);           // "john.doe@example.com"
console.log(person.address.street);  // "Sample Street"
console.log(person.address.number);  // 99
console.log(person.address.code);    // "12345"
console.log(person.address.city);    // "Sample City"
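
When JSON comes from outside your program, for example, from a server response, it may be malformed, and in that case JSON.parse() throws a SyntaxError. A minimal sketch of handling this follows; the helper parseJsonSafely and its { ok, value, error } result shape are hypothetical conventions for this example, not part of any standard API.

```javascript
// Wraps JSON.parse() so that malformed input yields a result instead of an exception.
function parseJsonSafely(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (error) {
    if (error instanceof SyntaxError) {
      return { ok: false, error: error.message };
    }
    throw error; // anything other than a parse error is unexpected, so rethrow it
  }
}

const good = parseJsonSafely('{ "firstname": "John", "address": { "number": 99 } }');
console.log(good.ok, good.value.address.number); // true 99

// Single quotes and trailing commas are allowed in JavaScript, but not in JSON.
const bad = parseJsonSafely("{ 'firstname': 'John', }");
console.log(bad.ok); // false
```

Returning a result object keeps the calling code free of try/catch blocks; whether that is preferable to letting the exception propagate depends on your application.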

Alternatively, you can even embed JSON directly in JavaScript code and assign it to a variable, as shown in the next listing. This works because JSON syntax is essentially a subset of JavaScript’s object literal syntax: JavaScript interprets the code as an object literal and creates the corresponding object, so no parsing step is needed.

// JSON syntax used directly as a JavaScript object literal
const person = {
   "firstname": "John",
   "lastname": "Doe",
   "phone": {
      "type": "cell",
      "number": "01234567"
   },
   "email": "john.doe@example.com",
   "address": {
      "street": "Sample Street",
      "number": 99,
      "code": "12345",
      "city": "Sample City"
   }
};

console.log(person.firstname);      // "John"
console.log(person.lastname);       // "Doe"
console.log(person.phone.type);     // "cell"
console.log(person.phone.number);   // "01234567"
console.log(person.email);          // "john.doe@example.com"
console.log(person.address.street); // "Sample Street"
console.log(person.address.number); // 99
console.log(person.address.code);   // "12345"
console.log(person.address.city);   // "Sample City"
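
For the opposite direction, for example, when the client sends a new contact to the server, JavaScript provides the built-in JSON.stringify() method, which converts an object into a JSON string. The following sketch shows a round trip and also illustrates that not every JavaScript value survives serialization: properties whose values are undefined or functions are omitted. The contact object and its notes and greet properties are made up for this example.

```javascript
const contact = {
  firstname: "John",
  lastname: "Doe",
  address: { street: "Sample Street", number: 99 },
  notes: undefined,         // properties with undefined values are omitted
  greet() { return "Hi"; }, // functions are omitted as well
};

// Compact form, e.g., for sending over the network
const compact = JSON.stringify(contact);
console.log(compact);
// {"firstname":"John","lastname":"Doe","address":{"street":"Sample Street","number":99}}

// The third argument controls indentation (here: 3 spaces, as in the listings above).
const pretty = JSON.stringify(contact, null, 3);
console.log(pretty);

// Round trip: parsing the string yields an equal, but new, object.
const copy = JSON.parse(compact);
console.log(copy.address.number); // 99
console.log(copy === contact);    // false
```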

JSON Schemas

Similar to XML schemas, you can also define schemas for JSON. A JSON schema is itself JSON code that specifies exactly what the JSON data it describes may look like, for instance, which objects must be included, which properties these objects must have, which types their values must have, and so on. The following listing shows the JSON schema for the contacts JSON from earlier.

{
   "$schema": "http://json-schema.org/draft-04/schema#",
   "type": "object",
   "properties": {
      "contacts": {
         "type": "array",
         "items": {
            "type": "object",
            "properties": {
               "firstname": {
                  "type": "string"
               },
               "lastname": {
                  "type": "string"
               },
               "phone": {
                  "type": "object",
                  "properties": {
                     "type": {
                        "type": "string"
                     },
                     "number": {
                        "type": "string"
                     }
                  },
                  "required": ["type", "number"]
               },
               "email": {
                  "type": "string"
               },
               "address": {
                  "type": "object",
                  "properties": {
                     "street": {
                        "type": "string"
                     },
                     "number": {
                        "type": "integer"
                     },
                     "code": {
                        "type": "string"
                     },
                     "city": {
                        "type": "string"
                     }
                  },
                  "required": ["street", "number", "code", "city"]
               }
            },
            "required": ["firstname", "lastname", "phone", "email", "address"]
         }
      }
   },
   "required": ["contacts"]
}

As with XML and XML schemas, JSON validators are available that check whether a given JSON document conforms to the rules defined in a JSON schema.
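
How such a validator works can be sketched in a few lines of JavaScript. This is a deliberately minimal sketch: it supports only the keywords type, properties, required, and items (enough for the contacts schema shown above) and ignores everything else, including $schema. The names validate and schemaType are hypothetical; for real projects, use an established JSON Schema validator library.

```javascript
// Minimal JSON Schema validator (sketch): supports only "type", "properties",
// "required", and "items"; all other keywords are ignored.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function schemaType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (Number.isInteger(value)) return "integer";
  return typeof value; // "string", "number", "boolean", or "object"
}

// Returns a list of error messages; an empty list means the value is valid.
function validate(value, schema, path = "$") {
  const actual = schemaType(value);
  if (schema.type) {
    // Every integer is also a valid "number".
    const ok = actual === schema.type || (schema.type === "number" && actual === "integer");
    if (!ok) return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  const errors = [];
  if (actual === "object") {
    for (const key of schema.required ?? []) {
      if (!has(value, key)) errors.push(`${path}: missing required property "${key}"`);
    }
    for (const [key, subschema] of Object.entries(schema.properties ?? {})) {
      if (has(value, key)) errors.push(...validate(value[key], subschema, `${path}.${key}`));
    }
  }
  if (actual === "array" && schema.items) {
    value.forEach((item, i) => errors.push(...validate(item, schema.items, `${path}[${i}]`)));
  }
  return errors;
}

// An abbreviated version of the contacts schema shown above.
const schema = {
  type: "object",
  properties: {
    contacts: {
      type: "array",
      items: {
        type: "object",
        properties: {
          firstname: { type: "string" },
          address: {
            type: "object",
            properties: {
              street: { type: "string" },
              number: { type: "integer" },
              code: { type: "string" },
              city: { type: "string" },
            },
            required: ["street", "number", "code", "city"],
          },
        },
        required: ["firstname", "address"],
      },
    },
  },
  required: ["contacts"],
};

const data = JSON.parse(`{ "contacts": [
   { "firstname": "John",
     "address": { "street": "Sample Street", "number": 99, "code": "12345", "city": "Sample City" } },
   { "firstname": "Peter",
     "address": { "street": "Sample Street", "number": "200", "code": "12345" } }
] }`);

const errors = validate(data, schema);
errors.forEach((message) => console.log(message));
// $.contacts[1].address: missing required property "city"
// $.contacts[1].address.number: expected integer, got string
```

Collecting all errors instead of stopping at the first one is a common design choice for validators because it lets a web service report every problem in a request at once.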

Editor’s note: This post has been adapted from a section of the book Full Stack Web Development: The Comprehensive Guide by Philip Ackermann.