Pull to refresh

One of the ways to dynamically deserialize a part of a JSON document with an unknown structure

.NET *C# *Data Engineering *
Tutorial

In this topic, I will tell you how to dynamically parse and deserialize only part of the whole JSON document. We will create an implementation for .NET Core with C# as a language.

Thanks dbc for this idea!

An issue description

Recently, at work, I was faced with the following task: to develop some tool to generate flexible reports and configure a schedule for sending these reports to the customers. Not a particularly original task. One of the biggest issues I encounter during the development process was dynamically deserializing a part of provided JSON document with the data for the report. At the compile step, we don't know the structure of this JSON and a list of fields that we need to take for the report.

For example, we have the next JSON as a data source for the report. Notice that we will get this JSON in the runtime and at the compile step we don't know the structure of this document:

Original JSON
{
    "ResponseCode" : 200,
    "ResponseMessage" : "Data was received",
    "Data":[
        {
            "Id" : 147,
            "FirstName" : "Bob",
            "MiddleName" : "Lawrence",
            "LastName" : "Welch",
            "Age" : 66,
            "Profession" : "musician",
            "Score" : 10,
            "Address": "Hollywood, Los Angeles, California, USA"
        },
        {
            "Id" : 18,
            "FirstName" : "Steven",
            "MiddleName" : "Paul",
            "LastName" : "Jobs",
            "Age" : 56,
            "Profession" : "businessman",
            "Score" : 9,
            "Address": "San Francisco, California, USA"
        },
        {
            "Id" : 594,
            "FirstName" : "Hanna",
            "MiddleName" : "",
            "LastName" : "Holasava",
            "Age" : 23,
            "Profession" : "developer",
            "Score" : 10,
            "Address": "Gomel, Belarus"
        },
        {
            "Id" : 363,
            "FirstName" : "Stephen",
            "MiddleName" : "William",
            "LastName" : "Hawking",
            "Age" : 76,
            "Profession" : "scientist",
            "Score" : 7,
            "Address": "Oxford, United Kingdom"
        }
    ]
}

And imagine that for the report we need only the next fields:

  • FirstName

  • LastName

  • Age

  • Score

We need to answer two questions: how to parse only part of the unknown JSON document and how to allow users to tell us which data we need to extract for the report?

One of the solutions

I want to suggest one of the solutions. I know that there can be some other ways to solve my problem. I will be happy if you will suggest me something different in the comments.

We will use Newtonsoft.Json.Schema NuGet package. It allows us to build custom schemas of the JSON documents and validate that some JSON corresponds to this schema. The main idea is:

  • Allow users to define JSON schema with only fields needed for the report

  • Mark all subnodes with "additionalProperties" : false property.

  • Validate provided JSON document. After validation, we will get a list of JSON tokens with an error.

  • Remove all tokens that have an error.

To extract JSchema from some JSON you can use the next tool. For our example using this tool we will get the next schema:

JSchema that was generated for our data
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "ResponseCode": {
      "type": "integer"
    },
    "ResponseMessage": {
      "type": "string"
    },
    "Data": {
      "type": "array",
      "items": [
        {
          "type": "object",
          "properties": {
            "Id": {
              "type": "integer"
            },
            "FirstName": {
              "type": "string"
            },
            "MiddleName": {
              "type": "string"
            },
            "LastName": {
              "type": "string"
            },
            "Age": {
              "type": "integer"
            },
            "Profession": {
              "type": "string"
            },
            "Score": {
              "type": "integer"
            },
            "Address": {
              "type": "string"
            }
          },
          "required": [
            "Id",
            "FirstName",
            "MiddleName",
            "LastName",
            "Age",
            "Profession",
            "Score",
            "Address"
          ]
        },
        {
          "type": "object",
          "properties": {
            "Id": {
              "type": "integer"
            },
            "FirstName": {
              "type": "string"
            },
            "MiddleName": {
              "type": "string"
            },
            "LastName": {
              "type": "string"
            },
            "Age": {
              "type": "integer"
            },
            "Profession": {
              "type": "string"
            },
            "Score": {
              "type": "integer"
            },
            "Address": {
              "type": "string"
            }
          },
          "required": [
            "Id",
            "FirstName",
            "MiddleName",
            "LastName",
            "Age",
            "Profession",
            "Score",
            "Address"
          ]
        },
        {
          "type": "object",
          "properties": {
            "Id": {
              "type": "integer"
            },
            "FirstName": {
              "type": "string"
            },
            "MiddleName": {
              "type": "string"
            },
            "LastName": {
              "type": "string"
            },
            "Age": {
              "type": "integer"
            },
            "Profession": {
              "type": "string"
            },
            "Score": {
              "type": "integer"
            },
            "Address": {
              "type": "string"
            }
          },
          "required": [
            "Id",
            "FirstName",
            "MiddleName",
            "LastName",
            "Age",
            "Profession",
            "Score",
            "Address"
          ]
        },
        {
          "type": "object",
          "properties": {
            "Id": {
              "type": "integer"
            },
            "FirstName": {
              "type": "string"
            },
            "MiddleName": {
              "type": "string"
            },
            "LastName": {
              "type": "string"
            },
            "Age": {
              "type": "integer"
            },
            "Profession": {
              "type": "string"
            },
            "Score": {
              "type": "integer"
            },
            "Address": {
              "type": "string"
            }
          },
          "required": [
            "Id",
            "FirstName",
            "MiddleName",
            "LastName",
            "Age",
            "Profession",
            "Score",
            "Address"
          ]
        }
      ]
    }
  },
  "required": [
    "ResponseCode",
    "ResponseMessage",
    "Data"
  ]
}

And we need to modify this schema. We will make the next manipulations:

  • Remove all required properties.

  • Remove unnecessary fields definition (Like ResponseCode, ResponseMessage, Id, MiddleName etc.)

  • For all nodes with type "object" add the "additionalProperties": false property

  • For the array definition change "items": [ { ... } ] to "items": { ... }

As a result, we will have the next correct schema:

Correct JSchema
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "additionalProperties": false,
    "properties": {
        "Data": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": false,
                "properties": {
                    "FirstName": {
                        "type": "string"
                    },
                    "LastName": {
                        "type": "string"
                    },
                    "Age": {
                        "type": "integer"
                    },
                    "Score": {
                        "type": "integer"
                    }
                }
            }
        }
    }
}

Then we need to validate our JSON document with built schema:

var data = JObject.Parse(jsonString); 
var isValid = data.IsValid(schema, out IList<ValidationError> errors);

Definitions for jsonString and schema variables you can find here. The errors variable will contains all validation error:

Validation errors
Validation errors

And each validation error contains a lot of info about an error. For us important is thePath and ErrorTypefields. We can use this field value to select the unnecessary tokens and ErrorType we can use to ensure that this token is superfluous.

Full info about an error
Full info about an error

!! Note that in array definition should be "items": { ... } not "items": [ { ... } ]. Otherwise, the validator will check the only first item in the provided array.

To remove some node from the Json three we will use the next code:

public static JToken RemoveFromLowestPossibleParent(this JToken node)
{
  if (node == null)
  {
    return null;
  }
  // If the parent is a JProperty, remove that instead of the token itself.
  var property = node.Parent as JProperty;
  var contained = property ?? node;
  if (contained.Parent != null)
  {
    contained.Remove();
  }
  // Also detach the node from its immediate containing property -- Remove() does not do this even though it seems like it should
  if (property != null)
  {
    property.Value = null;
  }

  return node;
}

And this is the way how we will process our validation errors:

var isValid = data.IsValid(schema, out IList<ValidationError> errors);

if (!isValid)
{
  foreach (var error in errors)
  {
    if (error.ErrorType == ErrorType.AdditionalProperties)
    {
      data.SelectToken(error.Path)?.RemoveFromLowestPossibleParent();
    }
  }
}

The full code of the program you can find in this spoiler or in my Git repository.

Thank you for your attention! Fill free to left any notes in the comment.

Full text of the program
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

namespace JSchemaTester
{
    class Program
    {
        static void Main(string[] args)
        {
            string jsonString = @"{
                'ResponseCode' : 200,
                'ResponseMessage' : 'Data was received',
                'Data':[
                    {
                        'Id' : 147,
                        'FirstName' : 'Bob',
                        'MiddleName' : 'Lawrence',
                        'LastName' : 'Welch',
                        'Age' : 66,
                        'Profession' : 'musician',
                        'Score' : 10,
                        'Address': 'Hollywood, Los Angeles, California, USA'
                    },
                    {
                        'Id' : 18,
                        'FirstName' : 'Steven',
                        'MiddleName' : 'Paul',
                        'LastName' : 'Jobs',
                        'Age' : 56,
                        'Profession' : 'businessman',
                        'Score' : 9,
                        'Address': 'San Francisco, California, USA'
                    },
                    {
                        'Id' : 594,
                        'FirstName' : 'Hanna',
                        'MiddleName' : '',
                        'LastName' : 'Holasava',
                        'Age' : 23,
                        'Profession' : 'developer',
                        'Score' : 10,
                        'Address': 'Gomel, Belarus'
                    },
                    {
                        'Id' : 363,
                        'FirstName' : 'Stephen',
                        'MiddleName' : 'William',
                        'LastName' : 'Hawking',
                        'Age' : 76,
                        'Profession' : 'scientist',
                        'Score' : 7,
                        'Address': 'Oxford, United Kingdom'
                    }
                ]
            }";

            JSchema schema = JSchema.Parse(@"{
              '$schema': 'http://json-schema.org/draft-04/schema#',
              'type': 'object',
		      'additionalProperties' : false,
              'properties': {
                'Data': {
                  'type': 'array',
                  'items': 
                    {
                      'type': 'object',
		              'additionalProperties' : false,
                      'properties': {
                        'FirstName': {
                          'type': 'string'
                        },
                        'LastName': {
                          'type': 'string'
                        },
                        'Age': {
                          'type': 'integer'
                        },
                        'Score': {
                          'type': 'integer'
                        }
                      }
                    }
                }
              }
            }");

            var data = JObject.Parse(jsonString);

            var isValid = data.IsValid(schema, out IList<ValidationError> errors);

            if (!isValid)
            {
                foreach (var error in errors)
                {
                    if (error.ErrorType == ErrorType.AdditionalProperties)
                    {
                        data.SelectToken(error.Path)?.RemoveFromLowestPossibleParent();
                    }
                }
            }

            JToken dataNode = data.SearchNodeByName("Data");

            var dataTable = JsonConvert.DeserializeObject<DataTable>(dataNode.First.ToString());

            dataTable.PrintToConsole();

            Console.ReadKey();
        }
    }

    public static partial class JsonExtensions
    {
        public static JToken RemoveFromLowestPossibleParent(this JToken node)
        {
            if (node == null)
            {
                return null;
            }

            // If the parent is a JProperty, remove that instead of the token itself.
            var property = node.Parent as JProperty;
            var contained = property ?? node;
            if (contained.Parent != null)
            {
                contained.Remove();
            }

            // Also detach the node from its immediate containing property -- Remove() does not do this even though it seems like it should
            if (property != null)
            {
                property.Value = null;
            }
            return node;
        }

        public static JToken SearchNodeByName(this JToken parentNode, string nodeName)
        {
            if (parentNode == null || parentNode.Children().Count() == 0)
            {
                return null;
            }

            if (parentNode is JProperty && string.Equals(((JProperty)parentNode).Name, nodeName))
            {
                return parentNode;
            }

            foreach (JToken children in parentNode.Children())
            {
                var node = SearchNodeByName(children, nodeName);
                if (node != null)
                {
                    return node;
                }
            }

            return null;
        }
    }

    public static class DataTableExtensions
    {
        public static void PrintToConsole(this DataTable data)
        {
            string[] columnNames = data.Columns
                .Cast<DataColumn>()
                .Select(column => column.ColumnName)
                .ToArray();

            Console.WriteLine(string.Join("|", columnNames.Select(c => c.PadRight(15, ' '))));
            Console.WriteLine(string.Empty.PadRight(15 * data.Columns.Count, '-'));

            foreach (DataRow row in data.Rows)
            {
                string[] fields = row.ItemArray
                    .Select(field => field.ToString())
                    .ToArray();

                Console.WriteLine(string.Join("|", fields.Select(c => c.PadRight(15, ' '))));
            }
        }
    }
}
Tags:
Hubs:
Total votes 2: ↑2 and ↓0 +2
Views 2.1K
Comments Comments 2