Processing JSON with jq
Commonly, we process JSON data by writing a program to load, deserialize and manipulate this data. Depending on the programming language, this program may require an additional compilation step before it can be executed within a terminal. For simple operations, such as filtering and mapping, we don't need to write a separate program. Rather, we can manipulate our JSON data directly within a terminal via the jq command-line utility, which edits streamed JSON data without an interactive text editor interface ("sed for JSON"). If you're looking for a tool to retrieve JSON data from an API endpoint, process this data and save the result to a CSV, TSV or JSON file, then jq accomplishes this task in a single-line command.

Below, I'm going to show you how to process JSON data with jq.

Install the jq command-line utility by visiting the homepage of the jq website, downloading a prebuilt binary (compatible with your operating system) and executing this binary once the download is complete. Alternatively...

To verify that the installation was successful, restart the terminal and enter the command jq with no arguments. This should print an overview of the jq command. For extensive documentation, enter the command man jq, which opens the manual pages for the jq command.

To get started, let's pretty-print a JSON dataset (with formatting and syntax highlighting). The jq command must be passed a filter as its first argument. A filter is a program that tells jq what output should be returned given the input JSON data. The most basic filter is the pre-defined identity filter, written as a single dot (.), which tells jq to do nothing to the input JSON data and return it as is. To run jq on a JSON dataset, pipe the stringified JSON to jq (e.g., the file content of a .json file via the cat command or the JSON response from an API endpoint via the cURL command). If we pipe the JSON response of a cURL command to jq with the identity filter, then jq pretty-prints this response in the terminal.

Suppose we only wanted a single element from the JSON data. To access a single element from a JSON array, pass the array index filter to jq, which follows the syntax .[x] with x representing an index value (a positive or negative integer). To access the first element, use .[0]. To access the last element, use .[-1]. To access the penultimate (second to last) element, use .[-2]. To access the element at index 3, use .[3].

If the index value is outside of the JSON array's bounds, then the array index filter returns null rather than an element. Here, the dataset only contains 41 rows. Therefore, any index beyond 40 causes the filter to return null.

If the index value is omitted (.[]), then all of the elements are returned, one after another. Additionally, the .[] filter can be used on JSON objects to return all top-level values within the object. If you are unsure whether the input data is an array or an object, append a ? to the empty square brackets (.[]?) to suppress errors. For example, if the input data is a stringified integer value... Without the ?, the error jq: error (at <stdin>:1): Cannot iterate over number (1) will be thrown. With the ?, this error is suppressed as if no error occurred.

Suppose we only wanted a subset of the JSON data. To extract a sub-array from a JSON array, pass the array/string slice filter to jq, which follows the syntax .[x:y] with x and y representing the starting (inclusive) and ending (exclusive) index values respectively (positive or negative integers). Either index may be omitted, in which case the slice runs from the start or to the end of the array. It behaves similarly to JavaScript's .slice() method.
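The identity, index and iteration filters described above can be tried on a small inline array. This is a minimal sketch: the three records and their field names (year, population) are made up for illustration and stand in for the real dataset, and the -c flag is only there to keep each result on one line.

```shell
# Hypothetical sample data: three made-up records, not the real dataset.
data='[{"year":2001,"population":10},{"year":2002,"population":20},{"year":2003,"population":30}]'

# Identity filter: pretty-print the input unchanged.
echo "$data" | jq '.'

# Array index filter (-c prints compact, single-line output).
echo "$data" | jq -c '.[0]'    # first element
echo "$data" | jq -c '.[-1]'   # last element
echo "$data" | jq -c '.[-2]'   # penultimate element
echo "$data" | jq -c '.[5]'    # out of bounds: prints null

# Omitting the index emits every element, one per line.
echo "$data" | jq -c '.[]'

# On an object, .[] emits the top-level values.
echo '{"a":1,"b":2}' | jq '.[]'

# A number cannot be iterated: .[] would fail here, .[]? prints nothing.
echo '1' | jq '.[]?'
```

Swapping the echo for cat on a .json file or for a cURL request feeds the same filters with real data.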
To extract the first element only, use .[:1]. To extract the last element only, use .[-1:]. To extract all elements but the first element (omit the first element), use .[1:]. To extract all elements but the last element (omit the last element), use .[:-1]. To extract the elements at indices 3 - 5, use .[3:6]. Unlike the array index filter, a slice always returns an array, even when it contains a single element.

To retrieve the length of a JSON array, apply the built-in length function, optionally piping the output of an identity filter into it (. | length is equivalent to plain length). This returns the total number of elements within the JSON array. For our example dataset, the total number of records returned by the NYC Open Data API is 41. For a JSON object, the length function returns the total number of top-level keys within this object.

To retrieve the length of each item of a JSON array, pipe the output of the .[] filter to the length function (.[] | length). This prints each element's length in turn. For our example dataset, each record contains four pieces of information: the year, the population of NYC for that year, the total number of gallons (in millions) of water consumed by NYC residents per day and the average number of gallons of water consumed by an NYC resident per day. If an element is a string, then length returns the string's length. If an element is a null value, then length returns zero.

To retrieve the top-level keys from JSON, use the built-in keys function. These keys are returned as an array of strings. Like length, the keys function can be used on its own, without piping a filter into it. By default, these keys are sorted alphabetically. Alternatively, the keys_unsorted function does not sort keys and returns them in their original order. For JSON arrays, the keys function returns a list of indices.

Experiment with these techniques on other JSON data sources/files.
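As a starting point for that experimentation, the slice, length and keys techniques can be sketched on a small inline array. The records and field names below are made up for illustration and are not taken from the real dataset. The expected results in the comments assume a reasonably recent jq release (keys_unsorted is not available in very old versions).

```shell
# Hypothetical sample data: three made-up records, not the real dataset.
data='[{"year":2001,"population":10},{"year":2002,"population":20},{"year":2003,"population":30}]'

# Slices: start index inclusive, end index exclusive; the result is always an array.
echo "$data" | jq -c '.[:1]'    # first element only
echo "$data" | jq -c '.[-1:]'   # last element only
echo "$data" | jq -c '.[1:]'    # everything but the first element
echo "$data" | jq -c '.[:-1]'   # everything but the last element
echo "$data" | jq -c '.[1:3]'   # elements at indices 1 and 2

# Length of the whole array, then of each element (each record has 2 keys).
echo "$data" | jq 'length'         # prints 3
echo "$data" | jq '.[] | length'   # prints 2 three times, one per line

# Keys of an object (sorted, then in original order) and of an array (indices).
echo '{"b":1,"a":2}' | jq -c 'keys'            # prints ["a","b"]
echo '{"b":1,"a":2}' | jq -c 'keys_unsorted'   # prints ["b","a"]
echo "$data" | jq -c 'keys'                    # prints [0,1,2]
```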