jsonlines
A simple JSON Lines (and NDJSON) parser library in Nim.
See Also: https://neroist.github.io/jsonlines/jsonlines.html
JSON Lines is a file format similar to JSON, except it can hold multiple JSON documents in a single file, which are delimited by newlines. It's a convenient format for storing structured data that may be processed one record at a time and works well with unix-style text processing tools and shell pipelines. It's also a great format for log files, and a flexible format for passing messages between cooperating processes.
This format has three main requirements:
-
UTF-8 Encoding. (although this library doesn't check for that)
-
Each line is a valid JSON value. The most common values will be objects or arrays, although any JSON value is permitted. However, with this library, you can also choose whether or not to ignore empty lines.
-
Line Separator is
'\n'
.'\r\n'
is also supported because surrounding white space is implicitly ignored when parsing JSON values.
Examples:
{"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]}
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}
{"name": "May", "wins": []}
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}
{"some": "thing"}
{"foo": 17, "bar": false, "quux": true}
{"may": {"include": "nested", "objects": ["and", "arrays"]}}
Parsing JSON Lines
To parse JSON Lines data, all you simply have to do is use the parseJsonLines proc.
let jsonl = parseJsonLines("""{"some": "thing"}
{"foo": 17, "bar": false, "quux": true}
{"may": {"include": "nested", "objects": ["and", "arrays"]}}
""")
echo jsonl
{"some":"thing"} {"foo":17,"bar":false,"quux":true} {"may":{"include":"nested","objects":["and","arrays"]}}
This parses the data into a simple JsonLines object, which has a nodes attribute, containing a seq of all the JsonNodes in the document.
This proc also works with Streams!
Parsing from a file
⚠️ Note: This functionality is not supported on the JS backend
To parse from a file, you can:
-
Read the file and use parseJsonLines
-
Create the file stream yourself and use parseJsonLines
-
Simply call parseJsonLinesFile with the name/location of the file.
Example for #3:
Example file 1.jsonl
:
["Name", "Session", "Score", "Completed"]
["Gilbert", "2013", 24, true]
["Alexa", "2013", 29, true]
["May", "2012B", 14, false]
["Deloise", "2012A", 19, true]
Nim code:
echo parseJsonLinesFile("1.jsonl")
["Name","Session","Score","Completed"] ["Gilbert","2013",24,true] ["Alexa","2013",29,true] ["May","2012B",14,false] ["Deloise","2012A",19,true]
Retrieving JSON Data
Since JSON Lines is simply just a list of JSON values seperated by newlines,
it can be simply represented as a list of JsonNodes. The
[]
operator can be used to get the JsonNode at index idx
, and the
[]=
operator can be used to set a JsonNode at an index.
Example:
const data = """
{"creator": {"handle": "Wendigoon", "display_name": "Wendigoon"}, "video": {"id": "gCUFztOkrEU", "views": 2088488, "title": "Dante's Purgatorio & The 9 Levels of Purgatory Explained"}}
{"creator": {"handle": "TomScottGo", "display_name": "Tom Scott"}, "video": {"id": "BxV14h0kFs0", "views": 65367317, "title": "This Video Has 65,367,317 Views"}}
{"creator": {"handle": "HBMmaster", "display_name": "jan Misali"}, "video": {"id": "qID2B4MK7Y0", "views": 1272282, "title": "a better way to count"}}
{"creator": {"handle": "HBMmaster", "display_name": "jan Misali"}, "video": {"id": "2EZihKCB9iw", "views": 272019, "title": "what is toki pona? (toki pona lesson one)"}}
{"creator": {"handle": "SarahZ", "display_name": "Sarah Z"}, "video": {"id": "ohFyOjfcLWQ", "views": 3115062, "title": "A Brief History of Homestuck"}}
"""
let jsonl2 = parseJsonLines(data)
echo jsonl2[3].pretty # retrieve value, but make it pretty
# std/json is exported by jsonlines, so we can use this operator to
# construct JSON
jsonl2[3] = %* {
"creator": {
"handle": "sisterhoodofsalvationllc",
"display_name": "Sisterhood of Salvation, LLC"
},
"video": {
"id": "cX4SNX_UaZI",
"views": 393,
"title": "Awakened Waters (in partnership with SOS LLC)"
}
} # set value
echo "" # print newline to seperate values
echo jsonl2[3].pretty # echo new value, but make it pretty
{ "creator": { "handle": "HBMmaster", "display_name": "jan Misali" }, "video": { "id": "2EZihKCB9iw", "views": 272019, "title": "what is toki pona? (toki pona lesson one)" } } { "creator": { "handle": "sisterhoodofsalvationllc", "display_name": "Sisterhood of Salvation, LLC" }, "video": { "id": "cX4SNX_UaZI", "views": 393, "title": "Awakened Waters (in partnership with SOS LLC)" } }
Iterating Through JSON Lines Data
You can iterate through JSON Lines data via the standard items
iterator,
using a JsonLines object. However you can also iterate through JSON Lines
data using the
jsonLines
iterator, which accepts a string buffer
and supports the same parameters as
parseJsonLines()
. It parses the string buffer line-by-line and yields the
resulting JsonNode.
Using standard items
iterator:
let jsonl3 = parseJsonLines("""
["string", "length"]
["A", 5.8]
["B", 12.2]
["C", 0.34]
""")
for node in jsonl3:
echo node
["string","length"] ["A",5.8] ["B",12.2] ["C",0.34]
Using the jsonLines
iterator:
let jsonl4 = """
{"scp": {"item_number": "000", "name": "", "object_class": "#NULL"}}
{"scp": {"item_number": "002", "name": "The \"Living\" Room", "object_class": "Euclid"}}
{"scp": {"item_number": "093", "name": "Red Sea Object", "object_class": "Euclid"}}
{"scp": {"item_number": "102", "name": "Property of Marshall, Carter, and Dark Ltd.", "object_class": "Euclid"}}
{"scp": {"item_number": "999", "name": "The Tickle Monster", "object_class": "Safe"}}
{"scp": {"item_number": "2030", "name": "LA U GH IS F UN", "object_class": "Keter"}}
{"scp": {"item_number": "4000", "name": "Taboo", "object_class": "Keter"}}
"""
for node in jsonLines(jsonl4):
echo node["scp"]
{"item_number":"000","name":"","object_class":"#NULL"} {"item_number":"002","name":"The \"Living\" Room","object_class":"Euclid"} {"item_number":"093","name":"Red Sea Object","object_class":"Euclid"} {"item_number":"102","name":"Property of Marshall, Carter, and Dark Ltd.","object_class":"Euclid"} {"item_number":"999","name":"The Tickle Monster","object_class":"Safe"} {"item_number":"2030","name":"LA U GH IS F UN","object_class":"Keter"} {"item_number":"4000","name":"Taboo","object_class":"Keter"}
Pretty-Printing JSON Lines
You can use the pretty()
proc to pretty-print JSON Lines data. However, this results in invalid JSON
Lines, as each line is supposed to be a self-contained JSON value. However,
pretty printing it makes it easier for humans to view.
# from one of the previous sections
echo jsonl.pretty()
{ "some": "thing" } { "foo": 17, "bar": false, "quux": true } { "may": { "include": "nested", "objects": [ "and", "arrays" ] } }
You can also control the indent of the JSON via the indent
parameter.
other stuff (i guess)
ill fix this ...eventually.
Anyways, heres some of the other stuff in this library:
-
add()
: Add JsonNode to JsonLines object -
toJArray()
: Convert JsonLines into JSON array (JArray) -
toJsonLines()
: Convert openArray of JsonNodes to JsonLines