Golang Streaming JSON Parsing


json-iterator library: https://github.com/json-iterator/go

Motivation

Existing Golang JSON libraries work in push mode, and there was no library offering a pull API. The other motivation was to find out how fast Golang can parse JSON, and how much room for improvement remains.

API style

The API style is modeled on StAX, but optimized specifically for JSON, and it is easier to control than either StAX or SAX. If simplicity is all you need, a DOM-style API is of course the simplest. The point of a streaming pull API is to give the caller maximum control over the parsing process.

Parse Array

iter := ParseString(`[1,2,3]`)
for iter.ReadArray() {
	iter.ReadUint64()
}

As you can see, the pull API has a very different style: the entire parsing process is driven by the caller.
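To make "caller-driven" concrete, here is a minimal sketch of how such a pull iterator might work internally. The type `miniIter` and its methods are hypothetical and far simpler than json-iterator's real code: it assumes an in-memory array of unsigned integers and does no validation or overflow checking.

```go
package main

import "fmt"

// miniIter is a hypothetical, stripped-down pull iterator over an in-memory
// JSON array of unsigned integers. It is NOT json-iterator's implementation.
type miniIter struct {
	buf []byte
	pos int
}

// skipSpace advances past JSON whitespace.
func (it *miniIter) skipSpace() {
	for it.pos < len(it.buf) {
		switch it.buf[it.pos] {
		case ' ', '\t', '\n', '\r':
			it.pos++
		default:
			return
		}
	}
}

// ReadArray consumes '[' or ',' and reports whether another element follows.
// It returns false on ']' (or on any unexpected byte).
func (it *miniIter) ReadArray() bool {
	it.skipSpace()
	if it.pos >= len(it.buf) {
		return false
	}
	c := it.buf[it.pos]
	it.pos++
	switch c {
	case '[':
		it.skipSpace()
		if it.pos < len(it.buf) && it.buf[it.pos] == ']' {
			it.pos++ // empty array
			return false
		}
		return true
	case ',':
		return true
	default: // ']' or malformed input
		return false
	}
}

// ReadUint64 accumulates digits in place (no substring, no overflow check).
func (it *miniIter) ReadUint64() uint64 {
	it.skipSpace()
	var v uint64
	for it.pos < len(it.buf) && it.buf[it.pos] >= '0' && it.buf[it.pos] <= '9' {
		v = v*10 + uint64(it.buf[it.pos]-'0')
		it.pos++
	}
	return v
}

func main() {
	it := &miniIter{buf: []byte(`[1, 2, 3]`)}
	var sum uint64
	// The caller drives the loop: each ReadArray call pulls the next element.
	for it.ReadArray() {
		sum += it.ReadUint64()
	}
	fmt.Println(sum) // 6
}
```

Because the caller owns the loop, it can stop early, sum on the fly, or skip elements without the parser ever building a slice.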

Parse Object

type TestObj struct {
	Field1 string
	Field2 uint64
}

iter := ParseString(`{"field1": "1", "field2": 2}`)
obj := TestObj{}
for field := iter.ReadObject(); field != ""; field = iter.ReadObject() {
	switch field {
	case "field1":
		obj.Field1 = iter.ReadString()
	case "field2":
		obj.Field2 = iter.ReadUint64()
	default:
		iter.ReportError("bind object", "unexpected field")
	}
}

The parsing process does not rely on reflection, and what you do with each parsed value is entirely up to you. Instead of binding values to an object first, you can, for example, accumulate them directly.
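The same accumulate-as-you-go idea can be sketched with the standard library's token API (`json.Decoder.Token` and `Decoder.More`), swapped in here instead of json-iterator. The hypothetical `sumField2` adds up every top-level `field2` in an array of objects without ever building a `TestObj`. Note that `Decoder.Decode` still uses reflection internally, so this only illustrates the caller-driven control flow, not the reflection-free part; it also assumes well-formed input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sumField2 walks a JSON array of objects with encoding/json's token-level
// (pull) API and sums every "field2" value, skipping all other fields.
func sumField2(input string) (float64, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	if _, err := dec.Token(); err != nil { // opening '['
		return 0, err
	}
	var sum float64
	for dec.More() { // one iteration per array element
		if _, err := dec.Token(); err != nil { // opening '{'
			return 0, err
		}
		for dec.More() { // one iteration per object field
			key, err := dec.Token()
			if err != nil {
				return 0, err
			}
			if key == "field2" {
				var v float64
				if err := dec.Decode(&v); err != nil {
					return 0, err
				}
				sum += v // accumulate directly; no struct is built
			} else {
				var skip json.RawMessage // consume and ignore the value
				if err := dec.Decode(&skip); err != nil {
					return 0, err
				}
			}
		}
		if _, err := dec.Token(); err != nil { // closing '}'
			return 0, err
		}
	}
	if _, err := dec.Token(); err != nil { // closing ']'
		return 0, err
	}
	return sum, nil
}

func main() {
	sum, err := sumField2(`[{"field1": "1", "field2": 2}, {"field1": "x", "field2": 40}]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(sum) // 42
}
```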

Skip

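iter := ParseString(`[ {"a" : [{"b": "c"}], "d": 102 }, "b"]`)
iter.ReadArray() // move to the first element
iter.Skip()      // skip the whole nested object without decoding it
iter.ReadArray() // move to the second element
if iter.ReadString() != "b" {
	t.FailNow()
}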

Fields you do not care about can simply be skipped.
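Skipping is cheap because a skipper only has to find where a value ends, not decode it. Below is a minimal, hypothetical byte-level sketch (not json-iterator's implementation) that counts bracket depth and steps over strings so that brackets inside strings are ignored. It assumes well-formed input and performs no validation.

```go
package main

import "fmt"

// isSpace reports whether c is JSON whitespace.
func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

// isDelim reports whether c can end a number or a true/false/null literal.
func isDelim(c byte) bool {
	return isSpace(c) || c == ',' || c == ']' || c == '}'
}

// skipString returns the index just past the closing quote of the string
// that starts at data[i] == '"', stepping over backslash escapes.
func skipString(data []byte, i int) int {
	for i++; i < len(data); i++ {
		switch data[i] {
		case '\\':
			i++ // the escaped byte can never close the string
		case '"':
			return i + 1
		}
	}
	return len(data)
}

// skipValue returns the index just past the JSON value that starts at or
// after data[i]. Containers are skipped by counting bracket depth, so
// nothing inside them is decoded or allocated.
func skipValue(data []byte, i int) int {
	for i < len(data) && isSpace(data[i]) {
		i++
	}
	if i >= len(data) {
		return i
	}
	switch data[i] {
	case '"':
		return skipString(data, i)
	case '[', '{':
		depth := 0
		for i < len(data) {
			switch data[i] {
			case '"':
				i = skipString(data, i) // brackets inside strings don't count
				continue
			case '[', '{':
				depth++
			case ']', '}':
				depth--
				if depth == 0 {
					return i + 1
				}
			}
			i++
		}
		return i
	default: // number, true, false or null
		for i < len(data) && !isDelim(data[i]) {
			i++
		}
		return i
	}
}

func main() {
	data := []byte(`[ {"a" : [{"b": "c"}], "d": 102 }, "b"]`)
	// Start just after the outer '[' and skip the whole first element.
	end := skipValue(data, 1)
	fmt.Println(string(data[end:])) // , "b"]
}
```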

Performance optimization

Another goal of this project was to find out whether Golang's native JSON API is fast or slow, and whether there is room for improvement.

Stream-based parsing removes the need to read the whole input into memory at once.

// import "encoding/json"
func Benchmark_standard_lib(b *testing.B) {
	b.ReportAllocs()
	for n := 0; n < b.N; n++ {
		file, err := os.Open("/tmp/large-file.json")
		if err != nil {
			b.Fatal(err)
		}
		result := []struct{}{}
		decoder := json.NewDecoder(file)
		if err := decoder.Decode(&result); err != nil {
			b.Fatal(err)
		}
		file.Close()
	}
}

5    215547514 ns/op    71467118 B/op    272476 allocs/op

// import "github.com/json-iterator/go"
func Benchmark_jsoniter(b *testing.B) {
	b.ReportAllocs()
	for n := 0; n < b.N; n++ {
		file, err := os.Open("/tmp/large-file.json")
		if err != nil {
			b.Fatal(err)
		}
		iter := jsoniter.Parse(file, 1024) // read through a 1024-byte buffer
		for iter.ReadArray() {
			iter.Skip()
		}
		file.Close()
	}
}

10    110209750 ns/op    4248 B/op    5 allocs/op

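As you can see, the json-iterator implementation is very economical with memory (4248 B and 5 allocations per operation, versus about 71 MB and 272476 allocations for the standard library). It is also roughly twice as fast as the standard library implementation, and it puts far less pressure on the GC.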

Parse int directly

An int does not need to be read twice (once to find where the number ends, and once more to convert it); a single pass is enough. This is done by merging the parseint logic into the JSON parsing code itself.

func Benchmark_jsoniter_array(b *testing.B) {
	for n := 0; n < b.N; n++ {
		iter := ParseString(`[1,2,3]`)
		for iter.ReadArray() {
			iter.ReadUint64()
		}
	}
}

10000000    189 ns/op

func Benchmark_json_array(b *testing.B) {
	for n := 0; n < b.N; n++ {
		result := []interface{}{}
		json.Unmarshal([]byte(`[1,2,3]`), &result)
	}
}

1000000    1327 ns/op

In this scenario json-iterator is about 7x faster.

Schema-based parsing without reflection

Parsing according to a known schema reduces if-else branching, and values are assigned directly without reflection.

type Level1 struct {
	Hello []Level2
}

type Level2 struct {
	World string
}

func Benchmark_jsoniter_nested(b *testing.B) {
	for n := 0; n < b.N; n++ {
		iter := ParseString(`{"hello": [{"world": "value1"}, {"world": "value2"}]}`)
		l1 := Level1{}
		for l1Field := iter.ReadObject(); l1Field != ""; l1Field = iter.ReadObject() {
			switch l1Field {
			case "hello":
				l1.Hello = readLevel1Hello(iter)
			default:
				iter.Skip()
			}
		}
	}
}

// readLevel1Hello parses the "hello" array straight into []Level2.
func readLevel1Hello(iter *Iterator) []Level2 {
	l2Array := make([]Level2, 0, 2)
	for iter.ReadArray() {
		l2 := Level2{}
		for l2Field := iter.ReadObject(); l2Field != ""; l2Field = iter.ReadObject() {
			switch l2Field {
			case "world":
				l2.World = iter.ReadString()
			default:
				iter.Skip()
			}
		}
		l2Array = append(l2Array, l2)
	}
	return l2Array
}

2000000    640 ns/op

func Benchmark_json_nested(b *testing.B) {
	for n := 0; n < b.N; n++ {
		l1 := Level1{}
		json.Unmarshal([]byte(`{"hello": [{"world": "value1"}, {"world": "value2"}]}`), &l1)
	}
}

1000000    1816 ns/op
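For reference, dividing the standard library's ns/op by json-iterator's ns/op in the three benchmarks gives the following speedups (plus, for the large-file benchmark, the ratio of bytes allocated per operation):

$$
\begin{aligned}
\text{large file:}\quad & \frac{215547514}{110209750} \approx 1.96\times, \qquad \frac{71467118\ \text{B}}{4248\ \text{B}} \approx 1.7 \times 10^{4}\\
\text{int array:}\quad & \frac{1327}{189} \approx 7.02\times\\
\text{nested object:}\quad & \frac{1816}{640} \approx 2.84\times
\end{aligned}
$$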

Summary

Golang's own JSON library actually performs well. According to the benchmark (https://github.com/json-itera ...), it is in fact faster than other stream-based parsing libraries (https://github.com/ugorji/go/...). The library https://github.com/pquerna/ff ... claims to be faster, but it does not support streaming parsing (it requires the whole []byte to be read into memory in advance). In most cases the built-in Golang library is good enough, so don't blindly switch to other JSON parsing libraries.

If you need a pull API, or an additional 2x to 6x in performance, consider https://github.com/json-iterator/go
