Skip to content

shdict_get flag comparison #1921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Miniwoffer opened this issue Sep 23, 2021 · 25 comments
Closed

shdict_get flag comparison #1921

Miniwoffer opened this issue Sep 23, 2021 · 25 comments

Comments

@Miniwoffer
Copy link
Contributor

Miniwoffer commented Sep 23, 2021

Proposal

Add optional flags argument to shmdict_get(_stale)? that only returns value if flags do not match.
And a 3rd (4th for _stale) return value to indicate if it is a flag match.

Example usage

local object, stale_data, lru_flag = lru:get(key)
local serialized, shm_flag, flag_match = shm:get(key, object and lru_flag)

if flag_match then
    return object
end

if serialized then
    object, err = cjson.decode(serialized)
    if object then
        lru:set(key, object, 0, shm_flag)
    end

    return object, err
end

Why?

Allocating and collecting the memory nedded to return a value can be
very expensive when value is not needed.

Bencmarking

Link to benchmark repo
Read only benchmarking, with random access order.
Tested with 5 000 000 shm lookups on a single worker.

Value 2048 bytes string
  Results: ~75.8% faster

Value 1024 bytes string
  Results: ~46.8% faster

Value 256 bytes string
  Results: ~24.7% faster

Value 16 byte string
  Results: ~19.8% faster

Value number
  Results: ~5.4% faster

Pull requests

@doujiang24
Copy link
Member

Hi @Miniwoffer
I think we also need to think about if the new API is slower than before when there is no flag used, could you please provide some bench results?

@Miniwoffer
Copy link
Contributor Author

Miniwoffer commented Sep 24, 2021

Sure, i had som trouble getting consistent results betwene runs, but managed to get some more or less reliable data.
Benchmarked with numbers, no random order and ran it for 1 800 000 000 gets.

With commits:
728602.810000 std-dev +- 3111.248909

Without commits
727996.340000 std-dev +- 2020.834505

0.083307% +0.149138 -0.149968

@Miniwoffer
Copy link
Contributor Author

Any progress?

@zhuizhuhaomeng
Copy link
Contributor

@Miniwoffer, please rebase to the master and change the lua-resty-core in .travis.yml to your branch.

@zhuizhuhaomeng
Copy link
Contributor

zhuizhuhaomeng commented Oct 4, 2021

we need to update the api in doc/HttpLuaModule.wiki.
run ngx-releng in openresty-devel-utils repo, it will convert doc/HttpLuaModule.wiki to README.markdown

@Miniwoffer
Copy link
Contributor Author

@zhuizhuhaomeng Done, travis is green on lua-nginx-module but not lua-resty-core. But master is also red, so i am unsure if its me.

@Miniwoffer
Copy link
Contributor Author

Miniwoffer commented Oct 4, 2021

Okay so there are some tests in lua-resty-core that uses get and get_stale with two arguments. Dont know why those tests specify a secound argument, so would need some guidance on how to alter those tests.

Tests in question are t/stdict.t(Test 24 and 25)

Also want me to write tests for lua-rest-core even though the tests in lua-nginx-module should cover everything?

@Miniwoffer
Copy link
Contributor Author

Fixed so Test 24 and 25 in t/shdict.t pass, just changed so key == nil is check first.

@zhuizhuhaomeng
Copy link
Contributor

@Miniwoffer
Copy link
Contributor Author

@zhuizhuhaomeng Yeah sorry about that, thought stream was generated from http so didn't implement anything in stream. I ported tho code and the tests over to the stream module.

@Miniwoffer
Copy link
Contributor Author

@zhuizhuhaomeng Think you might have missed the openresty/stream-lua-nginx-module#265 pull request.

Also am i the one who should revert the .travis.yml(cc12437, openresty/stream-lua-nginx-module@e53cdc7, openresty/lua-resty-core@358f770) changes?

Il add a summary of pull requests here and in the original post

Pull requests

@zhuizhuhaomeng
Copy link
Contributor

@Miniwoffer No, you don't need a revert .travis.yml.

@zhuizhuhaomeng
Copy link
Contributor

zhuizhuhaomeng commented Oct 7, 2021 via email

@agentzh
Copy link
Member

agentzh commented Oct 7, 2021

I think it's better to introduce a new Lua method instead of piggybacking the existing get() method so that it won't slow down get().

@bjne
Copy link

bjne commented Oct 8, 2021

Do you have a suggestion, @agentzh? get_conditional(key, flags) ? perhaps (key, flags, comparator) with comparator defaulting to "~="

or get_if(key, flags, comparator)

@agentzh
Copy link
Member

agentzh commented Oct 9, 2021

@bjne I've found the semantics a bit confusing. I don't see equivalents in redis or memcached's API. So that's probably a bad sign.

What exactly is your use case here? You're using flags as some kind of version number here? We'd better set the semantics straight first?

@bjne
Copy link

bjne commented Oct 9, 2021

@agentzh We are doing multilevel caching, db -> shm -> lru, but due to lack of (or complexity/performance) of IPC between workers, we are using the lru and shm uint32 flags to store a version of the object (actually just time of shm-insert in ms % 2^32 with a ttl of half that), but it could be any versioning ofcorse.

To keep updates to config available fast, we now get flags and value from shm, and then compare with lru flags and if they are same no need to deserialize.

This is where this idea was born, what performance can be gained from not getting the value at all from shm if we do not need it. And as shown above, quite significant numbers.

The closest thing found in redis I guess is for the CAS (compare-and-swap), dont know if COG (compare-or-get) exists in that context, but that is what we want. Also redis/memcache does not have flags afaik, so this type of op would not make sense there.

At the moment I only see the use for the 'NE' comparator, so I am fine with implementing without a comparator input or eventually with 'NE' as the default.

@bjne
Copy link

bjne commented Oct 11, 2021

@agentzh ping

@agentzh
Copy link
Member

agentzh commented Oct 13, 2021

Maybe we can add a cas() method here too?

@agentzh
Copy link
Member

agentzh commented Oct 13, 2021

BTW, we also provide lua-resty-jsonb library which does not need any serialization or de-serialization overhead when storing JSON data in shm. But it's not open source. You can contact [email protected] if you're interested.

@agentzh
Copy link
Member

agentzh commented Oct 13, 2021

@bjne BTW, memcached does support flags. that's actually where our flags come from.

@bjne
Copy link

bjne commented Oct 13, 2021

@agentzh Yeah, we can also implement a cas() method, but perhaps in a separate pull request.
But what do You think about naming of this operation, and where do we go from here?
The fact that we do not need to copy a string to lua (that needs gc) or copy the string at
all is the main performance benefit here.

@bjne
Copy link

bjne commented Oct 14, 2021

Ping @agentzh

@bjne
Copy link

bjne commented Oct 25, 2021

So where do we go from here, @agentzh. This is in many cases quite a performance gain, so I hope we can get it merged.

@Miniwoffer
Copy link
Contributor Author

New propossal #1954

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants