[Build](https://github.com/elgopher/batch/actions/workflows/build.yml)
[Go Reference](https://pkg.go.dev/github.com/elgopher/batch)
[Go Report Card](https://goreportcard.com/report/github.com/elgopher/batch)
[codecov](https://codecov.io/gh/elgopher/batch)
[Project Status: Active](https://www.repostatus.org/#active)

## What can it be used for?

To speed up application performance without sacrificing *data consistency* or *durability*, and without complicating your source code or architecture.

The **batch** package simplifies writing Go applications that process incoming requests (HTTP, gRPC etc.) in a batch manner:
instead of processing each request separately, it groups incoming requests into a batch and runs the whole group at once.
This method of processing can significantly speed up an application and reduce the consumption of disk, network or CPU resources.

The **batch** package can be used to write any kind of *server* that handles thousands of requests per second.
Thanks to this small library, you can write relatively simple code without the need to use low-level data structures.

## Why does batch processing improve performance?

Normally, a web application uses the following pattern to modify data in the database (a minimal sketch follows the list):

1. **Load the resource** from the database. A resource is some portion of data,
such as a record or document. Lock the entire resource pessimistically
or optimistically (by reading its version number).
2. **Apply the change** to the data.
3. **Save the resource** to the database. Release the pessimistic lock, or run
an atomic update with a version check (optimistic lock).

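Here is a minimal, self-contained sketch of the optimistic-locking variant of this pattern, using an in-memory map as a stand-in for the database (all names are illustrative, not part of the **batch** API):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Document is a stand-in for "some portion of data" (a record, document etc.).
type Document struct {
	Counter int
	Version int // incremented on every successful save
}

// store simulates a database that supports optimistic locking.
var (
	mu    sync.Mutex
	store = map[string]Document{}
)

func load(key string) Document {
	mu.Lock()
	defer mu.Unlock()
	return store[key]
}

// saveIfVersion atomically saves doc, but only if the stored version
// has not changed since it was loaded.
func saveIfVersion(key string, doc Document, expectedVersion int) bool {
	mu.Lock()
	defer mu.Unlock()
	if store[key].Version != expectedVersion {
		return false // another request saved the resource in the meantime
	}
	doc.Version++
	store[key] = doc
	return true
}

func handleRequest(key string) error {
	doc := load(key) // 1. load the resource, remembering its version
	doc.Counter++    // 2. apply the change in memory
	// 3. atomic update with version check
	if !saveIfVersion(key, doc, doc.Version) {
		return errors.New("conflict: resource was modified concurrently")
	}
	return nil
}

func main() {
	if err := handleRequest("some-resource"); err != nil {
		fmt.Println(err)
	}
	fmt.Println(load("some-resource").Counter) // prints 1
}
```

Under contention, the version check in step 3 fails and the whole load-modify-save cycle must be retried, which is exactly where the cost explodes.
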
But such an architecture does not scale well when the number of requests
for a single resource is very high
(meaning hundreds or thousands of requests per second).
Lock contention in such a case is very high and the database is significantly
overloaded. In practice, the number of concurrent requests is limited.

One solution to this problem is to reduce the number of costly operations.
Because a single resource is loaded and saved thousands of times per second,
we can instead:

1. Load the resource **once** (let's say once per second).
2. Execute all the requests from this period of time on the already loaded resource. Run them all sequentially.
3. Save the resource and send responses to all clients once the data has been stored successfully.

Such a solution can improve performance by up to a factor of 1000: with 1000 requests per second and a one-second window, the resource is loaded and saved once instead of 1000 times. And the resource is still stored in a consistent state.

The **batch** package does exactly that. You configure the duration of the batching window, provide functions
to load and save the resource, and when a request comes in you run a function:

```go
// set up the batch processor:
processor := batch.StartProcessor(
	batch.Options[*YourResource]{ // YourResource is your own Go struct
		MinDuration:  100 * time.Millisecond,
		LoadResource: ...,
		SaveResource: ...,
	},
)

// the following code is run from an http/grpc handler;
// resourceKey uniquely identifies the resource
err := s.BatchProcessor.Run(resourceKey, func(r *YourResource) {
	// here goes the code which is executed inside the batch
})
```
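
The load and save functions are elided above. As a hint of what they could look like, here is a sketch assuming signatures that take a context and the resource key (`db.Get` and `db.Put` are hypothetical placeholders for your storage layer, not part of the **batch** API):

```go
func loadYourResource(ctx context.Context, key string) (*YourResource, error) {
	// fetch the resource from your database;
	// returning an error aborts the whole batch
	return db.Get(ctx, key)
}

func saveYourResource(ctx context.Context, key string, r *YourResource) error {
	// persist the resource once, after all operations in the batch have run
	return db.Put(ctx, key, r)
}
```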

For a real-life example, see the [example web application](_example).

## Installation

```sh
# Add batch to your Go module:
go get github.com/elgopher/batch
```

Please note that at least **Go 1.18** is required, because the package relies on generics.

## Scaling out

A single Go HTTP server is able to handle up to 10-50k requests per second on commodity hardware. This is a lot, but very often you also need:

* high availability (if one server goes down, you want the others to handle the traffic)
* to handle hundreds of thousands or millions of requests per second

For both cases you need to deploy **multiple servers** and put a **load balancer** in front of them.
Please note, though, that you have to configure the load-balancing algorithm carefully.
Round-robin is not an option here, because sooner or later you will run into locking problems
(multiple server instances will run batches on the same resource).
The ideal solution is to route requests based on a parameter or the URL.
For example, some HTTP parameter could be the resource key. You can instruct the load balancer
to calculate a hash of this parameter and always route requests with a given value
to the same backend (as long as all backends are still available), as sketched below.
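
To illustrate the idea, here is a minimal sketch of such key-based routing written as a Go reverse proxy (it uses the `Rewrite` hook added to `httputil.ReverseProxy` in Go 1.20; the backend addresses and the `resourceKey` query parameter are illustrative assumptions):

```go
package main

import (
	"hash/fnv"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// hypothetical backend addresses; in practice they could come
	// from configuration or service discovery
	backends := []*url.URL{
		{Scheme: "http", Host: "backend-1:8080"},
		{Scheme: "http", Host: "backend-2:8080"},
	}

	proxy := &httputil.ReverseProxy{
		Rewrite: func(r *httputil.ProxyRequest) {
			// hash the resource key so that all requests for the same
			// resource always land on the same backend
			h := fnv.New32a()
			h.Write([]byte(r.In.URL.Query().Get("resourceKey")))
			r.SetURL(backends[h.Sum32()%uint32(len(backends))])
		},
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```

In production you would more likely configure an off-the-shelf load balancer to do the same thing, but the routing rule stays the same: hash of the resource key modulo the number of backends.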