Comparing changes

base repository: elgopher/batch
base: v0.2.1
head repository: elgopher/batch
compare: master

Commits on May 4, 2022

  1. Add a way to drop operation

    Client code calling Processor.Run might want to abort an operation that has not yet been run in a batch. This situation can occur under high request congestion.
    
    From now on, Processor.Run accepts a new context.Context parameter. The client can cancel this context, which drops the operation if it is still waiting to be run.
    
    Example:
    
    ```
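    // context.WithTimeout returns both a context and a cancel function.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    err := processor.Run(ctx, "key", ...)
    // err is OperationCancelled if the context expires while the operation is still waiting to be run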
    ```
    elgopher committed May 4, 2022
    870ccfc
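
The dropping behaviour described in the commit above can be illustrated with a minimal sketch. It is not the library's implementation: the `run` function, the `queue` channel, and `errOperationCancelled` are hypothetical stand-ins, and the only assumption carried over from the commit message is that a waiting operation is abandoned once its context is cancelled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errOperationCancelled is a hypothetical stand-in for the library's OperationCancelled error.
var errOperationCancelled = errors.New("operation cancelled")

// run tries to hand op over to a batch worker. If ctx is cancelled before any
// worker accepts the operation, the operation is dropped and never executed.
func run(ctx context.Context, queue chan<- func(), op func()) error {
	select {
	case queue <- op:
		return nil
	case <-ctx.Done():
		return errOperationCancelled
	}
}

// demoDrop reports whether an operation stuck behind a congested queue was dropped.
func demoDrop() bool {
	queue := make(chan func()) // unbuffered and never read: simulates congestion
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()

	executed := false
	err := run(ctx, queue, func() { executed = true })
	return errors.Is(err, errOperationCancelled) && !executed
}

func main() {
	fmt.Println("dropped:", demoDrop())
}
```

The `select` makes the hand-over and the cancellation race against each other, so an operation that has already been accepted by a worker is no longer affected by the caller's context.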

Commits on May 5, 2022

  1. Increase throughput

    The current implementation is not optimal. It uses a fixed-size goroutine pool. If the pool size chosen by the user is too small, one slow resource can block processing of other resources (different keys, but the same hash). The user can adjust the pool size, but it is very hard to figure out the right number up front (before running the app in production).
    
    The new implementation spawns a dedicated goroutine for each new batch. At most one goroutine is created for a given resource key. The goroutine is destroyed once the batch ends.
    elgopher committed May 5, 2022
    a90382b
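
As a rough illustration of the goroutine-per-batch idea from the commit above, the sketch below keeps at most one live batch per resource key and lets the batch goroutine exit when the batch ends. All names (`processor`, `batch`, `runBatch`, `window`) are hypothetical, the fixed 50 ms collection window is an assumption, and unlike a real `Run` the `run` method here does not wait for the operation to finish.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// batch collects the operations submitted for one resource key.
type batch struct {
	ops []func()
}

// processor keeps at most one live batch (and one goroutine) per resource key.
type processor struct {
	mu      sync.Mutex
	batches map[string]*batch
	window  time.Duration // how long a batch collects operations (assumed value)
	wg      sync.WaitGroup
	started int // goroutines spawned so far, kept only for the demo
}

// run adds op to the current batch for key, spawning a batch goroutine if none exists.
func (p *processor) run(key string, op func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	b, ok := p.batches[key]
	if !ok {
		b = &batch{}
		p.batches[key] = b
		p.started++
		p.wg.Add(1)
		go p.runBatch(key, b)
	}
	b.ops = append(b.ops, op)
}

// runBatch waits for the window, detaches the batch, runs its operations
// sequentially and then returns, which ends the goroutine.
func (p *processor) runBatch(key string, b *batch) {
	defer p.wg.Done()
	time.Sleep(p.window)
	p.mu.Lock()
	delete(p.batches, key) // the next operation for key starts a fresh batch
	ops := b.ops
	p.mu.Unlock()
	for _, op := range ops {
		op()
	}
}

// demoBatches submits five operations for two keys and reports how many
// goroutines were spawned and how many operations ran.
func demoBatches() (goroutines int, executed int64) {
	p := &processor{batches: map[string]*batch{}, window: 50 * time.Millisecond}
	var count int64
	for _, key := range []string{"a", "a", "b", "a", "b"} {
		p.run(key, func() { atomic.AddInt64(&count, 1) })
	}
	p.wg.Wait()
	return p.started, atomic.LoadInt64(&count)
}

func main() {
	g, n := demoBatches()
	fmt.Println("goroutines:", g, "operations:", n)
}
```

Because each key gets its own goroutine, a slow resource only delays operations for that key, and there is no pool size to tune.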

Commits on May 6, 2022

  1. Update README.md

    elgopher committed May 6, 2022
    aff6c4d
  2. Update README.md

    Add info about server-database round-trips.
    elgopher committed May 6, 2022
    e7fb6a9
  3. Update Processor.Run documentation

    Add information about always leaving the resource in a consistent state.
    elgopher committed May 6, 2022
    8569438
  4. [example] Do not log errors when operation was cancelled

    When the HTTP connection is closed while the operation is still waiting to be run, processor.Run returns the OperationCancelled error.
    elgopher committed May 6, 2022
    bc57aa9
  5. [README] Rewording

    elgopher committed May 6, 2022
    a4430a1

Commits on May 7, 2022

  1. a17ed95
  2. Reuse context.Context for the entire batch

    This is needed to support database transactions. After LoadResource has succeeded, the context must not be canceled, because the database driver might run a goroutine dedicated to the transaction that automatically rolls the transaction back once the context is canceled.
    elgopher committed May 7, 2022
    166b653
  3. Update README.md

    elgopher authored May 7, 2022
    dafdb21
  4. 0261d1f
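
The "Reuse context.Context for the entire batch" commit above can be pictured with the following sketch. It assumes a hypothetical in-memory `store` with `load` and `save` methods in place of a real database, and a hypothetical `runBatch` helper; the point is only that one context is created per batch, handed to both calls, and cancelled after the save has returned.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// store is a hypothetical in-memory stand-in for a transactional database.
type store struct {
	value int
	seen  []context.Context // contexts received by load and save, for the demo
}

func (s *store) load(ctx context.Context) (int, error) {
	s.seen = append(s.seen, ctx)
	return s.value, ctx.Err()
}

func (s *store) save(ctx context.Context, v int) error {
	s.seen = append(s.seen, ctx)
	if err := ctx.Err(); err != nil {
		return err
	}
	s.value = v
	return nil
}

// runBatch creates a single context and uses it for both load and save.
// The context is cancelled only after save has returned, so a transaction
// bound to it at load time stays alive for the whole batch.
func runBatch(s *store, timeout time.Duration, ops []func(*int)) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	v, err := s.load(ctx)
	if err != nil {
		return err
	}
	for _, op := range ops {
		op(&v)
	}
	return s.save(ctx, v)
}

// demoSharedContext reports whether load and save saw the same context,
// together with the value that was saved.
func demoSharedContext() (sameContext bool, value int) {
	s := &store{}
	inc := func(v *int) { *v++ }
	if err := runBatch(s, time.Second, []func(*int){inc, inc, inc}); err != nil {
		return false, 0
	}
	return len(s.seen) == 2 && s.seen[0] == s.seen[1], s.value
}

func main() {
	same, v := demoSharedContext()
	fmt.Println("same context:", same, "saved value:", v)
}
```

With `database/sql`, a transaction started through `BeginTx` is rolled back when its context is cancelled, which is why a per-operation context would be unsafe to pass to the load step.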

Commits on May 8, 2022

  1. ff85cd0
  2. 879350a
  3. [example] Add another validation

    Validate whether the person has already booked a different seat on the train.
    
    This validation requires some CPU time (a for loop of 30 iterations), which makes the example closer to a real-world web app.
    elgopher committed May 8, 2022
    398a242

Commits on May 12, 2022

  1. Reduce CPU usage

    Processor.Run actively polls the temporary batch channel every 10 milliseconds, which can consume too much CPU. Instead, the Run method can wait until the temporary batch channel is closed.
    elgopher committed May 12, 2022
    0dfaa34
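
The difference between the two waiting strategies mentioned in the commit above can be sketched as follows. The function names are hypothetical and the channel is a plain `chan struct{}` rather than the library's temporary batch channel; the 10 ms interval is taken from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// waitPolling checks done every interval and counts how often it woke up
// only to find the channel still open.
func waitPolling(done <-chan struct{}, interval time.Duration) (wakeups int) {
	for {
		select {
		case <-done:
			return wakeups
		default:
			wakeups++
			time.Sleep(interval)
		}
	}
}

// waitBlocking parks the goroutine until done is closed, with no periodic wake-ups.
func waitBlocking(done <-chan struct{}) {
	<-done
}

// demoWait runs both strategies against channels that are closed after 50 ms.
func demoWait() (pollWakeups int, blockingReturned bool) {
	polled := make(chan struct{})
	time.AfterFunc(50*time.Millisecond, func() { close(polled) })
	pollWakeups = waitPolling(polled, 10*time.Millisecond)

	blocked := make(chan struct{})
	time.AfterFunc(50*time.Millisecond, func() { close(blocked) })
	waitBlocking(blocked)
	return pollWakeups, true
}

func main() {
	wakeups, ok := demoWait()
	fmt.Println("polling wake-ups:", wakeups, "blocking wait returned:", ok)
}
```

Every polling wake-up costs scheduler work for each waiting caller, whereas a receive on a closed channel wakes all waiters exactly once.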

Commits on May 16, 2022

  1. Add batch metrics

    Provide measurements for each executed batch. These can be used to monitor a running Processor and to publish metrics to external systems (such as Prometheus or M3).
    elgopher committed May 16, 2022
    e942404
  2. Add more context to error returned by Run

    Add information about the cause of the error.
    elgopher committed May 16, 2022
    c9abb18
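
For the "Add batch metrics" commit above, one possible shape of per-batch measurements is sketched below. The `batchMetric` fields, the `metricBroker` type, and its `subscribe` and `publish` methods are all hypothetical and are not taken from `metric.go`; the sketch only shows how a processor could emit one measurement per executed batch to subscribers that forward it to a monitoring system.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batchMetric is a hypothetical measurement describing one executed batch.
type batchMetric struct {
	ResourceKey    string
	OperationCount int
	TotalDuration  time.Duration
	Err            error
}

// metricBroker fans measurements out to all subscribers.
type metricBroker struct {
	mu   sync.Mutex
	subs []chan batchMetric
}

// subscribe returns a buffered channel that receives future measurements.
func (b *metricBroker) subscribe() <-chan batchMetric {
	ch := make(chan batchMetric, 16)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// publish delivers m to every subscriber without blocking the batch:
// a subscriber whose buffer is full simply misses the measurement.
func (b *metricBroker) publish(m batchMetric) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- m:
		default:
		}
	}
}

// demoMetrics publishes one measurement and returns what a subscriber received.
func demoMetrics() (key string, operations int) {
	broker := &metricBroker{}
	metrics := broker.subscribe()

	start := time.Now()
	// ... a batch of three operations for key "train-1" would run here ...
	broker.publish(batchMetric{
		ResourceKey:    "train-1",
		OperationCount: 3,
		TotalDuration:  time.Since(start),
	})

	m := <-metrics
	return m.ResourceKey, m.OperationCount
}

func main() {
	key, ops := demoMetrics()
	fmt.Println("batch for", key, "ran", ops, "operations")
}
```

The non-blocking send is a design choice of this sketch: it keeps a slow metrics consumer from stalling batch processing, at the cost of possibly losing measurements.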

Commits on May 22, 2022

  1. Move example to a separate GitHub repo

    Move the example to the github.com/elgopher/batch-example repo. This is needed because I plan to use a real database, Docker, etc. in the example. Adding such dependencies to the batch repo would create a lot of noise, which I want to avoid.
    elgopher committed May 22, 2022
    600c019
  2. Update README.md

    elgopher authored May 22, 2022
    560fcc1

Commits on May 28, 2022

  1. [README] Remove information about using pessimistic lock

    Pessimistic locks are bad for high-throughput systems, because they require transactions, which in turn occupy database connections, and those are costly.
    
    Pessimistic locks are especially bad when using batch processing, because batch operations take a significant amount of time (hundreds of milliseconds).
    elgopher committed May 28, 2022
    055f212
  2. [go.mod] go mod tidy

    elgopher committed May 28, 2022
    c48a04e

Commits on Aug 30, 2023

  1. Bump testify to 1.8.4

    elgopher committed Aug 30, 2023
    42bdc39
Showing with 594 additions and 521 deletions.
  1. +20 −16 README.md
  2. +0 −60 _example/http/http.go
  3. +0 −35 _example/main.go
  4. +0 −48 _example/store/file.go
  5. +0 −15 _example/train/error.go
  6. +0 −27 _example/train/service.go
  7. +0 −48 _example/train/train.go
  8. +143 −73 batch.go
  9. +5 −1 batch_bench_test.go
  10. +87 −14 batch_test.go
  11. +1 −0 error.go
  12. +4 −4 go.mod
  13. +7 −8 go.sum
  14. +116 −0 goroutine.go
  15. +0 −15 hash.go
  16. +0 −23 hash_test.go
  17. +43 −0 metric.go
  18. +168 −0 metric_test.go
  19. +0 −134 worker.go
36 changes: 20 additions & 16 deletions README.md
@@ -7,7 +7,7 @@
 
 ## What it can be used for?
 
-To speed up application performance **without** sacrificing *data consistency* and *data durability* or making source code/architecture complex.
+To **increase** database-driven web application **throughput** without sacrificing *data consistency* and *data durability* or making source code and architecture complex.
 
 The **batch** package simplifies writing Go applications that process incoming requests (HTTP, GRPC etc.) in a batch manner:
 instead of processing each request separately, they group incoming requests to a batch and run whole group at once.
@@ -20,18 +20,21 @@ Thanks to this small library, you can create relatively simple code without the
 
 Normally a web application is using following pattern to modify data in the database:
 
-1. **Load resource** from database. Resource is some portion of data
-such as set of records from relational database, document from Document-oriented database or value from KV store.
-Lock the entire resource pessimistically or optimistically (by reading version number).
-2. **Apply change** to data
-3. **Save resource** to database. Release the pessimistic lock. Or run
-atomic update with version check (optimistic lock).
+1. **Load resource** from database. **Resource** is some portion of data
+such as set of records from relational database, document from Document-oriented database or value from KV store
+(in Domain-Driven Design terms it is called an [aggregate](https://martinfowler.com/bliki/DDD_Aggregate.html)).
+Lock the entire resource [optimistically](https://www.martinfowler.com/eaaCatalog/optimisticOfflineLock.html)
+by reading version number.
+2. **Apply change** to data in plain Go
+3. **Save resource** to database. Release the lock by running
+atomic update with version check.
 
-But such architecture does not scale well if number of requests
+But such architecture does not scale well if the number of requests
 for a single resource is very high
 (meaning hundreds or thousands of requests per second).
 The lock contention in such case is very high and database is significantly
-overloaded. Practically, the number of concurrent requests is limited.
+overloaded. Also, round-trips between application server and database add latency.
+Practically, the number of concurrent requests is severely limited.
 
 One solution to this problem is to reduce the number of costly operations.
 Because a single resource is loaded and saved thousands of times per second
@@ -60,13 +63,13 @@ processor := batch.StartProcessor(
 )
 
 // And use the processor inside http/grpc handler or technology-agnostic service.
-// ResourceKey can be taken from request parameter.
-err := processor.Run(resourceKey, func(r *YourResource) {
+// ctx is a standard context.Context and resourceKey can be taken from request parameter
+err := processor.Run(ctx, resourceKey, func(r *YourResource) {
 // Here you put the code which will executed sequentially inside batch
 })
 ```
 
-For real-life example see [example web application](_example).
+**For real-life example see [example web application](https://github.com/elgopher/batch-example).**
 
 ## Installation
 
@@ -88,7 +91,8 @@ For both cases you need to deploy **multiple servers** and put a **load balancer
 Please note though, that you have to carefully configure the load balancing algorithm.
 _Round-robin_ is not an option here, because sooner or later you will have problems with locking
 (multiple server instances will run batches on the same resource).
-Ideal solution is to route requests based on parameters or URL.
-For example some http parameter could be a resource key. You can instruct load balancer
-to calculate hash on this parameter value and always route requests with this param value
-to the same backend (of course if all backends are still available).
+Ideal solution is to route requests based on URL path or query string parameters.
+For example some http query string parameter could have a resource key. You can instruct load balancer
+to calculate hash on this parameter and always route requests with the same key
+to the same backend. If backend will be no longer available the load balancer should route request to a different
+server.
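
The key-based routing described in the README hunk above could look roughly like the sketch below. It is not part of the batch package: `pickBackend` and the backend names are hypothetical, and FNV-1a with a plain modulo is just one simple hashing choice among many that a load balancer might use.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend maps a resource key to one of the currently healthy backends.
// The same key always maps to the same backend as long as the list is unchanged.
func pickBackend(resourceKey string, healthy []string) string {
	if len(healthy) == 0 {
		return "" // no backend available
	}
	h := fnv.New32a()
	h.Write([]byte(resourceKey)) // Write on a hash.Hash never returns an error
	return healthy[h.Sum32()%uint32(len(healthy))]
}

func main() {
	backends := []string{"app-1", "app-2", "app-3"}
	fmt.Println(pickBackend("train-1", backends))
	fmt.Println(pickBackend("train-1", backends)) // same backend as the line above

	// When a backend becomes unavailable, it is removed from the list and
	// the key is routed to one of the remaining servers.
	fmt.Println(pickBackend("train-1", []string{"app-1", "app-3"}))
}
```

With a plain modulo, removing one backend can remap many keys at once; consistent hashing is a common way to limit that reshuffling.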
60 changes: 0 additions & 60 deletions _example/http/http.go

This file was deleted.

35 changes: 0 additions & 35 deletions _example/main.go

This file was deleted.

48 changes: 0 additions & 48 deletions _example/store/file.go

This file was deleted.

15 changes: 0 additions & 15 deletions _example/train/error.go

This file was deleted.

27 changes: 0 additions & 27 deletions _example/train/service.go

This file was deleted.

48 changes: 0 additions & 48 deletions _example/train/train.go

This file was deleted.
