Description
Timeout errors can occur when retrieving data from redis, especially when attempting to retrieve a large number of records:
time="Jan 16 10:50:11" level=error msg="Multi command failed: read tcp [::1]:56727->[::1]:6379: i/o timeout" prefix=redis
When resources become insufficient for larger loads, a state where the number of records created increases faster than they are purged out can be reached, so the corresponding timeout errors can be expected.
However, some (very noisy) unexpected additional error logs immediately follow the one above when this state is reached:
time="Jan 16 10:50:11" level=error msg="Couldn't unmarshal analytics data:EOF" analytic_key=tyk-system-analytics prefix=main
time="Jan 16 10:50:11" level=error msg="Couldn't unmarshal analytics data:EOF" analytic_key=tyk-system-analytics prefix=main
time="Jan 16 10:50:11" level=error msg="Couldn't unmarshal analytics data:EOF" analytic_key=tyk-system-analytics prefix=main
(...)
Depending on which pumps are configured, this can result in (also quite noisy) error logs such as:
time="Jan 16 10:50:33" level=error msg="Error decoding analytic record" prefix=resurface-pump
time="Jan 16 10:50:33" level=error msg="Error decoding analytic record" prefix=resurface-pump
(...)
In this case, in the resurfaceio backend the following type assertion is performed on line 217:
```go
decoded, ok := v.(analytics.AnalyticsRecord)
if !ok {
    rp.log.Error("Error decoding analytic record")
    continue
}
```
This assertion fails because the interface value v does not hold an analytics.AnalyticsRecord. Here the failure only produces the noisy logs shown above, but pumps that perform an unchecked type assertion (decoded := v.(analytics.AnalyticsRecord) instead of decoded, ok := v.(analytics.AnalyticsRecord)) would trigger an unhandled runtime panic.
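The difference between the two assertion forms can be shown with a minimal, self-contained sketch (AnalyticsRecord here is a stub standing in for analytics.AnalyticsRecord, and the helper names are hypothetical):

```go
package main

import "fmt"

// AnalyticsRecord is a stub for analytics.AnalyticsRecord.
type AnalyticsRecord struct {
	Path string
}

// safeDecode uses the two-value ("comma ok") form: a type mismatch is
// reported as ok == false and the caller can log and continue.
func safeDecode(v interface{}) (AnalyticsRecord, bool) {
	decoded, ok := v.(AnalyticsRecord)
	return decoded, ok
}

// unsafeDecode uses the single-value form: a type mismatch panics at
// runtime. The recover here only exists to demonstrate the panic.
func unsafeDecode(v interface{}) (rec AnalyticsRecord, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	rec = v.(AnalyticsRecord)
	return rec, false
}

func main() {
	// An entry that does not wrap an AnalyticsRecord, as happens when
	// empty/corrupt data comes back from Redis.
	var v interface{} = "not a record"

	if _, ok := safeDecode(v); !ok {
		fmt.Println("safe assertion failed: logged and skipped")
	}
	if _, panicked := unsafeDecode(v); panicked {
		fmt.Println("unchecked assertion: unhandled runtime panic")
	}
}
```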
By tracing back the origin of these logs, we can see how:
- the timeout error is logged after attempting to retrieve data from Redis inside the GetAndDeleteSet method
- the set of EOF errors is logged after attempting to unmarshal each record in the retrieved slice
- multiple type assertion errors are logged as shown above (one for each of these empty interfaces)
I believe that even though many empty records cause EOF errors at read time, many others do not, and they end up being passed to the writePumps method as interface-wrapped decoded values, which causes the type assertion errors.
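The suspected flow can be sketched as follows. This is a simplified illustration, not the actual Tyk Pump code: the function name decodeBatch is hypothetical, and encoding/json stands in for the real serialization format, since a json.Decoder on an empty reader also returns io.EOF, matching the logged "EOF":

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// AnalyticsRecord is a stub for analytics.AnalyticsRecord.
type AnalyticsRecord struct {
	Path string
}

// decodeBatch mimics the suspected bug pattern: each raw entry retrieved
// from Redis is decoded, a failure is only logged, and a useless value is
// still forwarded downstream, where pumps later fail the type assertion.
func decodeBatch(raw [][]byte) []interface{} {
	var out []interface{}
	for _, b := range raw {
		var rec AnalyticsRecord
		if err := json.NewDecoder(bytes.NewReader(b)).Decode(&rec); err != nil {
			// Decoding an empty entry yields io.EOF, printed as "EOF".
			fmt.Printf("Couldn't unmarshal analytics data:%v\n", err)
			// Bug pattern: the entry is not dropped; a value that is not
			// an AnalyticsRecord is forwarded anyway.
			out = append(out, nil)
			continue
		}
		out = append(out, rec)
	}
	return out
}

func main() {
	raw := [][]byte{
		[]byte(`{"Path":"/ok"}`), // valid record
		nil,                      // empty entry, as left behind under load
	}
	// Downstream "pump" loop: the safe assertion fails for the bad entry.
	for _, v := range decodeBatch(raw) {
		if decoded, ok := v.(AnalyticsRecord); ok {
			fmt.Println("pump got:", decoded.Path)
		} else {
			fmt.Println("Error decoding analytic record")
		}
	}
}
```

Under this reading, every empty entry produces either an EOF log at decode time or a type assertion log in each configured pump, which would explain why both kinds of noise appear together.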
This issue can be reproduced following the same steps described in PR #731, as the related issue can lead to a state where the number of records builds up faster than they are purged out.