- ## Anti-Patterns
-
- ### Dialing in gRPC
- [`grpc.Dial`](https://pkg.go.dev/google.golang.org/grpc#Dial) is a function in
- the gRPC library that creates a virtual connection from the gRPC client to the
- gRPC server. It takes a target URI (which can represent the name of a logical
- backend service and could resolve to multiple actual addresses) and a list of
- options, and returns a
+ ## Anti-Patterns of Client creation
+
+ ### How to properly create a `ClientConn`: `grpc.NewClient`
+
+ [`grpc.NewClient`](https://pkg.go.dev/google.golang.org/grpc#NewClient) is the
+ function in the gRPC library that creates a virtual connection from a client
+ application to a gRPC server. It takes a target URI (which represents the name
+ of a logical backend service and resolves to one or more physical addresses) and
+ a list of options, and returns a
  [`ClientConn`](https://pkg.go.dev/google.golang.org/grpc#ClientConn) object that
- represents the connection to the server. The `ClientConn` contains one or more
- actual connections to real server backends and attempts to keep these
- connections healthy by automatically reconnecting to them when they break.
-
- The `Dial` function can also be configured with various options to customize the
- behavior of the client connection. For example, developers could use options
- such a
- [`WithTransportCredentials`](https://pkg.go.dev/google.golang.org/grpc#WithTransportCredentials)
- to configure the transport credentials to use.
-
- While `Dial` is commonly referred to as a "dialing" function, it doesn't
- actually perform the low-level network dialing operation like
- [`net.Dial`](https://pkg.go.dev/net#Dial) would. Instead, it creates a virtual
- connection from the gRPC client to the gRPC server.
-
- `Dial` does initiate the process of connecting to the server, but it uses the
- ClientConn object to manage and maintain that connection over time. This is why
- errors encountered during the initial connection are no different from those
- that occur later on, and why it's important to handle errors from RPCs rather
- than relying on options like
- [`FailOnNonTempDialError`](https://pkg.go.dev/google.golang.org/grpc#FailOnNonTempDialError),
- [`WithBlock`](https://pkg.go.dev/google.golang.org/grpc#WithBlock), and
- [`WithReturnConnectionError`](https://pkg.go.dev/google.golang.org/grpc#WithReturnConnectionError).
- In fact, `Dial` does not always establish a connection to servers by default.
- The connection behavior is determined by the load balancing policy being used.
- For instance, an "active" load balancing policy such as Round Robin attempts to
- maintain a constant connection, while the default "pick first" policy delays
- connection until an RPC is executed. Instead of using the WithBlock option, which
- may not be recommended in some cases, you can call the
- [`ClientConn.Connect`](https://pkg.go.dev/google.golang.org/grpc#ClientConn.Connect)
- method to explicitly initiate a connection.
-
- ### Using `FailOnNonTempDialError`, `WithBlock`, and `WithReturnConnectionError`
-
- The gRPC API provides several options that can be used to configure the behavior
- of dialing and connecting to a gRPC server. Some of these options, such as
- `FailOnNonTempDialError`, `WithBlock`, and `WithReturnConnectionError`, rely on
- failures at dial time. However, we strongly discourage developers from using
- these options, as they can introduce race conditions and result in unreliable
- and difficult-to-debug code.
-
- One of the most important reasons for avoiding these options, which is often
- overlooked, is that connections can fail at any point in time. This means that
- you need to handle RPC failures caused by connection issues, regardless of
- whether a connection was never established in the first place, or if it was
- created and then immediately lost. Implementing proper error handling for RPCs
- is crucial for maintaining the reliability and stability of your gRPC
- communication.
-
- ### Why we discourage using `FailOnNonTempDialError`, `WithBlock`, and `WithReturnConnectionError`
-
- When a client attempts to connect to a gRPC server, it can encounter a variety
- of errors, including network connectivity issues, server-side errors, and
- incorrect usage of the gRPC API. The options `FailOnNonTempDialError`,
- `WithBlock`, and `WithReturnConnectionError` are designed to handle some of
- these errors, but they do so by relying on failures at dial time. This means
- that they may not provide reliable or accurate information about the status of
- the connection.
-
- For example, if a client uses `WithBlock` to wait for a connection to be
- established, it may end up waiting indefinitely if the server is not responding.
- Similarly, if a client uses `WithReturnConnectionError` to return a connection
- error if dialing fails, it may miss opportunities to recover from transient
- network issues that are resolved shortly after the initial dial attempt.
+ represents the virtual connection to the server. The `ClientConn` contains one
+ or more actual connections to real servers and attempts to maintain these
+ connections by automatically reconnecting to them when they break. `NewClient`
+ was introduced in gRPC-Go v1.63.
+
+ ### The wrong way: `grpc.Dial`
+
+ [`grpc.Dial`](https://pkg.go.dev/google.golang.org/grpc#Dial) is a deprecated
+ function that also creates the same virtual connection pool as `grpc.NewClient`.
+ However, unlike `grpc.NewClient`, it immediately starts connecting and supports
+ a few additional `DialOption`s that control this initial connection attempt.
+ These are: `WithBlock`, `WithTimeout`, `WithReturnConnectionError`, and
+ `FailOnNonTempDialError`.
+
+ That `grpc.Dial` creates connections immediately is not a problem in and of
+ itself, but this behavior differs from how gRPC works in all other languages,
+ and it can be convenient to have a constructor that does not perform I/O. It
+ can also be confusing to users, as most people expect a function called `Dial`
+ to create _a_ connection which may need to be recreated if it is lost.
+
+ `grpc.Dial` uses "passthrough" as the default name resolver for backward
+ compatibility while `grpc.NewClient` uses "dns" as its default name resolver.
+ This subtle difference is important to legacy systems that also specified a
+ custom dialer and expected it to receive the target string directly.
+
+ For these reasons, using `grpc.Dial` is discouraged. Even though it is marked
+ as deprecated, we will continue to support it until a v2 is released (and no
+ plans for a v2 exist at the time this was written).
+
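The resolver difference described above matters mostly when migrating old `Dial` call sites that pass bare `host:port` targets to a custom dialer. One way to keep the old behavior is to spell out the `passthrough:///` scheme in the target before handing it to `grpc.NewClient`. The sketch below shows only that string rewrite, using the standard library; `withPassthrough` and the `knownSchemes` list are hypothetical names and assumptions made for illustration, not part of gRPC-Go, and the scheme detection is deliberately simplistic.

```go
package main

import (
	"fmt"
	"strings"
)

// knownSchemes is an illustrative, incomplete list of resolver schemes
// that may appear in the "scheme:endpoint" form without "//".
// Assumption: adjust it to the resolvers registered in your binary.
var knownSchemes = map[string]bool{
	"dns":           true,
	"unix":          true,
	"unix-abstract": true,
	"passthrough":   true,
}

// withPassthrough (hypothetical helper) returns target unchanged when it
// already names a scheme, and otherwise prefixes "passthrough:///" so the
// target string is handed to the dialer as-is.
func withPassthrough(target string) string {
	// Targets such as "dns:///host:443" or "xds:///service" carry a scheme.
	if strings.Contains(target, "://") {
		return target
	}
	// Targets such as "unix:/tmp/grpc.sock" use the short scheme form.
	if i := strings.Index(target, ":"); i > 0 && knownSchemes[target[:i]] {
		return target
	}
	return "passthrough:///" + target
}

func main() {
	for _, t := range []string{"localhost:1234", "dns:///example.com:443", "unix:/tmp/grpc.sock"} {
		fmt.Printf("%q -> %q\n", t, withPassthrough(t))
	}
}
```

The rewritten string would then be passed as the target argument of `grpc.NewClient`. If no custom dialer is involved, leaving the target alone and accepting the "dns" default is usually the simpler choice.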
+ ### Especially bad: using deprecated `DialOptions`
+
+ `FailOnNonTempDialError`, `WithBlock`, and `WithReturnConnectionError` are three
+ `DialOption`s that are only supported by `Dial` because they only affect the
+ behavior of `Dial` itself. `WithBlock` causes `Dial` to wait until the
+ `ClientConn` reports its `State` as `connectivity.Ready`. The other two deal
+ with returning connection errors before the timeout (`WithTimeout` or on the
+ context when using `DialContext`).
+
+ The reason these options can be a problem is that connections with a
+ `ClientConn` are dynamic -- they may come and go over time. If your client
+ successfully connects, the server could go down 1 second later, and your RPCs
+ will fail. "Knowing you are connected" does not tell you much in this regard.
+
+ Additionally, _all_ RPCs created on an "idle" or a "connecting" `ClientConn`
+ will wait until their deadline or until a connection is established before
+ failing. This means that you don't need to check that a `ClientConn` is "ready"
+ before starting your RPCs. By default, RPCs will fail if the `ClientConn`
+ enters the "transient failure" state, but setting `WaitForReady(true)` on a
+ call will cause it to queue even in the "transient failure" state, and it will
+ only ever fail due to a deadline, a server response, or a connection loss after
+ the RPC was sent to a server.
+
+ Some users of `Dial` use it as a way to validate the configuration of their
+ system. If you wish to maintain this behavior but migrate to `NewClient`, you
+ can call `GetState` and `WaitForStateChange` until the channel is connected.
+ However, if this fails, it does not mean that your configuration was bad - it
+ could also mean the service is not reachable by the client due to connectivity
+ reasons.

  ## Best practices for error handling in gRPC

  Instead of relying on failures at dial time, we strongly encourage developers to
- rely on errors from RPCs. When a client makes an RPC, it can receive an error
- response from the server. These errors can provide valuable information about
+ rely on errors from RPCs. When a client makes an RPC, it can receive an error
+ response from the server. These errors can provide valuable information about
  what went wrong, including information about network issues, server-side errors,
  and incorrect usage of the gRPC API.

  By handling errors from RPCs correctly, developers can write more reliable and
- robust gRPC applications. Here are some best practices for error handling in
+ robust gRPC applications. Here are some best practices for error handling in
  gRPC:

- - Always check for error responses from RPCs and handle them appropriately.
- - Use the `status` field of the error response to determine the type of error that
-   occurred.
+ - Always check for error responses from RPCs and handle them appropriately.
+ - Use the `status` field of the error response to determine the type of error
+   that occurred.
  - When retrying failed RPCs, consider using the built-in retry mechanism
    provided by gRPC-Go, if available, instead of manually implementing retries.
    Refer to the [gRPC-Go retry example
    documentation](https://github.com/grpc/grpc-go/blob/master/examples/features/retry/README.md)
-   for more information.
- - Avoid using `FailOnNonTempDialError`, `WithBlock`, and
-   `WithReturnConnectionError`, as these options can introduce race conditions and
-   result in unreliable and difficult-to-debug code.
- - If making the outgoing RPC in order to handle an incoming RPC, be sure to
-   translate the status code before returning the error from your method handler.
-   For example, if the error is an `INVALID_ARGUMENT` error, that probably means
+   for more information. Note that this is not a substitute for client-side
+   retries as errors that occur after an RPC starts on a server cannot be
+   retried through gRPC's built-in mechanism.
+ - If making an outgoing RPC from a server handler, be sure to translate the
+   status code before returning the error from your method handler. For example,
+   if the error is an `INVALID_ARGUMENT` status code, that probably means
    your service has a bug (otherwise it shouldn't have triggered this error), in
    which case `INTERNAL` is more appropriate to return back to your users.

@@ -106,7 +100,7 @@ gRPC:
  The following code snippet demonstrates how to handle errors from an RPC in
  gRPC:

- ```go
+ ```go
  ctx, cancel := context.WithTimeout(context.Background(), time.Second)
  defer cancel()

@@ -118,89 +112,72 @@ if err != nil {
  	return nil, err
  }

- // Use the response as appropriate
+ // Use the response as appropriate
  log.Printf("MyRPC response: %v", res)
  ```

  To determine the type of error that occurred, you can use the status field of
  the error response:

-
  ```go
- resp, err := client.MakeRPC(context.Background(), request)
+ resp, err := client.MakeRPC(context.TODO(), request)
  if err != nil {
- 	status, ok := status.FromError(err)
- 	if ok {
- 		// Handle the error based on its status code
+ 	if status, ok := status.FromError(err); ok {
+ 		// Handle the error based on its status code
  		if status.Code() == codes.NotFound {
  			log.Println("Requested resource not found")
  		} else {
  			log.Printf("RPC error: %v", status.Message())
  		}
  	} else {
- 		// Handle non-RPC errors
+ 		// Handle non-RPC errors
  		log.Printf("Non-RPC error: %v", err)
  	}
  	return
- }
+ }

- // Use the response as needed
- log.Printf("Response received: %v", resp)
+ // Use the response as needed
+ log.Printf("Response received: %v", resp)
  ```

  ### Example: Using a backoff strategy

-
  When retrying failed RPCs, use a backoff strategy to avoid overwhelming the
  server or exacerbating network issues:

-
- ```go
+ ```go
  var res *MyResponse
  var err error

- // If the user doesn't have a context with a deadline, create one
- ctx, cancel := context.WithTimeout(context.Background(), time.Second)
- defer cancel()
+ retryableStatusCodes := map[codes.Code]bool{
+ 	codes.Unavailable: true, // etc
+ }

- // Retry the RPC call a maximum number of times
+ // Retry the RPC a maximum number of times.
  for i := 0; i < maxRetries; i++ {
-
- 	// Make the RPC call
- 	res, err = client.MyRPC(ctx, &MyRequest{})
-
- 	// Check if the RPC call was successful
- 	if err == nil {
- 		// The RPC was successful, so break out of the loop
+ 	// Make the RPC.
+ 	res, err = client.MyRPC(context.TODO(), &MyRequest{})
+
+ 	// Check if the RPC was successful.
+ 	if !retryableStatusCodes[status.Code(err)] {
+ 		// The RPC was successful or errored in a non-retryable way;
+ 		// do not retry.
  		break
  	}
-
- 	// The RPC failed, so wait for a backoff period before retrying
- 	backoff := time.Duration(i) * time.Second
+
+ 	// The RPC is retryable; wait for a backoff period before retrying.
+ 	backoff := time.Duration(i+1) * time.Second
  	log.Printf("Error calling MyRPC: %v; retrying in %v", err, backoff)
  	time.Sleep(backoff)
  }

- // Check if the RPC call was successful after all retries
+ // Check if the RPC was successful after all retries.
  if err != nil {
  	// All retries failed, so handle the error appropriately
  	log.Printf("Error calling MyRPC: %v", err)
  	return nil, err
  }

- // Use the response as appropriate
+ // Use the response as appropriate.
  log.Printf("MyRPC response: %v", res)
  ```
-
-
- ## Conclusion
-
- The
- [`FailOnNonTempDialError`](https://pkg.go.dev/google.golang.org/grpc#FailOnNonTempDialError),
- [`WithBlock`](https://pkg.go.dev/google.golang.org/grpc#WithBlock), and
- [`WithReturnConnectionError`](https://pkg.go.dev/google.golang.org/grpc#WithReturnConnectionError)
- options are designed to handle errors at dial time, but they can introduce race
- conditions and result in unreliable and difficult-to-debug code. Instead of
- relying on these options, we strongly encourage developers to rely on errors
- from RPCs for error handling. By following best practices for error handling in
- gRPC, developers can write more reliable and robust gRPC applications.