Commit 5ec2a0f
PPO-fix (#145)
* Investigating PPO crashing
* Removing debugging prints
* generalize calc_log_probs and refactor for SIL
* improve reliability of ppo and sil loss calc
* add log_prob nanguard at creation
* improve logger
* add computation logging
* improve debug logging
* use base_case_openai for test
* fix SIL log_probs
* fix singleton cont action separate AC output unit
* fix PPO weight copy
* replace clone with detach properly
* revert detach to clone to fix PPO
* typo
* refactor log_probs to policy_util
* add net arg to calc_pdparam function
* add PPOSIL
* refactor calc_pdparams in policy_util
* fix typo

1 parent 925c1d2 commit 5ec2a0f
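
As a rough illustration of the "add log_prob nanguard at creation" item in the commit message, the sketch below shows one way such a guard could work. It is written in plain Python rather than against the repository's tensors, and the names `guard_log_probs` and `ppo_ratios` are hypothetical, not taken from this commit. The assumption is that checking log-probabilities as soon as they are produced makes a NaN surface at its source, before it reaches a PPO-style probability ratio.

```python
import math


def guard_log_probs(log_probs, label="log_probs"):
    """Hypothetical guard: fail fast if any log-probability is NaN."""
    bad = [i for i, lp in enumerate(log_probs) if math.isnan(lp)]
    if bad:
        raise ValueError(f"{label} contains NaN at indices {bad}")
    return log_probs


def ppo_ratios(new_log_probs, old_log_probs):
    """Probability ratios exp(new - old), computed only from guarded inputs."""
    guard_log_probs(new_log_probs, "new_log_probs")
    guard_log_probs(old_log_probs, "old_log_probs")
    return [math.exp(n - o) for n, o in zip(new_log_probs, old_log_probs)]


if __name__ == "__main__":
    # Identical old and new log-probs give a ratio of exactly 1.
    print(ppo_ratios([-1.0, -2.0], [-1.0, -2.0]))
    try:
        ppo_ratios([float("nan")], [-1.0])
    except ValueError as err:
        print("guard tripped:", err)
```

Raising at creation, rather than inside the loss, keeps the error message close to the distribution parameters that caused it; whether the commit asserts, raises, or logs is not recoverable from this page.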
File tree
25 files changed, +721 -142 lines changed
- slm_lab
- agent
- algorithm
- net
- env
- experiment
- lib
- spec
- test
- spec