mlstm_word2vec_embedding.log
(root)[junfeng@heartmaster reasoning_attention]$ python snli_match_lstm.py
Loading data ...
Loading train ...
550152
550149
549364
Loading dev ...
10000
10000
9842
Loading test ...
10000
10000
9824
num_epochs: 20
k: 300
batch_size: 30
display_frequency: 100
save_frequency: 1000
load previous: False
Building network ...
unchanged_W.shape: (34283, 300)
oov_in_train_W.shape: (9166, 300)
apply dropout mask id 140256215453992 to embedding matrix ...
dropout rate is 0.3
input var is hypo_var
apply dropout mask id 140256215453992 to embedding matrix ...
dropout rate is 0.3
input var is premise_var
Computing updates ...
Compiling functions ...
Training ...
train_df.shape: (549364, 4)
dev_df.shape: (9842, 4)
test_df.shape: (9824, 4)
Starting training...
Seen 3000 samples, time used: 113.717s
current training loss: 1.055556
current training accuracy: 0.457000
Seen 6000 samples, time used: 116.094s
current training loss: 1.039166
current training accuracy: 0.480833
Seen 9000 samples, time used: 113.380s
current training loss: 1.020142
current training accuracy: 0.503778
Seen 12000 samples, time used: 113.598s
current training loss: 1.008102
current training accuracy: 0.515500
Seen 15000 samples, time used: 112.010s
current training loss: 1.000636
current training accuracy: 0.522933
Seen 18000 samples, time used: 112.207s
current training loss: 0.990290
current training accuracy: 0.533111
Seen 21000 samples, time used: 114.853s
current training loss: 0.985887
current training accuracy: 0.538429
Seen 24000 samples, time used: 114.861s
current training loss: 0.981200
current training accuracy: 0.543292
Seen 27000 samples, time used: 117.284s
current training loss: 0.975245
current training accuracy: 0.548519
Seen 30000 samples, time used: 116.270s
current training loss: 0.968526
current training accuracy: 0.554067
saving to ..., time used: 1144.274s
Seen 33000 samples, time used: 115.634s
current training loss: 0.965249
current training accuracy: 0.557879
Seen 36000 samples, time used: 119.970s
current training loss: 0.961199
current training accuracy: 0.560056
Seen 39000 samples, time used: 121.854s
current training loss: 0.956373
current training accuracy: 0.563282
Seen 42000 samples, time used: 121.315s
current training loss: 0.952917
current training accuracy: 0.565857
Seen 45000 samples, time used: 118.552s
current training loss: 0.948475
current training accuracy: 0.568644
Seen 48000 samples, time used: 118.751s
current training loss: 0.944854
current training accuracy: 0.571667
Seen 51000 samples, time used: 118.196s
current training loss: 0.941263
current training accuracy: 0.574314
Seen 54000 samples, time used: 121.252s
current training loss: 0.939563
current training accuracy: 0.575037
Seen 57000 samples, time used: 118.232s
current training loss: 0.937118
current training accuracy: 0.576614
Seen 60000 samples, time used: 118.307s
current training loss: 0.934130
current training accuracy: 0.578683
saving to ..., time used: 1191.687s
Seen 63000 samples, time used: 120.935s
current training loss: 0.931266
current training accuracy: 0.580302
Seen 66000 samples, time used: 120.415s
current training loss: 0.928540
current training accuracy: 0.581909
Seen 69000 samples, time used: 119.993s
current training loss: 0.926223
current training accuracy: 0.583493
Seen 72000 samples, time used: 119.140s
current training loss: 0.923555
current training accuracy: 0.585194
Seen 75000 samples, time used: 119.911s
current training loss: 0.921569
current training accuracy: 0.586320
Seen 78000 samples, time used: 121.211s
current training loss: 0.919275
current training accuracy: 0.588218
Seen 81000 samples, time used: 123.604s
current training loss: 0.916753
current training accuracy: 0.589827
Seen 84000 samples, time used: 118.953s
current training loss: 0.915034
current training accuracy: 0.591060
Seen 87000 samples, time used: 121.948s
current training loss: 0.913255
current training accuracy: 0.591874
Seen 90000 samples, time used: 120.588s
current training loss: 0.911505
current training accuracy: 0.592922
saving to ..., time used: 1206.219s
Seen 93000 samples, time used: 120.409s
current training loss: 0.909816
current training accuracy: 0.594108
Seen 96000 samples, time used: 118.475s
current training loss: 0.907884
current training accuracy: 0.595469
Seen 99000 samples, time used: 122.721s
current training loss: 0.906557
current training accuracy: 0.596333
Seen 102000 samples, time used: 121.617s
current training loss: 0.905518
current training accuracy: 0.597529
Seen 105000 samples, time used: 116.831s
current training loss: 0.903430
current training accuracy: 0.598905
Seen 108000 samples, time used: 119.690s
current training loss: 0.902281
current training accuracy: 0.599750
Seen 111000 samples, time used: 119.180s
current training loss: 0.900586
current training accuracy: 0.600811
Seen 114000 samples, time used: 122.361s
current training loss: 0.899388
current training accuracy: 0.601693
Seen 117000 samples, time used: 119.394s
current training loss: 0.897500
current training accuracy: 0.602991
Seen 120000 samples, time used: 122.555s
current training loss: 0.895813
current training accuracy: 0.604008
saving to ..., time used: 1202.652s
Seen 123000 samples, time used: 118.910s
current training loss: 0.894277
current training accuracy: 0.604789
Seen 126000 samples, time used: 123.527s
current training loss: 0.893011
current training accuracy: 0.605310
Seen 129000 samples, time used: 122.258s
current training loss: 0.891226
current training accuracy: 0.606519
Seen 132000 samples, time used: 123.195s
current training loss: 0.889625
current training accuracy: 0.607720
Seen 135000 samples, time used: 125.319s
current training loss: 0.888271
current training accuracy: 0.608607
Seen 138000 samples, time used: 121.250s
current training loss: 0.886478
current training accuracy: 0.609891
Seen 141000 samples, time used: 122.738s
current training loss: 0.884963
current training accuracy: 0.610929
Seen 144000 samples, time used: 120.876s
current training loss: 0.883712
current training accuracy: 0.611743
Seen 147000 samples, time used: 119.668s
current training loss: 0.882778
current training accuracy: 0.612279
Seen 150000 samples, time used: 120.729s
current training loss: 0.881116
current training accuracy: 0.613420
saving to ..., time used: 1218.269s
Seen 153000 samples, time used: 121.819s
current training loss: 0.879520
current training accuracy: 0.614477
Seen 156000 samples, time used: 125.764s
current training loss: 0.877817
current training accuracy: 0.615628
Seen 159000 samples, time used: 124.872s
current training loss: 0.876970
current training accuracy: 0.616170
Seen 162000 samples, time used: 124.079s
current training loss: 0.875892
current training accuracy: 0.616840
Seen 165000 samples, time used: 122.320s
current training loss: 0.874768
current training accuracy: 0.617679
Seen 168000 samples, time used: 120.716s
current training loss: 0.873780
current training accuracy: 0.618452
Seen 171000 samples, time used: 121.592s
current training loss: 0.872167
current training accuracy: 0.619322
Seen 174000 samples, time used: 122.147s
current training loss: 0.870908
current training accuracy: 0.620149
Seen 177000 samples, time used: 125.237s
current training loss: 0.869541
current training accuracy: 0.621028
Seen 180000 samples, time used: 121.913s
current training loss: 0.868122
current training accuracy: 0.621939
saving to ..., time used: 1230.259s
Seen 183000 samples, time used: 123.007s
current training loss: 0.866726
current training accuracy: 0.622689
Seen 186000 samples, time used: 121.683s
current training loss: 0.865618
current training accuracy: 0.623344
Seen 189000 samples, time used: 121.089s
current training loss: 0.864227
current training accuracy: 0.624243
Seen 192000 samples, time used: 123.213s
current training loss: 0.863083
current training accuracy: 0.624938
Seen 195000 samples, time used: 122.517s
current training loss: 0.862133
current training accuracy: 0.625569
Seen 198000 samples, time used: 123.873s
current training loss: 0.860529
current training accuracy: 0.626424
Seen 201000 samples, time used: 121.525s
current training loss: 0.859022
current training accuracy: 0.627363
Seen 204000 samples, time used: 120.627s
current training loss: 0.857605
current training accuracy: 0.628299
Seen 207000 samples, time used: 121.792s
current training loss: 0.856297
current training accuracy: 0.629063
Seen 210000 samples, time used: 122.873s
current training loss: 0.854969
current training accuracy: 0.629895
saving to ..., time used: 1221.999s
Seen 213000 samples, time used: 122.960s
current training loss: 0.853590
current training accuracy: 0.630493
Seen 216000 samples, time used: 122.127s
current training loss: 0.852267
current training accuracy: 0.631301
Seen 219000 samples, time used: 123.525s
current training loss: 0.850964
current training accuracy: 0.632000
Seen 222000 samples, time used: 123.463s
current training loss: 0.849976
current training accuracy: 0.632707
Seen 225000 samples, time used: 121.915s
current training loss: 0.848996
current training accuracy: 0.633178
Seen 228000 samples, time used: 122.220s
current training loss: 0.847807
current training accuracy: 0.633908
Seen 231000 samples, time used: 124.871s
current training loss: 0.846406
current training accuracy: 0.634810
Seen 234000 samples, time used: 121.232s
current training loss: 0.845046
current training accuracy: 0.635727
Seen 237000 samples, time used: 120.771s
current training loss: 0.843380
current training accuracy: 0.636751
Seen 240000 samples, time used: 121.574s
current training loss: 0.841685
current training accuracy: 0.637658
saving to ..., time used: 1224.459s
Seen 243000 samples, time used: 123.243s
current training loss: 0.840422
current training accuracy: 0.638494
Seen 246000 samples, time used: 120.193s
current training loss: 0.839191
current training accuracy: 0.639305
Seen 249000 samples, time used: 121.114s
current training loss: 0.838233
current training accuracy: 0.639924
Seen 252000 samples, time used: 121.881s
current training loss: 0.836944
current training accuracy: 0.640790
Seen 255000 samples, time used: 123.720s
current training loss: 0.835541
current training accuracy: 0.641604
Seen 258000 samples, time used: 124.214s
current training loss: 0.834274
current training accuracy: 0.642384
Seen 261000 samples, time used: 121.748s
current training loss: 0.833093
current training accuracy: 0.643188
Seen 264000 samples, time used: 124.034s
current training loss: 0.831732
current training accuracy: 0.643924
Seen 267000 samples, time used: 124.763s
current training loss: 0.830350
current training accuracy: 0.644708
Seen 270000 samples, time used: 121.751s
current training loss: 0.829407
current training accuracy: 0.645374
saving to ..., time used: 1226.460s
Seen 273000 samples, time used: 122.118s
current training loss: 0.828195
current training accuracy: 0.646114
Seen 276000 samples, time used: 119.052s
current training loss: 0.826938
current training accuracy: 0.646855
Seen 279000 samples, time used: 122.783s
current training loss: 0.825529
current training accuracy: 0.647581
Seen 282000 samples, time used: 123.553s
current training loss: 0.824360
current training accuracy: 0.648316
Seen 285000 samples, time used: 121.666s
current training loss: 0.823088
current training accuracy: 0.649151
Seen 288000 samples, time used: 125.444s
current training loss: 0.821680
current training accuracy: 0.650042
Seen 291000 samples, time used: 123.149s
current training loss: 0.820333
current training accuracy: 0.650828
Seen 294000 samples, time used: 123.951s
current training loss: 0.819042
current training accuracy: 0.651650
Seen 297000 samples, time used: 123.375s
current training loss: 0.817833
current training accuracy: 0.652508
Seen 300000 samples, time used: 123.242s
current training loss: 0.816582
current training accuracy: 0.653283
saving to ..., time used: 1228.130s
Seen 303000 samples, time used: 125.682s
current training loss: 0.815524
current training accuracy: 0.653947
Seen 306000 samples, time used: 124.738s
current training loss: 0.814327
current training accuracy: 0.654680
Seen 309000 samples, time used: 125.369s
current training loss: 0.813227
current training accuracy: 0.655359
Seen 312000 samples, time used: 122.588s
current training loss: 0.811997
current training accuracy: 0.656109
Seen 315000 samples, time used: 121.939s
current training loss: 0.810775
current training accuracy: 0.656860
Seen 318000 samples, time used: 125.391s
current training loss: 0.809612
current training accuracy: 0.657538
Seen 321000 samples, time used: 122.555s
current training loss: 0.808341
current training accuracy: 0.658287
Seen 324000 samples, time used: 118.184s
current training loss: 0.807036
current training accuracy: 0.659056
Seen 327000 samples, time used: 124.918s
current training loss: 0.805978
current training accuracy: 0.659691
Seen 330000 samples, time used: 124.013s
current training loss: 0.805030
current training accuracy: 0.660261
saving to ..., time used: 1235.156s
Seen 333000 samples, time used: 123.029s
current training loss: 0.803763
current training accuracy: 0.660988
Seen 336000 samples, time used: 124.069s
current training loss: 0.802506
current training accuracy: 0.661655
Seen 339000 samples, time used: 122.483s
current training loss: 0.801164
current training accuracy: 0.662378
Seen 342000 samples, time used: 123.777s
current training loss: 0.799946
current training accuracy: 0.663073
Seen 345000 samples, time used: 122.113s
current training loss: 0.798706
current training accuracy: 0.663806
Seen 348000 samples, time used: 126.011s
current training loss: 0.797484
current training accuracy: 0.664468
Seen 351000 samples, time used: 122.982s
current training loss: 0.796238
current training accuracy: 0.665217
Seen 354000 samples, time used: 122.017s
current training loss: 0.794959
current training accuracy: 0.665949
Seen 357000 samples, time used: 122.439s
current training loss: 0.793822
current training accuracy: 0.666625
Seen 360000 samples, time used: 123.041s
current training loss: 0.792745
current training accuracy: 0.667250
saving to ..., time used: 1231.761s
Seen 363000 samples, time used: 123.784s
current training loss: 0.791565
current training accuracy: 0.667871
Seen 366000 samples, time used: 126.724s
current training loss: 0.790328
current training accuracy: 0.668557
Seen 369000 samples, time used: 123.305s
current training loss: 0.789312
current training accuracy: 0.669184
Seen 372000 samples, time used: 124.038s
current training loss: 0.788230
current training accuracy: 0.669844
Seen 375000 samples, time used: 125.794s
current training loss: 0.787027
current training accuracy: 0.670541
Seen 378000 samples, time used: 123.254s
current training loss: 0.785968
current training accuracy: 0.671093
Seen 381000 samples, time used: 126.057s
current training loss: 0.784709
current training accuracy: 0.671864
Seen 384000 samples, time used: 121.305s
current training loss: 0.783628
current training accuracy: 0.672563
Seen 387000 samples, time used: 125.177s
current training loss: 0.782466
current training accuracy: 0.673194
Seen 390000 samples, time used: 123.002s
current training loss: 0.781434
current training accuracy: 0.673728
saving to ..., time used: 1242.239s
Seen 393000 samples, time used: 123.856s
current training loss: 0.780396
current training accuracy: 0.674349
Seen 396000 samples, time used: 124.928s
current training loss: 0.779411
current training accuracy: 0.674917
Seen 399000 samples, time used: 125.600s
current training loss: 0.778450
current training accuracy: 0.675559
Seen 402000 samples, time used: 124.035s
current training loss: 0.777556
current training accuracy: 0.676112
Seen 405000 samples, time used: 124.812s
current training loss: 0.776470
current training accuracy: 0.676714
Seen 408000 samples, time used: 123.139s
current training loss: 0.775512
current training accuracy: 0.677257
Seen 411000 samples, time used: 123.532s
current training loss: 0.774459
current training accuracy: 0.677888
Seen 414000 samples, time used: 126.011s
current training loss: 0.773531
current training accuracy: 0.678367
Seen 417000 samples, time used: 121.756s
current training loss: 0.772426
current training accuracy: 0.678921
Seen 420000 samples, time used: 125.095s
current training loss: 0.771377
current training accuracy: 0.679455
saving to ..., time used: 1242.562s
Seen 423000 samples, time used: 124.254s
current training loss: 0.770408
current training accuracy: 0.680000
Seen 426000 samples, time used: 123.470s
current training loss: 0.769403
current training accuracy: 0.680542
Seen 429000 samples, time used: 122.641s
current training loss: 0.768523
current training accuracy: 0.681096
Seen 432000 samples, time used: 122.907s
current training loss: 0.767416
current training accuracy: 0.681660
Seen 435000 samples, time used: 121.848s
current training loss: 0.766504
current training accuracy: 0.682126
Seen 438000 samples, time used: 124.139s
current training loss: 0.765672
current training accuracy: 0.682543
Seen 441000 samples, time used: 125.023s
current training loss: 0.764798
current training accuracy: 0.683091
Seen 444000 samples, time used: 125.742s
current training loss: 0.763911
current training accuracy: 0.683653
Seen 447000 samples, time used: 124.462s
current training loss: 0.763041
current training accuracy: 0.684105
Seen 450000 samples, time used: 123.657s
current training loss: 0.761980
current training accuracy: 0.684716
saving to ..., time used: 1237.939s
Seen 453000 samples, time used: 122.279s
current training loss: 0.760903
current training accuracy: 0.685283
Seen 456000 samples, time used: 122.928s
current training loss: 0.759881
current training accuracy: 0.685849
Seen 459000 samples, time used: 120.162s
current training loss: 0.759038
current training accuracy: 0.686296
Seen 462000 samples, time used: 125.776s
current training loss: 0.758042
current training accuracy: 0.686866
Seen 465000 samples, time used: 123.477s
current training loss: 0.757185
current training accuracy: 0.687357
Seen 468000 samples, time used: 124.706s
current training loss: 0.756206
current training accuracy: 0.687897
Seen 471000 samples, time used: 122.313s
current training loss: 0.755157
current training accuracy: 0.688427
Seen 474000 samples, time used: 122.192s
current training loss: 0.754370
current training accuracy: 0.688867
Seen 477000 samples, time used: 125.028s
current training loss: 0.753586
current training accuracy: 0.689289
Seen 480000 samples, time used: 124.759s
current training loss: 0.752657
current training accuracy: 0.689821
saving to ..., time used: 1233.419s
Seen 483000 samples, time used: 121.838s
current training loss: 0.751635
current training accuracy: 0.690321
Seen 486000 samples, time used: 123.389s
current training loss: 0.750671
current training accuracy: 0.690879
Seen 489000 samples, time used: 125.393s
current training loss: 0.749880
current training accuracy: 0.691317
Seen 492000 samples, time used: 125.058s
current training loss: 0.749010
current training accuracy: 0.691785
Seen 495000 samples, time used: 121.820s
current training loss: 0.748090
current training accuracy: 0.692313
Seen 498000 samples, time used: 123.121s
current training loss: 0.747257
current training accuracy: 0.692759
Seen 501000 samples, time used: 124.257s
current training loss: 0.746500
current training accuracy: 0.693140
Seen 504000 samples, time used: 124.249s
current training loss: 0.745535
current training accuracy: 0.693692
Seen 507000 samples, time used: 123.955s
current training loss: 0.744505
current training accuracy: 0.694233
Seen 510000 samples, time used: 127.118s
current training loss: 0.743662
current training accuracy: 0.694622
saving to ..., time used: 1239.996s
Seen 513000 samples, time used: 124.177s
current training loss: 0.742706
current training accuracy: 0.695170
Seen 516000 samples, time used: 125.315s
current training loss: 0.741841
current training accuracy: 0.695636
Seen 519000 samples, time used: 123.407s
current training loss: 0.740943
current training accuracy: 0.696135
Seen 522000 samples, time used: 126.236s
current training loss: 0.740193
current training accuracy: 0.696544
Seen 525000 samples, time used: 125.235s
current training loss: 0.739390
current training accuracy: 0.696977
Seen 528000 samples, time used: 124.031s
current training loss: 0.738405
current training accuracy: 0.697528
Seen 531000 samples, time used: 123.827s
current training loss: 0.737563
current training accuracy: 0.697994
Seen 534000 samples, time used: 124.134s
current training loss: 0.736699
current training accuracy: 0.698451
Seen 537000 samples, time used: 126.495s
current training loss: 0.735948
current training accuracy: 0.698888
Seen 540000 samples, time used: 126.930s
current training loss: 0.735264
current training accuracy: 0.699187
saving to ..., time used: 1249.585s
Seen 543000 samples, time used: 119.587s
current training loss: 0.734478
current training accuracy: 0.699621
Seen 546000 samples, time used: 128.906s
current training loss: 0.733757
current training accuracy: 0.699978
Seen 549000 samples, time used: 123.525s
current training loss: 0.732967
current training accuracy: 0.700421
Epoch 1 of 20 took 22428.237s
training loss: 0.732875
training accuracy: 70.05 %
validation loss: 0.533690
validation accuracy: 79.24 %
Seen 3000 samples, time used: 125.579s
current training loss: 0.557400
current training accuracy: 0.792000
Seen 6000 samples, time used: 122.986s
current training loss: 0.567410
current training accuracy: 0.790000
Seen 9000 samples, time used: 124.620s
current training loss: 0.571989
current training accuracy: 0.787556
Seen 12000 samples, time used: 124.700s
current training loss: 0.570846
current training accuracy: 0.789833
Seen 15000 samples, time used: 127.579s
current training loss: 0.567116
current training accuracy: 0.791333
Seen 18000 samples, time used: 123.973s
current training loss: 0.566237
current training accuracy: 0.792167
Seen 21000 samples, time used: 123.974s
current training loss: 0.567234
current training accuracy: 0.792095
Seen 24000 samples, time used: 123.111s
current training loss: 0.563961
current training accuracy: 0.793083
Seen 27000 samples, time used: 125.848s
current training loss: 0.564304
current training accuracy: 0.792815
Seen 30000 samples, time used: 123.011s
current training loss: 0.566729
current training accuracy: 0.791367
saving to ..., time used: 1245.384s
Seen 33000 samples, time used: 123.351s
current training loss: 0.566130
current training accuracy: 0.791545
Seen 36000 samples, time used: 123.709s
current training loss: 0.564811
current training accuracy: 0.791889
Seen 39000 samples, time used: 124.315s
current training loss: 0.562805
current training accuracy: 0.792641
Seen 42000 samples, time used: 126.277s
current training loss: 0.562145
current training accuracy: 0.792762
Seen 45000 samples, time used: 123.056s
current training loss: 0.563214
current training accuracy: 0.792311
Seen 48000 samples, time used: 125.306s
current training loss: 0.562709
current training accuracy: 0.792833
Seen 51000 samples, time used: 123.609s
current training loss: 0.562533
current training accuracy: 0.792843
Seen 54000 samples, time used: 124.674s
current training loss: 0.562634
current training accuracy: 0.792741
Seen 57000 samples, time used: 129.124s
current training loss: 0.563577
current training accuracy: 0.792123
Seen 60000 samples, time used: 122.974s
current training loss: 0.562541
current training accuracy: 0.793033
saving to ..., time used: 1246.181s
Seen 63000 samples, time used: 123.382s
current training loss: 0.562816
current training accuracy: 0.792746
Seen 66000 samples, time used: 127.295s
current training loss: 0.563294
current training accuracy: 0.792348
Seen 69000 samples, time used: 127.964s
current training loss: 0.564582
current training accuracy: 0.791870
Seen 72000 samples, time used: 125.874s
current training loss: 0.564426
current training accuracy: 0.792181
Seen 75000 samples, time used: 128.213s
current training loss: 0.563493
current training accuracy: 0.792507
Seen 78000 samples, time used: 123.910s
current training loss: 0.563100
current training accuracy: 0.792538
Seen 81000 samples, time used: 123.107s
current training loss: 0.563725
current training accuracy: 0.792667
Seen 84000 samples, time used: 122.250s
current training loss: 0.563254
current training accuracy: 0.792893
Seen 87000 samples, time used: 124.313s
current training loss: 0.563425
current training accuracy: 0.792966
Seen 90000 samples, time used: 121.288s
current training loss: 0.563915
current training accuracy: 0.792611
saving to ..., time used: 1247.393s
Seen 93000 samples, time used: 124.726s
current training loss: 0.563595
current training accuracy: 0.792989
Seen 96000 samples, time used: 126.783s
current training loss: 0.562982
current training accuracy: 0.793250
Seen 99000 samples, time used: 127.218s
current training loss: 0.563152
current training accuracy: 0.793263
Seen 102000 samples, time used: 125.325s
current training loss: 0.562477
current training accuracy: 0.793510
Seen 105000 samples, time used: 122.345s
current training loss: 0.562052
current training accuracy: 0.793667
Seen 108000 samples, time used: 125.245s
current training loss: 0.561585
current training accuracy: 0.793889
Seen 111000 samples, time used: 125.343s
current training loss: 0.561511
current training accuracy: 0.794207
Seen 114000 samples, time used: 123.046s
current training loss: 0.560873
current training accuracy: 0.794526
Seen 117000 samples, time used: 123.274s
current training loss: 0.560337
current training accuracy: 0.794786
Seen 120000 samples, time used: 124.271s
current training loss: 0.560475
current training accuracy: 0.794792
saving to ..., time used: 1247.120s
Seen 123000 samples, time used: 123.645s
current training loss: 0.560191
current training accuracy: 0.794756
Seen 126000 samples, time used: 121.857s
current training loss: 0.559793
current training accuracy: 0.795048
Seen 129000 samples, time used: 124.105s
current training loss: 0.559643
current training accuracy: 0.795109
Seen 132000 samples, time used: 126.508s
current training loss: 0.559057
current training accuracy: 0.795326
Seen 135000 samples, time used: 127.752s
current training loss: 0.558942
current training accuracy: 0.795163
Seen 138000 samples, time used: 127.490s
current training loss: 0.558671
current training accuracy: 0.795406
Seen 141000 samples, time used: 124.277s
current training loss: 0.558010
current training accuracy: 0.795667
Seen 144000 samples, time used: 125.189s
current training loss: 0.557373
current training accuracy: 0.795896
Seen 147000 samples, time used: 124.958s
current training loss: 0.557018
current training accuracy: 0.795939
Seen 150000 samples, time used: 126.339s
current training loss: 0.556682
current training accuracy: 0.796233
saving to ..., time used: 1251.914s
Seen 153000 samples, time used: 122.864s
current training loss: 0.556046
current training accuracy: 0.796477
Seen 156000 samples, time used: 124.007s
current training loss: 0.555981
current training accuracy: 0.796494
Seen 159000 samples, time used: 125.496s
current training loss: 0.555637
current training accuracy: 0.796553
Seen 162000 samples, time used: 123.663s
current training loss: 0.554748
current training accuracy: 0.797025
Seen 165000 samples, time used: 122.688s
current training loss: 0.554503
current training accuracy: 0.797079
Seen 168000 samples, time used: 124.208s
current training loss: 0.553895
current training accuracy: 0.797238
Seen 171000 samples, time used: 122.991s
current training loss: 0.553506
current training accuracy: 0.797333
Seen 174000 samples, time used: 126.740s
current training loss: 0.553609
current training accuracy: 0.797253
Seen 177000 samples, time used: 127.478s
current training loss: 0.553652
current training accuracy: 0.797322
Seen 180000 samples, time used: 123.268s
current training loss: 0.553318
current training accuracy: 0.797511
saving to ..., time used: 1243.202s
Seen 183000 samples, time used: 121.653s
current training loss: 0.553227
current training accuracy: 0.797443
Seen 186000 samples, time used: 123.950s
current training loss: 0.552563
current training accuracy: 0.797742
Seen 189000 samples, time used: 122.707s
current training loss: 0.552437
current training accuracy: 0.797942
Seen 192000 samples, time used: 125.109s
current training loss: 0.551930
current training accuracy: 0.798135
Seen 195000 samples, time used: 125.598s
current training loss: 0.551489
current training accuracy: 0.798349
Seen 198000 samples, time used: 127.895s
current training loss: 0.551098
current training accuracy: 0.798616
Seen 201000 samples, time used: 124.878s
current training loss: 0.550763
current training accuracy: 0.798736
Seen 204000 samples, time used: 124.177s
current training loss: 0.550677
current training accuracy: 0.798804
Seen 207000 samples, time used: 125.565s
current training loss: 0.550874
current training accuracy: 0.798710
Seen 210000 samples, time used: 124.309s
current training loss: 0.550728
current training accuracy: 0.798781
saving to ..., time used: 1245.641s
Seen 213000 samples, time used: 122.939s
current training loss: 0.550445
current training accuracy: 0.798808
Seen 216000 samples, time used: 125.089s
current training loss: 0.549908
current training accuracy: 0.799093
Seen 219000 samples, time used: 124.284s
current training loss: 0.549557
current training accuracy: 0.799224
Seen 222000 samples, time used: 123.999s
current training loss: 0.549403
current training accuracy: 0.799320
Seen 225000 samples, time used: 133.539s
current training loss: 0.549564
current training accuracy: 0.799284
Seen 228000 samples, time used: 126.162s
current training loss: 0.549147
current training accuracy: 0.799478
Seen 231000 samples, time used: 123.660s
current training loss: 0.548734
current training accuracy: 0.799727
Seen 234000 samples, time used: 128.673s
current training loss: 0.548342
current training accuracy: 0.799932
Seen 237000 samples, time used: 124.071s
current training loss: 0.548011
current training accuracy: 0.800051
Seen 240000 samples, time used: 126.850s
current training loss: 0.547945
current training accuracy: 0.800108
saving to ..., time used: 1259.066s
Seen 243000 samples, time used: 127.058s
current training loss: 0.547607
current training accuracy: 0.800160
Seen 246000 samples, time used: 126.253s
current training loss: 0.547440
current training accuracy: 0.800130
Seen 249000 samples, time used: 125.352s
current training loss: 0.547322
current training accuracy: 0.800157
Seen 252000 samples, time used: 123.823s
current training loss: 0.546847
current training accuracy: 0.800361
Seen 255000 samples, time used: 124.388s
current training loss: 0.546619
current training accuracy: 0.800463
Seen 258000 samples, time used: 125.377s
current training loss: 0.546632
current training accuracy: 0.800535
Seen 261000 samples, time used: 128.114s
current training loss: 0.546443
current training accuracy: 0.800540
Seen 264000 samples, time used: 124.668s
current training loss: 0.546481
current training accuracy: 0.800534
Seen 267000 samples, time used: 127.507s
current training loss: 0.546292
current training accuracy: 0.800670
Seen 270000 samples, time used: 123.009s
current training loss: 0.546276
current training accuracy: 0.800704
saving to ..., time used: 1255.303s
Seen 273000 samples, time used: 123.079s
current training loss: 0.546053
current training accuracy: 0.800912
Seen 276000 samples, time used: 124.250s
current training loss: 0.545938
current training accuracy: 0.800953
Seen 279000 samples, time used: 123.827s
current training loss: 0.545683
current training accuracy: 0.801097
Seen 282000 samples, time used: 123.692s
current training loss: 0.545361
current training accuracy: 0.801209
Seen 285000 samples, time used: 124.439s
current training loss: 0.545292
current training accuracy: 0.801221
Seen 288000 samples, time used: 124.823s
current training loss: 0.544954
current training accuracy: 0.801375
Seen 291000 samples, time used: 124.521s
current training loss: 0.544681
current training accuracy: 0.801450
Seen 294000 samples, time used: 123.502s
current training loss: 0.544374
current training accuracy: 0.801578
Seen 297000 samples, time used: 124.037s
current training loss: 0.544411
current training accuracy: 0.801515
Seen 300000 samples, time used: 125.618s
current training loss: 0.544075
current training accuracy: 0.801657
saving to ..., time used: 1241.590s
Seen 303000 samples, time used: 127.595s
current training loss: 0.543950
current training accuracy: 0.801680
Seen 306000 samples, time used: 127.936s
current training loss: 0.543954
current training accuracy: 0.801739
Seen 309000 samples, time used: 123.266s
current training loss: 0.543756
current training accuracy: 0.801851
Seen 312000 samples, time used: 124.229s
current training loss: 0.543623
current training accuracy: 0.801859
Seen 315000 samples, time used: 125.194s
current training loss: 0.543538
current training accuracy: 0.801838
Seen 318000 samples, time used: 126.357s
current training loss: 0.543352
current training accuracy: 0.801950
Seen 321000 samples, time used: 125.642s
current training loss: 0.543053
current training accuracy: 0.802115
Seen 324000 samples, time used: 124.395s
current training loss: 0.542812
current training accuracy: 0.802235
Seen 327000 samples, time used: 124.673s
current training loss: 0.542491
current training accuracy: 0.802376
Seen 330000 samples, time used: 124.575s
current training loss: 0.542353
current training accuracy: 0.802467
saving to ..., time used: 1253.663s
Seen 333000 samples, time used: 128.186s
current training loss: 0.541986
current training accuracy: 0.802613
Seen 336000 samples, time used: 125.880s
current training loss: 0.541965
current training accuracy: 0.802637
Seen 339000 samples, time used: 125.476s
current training loss: 0.541835
current training accuracy: 0.802673
Seen 342000 samples, time used: 125.173s
current training loss: 0.541755
current training accuracy: 0.802620
Seen 345000 samples, time used: 124.179s
current training loss: 0.541626
current training accuracy: 0.802696
Seen 348000 samples, time used: 122.348s
current training loss: 0.541486
current training accuracy: 0.802787
Seen 351000 samples, time used: 123.786s
current training loss: 0.541309
current training accuracy: 0.802980
Seen 354000 samples, time used: 124.615s
current training loss: 0.540942
current training accuracy: 0.803198
Seen 357000 samples, time used: 123.575s
current training loss: 0.540763
current training accuracy: 0.803305
Seen 360000 samples, time used: 121.136s
current training loss: 0.540776
current training accuracy: 0.803369
saving to ..., time used: 1244.153s
Seen 363000 samples, time used: 125.144s
current training loss: 0.540984
current training accuracy: 0.803350
Seen 366000 samples, time used: 125.137s
current training loss: 0.540820
current training accuracy: 0.803484
Seen 369000 samples, time used: 124.781s
current training loss: 0.540531
current training accuracy: 0.803610
Seen 372000 samples, time used: 123.645s
current training loss: 0.540383
current training accuracy: 0.803675
Seen 375000 samples, time used: 123.889s
current training loss: 0.540344
current training accuracy: 0.803725
Seen 378000 samples, time used: 123.939s