Commit dca2e85
pdf_utils: send browser User-Agent on PDF downloads
Three submissions failed at the download step. Some publisher
servers reject Python's default urllib User-Agent, but not every
failure had that cause. Confirmed cases:
- werbos.com → HTTP 465 without UA, 200 OK with browser UA
- royalsocietypublishing.org → 403 (Cloudflare bot management;
not fixable without session cookies, needs PDF upload)
- qwen.ai/blog → not a PDF but a blog HTML page; user error
Send a real Chrome User-Agent and "Accept: application/pdf" on
every download. This unblocks werbos.com and the broader class of
sites that filter on User-Agent. The Royal Society and Qwen blog
failures are not bugs to fix in our code:
- Royal Society needs an admin PDF upload (already shipped)
- the Qwen blog URL is not a paper
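The header change described above can be sketched with the standard library's `urllib.request`. This is a minimal sketch, not the actual pdf_utils code: the exact User-Agent string and the helper names `build_pdf_request`, `looks_like_pdf`, and `download_pdf` are hypothetical, and the `%PDF-` magic-byte check (which would flag an HTML page such as the Qwen blog) is an illustrative extra the commit does not claim to add.

```python
import urllib.request

# Assumed Chrome-like UA string; the commit does not quote the exact value.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

# Headers sent on every PDF download.
PDF_HEADERS = {
    "User-Agent": BROWSER_USER_AGENT,
    "Accept": "application/pdf",
}


def build_pdf_request(url: str) -> urllib.request.Request:
    """Build a GET request that looks like a browser asking for a PDF."""
    return urllib.request.Request(url, headers=PDF_HEADERS)


def looks_like_pdf(data: bytes) -> bool:
    """Hypothetical sanity check: real PDFs start with the %PDF- magic bytes."""
    return data.startswith(b"%PDF-")


def download_pdf(url: str, timeout: float = 30.0) -> bytes:
    """Fetch url with browser-like headers and return the PDF bytes."""
    with urllib.request.urlopen(build_pdf_request(url), timeout=timeout) as resp:
        data = resp.read()
    if not looks_like_pdf(data):
        # e.g. an HTML blog page submitted instead of a paper
        raise ValueError(f"{url} did not return a PDF")
    return data
```

Note that `urllib.request.Request` stores header names via `str.capitalize()`, so the User-Agent is read back with `req.get_header("User-agent")`, not `"User-Agent"`. A 403 from a bot-management layer still surfaces as `urllib.error.HTTPError`; a browser UA alone does not get past that.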
Add a test asserting the User-Agent contains "Mozilla" and
that the Accept header carries application/pdf.
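A test along the lines described might look like the following. It is a sketch only: it assumes a hypothetical `download_pdf` helper that calls `urllib.request.urlopen` (the real test is not shown in this commit), patches `urlopen` so no network access is needed, and inspects the `Request` object that would have been sent.

```python
import io
import unittest
import urllib.request
from unittest import mock

# Assumed browser-style UA; any value containing "Mozilla" satisfies the test.
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0"


def download_pdf(url: str) -> bytes:
    """Hypothetical stand-in for the pdf_utils download helper."""
    req = urllib.request.Request(
        url,
        headers={"User-Agent": BROWSER_USER_AGENT, "Accept": "application/pdf"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


class DownloadHeadersTest(unittest.TestCase):
    def test_sends_browser_ua_and_pdf_accept(self):
        # Fake response body; BytesIO supports the context-manager protocol.
        fake = io.BytesIO(b"%PDF-1.7\n")
        with mock.patch("urllib.request.urlopen", return_value=fake) as m:
            download_pdf("https://example.org/paper.pdf")
        req = m.call_args.args[0]
        # Request stores header names via str.capitalize(): "User-agent".
        self.assertIn("Mozilla", req.get_header("User-agent"))
        self.assertIn("application/pdf", req.get_header("Accept"))
```

Run it with `python -m unittest` or pytest; patching the module attribute works because the helper looks up `urllib.request.urlopen` at call time.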
Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
1 parent cd83798
2 files changed
Lines changed: 45 additions & 3 deletions