@@ -122,6 +123,25 @@ Parameter message for BlockQueueUntil *
 
 
 
+<a name="urlfrontier-CountUrlParams"></a>
+
+### CountUrlParams
+
+
+
+| Field | Type | Label | Description |
+| ----- | ---- | ----- | ----------- |
+| key |[string](#string)|| ID for the queue * |
+| crawlID |[string](#string)|| crawl ID |
+| filter |[string](#string)| optional | Search filter on url (can be empty, default is empty) |
+| ignoreCase |[bool](#bool)| optional | Ignore Case sensitivity for search filter (default is false -> case sensitive) |
+| local |[bool](#bool)| optional | only for the current local instance (default is false) |
+
+
+
+
+
+
 <a name="urlfrontier-CrawlLimitParams"></a>
 
 ### CrawlLimitParams
@@ -158,6 +178,7 @@ Parameter message for SetCrawlLimit *
 <a name="urlfrontier-DiscoveredURLItem"></a>
 
 ### DiscoveredURLItem
+
 URL discovered during the crawl, might already be known in the URL Frontier or not.
 
 
@@ -203,6 +224,7 @@ Parameter message for GetURLs *
 <a name="urlfrontier-KnownURLItem"></a>
 
 ### KnownURLItem
+
 URL which was already known in the frontier, was returned by GetURLs() and processed by the crawler. Used for updating the information
 about it in the frontier. If the date is not set, the URL will be considered done and won't be resubmitted for fetching, otherwise
 it will be eligible for fetching after the delay has elapsed.
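
The rule described for `KnownURLItem` can be sketched as a small decision function. This is only an illustration of the documented behaviour, not the service's code: the function name `scheduling_state`, the parameter `refetch_date`, and the use of `None` to mean "date not set" are all assumptions made for this example.

```python
from typing import Optional


def scheduling_state(refetch_date: Optional[int], now: int) -> str:
    """Classify a known URL from its (hypothetical) refetch date.

    Both values are epoch seconds. `None` stands in for "date not set";
    how the real message encodes an unset date is not stated here.
    """
    if refetch_date is None:
        # No date: the URL is considered done and is never resubmitted.
        return "done"
    if now >= refetch_date:
        # The delay has elapsed: the URL may be handed out for fetching again.
        return "eligible"
    # The date is still in the future: keep the URL but do not return it yet.
    return "waiting"
```

For instance, `scheduling_state(None, 100)` yields `"done"`, while a date in the past yields `"eligible"`.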
@@ -231,6 +253,8 @@ it will be eligible for fetching after the delay has elapsed.
 | key |[string](#string)|| ID for the queue * |
 | crawlID |[string](#string)|| crawl ID |
 | local |[bool](#bool)|| only for the current local instance |
+| filter |[string](#string)| optional | Search filter on url (can be empty, default is empty) |
+| ignoreCase |[bool](#bool)| optional | Ignore Case sensitivity for search filter (default is false -> case sensitive) |
 
 
 
@@ -255,6 +279,7 @@ it will be eligible for fetching after the delay has elapsed.
 <a name="urlfrontier-LogLevelParams"></a>
 
 ### LogLevelParams
+
 Configuration of the log level for a particular package, e.g.
 crawlercommons.urlfrontier.service.rocksdb DEBUG
 
@@ -361,6 +386,7 @@ Returned by ListQueues *
 <a name="urlfrontier-Stats"></a>
 
 ### Stats
+
 Message returned by the GetStats method
 
 
@@ -418,7 +444,7 @@ Message returned by the GetStats method
 | ----- | ---- | ----- | ----------- |
 | url |[string](#string)|| URL * |
 | key |[string](#string)|| The key is used to put the URLs into queues, the value can be anything set by the client but would typically be the hostname, domain name or IP or the URL. If not set, the service will use a sensible default like hostname. |
-| metadata |[URLInfo.MetadataEntry](#urlfrontier-URLInfo-MetadataEntry)| repeated | Arbitrary key / values stored alongside the URL. Can be anything needed by the crawler like http status, source URL etc... |
+| metadata |[URLInfo.MetadataEntry](#urlfrontier-URLInfo-MetadataEntry)| repeated |Arbitrary key / values stored alongside the URL. Can be anything needed by the crawler like http status, source URL etc... |
 | crawlID |[string](#string)|| crawl ID * |
 
 
@@ -533,6 +559,7 @@ Wrapper for a KnownURLItem or DiscoveredURLItem *
 | SetCrawlLimit |[CrawlLimitParams](#urlfrontier-CrawlLimitParams)|[Empty](#urlfrontier-Empty)| Sets crawl limit for domain * |
 | GetURLStatus |[URLStatusRequest](#urlfrontier-URLStatusRequest)|[URLItem](#urlfrontier-URLItem)| Get status of a particular URL. This does not take into account URL scheduling. Used to check current status of a URL within the frontier |
 | ListURLs |[ListUrlParams](#urlfrontier-ListUrlParams)|[URLItem](#urlfrontier-URLItem) stream | List all URLs currently in the frontier. This does not take into account URL scheduling. Used to check current status of all URLs within the frontier |
+| CountURLs |[CountUrlParams](#urlfrontier-CountUrlParams)|[Long](#urlfrontier-Long)| Count URLs currently in the frontier * |
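
To illustrate how the `filter` and `ignoreCase` fields of `CountUrlParams` might combine in a `CountURLs` call, here is a minimal in-memory sketch. It is not the service implementation and does not use the generated gRPC classes. It assumes that the filter is a plain substring match (the field description does not say whether it is a substring or a pattern) and that an empty `key` means "all queues". The dataclass, the `count_urls` helper and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CountUrlParams:
    """Local stand-in mirroring the documented fields (not the generated class)."""
    key: str = ""             # queue ID; empty is ASSUMED to mean "all queues"
    crawlID: str = ""         # crawl ID
    filter: str = ""          # search filter on the URL; empty matches everything
    ignoreCase: bool = False  # default: case sensitive
    # `local` is omitted: this sketch models a single in-memory instance.


def count_urls(queues, params):
    """Count matching URLs in `queues`, a dict of (crawlID, key) -> list of URLs."""
    needle = params.filter.lower() if params.ignoreCase else params.filter
    total = 0
    for (crawl_id, key), urls in queues.items():
        if crawl_id != params.crawlID:
            continue  # URL belongs to a different crawl
        if params.key and key != params.key:
            continue  # a specific queue was requested and this is not it
        for url in urls:
            haystack = url.lower() if params.ignoreCase else url
            if needle in haystack:  # ASSUMED substring semantics
                total += 1
    return total


# Hypothetical frontier content used only for this example.
queues = {
    ("crawl1", "example.com"): ["https://example.com/A", "https://example.com/a/b"],
    ("crawl1", "example.org"): ["https://example.org/"],
}
```

With this data, a case-sensitive filter of `/a` on the `example.com` queue matches one URL, and setting `ignoreCase` to true matches both.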