Commit a7b876d

Update Protobuf documentation (#135)
Signed-off-by: Laurent Klock <[email protected]>
1 parent 32291d8 commit a7b876d

1 file changed (+28 -1 lines changed)

API/urlfrontier.md

Lines changed: 28 additions & 1 deletion
@@ -9,6 +9,7 @@
 - [AnyCrawlID](#urlfrontier-AnyCrawlID)
 - [BlockQueueParams](#urlfrontier-BlockQueueParams)
 - [Boolean](#urlfrontier-Boolean)
+- [CountUrlParams](#urlfrontier-CountUrlParams)
 - [CrawlLimitParams](#urlfrontier-CrawlLimitParams)
 - [DeleteCrawlMessage](#urlfrontier-DeleteCrawlMessage)
 - [DiscoveredURLItem](#urlfrontier-DiscoveredURLItem)
@@ -122,6 +123,25 @@ Parameter message for BlockQueueUntil *
 
 
 
+<a name="urlfrontier-CountUrlParams"></a>
+
+### CountUrlParams
+
+
+
+| Field | Type | Label | Description |
+| ----- | ---- | ----- | ----------- |
+| key | [string](#string) | | ID for the queue * |
+| crawlID | [string](#string) | | crawl ID |
+| filter | [string](#string) | optional | Search filter on url (can be empty, default is empty) |
+| ignoreCase | [bool](#bool) | optional | Ignore Case sensitivity for search filter (default is false -> case sensitive) |
+| local | [bool](#bool) | optional | only for the current local instance (default is false) |
+
+
+
+
+
+
 <a name="urlfrontier-CrawlLimitParams"></a>
 
 ### CrawlLimitParams
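The `CountUrlParams` table describes `filter` only as a "search filter on url". The following Python sketch shows one plausible reading of how `filter` and `ignoreCase` work together. It assumes a plain substring match, which the documentation does not confirm. The `CountUrlParams` dataclass here is a hypothetical mirror of the message with its documented defaults, not the generated gRPC class.

```python
from dataclasses import dataclass


@dataclass
class CountUrlParams:
    """Hypothetical Python mirror of the CountUrlParams message and its documented defaults."""
    key: str = ""             # ID for the queue
    crawlID: str = ""         # crawl ID
    filter: str = ""          # optional, default is empty
    ignoreCase: bool = False  # optional, default False -> case sensitive
    local: bool = False       # optional, default False


def matches_filter(url: str, params: CountUrlParams) -> bool:
    """Return True if the URL passes the search filter.

    ASSUMPTION: the filter is treated as a substring of the URL.
    An empty filter matches every URL.
    """
    if not params.filter:
        return True
    if params.ignoreCase:
        # Compare case-insensitively when ignoreCase is set.
        return params.filter.casefold() in url.casefold()
    return params.filter in url
```

With the default `ignoreCase=False`, the filter `news` does not match `https://Example.org/News`. Setting `ignoreCase=True` makes it match.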
@@ -158,6 +178,7 @@ Parameter message for SetCrawlLimit *
158178
<a name="urlfrontier-DiscoveredURLItem"></a>
159179

160180
### DiscoveredURLItem
181+
161182
URL discovered during the crawl, might already be known in the URL Frontier or not.
162183

163184

@@ -203,6 +224,7 @@ Parameter message for GetURLs *
 <a name="urlfrontier-KnownURLItem"></a>
 
 ### KnownURLItem
+
 URL which was already known in the frontier, was returned by GetURLs() and processed by the crawler. Used for updating the information
 about it in the frontier. If the date is not set, the URL will be considered done and won't be resubmitted for fetching, otherwise
 it will be eligible for fetching after the delay has elapsed.
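The rescheduling rule in the `KnownURLItem` description can be written as a small predicate. This sketch makes two assumptions. First, the date is an integer number of seconds since the epoch. Second, 0 stands for "not set", which is how an unset proto3 integer field reads back. The function name is hypothetical.

```python
import time
from typing import Optional


def is_eligible_for_fetching(refetch_date: int, now: Optional[float] = None) -> bool:
    """Decide whether a processed URL may be handed out for fetching again.

    ASSUMPTION: refetch_date is seconds since the epoch, and 0 means "not set".
    """
    if refetch_date == 0:
        # Date not set: the URL is considered done and is never resubmitted.
        return False
    if now is None:
        now = time.time()
    # Date set: the URL becomes eligible once that moment has been reached.
    return now >= refetch_date
```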
@@ -231,6 +253,8 @@ it will be eligible for fetching after the delay has elapsed.
 | key | [string](#string) | | ID for the queue * |
 | crawlID | [string](#string) | | crawl ID |
 | local | [bool](#bool) | | only for the current local instance |
+| filter | [string](#string) | optional | Search filter on url (can be empty, default is empty) |
+| ignoreCase | [bool](#bool) | optional | Ignore Case sensitivity for search filter (default is false -> case sensitive) |
 
 
 
@@ -255,6 +279,7 @@ it will be eligible for fetching after the delay has elapsed.
 <a name="urlfrontier-LogLevelParams"></a>
 
 ### LogLevelParams
+
 Configuration of the log level for a particular package, e.g.
 crawlercommons.urlfrontier.service.rocksdb DEBUG
 
@@ -361,6 +386,7 @@ Returned by ListQueues *
 <a name="urlfrontier-Stats"></a>
 
 ### Stats
+
 Message returned by the GetStats method
 
 
@@ -418,7 +444,7 @@ Message returned by the GetStats method
 | ----- | ---- | ----- | ----------- |
 | url | [string](#string) | | URL * |
 | key | [string](#string) | | The key is used to put the URLs into queues, the value can be anything set by the client but would typically be the hostname, domain name or IP or the URL. If not set, the service will use a sensible default like hostname. |
-| metadata | [URLInfo.MetadataEntry](#urlfrontier-URLInfo-MetadataEntry) | repeated | Arbitrary key / values stored alongside the URL. Can be anything needed by the crawler like http status, source URL etc... |
+| metadata | [URLInfo.MetadataEntry](#urlfrontier-URLInfo-MetadataEntry) | repeated | Arbitrary key / values stored alongside the URL. Can be anything needed by the crawler like http status, source URL etc... |
 | crawlID | [string](#string) | | crawl ID * |
 
 
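The `key` description says the service falls back to "a sensible default like hostname" when the client leaves the key unset. Below is a minimal sketch of that fallback. It assumes the hostname is the chosen default, although the description also mentions domain name or IP. `default_queue_key` is a hypothetical helper, not part of the URL Frontier API.

```python
from urllib.parse import urlsplit


def default_queue_key(url: str, key: str = "") -> str:
    """Return the queue key for a URL.

    An explicit key set by the client always wins. Otherwise this falls back
    to the hostname (ASSUMPTION: one of the defaults the docs suggest).
    """
    if key:
        return key
    # urlsplit().hostname is lower-cased and has the port and credentials stripped.
    host = urlsplit(url).hostname
    return host or ""
```

For example, `https://WWW.Example.com:8080/a` maps to the queue `www.example.com`, so URLs that differ only in port or letter case share a queue.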
@@ -533,6 +559,7 @@ Wrapper for a KnownURLItem or DiscoveredURLItem *
 | SetCrawlLimit | [CrawlLimitParams](#urlfrontier-CrawlLimitParams) | [Empty](#urlfrontier-Empty) | Sets crawl limit for domain * |
 | GetURLStatus | [URLStatusRequest](#urlfrontier-URLStatusRequest) | [URLItem](#urlfrontier-URLItem) | Get status of a particular URL. This does not take into account URL scheduling. Used to check current status of a URL within the frontier |
 | ListURLs | [ListUrlParams](#urlfrontier-ListUrlParams) | [URLItem](#urlfrontier-URLItem) stream | List all URLs currently in the frontier. This does not take into account URL scheduling. Used to check current status of all URLs within the frontier |
+| CountURLs | [CountUrlParams](#urlfrontier-CountUrlParams) | [Long](#urlfrontier-Long) | Count URLs currently in the frontier * |
 
 
 
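The new `CountURLs` RPC takes a `CountUrlParams` and returns a single `Long`, whereas `ListURLs` streams `URLItem` messages. The in-memory stand-in below illustrates that contract. Several parts of it are assumptions: the frontier layout (a dict from `(crawlID, key)` to URLs), the substring reading of `filter`, and the `count_urls` name. The `local` flag is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory frontier: (crawlID, queue key) -> URLs in that queue.
Frontier = Dict[Tuple[str, str], List[str]]


def count_urls(frontier: Frontier, crawl_id: str, key: str,
               url_filter: str = "", ignore_case: bool = False) -> int:
    """Count URLs in one queue, mirroring the CountUrlParams fields.

    ASSUMPTION: the filter is a substring match, and an empty filter counts every URL.
    """
    urls = frontier.get((crawl_id, key), [])
    if not url_filter:
        return len(urls)
    if ignore_case:
        needle = url_filter.casefold()
        return sum(1 for u in urls if needle in u.casefold())
    return sum(1 for u in urls if url_filter in u)


frontier: Frontier = {
    ("crawl1", "example.com"): [
        "https://example.com/a",
        "https://example.com/News",
        "https://example.com/news/1",
    ]
}
```

Here `count_urls(frontier, "crawl1", "example.com", url_filter="news")` counts only the lower-case match. Passing `ignore_case=True` counts both news URLs. Returning a count rather than a stream presumably lets a client size a queue without transferring every URL.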