Too messy commit structure to merge, check other PR #14
base: master
Conversation
Updating scrapers before introducing keywords functionality
The functionality is there and good, it's just not implemented in the right place. It needs a small rework so that future developers can understand the code quickly.
For bonus points: use a fixup! commit to remove any errant code like just_scrape.py so that it doesn't clutter the repo.
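A sketch of the fixup! workflow this suggests, using `git commit --fixup` plus an autosquash rebase. The repository contents below are made up for the demo; only the file name just_scrape.py comes from the PR:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# Commit 1: the real work, which accidentally includes just_scrape.py
echo "print('scratch')" > just_scrape.py
echo "class Scraper: pass" > scraper_kickstarter.py
git add .
git commit -q -m "Add keyword search to scraper"
target=$(git rev-parse HEAD)

# Fixup commit: remove the errant file, marked for squashing into commit 1
git rm -q just_scrape.py
git commit -q --fixup "$target"

# Autosquash replays history with the fixup folded into its target commit;
# the no-op sequence editor accepts the reordered todo list as-is
GIT_SEQUENCE_EDITOR=: git rebase -i --autosquash --root

git log --oneline   # history is back to a single clean commit
```

After the rebase the "fixup!" commit is gone and just_scrape.py never appears in the published history.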
forward43/scraper_kickstarter.py
Outdated
url_term_list = []

for term in keywords:
    search_term = 'term=' + term.replace(' ', '+')
    random_seed = 'seed=' + str(random.randint(1, 65536))
    get_params = self.default_url_params + [f"category_id={category}", f"page={page}", random_seed, search_term]  # added a keyword parameter to the url
    url = self.base_url + '&'.join(get_params)
    url_term_list.append(url)

return url_term_list
This is a good idea, however it's implemented in an illogical place: the function you've placed it in is called get_url, which is singular. Anyone who reads this without having been part of this PR will (rightfully) assume that this function returns exactly one URL, and the docstring will validate that belief. When it gets used, they will get a list of URLs.
I'm not sure whether we want to do a search of each combination of category_id and search_term, but I haven't had enough of a dive into Kickstarter data to know what to decide on that front.
I'd place this in another function like get_urls_by_keyword and extend the scrape script to do both a loop of the original crawl-by-category_id functionality and a loop of searching by keyword with (or without) category_id.
forward43/scraper_kickstarter.py
Outdated
except Exception as e:
    self.logger.exception('Failed to get projects from current page')

self.write_to_file(projects, str(category))
formatting: keep whitespace out of empty lines
forward43/scraper_kickstarter.py
Outdated
url_term_list = self.get_url(category, page)

print(url_term_list)
use the logger if you want to log what it's doing; otherwise remove the print statement
PS: please remember to request a review from someone (top of the right sidebar on GitHub) so that they know they should read your code and comment on it; otherwise no one knows it's ready to be reviewed
No description provided.