
Too messy commit structure to merge, check other PR #14

Open · wants to merge 6 commits into master
Conversation

SimonBerend (Contributor)

No description provided.

@andrewsutjahjo (Contributor) left a comment


The functionality is there and good, it's just not implemented in the right place. It needs a small rework so that future developers can understand the code quickly.

For bonus points: use a fixup! commit (git commit --fixup, then git rebase --autosquash) to remove any errant code like just_scrape.py so that it doesn't clutter the repo

Comment on lines 28 to 37
    url_term_list = []

    for term in keywords:
        search_term = 'term=' + term.replace(' ', '+')
        random_seed = 'seed=' + str(random.randint(1, 65536))
        # added a keyword parameter to the url
        get_params = self.default_url_params + [f"category_id={category}", f"page={page}", random_seed, search_term]
        url = self.base_url + '&'.join(get_params)
        url_term_list.append(url)

    return url_term_list

This is a good idea, however, it's implemented in an illogical place: the function you've placed it in is called get_url, which is singular. Anyone who reads this without having been part of this PR will (rightfully) assume that the function returns exactly one URL, and the docstring will validate that belief. Then, when it gets used, they will get a list of URLs instead.

I'm not sure whether we want to search every combination of category_id and search_term; I haven't dived deeply enough into Kickstarter data to decide on that front.

I'd place this in another function, like get_urls_by_keyword, and extend the scrape script to run both a loop over the original crawl-by-category_id functionality and a loop searching by keyword with (or without) category_id.
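A minimal sketch of the suggested refactor. The attribute names base_url and default_url_params come from the snippet above; their values here, the class name, and the function name get_urls_by_keyword (proposed in the review) are assumptions for illustration, not the project's actual code:

```python
import random


class KickstarterScraper:
    """Sketch: keep get_url singular; move the keyword loop into a
    plural-named function so the names match what each returns."""

    # Assumed values for illustration only.
    base_url = 'https://www.kickstarter.com/discover/advanced?'
    default_url_params = ['sort=magic']

    def get_url(self, category, page):
        """Return exactly one discovery URL, as the singular name promises."""
        random_seed = 'seed=' + str(random.randint(1, 65536))
        get_params = self.default_url_params + [
            f"category_id={category}", f"page={page}", random_seed]
        return self.base_url + '&'.join(get_params)

    def get_urls_by_keyword(self, keywords, category, page):
        """Return one search URL per keyword, scoped to a category."""
        url_term_list = []
        for term in keywords:
            search_term = 'term=' + term.replace(' ', '+')
            random_seed = 'seed=' + str(random.randint(1, 65536))
            get_params = self.default_url_params + [
                f"category_id={category}", f"page={page}",
                random_seed, search_term]
            url_term_list.append(self.base_url + '&'.join(get_params))
        return url_term_list
```

The scrape script could then loop over categories with get_url and over keywords with get_urls_by_keyword, and each caller gets the return type the function name advertises.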

    except Exception as e:
        self.logger.exception('Failed to get projects from current page')

    self.write_to_file(projects, str(category))


formatting: keep whitespace out of empty lines


    url_term_list = self.get_url(category, page)

    print(url_term_list)

use the logger if you want to log what it's doing, otherwise remove the print statement
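A sketch of that swap, assuming the class already carries the self.logger attribute seen in the earlier snippet (the class name and stubbed get_url here are hypothetical, for illustration only):

```python
import logging


class Scraper:
    def __init__(self):
        # Assumed setup; the real class already has a self.logger attribute.
        self.logger = logging.getLogger(__name__)

    def get_url(self, category, page):
        # Stub so the sketch runs standalone; hypothetical URL.
        return [f"https://example.com/?category_id={category}&page={page}"]

    def scrape(self, category, page):
        url_term_list = self.get_url(category, page)
        # Log at debug level instead of printing to stdout, so the output
        # goes through the configured handlers and can be silenced.
        self.logger.debug('Built %d URLs: %s', len(url_term_list), url_term_list)
        return url_term_list
```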

@andrewsutjahjo (Contributor)

PS: please remember to request a review from someone (top of the right sidebar on GitHub) so that they know they should read your code and comment on it; otherwise no one knows it's ready to be reviewed

@SimonBerend SimonBerend reopened this Aug 11, 2021
@SimonBerend SimonBerend changed the title Keywords functionality in scraper kickstarter v1 Keywords search in kickstarter scraper Aug 11, 2021
@SimonBerend SimonBerend changed the title Keywords search in kickstarter scraper Too messy commit structure to merge, check other PR Sep 8, 2021