Containerized nvidia-snatcher #114

Closed
matthewLee711 opened this issue Sep 20, 2020 · 9 comments

Comments

@matthewLee711

For anyone who doesn't have Node.js on their system or wants to run this on AWS, here's a Dockerfile I threw together. Feel free to improve on it!

Usage notes:

  • The Chromium version in use (~83) is not a happy camper: certain stores randomly crash about 20% of the time.
  • You need to change the Puppeteer launcher (index.ts) and package.json, as shown below.
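
Because a crash in one store can reject the whole batch of lookups, one option is to record each failure instead of letting it abort the pass. The following is only a sketch: `Outcome`, `settle`, `runAll`, and `countFailures` are hypothetical names that do not come from the nvidia-snatcher codebase, and it assumes `lookup(browser, store)` returns a promise.

```typescript
// Hypothetical helpers; none of these names come from the nvidia-snatcher codebase.
type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Convert a promise that may reject into one that always resolves to an Outcome.
function settle<T>(p: Promise<T>): Promise<Outcome<T>> {
  return p.then(
    (value): Outcome<T> => ({ ok: true, value }),
    (error): Outcome<T> => ({ ok: false, error })
  );
}

// Run `task` for every item; a failing item is recorded instead of aborting the batch.
function runAll<S, T>(items: S[], task: (item: S) => Promise<T>): Promise<Outcome<T>[]> {
  // Promise.resolve().then(...) also captures synchronous throws from `task`.
  return Promise.all(items.map((item) => settle(Promise.resolve().then(() => task(item)))));
}

// Count how many items failed in a finished batch.
function countFailures<T>(outcomes: Outcome<T>[]): number {
  return outcomes.filter((o) => !o.ok).length;
}

// Possible use inside main(), assuming Stores is an array and lookup() returns a promise:
//   const outcomes = await runAll(Stores, (store) => lookup(browser, store));
//   Logger.debug(`${countFailures(outcomes)} store(s) failed this pass`);
```

With something like this, a crashed store would only show up as a failed outcome, and the browser could still be closed and the next pass scheduled.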

Dockerfile

FROM node:14-alpine3.12

# Chrome setup
RUN apk update && apk add --no-cache nmap && \
  echo @edge http://nl.alpinelinux.org/alpine/edge/community >> /etc/apk/repositories && \
  echo @edge http://nl.alpinelinux.org/alpine/edge/main >> /etc/apk/repositories && \
  apk update && \
  apk add --no-cache \
  "chromium>81" \
  harfbuzz \
  ca-certificates \
  freetype \
  freetype-dev \
  ttf-freefont \
  nss

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV CHROMIUM_PATH=/usr/bin/chromium-browser

# Create working directory for node app
RUN mkdir -p /usr/src/app

# CD into working directory and copy package.json into it
WORKDIR /usr/src/app
COPY package.json package.json

# Install dependencies
RUN npm install

# Copy all files into working directory
COPY . .

# Start node app when container started
CMD [ "npm", "start" ]

index.ts > main()

async function main() {
	const results = [];
	const browser = await puppeteer.launch({
		headless: true,
		executablePath: process.env.CHROMIUM_PATH,
		args: ['--no-sandbox', '--disable-dev-shm-usage'],
	});

	for (const store of Stores) {
		Logger.debug(store.links);
		results.push(lookup(browser, store));
	}

	await Promise.all(results);
	await browser.close();

	Logger.info('↗ trying stores again');
	setTimeout(main, Config.rateLimitTimeout);
}
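
The launcher change above depends on `CHROMIUM_PATH` being set, which is true inside the container but probably not on a developer machine. A minimal sketch of one way to handle both cases follows. `MinimalLaunchOptions` and `buildLaunchOptions` are hypothetical names, and the interface is a hand-written subset for illustration, not Puppeteer's own type.

```typescript
// Hypothetical subset of the options passed to puppeteer.launch(); not Puppeteer's own type.
interface MinimalLaunchOptions {
  headless: boolean;
  executablePath?: string;
  args: string[];
}

// Build launch options from an environment map such as process.env.
function buildLaunchOptions(env: Record<string, string | undefined>): MinimalLaunchOptions {
  const options: MinimalLaunchOptions = {
    headless: true,
    // Same flags as in the main() snippet from this thread.
    args: ['--no-sandbox', '--disable-dev-shm-usage'],
  };
  // Only override the executable when the container provides one;
  // otherwise Puppeteer falls back to its bundled Chromium.
  if (env.CHROMIUM_PATH) {
    options.executablePath = env.CHROMIUM_PATH;
  }
  return options;
}

// Possible use in main():
//   const browser = await puppeteer.launch(buildLaunchOptions(process.env));
```

Keeping the option building in a plain function means it can be checked without starting a browser.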

package.json

"puppeteer": "^3.1.0"

@jef
Owner

jef commented Sep 20, 2020

I think getting this to not use puppeteer and separating logic would help with containerization. Related to #113

@ljmerza

ljmerza commented Sep 20, 2020

Puppeteer needs a display when it isn't running headless; Xvfb will do the trick:

RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
    libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
    libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
    libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
    libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
    libnss3

# Start script on Xvfb
CMD Xvfb :99 -screen 0 1024x768x16 & npm start

I also have a Docker image for Node.js and Xvfb: lmerza/xvfb

@matthewLee711
Author

matthewLee711 commented Sep 20, 2020

A quick heads-up, everyone: my nvidia-snatcher instance on AWS got blocked. I don't have much experience with scraping pages, but I recently found this guide, which may help your bot run a little while longer.

https://www.reddit.com/r/programming/comments/ecvc42/a_guide_to_web_scraping_without_getting_blocked/fbei5dp?utm_source=share&utm_medium=web2x&context=3

I do want to mention that the easiest way to avoid getting blocked is to limit your request rate. Change your request interval to 30 seconds or more.
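
The 30-second floor could be enforced in code so that a low configured value cannot slip through. This is a rough sketch only: `nextDelayMs` is a hypothetical helper, and the random jitter is an added assumption of mine, not something suggested in this thread.

```typescript
// Hypothetical helper: compute the wait before the next pass over the stores.
// The 30-second floor follows the advice above; the jitter is an assumption.
function nextDelayMs(
  baseMs: number,
  jitterMs: number,
  random: () => number = Math.random
): number {
  const MIN_DELAY_MS = 30000;
  // Never go below the floor, even if the configured base is smaller.
  const base = Math.max(MIN_DELAY_MS, baseMs);
  // Add a random offset in [0, jitterMs) so requests are not perfectly periodic.
  const jitter = Math.floor(random() * jitterMs);
  return base + jitter;
}

// Possible use at the end of main():
//   setTimeout(main, nextDelayMs(Config.rateLimitTimeout, 10000));
```

Passing the random source as a parameter keeps the function deterministic under test.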

@geman220
Contributor

Do you mind if we close this out?

@matthewLee711
Author

I recommend we keep it open, as I feel the information/discussion is helpful for others going down the path of containerization and web scraping on cloud providers. However, if you feel it is necessary to close, I'm 100% fine with you going forward on it.

@geman220
Contributor

It will still be accessible, just closed. Or you can make a Wiki page: https://github.com/jef/nvidia-snatcher/wiki . I'm just not sure what to do with this in issue tracking.

@jef
Owner

jef commented Sep 22, 2020

Related: #174

@jef
Owner

jef commented Sep 22, 2020

Not sure what I want to do with Docker yet. There are some hurdles and hacks, and I don't know if I want to put anything inside the repository.

Feel free to create a Wiki page with instructions on installation. I don't know if I want to support this yet.

@geman220
Contributor

Closing due to the merge of #209; there is also #174. Feel free to create a Wiki page with instructions for Docker.
