Containerized nvidia-snatcher #114

matthewLee711 · 2020-09-20T17:19:48Z

For anyone who doesn't have Node.js on their system or wants to drop this on AWS, here's a Dockerfile I threw together. Feel free to improve on it!

Usage notes:

The Chromium version being used (~83) is not a happy camper: certain stores randomly crash about 20% of the time.
You need to change the Puppeteer launch options in index.ts and the Puppeteer version in package.json, as shown below.
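One possible workaround for those random crashes is to retry a failed store lookup a couple of times before giving up. This is a minimal sketch built around a hypothetical `withRetry` helper that is not part of nvidia-snatcher. It only helps when a single page crashes. If the whole Chromium process dies, retrying against the same `browser` instance will not help.

```typescript
// Hypothetical helper (not part of nvidia-snatcher): re-runs an async task
// up to `attempts` times, waiting `delayMs` between failed tries.
async function withRetry<T>(
	task: () => Promise<T>,
	attempts: number,
	delayMs: number
): Promise<T> {
	let lastError: unknown;
	for (let attempt = 1; attempt <= attempts; attempt++) {
		try {
			return await task();
		} catch (error) {
			lastError = error;
			// Only wait if another attempt is still coming
			if (attempt < attempts) {
				await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
			}
		}
	}
	// Every attempt failed: surface the last error to the caller
	throw lastError;
}
```

In `main()`, this could wrap each lookup, for example `results.push(withRetry(() => lookup(browser, store), 2, 1000));`. The values 2 attempts and 1 second are arbitrary.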

Dockerfile

FROM node:14-alpine3.12

# Chrome setup
RUN apk update && apk add --no-cache nmap && \
  echo @edge http://nl.alpinelinux.org/alpine/edge/community >> /etc/apk/repositories && \
  echo @edge http://nl.alpinelinux.org/alpine/edge/main >> /etc/apk/repositories && \
  apk update && \
  apk add --no-cache \
  "chromium>81" \
  harfbuzz \
  ca-certificates \
  freetype \
  freetype-dev \
  ttf-freefont \
  nss

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV CHROMIUM_PATH=/usr/bin/chromium-browser

# Create working directory for node app
RUN mkdir -p /usr/src/app

# CD into working directory and copy package.json into it
WORKDIR /usr/src/app
COPY package.json package.json

# Install dependencies and clean the npm cache to keep the image small
RUN npm install && npm cache clean --force

# Copy all files into working directory
COPY . .

# Start node app when container started
CMD [ "npm", "start" ]

index.ts > main()

async function main() {
	const results = [];
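	const browser = await puppeteer.launch({
		headless: true,
		// Use the Chromium installed by apk instead of Puppeteer's bundled download
		executablePath: process.env.CHROMIUM_PATH,
		// --no-sandbox: the container runs as root, and Chromium refuses to run as root with the sandbox enabled
		// --disable-dev-shm-usage: Docker's default /dev/shm (64MB) is too small, so use a temp directory instead
		args: ['--no-sandbox', '--disable-dev-shm-usage'],
	});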

	for (const store of Stores) {
		Logger.debug(store.links);
		results.push(lookup(browser, store));
	}

	await Promise.all(results);
	await browser.close();

	Logger.info('↗ trying stores again');
	setTimeout(main, Config.rateLimitTimeout);
}
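If you want the same launcher code to work both inside the container and on a regular desktop, you can build the launch options from the environment instead of hard-coding them. This is a minimal sketch using a hypothetical `buildLaunchOptions` helper that is not part of nvidia-snatcher. It adds `--no-sandbox` only when the process runs as root. It sets `executablePath` only when `CHROMIUM_PATH` is defined.

```typescript
// Hypothetical helper (not part of nvidia-snatcher): builds the object passed to puppeteer.launch().
interface LaunchOptions {
	headless: boolean;
	executablePath?: string;
	args: string[];
}

function buildLaunchOptions(
	env: Record<string, string | undefined>,
	runningAsRoot: boolean
): LaunchOptions {
	// Containers usually have a small /dev/shm, so have Chromium use a temp directory instead
	const args = ['--disable-dev-shm-usage'];

	// Chromium refuses to run as root (the default user in node:14-alpine) with its sandbox enabled
	if (runningAsRoot) {
		args.push('--no-sandbox');
	}

	const options: LaunchOptions = {headless: true, args};

	// Only override the binary when CHROMIUM_PATH is set; otherwise Puppeteer uses its bundled Chromium
	if (env.CHROMIUM_PATH) {
		options.executablePath = env.CHROMIUM_PATH;
	}

	return options;
}
```

Usage would be something like `puppeteer.launch(buildLaunchOptions(process.env, process.getuid?.() === 0))`. The optional call covers Windows, where `process.getuid` does not exist.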

package.json

"puppeteer": "^3.1.0"

jef · 2020-09-20T17:44:33Z

I think dropping Puppeteer and separating the logic would make containerization easier. Related to #113.
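Here is one way to picture "separating the logic". If the stock check depends only on a page's HTML, you can write it against a small interface and plug either a Puppeteer-backed fetcher or a plain HTTP fetcher in behind it. This is a minimal sketch. `PageSource`, `checkStock` and the out-of-stock marker text are hypothetical and are not taken from nvidia-snatcher.

```typescript
// Hypothetical abstraction: anything that can return a page's HTML
// (Puppeteer, a plain HTTP client, or a fake in tests).
interface PageSource {
	getHtml(url: string): Promise<string>;
}

// The stock check only sees HTML, so it has no dependency on Puppeteer or Chromium
async function checkStock(
	source: PageSource,
	url: string,
	outOfStockText: string
): Promise<boolean> {
	const html = await source.getHtml(url);
	// In stock if the store's out-of-stock marker is absent from the page
	return html.indexOf(outOfStockText) === -1;
}
```

A Puppeteer-backed `PageSource` would call `page.goto(url)` and return `page.content()`. For stores that don't need JavaScript rendering, a plain HTTP client could implement the same interface, and the container would not need Chromium at all.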

ljmerza · 2020-09-20T20:21:02Z

Puppeteer needs a display when it runs in headful (non-headless) mode. Xvfb will do the trick:

RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
    libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
    libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
    libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 \
    libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
    libnss3 xvfb && \
    rm -rf /var/lib/apt/lists/*

# Point Chromium at the virtual display started below
ENV DISPLAY=:99

# Start script on Xvfb
CMD Xvfb :99 -screen 0 1024x768x16 & npm start

I also have a Docker image with Node.js and Xvfb: lmerza/xvfb
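With Xvfb in place, the launcher also has to run Chromium in headful mode. Otherwise the virtual display is never used. One way to decide this at runtime is to check `DISPLAY`. This assumes the container sets `DISPLAY=:99` to match the `Xvfb :99` command. `resolveHeadless` is a hypothetical helper, not part of Puppeteer or nvidia-snatcher.

```typescript
// Hypothetical helper: run headful only when an X display (e.g. Xvfb on :99) is available.
function resolveHeadless(env: Record<string, string | undefined>): boolean {
	const display = env.DISPLAY;
	// No DISPLAY means there is nothing to render into, so fall back to headless mode
	return display === undefined || display.trim() === '';
}
```

The result would be passed as `headless: resolveHeadless(process.env)` in the `puppeteer.launch()` options.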

matthewLee711 · 2020-09-20T23:24:54Z

Quick heads-up, everyone: my nvidia-snatcher instance on AWS got blocked. I don't have much experience with scraping pages, but I recently found this guide on keeping your bot running a little longer before it gets blocked:

https://www.reddit.com/r/programming/comments/ecvc42/a_guide_to_web_scraping_without_getting_blocked/fbei5dp?utm_source=share&utm_medium=web2x&context=3

I do want to mention that the easiest way to avoid getting blocked is to limit your request rate: change your request interval to 30 seconds or more.
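To enforce the 30+ second advice in code, you can clamp the delay passed to `setTimeout(main, ...)` in `main()` to a minimum. This is a minimal sketch using a hypothetical `nextRunDelayMs` helper. The random jitter of up to 20% extra is an added assumption, not something taken from the linked guide. It keeps requests from arriving at a perfectly fixed interval.

```typescript
// Hypothetical helper: delay in milliseconds before the next round of store checks.
const MIN_INTERVAL_MS = 30000; // 30 seconds, per the advice above
const MAX_JITTER_FRACTION = 0.2; // assumption: add up to 20% random extra delay

function nextRunDelayMs(configuredMs: number, random: () => number = Math.random): number {
	// Never go below the 30-second floor, even if the configured interval is shorter
	const base = Math.max(configuredMs, MIN_INTERVAL_MS);
	// Random extra delay in [0, base * MAX_JITTER_FRACTION)
	const jitter = Math.floor(random() * base * MAX_JITTER_FRACTION);
	return base + jitter;
}
```

In `main()`, this would become `setTimeout(main, nextRunDelayMs(Config.rateLimitTimeout));`. `Config.rateLimitTimeout` is already used as a millisecond value there.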

geman220 · 2020-09-21T20:50:25Z

Do you mind if we close this out?

matthewLee711 · 2020-09-21T21:09:40Z

I recommend we keep it open, as I feel the information and discussion here are helpful for others going down the path of containerization and web scraping on cloud providers. However, if you feel it's necessary to close it, I'm 100% fine with that.

geman220 · 2020-09-21T22:04:32Z

It will still be accessible, just closed. Or you can make a page on the Wiki (https://github.com/jef/nvidia-snatcher/wiki). I'm just not sure what to do with this in the issue tracker.

jef · 2020-09-22T04:02:30Z

Related: #174

jef · 2020-09-22T04:03:13Z

Not sure what I want to do with Docker yet. There are some hurdles and hacks, and I don't know if I want to put anything inside the repository.

Feel free to create a Wiki page with instructions on installation. I don't know if I want to support this yet.

geman220 · 2020-09-23T17:20:00Z

Closing due to the merge of #209; there is also #174. Feel free to create a Wiki page with instructions for Docker.

andrewmackrodt mentioned this issue on Sep 22, 2020 in #209, "feat: add chromium sandbox skipping" (merged)

geman220 closed this as completed Sep 23, 2020
