
BNCollege uses a JavaScript redirect to prevent scraping #9

@ravirahman


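It appears that BNCollege, and potentially other sites, use a JavaScript redirect to prevent scraping: a plain HTTP request only gets the redirect page, not the content. Here is a workaround on Android that loads the page in a WebView so the redirect can run: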
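```java
// Imports used by the snippets below
import android.webkit.CookieManager;
import android.webkit.ValueCallback;
import android.webkit.WebView;
import java.util.Timer;
import java.util.TimerTask;

// Field in your Activity
private WebView view;
```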

Then, in `onCreate()` for example:
```java
view = new WebView(getApplicationContext());
view.getSettings().setJavaScriptEnabled(true); // needed for the redirect script to run
// Present a desktop Chrome user agent
view.getSettings().setUserAgentString("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36");
view.getSettings().setLoadsImagesAutomatically(true);

// Accept cookies (including third-party ones) before the page is loaded
CookieManager cookieManager = CookieManager.getInstance();
cookieManager.setAcceptCookie(true);
cookieManager.setAcceptThirdPartyCookies(view, true); // API 21+

view.loadUrl("http://milton.bncollege.com/webapp/wcs/stores/servlet/TBWizardView?catalogId=10001&langId=-1&storeId=82238");
start(); // begin polling for the post-redirect page
```

Then use a timer to poll the page until the JavaScript redirect has completed, i.e. until an element with the class `bncbOptionItem` appears:
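```java
private Timer timer;

// Build a fresh task on every start(): a TimerTask cannot be scheduled again
// once it has been cancelled, so reusing one instance breaks stop() + start().
private TimerTask createPollTask() {
    return new TimerTask() {
        @Override
        public void run() {
            // WebView methods must be called on the UI thread
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    view.evaluateJavascript(
                            "(function() {"
                            + " var el = document.getElementsByClassName('bncbOptionItem')[0];"
                            + " return el ? el.outerHTML : null;" // avoid a TypeError before the redirect
                            + " })()",
                            new ValueCallback<String>() {
                                @Override
                                public void onReceiveValue(String value) {
                                    // "null" means the element is not there yet (redirect pending)
                                    if (value == null || "null".equals(value)) {
                                        return;
                                    }
                                    System.out.println(value);
                                    stop(); // redirect done and content found: stop polling
                                }
                            });
                }
            });
        }
    };
}

public void start() {
    if (timer != null) {
        return; // already polling
    }
    timer = new Timer();
    timer.scheduleAtFixedRate(createPollTask(), 0, 2000); // check every 2 seconds
}

public void stop() {
    if (timer != null) {
        timer.cancel();
        timer = null;
    }
}
```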
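
Note that the `value` passed to `onReceiveValue` is not raw HTML. In practice WebView hands back the script's result JSON-encoded, so a string arrives wrapped in double quotes with its special characters escaped (Chromium-based WebViews are commonly observed to encode `<` as `\u003C`, for example; check this against the WebView versions you target). The sketch below decodes such a JSON string literal using only the JDK. `JsResultDecoder` is a hypothetical helper name, and it assumes the result is either the JSON literal `null` or a single JSON string, which is what the polling script above returns.

```java
/**
 * Hypothetical helper (not part of any Android API): decodes the JSON-encoded
 * result that WebView.evaluateJavascript passes to its callback.
 * Assumes the input is the JSON literal null or one double-quoted JSON string.
 */
final class JsResultDecoder {

    private JsResultDecoder() {}

    static String decode(String json) {
        if (json == null || json.equals("null")) {
            return null; // script produced no value, e.g. the element was not found
        }
        int end = json.length() - 1; // index of the closing quote
        if (end < 1 || json.charAt(0) != '"' || json.charAt(end) != '"') {
            throw new IllegalArgumentException("Not a JSON string literal: " + json);
        }
        StringBuilder out = new StringBuilder(end);
        for (int i = 1; i < end; i++) {
            char c = json.charAt(i);
            if (c != '\\') {
                out.append(c); // ordinary character, copy as-is
                continue;
            }
            if (i + 1 >= end) {
                throw new IllegalArgumentException("Dangling escape in: " + json);
            }
            char e = json.charAt(++i);
            switch (e) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u': // exactly four hex digits follow
                    if (i + 4 >= end) {
                        throw new IllegalArgumentException("Truncated hex escape in: " + json);
                    }
                    out.append((char) Integer.parseInt(json.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Invalid escape \\" + e + " in: " + json);
            }
        }
        return out.toString();
    }
}
```

In the callback you could then call `String html = JsResultDecoder.decode(value);` before parsing the markup. Android's bundled `org.json.JSONTokener` (via `nextValue()`) should also decode a JSON string; the hand-rolled version just keeps the sketch dependency-free and easy to test off-device.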
