Erik's manual scraping process
Tim Loderhose edited this page Nov 23, 2021 · 1 revision
I (Tim) wrote these (mostly chronological) notes as Erik was showcasing his process.
BankTrack bank profiles
- Visit the BankTrack bank profile, e.g. banktrack.org/bank/bank_of_america
- Policies tab (below About)
- All investment policies are listed here
- Policies also appear in the Documents tab, since policies are often documents
- Policies are added here (in Documents), tagged as 'csr policy'
- When an old one becomes obsolete, it is tagged as 'out of date'
- Also added to the BT share (a virtual hard drive)
Searching and updating the profiles:
- Bank of America
  - 60-70% of investment policies link to a CSR web page
  - otherwise 'unavailable' is stated
  - clicks on the link (BoA)
  - CSR framework available as PDF (mostly PDF, sometimes Word, sometimes HTML)
  - already tracked
  - uses Google site search
  - google: human rights site:bankofamerica.com
  - finds PDF document, downloads it to get the document date
  - google: policy site:bankofamerica.com
  - ...
- ANZ
  - clicks link in BankTrack investment profile
  - finds list of policy links
  - clicks all PDFs, finds dates
  - finds updated Energy policy
  - downloads to BT share
  - adds to BankTrack website
  - marks old Energy policy as 'out of date'
  - google: human rights site:anz.com
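
The ANZ update steps above (add the new policy, mark the old one) could be sketched roughly as follows. This is only an illustration, not an existing BankTrack tool: the `PolicyDocument` class, its fields, and the idea of matching old and new documents by bank and title are assumptions. Only the two tag names come from these notes.

```python
from dataclasses import dataclass, field

# Tag names as mentioned in the notes
CSR_POLICY = "csr policy"
OUT_OF_DATE = "out of date"


@dataclass
class PolicyDocument:
    # Hypothetical record layout, not BankTrack's actual data model
    bank: str
    title: str
    tags: set = field(default_factory=set)


def add_updated_policy(documents, new_doc):
    """Tag existing documents with the same bank and title as out of date,
    then add the new document tagged as a CSR policy."""
    for doc in documents:
        if doc.bank == new_doc.bank and doc.title == new_doc.title:
            doc.tags.add(OUT_OF_DATE)
    new_doc.tags.add(CSR_POLICY)
    documents.append(new_doc)
    return documents
```

Matching on title is the weakest assumption here: a renamed policy would not be caught, which is why a manual review step would likely still be needed.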
- Bank of China
  - clicks on link from investment profile
  - ! link not found
  - goes to the bank website itself, linked from BankTrack (English version)
  - clicks About us
  - scrolls, finds nothing new
  - clicks Investor relations
  - 'they probably don't have any policies, and the ones listed are from the subsidiary' (Bank of China Hong Kong)
  - google: policy site:boc.cn - nothing on first page, doesn't check second
  - google: sustainability site:boc.cn - nothing on first page, doesn't check second
  - Erik has local contacts in China to check local sites
  - but they didn't appear to have linked anything for the Bank of China
- CaixaBank
  - clicks on BT link
  - principles link to a document that is already logged
Bank Policies
- BT website is divided into 'banks and climate', etc.
- Banks are scored on a points scale
- output of our tool would be reviewed by e.g. the climate team
'I do site search when desperate'
- usually when no information found easily
- when the bank website is not well structured or not informative about policies
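
The site searches seen during the session all follow the pattern `<term> site:<domain>`, so they could be generated rather than typed. A minimal sketch, assuming the three search terms observed in these notes are a reasonable starting list (Erik's real list is not fixed) and that the function names are hypothetical:

```python
from urllib.parse import quote_plus

# Search terms that appeared in the notes; not an exhaustive list
SEARCH_TERMS = ["human rights", "policy", "sustainability"]


def site_search_query(term, domain):
    """Build a query string like 'human rights site:anz.com'."""
    return f"{term} site:{domain}"


def site_search_urls(domain, terms=SEARCH_TERMS):
    """Return one Google search URL per term, restricted to the given domain."""
    base = "https://www.google.com/search?q="
    return [base + quote_plus(site_search_query(term, domain)) for term in terms]
```

This only builds the URLs for someone to open by hand. Whether and how results would be fetched automatically is left open here.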
What do you do when the link in the investment profile doesn't exist?
- Answer: site search
Are document origins (URLs) stored?
- Answer: no, documents are downloaded to BT share
- if the document is HTML, the page is linked instead
You say "sometimes I do this" - there is no protocol?
- Answer: not really, intuition is built, but search is not exhaustive
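
The storage rule from the answers above (download files, link HTML pages) could be expressed as a small check on the response's content type. This is a sketch under assumptions: the function name and the 'link'/'download' labels are made up for illustration, and deciding by the HTTP `Content-Type` header is our guess at how a tool might tell the cases apart.

```python
HTML_TYPES = ("text/html", "application/xhtml+xml")


def storage_action(content_type):
    """Return 'link' for HTML pages and 'download' for files such as PDF or Word."""
    # Drop parameters such as '; charset=utf-8' before comparing
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in HTML_TYPES:
        return "link"
    return "download"
```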