Official submission repo from team Ternity AI for the AI EarthHack competition
Link to walkthrough video: https://drive.google.com/file/d/1SY-F1VMOpy8lFOixTF_z2X_1nTG5iKKc/view
For all: Run the following to install requirements, preferably on Python 3.11.
pip install -r requirements.txt
-
Edit
DATA_FILENAMEandWORDCLOUD_FILENAMEincleanup.py.DATA_FILENAMEshould point to a file with the same format asAI EarthHack Dataset.csv, the training dataset provided for the AI EarthHack. OptionalCLEANED_DATASETcan be specified to load a pickled cleaned dataset. -
Run
cleanup.pyto generate two wordclouds of the key words in both problems and solutions within the dataset. This gives the user a way to visualize the key concepts and ideas in the dataset. One could then be on the lookout for pitches that don't include these most common ideas, if one were to seek innovative solutions.python3 cleanup.py -
(Optional) You will likely notice the wordclouds dominated by a few big words. Some have already been manually excluded, but you can add more words to exclude by adding them to the
EXCLUDElist inutils.py. This allows you to focus on whichever level of generality you want (e.g. one could exclude many of the most general words to get a sense of specific ideas).
Example chat (including generation of rubric): https://chat.openai.com/share/989bec29-784e-4013-8074-885da3c241fc
-
Provide the following instructions to ChatGPT:
You are a former teacher who is familiar with assignment grading rubrics for student assessment. You have since had a career as both an academic researcher studying the environmental impact of new technologies and businesses, as well as a startup founder and investor who understands the market and industry around green business ventures. You will now transfer all those skills, knowledge, and expertise, and use the provided rubric to evaluate new circular economy business ideas based on 6 evaluation criteria. -
Copy and paste the evaluation rubric from
Evaluation Rubric.md. -
Provide a proposed business idea in the following format:
Business Idea Name Problem: [problem description goes here] Solution: [proposed solution goes here] -
Adjust the output by asking ChatGPT to be more (or possibly less?) constructively critical.
-
(Optional) Ask ChatGPT to generate and evaluate a circular economy business idea of its own.
-
Download knowledge base of articles into local copy of repository. Some articles are not open access, hence the exclusion of the entire
rag_kbfolder from this public repo. The folder contains two sub-folders, one for each role ("internal monologue") that CirconBot can take on, "investor" and "scholar".
Link: https://www.4shared.com/folder/Ja_zfK4n/rag_kb.html
pw: reach out via email, see https://ternity.education -
Follow the instructions at https://platform.openai.com/docs/assistants/tools/knowledge-retrieval to augment code in
CirconBot.pyto train two separate assistants, one on the 54 documents inrag_kb/investor, the other on the 65 documents inrag_kb/investor. -
Run
CirconBot.pyin CLI and follow prompts to input problem and solution, with combined evaluation from both experts.python3 CirconBot.py -
(Optional) Load
CirconBot.pyas a module in your code. Available functions to call:get_assistant_response(assistant, prompt), for getting a single expert's opinion, andcombined_evaluation(problem, solution).
For all the claimed advancements in GenAI and LLMs these past few years, we would argue that relatively little true progress has been made since the Transformer and BERT in 2017-18, upon which all major models today are built. While there have been some model improvements, the bulk of the performance gains has been from simply larger models with more parameters and compute to train. As Ternity AI founder Haihao Liu proposed for his doctoral thesis research back in 2020 (full thesis proposal available at https://tinyurl.com/HHLTAE), we believe it will be the explicit incorporation of domain knowledge into models, deeply embedded and encoded in the model architecture itself, that will finally bring us out of this plateau.
Rose, Stuart, et al. "Automatic keyword extraction from individual documents." Text mining: applications and theory (2010): 1-20. doi:10.1002/9780470689646.ch1.
Krenn, Mario, and Anton Zeilinger. "Predicting research trends with semantic and neural networks with an application in quantum physics." Proceedings of the National Academy of Sciences 117.4 (2020): 1910-1916. doi:10.1073/pnas.1914370116.
Qingyun Wu, et al. "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation", Oct 2023. arXiv:2308.08155.
Alexandra Sasha Luccioni, Yacine Jernite, Emma Strubell. "Power Hungry Processing: Watts Driving the Cost of AI Deployment?", Nov 2023. arXiv:2311.16863.
Human Restoration Project. "AI Handbook: Artificial Intelligence in the CLassroom", Oct 2023. https://www.humanrestorationproject.org/resources/ai-handbook
Sumit Kumal. "A Survey of Retrieval-Augmented Generation for LLMs, Improving Sequential Recommendation via Fourier Transform, and More!", Dec 2023. https://recsys.substack.com/p/a-survey-of-retrieval-augmented-generation
Singh, Pardeep, et al. "Green Circular Economy: A New Paradigm for Sustainable Development." (2023). ISBN:978-3-031-40304-0
Other works consulted for general technical background information:
https://www.4shared.com/folder/12qMHT7K/bib.html
pw: reach out via email, see https://ternity.education