Preliminary, Subject to Change Posted: 10/29/2024
30135 - AI and Financial Information
Professor: Bradford Levy bradford.levy@chicagobooth.edu
Harper Center (Booth) 347
Teaching Assistants: Alec Guthrie, Joseph Gorman
Section 1: Th 8:30-11:30 AM, Harper Center
Office Hours: TBD – Will be optimized based on students’ preferences
Also by appointment via Zoom. Email me to set up a time.
Course Description
Are you interested in building an AI-first company? Managing an AI product? Evaluating investments
in AI ventures? If so, this course will provide a foundation for achieving those goals.
Today’s AI systems such as ChatGPT and Claude are a complex web of components: knowledge
bases, language and multi-modal models, preference alignment mechanisms, retrieval augmented
generation (RAG) pipelines, re-rankers, etc. As such, this course will cover a wide range of topics and
how each one fits into the big picture of “AI.” Students will gain hands-on experience by applying these
methods to large volumes of financial information. That said, most of these methods are general, so
even students interested in other domains will gain valuable experience.
Preparation for each class will involve concise pre-reading and a thought-provoking assignment. The
first half of each class will be dedicated to discussion, lecture, and current advancements. The second
half will be a lab. Throughout the term, student teams will work on a final project of their choosing. For
this reason, other assignments will be kept to the minimum sufficient to facilitate in-class discussion.
The goal of the final project is to provide students with (i) experience to enhance their portfolio of
work, (ii) the opportunity to study a problem of personal interest, and (iii) a chance to demonstrate what
they have learned. At the end of the term, teams will present their projects to the class and a panel of
industry experts.
Frequently Asked Questions 🙋
1. I thought Python was a snake, should I take this course?
Yes. Half of each class will be discussion/lecture, which you will find useful, and the other half a lab
where we apply the lecture to a practical problem. If you genuinely have zero programming experience
and/or have never programmed in Python, then the labs will be challenging. That said, the labs will be
team-based, so having teammates who can help guide you will make this easier.
2. In my free time, I leverage QLoRA to PEFT Llama 405B while using my 4090’s waste heat
to keep my pet python’s terrarium at a balmy 29C, should I take this course?
The information contained in these documents is confidential, privileged and only for the information of the intended recipient and may not be used,
published or redistributed without the prior written consent of the Booth faculty member(s) teaching the course.
Yes. If you are already familiar with advanced methods, the final project will be an opportunity for
you to build something spectacular. Along these lines, I will make myself available to provide as much
guidance as you like on the project. In addition, I believe the lectures will be sufficiently thought-provoking
to further cement your expertise and provide an opportunity for you to share it with the class.
3. I might take (or already took) Booth course X on AI/ML. Should I take this course?
Yes. The Booth curriculum has been thoughtfully designed such that courses are complements rather
than substitutes. You must decide how much of your Booth education to dedicate to studying AI, but
I believe that taking this course, along with any of my colleagues’ courses, will only better prepare you
for an AI-related career.
4. Do you have a technology policy?
I prefer to avoid regulation unless necessary. That said, I was once an MBA student and remember
what it was like rushing from class to recruiting events to mock casing, AND needing to write a million
thank-you notes after those recruiting events… Along these lines, if you know you are going to do
something during lecture which is distracting to your peers, please sit at the back of the room.
5. Can we use AI when completing coursework?
Students are welcome to use AI, but they must disclose that they did so. Also, during the first lecture
we will discuss whether we think this is a good idea, when it might be beneficial, and when we should
just roll up our sleeves and get to work.
Course Grade 💯
Grades for this course will be based heavily on project performance. The following weights will be
used when calculating your final grade:
Final Project 60%
Pre-class Assignments 20%
Participation/Attendance 20%
Total 100%
Participation/Attendance 🧑‍🎓
Attendance for all lectures and labs is mandatory. For labs, attendance will be measured based upon
the successful completion of the lab exercises. There are three ways students can earn participation
points: (i) asking and answering questions during lecture, (ii) sharing relevant experiences during
lecture, or (iii) participating in Canvas discussions that help other students learn.
Pre-class Assignments 📖
Each class will have a concise reading and brief assignment designed to facilitate class discussion. Since
Prof. Levy must aggregate the responses to these assignments and adapt his teaching notes
accordingly, each assignment will be due at 1PM CT the day prior to class (e.g., if class is on Thursday,
then the assignment is due Wednesday at 1PM CT). Late submissions will not be accepted.
Final Project 🚧
The labs and final project will be completed by teams of ~4 students. The importance of choosing a
good team cannot be overstated. To this end, choose your teammates wisely: assemble a team with
diverse talents and experiences. Team formation must be complete by the end of Week 1. If you do
not have a team at that time, Prof. Levy will assign you one.
Each student team will develop a final project which leverages and extends the topics covered in the
course. In developing their project, students are encouraged to think broadly about the course material,
the current challenges of applying AI to processing financial information, and their own skillsets. For
example, one challenge faced in this domain is a lack of benchmarks to evaluate performance of AI
systems. Students who are comparatively strong on institutional details of financial reporting could
carry out a final project where they develop a new benchmark and evaluate state-of-the-art models on
it; notably, such a benchmark is likely worthy of publication. Teams must discuss their final project
idea with Prof. Levy no later than the end of Week 3.
Course Materials and Software 📚👩‍💻
There is no textbook for the course; all necessary course materials will be provided to students via
Canvas. To minimize environmental waste, students will have the opportunity to indicate that they
would like printouts of the lecture notes rather than relying on digital copies. Unless otherwise
indicated, I will assume that you do not want paper copies.
Python will be the language of choice. While there are other languages that are also good for AI/ML,
Python is overwhelmingly the dominant choice. That said, students are free to use other tools so long
as they are still able to complete the labs and homework assignments. For example, some students
may prefer to use a low/no code tool such as Alteryx instead of programming in Python. This is fine
so long as students recognize that Prof. Levy’s assistance will likely be less effective.
About The Professor 👨‍💻 🚵 🍕
Prof. Levy is interested in AI, financial markets, and regulation. His research has been published in
top computer science venues, influenced regulation and lawmaking, and cited by media outlets such
as Bloomberg, FT, and WSJ. He is passionate about teaching; a random sample of past feedback:
“Professor Levy was very down–to–earth and willing to help every member of the class no matter what stage of
learning they were at. Professor Levy is somebody who truly, genuinely cares not only about his students, but
also about the quality of his teaching[,]” “Best professor I have had at UChicago[,]” “Very approachable,
knowledgeable, and cares deeply about meeting you where you are and making sure you understand the
material[,]” “Professor Levy was exceptional. He was super helpful both inside and outside of class and always
made himself available to answer questions and challenge us to figure things out.”
Prior to pursuing an academic career, Prof. Levy worked as an engineer and manager at Fortune 100 companies
including Amazon (Alexa) and Microsoft. During this time, he was granted several patents on
technology spanning internal combustion engines to AI. Outside of work, he likes to bike with his
partner (they did a self-supported 800+ mi trip through Washington State in 10 days when they were
MBA students), spend time with their new daughter, think about the implications of using possessive
pronouns to refer to sentient creatures, eat pizza (which he makes himself and takes very seriously),
and work.
Course Outline 🗓
Week 1 – Philosophy of AGI and Overview of Language Models
This week we will challenge ourselves to think deeply about what it means to achieve Artificial General
Intelligence (AGI). We will frame the problem abstractly, consider one solution to the problem which
suggests AGI is possible, and introduce language models within this framework. As a class, we will
discuss how reasonable this approach is and why AGI doesn’t currently exist (or does it?).
👩‍🔬 Lab: Implement and experiment with a variety of language models, small and large.
🐝 Buzzwords: AGI, data efficiency, generalization, LLMs, GPTs, attention, transformers
Week 2 – What are you thinking? Building Representations of the World
Human beings largely perceive the world through sight, smell, touch, taste, and hearing. These
perceptions are then converted into internal representations used in decision making. This week, we
will examine how language and multi-modal models construct such representations, how we can
leverage them for fun and profit, and the challenges (and solutions) when working with long content.
👩‍🔬 Lab: Create representations of financial documents, e.g., PowerPoint decks, PDFs, images, and text
produced by companies. Evaluate the efficacy of representations for a variety of tasks.
🐝 Buzzwords: Embeddings, chunking, encoder vs. decoder models, multi-modal
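For readers new to chunking, the core idea fits in a few lines of Python. This is a minimal sketch, not the lab's code: it assumes whitespace tokenization and a hypothetical `chunk_words` helper that splits a long document into overlapping word windows, so each window fits within a model's input limit while the overlap preserves some context across chunk boundaries. Real pipelines typically count model tokens, or split on sentences or document sections, instead of words.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words.

    Illustrative only: real pipelines usually count model tokens, not words.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window reached the end
            break
    return chunks


# Usage: a ten-word "document", windows of 4 words overlapping by 2.
doc = "0 1 2 3 4 5 6 7 8 9"
print(chunk_words(doc, chunk_size=4, overlap=2))
# ['0 1 2 3', '2 3 4 5', '4 5 6 7', '6 7 8 9']
```

Each chunk would then be passed separately to an embedding model, yielding one representation per chunk rather than one for the whole document.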
Week 3 – Prompting Generative Models: From Auto-completion to Following Instructions
This week we will dive into the evolution of LLMs from glorified auto-completion systems to being
able to take direction from user-provided instructions. We will look at the big-picture implications of
highly technical topics such as alignment, explore prompt engineering, and revisit the limitations of
AI systems.
👩‍🔬 Lab: Align an LLM to the finance domain. Optimize prompts to elicit desired output. Explore
robustness (or fragility?) of output.
🐝 Buzzwords: CoT, alignment, RLHF, prompt engineering, role play, SolidGoldMagikarp 🎣
Week 4 – Improving Results Through Context
When responding to our queries, LLMs have two primary sources of memory: their parameters and
the query context. While the parameters are expensive to update since doing so requires training, we
can change the context provided to the LLM cheaply at inference time. This week we will focus on
leveraging the context to get better results, e.g., answers to queries about events that happened after
the LLM was trained, or answers that are more factually accurate and less biased or toxic.
👩‍🔬 Lab: Embed financial documents using a text embedding model. Implement a RAG pipeline to
provide query-specific context. Evaluate RAG and non-RAG outputs for temporal biases.
🐝 Buzzwords: GTE, RAG, re-ranking, bias, toxicity, vector DBs, quantization
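The retrieve-then-prompt idea behind RAG can be sketched as follows. This is a toy under loudly stated assumptions: a bag-of-words count vector stands in for a real text embedding model such as GTE, a plain Python list stands in for a vector DB, and the helper names (`embed`, `retrieve`, `build_prompt`) are hypothetical. The point it illustrates is that the model's parameters never change; only the query-specific context placed in front of the question does.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a text embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (brute-force search)."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Prepend the retrieved passages to the query; this text is what the LLM sees."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


docs = [
    "Revenue grew 12% in fiscal 2024.",
    "The CEO resigned in March.",
    "Our pizza oven reaches 900 degrees.",
]
print(build_prompt("How much did revenue grow?", docs, k=1))
```

Swapping the toy `embed` for a real embedding model, and the brute-force sort for a vector DB with approximate search, gives the general shape of a RAG pipeline; re-ranking would add a second, more expensive scoring pass over the top candidates.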
Week 5 – Structuring Unstructured Data
Extracting structured data from unstructured content, e.g., videos, is a common challenge. This week
we will examine how AI systems can extract structured content from unstructured text, images, and
videos.
👩‍🔬 Lab: Apply multi-modal models to convert unstructured financial disclosures into structured data
for use in financial statement analysis, valuation models, etc.
🐝 Buzzwords: Multi-modal, ViT, cross-attention
Week 6 – Identifying New Information
When faced with a barrage of content from a variety of sources each competing for our attention, it
is useful to be able to separate new content, or information, from stale content which we have already
seen. This week we will think about what “information” is, and how generative models can enable us
to spend our time more effectively by identifying it for us.
👩‍🔬 Lab: Explore a variety of methods for identifying information (from old-school but efficient
methods developed by Google to modern, more costly LLM-based methods) and evaluate their
usefulness for processing lengthy financial disclosures.
🐝 Buzzwords: Perplexity, approximate nearest neighbors and deduplication, HNSW
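One simple way to separate new passages from stale ones is to compare overlapping word n-grams ("shingles") against previously seen text using Jaccard similarity. The sketch below is an illustrative baseline only and not necessarily one of the methods covered in the lab; the 3-word shingles, the 0.5 threshold, and the helper names (`shingles`, `jaccard`, `new_passages`) are assumptions chosen for the example.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping n-word sequences ("shingles") in lowercased text."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of intersection over size of union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def new_passages(passages: list[str], seen: list[str],
                 threshold: float = 0.5, n: int = 3) -> list[str]:
    """Keep passages whose best match against previously seen text is below threshold."""
    seen_shingles = [shingles(s, n) for s in seen]
    fresh = []
    for p in passages:
        sh = shingles(p, n)
        best = max((jaccard(sh, s) for s in seen_shingles), default=0.0)
        if best < threshold:
            fresh.append(p)
    return fresh


last_quarter = ["Revenue grew 12 percent, driven by cloud sales."]
this_quarter = [
    "Revenue grew 12 percent, driven by cloud sales.",
    "We announced a new share buyback program.",
]
print(new_passages(this_quarter, seen=last_quarter))
# ['We announced a new share buyback program.']
```

Comparing every new passage against every previously seen one scales poorly as the archive grows; this is the problem that approximate nearest neighbor indexes such as HNSW are designed to address.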
Week 7 – Do you hear me? How to speak so AI will listen
As tools such as ChatGPT gain adoption and more content is consumed through the lens of AI
systems, a natural question is whether content producers’ original messages are being retained or
lost by the AI. This week we will explore different ways users leverage AI to consume content and the
implications these have for producers of content. Of particular interest are implications for investor
relations teams responsible for communicating with external stakeholders.
👩‍🔬 Lab: Apply common transformations of content, e.g., summarization, to financial documents and
evaluate the usefulness of the transformed content relative to the canonical material. Explore
permutations of the canonical material to elicit desired output from the model.
🐝 Buzzwords: Prompt (or content?) engineering, SolidGoldMagikarp 🎣
Week 8 – AI Safety
The self-regulating property of AI systems ensures that humans have no reason to concern themselves
with the safety of AI and should devote their time to other pursuits. Wait, is that right? Is Prof. Levy
captured by AI? Jokes about Skynet aside, there are real, immediate risks associated with AI, e.g., the
invention of novel harmful chemicals. This week we will think about risks associated with AI such as
public misinformation, evasion of regulations, fraudulent financial data, and exacerbation of societal
biases. We will then think about how we can mitigate these risks—sometimes with the help of AI.
👩‍🔬 Lab: Use LLMs to generate fake financial information. Explore the ability of LLMs to police
themselves.
🐝 Buzzwords: Alignment, adversarial robustness, jailbreaks, synthetic data
Week 9 – Final Presentations
You don’t have to go home (but you can’t stay here). Sadly, our time together is winding down. In the
final week of class each team will show off what they have been working on throughout the term!
Note: I have received mixed feedback from students about how and when final presentations should
occur. Thus, during our first class, we will finalize the specifics of the presentations by majority vote.