Using Machine Learning and Natural Language Processing to Identify Firm-Level Supply Shocks in Textual Data

, Econ PhD Student
, Joint Program of Financial Economics PhD Student

The goal of this project is to create a novel dataset by extracting records of firm-level idiosyncratic supply shocks from the text of Form 8-K “current report” filings submitted to the Securities and Exchange Commission. To do this, we propose a method that combines machine learning, natural language processing, and crowdsourcing. We use machine learning techniques to identify documents that contain words indicative of certain types of firm-level supply shocks, natural language processing to identify the relevant entities described in those documents, and crowdsourcing to cheaply and accurately verify the categorization of the identified documents and to connect the extracted entities to the identified shock. Compared with a previous attempt to create such a database, our method greatly improves the accuracy of the recorded firm-level shocks and provides a clean, easily verifiable interpretation of each shock.
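
As a rough illustration of two of the steps described above, the following Python sketch flags filings by shock-related words and then aggregates crowd-worker votes on each flagged filing. It is a minimal sketch under stated assumptions, not the project's implementation: the keyword lists, the category names, and the functions flag_shock_types and majority_label are all hypothetical, and simple keyword matching stands in for the machine learning classifier, whose details are not given here. The entity-extraction step is omitted.

```python
import re
from collections import Counter

# Hypothetical categories and keywords, chosen only for illustration.
# The project's actual features and shock taxonomy are not specified.
SHOCK_KEYWORDS = {
    "natural_disaster": {"hurricane", "flood", "earthquake", "wildfire"},
    "labor_dispute": {"strike", "lockout", "walkout"},
}


def flag_shock_types(text):
    """Return, in sorted order, the categories whose keywords occur in text.

    This keyword match is a stand-in for a trained classifier.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        category
        for category, words in SHOCK_KEYWORDS.items()
        if tokens & words
    )


def majority_label(votes):
    """Aggregate crowd votes for one filing.

    Returns the most common label, or None when there are no votes or
    the top two labels are tied (such filings would need more review).
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    filing = "A hurricane forced the company to close its main plant."
    print(flag_shock_types(filing))
    print(majority_label(["shock", "shock", "no_shock"]))
```

Returning None on a tie, rather than picking a label arbitrarily, keeps ambiguous filings out of the dataset until additional votes are collected; this is one possible design choice among several.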