Using Machine Learning and Natural Language Processing to Identify Firm-Level Supply Shocks in Textual Data

, Econ PhD Student
, Joint Program of Financial Economics PhD Student

The goal of this project is to create a novel dataset by extracting records of firm-level idiosyncratic supply shocks from the text of Form 8-K “current report” filings filed with the Securities and Exchange Commission. To do this, we propose a method that combines machine learning, natural language processing, and crowdsourcing. We use machine learning techniques to identify documents that contain words indicative of certain types of firm-level supply shocks, use natural language processing to identify the relevant entities described in those documents, and use crowdsourcing to cheaply and accurately verify the categorization of the identified documents and to connect the extracted entities to the identified shock. Compared with a previous attempt to build such a database, our method greatly improves the accuracy of the recorded firm-level shocks and provides a clean, easily verifiable interpretation of each one.
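
To make the three stages concrete, here is a minimal Python sketch of how such a pipeline might be wired together. It is not the project's implementation: simple keyword counting stands in for the machine learning classifier, a regular expression stands in for a real named-entity recognizer, and a majority vote stands in for the crowdsourced verification step. The keyword lists, shock categories, function names, and the sample sentence are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical shock categories and keywords (assumptions for illustration;
# the project description does not list the actual terms or categories).
SHOCK_KEYWORDS = {
    "natural_disaster": {"hurricane", "flood", "earthquake", "wildfire"},
    "labor_dispute": {"strike", "walkout", "lockout"},
    "plant_disruption": {"fire", "explosion", "shutdown", "outage"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def flag_shock_types(text, min_hits=1):
    """Stage 1 stand-in: count keyword hits per shock category.

    A trained classifier would replace this; keyword counting is only
    the simplest way to show the idea of flagging candidate documents.
    """
    counts = Counter(tokenize(text))
    hits = {}
    for shock_type, words in SHOCK_KEYWORDS.items():
        n = sum(counts[w] for w in words)
        if n >= min_hits:
            hits[shock_type] = n
    return hits


def extract_candidate_entities(text):
    """Stage 2 stand-in: find capitalized names ending in a corporate suffix.

    A real pipeline would use a named-entity recognizer; this regex is a
    crude approximation that only catches names such as "Acme Chemical Corp".
    """
    pattern = r"\b(?:[A-Z][A-Za-z&]*\s)+(?:Inc|Corp|Corporation|LLC|Ltd|Co)\b"
    return sorted(set(re.findall(pattern, text)))


def majority_label(votes):
    """Stage 3 stand-in: accept a label only if a strict majority agrees."""
    if not votes:
        return None
    label, n = Counter(votes).most_common(1)[0]
    return label if n > len(votes) / 2 else None


if __name__ == "__main__":
    # Invented sample text, not taken from any actual 8-K filing.
    sample = (
        "On March 3, a fire at the Texas plant of Acme Chemical Corp halted "
        "production. Workers at supplier Delta Parts Inc began a strike."
    )
    print(flag_shock_types(sample))
    print(extract_candidate_entities(sample))
    print(majority_label(["plant_disruption", "plant_disruption", "none"]))
```

In this sketch a document is sent for human review only after it has been flagged and its candidate entities extracted, which mirrors the order of steps described above. Requiring a strict majority among reviewers is one simple aggregation rule among several that could be used.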