Using Crunchbase for research in Entrepreneurship: data content and structure
Francesco Ferrati; Moreno Muffatto
2020
Abstract
The large amount of business-related data available today allows researchers in entrepreneurship to explore new methodologies for data analysis. This paper presents an overview of the database provided by Crunchbase for research purposes. Founded in 2007, Crunchbase collects worldwide data on companies, investors, funding rounds and key people of the entrepreneurial ecosystem. As of May 2019, Crunchbase had collected records on 760,590 organizations (of which 708,558 are companies), 121,509 investors of different types, 263,426 funding rounds, 890,429 people, 17,068 initial public offerings (IPOs) and 89,959 acquisitions. The main purpose of this work is to give a detailed description of the Crunchbase database in order to highlight its potential and to assist future researchers who intend to use this source of data. To achieve this goal, three main topics are covered. First, since the database is provided as seventeen independent datasets, the logic linking them has been reconstructed through a reverse-engineering approach. The relationships between the individual files have been identified and summarized in an original diagram, and all the available variables are listed for each dataset. Second, in order to quantify the scope and coverage of the database, some key variables have been analysed, resulting in descriptive statistics for three areas of interest: companies, funding rounds and investors. Specifically, the analysis covers the geographical distribution of companies, the number of companies per year of foundation and current operating status, the number of companies by amount and number of investments raised, and the number of investors by number and amount of investments made. Finally, some indications on the potential uses of Crunchbase for research in entrepreneurship are given.
Considering the characteristics of the available variables, we focused on the application of machine learning algorithms to the analysis and modeling of equity investment processes.