Using Crunchbase for research in Entrepreneurship: data content and structure

Francesco Ferrati; Moreno Muffatto
2020

Abstract

The large amount of business-related data available today allows researchers in entrepreneurship to explore new methodologies for data analysis. This paper presents an overview of the database that Crunchbase provides for research purposes. Founded in 2007, Crunchbase collects worldwide data on the companies, investors, funding rounds and key people of the entrepreneurial ecosystem. As of May 2019, Crunchbase held records on 760,590 organizations (708,558 of them companies), 121,509 investors of different types, 263,426 funding rounds, 890,429 people, 17,068 initial public offerings (IPOs) and 89,959 acquisitions. The main purpose of this work is to describe the Crunchbase database in detail, in order to highlight its potential and to support future researchers who intend to use this data source. To achieve this goal, three main topics are covered. First, since the database is provided as seventeen independent datasets, the logic linking them has been reconstructed using a reverse-engineering approach: the relationships between the individual files have been identified and summarized in an original diagram, and all the available variables are listed for each dataset. Second, to quantify the scope and coverage of the database, some key variables have been analysed, yielding descriptive statistics for three areas of interest: companies, funding rounds and investors. Specifically, the analysis covers the geographical distribution of companies, the number of companies by founding year and by current operating status, the number of companies by amount and number of investments raised, and the number of investors by number and amount of investments made. Finally, some indications are given on the potential uses of Crunchbase for research in entrepreneurship. Given the characteristics of the available variables, we focus on applications of machine learning algorithms to the analysis and modeling of equity investment processes.
Proceedings of the 19th European Conference on Research Methodology for Business and Management Studies (ECRM)
ISBN 9781912764594

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3341496