Our Approach

Computer science differs from other sciences. Computer scientists usually focus more on algorithms and less on the datasets they use. In addition, statements in computer science are typically either true or false, which is rarely the case in other scientific disciplines (Skiena 2017).

Data science is more like the other sciences in this regard. Constructing datasets is usually complicated, and probabilistic approaches play an important role. Data science is also the search for insights in data, and there are two different approaches to that search (Skiena 2017).

The first approach is the hypothesis-driven paradigm. A research paper usually starts with a hypothesis based on the literature. A hypothesis might be that "an increase in the number of vaccinated people reduces the overall number of infections". The researcher would then search for relevant data and use it to verify or falsify the hypothesis.

Another researcher might postulate that refugee children are more likely to become entrepreneurs because they are used to managing scarce resources. The researcher would then search for data to verify or falsify this hypothesis.
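As a minimal sketch of what testing such a hypothesis can look like, the example below checks the vaccination hypothesis with a one-sided permutation test. The regional figures are made up purely for illustration, and the function names (`pearson`, `permutation_p_value`) are assumptions of this sketch, not a prescribed method. A real study would also have to address confounding and data quality.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Return the observed correlation and a one-sided p-value.

    The p-value is the share of random shuffles of ys whose correlation
    with xs is at least as negative as the observed one.
    """
    rng = random.Random(seed)
    observed = pearson(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(xs, shuffled) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical data: vaccination rate and infections per 1,000 people
# in eight regions. These numbers are invented for the example.
vaccination_rate = [0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]
infections_per_1000 = [48, 45, 39, 33, 30, 22, 18, 12]

r, p = permutation_p_value(vaccination_rate, infections_per_1000)
print(f"observed r = {r:.3f}, one-sided p = {p:.4f}")
```

A small p-value here only says that a negative association this strong would be unlikely under random pairing. It supports the hypothesis but does not establish that vaccination causes the reduction.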

The second approach is the data-driven paradigm, which starts with a dataset and asks which interesting questions can be addressed with it. This makes sense when there are too many variables to formulate a hypothesis about each one in advance. Findings from this approach should always be treated with caution, because some effects may be purely random and fail to replicate in other contexts. However, such findings can be corroborated with further analyses.
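The caution about random effects can be illustrated with a short simulation. The sketch below generates a target and 200 candidate variables that are all pure noise, picks the variable most correlated with the target on one half of the rows, and then re-checks it on the other half. The sizes, the seed, and the split-sample check are assumptions chosen for this example; they stand in for the "further analyses" mentioned above.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)
N_ROWS, N_VARS = 60, 200
HALF = N_ROWS // 2

# Everything is independent Gaussian noise: no variable is truly
# related to the target.
target = [rng.gauss(0, 1) for _ in range(N_ROWS)]
variables = [[rng.gauss(0, 1) for _ in range(N_ROWS)] for _ in range(N_VARS)]

# Exploration: scan all variables on the first half of the rows.
explore_r = [pearson(col[:HALF], target[:HALF]) for col in variables]
best = max(range(N_VARS), key=lambda i: abs(explore_r[i]))

# Confirmation: re-test only the winning variable on the held-out rows.
confirm_r = pearson(variables[best][HALF:], target[HALF:])

print(f"best of {N_VARS} noise variables: r = {explore_r[best]:.2f}")
print(f"same variable on held-out rows:  r = {confirm_r:.2f}")
```

Because the winner is selected from many candidates, its correlation on the exploration rows looks substantial even though nothing real is there. On the held-out rows the same variable will typically show a correlation much closer to zero, which is why data-driven findings need independent confirmation.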

Two common problems arise in this space, and in both a model is built from a sample of data. The first is classification: assigning data points to categories. The second is prediction: estimating an outcome from a set of input variables, which is usually done with regression analysis.
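The difference between the two problems can be sketched in a few lines. The example below uses a nearest-centroid rule for classification and ordinary least squares with a single input variable for regression. Both techniques, the function names, and the toy numbers are assumptions picked for brevity; many other models address the same two problems.

```python
def fit_centroids(points_by_label):
    """Classification, training step: average the points of each class."""
    centroids = {}
    for label, points in points_by_label.items():
        n = len(points)
        centroids[label] = tuple(sum(coord) / n for coord in zip(*points))
    return centroids


def classify(centroids, point):
    """Classification, prediction step: pick the class with the closest centroid."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], point))
    return min(centroids, key=squared_distance)


def fit_line(xs, ys):
    """Regression, training step: least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Toy data, invented for the example.
training_points = {
    "a": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)],
    "b": [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0)],
}
centroids = fit_centroids(training_points)
print(classify(centroids, (2.0, 2.0)))  # output is a category label

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
print(slope * 6 + intercept)  # output is a numeric estimate
```

The classifier returns one of a fixed set of labels, whereas the regression returns a number on a continuous scale. That difference in output type is what separates the two problems.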

The new data-driven approach changes the traditional operating model. Data sits at the core of both the business model and the operating model, and business models built this way would not be possible without access to data.