Recently we’ve started working on a health & wellbeing app whose aim is to improve individuals’ overall quality of life. In the app, a Machine Learning module helps us understand the factors that affect quality of life.
To build the module we used a large data set from about 2,500 real people, who completed a complex questionnaire about their habits, behaviours and conditions. An additional question asked about their perceived overall quality of life.
The questionnaire consisted of statements such as “I smoke a lot of cigarettes”, “I get at least 7 hours of sleep every day” and “I spend more than 9 hours at work per day”.
The aim of our initial work was to:
- predict how certain behaviours and conditions impact the perceived quality of life,
- understand how modifications to certain behaviours and conditions influence the overall quality of life.
To test the approach, we started with 7 key factors selected by an academic expert in the field.
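The two aims above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the factor names, the 1 to 5 agreement scale, the binary "good quality of life" label, the synthetic data and the choice of logistic regression are all made up for the example, since none of these details are given here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical factor names: the seven factors actually selected are not listed.
FACTORS = ["smoking", "sleep", "work_hours", "exercise", "diet", "stress", "social"]


def make_synthetic_data(n_people=2500, seed=0):
    """Generate made-up answers (1-5 agreement scale) and a binary label."""
    rng = np.random.default_rng(seed)
    X = rng.integers(1, 6, size=(n_people, len(FACTORS))).astype(float)
    # Arbitrary weights, used only to give the synthetic data some structure.
    weights = np.array([-1.0, 1.0, -0.8, 0.9, 0.7, -1.1, 0.6])
    score = (X - 3.0) @ weights + rng.normal(0.0, 0.5, size=n_people)
    y = (score > 0).astype(int)  # 1 = reports a good quality of life
    return X, y


def what_if(model, person, factor, new_value):
    """Predicted probability of a good quality of life before and after one change."""
    modified = person.copy()
    modified[FACTORS.index(factor)] = new_value
    before, after = model.predict_proba(np.vstack([person, modified]))[:, 1]
    return before, after


# Aim 1: learn how the answers relate to the perceived quality of life.
X, y = make_synthetic_data()
model = LogisticRegression(max_iter=1000).fit(X, y)

# Aim 2: change a single answer for one person and compare the predictions.
person = np.array([5.0, 2.0, 4.0, 2.0, 3.0, 4.0, 3.0])
before, after = what_if(model, person, "smoking", 1.0)
print(f"P(good quality of life): {before:.2f} -> {after:.2f}")
```

The `what_if` helper simply re-runs the prediction with one answer changed, which is the simplest way to ask the model how a modified behaviour might shift the outcome. It shows what the model has learned from the data, not a causal effect.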
We tested a few open-source tools to solve our problem.
Keras is a high-level API written in Python. The original idea behind Keras was to enable fast experimentation with deep neural networks and to get results quickly, without getting bogged down along the way. It sounded like a reasonable starting point for our test-drive. It is supported by Google.
PyTorch is a relatively new solution (released in late 2016), but it is based on the much more established Torch (2002). PyTorch has quickly gained popularity among academic researchers and other specialists who require optimisation of custom expressions. It is supported by Facebook.
scikit-learn is another relatively simple and efficient tool. Its website showcases a wide range of applications: face recognition, outlier detection, visualising the stock market structure, etc. It is supported by France’s INRIA and a group of other organisations.
In initial prototyping, where both results and costs (time, effort) matter, we based our choice on two essential factors:
- precision of prediction of the overall quality of life;
- ease of use.
While all solutions provided similar precision (around 90%, without a huge amount of work spent on fine-tuning), scikit-learn was the easiest to configure and use, so we chose it as our primary option.
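For readers who want to run a similar quick check, here is a minimal sketch of measuring prediction quality with scikit-learn using 5-fold cross-validation. It is not our actual pipeline: the data is synthetic (generated with `make_classification`), the two estimators are arbitrary examples, and plain accuracy is assumed as the metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the questionnaire: 2500 rows, 7 features, binary label.
X, y = make_classification(
    n_samples=2500,
    n_features=7,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# Example estimators only; both use close-to-default settings, i.e. no fine-tuning.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, estimator in candidates.items():
    # Each model is trained on 4 folds and scored on the held-out fold, 5 times.
    scores = cross_val_score(estimator, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Because every estimator shares the same `fit`/`predict` interface, trying another model is a one-line change in `candidates`, which is a large part of what makes the library quick to configure.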
To be fair, Keras and PyTorch play in a different league from scikit-learn. The former two are Deep Learning tools, whereas scikit-learn focuses on classical machine learning. Perhaps as the project develops, and more resources are allocated to feeding larger and more complex data sets into Deep Learning tools, we will consider solutions offering greater capacity.
However, as long as scikit-learn is doing the job, we encourage you to use it.
by Tomasz Gdula and Grzegorz Motriuk