When your Fitbit speaks more than it seems

Margot Marchal
10 min read · Aug 22, 2021


Lately, I have been reading a few articles on the boom in activity trackers: one more smart device that is now part of our lives and that tracks all kinds of parameters, from the distance you walk to your heart rate and even your sleep. Some companies claim that these devices can actually make you fitter and healthier, and sometimes they even mention medical uses. So I was curious to see what the Fitbit tracker that I have been wearing since February 2017 says about me. On top of that, this source of data is a perfect opportunity for me to put my data analysis skills and knowledge to work.

Link to access the Python code file

Getting the data

Getting the data off the device in order to analyse it was not that easy. There were two options.

  • The first one would have been to go to your Fitbit personal account and extract the data from there. With this method, you have to extract the data parameter by parameter (distance, steps, floors, …) as JSON files. It is not possible to extract all the data in one shot. On top of that, you have to extract the data for the full horizon every time; you cannot extract only the missing data. Overall, this process is time-consuming and not really user-friendly.
  • The second option consists of using the Fitbit API with your credentials. It allows you to extract only the data you are missing (and not everything since the beginning) and add it to the data frame you already have. On top of that, you can download all the parameters in one shot. It is therefore faster than the first method, but it relies on a few functions that have to be coded.

In this analysis, the second option was chosen because it is faster and more automatic once the proper code has been developed. You can find all the details of the extraction code in the GitHub folder linked to the article. The extraction of the data was split into two steps: the first one to collect the daily activity data and the second one to collect the training (workout) information. Once all the data had been extracted from the API, it was gathered in two data frames:

  • One data frame for the information on the steps, the distance, the calories, the floors and the different active minutes
  • And another one for the information on the workouts / trainings done
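To give an idea of the "extract only what is missing" logic, here is a minimal sketch in Python with pandas. It is not the code from the GitHub folder: the column names, the function names and `fake_fetch` (a stand-in for the real Fitbit API call) are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Callable, Optional, Tuple

import pandas as pd


def missing_range(existing: pd.DataFrame, today: date) -> Optional[Tuple[date, date]]:
    """Return the (start, end) dates not yet stored, or None if nothing is missing."""
    last_stored = existing["date"].max().date()
    start = last_stored + timedelta(days=1)
    return (start, today) if start <= today else None


def update_frame(
    existing: pd.DataFrame,
    fetch_daily: Callable[[date, date], pd.DataFrame],
    today: date,
) -> pd.DataFrame:
    """Fetch only the missing days and append them to the existing data frame."""
    gap = missing_range(existing, today)
    if gap is None:
        return existing  # already up to date, no call needed
    new_rows = fetch_daily(*gap)
    combined = pd.concat([existing, new_rows], ignore_index=True)
    return combined.sort_values("date").reset_index(drop=True)


def fake_fetch(start: date, end: date) -> pd.DataFrame:
    """Hypothetical stand-in for the API call: one row per day, zero steps."""
    days = pd.date_range(start, end, freq="D")
    return pd.DataFrame({"date": days, "steps": [0] * len(days)})


# Made-up example: three days already stored, two days missing.
existing = pd.DataFrame(
    {
        "date": pd.date_range("2021-07-01", "2021-07-03", freq="D"),
        "steps": [9500, 12000, 8000],
    }
)
updated = update_frame(existing, fake_fetch, today=date(2021, 7, 5))
print(updated)
```

In a real setup, `fake_fetch` would be replaced by a function that calls the Fitbit API for the given date range and returns the rows in the same shape.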

Data crunching

The data frames contained about 4.5 years of data (from February 2017 until July 2021). The next step was to crunch and pre-process the data before moving on to the analysis.

First of all, to ease the process, we split the two big data frames into smaller ones, each of them focusing on one parameter (one for the walking related information, one for the elevation, one for the calories, one for the active minutes) and the final one for the trainings / workouts.

The second step was to build different levels of granularity. The data was aggregated at daily, monthly and yearly level for each data frame. While grouping the information together, we used different summary statistics: mean, min, max and sum. Here is one example for the monthly walking information:
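As an illustration of this kind of aggregation, here is a small pandas sketch. The data is randomly generated and the column names (`date`, `distance_km`) are assumptions, not the names used in the original code.

```python
import numpy as np
import pandas as pd

# Made-up daily data for three months (random values, for illustration only).
days = pd.date_range("2021-01-01", "2021-03-31", freq="D")
rng = np.random.default_rng(0)
daily = pd.DataFrame(
    {"date": days, "distance_km": rng.uniform(2, 20, len(days)).round(2)}
)

stats = ["mean", "min", "max", "sum"]

# Monthly level: one row per calendar month, one column per statistic.
monthly = daily.groupby(daily["date"].dt.to_period("M"))["distance_km"].agg(stats)

# Yearly level: same statistics, grouped by year.
yearly = daily.groupby(daily["date"].dt.year)["distance_km"].agg(stats)

print(monthly)
print(yearly)
```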

Data analysis

A lot of information is available in the data frames; here is only a sample of the parameters we have and their distributions.

This analysis focuses first on the steps / distance information, and then on the workouts. We will not explore the other parameters any further here.

By exploring this data, we will try to find out whether there are any patterns or interesting facts that we can highlight.

Walking information

The Fitbit device used was the Fitbit One, a very basic model that counts steps and calculates the distance from them. So the distance parameter is directly proportional to the step count. As a result, only one of them will be analyzed here: the distance (in km).

Looking at the distance, I walked 16,798 km over the last 4.5 years. That is more than one third of the circumference of the Earth (about 40,075 km), or 5 times the perimeter of France, or even an Amsterdam to Beijing return trip.

Here is the distribution of the daily distance walked during the last 4.5 years.

On average, I walked 10.4 km a day. That is more than the 8 km advised by the World Health Organization.
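A quick arithmetic check shows that the total distance and the daily average are consistent with each other. The two figures come from the analysis above; the Earth's circumference (about 40,075 km at the equator) is the only value added here.

```python
EARTH_CIRCUMFERENCE_KM = 40_075  # equatorial circumference, approximate

total_km = 16_798        # total distance from the analysis
avg_km_per_day = 10.4    # daily average from the analysis

implied_days = total_km / avg_km_per_day
implied_years = implied_days / 365.25
earth_fraction = total_km / EARTH_CIRCUMFERENCE_KM

print(f"Implied number of tracked days: {implied_days:.0f}")
print(f"Implied number of years: {implied_years:.2f}")
print(f"Share of the Earth's circumference: {earth_fraction:.0%}")
```

The implied number of days corresponds to a bit less than four and a half years, which matches the February 2017 to July 2021 window.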

Here is a split at the monthly level:

Let’s have a look at the distance walked by day of the week, regardless of the year or the month.

We can see from those two graphs that Fridays, Saturdays and Sundays are the days when I walk the most, on average. Knowing that I have an office job and that I work from Monday to Friday, this is not surprising.
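For readers who want to reproduce this kind of view, here is a minimal pandas sketch of the day-of-week average. The two weeks of data are made up and the column names are assumptions.

```python
import pandas as pd

# Two made-up weeks of daily distances, starting on Monday 5 July 2021.
daily = pd.DataFrame(
    {
        "date": pd.date_range("2021-07-05", periods=14, freq="D"),
        "distance_km": [8.0, 9.0, 7.5, 8.5, 12.0, 15.0, 14.0,
                        7.0, 8.0, 9.5, 8.0, 13.0, 16.0, 12.5],
    }
)

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Average distance per day of the week, regardless of year or month.
by_weekday = (
    daily.groupby(daily["date"].dt.day_name())["distance_km"]
    .mean()
    .reindex(weekday_order)  # keep calendar order instead of alphabetical
)
print(by_weekday)
```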

I did the same analysis focusing on the month of the year. Nothing significant stands out. I had assumed that the summer months might have a higher average distance than the other months, but this is not the case.

Once that was done, I was curious to cross the distance information with my location. Indeed, I really like to travel and I used to travel a lot (before the worldwide COVID crisis), so it is interesting to add this layer to the analysis. After a few days of going back through my photos, my flight tickets and my train tickets from the past 4.5 years, I managed to put together, day by day, the city where I was over that period. Linking this database with the walking information leads to some interesting outputs.
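Linking the two tables boils down to a join on the date. Here is a small sketch with pandas, using made-up rows and assumed column names (`date`, `city`, `country`, `distance_km`):

```python
import pandas as pd

# Made-up walking data: one row per day.
walking = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-05-01", "2019-05-02", "2019-05-03", "2019-05-04"]),
        "distance_km": [9.0, 11.5, 21.0, 18.5],
    }
)

# Made-up location table: the city where I was on each day.
locations = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-05-01", "2019-05-02", "2019-05-03", "2019-05-04"]),
        "city": ["Amsterdam", "Amsterdam", "Paris", "Paris"],
        "country": ["Netherlands", "Netherlands", "France", "France"],
    }
)

# One location per day, so the join must be one-to-one on the date.
merged = walking.merge(locations, on="date", how="left", validate="one_to_one")

# Total distance per country (for the world map) and per city (for the circles).
by_country = merged.groupby("country")["distance_km"].sum().sort_values(ascending=False)
by_city = merged.groupby(["country", "city"])["distance_km"].sum()

print(by_country)
print(by_city)
```

The `validate="one_to_one"` argument makes the merge fail loudly if a day appears twice in either table, which is a cheap guard when the location table is built by hand.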

First of all, I displayed the countries on a world map with a color scale to indicate where I walked the most. The darker the country, the more distance I walked there during the last 4.5 years. I had to adjust the scale to get a meaningful coloration, so it is not linear. At this stage, the city level is not used, so even if I visited only one city within a country, the whole country is colored.

It is no surprise to me that the darkest countries are France, Ireland and the Netherlands, the three countries where I lived in the last 4.5 years. We can notice a strong orange for the United States, even though I spent only a few weeks there. Let’s keep that in mind and come back to it in a few minutes.

As mentioned above, I managed to gather city-level information, not only country-level. The bigger the circle, the more distance I walked there. Once again, I had to use a logarithmic scale to get readable circles. Here is the overall picture:
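To show why a logarithmic scale helps, here is a minimal sketch of how distances could be turned into circle sizes. The function name and the size bounds are assumptions chosen for illustration, not the values used for the map.

```python
import numpy as np


def circle_sizes(distances_km, min_size=5.0, max_size=40.0):
    """Map strictly positive distances to marker sizes on a log10 scale."""
    d = np.asarray(distances_km, dtype=float)
    logs = np.log10(d)  # requires every distance to be > 0
    span = logs.max() - logs.min()
    if span == 0:
        # All distances are equal: use a single mid-range size.
        return np.full_like(d, (min_size + max_size) / 2)
    return min_size + (logs - logs.min()) / span * (max_size - min_size)


# Made-up totals: a city visited for a day versus a home city.
sizes = circle_sizes([10, 100, 1000, 10000])
print(sizes)
```

With a linear scale, the first three circles would be almost invisible next to the last one; on the log scale, each tenfold increase adds the same amount to the circle size.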

In order to have a better view, let’s zoom in on Europe.

Here again, we see that the three biggest circles are on my home towns. The fourth one is on my family's region, where I go quite often. No big surprise. For the other spots, we cannot see big differences between them.

Finally, I checked the top 10 days when I walked the most. Here they are, together with the city where I was at that time.
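Extracting such a ranking is a one-liner in pandas. The sketch below uses made-up rows and assumed column names, and keeps only the top 3 to stay short:

```python
import pandas as pd

# Made-up daily rows with the city already attached.
merged = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-06-01", "2018-06-02", "2018-06-03", "2018-06-04", "2018-06-05"]
        ),
        "city": ["Dublin", "Paris", "Dublin", "Amsterdam", "Paris"],
        "distance_km": [9.5, 27.0, 11.0, 31.5, 24.0],
    }
)

# Days with the largest distance, biggest first (use 10 instead of 3 for a top 10).
top_days = merged.nlargest(3, "distance_km")[["date", "city", "distance_km"]]
print(top_days)
```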

The fun fact is that none of the cities in this top 10 was my “home city” at the time I reached these distances (Amsterdam and Paris are in the list, but when I reached those ‘records’, they were not my “home city”). Does it mean that I am lazier when at home and less keen to go out walking? I would not say so. Indeed, for me, the whole purpose of a city trip or a holiday is to discover as much as I can of the city where I am: the tourist attractions of course, but also to just ‘feel the spirit’ of the place by walking aimlessly and randomly in the streets. On top of that, the two previous maps speak for themselves: most of the distance I walked was in my home cities / countries.

From that list, it is understandable that the United States came out so dark on the first map: 4 of my biggest days happened when I was there on holiday, which helps!

Workout data

In a second step, I studied the data related to the workouts I registered in my Fitbit application over the last few years. The Fitbit device used to collect the data, being very basic, cannot capture a training session automatically. The user has to enter and register the information about the training manually (duration, start time, type of exercise, …). Over the last 4.5 years, I did 1,148 workouts.

In order to have comparable data, I removed the exercise types ‘Skiing’ and ‘biking’, which are day-long exercises.

After a monthly consolidation, here are the results:
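The filtering and the monthly consolidation can be sketched as follows with pandas. The rows are made up and the column names (`start`, `type`, `duration_min`) are assumptions, not necessarily those of the original data frame.

```python
import pandas as pd

# Made-up workout log: one row per registered training.
workouts = pd.DataFrame(
    {
        "start": pd.to_datetime(
            ["2020-01-05", "2020-01-12", "2020-01-20",
             "2020-02-02", "2020-02-10", "2020-02-15"]
        ),
        "type": ["Run", "Skiing", "Yoga", "biking", "Run", "Workout"],
        "duration_min": [30, 360, 45, 300, 35, 25],
    }
)

# Day-long exercise types are excluded to keep the durations comparable.
excluded = ["skiing", "biking"]
comparable = workouts[~workouts["type"].str.lower().isin(excluded)]

# Monthly consolidation: number of workouts, total and average duration.
monthly = comparable.groupby(comparable["start"].dt.to_period("M"))[
    "duration_min"
].agg(["count", "sum", "mean"])
print(monthly)
```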

It seems from this graph that, from the first quarter of 2020, both the monthly number of workouts and their duration are higher. I tried to cross this information with the health situation linked to COVID-19. Indeed, the lockdown started in March 2020. From that time, the gym facilities were closed and a lot of activities were impossible (travelling, shopping, cinemas…). Let’s have a look at the workout data before and after the lockdown. I assumed two situations: Pre COVID versus Lockdown. Even though we had some months with looser restrictions after March 2020, those were nothing like life before COVID, so I kept them in the Lockdown status.

Here are the number of trainings and their duration in the two statuses.
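Here is a minimal sketch of how the two statuses can be assigned and compared with pandas. The cut-off date of 1 March 2020 is an assumption (the article only says the lockdown started in March 2020), and the rows and column names are made up.

```python
import numpy as np
import pandas as pd

# Assumed cut-off: the exact lockdown start date is not given in the article.
LOCKDOWN_START = pd.Timestamp("2020-03-01")

# Made-up workout log.
workouts = pd.DataFrame(
    {
        "start": pd.to_datetime(
            ["2019-11-03", "2020-01-15", "2020-02-20", "2020-03-20",
             "2020-04-02", "2020-04-05", "2020-05-11"]
        ),
        "duration_min": [30, 45, 30, 40, 50, 35, 60],
    }
)

# Everything from the cut-off onwards stays in the Lockdown status.
workouts["status"] = np.where(
    workouts["start"] >= LOCKDOWN_START, "Lockdown", "Pre COVID"
)

# Count and total duration per month, then average those months per status.
# Note: months without any workout do not appear in this table.
monthly = workouts.groupby(["status", workouts["start"].dt.to_period("M")])[
    "duration_min"
].agg(["count", "sum"])
summary = monthly.groupby(level="status").mean()
print(summary)
```

Averaging per month rather than comparing raw totals matters here, because the two periods do not have the same length.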

The results are quite obvious: we can clearly see that I increased the number of trainings I did in lockdown. To be honest, I did not really realize it before seeing the data. But looking at it, I can find some potential explanations. The first one is indeed the impossibility of having a normal life outside of working hours. I mean, with everything closed and having to stay at home, you have to find ways to kill time. Of course, the first activities I focused on were cooking, baking, watching TV and doing some Do It Yourself activities, but you can get bored of them quite fast. So I started to train at home. I had to find a new routine. I took this opportunity to explore new, more diverse types of trainings that I found online. That may explain the increasing number of trainings I did.

On top of that, the second explanation I see is that those lockdown months were not the most peaceful I have known. Indeed, with very few social interactions, working from home without a proper ‘break’ between professional time and personal time, and being stuck indoors, the level of stress was higher than in the former ‘normal’ situation. For me, doing sport was a very good way to release the tension and the stress that I could have experienced.

Conclusion

To conclude, exploring this data did not bring any big surprises in terms of results. It put some facts and figures on ideas I could have guessed. But it allowed me to play a bit with data analysis tools and to put my skills to use. The next step for me would be to explore machine learning. I tried to apply it to the Fitbit data I have; unfortunately, the dataset is not big enough to produce consistent outputs. I will have to use another project to explore that.

In the middle of this analysis, my (quite old) Fitbit tracker passed away. I decided to replace it with another one, still a Fitbit, but one gathering more data, among them the heart rate, the sleep quality… So in a few years I may have more material to add another layer to this analysis.
