This notebook reproduces every table in LaBrie et al.'s 2008 paper on casino gambling behaviour. To get started, download the raw data from the transparency project's website using the link below. The data we need is Raw Dataset 2 (text version), listed under the title 'Virtual Casino Gambling: February 2005 through February 2007' towards the bottom of the page.
Once you've downloaded and extracted it, you should see a file called RawDataSet2_DailyAggregCasinoTXT.txt - copy this into the same directory as this notebook to begin.
The first step is to import the gamba framework; run the cell below to do so. If this cell throws an error, see the install documentation page to make sure you have gamba installed.
import gamba as gb
With gamba loaded, the next step is to get the data into a usable format. To do this, we call the prepare_labrie_data method from the data module. This does two things: first it renames the columns to values compatible with the gamba framework, then it saves this newly compatible dataframe as a new CSV file (in case it's needed elsewhere).
all_player_bets = gb.data.prepare_labrie_data('RawDataSet2_DailyAggregCasinoTXT.txt')
In two lines of code we're ready to start the analysis, and have each player's transactions individually saved in case anything goes wrong or we want to take a sample. The next step is to load in the data we just prepared; this uses some magic from the glob library to load every CSV file in the labrie_individuals/ folder into the variable all_player_bets.
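The notebook does not show this loading step as a cell of its own. A minimal sketch of what it could look like, using only the standard library, is given here. The helper name load_player_files and the assumption that each file starts with a header row are illustrative and not part of the gamba API.

```python
import csv
import glob
import os


def load_player_files(folder):
    """Read every CSV file in `folder` and return all rows as a list of dicts.

    Hypothetical helper for illustration; not part of the gamba API.
    """
    rows = []
    # sorted() makes the load order deterministic across platforms
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


# Folder name taken from the text above; an empty list comes back if it is missing
all_rows = load_player_files("labrie_individuals")
print(len(all_rows), "rows loaded")
```

Each row is a plain dictionary keyed by column name, so the result can be inspected without any extra dependencies before handing the data to a dataframe library.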
If we want to do any other analysis on all of the players, this is where we would add new methods, but let's crack on with calculating each of the measures described in the paper, which include things like frequency, duration, and total amount wagered. Heads up: this calculation can take up to 10 minutes on a normal computer, so now is a great time to share this page with a colleague, or tweet us your feedback!
measures_table = gb.measures.calculate_labrie_measures(all_player_bets, loud=True)
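To make the measures more concrete, here is a rough sketch of how three of them might be computed for a single player from daily records. The record layout (a date and an amount wagered per active day) and the definitions used (duration as the number of days from first to last bet inclusive, frequency as the percentage of those days with a bet) are assumptions for illustration; calculate_labrie_measures may define them differently.

```python
from datetime import date


def player_measures(daily_records):
    """Compute duration, frequency, and total wagered for one player.

    `daily_records` is a list of (date, amount_wagered) tuples, one per
    active betting day. Layout and definitions are assumptions.
    """
    days = sorted({day for day, _ in daily_records})
    # Duration: days between first and last bet, counting both ends
    duration = (days[-1] - days[0]).days + 1
    # Frequency: percentage of days in that window with at least one bet
    frequency = 100 * len(days) / duration
    total_wagered = sum(amount for _, amount in daily_records)
    return {
        "duration": duration,
        "frequency": frequency,
        "total_wagered": total_wagered,
    }


# Made-up records, not taken from the LaBrie dataset
example = [
    (date(2005, 2, 1), 20.0),
    (date(2005, 2, 3), 5.0),
    (date(2005, 2, 10), 15.0),
]
print(player_measures(example))
```

Repeating a calculation like this for every player is what makes the step slow on the full dataset.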
The measures calculation took a while to finish, so to avoid repeating that computation the output has been saved as gamba_labrie_measures.csv next to this notebook. We'll come back to this file later to make sure this recreation matches the original, but let's keep going! Time for the first meaningful output: the first table in the original paper, which describes the measures we just calculated using basic statistics:
measures_table = gb.read_csv('gamba_labrie_measures.csv')
labrie_table = gb.statistics.descriptive_table(measures_table)
display(labrie_table)
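For readers curious what a descriptive table involves, the sketch below summarises each measure with a mean, standard deviation, and median using the standard library. The choice of these three statistics, the function name describe_measures, and the sample values are assumptions for illustration; the columns produced by gb.statistics.descriptive_table may differ.

```python
import statistics


def describe_measures(measures):
    """Return {measure_name: {"mean", "std", "median"}} for a dict of value lists."""
    table = {}
    for name, values in measures.items():
        table[name] = {
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample standard deviation
            "median": statistics.median(values),
        }
    return table


# Made-up values, not taken from the LaBrie dataset
sample = {
    "duration": [10, 20, 30, 40],
    "total_wagered": [5.0, 15.0, 25.0, 115.0],
}
for name, stats in describe_measures(sample).items():
    print(name, stats)
```

In the made-up total_wagered column the mean sits well above the median, the kind of skew that a table reporting both statistics makes visible.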
Nice! The descriptive table looks like the original! Next up is the Spearman's R coefficient matrix, which tells us how the measures relate to one another. Run the next cell:
spearman_coefficient_table = gb.statistics.spearmans_r(measures_table)
display(spearman_coefficient_table)
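Spearman's coefficient is the Pearson correlation computed on ranks rather than raw values, which is why it captures any monotonic relationship and not only linear ones. The sketch below shows that idea in plain Python for a single pair of measures. The function names are illustrative, and this is not necessarily how gb.statistics.spearmans_r is implemented.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship (made-up values)
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y), pearson(x, y))
```

Applying spearman to every pair of measures gives the full coefficient matrix.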
Nice x2! Now that the first two tables from the paper have been reproduced, the measures need splitting into the top 5% and remaining 95% of players by their total amount wagered. The cell below uses the top_split method from the labelling module for this; it returns the measures with each player labelled by the side of the split they fall on.
labelled_measures = gb.labelling.top_split(measures_table, 'total_wagered', loud=True)
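As a rough illustration of what a top 5% split involves, the sketch below labels each player 1 if their total wagered is among the highest 5% and 0 otherwise, then groups the values by label. The function name, the rounding of the group size, and the handling of ties are all assumptions; gb.labelling.top_split may behave differently.

```python
def top_split_labels(values, percentile=95):
    """Return a 0/1 label per value: 1 for the top (100 - percentile)%, else 0.

    Rounding and tie handling here are assumptions made for illustration.
    """
    n_top = max(1, round(len(values) * (100 - percentile) / 100))
    # Indices ordered from largest value to smallest
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top_indices = set(ranked[:n_top])
    return [1 if i in top_indices else 0 for i in range(len(values))]


# Made-up totals for 20 players: 10, 20, ..., 200
total_wagered = [10 * k for k in range(1, 21)]
labels = top_split_labels(total_wagered)

# Group the values by label, mirroring the [0] / [1] indexing used in this notebook
groups = {0: [], 1: []}
for value, label in zip(total_wagered, labels):
    groups[label].append(value)
print(len(groups[1]), "top players;", len(groups[0]), "others")
```

Keeping the label alongside the data, rather than returning two separate tables straight away, means the same labelled table can be regrouped later without recomputing the split.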
With the split made, the last part of the paper uses the same descriptive table to present the differences between the two cohorts. To reproduce that using gamba, we pull out each cohort with get_labelled_groups and call the same method as for the first table on each of them:
labelled_groups = gb.labelling.get_labelled_groups(labelled_measures, 'top_total_wagered')
top5_table = gb.statistics.descriptive_table(labelled_groups[1])  # players labelled as the top 5%
other95_table = gb.statistics.descriptive_table(labelled_groups[0])  # the remaining 95%
display(top5_table, other95_table)
That's it! In around 10 lines of code the gamba framework can fully replicate the findings of LaBrie et al.'s 2008 paper. The most interesting question now is how to expand this analysis to uncover more details from the data, or to calculate new behavioural measures and see whether they are useful.