This notebook reproduces every table in LaBrie et al.'s 2008 paper on casino gambling behaviour. To get started, download the raw data from the Transparency Project's website. The data we need is **Raw Dataset 2** (**text version**), listed under the title '*Virtual Casino Gambling: February 2005 through February 2007*' towards the bottom of the page.

Once you've downloaded and extracted it, you should see a file called **RawDataSet2_DailyAggregCasinoTXT.txt** - copy this into the same directory as this notebook to begin.

The first step is to import the *gamba* framework; run the cell below to do so. If this cell throws an error, see the install documentation page to make sure *gamba* is installed.

```
import gamba as gb
```

With *gamba* loaded, the next step is to get the data into a usable format. To do this, we call the `prepare_labrie_data` method from the data module. This does two things: first, it renames the columns to names compatible with the *gamba* framework, then it saves this newly compatible dataframe as a new CSV file (in case it's needed elsewhere).

```
all_player_bets = gb.data.prepare_labrie_data('RawDataSet2_DailyAggregCasinoTXT.txt')
```
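To make the renaming step concrete, here is a minimal standard-library sketch of "rename the columns, then save a new CSV". It is not *gamba*'s implementation: the `COLUMN_MAP` entries (both the raw header names and the target names) and the tab delimiter are hypothetical placeholders, not the real headers of the LaBrie file or *gamba*'s internal column names.

```python
import csv
import os
import tempfile

# Hypothetical mapping: neither side is guaranteed to match the real raw
# headers or the column names gamba actually uses.
COLUMN_MAP = {"UserID": "player_id", "Date": "bet_time", "Stakes": "bet_size"}


def rename_columns(in_path, out_path, column_map, delimiter="\t"):
    """Copy a delimited file to a CSV, renaming any headers found in column_map."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)  # delimiter is an assumption
        new_names = [column_map.get(name, name) for name in reader.fieldnames]
        writer = csv.DictWriter(dst, fieldnames=new_names)
        writer.writeheader()
        for row in reader:
            writer.writerow({column_map.get(k, k): v for k, v in row.items()})


# Demo on a throwaway file with one made-up row.
with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.txt")
    out_path = os.path.join(tmp, "prepared.csv")
    with open(raw_path, "w", newline="") as f:
        f.write("UserID\tDate\tStakes\n1\t2005-02-01\t10.5\n")
    rename_columns(raw_path, out_path, COLUMN_MAP)
    with open(out_path, newline="") as f:
        prepared = list(csv.DictReader(f))
```

Unknown columns pass through unchanged, so a partial mapping is safe.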

With two lines of code we're ready to start the analysis, and each player's transactions are saved individually in case anything goes wrong or we want to take a sample. The next step is to load in the data we just prepared; this uses some magic from the glob library to load every CSV file in the `labrie_individuals/` folder into the variable `all_player_bets`.
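The glob step isn't shown as its own cell, so here is a minimal sketch of the idea using only the standard library: match every `*.csv` in the folder and concatenate their rows. The helper name `load_individuals` and the per-player file layout in the demo are hypothetical; *gamba* may do this differently (for example with pandas).

```python
import csv
import glob
import os
import tempfile


def load_individuals(folder="labrie_individuals"):
    """Read every CSV in `folder` (sorted by filename) into one list of row dicts."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


# Demo: two made-up per-player files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for player_id, n_rows in [("1", 2), ("2", 1)]:
        with open(os.path.join(folder, f"{player_id}.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["player_id", "bet_size"])
            for _ in range(n_rows):
                writer.writerow([player_id, "10"])
    demo_bets = load_individuals(folder)
```

With pandas available, the equivalent one-liner would concatenate `pd.read_csv` over the same glob.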

If we want to do any other analysis on all of the players, this is where we would add new methods, but let's crack on with calculating each of the measures described in the paper, which include **frequency**, **duration**, **total amount wagered**, and others. Heads up: this calculation can take up to 10 minutes on a normal computer, so now is a great time to share this page with a colleague, or tweet us your feedback!

```
measures_table = gb.measures.calculate_labrie_measures(all_player_bets, loud=True)
```
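For intuition about what `calculate_labrie_measures` computes, here is a rough sketch of three of the measures. The definitions are assumptions for illustration, not taken from *gamba*'s source: duration is the number of days from a player's first to last active day (inclusive), frequency is the percentage of those days on which they bet, and total wagered is the sum of their daily stakes. The `(player_id, day, amount_wagered)` row shape is also hypothetical.

```python
from datetime import date


def labrie_measures(daily_rows):
    """Per-player duration, frequency and total wagered from daily aggregates.

    daily_rows: iterable of (player_id, day, amount_wagered) tuples, where
    `day` is a datetime.date. This row shape is an assumption for illustration.
    """
    by_player = {}
    for player_id, day, wagered in daily_rows:
        by_player.setdefault(player_id, []).append((day, wagered))

    measures = {}
    for player_id, rows in by_player.items():
        active_days = sorted({day for day, _ in rows})
        # Assumed definition: first to last active day, counted inclusively.
        duration = (active_days[-1] - active_days[0]).days + 1
        # Assumed definition: % of days in that window on which the player bet.
        frequency = 100 * len(active_days) / duration
        total_wagered = sum(wagered for _, wagered in rows)
        measures[player_id] = {
            "duration": duration,
            "frequency": frequency,
            "total_wagered": total_wagered,
        }
    return measures


example = labrie_measures([
    (1, date(2005, 2, 1), 10.0),
    (1, date(2005, 2, 4), 5.0),
    (2, date(2005, 3, 1), 20.0),
])
# example[1] == {"duration": 4, "frequency": 50.0, "total_wagered": 15.0}
```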

The cell above took a while to finish, so to avoid repeating that computation its output has been saved as `gamba_labrie_measures.csv` next to this notebook. We'll come back to this file later to make sure this recreation matches the original, but let's keep going! Time for the first meaningful output: the first table in the original paper, which describes the measures we just calculated using basic statistics:

```
measures_table = gb.read_csv('gamba_labrie_measures.csv')
labrie_table = gb.statistics.descriptive_table(measures_table)
display(labrie_table)
```
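A descriptive table like this is just a handful of summary statistics per measure. As a hedged sketch of the idea (assuming mean, sample standard deviation and median; the exact statistics `descriptive_table` reports may differ), the standard `statistics` module is enough:

```python
import statistics


def descriptive_summary(rows, columns):
    """Mean, standard deviation and median for each named column.

    rows: list of dicts, one per player (stdev needs at least two rows).
    """
    table = {}
    for column in columns:
        values = [row[column] for row in rows]
        table[column] = {
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample standard deviation
            "median": statistics.median(values),
        }
    return table
```

For heavily skewed measures like total wagered, the median is usually the more telling column.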

Nice! Looks like the original! Next up is the matrix of Spearman's rank correlation coefficients (Spearman's R), which shows how the measures relate to one another. Run the next cell:

```
spearman_coefficient_table = gb.statistics.spearmans_r(measures_table)
display(spearman_coefficient_table)
```
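Spearman's R is the Pearson correlation computed on ranks rather than raw values, which makes it robust to the extreme skew typical of gambling data. Below is a minimal, dependency-free sketch of that technique (tied values share the average of their positions); it is an illustration, not *gamba*'s implementation of `spearmans_r`.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # 1-based average position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_r(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(rows, columns):
    """Pairwise Spearman coefficients between named columns of a list of dicts."""
    return {
        (a, b): spearman_r([r[a] for r in rows], [r[b] for r in rows])
        for a in columns
        for b in columns
    }
```

Any strictly increasing relationship scores 1.0, even a non-linear one, which is exactly why rank correlation suits these measures.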

Nice x2! Now that the first two tables from the paper have been reproduced, the measures need splitting into the top 5% and remaining 95% of players by their total amount wagered. The `top_split` method from the `gamba.labelling` module does this by labelling each player according to whether they fall in the top group for `total_wagered`.

```
labelled_measures = gb.labelling.top_split(measures_table, 'total_wagered', loud=True)
```
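Since `top_split` is called with just the column name, the 5% cut-off is presumably its default. Here is a sketch of one way such a split could work, under assumptions that may not match *gamba*: the top count is `fraction * n` rounded down (minimum one player), ties are broken by original order, and the label column is named `top_<column>` to mirror the `'top_total_wagered'` label used in the next cell. The helper `label_top_fraction` is hypothetical.

```python
def label_top_fraction(measures, column, fraction=0.05, label=None):
    """Return copies of `measures` with a 0/1 label marking the top `fraction`.

    measures: list of dicts, one per player. Rounding and tie-breaking
    choices here are assumptions, not gamba's documented behaviour.
    """
    label = label or f"top_{column}"
    n_top = max(1, int(len(measures) * fraction))
    # sorted() is stable, so equal values keep their original order.
    order = sorted(range(len(measures)), key=lambda i: measures[i][column], reverse=True)
    top = set(order[:n_top])
    return [{**row, label: int(i in top)} for i, row in enumerate(measures)]


players = [{"player_id": i, "total_wagered": float(i)} for i in range(1, 21)]
labelled = label_top_fraction(players, "total_wagered")
# With 20 players, only the biggest spender (player 20) is labelled 1.
```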

With the two cohorts separated, the last part of the paper uses the same descriptive table to present their differences. To reproduce that using *gamba*, we collect each labelled group with `get_labelled_groups`, then call the same `descriptive_table` method used for the first table on each cohort:

```
# split the players by their 'top_total_wagered' label
labelled_groups = gb.labelling.get_labelled_groups(labelled_measures, 'top_total_wagered')
top5_table = gb.statistics.descriptive_table(labelled_groups[1])  # label 1: top 5% by total wagered
other95_table = gb.statistics.descriptive_table(labelled_groups[0])  # label 0: remaining 95%
display(top5_table, other95_table)
```
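The indexing above (`[1]` for the top 5%, `[0]` for the rest) suggests the groups are keyed by label value. A minimal sketch of that grouping, assuming a dict keyed by label; the helper `split_by_label` is hypothetical and *gamba* may return a different container:

```python
def split_by_label(rows, label):
    """Group a list of row dicts by the value of their `label` field."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label], []).append(row)
    return groups


demo_rows = [
    {"player_id": 1, "top_total_wagered": 0},
    {"player_id": 2, "top_total_wagered": 1},
    {"player_id": 3, "top_total_wagered": 0},
]
demo_groups = split_by_label(demo_rows, "top_total_wagered")
# demo_groups[1] holds player 2; demo_groups[0] holds players 1 and 3.
```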

That's it! In around 10 lines of code, the *gamba* framework can fully replicate the findings of LaBrie et al.'s 2008 paper. The most interesting question now is how to expand this analysis to uncover more details from the data, or to calculate new behavioural measures and see if they are useful in any way.