Compute behavioural measures from raw transaction data


## Overview

A behavioural measure is a number derived from a player's transaction data. The data needs specific columns depending on the measure you want to calculate. For example, measures in the cost domain require the bet_size column. For each measure you plan to use, make sure that your data in the gamba standard format (see check_data) contains the required columns.

Once your data has all of the right columns, the simplest way to get started is to call the create_measures_table method below, giving it one or more players' bets in the gamba standard format.

#### create_measures_table[source]

create_measures_table(all_player_bets, measures, daily=False)

Creates a measures table from a collection of bets in the gamba standard format. The second parameter 'measures' should be a list of names from the measures module. See available_measures or the table below for currently available options.
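As a rough illustration of the idea, the sketch below builds a small table by applying named measures to each player's bets. It does not call the gamba library: `build_measures_table`, the `MEASURES` lookup, and the dictionary-of-lists input are hypothetical stand-ins, and the layout of one row per player with one entry per measure is an assumption.

```python
from datetime import datetime

# Hypothetical stand-ins for two functions in the measures module.
def number_of_bets(bets):
    return len(bets)

def total_wagered(bets):
    return sum(bet["bet_size"] for bet in bets)

# Lookup from measure name to the function that computes it.
MEASURES = {"number_of_bets": number_of_bets, "total_wagered": total_wagered}

def build_measures_table(all_player_bets, measures):
    """Return one row (a dict) per player with one entry per requested measure."""
    table = []
    for player_id, bets in all_player_bets.items():
        row = {"player_id": player_id}
        for name in measures:
            row[name] = MEASURES[name](bets)
        table.append(row)
    return table

# Assumed input shape: player id -> list of bets (not the gamba dataframe format).
all_player_bets = {
    "p1": [
        {"bet_time": datetime(2021, 1, 1, 12, 0), "bet_size": 5.0},
        {"bet_time": datetime(2021, 1, 2, 12, 0), "bet_size": 10.0},
    ],
    "p2": [{"bet_time": datetime(2021, 1, 1, 9, 0), "bet_size": 2.0}],
}

table = build_measures_table(all_player_bets, ["number_of_bets", "total_wagered"])
```

Passing measures as a list of names keeps the caller's choice of measures separate from the code that computes them.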

## Behavioural Measures

Not all measures need the same data to be computed. The gamba library distinguishes between three primary domains (time, cost, and loss), and adds a final 'other' domain for everything else.

### Time Domain

The time domain consists of behavioural measures that can be computed using only the knowledge of when the bets took place. This means using only values from the bet_time column in the player bet dataframe.

#### duration[source]

duration(player_bets)

The number of days between the first bet and the last.

#### frequency[source]

frequency(player_bets)

The number of days that included at least one bet. Note that this method returns the raw number of days.

#### frequency_aggregated[source]

frequency_aggregated(player_bets)

Number of active betting days in first month, if data set contains daily aggregate data.

#### frequency_percent[source]

frequency_percent(player_bets)

The percentage of days within the duration that included at least one bet. Note that this method returns the actual percentage value e.g. 75, not the raw number of days or values in the interval [0,1].
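The relationship between duration, frequency, and frequency_percent can be sketched with plain Python datetimes standing in for the bet_time column. The function names are illustrative, and counting the first and last day inclusively is an assumption rather than confirmed library behaviour.

```python
from datetime import datetime

# Four bets spread over 1 March to 4 March, with no bets on 3 March.
bet_times = [
    datetime(2021, 3, 1, 10, 0),
    datetime(2021, 3, 1, 18, 30),
    datetime(2021, 3, 2, 9, 15),
    datetime(2021, 3, 4, 21, 0),
]

def duration_days(times):
    # Assumption: both the first and the last betting day are counted.
    return (max(times).date() - min(times).date()).days + 1

def frequency_days(times):
    # Number of distinct calendar days with at least one bet.
    return len({t.date() for t in times})

def frequency_pct(times):
    # Active days as a percentage of the duration, e.g. 75 rather than 0.75.
    return frequency_days(times) / duration_days(times) * 100

days_spanned = duration_days(bet_times)   # 4
active_days = frequency_days(bet_times)   # 3
active_pct = frequency_pct(bet_times)     # 75.0
```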

#### number_of_bets[source]

number_of_bets(player_bets, daily=False)

The total number of bets made.

#### mean_bets_per_day_aggregated[source]

mean_bets_per_day_aggregated(player_bets)

Mean number of bets per active betting day in first month (intensity), if data set contains daily aggregate data.

#### mean_bets_per_day[source]

mean_bets_per_day(player_bets)

#### median_bets_per_day[source]

median_bets_per_day(player_bets)

Aggregates daily bets and returns the median number of bets placed across all active betting days.
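A minimal sketch of this aggregation, using plain datetimes in place of the bet_time column (an assumption made to keep the example dependency-free): bets are counted per calendar day, and the median is taken over days that have at least one bet.

```python
from collections import Counter
from datetime import datetime
from statistics import median

bet_times = [
    datetime(2021, 3, 1, 10, 0),
    datetime(2021, 3, 1, 11, 0),
    datetime(2021, 3, 1, 12, 0),
    datetime(2021, 3, 2, 9, 0),
    datetime(2021, 3, 5, 20, 0),
    datetime(2021, 3, 5, 21, 0),
]

# Count bets per calendar day; days without bets never appear in the counter.
bets_per_day = Counter(t.date() for t in bet_times)

# Daily counts are 3, 1, and 2, so the median is 2.
median_per_day = median(bets_per_day.values())
```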

#### num_sessions[source]

num_sessions(player_bets, session_window=20)

The number of gambling sessions a player played, where a session is a sequence of bets each within some time (session_window) of the previous bet.

#### total_play_time[source]

total_play_time(player_bets, session_window=20)

The sum of all session lengths.
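The session logic behind num_sessions and total_play_time can be sketched as follows. This is not the library's implementation: `split_sessions` is a hypothetical helper, session_window is assumed to be in minutes, and a session's length is assumed to run from its first bet to its last.

```python
from datetime import datetime, timedelta

def split_sessions(times, session_window=20):
    """Group bet times into sessions; a gap larger than the window starts a new one."""
    window = timedelta(minutes=session_window)
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] <= window:
            sessions[-1].append(t)   # close enough to the previous bet
        else:
            sessions.append([t])     # first bet, or the gap was too long
    return sessions

bet_times = [
    datetime(2021, 3, 1, 10, 0),
    datetime(2021, 3, 1, 10, 10),
    datetime(2021, 3, 1, 10, 25),
    datetime(2021, 3, 1, 15, 0),
    datetime(2021, 3, 1, 15, 5),
]

sessions = split_sessions(bet_times)
session_count = len(sessions)

# Sum of (last bet - first bet) over all sessions: 25 minutes + 5 minutes.
play_time = sum((s[-1] - s[0] for s in sessions), timedelta())
```

Under these assumptions a session containing a single bet contributes zero play time.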

#### mean_sessions_per_day[source]

mean_sessions_per_day(player_bets)

#### median_sessions_per_day[source]

median_sessions_per_day(player_bets)

#### mean_session_duration[source]

mean_session_duration(player_bets)

#### median_session_duration[source]

median_session_duration(player_bets)

#### mean_play_time[source]

mean_play_time(player_bets)

#### median_play_time[source]

median_play_time(player_bets)

#### num_bets_deviation_per_day[source]

num_bets_deviation_per_day(player_bets)

Standard deviation of the number of bets placed per active betting day.

#### inactive_day_streak_variance[source]

inactive_day_streak_variance(player_bets)

#### persistence[source]

persistence(player_bets)

#### bet_count_trajectory[source]

bet_count_trajectory(player_bets)

#### mean_bets_per_session[source]

mean_bets_per_session(player_bets)

#### median_bets_per_session[source]

median_bets_per_session(player_bets)

#### bets_per_session_variance[source]

bets_per_session_variance(player_bets)

#### session_duration_variance[source]

session_duration_variance(player_bets)

#### session_duration_trajectory[source]

session_duration_trajectory(player_bets)

#### sawtooth_occurances[source]

sawtooth_occurances(player_bets, threshold=1)

#### active_day_trajectory[source]

active_day_trajectory(player_bets, time_period_length=datetime.timedelta(days=30))

### Cost Domain

The cost domain contains all behavioural measures which can be computed using additional values from the bet_size column. As such, they describe spending behaviours as opposed to engagement behaviours as above.

#### total_wagered[source]

total_wagered(player_bets)

The total amount wagered (sum of bet sizes).

#### mean_bet_size[source]

mean_bet_size(player_bets, daily=False)

The average (mean) size of bets.

#### median_bet_size[source]

median_bet_size(player_bets)

Median bet size, as used in Xuan 2009.

#### max_bet[source]

max_bet(player_bets)

#### min_bet[source]

min_bet(player_bets)

#### bet_size_range[source]

bet_size_range(player_bets)

#### bet_size_deviation[source]

bet_size_deviation(player_bets, daily=False)

Standard deviation of stake size in first month, if data set contains daily aggregate data.

#### trajectory[source]

trajectory(player_bets, daily=False, plot=False)

Gradient of a linear regression fitted to the sequence of daily aggregated bet sizes.

#### mean_amount_wagered_per_day[source]

mean_amount_wagered_per_day(player_bets)

#### median_amount_wagered_per_day[source]

median_amount_wagered_per_day(player_bets)

#### amount_wagered_per_day_variance[source]

amount_wagered_per_day_variance(player_bets)

#### mean_amount_wagered_per_session[source]

mean_amount_wagered_per_session(player_bets)

#### median_amount_wagered_per_session[source]

median_amount_wagered_per_session(player_bets)

#### total_amount_wagered_across_duration[source]

total_amount_wagered_across_duration(player_bets)

#### amount_wagered_per_session_variance[source]

amount_wagered_per_session_variance(player_bets)

#### bet_size_trajectory[source]

bet_size_trajectory(player_bets)

### Loss Domain

The loss domain includes all behavioural measures which require the additional information of the size of the payout received as a result of each bet. This data should be held in the payout_size column if available.

#### net_loss[source]

net_loss(player_bets)

The net amount lost (sum of bet sizes minus sum of payout sizes). This is functionally identical to the negative of the total amount won.

#### percent_loss[source]

percent_loss(player_bets)

The net_loss as a percentage of total_wagered.
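The two definitions above reduce to simple arithmetic over the bet_size and payout_size columns. The sketch below uses a list of dictionaries in place of the player bet dataframe, which is an assumption made to keep the example self-contained.

```python
# Three bets: one lost outright, one won, one partially returned.
bets = [
    {"bet_size": 10.0, "payout_size": 0.0},
    {"bet_size": 10.0, "payout_size": 15.0},
    {"bet_size": 20.0, "payout_size": 5.0},
]

total_wagered = sum(b["bet_size"] for b in bets)     # 40.0
total_payout = sum(b["payout_size"] for b in bets)   # 20.0

# Net loss: sum of bet sizes minus sum of payout sizes.
net_loss = total_wagered - total_payout              # 20.0

# Percent loss: net loss as a percentage of the total wagered.
percent_loss = net_loss / total_wagered * 100        # 50.0
```

A negative net loss under this definition means the player finished ahead.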

#### payout_bet_count_ratio[source]

payout_bet_count_ratio(player_bets)

#### payout_bet_size_ratio[source]

payout_bet_size_ratio(player_bets)

#### mean_net_loss_per_session[source]

mean_net_loss_per_session(player_bets)

#### median_net_loss_per_session[source]

median_net_loss_per_session(player_bets)

#### sum_of_payouts[source]

sum_of_payouts(player_bets)

#### relative_big_win[source]

relative_big_win(player_bets, top)

#### big_win[source]

big_win(player_bets)

#### mean_payout_size[source]

mean_payout_size(player_bets)

#### median_payout_size[source]

median_payout_size(player_bets)

#### overall_binary_loser[source]

overall_binary_loser(player_bets)

#### clamped_net_win[source]

clamped_net_win(player_bets)

#### net_loss_on_last_day[source]

net_loss_on_last_day(player_bets)

#### mean_loss_per_bet[source]

mean_loss_per_bet(player_bets)

The mean amount lost per bet (bet size minus payout).

#### median_loss_per_bet[source]

median_loss_per_bet(player_bets)

The median amount lost per bet (bet size minus payout).

### Other Domain

All measures outside the time, cost, and loss domains require some additional information, such as the house edge for each bet or the game being played. These are currently grouped into the 'other' domain, but as the library's (and academia's) capabilities grow, this domain will likely be split further.

#### theoretical_loss[source]

theoretical_loss(player_bets)

The product of bet size and house advantage, also referred to as the gross gaming revenue.
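As a sketch of this calculation, each bet's size is multiplied by its house advantage and the products are summed across bets. The `house_edge` key, its expression as a fraction (0.05 for 5%), and the summing over bets are assumptions for illustration.

```python
# Each bet carries its own house advantage as a fraction of the stake.
bets = [
    {"bet_size": 10.0, "house_edge": 0.05},
    {"bet_size": 20.0, "house_edge": 0.02},
]

# Expected loss per bet is stake * house advantage; sum across all bets.
theoretical = sum(b["bet_size"] * b["house_edge"] for b in bets)  # about 0.9
```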

#### mean_odds_per_bet[source]

mean_odds_per_bet(player_bets)

#### median_odds_per_bet[source]

median_odds_per_bet(player_bets)

#### num_providers_used[source]

num_providers_used(player_bets)

The number of different gambling game providers used by the player.

#### mean_game_types_per_day[source]

mean_game_types_per_day(player_bets)

The mean number of unique game types played per active betting day. Note this does not account for the number of each game type played, only that they have been played at least once.
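The note about repeat plays can be illustrated with a set per day: playing the same game type twice on one day adds nothing to that day's count. The tuples of (bet time, game type) below are an assumed stand-in for the relevant dataframe columns.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# (bet time, game type) pairs; the names of the game types are illustrative.
bets = [
    (datetime(2021, 3, 1, 10, 0), "slots"),
    (datetime(2021, 3, 1, 10, 5), "slots"),
    (datetime(2021, 3, 1, 11, 0), "roulette"),
    (datetime(2021, 3, 2, 9, 0), "slots"),
]

# Collect the distinct game types seen on each active day.
types_per_day = defaultdict(set)
for bet_time, game_type in bets:
    types_per_day[bet_time.date()].add(game_type)

# Day one has 2 unique types and day two has 1, giving a mean of 1.5.
mean_types = mean(len(types) for types in types_per_day.values())
```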

## Study Collections

The measures module also contains some convenience methods which accept full existing data sets of players' bets and compute the collection of behavioural measures used in a given study.

#### calculate_labrie_measures[source]

calculate_labrie_measures(all_player_bets, savedir='', filename='gamba_labrie_measures.csv', loud=False, daily=True)

Calculates the set of measures described in LaBrie et al.'s work in 2008 on casino gamblers. These measures include the durations, frequencies, number of bets, bets per day, value per bet (eth), total amount wagered, net loss, and percent loss for each player. As this method sits in the studies module, it accepts a list of dataframes representing each player's bets as input. By default, this method saves the resulting dataframe of each player's measures to 'gamba_labrie_measures.csv'. Be advised: this method can take some time for large numbers of players. The 'loud' parameter can be set to True to print out updates every 200 players.

#### calculate_braverman_measures[source]

calculate_braverman_measures(all_player_bets, savedir='', loud=False)

Calculates the set of measures described in Braverman and Shaffer's work in 2010 on high risk internet gamblers. These measures include the frequency, intensity, variability, and trajectories of each player. As this method sits in the studies module, it accepts a list of dataframes representing each player's bets as input. By default, this method saves the resulting dataframe of each player's measures to 'gamba_braverman_measures.csv'.

## Utility Functions

This module also contains utility functions for standardising each of the measures in a measures table, aggregating bets, checking measure data, and other tasks. It's unlikely that you'll need to use these methods directly in your own work, but they are used in some of the higher-level methods in the library.

#### available_measures[source]

available_measures()

Returns a list of valid measures that the gamba library can compute. This list can be used in whole or in part by the create_measures_table method to compute these measures across one or more players' bets.

#### check_measure_data[source]

check_measure_data(player_bets, required_columns)

Compares the columns found in a dataframe of player bets to a supplied list of column names. If any of the required_column names are not found, an exception is raised reporting the error.
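The check described above amounts to a membership test over column names. The sketch below is not the library's code: `check_columns` is a hypothetical name, it takes a plain list of column names rather than a dataframe, and raising `ValueError` is an assumption, since the exception type is not stated here.

```python
def check_columns(available_columns, required_columns):
    """Raise an error naming every required column that is not available."""
    missing = [c for c in required_columns if c not in available_columns]
    if missing:
        # Assumption: the real method may raise a different exception type.
        raise ValueError(f"missing required columns: {missing}")

columns = ["player_id", "bet_time", "bet_size"]

# Time and cost domain measures can be computed from these columns.
check_columns(columns, ["bet_time", "bet_size"])

# Loss domain measures cannot, because payout_size is absent.
try:
    check_columns(columns, ["bet_size", "payout_size"])
    error_message = None
except ValueError as error:
    error_message = str(error)
```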

#### get_sessions[source]

get_sessions(player_bets, session_window=30)

Splits a single player's bet dataframe into a collection of dataframes, each representing a single session. The session_window parameter determines the maximum number of minutes between bets for them to be considered part of the same session.

#### get_daily_bets[source]

get_daily_bets(player_bets)

#### standardise_measures_table[source]

standardise_measures_table(measures_table)

Standardises all measures columns in a measures table by applying the scipy.stats.zscore function to each column. This is useful for column-wise comparisons and some clustering methods, but use with caution!
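For reference, the z-score of a value is its distance from the column mean in units of the column's standard deviation. With its default settings, scipy.stats.zscore uses the population standard deviation (ddof=0), which the dependency-free sketch below mirrors for a single column; the `zscore` helper here is illustrative, not the scipy function.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardise one column: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation, i.e. ddof=0
    return [(v - mu) / sigma for v in values]

# One measure column across eight players: mean 5.0, population std 2.0.
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardised = zscore(column)
```

One reason for caution is that a column in which every player has the same value has a standard deviation of zero, so the division fails.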

#### split_measures_table[source]

split_measures_table(measures_table, frac=0.7, loud=False)

Splits a measures table into two randomly selected groups. This is useful for machine learning methods where a train-test split is needed, and uses the Pandas library's sample method.
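A minimal sketch of such a split, using the standard library instead of Pandas: a random fraction of rows goes to the first group and the remainder to the second. The `split_rows` name and the `seed` parameter are hypothetical, and treating the second group as the complement of the first is an assumption.

```python
import random

def split_rows(rows, frac=0.7, seed=0):
    """Randomly assign round(frac * n) rows to the first group, the rest to the second."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    n_first = round(len(rows) * frac)
    chosen = set(rng.sample(range(len(rows)), n_first))
    first = [row for i, row in enumerate(rows) if i in chosen]
    second = [row for i, row in enumerate(rows) if i not in chosen]
    return first, second

# Ten rows standing in for the rows of a measures table.
rows = list(range(10))
train, test = split_rows(rows, frac=0.7)
```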

#### generic_trajectory[source]

generic_trajectory(sequence, plot=False)

Gradient of a linear regression fitted to a one dimensional sequence. Generic utility method to be used in more specific cases.
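The gradient of a least-squares line through a sequence can be computed directly from the usual closed-form expression. The sketch below assumes the x values are the positions 0, 1, 2, ... of the sequence, which may differ from what the library uses; `slope` is an illustrative name.

```python
def slope(sequence):
    """Least-squares gradient of a sequence against its index positions."""
    n = len(sequence)
    xs = range(n)  # assumption: x values are 0, 1, ..., n - 1
    x_mean = sum(xs) / n
    y_mean = sum(sequence) / n
    # Covariance of x and y divided by the variance of x.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, sequence))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator

rising = slope([1.0, 3.0, 5.0, 7.0])   # 2.0: values grow by 2 per step
flat = slope([4.0, 4.0, 4.0])          # 0.0: no trend
```

A positive gradient indicates an increasing sequence and a negative gradient a decreasing one; a sequence with fewer than two values has no defined gradient.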

## Visualisation

The measures module also has a method for plotting the position of an individual in the context of the rest of the population. This is useful for exploratory work on outliers and similar things.

#### plot_individual[source]

plot_individual(measures_table, player_id)

Plots an individual's position in the population across each of the behavioural measures in the measures table. Note that this will output one bar chart for each behavioural measure.

#### plot_continuous_time_domain[source]

plot_continuous_time_domain(player_bets)

#### discretise_player_bets[source]

discretise_player_bets(player_bets, days=7)

#### plot_discrete_time_domain[source]

plot_discrete_time_domain(player_bets, days=7)

#### plot_session_domain[source]

plot_session_domain(player_bets, session_window=30)

#### plot_cost_domain[source]

plot_cost_domain(player_bets)

#### show_domain_differences[source]

show_domain_differences()