A behavioural measure is a number derived from a player's transaction data. This data needs to have specific columns depending on the measure you want to calculate. For example, measures in the cost domain require the bet_size column. For each of the measures you plan to use, make sure that your data, held in the gamba standard format (see check_data), contains the columns that measure requires.
Once your data has all of the right columns, the simplest way to get started is to call the create_measures_table method below, giving it one or more players' bets in the gamba standard format.
Creates a measures table from a collection of bets in the gamba standard format. The second parameter 'measures' should be a list of names from the measures module. See
available_measures or the table below for currently available options.
Not all measures need the same data to be computed. The gamba library distinguishes between three primary domains (time, cost, and loss), and adds a final 'other' domain for everything else.
The time domain consists of behavioural measures that can be computed using only the knowledge of when the bets took place. This means using only values from the bet_time column in the player bet dataframe.
The number of gambling sessions a player played, where a session is a sequence of bets each within some time (session_window) of the previous bet.
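The session definition above can be sketched with pandas alone. This is a minimal illustration and not the library's own implementation: the function name count_sessions is hypothetical, and the sketch assumes session_window is given in minutes with an arbitrary default of 30.

```python
import pandas as pd

def count_sessions(player_bets: pd.DataFrame, session_window: int = 30) -> int:
    """Count sessions in one player's bets (hypothetical helper).

    A new session starts whenever the gap to the previous bet is longer
    than session_window minutes (unit and default are assumptions).
    """
    if player_bets.empty:
        return 0
    times = pd.to_datetime(player_bets["bet_time"]).sort_values()
    gaps = times.diff()  # the first gap is NaT, which compares as False below
    new_session = gaps > pd.Timedelta(minutes=session_window)
    # The first bet always opens a session; every long gap opens another.
    return 1 + int(new_session.sum())

example_bets = pd.DataFrame({
    "bet_time": [
        "2021-01-01 10:00", "2021-01-01 10:10",  # first session
        "2021-01-01 12:00",                      # second session
        "2021-01-02 09:00", "2021-01-02 09:20",  # third session
    ]
})
n_sessions = count_sessions(example_bets, session_window=30)
```

Only the bet_time column is read, which is what places this measure in the time domain.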
The cost domain contains all behavioural measures which can be computed using additional values from the bet_size column. As such, they describe spending behaviours as opposed to engagement behaviours as above.
The loss domain includes all behavioural measures which require the additional information of the size of the payout received as a result of each bet. This data should be held in the payout_size column if available.
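To make the difference between the cost and loss domains concrete, here is a rough sketch of a few measures from each. The function name and the exact definitions (net loss as total wagered minus total payout, percent loss as net loss over total wagered) are assumptions for illustration and may differ from the library's own calculations.

```python
import pandas as pd

def cost_and_loss_summary(player_bets: pd.DataFrame) -> dict:
    """Summarise one player's spending and losses (hypothetical helper)."""
    # Cost domain: needs only the bet_size column.
    total_wagered = float(player_bets["bet_size"].sum())
    mean_bet_size = float(player_bets["bet_size"].mean())
    # Loss domain: additionally needs the payout_size column.
    net_loss = total_wagered - float(player_bets["payout_size"].sum())
    # Assumed definition: loss as a percentage of the amount wagered.
    percent_loss = 100.0 * net_loss / total_wagered if total_wagered else float("nan")
    return {
        "total_wagered": total_wagered,
        "mean_bet_size": mean_bet_size,
        "net_loss": net_loss,
        "percent_loss": percent_loss,
    }

example_bets = pd.DataFrame({
    "bet_size": [10.0, 20.0, 10.0],
    "payout_size": [0.0, 30.0, 0.0],
})
summary = cost_and_loss_summary(example_bets)
```

If the payout_size column is missing, the first two values can still be computed, but the last two cannot.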
All measures not in the time, cost, or loss domains require some additional information, such as the house edge for each bet or the game being played. These are currently grouped broadly into the 'other' domain, but as the capabilities of the library (and of the academic field) grow, this domain will likely be split further.
The mean number of unique game types played per active betting day. Note this does not account for the number of each game type played, only that they have been played at least once.
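As an illustration of this measure, the sketch below groups bets by calendar day and counts distinct game types on each day. The column name game_type and the function name are assumptions; the source does not say which column holds the game being played.

```python
import pandas as pd

def mean_game_types_per_day(player_bets: pd.DataFrame,
                            game_column: str = "game_type") -> float:
    """Mean number of distinct game types per active betting day.

    Hypothetical helper; game_column is an assumed column name.
    """
    days = pd.to_datetime(player_bets["bet_time"]).dt.date
    # nunique counts each game type once per day, however often it was played.
    unique_per_day = player_bets[game_column].groupby(days).nunique()
    return float(unique_per_day.mean())

example_bets = pd.DataFrame({
    "bet_time": ["2021-01-01 10:00", "2021-01-01 10:05",
                 "2021-01-01 11:00", "2021-01-02 09:00"],
    "game_type": ["poker", "poker", "slots", "slots"],
})
game_types_per_day = mean_game_types_per_day(example_bets)
```

Days with no bets never appear in the grouping, so the mean is taken over active days only.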
The measures module also contains some convenience methods which accept full existing data sets of players' bets and compute the collection of behavioural measures used in a given study.
Calculates the set of measures described in LaBrie et al.'s 2008 work on casino gamblers. These measures include the durations, frequencies, number of bets, bets per day, value per bet (eth), total amount wagered, net loss, and percent loss for each player. As this method sits in the studies module, it accepts a list of dataframes representing each player's bets as input. By default, this method saves the resulting dataframe of each player's measures to 'gamba_labrie_measures.csv'. Be advised: this method can take some time for large numbers of players. The 'loud' parameter can be set to True to print out updates every 200 players.
Calculates the set of measures described in Braverman and Shaffer's 2010 work on high-risk internet gamblers. These measures include the frequency, intensity, variability, and trajectories of each player. As this method sits in the studies module, it accepts a list of dataframes representing each player's bets as input. By default, this method saves the resulting dataframe of each player's measures to 'gamba_braverman_measures.csv'.
This module also contains utility functions for standardising each of the measures in a measures table, aggregating bets, checking measure data, and other tasks. It's unlikely that you'll need to use these methods directly in your own work, but they are used in some of the higher-level methods in the library.
Compares the columns found in a dataframe of player bets to a supplied list of column names. If any of the required_column names are not found, an exception is raised reporting the error.
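A check of this kind could look roughly like the following. The function name require_columns and the choice of ValueError are assumptions, since the source does not state which exception type is raised.

```python
import pandas as pd

def require_columns(player_bets: pd.DataFrame, required_columns: list) -> bool:
    """Raise if any required column is missing (hypothetical helper)."""
    missing = [name for name in required_columns
               if name not in player_bets.columns]
    if missing:
        # ValueError is an assumed choice of exception type.
        raise ValueError(f"missing required columns: {missing}")
    return True

example_bets = pd.DataFrame({
    "bet_time": ["2021-01-01 10:00"],
    "bet_size": [5.0],
})
columns_ok = require_columns(example_bets, ["bet_time", "bet_size"])
```

Calling require_columns(example_bets, ["payout_size"]) on the same data would raise, because that column is absent.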
Splits a single player's bet dataframe into a collection of dataframes, each representing a single session. The session_window parameter determines the maximum number of minutes between bets for them to be considered part of the same session.
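One way to perform such a split is to label each bet with a running session number and group on that label. The sketch below is an illustration under assumptions, not the library's code: the function name and the default window of 30 minutes are made up for the example.

```python
import pandas as pd

def split_into_sessions(player_bets: pd.DataFrame,
                        session_window: int = 30) -> list:
    """Split one player's bets into per-session dataframes (hypothetical helper)."""
    ordered = player_bets.assign(
        bet_time=pd.to_datetime(player_bets["bet_time"])
    ).sort_values("bet_time")
    gaps = ordered["bet_time"].diff()
    # The label increases by one each time a gap exceeds the window.
    session_id = (gaps > pd.Timedelta(minutes=session_window)).cumsum()
    return [session.reset_index(drop=True)
            for _, session in ordered.groupby(session_id)]

example_bets = pd.DataFrame({
    "bet_time": ["2021-01-01 10:00", "2021-01-01 10:10", "2021-01-01 12:00",
                 "2021-01-02 09:00", "2021-01-02 09:20"],
    "bet_size": [1.0, 2.0, 3.0, 4.0, 5.0],
})
sessions = split_into_sessions(example_bets, session_window=30)
```

Each returned dataframe keeps all of the original columns, so per-session measures can be computed on it directly.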
Standardises all measures columns in a measures table by applying the scipy.stats.zscore function to each column. This is useful for column-wise comparisons and some clustering methods, but use with caution!
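The effect of this standardisation can be sketched without the library. The example below computes the z-score by hand with pandas, using the population standard deviation (ddof=0), which is also the default of scipy.stats.zscore. The function name and the player_id identifier column are assumptions about how a measures table is laid out.

```python
import pandas as pd

def standardise_table(measures_table: pd.DataFrame,
                      id_column: str = "player_id") -> pd.DataFrame:
    """Z-score every measure column (hypothetical helper).

    id_column is an assumed identifier column that is left untouched.
    """
    standardised = measures_table.copy()
    measure_columns = [c for c in standardised.columns if c != id_column]
    for column in measure_columns:
        values = standardised[column].astype(float)
        # Population standard deviation, as in scipy.stats.zscore's default.
        standardised[column] = (values - values.mean()) / values.std(ddof=0)
    return standardised

example_table = pd.DataFrame({
    "player_id": ["a", "b", "c"],
    "frequency": [1.0, 2.0, 3.0],
    "total_wagered": [10.0, 10.0, 40.0],
})
standardised = standardise_table(example_table)
```

One reason for caution: in this sketch a column in which every player has the same value has zero standard deviation, so its standardised values come out as NaN.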
Splits a measures table into two randomly selected groups. This is useful for machine learning methods where a train-test split is needed, and uses the Pandas library's sample method.
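A random split based on the pandas sample method might look like this. The function name, the frac and seed parameters, and their defaults are assumptions for illustration, and the sketch relies on the table having a unique index.

```python
import pandas as pd

def split_table(measures_table: pd.DataFrame, frac: float = 0.7, seed: int = 0):
    """Randomly split a measures table into two groups (hypothetical helper)."""
    # sample draws rows without replacement; random_state makes it repeatable.
    first = measures_table.sample(frac=frac, random_state=seed)
    # Everything not drawn goes into the second group (needs a unique index).
    second = measures_table.drop(first.index)
    return first, second

example_table = pd.DataFrame({
    "player_id": [f"player_{i}" for i in range(10)],
    "frequency": [float(i) for i in range(10)],
})
train, test = split_table(example_table, frac=0.7, seed=0)
```

Fixing the seed keeps the two groups the same between runs, which helps when comparing models trained on the first group.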
The measures module also has a method for plotting the position of an individual in the context of the rest of the population. This is useful for exploratory work on outliers and similar things.
Plots an individual's position in the population across each of the behavioural measures in the measures table. Note that this will output one bar chart for each behavioural measure.