The data module loads transaction data sets from multiple sources into a format that the gamba library can use. This format is needed to create a measures table, and is simply a dataframe with fixed column names. This page explains how the format works and ends with some simple plotting. The module also contains a number of data loading functions for existing public data repositories, so you can replicate or extend work right from the start.
This can be used to read a regular CSV file as you'd expect, but it can also read tab-separated files, which is the format of some of the transparency project's data sets.
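As a rough illustration of the same idea using pandas directly (this is not the gamba function itself), the only difference between comma-separated and tab-separated input is the delimiter. The in-memory sample data and its column names are made up for the example.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for files on disk.
csv_text = "UserID,Stake,Date\n1,2.50,2005-02-01\n2,10.00,2005-02-03\n"
tsv_text = csv_text.replace(",", "\t")

# Comma-separated input: pandas' default delimiter.
csv_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Tab-separated input: same call, with the delimiter given explicitly.
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t", parse_dates=["Date"])

# Both routes should produce the same dataframe.
print(csv_df.equals(tsv_df))
```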
The gamba library's methods expect a dataframe with certain column names. The most important task after loading data as a pandas dataframe is to set these column names according to the type of data each column contains. For basic data, the names should match the following table:
| Column name | Description |
|---|---|
| player_id | a unique identifier for each player |
| bet_size | the size of the bet (in raw currency form, e.g. USD) |
| bet_time | the datetime the bet was placed |
| payout_size | the size of the payout (also in raw currency) |
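A minimal sketch of the renaming step with plain pandas follows. The source column names (UserID, Stake, Date, Winnings) are invented for illustration; only the target names come from the table above.

```python
import pandas as pd

# Hypothetical raw data with operator-specific column names.
raw = pd.DataFrame({
    "UserID": [1, 1, 2],
    "Stake": [2.5, 5.0, 10.0],
    "Date": ["2005-02-01 10:00", "2005-02-01 10:05", "2005-02-03 21:30"],
    "Winnings": [0.0, 10.0, 0.0],
})

# Map each source column onto the name from the table above.
column_map = {
    "UserID": "player_id",
    "Stake": "bet_size",
    "Date": "bet_time",
    "Winnings": "payout_size",
}
bets = raw.rename(columns=column_map)

# bet_time should hold datetimes rather than strings.
bets["bet_time"] = pd.to_datetime(bets["bet_time"])

print(bets.dtypes)
```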
Advanced data sets may contain more information about each bet. These additional columns can be included using names from the table below. Note that methods in other parts of the library will reject dataframes which contain column names not in one of these two tables.
| Column name | Description |
|---|---|
| payout_time | the timestamp that the payout was paid |
| decimal_odds | the decimal odds for the given bet |
| house_edge | the percentage taken by the house (value of 3 for 3% house edge) |
| game_type | the game being played as a string e.g. 'coinflip', 'roulette' - doesn't have to be one of a fixed set but should be unique per game type |
| provider | the operator's name - this is useful for mixed operator datasets |
Several public repositories provide transaction data that can be loaded by the gamba library (see Public Repositories in the menu). The data module contains methods for loading some of these sets into the correct format, which are used in the respective replications. If you're loading in a similar data set, feel free to explore the source code of these methods to see how it's done, and modify them for your own needs!
Splits the original LaBrie data into CSV files for each individual's transactions and renames the columns to be compatible with the rest of the gamba library.
Splits the original Braverman and Shaffer data into CSV files for each individual's transactions, and renames the columns to be compatible with the rest of the gamba library.
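The per-individual splitting step can be sketched with a pandas groupby. This is an illustration of the general approach, not the library's own code; the sample data, the temporary output directory, and the file naming scheme are all assumptions.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical transactions already renamed to the gamba column names.
bets = pd.DataFrame({
    "player_id": [1, 1, 2],
    "bet_size": [2.5, 5.0, 10.0],
    "bet_time": pd.to_datetime(["2005-02-01", "2005-02-02", "2005-02-03"]),
    "payout_size": [0.0, 10.0, 0.0],
})

# Write one CSV per player into a temporary directory.
out_dir = Path(tempfile.mkdtemp())
for player_id, player_bets in bets.groupby("player_id"):
    player_bets.to_csv(out_dir / f"{player_id}.csv", index=False)

written = sorted(path.name for path in out_dir.glob("*.csv"))
print(written)
```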
Loads in the analytic data set of high-risk internet gamblers and removes the UserID, Sereason, random, and clustering columns as described in Philander's 2014 study.
There's also a generate_bets method, which returns a set of entirely synthetic example transactions to use throughout the docs or to compare against your own data.
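To give a feel for what synthetic transactions in this format might look like, here is a small sketch. The helper make_example_bets, its parameters, and the coin-flip payout rule are hypothetical and are not taken from generate_bets.

```python
import numpy as np
import pandas as pd


def make_example_bets(num_players=3, bets_per_player=5, seed=0):
    """Hypothetical helper: build random bets in the basic column format."""
    rng = np.random.default_rng(seed)
    n = num_players * bets_per_player
    bet_size = rng.integers(1, 11, size=n).astype(float)
    # Assumed payout rule: a win pays double the stake, a loss pays nothing.
    won = rng.random(n) < 0.5
    # Random bet times within one week of an arbitrary start date.
    start = pd.Timestamp("2021-01-01")
    offsets = pd.to_timedelta(rng.integers(0, 7 * 24 * 60, size=n), unit="min")
    return pd.DataFrame({
        "player_id": np.repeat(np.arange(1, num_players + 1), bets_per_player),
        "bet_size": bet_size,
        "bet_time": start + offsets,
        "payout_size": np.where(won, 2 * bet_size, 0.0),
    })


example_bets = make_example_bets()
print(example_bets.head())
```

Fixing the seed keeps the example reproducible, which is convenient when the same data is reused across several pages.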
It's good practice to check that your column names match those used by the gamba library, and make sure that no extra columns exist. The check_data method below can be given the dataframe, and it will raise an error if anything isn't as it should be:
Make sure that your data is in the gamba standard format (has the right column names). This method will raise an exception if the format is incorrect, and will do nothing if it is correct.
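The kind of check described here can be sketched as follows. The function check_columns is a hypothetical stand-in, not the library's check_data, and treating all four basic columns as required is an assumption.

```python
import pandas as pd

# Column names from the basic and advanced tables on this page.
REQUIRED = {"player_id", "bet_size", "bet_time", "payout_size"}
OPTIONAL = {"payout_time", "decimal_odds", "house_edge", "game_type", "provider"}


def check_columns(df):
    """Hypothetical validator: raise if columns are missing or unrecognised."""
    columns = set(df.columns)
    missing = REQUIRED - columns
    unknown = columns - REQUIRED - OPTIONAL
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unrecognised columns: {sorted(unknown)}")


good = pd.DataFrame(
    columns=["player_id", "bet_size", "bet_time", "payout_size", "game_type"]
)
check_columns(good)  # passes silently

bad = good.rename(columns={"bet_size": "stake"})
try:
    check_columns(bad)
except ValueError as error:
    print(error)
```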
Prints out some basic information about a gambling or gambling-like application given a collection of bets made through that application. Data displayed includes the number of users, the number of game types provided, the number of bets placed, the total value of the bets and the payouts, the time of the first bet, and the time of the last.
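The application-level figures described above (users, game types, bets, totals, first and last bet times) can be derived from a bets dataframe with a few pandas calls. This is a sketch on made-up data, not the library's implementation.

```python
import pandas as pd

# Hypothetical bets in the gamba column format.
bets = pd.DataFrame({
    "player_id": [1, 1, 2, 3],
    "game_type": ["coinflip", "roulette", "coinflip", "coinflip"],
    "bet_size": [2.0, 3.0, 5.0, 10.0],
    "payout_size": [0.0, 6.0, 0.0, 20.0],
    "bet_time": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 11:00",
        "2021-01-02 09:00", "2021-01-03 12:00",
    ]),
})

summary = {
    "users": bets["player_id"].nunique(),
    "game_types": bets["game_type"].nunique(),
    "bets": len(bets),
    "total_bet_value": bets["bet_size"].sum(),
    "total_payout_value": bets["payout_size"].sum(),
    "first_bet": bets["bet_time"].min(),
    "last_bet": bets["bet_time"].max(),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```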
Create a table containing summary data for providers using a collection of player bets. Summary includes the number of unique users and games, the total value of bets and payouts, the starting and ending block numbers, and the time the starting and ending blocks occurred.
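A per-provider table along these lines can be sketched with a grouped aggregation. The data is invented, and block numbers are left out because they are not among the column names listed on this page.

```python
import pandas as pd

# Hypothetical bets from two providers.
bets = pd.DataFrame({
    "player_id": [1, 1, 2, 3],
    "provider": ["alpha", "alpha", "alpha", "beta"],
    "game_type": ["coinflip", "roulette", "coinflip", "coinflip"],
    "bet_size": [2.0, 3.0, 5.0, 10.0],
    "payout_size": [0.0, 6.0, 0.0, 20.0],
    "bet_time": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 11:00",
        "2021-01-02 09:00", "2021-01-03 12:00",
    ]),
})

# One row per provider, using pandas named aggregation.
provider_summary = bets.groupby("provider").agg(
    users=("player_id", "nunique"),
    games=("game_type", "nunique"),
    total_bets=("bet_size", "sum"),
    total_payouts=("payout_size", "sum"),
    first_bet=("bet_time", "min"),
    last_bet=("bet_time", "max"),
)

print(provider_summary)
```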
The data module contains some basic visualisation methods which can be applied before any behavioural measures are calculated. This is useful for showing the distributions of player bet sizes, times, payouts, and so on.
Creates a candlestick-style plot of a player's betting activity over the course of their career. This works best on regularly-spaced sequential data but can also provide insight into intra-session win/loss patterns.
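The numbers behind a candlestick-style view can be sketched by resampling a player's running net position into open, high, low, and close values. The daily window and the use of net position are assumptions made for this example; the plotting itself is omitted.

```python
import pandas as pd

# Hypothetical bets for a single player, indexed by bet time.
player_df = pd.DataFrame({
    "bet_time": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 11:00", "2021-01-01 12:00",
        "2021-01-02 09:00", "2021-01-02 10:00",
    ]),
    "bet_size": [5.0, 5.0, 10.0, 5.0, 5.0],
    "payout_size": [0.0, 10.0, 0.0, 20.0, 0.0],
}).set_index("bet_time").sort_index()

# Running net position after each bet (payouts minus stakes).
net = (player_df["payout_size"] - player_df["bet_size"]).cumsum()

# One candle per day: open, high, low, close of the running net position.
candles = net.resample("D").ohlc().dropna()

print(candles)
```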
Plot a player's betting and payout trajectory on a single plot, with green indicating payouts (top) and red indicating bets (bottom). A cumulative value line is also plotted between the two. Note that the player_df must include both bet_size and payout_size columns.
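One plausible way to prepare the three lines for such a plot is shown below. It assumes the trajectories are cumulative sums, which the description above does not state outright. The resulting columns can be passed to any plotting library.

```python
import pandas as pd

# Hypothetical bets for a single player, in time order.
player_df = pd.DataFrame({
    "bet_time": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "bet_size": [5.0, 5.0, 10.0],
    "payout_size": [0.0, 10.0, 0.0],
}).sort_values("bet_time")

trajectory = pd.DataFrame({
    # Cumulative payouts, drawn above the axis.
    "payouts": player_df["payout_size"].cumsum(),
    # Cumulative bets, negated so they are drawn below the axis.
    "bets": -player_df["bet_size"].cumsum(),
})

# The cumulative value line sits between the two.
trajectory["cumulative_value"] = trajectory["payouts"] + trajectory["bets"]

print(trajectory)
```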