COUPCAST: METHODOLOGY

This page provides a brief non-technical overview of the CoupCast methodology. For more extensive details, please visit our dataset page.

Unlike other political crises, which unfold over weeks, months, or years, coups are precisely timed events aimed at ousting a specific individual from power. This precision means that the risk of a coup may vary greatly over the course of a year and can change instantaneously during a transition between leaders. For this reason, CoupCast estimates a unique risk of a coup attempt for every individual leader for each month he or she is in power.

To generate a predictive model, we collect data on coups and the conditions that decades of coup research link to coup plotting. These data include:

  • HISTORICAL COUP DATA on approximately 600 coup attempts dating back to 1920.
  • SOCIOECONOMIC CONDITIONS including GDP, economic growth, infant mortality, and extreme deviations in precipitation.
  • POLITICAL CONDITIONS including measures of militarization, democratization, regime longevity, and the timing and outcomes of elections and referendums.
  • POLITICAL VIOLENCE INDICATORS, both within a country and in the region.
  • REGIONAL SHOCKS that capture political instability, change, and economic shocks in nearby countries.
  • LEADER TRAITS, such as age, military experience, time in power, and method of entry.

We use historical data to find the statistical relationships between these variables and coup attempts and then use those relationships to forecast the risk of a coup against every world leader within the next month.

We produce monthly estimates in three stages:

Stage 1: Theoretically-Informed Models

Coup experts have worked for decades to quantify the correlations between coup attempts and numerous risk factors. We draw likely causes of coup attempts from this rich research literature and use a series of complementary log-log regressions to predict monthly coup risk. This form of regression assigns every month a coup probability between 0% and 100% while allowing the large majority of estimates to be very close to zero. It is well suited to rare events such as coups: between 1950 and 2016, a coup attempt occurred in only about 0.3% of months at risk (fewer than 500 out of nearly 130,000 months in which a leader of a country was in power).
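To illustrate how a complementary log-log model turns a weighted sum of risk factors into a monthly probability, the minimal Python sketch below applies the inverse link p = 1 - exp(-exp(eta)). The intercept, the coefficient, and the variable names are hypothetical values chosen for illustration; they are not CoupCast estimates.

```python
import math

def cloglog_probability(eta: float) -> float:
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny,
    # which matters because most monthly risks are very close to zero.
    return -math.expm1(-math.exp(eta))

# Hypothetical coefficients, for illustration only.
INTERCEPT = -6.5
COEF_RECENT_INSTABILITY = 1.2  # assumed effect of recent instability

def monthly_risk(recent_instability: bool) -> float:
    """Monthly coup-attempt probability for one leader-month."""
    eta = INTERCEPT + COEF_RECENT_INSTABILITY * (1.0 if recent_instability else 0.0)
    return cloglog_probability(eta)

calm = monthly_risk(recent_instability=False)
tense = monthly_risk(recent_instability=True)
print(f"calm month:  {calm:.4%}")
print(f"tense month: {tense:.4%}")
```

Both values are small but strictly positive, which is the behavior described above: the model can place most leader-months near zero while still ranking them.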

We first train the model on the period from January 1950 through December 1974 and use it to predict the risk of a coup during each month of 1975. We then add 1975 to the training data and predict the risk of a coup during each month of 1976. We repeat this process for every subsequent year, so the model is always making out-of-sample predictions about the future. Events in more recent years are weighted more heavily than events further in the past. In this way, the model draws on more historical data every year, yet recent coups influence estimates more than coups from decades earlier.
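The expanding training window can be sketched in a few lines of Python. The function names are hypothetical, and the exponential decay used for the recency weights is an assumption made for illustration; this overview does not specify the actual weighting scheme.

```python
def expanding_windows(first_train_year, first_test_year, last_test_year):
    """Yield (training_years, test_year) pairs; training always precedes the test year."""
    for test_year in range(first_test_year, last_test_year + 1):
        yield list(range(first_train_year, test_year)), test_year

def recency_weights(train_years, decay=0.95):
    """Assumed scheme: the weight shrinks by `decay` for each year of age."""
    latest = max(train_years)
    return {year: decay ** (latest - year) for year in train_years}

windows = list(expanding_windows(1950, 1975, 1977))
for train_years, test_year in windows:
    weights = recency_weights(train_years)
    print(
        f"predict {test_year}: train on {train_years[0]}-{train_years[-1]}, "
        f"weight of 1950 = {weights[1950]:.3f}, "
        f"weight of {train_years[-1]} = {weights[train_years[-1]]:.3f}"
    )
```

Each pass adds one more year of training data, and the oldest years count for a little less every time.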

Stage 2: Machine Learning

The advantage of the first stage is that it draws on the large body of research on the causes of coup attempts to identify the factors that are widely expected to drive coup plotting. This means it puts a strong emphasis on conditions such as democratization, recent instability, economic crises, and authoritarian regime types. However, predictions can be very sensitive to the researcher’s subjective decisions about how variables are used, transformed, and combined in a particular regression. Inductive “machine learning” complements this approach by using computing power to search the data for predictive patterns rather than relying on the researcher to specify them in advance.

By using a machine learning method called random forest, we can test hundreds or thousands of possible combinations of variables (“trees”) to identify combinations that predict coups especially well. A random forest model builds on an earlier idea in machine learning: “trees,” or “tree-based” methods. A tree-based method selects the most powerful predictor (the “root”) of an event, such as a coup, and selects one of its values as a split point. It splits the data into the observations above and below the split point and selects the most powerful predictor within each of the two new sets. It then selects split points for those predictors and begins the process anew. After some number of splits, the data remaining in a branch become too sparse, or the tree becomes too large, and the process halts. To make a prediction, one simply starts at the root and follows the splits until reaching the end of a branch. Every observation that reaches the same end of a branch receives the same prediction.
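The first step of growing a tree, choosing the root variable and its split point, can be sketched as follows. This is a minimal illustration that assumes Gini impurity as the measure of predictive power (a common choice, though this overview does not name the criterion used), and the six leader-months below are invented toy data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return (score, feature, threshold) with the lowest weighted impurity."""
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send data down both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Invented toy data: 1 = coup attempt that month, 0 = none.
rows = [
    {"years_in_power": 1, "gdp_growth": 2.0},
    {"years_in_power": 2, "gdp_growth": -1.0},
    {"years_in_power": 8, "gdp_growth": 1.5},
    {"years_in_power": 12, "gdp_growth": -0.5},
    {"years_in_power": 5, "gdp_growth": 3.0},
    {"years_in_power": 20, "gdp_growth": 0.5},
]
labels = [1, 1, 0, 0, 0, 0]

score, feature, threshold = best_split(rows, labels)
print(f"root: split on {feature} <= {threshold} (impurity {score:.3f})")
```

A full tree would apply `best_split` again to the rows on each side of the chosen split and stop once a branch holds too few rows.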

Random forests, as one might suspect, are a collection of many trees. Each tree is constructed as described above, but with one key distinction: it is built from a randomly selected subset of the variables available for prediction, so no tree can consider all variables at once. As a result, each tree can be a little different, and a root can form with a variable that is not the most predictive overall. To make a prediction, a random forest averages the predictions of its many trees, producing an estimate that represents the collective wisdom of many, many different explanatory models. These predictions often outperform the estimates produced by any single “tree.”
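The averaging idea can be sketched with a tiny forest of one-split trees. This is an illustration under stated assumptions rather than the CoupCast implementation: the toy data are invented, Gini impurity is assumed as the split criterion, and each tree is also trained on a bootstrap resample of the rows, as standard random forest implementations typically do. Those implementations usually redraw the variable subset at every split; with one-split trees, that is the same as drawing it once per tree.

```python
import random

def fit_stump(rows, labels, features):
    """Fit a one-split tree using only `features`; return a predict function."""
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            p_left = sum(left) / len(left)
            p_right = sum(right) / len(right)
            # Weighted Gini impurity of the two branches (lower is better).
            score = (len(left) * 2 * p_left * (1 - p_left)
                     + len(right) * 2 * p_right * (1 - p_right))
            if best is None or score < best[0]:
                best = (score, feature, threshold, p_left, p_right)
    if best is None:  # no usable split: fall back to the base rate
        base_rate = sum(labels) / len(labels)
        return lambda row: base_rate
    _, feature, threshold, p_left, p_right = best
    return lambda row: p_left if row[feature] <= threshold else p_right

def fit_forest(rows, labels, n_trees=50, n_features=1, seed=0):
    """Each tree sees a random variable subset and a bootstrap sample of rows."""
    rng = random.Random(seed)
    all_features = list(rows[0])
    forest = []
    for _ in range(n_trees):
        features = rng.sample(all_features, n_features)
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(tree(row) for tree in forest) / len(forest)

# Invented toy data: 1 = coup attempt that month, 0 = none.
rows = [
    {"years_in_power": 1, "gdp_growth": 2.0},
    {"years_in_power": 2, "gdp_growth": -1.0},
    {"years_in_power": 8, "gdp_growth": 1.5},
    {"years_in_power": 12, "gdp_growth": -0.5},
    {"years_in_power": 5, "gdp_growth": 3.0},
    {"years_in_power": 20, "gdp_growth": 0.5},
]
labels = [1, 1, 0, 0, 0, 0]

forest = fit_forest(rows, labels)
risk_new_leader = predict_forest(forest, {"years_in_power": 1, "gdp_growth": 0.0})
risk_veteran = predict_forest(forest, {"years_in_power": 15, "gdp_growth": 0.0})
print(f"new leader: {risk_new_leader:.2f}, veteran: {risk_veteran:.2f}")
```

Because roughly half of the trees never see `years_in_power`, the forest's answer blends several partial views of the data instead of relying on a single tree.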

Stage 3: Combined Model

The more deductive Stage 1 approach and the more inductive Stage 2 approach are similarly good at predicting coup attempts, but they are much stronger when combined. We produce our final estimates by interacting the Stage 1 and Stage 2 results in a final complementary log-log regression. This effectively assigns a high probability of a coup where both models agree coup risk is high, a moderate coup risk where the models disagree, and an extremely low risk of a coup where both models agree coup risk is low. For more information on how the models perform against past coup activity, please visit our accuracy page.