Forecasting Newsletter: June 2021

Nuño Sempere

Jul 01, 2021

Highlights

Some Superforecasters start a substack, as does Dominic Cummings
Alex Lawsen and I published Alignment Problems With Current Forecasting Platforms on the arxiv.
What if Military AI is a Washout? considers a future in which AI ends up affecting war not because of its overwhelming dominance, but by changing war's tradeoffs and best practices.

Index

Prediction Markets & Forecasting Platforms
In The News
Papers
Blog Posts

Browse past newsletters here, or view this newsletter on the EA forum here.

Prediction Markets & Forecasting Platforms

CSET-Foretell

The Superforecasting workshop for "Foretell Pros" has ended. A tidbit I learnt from it is that, unlike prediction markets, forecasting platforms can look at the covariance between forecasters (whether two forecasters' predictions are closer or further apart than average) and update on it. That is, if two forecasters who often disagree instead agree on a question, that is evidence that their side is correct (h/t Eva Chen).
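
To make the intuition concrete, here is a minimal sketch of a toy model. It is my own illustration, not something CSET-Foretell is known to use. It assumes two forecasters who are each right 70% of the time on a yes/no question, and it encodes how correlated they are through a hypothetical parameter `p_both_correct`, the chance that both are right at once. The function name and all numbers are assumptions.

```python
def posterior_given_agreement(accuracy, p_both_correct, prior=0.5):
    """P(answer is yes | both forecasters say yes) in a toy model.

    Each forecaster is individually right with probability `accuracy`,
    whatever the true answer is. `p_both_correct` is the chance that they
    are right at the same time; lower values mean they disagree more often.
    """
    # P(both wrong) = 1 - P(at least one right) = 1 - (2 * accuracy - p_both_correct)
    p_both_wrong = 1 - 2 * accuracy + p_both_correct
    if p_both_wrong < 0 or p_both_correct > accuracy:
        raise ValueError("inconsistent joint probabilities")
    yes_and_agree = prior * p_both_correct       # truth is yes, both are right
    no_and_agree = (1 - prior) * p_both_wrong    # truth is no, both are wrong
    return yes_and_agree / (yes_and_agree + no_and_agree)


ACCURACY = 0.7  # hypothetical individual accuracy
often_disagree = posterior_given_agreement(ACCURACY, p_both_correct=0.42)
independent = posterior_given_agreement(ACCURACY, p_both_correct=0.49)
often_agree = posterior_given_agreement(ACCURACY, p_both_correct=0.65)
print(often_disagree, independent, often_agree)
```

Under these assumptions, agreement between the pair that often disagrees pushes the posterior to roughly 0.95, against roughly 0.84 for independent forecasters and roughly 0.72 for the pair that usually agrees anyway.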

CSET has also been collaborating with Ought, and has given "Foretell Pros" access to Ought's GPT-3 assistant capabilities. I'm unclear on how often Ought's tools will be used in practice by forecasters.

Good Judgment Open

The World Ahead: What If? (a), a new Good Judgment Open tournament in collaboration with The Economist, presents five long-term questions. This is something I don't recall seeing before, and I'm glad to see that Good Judgment Open is dipping its toes into the tricky business of long-term predictions. The questions will not be scored, perhaps because Good Judgment Open uses an improper scoring rule which gets worse for longer-term questions (see below).

Metaculus

Metaculus has a new redesign (a) in progress, and an accompanying blogpost (a) by Metaculus' CEO. Some discussion can be found on Metaculus itself here (a).

Metaculus also launched the Trade Signal tournament (a), where Metaculus users attempt to predict economic indicators which might be used to make trades. For this, they are looking for a "Community Trader" (a). So far, the one candidate (a) seems very formidable.

Michael Aird, of Rethink Priorities, organized the Nuclear Risk Forecasting Tournament (a). Questions can be found here (a).

The 20/20 Insight Forecasting Contest (a) has concluded. Winners can be seen here (a).

SimonM kindly curated the top comments from Metaculus this past June. They are:

Koji writes at length about whether or not the Iran Nuclear deal will be restarted in 2021
SimonM models potential meat demand decreasing, by reusing historical data from the "decline" in smoking
chrisjbillington thinks the Metaculus community forecast hasn't adjusted enough for the delta variant
SimonM models the Senate as a random walk to estimate the odds the GOP hold it for the next 10 years
elifland_ought and EvanHarper share some forecasts for how Japan will perform at their home Olympics
NunoSempere brings together a collection of other forecasts to estimate both the likelihood of the lab leak hypothesis AND whether or not the government will acknowledge it.
whaffner (https://www.metaculus.com/questions/7330/community-trader-election/#comment-64119) runs for the Community Trader position
Trey Goff, Chief of Staff at Próspera, weighs in on the forecasting question about Próspera

Polymarket

Polymarket has at times been nigh-unusable because of network congestion and dependency failures. Polygon, the second layer solution for Ethereum which Polymarket uses, has become more popular, so the costs of making transactions (gas costs) have increased, and the infrastructure needed to process those transactions has at times been taxed beyond capacity. In response, Polymarket has increased the gas prices which its contracts are willing to pay; this doesn't really affect users because even comparatively high gas prices on Polygon are at most cents.

In addition, The Graph (a)—a service which Polymarket was relying on to let its webpage know what its blockchain contracts were doing—has also been suffering from constant failures, presumably also as a result of scaling pains.

Polygon itself has also been accused of being insecure, because 5 out of 8 developers and early community members ("multisignature key holders") could conspire to upgrade Polygon's protocol. Here are two (a) letters (a) from "DeFi Watch".

Polymarket contentiously resolved its "Will NYC fully reopen by July 1?" (a) market positively. Here (a) is someone on Twitter making the case for a "No" resolution, here (a) is the case for a "Yes" resolution, and here (a) is Polymarket's rationale for resolving it positively. Polymarket also prematurely resolved "Will Joe Biden be President of the USA on June 30, 2021?" (a) as a "Yes", and they are reimbursing market participants who held "No" positions.

On the positive side, Polymarket passed its one year anniversary (a) this month, and organized a party. Some community members were invited and reimbursed for their travel expenses.

A Polymarket community member has released a polymarket trading tool (a), which allows users to interact with Polymarket's Polygon contracts directly, without having to use Polymarket's frontend. Polymarket has also added some rudimentary search functionality to their frontpage.

In more detail: Why and how could Polygon multisignature key holders steal users' funds?

A key point of contention is whether upgrading Polygon's protocol could be used to straight-out steal users' assets (or just make the platform unusable). Answering that question would require understanding some of the finer points of cross-chain communication, which are a bit beyond me. In particular, what the multisignature key holders would be stealing wouldn't directly be the valuable USDC or ETH assets, but rather a Doppelgänger (a) of those assets, a clone asset on the Polygon Chain which is guaranteed to be redeemable for original tokens, originals which are safely stashed away in the Ethereum Chain. See: Moving assets to Polygon (a), and wrapped tokens (a).

It's possible that stealing the Doppelgänger tokens would just make them instantly worthless. More specifically, because USDC is controlled by a central authority (a), it could just refuse to honor stolen tokens. However, the malicious multisignature key holders could steal users' assets, and then very quickly swap those assets for decentralized assets (like DAI), using Uniswap (a); they could then disappear using Tornado Cash (a). This would normally not be possible, but in this case, the process to upgrade Polygon's protocol is not under a timelock (a): there is no enforced waiting period between the announcement of an upgrade and when that upgrade takes effect.
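
As a rough illustration of why the missing timelock matters, here is a toy Python model of a 5-of-8 multisig with an optional waiting period. It is a sketch under my own assumptions, not Polygon's actual contracts: the class `UpgradeGovernance`, its methods, the signer names and the 7-day delay are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UpgradeGovernance:
    """Toy M-of-N multisig with an optional timelock (hypothetical model)."""
    signers: frozenset
    threshold: int
    timelock_seconds: int = 0              # 0 means no enforced waiting period
    approvals: set = field(default_factory=set)
    announced_at: Optional[float] = None   # when the threshold was first reached

    def approve(self, signer, now):
        if signer not in self.signers:
            raise ValueError("unknown signer")
        self.approvals.add(signer)
        # The upgrade counts as announced once enough key holders approve.
        if len(self.approvals) >= self.threshold and self.announced_at is None:
            self.announced_at = now

    def can_execute(self, now):
        if self.announced_at is None:
            return False
        return now - self.announced_at >= self.timelock_seconds


WEEK = 7 * 24 * 3600  # hypothetical delay, in seconds
holders = frozenset(f"holder{i}" for i in range(8))
no_lock = UpgradeGovernance(holders, threshold=5)
with_lock = UpgradeGovernance(holders, threshold=5, timelock_seconds=WEEK)

# Five colluding key holders approve an upgrade at time 0.
for governance in (no_lock, with_lock):
    for holder in sorted(holders)[:5]:
        governance.approve(holder, now=0)

print(no_lock.can_execute(now=0))       # upgrade is live immediately
print(with_lock.can_execute(now=0))     # users still have time to withdraw
print(with_lock.can_execute(now=WEEK))  # upgrade is live after the delay
```

In the version without a delay, the upgrade can take effect the moment the fifth key holder approves, which is the scenario described above. With a delay, users would have a window in which to notice the announcement and move their assets.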

In the short term, I'm not actually too worried, and I'm keeping my assets on Polymarket, on Polygon. But in the medium to long term, the probabilities of things like regulatory attacks or plain old human unreliability or malice start to add up.

Superforecasters

A shrewdness (a) of superforecasters has started a substack (a), so far featuring fortnightly forecasts of in fashion affairs (a).

Others

I added Rootclaim (a) to Metaforecast (a), and fixed a bug due to which some CSET-Foretell questions were not getting included (h/t Michał Dubrawski). I've also rewritten the back-end code to make keeping a history of predictions feasible. This might produce some interesting comparative research in the coming months.
Kalshi (a) may have started to allow trades, but I can't verify this because it's only open to US residents.

In the News

NPR (a) has gotten some economists who disagree with each other to make quantifiable predictions, and to promise to come back in a couple of months to analyze what they got right or wrong (h/t @CrunchWrapSupreme):

OK, so here's what's going to happen on today's show. We're going to have two economic forecasters, Diane and Alfredo, who are going to make specific predictions about what is going to happen with jobs, with inflation and with housing in the United States for the rest of the year. Also, unlike some economic forecasters who make their predictions and then sort of disappear if they get things wrong, Alfredo and Diane have courageously agreed to come back on the show in January and talk about both what they got right and what they got wrong. And when they come back, we'll see whose forecasts were closer to reality.

The Rise Fund Announces $100 Million Strategic Investment in Climavision (a). The Rise Fund is one of the largest, if not the largest, impact investment funds. The investment is supposed to improve weather forecasting. Taken directly from the press release:

Climavision was formed out of Enterprise Electronics Corporation (EEC), the world’s largest privately held commercial supplier of weather radar systems. EEC, which is majority controlled by the Cookes family, has [...] more than 1,200 installations across 95 countries. By combining lower altitude, proprietary data with [...] machine learning and AI technology, Climavision [...] provides [...] higher resolution and more accurate forecasting to address [...] coverage gaps left by existing radar networks across the U.S.
"As weather patterns become increasingly unpredictable and volatile due to climate change, the need for higher-quality regional and hyper local weather data has never been more pronounced," said Climavision Co-Founder and CEO Chris Goode. "Climavision’s increased coverage and improved weather information enables earlier and more accurate weather forecasts that can save lives, limit business disruption, and improve the lives of people and communities across the country."

There has recently been a heat wave in the US. Compare coverage from Fox (a), from the Associated Press (a) and from Reuters (a).

European data monopoly hurt forecasts of deadly eruption, Congolese researchers charge (a).

On 22 May, Mount Nyiragongo, perhaps the most dangerous volcano in the world, erupted in a show of fire. Lava swept toward the city of Goma in the Democratic Republic of the Congo (DRC), pushing thousands from their homes and killing dozens. Although the volcano has since settled down, a new flashpoint has erupted at the geophysical observatory that monitors it.
In a 2 June open letter addressed to the DRC’s president, staff at the Goma Volcano Observatory (GVO) have condemned what they say is corruption by the observatory’s Congolese leadership. They also accuse European partners of a "neocolonial" attitude and of depriving them of timely data that might have allowed them to provide early warnings of eruptions.
Signed by union leader Zirirane Bijandwa Innocent on behalf of the dozens of staff researchers and technicians, the letter alleges that GVO leaders squandered money from international donors, failed to pay staff for months, and even had some researchers arrested for complaining about the situation. It also charges that the Royal Museum for Central Africa (MRAC) in Belgium and the European Centre for Geodynamics and Seismology (ECGS) in Luxembourg, long-term partners with GVO, wield too much influence over its leadership. The letter says the observatory "was taken hostage... by a small group of scientific neo-colonialists" who shut out local experts and focused on their own volcanology research at the expense of developing local capacity to monitor geohazards.
The cuts left the observatory unable to afford even an internet connection. That deprived GVO of real-time data from a network of seismometers and GPS stations deployed across the region by MRAC and ECGS since 2012. These devices can detect the small tremors and movements of Earth’s surface that can precede eruptions, as magma rises inside a volcano. The sensors send their data directly to ECGS before being returned to GVO.
Another dispute concerns whether the eruption could have been predicted. In presentations at GVO on 26 April and 10 May after they regained access to the data, staff seismologists highlighted tremor activity that might indicate magma rising through cracks, according to the letter and to Science’s source at the observatory. They urged GVO leadership to send teams out to make field observations, but nothing happened. The complainants allege that GVO leaders deferred to advice from their European partners.

Papers

In Alignment Problems With Current Forecasting Platforms (a), my coauthor Alex Lawsen and I expand upon our earlier Incentive Problems With Current Forecasting Competitions (a). We classify current problems as, roughly, either reward specification problems or principal-agent problems. Reward specification problems are those which incentivize forecasters to behave in ways which are not useful from the perspective of the accuracy of the broader system.

For instance, some platforms:

incentivize people to make forecasts on lots of questions even if they have no particular information advantage,
disincentivize forecasters from forecasting even if they know the true word-from-God probability exactly,
strongly disincentivize people from sharing information,
etc.

With regard to principal-agent problems, forecasters also sometimes stop trying to maximize their expected score, and instead start optimizing for other metrics. For example, discrete prizes create incentives to be among the few people who get prizes, or in the top few spots where people can brag that they won a tournament. We try to analyze this effect quantitatively. We also prove that some platforms, like Good Judgment Open or CSET-Foretell, straight-out use an improper scoring rule, where participants can get a better score in expectation by inputting something other than their true probability.
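
For readers unfamiliar with the term, here is a minimal sketch of what "improper" means. It is a generic textbook contrast, not the specific rule used by Good Judgment Open or CSET-Foretell and not the argument in the paper. It compares the quadratic (Brier) loss, which is proper, with a linear (absolute error) loss, which is not. The function names and the 70% true probability are assumptions chosen for illustration.

```python
def expected_loss(report, true_p, loss):
    """Expected loss of reporting `report` when the event has probability `true_p`."""
    return true_p * loss(report, 1) + (1 - true_p) * loss(report, 0)


def brier(report, outcome):
    """Quadratic loss: a proper scoring rule (lower is better)."""
    return (report - outcome) ** 2


def linear(report, outcome):
    """Absolute-error loss: an improper scoring rule (lower is better)."""
    return abs(report - outcome)


def best_report(true_p, loss, steps=100):
    """Grid search for the report that minimizes expected loss."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda q: expected_loss(q, true_p, loss))


TRUE_P = 0.7  # hypothetical true probability
best_brier = best_report(TRUE_P, brier)
best_linear = best_report(TRUE_P, linear)
print(best_brier, best_linear)
```

With the quadratic loss the best report is the true 70%. With the linear loss the best report is 100%, so a forecaster optimizing their expected score would input something other than their true probability.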

I thought that this was going to be a big deal, because Superforecasters are chosen from Good Judgment Open, but per Good Judgment Inc, the effect probably turns out to be small. As a tidbit from history, IARPA's ACE tournament also used an improper scoring rule, but other groups besides the Good Judgment Project thought that it would be too much of a hassle to change.

In any case, each of the alignment problems we identify can manifest itself in different ways. Forecasters can consciously follow their flawed incentives. But it is also the case that each alignment failure adds noise to the ranking of forecasters (even if the noise is random). More spookily, forecasters also interpret their scores (or the monetary reward in the case of a tournament) as feedback. So to the extent that this feedback is flawed, forecasters might implicitly learn the wrong lessons.

This last possibility is particularly worrisome to me because "the feeling of an 80%", or "the feeling of updating from an 80% to a 60%" is for me something fairly intuitive. Thus, it is something which I could imagine could be vulnerable to flawed training. See Unconscious Economics (a) for an elaboration of the point that incentives don't have to consciously be followed to affect outcomes.

Many of the problems above are solved by prediction markets. But prediction markets have their own problems (a) and inefficiencies (a). For example, prediction markets also greatly disincentivize collaboration and thus greatly incentivize redundancy in research (a.k.a. "have you ever seen good comments on PredictIt?" h/t Marc Koehler.)

We also propose solutions for these problems. My preferred solution right now is one in which:

forecasters are rewarded in proportion to how well they and their team or their community do
against a prior selected by the forecasting platform,
the winners are not revealed, and
rewards are either continuous or probabilistic (and as a result, proper).

However, in the setup I have in mind, the forecasting platform ends up paying money proportionally to the number of forecasters (and is thus easily exploitable), or forecasters are disincentivized to bring other people in even if they would improve probabilities. Additionally, forecasters have an incentive to "slack off": to wait until someone else shares their hard work and reap similar rewards as them.

The conclusion section makes some comparisons between aligning forecasting systems and aligning machine systems. They both have a chain of proxies between the original goal and what ends up being maximized. And even though the human forecasters aren't being trained or optimized, there still seems to be a comparison to be made between the inner alignment (a) problem for reinforcement learners and the principal/agent problem for forecasters. Similarly, reward specification seems fairly equivalent to outer alignment, though I might be missing some nuance. I'm not really sure to what extent I'm shooting from the hip here, but I suggest that alignment proposals which would apply to superhuman systems could be tested on human forecasters with the goal of making them produce useful forecasts.

Blog Posts

Dominic Cummings (a) has started a substack. On the one hand, he appears to have deep insight about the inner workings of Britain's political machinery. On the other hand, it's difficult to say how Machiavellian he is, what proportion of what he communicates is intended to shape public opinion in a certain way, or how distorted his models of the world are by a goal of communicating information to have some effect. One of the things the British leave campaign did under his direction was to run randomized trials/focus groups on what the most persuasive arguments for Brexit were. I remember reading them, and finding them very persuasive, and then realizing that such persuasiveness was probably fairly uncorrelated with the truth. In LessWrong lingo, I'm unsure about which Simulacrum Level (a) Cummings is operating at.

Event-driven mission hedging and the 2020 US election (a) considers a case where it is cheaper to buy some altruistic good if Biden wins, so one could bet on his success and buy it only if he wins. The post makes the mistake of ignoring market dynamics, but this doesn't change the thrust of its argument.

If Biden wins the election then, based on your research, you expect the effectiveness of your donation will rise by about 10x. Suppose the baseline cost-effectiveness of the CCF is roughly $1/tCO2e (tonne of CO2 equivalent), so under Biden you expect it to be better at $0.1/tCO2e.
You believe Biden has a 70% chance of winning (see NYT article (a)). You see that on Betfair you can get 1.5:1 odds on Biden.
If you just donate, then your donation averts 7.3 million tCO2e in expectation. See below the main post for the calculations.
But if you bet $1m on Biden, with the commitment to donate the potential $1.5m win, then your expected impact is to avert 10.5 million tCO2e.
So, because almost all the impact you expect from your donation occurs when Biden wins, you can increase your expected impact by more than 40%.
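
The arithmetic in the excerpt can be checked with a short sketch. It is my own reconstruction under stated assumptions, not the post's calculation: the budget is $1m, a losing bet leaves nothing to donate, the baseline $1/tCO2e applies if Biden loses, and "1.5:1" means a total payout of $1.5m on a $1m stake, which is the reading that reproduces the quoted figures. The function and constant names are hypothetical.

```python
def expected_tonnes_averted(donation_if_win, donation_if_lose,
                            p_win, cost_if_win, cost_if_lose):
    """Expected tCO2e averted, given the donation made in each election outcome."""
    return (p_win * donation_if_win / cost_if_win
            + (1 - p_win) * donation_if_lose / cost_if_lose)


P_BIDEN = 0.70             # believed chance that Biden wins
COST_BASELINE = 1.0        # $/tCO2e if Biden loses
COST_BIDEN = 0.1           # $/tCO2e if Biden wins (10x more effective)
BUDGET = 1_000_000         # assumed budget in dollars
PAYOUT_IF_WIN = 1_500_000  # assumed total payout of a winning $1m bet

# Strategy 1: donate the budget regardless of who wins.
just_donate = expected_tonnes_averted(
    BUDGET, BUDGET, P_BIDEN, COST_BIDEN, COST_BASELINE)

# Strategy 2: bet the budget on Biden and donate the payout only if he wins.
bet_then_donate = expected_tonnes_averted(
    PAYOUT_IF_WIN, 0, P_BIDEN, COST_BIDEN, COST_BASELINE)

improvement = bet_then_donate / just_donate - 1
print(just_donate, bet_then_donate, improvement)
```

Under these assumptions the sketch gives about 7.3 million tCO2e for donating directly and about 10.5 million tCO2e for betting first, an improvement of roughly 44%, in line with "more than 40%".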

The Ultimate Guide to Decentralized Prediction Markets (a), an old Augur blog post that covers the topic in depth.

What to Expect When You're Expecting Inflation (a):

In this post, I want to explore the different measures of inflation expectations in the United States and their relative accuracy in predicting actual inflation in the hopes of informing an evaluation of today’s inflation expectations. I will show that inflation expectations have been well-anchored and fairly accurate, often overestimating realized inflation over the last two decades.

Jason Crawford on precognition (a):

Most people are slow to notice and accept change. If you can just be faster than most people at seeing what’s going on, updating your model of the world, and reacting accordingly, it’s almost as good as seeing the future.

Taboo "Outside View" (a):

The term is easily abused and its meaning has expanded too much. I recommend we permanently taboo “Outside view,” i.e. stop using the word and use more precise, less confused concepts instead. This post explains why.

The Generalized Product Rule (a) outlines how a certain step in Cox's theorem (a)—the step which proves that probability updating is multiplicative—can be applied to other problems as well.

The Perils of Forecasting (a):

As mentioned earlier, the intelligence and business communities tend to be much more seasoned and thorough in their analyses of ground-breaking paradigms. That is the case because they are not involved in public grandstanding about their own cleverness to the degree that some journalists and academics are. It is also because intelligence agencies and corporations are on a mission to try to get the future right: whether for reasons of national security or the commercial profit motive. Intelligence services and businesses also know that forecasting a middle-term future of five-to-15 years is essential, and yet they are aware just how difficult it is. They know that linear thinking is hard to escape from, since extrapolating from current trends is often all one ever has to go on. So, they are understanding of attempts at non-linear analysis, even when flawed. And because corporations and businesses meet behind closed doors, they are more willing to countenance blunt, hard-nosed assessments about such things as national cultures than journalists and academics are.
Journalists, contrarily, are consumed with presentness. They tend to judge everything from the vantage point of the current news cycle. And it is this obsession with presentness that obscures historical context, from which the future can be discerned, however imperfectly.
Interestingly, at a time when even the finest elite publications do not cover foreign affairs as seriously and as disinterestedly as they once did, corporations have been reaching out to private forecasting companies to get a cold-blooded sense of the middle-term future in many places. Having worked for two such firms—Stratfor and Eurasia Group—over the course of the recent decade, I can confirm that even when wrong, what such firms really bring to the table is an old-fashioned and comprehensive seriousness about the news of the world and where it is headed, regardless of its human interest value. They also are deliberately amoral: whether an outcome is good or bad does not interest them as a firm. The point is whether they predicted it or not.
And the more that the media as a whole declines—trafficking in the trivial and remaining within predictable philosophical comfort zones—the more necessary such firms will be. Indeed, the media is dominated by liberal arts majors, who are driven by the need to turn the stories of individuals into narratives; whereas analysis—the weighing of harsh, unpleasant truths that require abstractions and generalizations—is often the pursuit of math minds.

What if Military AI is a Washout? (a). The author presents his "hunches" on the future of military AI, in which it does improve, but it ends up affecting war not because of its overwhelming dominance, but by changing the tradeoffs and best practices of war. For instance, war might move more and more into cities, because they are an environment in which classifier systems might be more uncertain about whether someone is a civilian or an enemy combatant.

Here are some hunches about how I think military AI plays out. I use hunches here because I think hunches are a better currency than predictions. Predictions are ten-a-penny and always subject to retroactive revision (“Well I said that AI was going to transform warfare but if you look at it this way then cleaning the operations room with a Roomba is transformative”). Hunches are like predictions but without the veneer of professional expertise. Everyone can have hunches. Hunches are often more descriptive of underlying thinking than they are of the end product, so to speak. Like predictions, hunches require little to no support, but in terms of plain language they are far more open about this fact.
Integrated AI systems into organisational processes distorts them where it is possible (just as integrating desktop computers, or typewriters, or new production processes changes an organisation). If you have a factory where you can present a computer model of a required output and the factory itself will optimise the tooling and production lines to make it, then you’ll get a jump on competitors if you compete in terms of taking novel items to market. If, on the other hand, you make an error in the model, then model errors are now likely much more expensive, as there would be less time to identify and rectify them before producing the (expensive) tools to mass produce them. Result: less people in tooling, more people in model quality assurance. In my view, the same goes for targeting processes. If bits of the “kill chain” get automated with AI, then it increases risks of prior incorrect human (or machine) judgement. Result: re-shaping military organisations to account for potential optimisations offered by AI, and to minimise risks of errors.
In this view artificial intelligence is essentially automation. We take something that would require human cognition and action, instantiate it in a physical system, and then something that used to require a human being no longer requires a human being. “That is not AI” I hear you say, well, in response, consider how many automatic things were once autonomous things. Fire-and-forget missiles have gone from being discussed as autonomous systems to simply being an automatic function of a system. Automated Target Recognition systems are performing equivalent cognitive work (recognising objects from sense data) to human beings. It’s just that they can make sense of many different types of data, and do it faster than we can, enabling forms of action beyond human capabilities.
As I see it, object recognition is a key domain in which AI will eventually outperform us. At least for big recognisable pieces of kit. Therein lies the asymmetry - big pieces of recognisable military kit will be vulnerable to recognition by autonomous systems, whereas distinguishing human beings as being combatants or civilians is going to be hard, if not impossible to achieve.

Note to the future: All links are added automatically to the Internet Archive, using this tool (a). "(a)" for archived links was inspired by Milan Griffes (a), Andrew Zuckerman (a), and Alexey Guzey (a).
