Financial Reinforcement Learning has been Half a Century in the Making
“A computer will never tell you to buy one stock and sell another… (there is) no substitute …for flair in judgement, and a sense of timing.” - Wall Street Journal 1962.
After digging through archives in Cambridge while working on a project to write a revisionist history of machine learning in finance, I came across early ‘Cybernetics’ articles that have been lost to modern researchers and developers in quantitative finance and portfolio management.
As far back as 1962, GPE Clarkson, a researcher from Carnegie Tech, showed how bank investment officers’ portfolio selection decisions could be automated using discriminator nets, i.e., a sequential branching computer program. What I did not know was that, around the same time, a system developed by a New York-based brokerage firm called Jesup & Lamont not only routinised investment decisions based on decision heuristics like Clarkson’s, but also learned new patterns for future refinement.
This system might well have been the world’s first self-learning financial robot. Sadly, Jesup & Lamont’s innovation was never put into production, and the 133-year-old brokerage firm filed for bankruptcy in 2010.
Since the 1960s the word heuristic has been given a problem-solving connotation. Heuristic models were one of the business world’s first inroads into learning from data. A heuristic or rule can be as simple as buy-when-the-price-is-low for stock trading, first-in-first-out for accounting, or first-come-first-served for job scheduling.
In portfolio management, heuristic programming is not unlike the 20-person team that was said to translate Ray Dalio’s unique financial worldview into algorithms, or the group of coders that developed Paul Tudor Jones’s “Paul in a Box”. The hedge fund Point72 was also purported to test models that mimic their portfolio managers’ trades.
After the work by Clarkson and Jesup & Lamont, heuristic decision-making systems were developed to automate the tasks of recruitment advisors in 1964, loan officers in 1964, treasurers in 1966, and sales analysts in 1967. This line of research has shown that operational data can be fitted very well with a few simple equations.
The programmatic solutions did not just perform well; their performance was often indistinguishable from that of professional investors, analysts, and advisors. The research was dubbed “behavioural theory” and took shape at Carnegie Tech. However, the infamous AI winter brought this ground-breaking research to a halt due to a mismatch between aspirations, computing power, and data.
In the early days of this research, established figures like William J. Baumol said that “Portfolio decisions must remain an art, employing at its best a fine balance of experience, judgement, and intuition all of which only a good analyst can supply”. However, these discussions were often hypothetical, because few authorities had access to large amounts of valuable data. In the early 1960s it had, for example, become particularly fashionable for large financial institutions to point out that they had computers, using them as a marketing selling point as competitive pressures mounted. This echoes today’s buzz around AI in finance.
New inventions often inspire new ideas. Not long after computers were first conceived in the 1800s, inventors like Ada Lovelace philosophised about the potential of them becoming intelligent. So did Jesup & Lamont, whose early work started with a 1959 case study that showed how computers could be successfully applied to intelligent investment management. Jesup manually collected data from published reports for 30 securities listed on the NYSE and used it to build their robot investor.
By around 1960, their input programs could translate, interpret, and fill unformatted information received from news service wires. By 1962, the system was monitoring all common stocks traded on the NYSE, and it was allegedly successful in anticipating the market decline during April and May that year. The developers noted a few issues:
(1) The existing data sources could not be depended on, and they required frequent correction and manual review to reduce substantial errors.
(2) The output of the computers still far exceeded management’s capacity to assimilate the information.
(3) Traditional financial performance measures were inadequate to capture market behaviour over extended periods of time.
Expanding on these concerns, they write that “[until] 1963 those interested in computer-aided investment analysis face an appalling data acquisition task. Fundamental data has to be manually culled from published reports, and technical data was also manually transcribed from published reports or encoded from wire service lines.”
In 1963 this changed when Standard Statistics, a subsidiary of S&P, released the Compustat tapes containing fundamental data compiled annually from 1947 for 900 companies. They were soon followed, also in 1963, by Scantlin Electronics, Utronics, and Bunker-Ramo, who began offering price, volume, earnings, and dividend data for all listed securities. This newfound data allowed Jesup to fix many of the problems they had at the time.
Jesup & Lamont’s newly refactored 1964 system included the ability to learn from its results and so contribute to future trading refinements. Heuristics were incorporated to evaluate alternative investment approaches. An adaptive design allowed the procedures to change over time, leading to new models and processes that contributed to the system’s future success. Therefore, unlike other rule-based methods at that time, self-learning, i.e., an early form of reinforcement learning, took place. More generally, the system also provided a source of data for manual investigation and an approach to validate the firm’s investment models.
The firm implemented parametric and nonparametric statistical routines to facilitate the analysis of fundamental and technical relationships. They also realised the importance of “real-time data acquisition”, that would give analysts the ability to evaluate relevant market conditions as they developed. Using the validated analytic procedure, i.e., backtests, the system would recommend actions, i.e., buy, hold, sell, sell short. The system was able to recreate conditions existing since 1959 and test the effectiveness of an alternative analytical approach to determine a particular decision or rule’s performance during a specified period.
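The idea of replaying past conditions to test a decision rule can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions, not a reconstruction of Jesup & Lamont’s system: the trailing-average rule, the 5% band, the function names `recommend` and `backtest`, and the price series are all hypothetical, and short selling is left out for brevity.

```python
from statistics import mean

def recommend(history, price, band=0.05):
    """Hypothetical heuristic: compare today's price with a trailing average."""
    avg = mean(history)
    if price < avg * (1 - band):
        return "buy"    # price is low relative to the recent past
    if price > avg * (1 + band):
        return "sell"   # price is high relative to the recent past
    return "hold"

def backtest(prices, window=3, band=0.05, start_cash=100.0):
    """Replay a historical price series and apply the rule at each step."""
    cash, shares = start_cash, 0.0
    actions = []
    for t in range(window, len(prices)):
        action = recommend(prices[t - window:t], prices[t], band)
        if action == "buy" and cash > 0:
            shares, cash = cash / prices[t], 0.0   # invest all cash
        elif action == "sell" and shares > 0:
            cash, shares = shares * prices[t], 0.0  # liquidate the position
        actions.append(action)
    final_value = cash + shares * prices[-1]
    return actions, final_value

# Made-up price history, used purely for illustration.
prices = [10, 10, 10, 9, 9.5, 10, 11, 11, 10.5]
actions, final_value = backtest(prices)
print(actions, round(final_value, 2))
```

Changing `window` or `band` and rerunning the replay is the simplest version of testing an alternative analytical approach over a specified period.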
A couple of years later, in 1966, Joseph Gal wrote in the Financial Analysts Journal that “It will soon be possible for portfolio managers and financial analysts to use a high-speed computer with the ease of the desk calculator”. Today, machine learning code has been streamlined, and in fewer than 10 lines of code you can, for example, create a close to state-of-the-art machine learning option-pricing model with free online computing power such as that provided by Google Colab. This advance is reminiscent of the 1970s, when not long after the creation of the Chicago Board Options Exchange (CBOE), Black-Scholes option values could be easily calculated on handheld calculators.
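To give a sense of how compact the calculation is, here is a minimal sketch of the closed-form Black-Scholes price for a European option on a non-dividend-paying stock. It is the textbook formula rather than the machine learning model mentioned above, and the function names and input values are illustrative assumptions.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes(spot, strike, years, rate, vol, kind="call"):
    """European option value with no dividends (textbook Black-Scholes)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt(years))
    d2 = d1 - vol * sqrt(years)
    discounted_strike = strike * exp(-rate * years)
    if kind == "call":
        return spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    return discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1)

# Illustrative inputs: at-the-money, one year, 5% rate, 20% volatility.
call = black_scholes(100, 100, 1.0, 0.05, 0.2, "call")
put = black_scholes(100, 100, 1.0, 0.05, 0.2, "put")
print(round(call, 4), round(put, 4))
```

The call and put values satisfy put-call parity, which is a quick sanity check on any implementation of the formula.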
Various researchers rightly point out that although specific tasks and subroutines can be automated by learning from humans and programming that into a machine, not everything is programmable. Baumol wrote in 1966, “There is a rich body of information, much of it hardly tangible, which necessarily influences the recommendations of an investment counsellor, but which the computer is not able to take into account. Knowledge about the personality of a retiring company president, or about the implications of current Defense Department discussions for the saleability of a company’s products is surely important, but there is no way in which it can be taken into account automatically and mechanically in accord with predeveloped mathematical formulae.”
Notwithstanding their disadvantages, the methods have slowly percolated through the industry over the decades. The nonparametric statistical routines that Jesup & Lamont used in the 1960s have only recently become commonplace. Skilled staff at JP Morgan Chase have recently suffered the fate of the machine. A new contract intelligence programme was established to interpret commercial-loan agreements that previously required 360k hours of legal work by loan officers and lawyers. Other machine learning-enabled tasks include the automation of post-allocation requests at UBS and the automation of policy pay-outs at Fukoku Mutual Life Insurance.
Since around the 1980s, the automation systems have mostly remained rule-based like those of the 1960s, but the terminology changed from heuristics to expert systems. Expert systems appeared in finance around the early 1980s. At that time, Dupont, a chemical company, had built more than a hundred expert systems that helped save them an estimated $10mn a year. In finance, the Protrader expert system constructed in 1989 by KC Chen determined optimal investment strategies, executed transactions, and modified the knowledge base through a learning mechanism. Investment advisors also had their own version in the form of PlanPower, made available in 1986, offering tailored financial plans to individuals with incomes above $75,000. A few years later Chase Lincoln First Bank provided an improved wealth management tool, selling reports for a mere $300; virtually an early form of the robo-advisors that were to follow two decades later. This trend continued into the 1990s, and by 1993 the U.S. Department of Treasury had also launched a system called FinCEN that was used to identify 400 potential incidents of money laundering over two years, equalling $1 billion in value.
Expert systems slowly disappeared as the expectations did not meet reality, and they were generally complicated to use. For example, G.E. developed a Commercial Loan Analysis Support System to assess commercial loans and secured agreements with two large New York firms. Even though it performed as intended, there was no established person to manage the system, and the project fell apart. Expert systems did not disappear entirely, as some subcomponents still remain today. For example, Excel VBA macro scripts could be seen as a rudimentary form of an expert system.
Developing concurrently with expert systems, the mid-1970s to late-1980s saw the development of more advanced machine learning models, i.e., statistical learning models instead of rule-based models. In 1975 and 1976, Jerry Felson applied a perceptron (a simple neural network) technique to learn from previous decision-making experiences for investment selection and market forecasting purposes, and showed that you could achieve an above-average performance compared to the market. He notably came to believe that when the number of decision parameters becomes large, e.g., exceeds four, human decision-making deteriorates, and therefore the programming of an investment decision process was useful.
Starting in the 1980s, Edward Thorp and Richard Dennis showed remarkable success by combining technical trading methods with elementary statistics. Soon enough, quantitative labs like the Morgan Stanley APT group, headed by Nunzio Tartaglia, opened shop in 1986 and established more advanced techniques. A year later in 1987, Adam, Harding, and Lueck started a quantitative London-based fund called Man AHL. In 1987, two years after joining Morgan Stanley, David Shaw decided to start his own quantitative fund DE Shaw & Co. That same year James Simons renamed his Monemetrics to Renaissance Technologies to emphasise its new quantitative focus. A couple of years on, in 1989, Israel Englander launched Millennium Management, and since then there has been an exponential increase in the money managed by quantitative hedge funds.
This surge in systematic and quantitative trading funds and in machine learning and statistical techniques was no coincidence; it took place in tandem with the proliferation of computers. In 1980, Sinclair in the U.K. sold a home computer for £99.95, that is less than £500 in today’s money. In 1981, IBM announced its Personal Computer, which became the basis of the modern personal computer industry. This advance not only led to improved computation power but also improved storage capacity and data processing ability. It allowed for the collection of large datasets, which were in turn used by faster and more powerful computers to mathematically scour financial data for patterns and find predictive trading signals. Throughout this period, much of what we now refer to as machine learning had already been well tried and tested by these firms. Neural networks, nonparametric kernel regressions, and dynamic programming methods were already being experimented with by ex-IBM folk and notable academics at Renaissance Technologies in the late 1980s.
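For readers unfamiliar with the technique, the classic perceptron learning rule can be sketched as follows. This is a generic illustration and not Felson’s actual model: the two features (imagined here as standardised earnings growth and price momentum), the labels, and the function names are all hypothetical.

```python
def predict(weights, bias, features):
    """Return +1 (select the stock) or -1 (reject it)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1

def train_perceptron(samples, labels, learning_rate=1.0, max_epochs=100):
    """Classic perceptron rule: adjust the weights only after a mistake."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            if predict(weights, bias, features) != label:
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * label
                mistakes += 1
        if mistakes == 0:  # every past decision is reproduced, so stop
            break
    return weights, bias

# Made-up past decisions: (earnings growth, momentum) and the outcome label.
samples = [(2, 1), (1, 3), (3, 0.5), (-1, -2), (-2, -1), (-0.5, -3)]
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(samples, labels)
print(weights, bias, predict(weights, bias, (1, 1)))
```

The rule is only guaranteed to settle when the past decisions are linearly separable, which is one reason later work moved to more flexible statistical models.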
In recent years, researchers have been leveraging machine learning methods like reinforcement learning strategies to develop end-to-end solutions like fully automated derivative trading businesses. These methods effectively allow one to model any portfolio of derivatives like options and futures. The agent is exposed to an environment that consists of the market and other real-world constraints and then asked to hedge a position with a set of available securities using neural networks.
We have moved to a world where the rules are automatically learned from data instead of being hardcoded by developers. In so doing, machine learning has increased the subset of economically viable programmable processes.
This shift could have vast implications for all industries. Will automated derivative businesses exhibit improved hedging performance for lower fees and hence proliferate like exchange-traded funds? Will the systemic risks posed by a derivative market worth $1.2 quadrillion inhibit any automation due to regulatory constraints?
In conclusion, the financial robot has been 60 years in the making, and it is only starting to stand on its own two feet, the implications of which are yet unknown.
 Wall Street Journal, April 23, 1962, p. 4
 Portfolio Selection: A Simulation of Trust Investment
 The computer, new partner in investment management
 Project manager selection: the decision process
 A theory of the budgetary process
 Simulation and Sales Forecasting
 The Wall Street Journal, April 23, 1962 p 4
 JPMorgan Software Does in Seconds What Took Lawyers 360,000 Hours
 Robots enter investment banks’ trading floors
 This Japanese Company Is Replacing Its Staff With Artificial Intelligence
 Deep hedging: hedging derivatives under generic market frictions using reinforcement learning