Essay: One Example Where Intuitive Learning Does Not Work And What It Might Mean
Since last year, Leela Chess Zero (Lc0), an open-source clone of Google's AlphaZero, has held a consistent spot in the top ten of the Top Chess Engine Championship (TCEC). Remarkably, Lc0 is a neural-network (NN) based chess engine that did not make use of the human knowledge bank of chess. Since early 2019, SugarNN, based on Lc0, has held the top spot in TCEC, defeating traditional (commercial) engines like Komodo and Houdini, which draw heavily from human chess knowledge and experience.
When one learns chess, one is taught many "strategic rules" -- bishops are best placed on long diagonals, pawn pushes must be planned carefully (they cannot be taken back) -- and intuitions: the queen is strong in the endgame, two minor pieces are roughly worth a rook and a pawn, and so on. Furthermore, there is a consistent story running in the mind of a chess player, made of events, plans, and tactics. Every move has a "meaning." This is why chess commentators can present a good overview of the possible thoughts on a player's mind -- a story for the game.
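To make that concrete, here is a minimal, hypothetical sketch (not taken from any real engine) of how such rules of thumb get hard-coded into a traditional engine's evaluation function; the board representation and every number below are illustrative assumptions:

    # A minimal, hypothetical sketch of a hand-crafted evaluation function.
    # The board is assumed to be a dict mapping squares like "g2" to
    # (colour, piece) tuples; all the numbers are illustrative only.

    PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0, "K": 0.0}

    # The two long diagonals, a1-h8 and a8-h1.
    LONG_DIAGONALS = {f + r for f, r in zip("abcdefgh", "12345678")}
    LONG_DIAGONALS |= {f + r for f, r in zip("abcdefgh", "87654321")}

    def evaluate(board):
        """Positive scores favour White, negative favour Black."""
        score = 0.0
        for square, (colour, piece) in board.items():
            sign = 1.0 if colour == "white" else -1.0
            score += sign * PIECE_VALUES[piece]
            # "Bishops are best placed on long diagonals" becomes a small bonus.
            if piece == "B" and square in LONG_DIAGONALS:
                score += sign * 0.25
        return score

    # White bishop on the long diagonal vs. Black knight: materially equal,
    # but the heuristic nudges the score in White's favour.
    print(evaluate({"g2": ("white", "B"), "a8": ("black", "N")}))  # 0.25

Traditional engines are, very roughly, a large and carefully tuned collection of such terms combined with deep search; Lc0 starts with none of them.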
But when it comes to games by Lc0, most commentators are dumbfounded and struggle to explain the rationale behind its moves. It is as though the game has no story -- it seems completely random. Yet, somehow, Lc0 keeps winning! (Of course there is a well-defined algorithm behind Lc0, but it is perhaps too complex to be woven into a story.)
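For what it's worth, that algorithm is broadly the AlphaZero recipe: a neural network suggests a probability for each move and a value for the position, and a Monte Carlo tree search uses those numbers to decide which lines to explore further. A heavily simplified sketch of the move-selection rule (my own paraphrase of the published PUCT formula, not Lc0's actual code) looks like this:

    import math

    def select_move(children, c_puct=1.5):
        """Pick the child maximising the PUCT score used in AlphaZero-style search.

        Each child carries:
          prior  -- the policy network's probability for the move,
          value  -- mean value of simulations through it (Q),
          visits -- how many times the search has explored it (N).
        """
        total_visits = sum(ch["visits"] for ch in children)
        def puct(ch):
            exploration = c_puct * ch["prior"] * math.sqrt(total_visits) / (1 + ch["visits"])
            return ch["value"] + exploration
        return max(children, key=puct)

    # Toy example: a low-prior ("unintuitive") move that has scored well so far
    # can outrank the high-prior move the policy network initially preferred.
    children = [
        {"move": "e2e4", "prior": 0.40, "value": 0.02, "visits": 120},
        {"move": "h2h4", "prior": 0.02, "value": 0.15, "visits": 30},
    ]
    print(select_move(children)["move"])  # h2h4

Every step is precise and reproducible; the difficulty is that the "reason" for a move is spread across thousands of such numerical comparisons, which is exactly why it resists being told as a story.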
This might shed some light on the human quest for knowledge. Even in a concrete field like engineering (my field), we do not operate in the "real world." Instead, all our research, theories, and explanations live in an "idealised world" with "good properties." In this idealised world, a piece of theory is like a story -- it builds on existing stories and extends them in a meaningful and intuitive way. Fortunately, when applied to the real world, the idealised theories work well. But they are mere approximations -- only roughly correct. (Think of the ideal gas law, or rigid-body and frictionless assumptions: intuitive, useful, and only approximately true.)
As with Lc0's play, things in the real world appear random and haphazard to us. When we try to weave a story behind them, it turns out to be a non-story because it is so convoluted. Hence, it is not surprising that "learning-based systems" such as NNs, with no intuitions, no biases, and no need for stories, perform much better than our techniques built on idealised theories. Nevertheless, when we build NNs today, we do incorporate our biases into them -- in the form of structure and data -- and without them NNs do not work well. Still, one might intuit, risking failure, that the haphazard approach of NNs is probably better suited to the complexity of the real world than the organised storytelling offered by idealised theories.
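As one concrete illustration of the structural biases just mentioned: feeding a chess position to a network as an 8x8 stack of piece planes and processing it with convolutions already asserts that the board is a 2-D grid, that neighbouring squares matter together, and that the same pattern means the same thing anywhere on the board. None of that is learned; we put it there. A small sketch, assuming a PyTorch-style API (the layer sizes are arbitrary):

    import torch
    import torch.nn as nn

    # Encoding the position as 12 planes of an 8x8 board (one plane per piece
    # type and colour) is itself a human design choice, not something learned.
    board_planes = torch.zeros(1, 12, 8, 8)

    # A convolutional layer bakes in further assumptions: locality (a 3x3
    # neighbourhood) and translation equivariance (the same filter everywhere).
    conv_model = nn.Sequential(
        nn.Conv2d(12, 32, kernel_size=3, padding=1),
        nn.ReLU(),
    )

    # A fully connected layer makes none of those assumptions -- and, for the
    # same number of outputs, needs far more parameters to cover the board.
    dense_model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(12 * 8 * 8, 32 * 8 * 8),
        nn.ReLU(),
    )

    conv_params = sum(p.numel() for p in conv_model.parameters())
    dense_params = sum(p.numel() for p in dense_model.parameters())
    print(conv_params, dense_params)  # roughly 3.5 thousand vs. 1.6 million

The convolutional model starts out "knowing" things we told it; the fully connected one has to discover them from data, and in practice tends to need far more of it.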
Could it be that we are better off designing systems that lack intuition (because of their complexity) but work in the real world? Such systems, presently based on NNs, have no stories to them. But, objectively, they seem to work well compared with systems built on idealised theories. If so, perhaps the future holds very little room for consistent stories and a lot of room for trial-and-error, data-crunching research.