■The speakers of [Practical Application of Automated QA Using Deep Learning]
Could you start off by giving us a brief overview of your talk?
Iwasaki ： In large-scale game development, QA imposes a significant strain on resources and greatly increases costs. Our talk was about the use of machine learning technology, one of the things we’ve been working on as a potential means of alleviating these QA issues. It was a continuation of the machine learning aspect of my CEDEC 2019 session, [Log analysis and automated testing of the Luminous Engine for large-scale game development], and thanks to joining forces with Araya, Inc. this time, we were able to discuss both the game engine side and the machine learning side of automated testing using machine learning.
Could you tell us what brought you and Araya Inc. together to collaborate on this technology?
Iwasaki ：After last year’s CEDEC, I came to the realization that involving a specialist from the AI field would be essential in pushing forward our machine learning capabilities. Coincidentally, that’s when Miyake (Yoichiro Miyake, Lead AI Researcher, Advanced Technology Division, SQUARE ENIX / Director CTO, SQUARE ENIX AI & ARTS Alchemy) introduced me to Araya, Inc.
Tanimura ：We had been hypothesizing that technologies like deep learning and reinforcement learning could be applied to automated QA in game development, but we initially struggled with how to actually verify that. As a trial, we created sample games and automated the game controls using reinforcement learning, but it didn’t really get us anywhere, nor did it help us identify the specific issues that might occur in actual game development. That’s when I sought advice from Mr. Miyake, who was kind enough to introduce me to Luminous Productions (LP).
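As a rough illustration of the kind of trial described here, the sketch below automates a game’s controls with tabular Q-learning on an invented one-dimensional toy level. Every name in it (`TinyGame`, `train`, the reward values) is hypothetical, chosen for illustration; it is not from Araya’s actual experiments.

```python
import random

# Hypothetical stand-in for a "sample game": a 1-D level where the agent
# must walk from position 0 to the goal at position 5.
class TinyGame:
    GOAL = 5

    def __init__(self):
        self.pos = 0

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.pos = max(0, min(self.GOAL, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.GOAL
        # Small time penalty each step, reward on reaching the goal.
        return self.pos, (1.0 if done else -0.1), done

def train(episodes=200, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(TinyGame.GOAL + 1) for a in (0, 1)}
    for _ in range(episodes):
        game, state, done = TinyGame(), 0, False
        while not done:
            if rng.random() < eps:
                a = rng.choice((0, 1))                       # explore
            else:
                a = max((0, 1), key=lambda x: q[(state, x)])  # exploit
            nxt, r, done = game.step(a)
            q[(state, a)] += alpha * (
                r + gamma * max(q[(nxt, 0)], q[(nxt, 1)]) - q[(state, a)]
            )
            state = nxt
    return q

q = train()
# After training, the greedy policy should prefer "right" near the goal.
```

Even this trivial setup shows the limitation Tanimura mentions: the agent learns to finish the level, but that tells you little about the bugs a real, constantly changing production build would surface.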
So, the timing worked out perfectly.
Araya, Inc. specializes in AI research, and this was, I believe, your company’s first time working with a game development studio. How was your experience working with LP?
Tanimura ： As LP has an abundance of experience in large-scale game development, I was very excited when this initiative came to fruition. The opportunity to conduct verification as the game development dynamically progressed allowed us to gather various insights, such as what sort of issues emerge during actual large-scale game production and what sort of solutions the people involved in game development are looking for. The experience I’ve been able to build through this collaboration is truly valuable.
How was your experience working with Araya, Inc., Mr. Iwasaki?
Iwasaki ：We were able to navigate our collaboration in an ideal way, I think. LP’s area of expertise is obviously game development, so our interest was not so much in researching how to achieve automated QA as in putting that technology into practical use. However, since no such tool existed in the world yet, we were forced to build it ourselves – that’s where we were last year. Our aim up until that point was to improve overall development efficiency, but this time we narrowed our focus to machine learning in particular and had Araya, Inc. verify machine learning using the environment we provided – so things moved quite efficiently.
Were there any aspects you found particularly challenging due to this project being a joint initiative?
Tanimura ：The hardest challenge for us was conducting technology verification on a game environment that was still under active development. Normally, when we test a new learning algorithm using deep learning and reinforcement learning, we do so under the assumption that the target test environment is already fully established, with our focus purely on crafting the algorithms themselves. In this collaboration, however, the environment fluctuated as development of the game progressed with each passing day, so it was at times difficult to isolate where an issue stemmed from: a bug in the algorithm or code we had designed, or a change in the game specifications that altered the information we needed to acquire. Fortunately, we managed to solve this problem rather quickly, as the developers at LP had prior knowledge of deep learning. Having said that, this isn’t a problem specific to our collaboration; it is inevitable whenever anyone tries to achieve automated QA using reinforcement learning in a real development environment. It will therefore be extremely important to establish good communication and mutual understanding between game developers and the engineers handling deep learning and reinforcement learning.
Iwasaki ： My impression was that we were able to establish a clearer division of responsibilities than I had anticipated. I initially contemplated the possibility that we would need to make significant code modifications on both sides to make it work, but our respective areas of responsibility were clearly defined, and that made our collaboration much easier. I have to admit, though, that the discussions we had in the earlier stages to align all the necessary information were a bit tough to get through, since we had three teams - the AI, engine, and game teams - working simultaneously.
■Far from the finish line - automated QA is a long-term challenge
Iwasaki ：This collaboration has convinced me that automated QA is truly a long-term challenge. When we started, I still had the faintest expectation somewhere in the back of my mind that “I can’t do it myself, but maybe an AI expert can sweep in and magically give us a solution.” Based on my own machine learning experience, I knew that deep learning is not magic, of course, but I was still holding onto a glimmer of hope that “maybe an expert can….” However, after actually giving it a go, I found that the challenges we faced as game developers were challenges for the AI experts as well.
Tanimura ：When you look at some of the sensational research papers that have been published, current AI technology can at times seem capable of magically solving everything.
Iwasaki ：That is very true. We know in our heads that it isn’t so, but many of them seem impressive and quite magical. This collaboration, however, has really taught me that it simply doesn’t work that way; it requires steady training, and even then, machine learning can’t learn what it can’t learn. That being said, machine learning and automated QA are still a good match, and you can teach the system as long as you follow the proper steps. So while there’s no magic, this collaboration helped strengthen my conviction that we are heading in the right direction. Things may not go as smoothly, though, if your project doesn’t already have an adequate foundation in place.
Does that mean, if you wanted to bring machine learning into a new project, it’s essential to build a foundation for the project first?
Iwasaki ： If the project is not built for it, there may be cases where the engine isn’t capable of providing the features that machine learning needs. For example, the Luminous Engine has a “return to the frame of choice” feature, which let us respond quite quickly when Araya, Inc. told us “we need a feature like that. It’ll be a must-have” - luckily, we already had it. We would have been in major trouble had the feature not been in place, as it would have had to be implemented right there and then, and it certainly isn’t something that can be created in the final stages of a project, when the QA process actually commences. As machine learning research continues to advance, I expect something along the lines of a “before you do automated QA” prep list will become formalized to an extent, but needless to say, not everything can be prepped for in advance. Even assuming automated QA will someday be packaged and put into practical use, I doubt we’d be able to adopt it as-is without any additional preparation.
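A “return to the frame of choice” feature is, in essence, a state snapshot-and-rollback mechanism. The sketch below shows the general idea under the assumption that the whole simulation state can be serialized; the `Simulation`, `snapshot`, and `restore` names are hypothetical, not the Luminous Engine API.

```python
import copy

# Minimal illustration of snapshot/rollback over simulation frames.
class Simulation:
    def __init__(self):
        self.frame = 0
        self.state = {"player_hp": 100, "enemies": 3}

    def tick(self):
        """Advance one frame (stand-in for real game logic)."""
        self.frame += 1
        self.state["player_hp"] -= 1

    def snapshot(self):
        """Capture the full simulation state at the current frame."""
        return (self.frame, copy.deepcopy(self.state))

    def restore(self, snap):
        """Rewind the simulation to a previously captured frame."""
        self.frame, self.state = snap[0], copy.deepcopy(snap[1])

sim = Simulation()
for _ in range(10):
    sim.tick()
snap = sim.snapshot()   # remember frame 10
for _ in range(5):
    sim.tick()          # play on to frame 15
sim.restore(snap)       # rewind: retry from frame 10 without a full restart
```

For reinforcement learning, a feature like this lets training episodes restart from any interesting frame instead of replaying the game from boot every time, which is presumably why it was a must-have.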
■A human element is welcome to stay - what game developers really want
Iwasaki ：There are a number of research papers demonstrating how AI technology can be utilized in automated QA, and judging from the samples that have been published, they seem to be functioning well. However, it’s a different story when it comes to incorporating such technologies into your own work; it can be extremely difficult, or even impossible to determine where they could be applied in the first place. I feel there is still a major discrepancy between what’s discussed in research papers and what’s actually in use in development environments.
Tanimura ： That may very well be true. The techniques are typically tested in a fully controlled benchmark environment and explicitly tuned for a specific task. So when these technologies are brought into a different environment, there’s no way to foresee how they will pan out until you actually get your hands on them, and even then you may need to figure out how to re-tune them to suit your specific needs. On top of that, you’ll have to work back through various literature to catch up on the background technology - so all in all, it may not be something you can just casually try in an actual development environment.
Iwasaki ：We definitely have reservations about trying something that can’t guarantee a deliverable even after we put desperate effort into incorporating it. Right now, when looking at a single project alone, the debugging costs of an automated QA system can be higher than the potential benefits gained. Another point to consider is that, essentially, automated QA can be replaced with human effort. Say there is some new, incredible graphics technology - measuring the investment value of such technology is pretty simple if it visibly increases your graphics quality. With QA, however, discussions ultimately come down to “why not just do it manually?” or “which reduces costs more, AI or humans?” After all, there is an alternative technology, so to speak, called manpower.
Tanimura ：I agree. We certainly feel this is a task that must be tackled as well. When you look at a single project, the development costs of the automated QA system itself can’t be ignored; I believe the cost-reduction effect will only materialize after we really buckle down to make careful, steady progress and ultimately expand the scope of application. It may gain wider acceptance if we manage to add extra value to the automated QA system - something that can only be achieved with deep learning and reinforcement learning - such as the ability to perform testing with higher accuracy than humans, or to do testing that humans can’t even begin to attempt.
Iwasaki ： At the end of the day, we are not expecting automated QA to do everything for us. We’d just be grateful if it could help us reduce even some of the QA costs, so keeping a bit of a human element in the process is perfectly fine. What we worked on - technology that can play back exactly the same gameplay as our example data - is in very high demand, but as I mentioned earlier, the areas that seem solvable and that developers actually need solved, and the ones generating hype in academic AI research, aren’t necessarily aligned. AI technologies like an AI that plays just like a human, or an AI that automatically masters a game, will continue to grow, as they are of great value and drawing attention within our industry; yet at the same time, a system that can replicate the next day exactly what the QA team played today is in very high demand as well. People outside the game industry might wonder: why don’t you just record what QA did on the gamepad and play that back? What would you need machine learning for? Well, recording the gamepad inputs and playing them back doesn’t give us the exact same gameplay, and it’s even less likely to if the game is still under development. So our hope is that more people will come to know that such technologies are needed, and that more people will start working on them.
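The point about naive input replay can be illustrated with a toy model: once anything in the simulation is nondeterministic (or the build has changed since the recording), feeding the recorded inputs back in no longer reproduces the full game state. Everything below is an invented illustration, not the actual system.

```python
import random

def play(inputs, enemy_rng):
    """Run a recorded input sequence through a toy game and return the
    per-step trace of (player_position, enemy_position)."""
    player, enemy, trace = 0, 10, []
    for move in inputs:
        player += move                           # driven by recorded inputs
        enemy += enemy_rng.choice((-1, 0, 1))    # AI/physics NOT tied to inputs
        trace.append((player, enemy))
    return trace

recorded_inputs = [1, 1, 0, 1, -1, 1]
run_a = play(recorded_inputs, random.Random())   # the original QA session
run_b = play(recorded_inputs, random.Random())   # naive replay the next day
# The player's positions match, but the full game state almost surely diverges.
```

This is why blind input playback falls short: reproducing the same gameplay requires something that observes the actual game state and steers toward the example data, which is where machine learning comes in.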
Tanimura ：Just getting to know that there is demand for such solutions means a great deal to us. We are building and working on various hypotheses internally, but it’d be amazing if we could continue to get insight into what the game industry’s hottest topics are right now and what their current needs are.
It really sounds like the mutual understanding between the AI industry and the game industry is crucial.
Iwasaki ：Yes, I’d say it is one of the most important factors. This really was a wonderful opportunity, as I rarely have the chance to talk to an AI expert in a situation like this.
Tanimura ：I was thinking that our initiative could also be used for things like guerrilla testing to identify bugs, or to measure the difficulty level of various content. Moreover, it’s also applicable to auto-generating the thinking logic of NPCs, so hopefully we can expand into many new and fun things in addition to QA automation.
Iwasaki ：The range of technology in games is quite extensive. What makes the field of games unique is that we’re ready to incorporate anything and everything that seems interesting and usable, but we certainly can’t create all of it on our own. In that sense, finding people who are interested in working with us and getting the opportunity to collaborate with them is essential.