Reinforcement Learning with Symbiotic Relationships for Multiagent Environments

Shingo Mabu
Graduate School of Science and Engineering, Yamaguchi University, Tokiwadai 2-16-1, Ube, Yamaguchi, 755-8611, Japan

Masanao Obayashi
Graduate School of Science and Engineering, Yamaguchi University, Tokiwadai 2-16-1, Ube, Yamaguchi, 755-8611, Japan

Takashi Kuremoto
Graduate School of Science and Engineering, Yamaguchi University, Tokiwadai 2-16-1, Ube, Yamaguchi, 755-8611, Japan

E-mail: mabu@yamaguchi-u.ac.jp, m.obayas@yamaguchi-u.ac.jp, wu@yamaguchi-u.ac.jp

Abstract

Multiagent systems have been widely studied, and cooperative behaviors between agents, in which many agents work together to achieve their objectives, have been realized. This paper proposes a new reinforcement learning framework that incorporates the concept of "Symbiosis" in order to represent complicated relationships between agents and to analyze the emerging behavior. In addition, distributed state-action value tables are used to efficiently solve multiagent problems with a large number of state-action pairs. The simulation results show that the proposed method performs better than conventional reinforcement learning that does not consider symbiosis.

Keywords: reinforcement learning, symbiosis, multiagent system, cooperative behavior

1. Introduction

There are many situations where the interests of some parties coincide or conflict, for example, human relationships, cooperation or competition between companies, and even international relationships. Recently, globalization has been progressing rapidly; thus, the relationships between persons and organizations have become very complicated networks. On the other hand, information systems have become intelligent and work cooperatively with each other, for example, cloud networks, car navigation systems and automation by robots. Research on complex networks began around 1998 [2] and has recently attracted attention as an important research area for analyzing phenomena in social systems. Therefore, a model that can predict problems caused by complex networks and propose optimal solutions for them will be useful for realizing safe and secure social systems. In addition, if the best relationships between parties can be found, it will contribute to the development of the whole society.
In this paper, a novel reinforcement learning algorithm is proposed that introduces the concept of "Symbiosis" in order to build Win-Win relationships between parties even if each party is pursuing the maximization of its own profits. Symbiosis can be defined as a relationship where two or more organisms live in close association with each other [3], and several computational models based on symbiosis in the ecosystem have been studied [4-7]. In the proposed reinforcement learning method, multiagent environments are considered in which several agents (persons and organizations) have cooperative, competitive or self-satisfied relationships, and such relationships are defined as "symbiotic vectors". The symbiotic vectors can represent six basic symbiotic relationships, i.e., mutualism, harm, predation, altruism, self-improvement and self-deterioration. The symbiotic vectors are used to calculate the rewards given to each agent when updating Q values in reinforcement learning. A symbiotic vector represents not only the target direction of self-benefit, but also that of the other agents working in the same environment. As a result, the proposed method with symbiotic vectors can build action rules for controlling the benefit of several agents. Therefore, the proposed method can predict the results under the current symbiotic relationships by running simulations.

This paper is organized as follows. In Section 2, a Q learning algorithm with distributed state-action value tables is introduced for efficiently solving multiagent problems with a large number of state-action pairs. Section 3 explains the proposed learning algorithm using symbiotic vectors. Section 4 describes the simulation environments and results. Finally, Section 5 is devoted to conclusions.

2. Q learning with distributed state-action value tables

In standard reinforcement learning, the number of state-action pairs increases exponentially as the numbers of inputs, objects to be perceived, and possible actions increase, which is called "the curse of dimensionality" [8]. Therefore, a Q learning algorithm with distributed state-action value tables (Q tables) is introduced in this paper.

Fig. 1. Q-table division

2.1. Representation of distributed Q tables

Suppose that the set of inputs (sensors) of the agents is $I$, and the set of possible actions is $A$. $I$ is manually divided into several subsets, i.e., $I_1, I_2, \ldots, I_n$ ($I_1, I_2, \ldots, I_n \subseteq I$), depending on the problem. For example, in the self-sufficient garbage collecting problem used in the simulations of this paper, there are mainly three tasks which have to be achieved by the agents; thus, $I$ is divided into three subsets, i.e., $I_1, I_2, I_3$. Therefore, three sub-Q-tables are created based on $I_1, I_2, I_3$, respectively (Fig. 1).

2.2. State transition and learning

In this subsection, the procedure of deciding an action is explained based on the example shown in Fig. 1 (three sub-Q-tables are used). The procedure of deciding an action is as follows.

1) When inputs are given from the environment, each sub-Q-table independently determines its current state $s_t^n$ ($n$ is the sub-Q-table number; $n = 1, 2, 3$).
2) Three actions $a_t^n$ are independently selected, one by each sub-Q-table, using the greedy policy [9].
3) The three Q-values of $a_t^n$ are compared; the sub-Q-table selecting the action with the maximum Q-value is defined as $n_t^{\mathrm{winner}}$ (winner-Q-table), and its current state is defined as $s_t^{\mathrm{winner}}$.
4) The action selected by the winner-Q-table is executed with the probability of $1-\varepsilon$, or a random action is executed with the probability of $\varepsilon$. This executed action is defined as $a_t^{\mathrm{winner}}$.

The update of the Q value is executed by Eq. (1).
$$Q(n_t^{\mathrm{winner}}, s_t^{\mathrm{winner}}, a_t^{\mathrm{winner}}) \leftarrow Q(n_t^{\mathrm{winner}}, s_t^{\mathrm{winner}}, a_t^{\mathrm{winner}}) + \alpha \left[ r + \gamma \max_{n,a} Q(n, s_{t+1}^{n}, a) - Q(n_t^{\mathrm{winner}}, s_t^{\mathrm{winner}}, a_t^{\mathrm{winner}}) \right] \qquad (1)$$

where $n_t^{\mathrm{winner}}$, $s_t^{\mathrm{winner}}$ and $a_t^{\mathrm{winner}}$ show the number, state and action of the winner sub-Q-table at time $t$, respectively. $n$ is a sub-Q-table number, $s_{t+1}^{n}$ is the state of sub-Q-table $n$ at time $t+1$, and $a$ is a possible action in sub-Q-table $n$. $r$ is a reward, $\alpha$ (between 0 and 1.0) is a learning rate, and $\gamma$ (between 0 and 1.0) is a discount rate.

3. Reinforcement Learning with Symbiosis

This section introduces the symbiotic vector and explains how to apply it to reinforcement learning.

3.1. Symbiotic vectors

Standard reinforcement learning aims to maximize the rewards that the self-agent obtains; in the proposed method, however, not only the rewards for the self-agent but also the rewards for the other agents are considered when executing reinforcement learning. In addition, six symbiotic relationships are considered to build the action strategies, that is, Predation, Mutualism, Altruism, Harm, Self-improvement and Self-deterioration. Fig. 2 shows the symbiotic strategy for "agent 1", where one axis shows the weight ($E_{11}$) on the benefit of agent 1 (self-agent), and the other axis shows the weight ($E_{12}$) on the benefit of agent 2. In other words, $E_{11}$ shows the symbiotic strategy of agent 1 for agent 1, and $E_{12}$ shows the symbiotic strategy of agent 1 for agent 2. Therefore, if agent 1 aims to maximize the rewards for both agents, it will take the "Mutualism" strategy, where the symbiotic vector is $v_1 = (E_{11}, E_{12}) = (1.0, 1.0)$ ($-1.0 \le E_{11}, E_{12} \le 1.0$).

Fig. 2. Symbiotic vector and six symbiotic relationships for agent 1 (An example of two dimensions)

Fig. 3 shows a symbiotic relationship between two agents, where one agent takes the mutualism strategy and the other agent takes the predation strategy. In this case, the symbiotic vector of agent 1 is $v_1 = (E_{11}, E_{12}) = (1.0, 1.0)$, and that of agent 2 is $v_2 = (E_{21}, E_{22}) = (-1.0, 1.0)$.

Fig. 3. Symbiotic relationships between two agents

Intermediate values between -1.0 and 1.0 can also be used to define a symbiotic relationship. For example, a symbiotic vector $v_1 = (1.0, 0.1)$ shows a weak mutualism that considers the other agent's benefit only a little. Therefore, the symbiotic vector flexibly represents any degree of symbiotic relationship, and moreover, it can be extended to the relationships between many agents.
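As a rough illustration of the two-agent symbiotic vector, the sketch below stores the six named strategies of agent 1 as $(E_{11}, E_{12})$ pairs and labels an arbitrary vector by the signs of its two weights. Only the mutualism vector (1.0, 1.0) and the sign pattern of predation (taken from $v_2 = (-1.0, 1.0)$ for agent 2) are stated in the text; the coordinates used for the other four strategies, the `STRATEGIES` table and the `classify` helper are assumptions that follow the strategy names rather than Fig. 2 itself.

```python
# Hypothetical presets for agent 1, written as (E_11, E_12):
# weight on its own benefit, weight on the other agent's benefit.
STRATEGIES = {
    "mutualism": (1.0, 1.0),            # stated in the text
    "predation": (1.0, -1.0),           # mirrors v_2 = (-1.0, 1.0) for agent 2
    "altruism": (-1.0, 1.0),            # assumption based on the name
    "harm": (-1.0, -1.0),               # assumption based on the name
    "self-improvement": (1.0, 0.0),     # assumption based on the name
    "self-deterioration": (-1.0, 0.0),  # assumption based on the name
}


def classify(e_self, e_other, tol=1e-9):
    """Label a symbiotic vector by the signs of its two weights."""
    for e in (e_self, e_other):
        if not -1.0 <= e <= 1.0:
            raise ValueError("weights must lie in [-1.0, 1.0]")
    if abs(e_self) <= tol:
        return "undefined"  # no named strategy has a zero self-weight
    if abs(e_other) <= tol:
        return "self-improvement" if e_self > 0 else "self-deterioration"
    if e_self > 0:
        return "mutualism" if e_other > 0 else "predation"
    return "altruism" if e_other > 0 else "harm"
```

Under these assumptions, the weak mutualism example `classify(1.0, 0.1)` is labeled `"mutualism"`, because only the signs of the weights are inspected and not their magnitudes.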
3.2. Reinforcement learning with symbiotic vectors

This subsection explains how to update Q values considering a symbiotic vector. Suppose there are $p$ agents (agents #1 to #$p$), where the symbiotic vector of agent $k$ ($1 \le k \le p$) is $v_k = (E_{k1}, E_{k2}, \ldots, E_{kp})$. After agent $k$ takes an action and finds the rewards for all the agents ($r_1, r_2, \ldots, r_p$), the modified reward $r'_k$, which is used as $r$ when updating the Q value of agent $k$ in Eq. (1), is calculated as follows.

$$r'_k = \sum_{l=1}^{p} E_{kl}\, r_l \qquad (2)$$

Eq. (2) calculates the sum of the weighted rewards of all the agents #1 to #$p$. For instance, when agent 1 takes the mutualism strategy $v_1 = (E_{11}, E_{12}) = (1.0, 1.0)$ and agent 2 takes the predation strategy $v_2 = (E_{21}, E_{22}) = (-1.0, 1.0)$, Eq. (2) for agents 1 and 2 can be represented by Eqs. (3) and (4), respectively.

$$r'_1 = 1.0\, r_1 + 1.0\, r_2 \qquad (3)$$

$$r'_2 = -1.0\, r_1 + 1.0\, r_2 \qquad (4)$$

4. Simulations

4.1. A simulation environment

The self-sufficient garbage collecting problem [10] is used for the performance evaluation of the proposed method. Fig. 4 shows the simulation environment used in this paper, where there are two agents, trashes, one charging station, and two trash collecting places. The aim of this problem is to collect as many pieces of trash as possible in the environment and take them to the collecting places assigned to each agent, i.e., agent $k$ has to take trash to the collecting place for agent $k$ to obtain a reward. In addition, each agent has a limited amount of energy to move; thus, the agents should check the remaining energy and go to the charging station before running out of energy. Table 1 shows the inputs and possible actions that the agents can use. The initial energy is 100 (full charge). When an agent goes forward, 3 units of energy are used, and when it turns right or left, 1 unit is used. The energy is recharged gradually while the agent stays at the charging station. The total time for one episode is 100 steps. A reward of 100 is given to agent $k$ when an agent takes one piece of trash to the collecting place for agent $k$, 10 is given when an agent collects a piece of trash, and 0.1 × (charged energy) is given when an agent stays at the charging station.

Fig. 4. A simulation environment

Table 1. Inputs and actions

| # | Input contents | Input value |
|---|---|---|
| 1 | forward cell | nothing, obstacle, collecting place for agent 1, collecting place for agent 2, trash, charging station |
| 2 | backward cell | the same as "forward cell" |
| 3 | right cell | the same as "forward cell" |
| 4 | left cell | the same as "forward cell" |
| 5 | direction of nearest trash | forward, backward, right, left |
| 6 | direction of charging station | forward, backward, right, left |
| 7 | direction of collecting place for agent 1 | forward, backward, right, left |
| 8 | direction of collecting place for agent 2 | forward, backward, right, left |
| 9 | number of pieces of trash held | 0, 1, 2 (max 2) |
| 10 | current energy level | low (less than 30), high (more than 70), middle (other values) |

| # | Actions |
|---|---|
| 1 | go forward |
| 2 | turn right |
| 3 | turn left |
| 4 | no action |

4.2. Conditions of Q-learning with Distributed Q-tables

Table 2 shows the setting of the distributed Q-tables, where three sub-Q-tables are prepared for dealing with the tasks for agent 1's benefit, for agent 2's benefit, and for charging energy, respectively. Each agent has its own Q-table (three sub-Q-tables), i.e., the reinforcement learning of the two agents is carried out independently. The learning parameters are set as learning rate $\alpha = 0.1$, discount rate $\gamma = 0.9$, and $\varepsilon = 0.05$.

Table 2. Sub-Q-table setting

| Sub-Q-table # | Main task | Input # used in each sub-Q-table |
|---|---|---|
| 1 | for agent 1's benefit | 1, 2, 3, 4, 5, 7, 9 |
| 2 | for agent 2's benefit | 1, 2, 3, 4, 5, 8, 9 |
| 3 | Charging energy | 1, 2, 3, 4, 6, 10 |

4.3. Simulation results

In this subsection, to confirm the basic effects of the symbiotic relationship, the proposed method with the mutualism strategy is compared with conventional Q-learning, which corresponds to both agents taking the Self-improvement strategy.
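To make the two compared settings concrete, the following is a minimal sketch (not the authors' implementation) of a single learner that combines the winner sub-Q-table selection of Section 2.2, the update of Eq. (1), and the modified reward of Eq. (2), using the parameters of Section 4.2. The names `DistributedQAgent` and `modified_reward`, the dictionary-based tables, the 0-based input indices, the encoding of an observation as a tuple of ten values, and the reading of the Self-improvement strategy as the vector (1.0, 0.0) for agent 1 are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.05  # learning parameters from Section 4.2
N_ACTIONS = 4  # go forward, turn right, turn left, no action
# Input subsets of Table 2, shifted to 0-based indices (an implementation choice).
INPUT_SUBSETS = [(0, 1, 2, 3, 4, 6, 8), (0, 1, 2, 3, 4, 7, 8), (0, 1, 2, 3, 5, 9)]


def modified_reward(vector, rewards):
    """Eq. (2): sum of all agents' rewards weighted by one agent's symbiotic vector."""
    return sum(e * r for e, r in zip(vector, rewards))


class DistributedQAgent:
    def __init__(self, vector, rng=None):
        self.vector = vector
        self.rng = rng or random.Random()
        # One table per input subset: state tuple -> list of Q values per action.
        self.tables = [defaultdict(lambda: [0.0] * N_ACTIONS) for _ in INPUT_SUBSETS]

    def states(self, obs):
        """Each sub-Q-table sees only its own subset of the inputs (step 1)."""
        return [tuple(obs[i] for i in subset) for subset in INPUT_SUBSETS]

    def best(self, obs):
        """Return (max Q value, table index, action) over all sub-Q-tables."""
        candidates = []
        for n, s in enumerate(self.states(obs)):
            q = self.tables[n][s]
            a = max(range(N_ACTIONS), key=q.__getitem__)  # greedy choice in table n
            candidates.append((q[a], n, a))
        return max(candidates, key=lambda c: c[0])

    def act(self, obs):
        """Steps 1-4 of Section 2.2; returns (winner table index, executed action)."""
        _, winner, action = self.best(obs)
        if self.rng.random() < EPSILON:  # random action with probability epsilon
            action = self.rng.randrange(N_ACTIONS)
        return winner, action

    def update(self, obs, winner, action, rewards, next_obs):
        """Eq. (1), with r replaced by the modified reward of Eq. (2)."""
        r = modified_reward(self.vector, rewards)
        target = r + GAMMA * self.best(next_obs)[0]  # max over n and a at time t+1
        q = self.tables[winner][self.states(obs)[winner]]
        q[action] += ALPHA * (target - q[action])
```

Under these assumptions, the only difference between the two settings is the vector passed to the constructor. For a transition with `rewards=(0.0, 100.0)`, a learner created as `DistributedQAgent((1.0, 1.0))` is updated with a modified reward of 100, whereas `DistributedQAgent((1.0, 0.0))` is updated with 0 for the same transition.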
Fig. 5. The number of pieces of trash collected at the collecting places for Agent 1 and Agent 2 (Conventional method)

Fig. 6. The number of pieces of trash collected at the collecting places for Agent 1 and Agent 2 (Proposed method with Mutualism)

Fig. 5 shows the number of pieces of trash taken to the collecting place for agent 1 and to that for agent 2, respectively, obtained by the conventional method. Each line is the average over 20 independent simulations. As the episodes go on, the number of pieces of trash increases, and in the 5000th episode, 1.5 pieces are taken to the collecting place for agent 1 and 2.25 pieces to that for agent 2. Fig. 6 shows the results of the proposed method, where 2.1 pieces are taken for agent 1 and 3.9 pieces for agent 2. It should be noted that the number of pieces collected for agent 1 increases without decreasing that for agent 2 (which even increases). Therefore, the proposed method with the mutualism strategy obtains cooperative behavior and shows better performance than the conventional method.

Next, the contribution of agents 1 and 2 to taking trash to the collecting places is analyzed. Table 3 shows the data on the number of pieces of trash taken to the collecting places in the final episode. As described before, the proposed method takes 2.10 pieces to the collecting place for agent 1 on average, where the contribution of agent 1, i.e., the number of pieces that agent 1 takes to the collecting place for agent 1, is 1.20, and the contribution of agent 2, i.e., the number of pieces that agent 2 takes to the collecting place for agent 1, is 0.90. Therefore, both agents 1 and 2 contribute to the benefit of agent 1. Q learning takes 1.50 pieces to the collecting place for agent 1 on average, where the contribution of agent 1 is 1.50 and that of agent 2 is zero, which means that only agent 1 contributes to agent 1's benefit.

Next, the number of pieces of trash taken to the collecting place for agent 2 is analyzed. The proposed method takes 3.90 pieces on average, where the contribution of agent 1 is 1.95 and that of agent 2 is 1.95. Therefore, both agents also contribute to the benefit of agent 2. Q learning takes 2.25 pieces, where the contribution of agent 1 is 0.25 and that of agent 2 is 2.00, which shows that agent 1 contributes to the benefit of agent 2 only a little, and agent 2 contributes to its own benefit only.

Table 3. Data on the number of pieces of trash taken to the collecting places in the last episode

| Method | Taken for agent 1: Total | Taken for agent 1: Contribution of agent 1 | Taken for agent 1: Contribution of agent 2 | Taken for agent 2: Total | Taken for agent 2: Contribution of agent 1 | Taken for agent 2: Contribution of agent 2 |
|---|---|---|---|---|---|---|
| Proposed method | 2.10 | 1.20 | 0.90 | 3.90 | 1.95 | 1.95 |
| Q learning | 1.50 | 1.50 | 0.00 | 2.25 | 0.25 | 2.00 |

5. Conclusions

In this paper, a novel reinforcement learning algorithm based on symbiotic relationships is proposed, where a symbiotic vector is introduced to represent various kinds of relationships. In the simulations, the effectiveness of the proposed method with the mutualism strategy has been shown. By considering not only the
benefit of the self-agent, but also that of the other agents, cooperative behaviors emerged. In the future, other combinations of symbiotic relationships will be considered to analyze the emerging behaviors, and moreover, multilateral relationships will also be considered to build simulation models dealing with real situations of cooperation or conflict between agents.

References

1. K. Binmore, Game Theory: A Very Short Introduction, (Oxford University Press, 2007).
2. D. J. Watts and S. H. Strogatz, Collective dynamics of small-world networks, Nature, 393 (1998) 440-442.
3. L. Margulis, The Symbiotic Planet, Contact, (1999).
4. T. Eguchi, K. Hirasawa, J. Hu, and N. Ota, A Study of Evolutionary Multiagent Models Based on Symbiosis, IEEE Trans. on Systems, Man, and Cybernetics, part B, (36), (2006).
5. J. Y. Goulermas and P. Liatsis, Hybrid symbiotic genetic optimization for robust edge-based stereo correspondence, Pattern Recognition, (34), (2001) 2477-2496.
6. J. Y. Goulermas and P. Liatsis, A collective-based adaptive symbiotic model for surface reconstruction in area-based stereo, IEEE Trans. on Evolutionary Computation, (7) 5, (2003) 482-502.
7. C. P. Pieters, Symbiotic networks, in Proc. of the IEEE Congress on Evolutionary Computation, (2003) 921-927.
8. R. E. Bellman, Dynamic Programming, (Princeton University Press, 1957).
9. R. S. Sutton and A. G. Barto, Reinforcement Learning - An Introduction -, (1998).
10. R. Pfeifer and C. Scheier, Understanding Intelligence, (MIT Press, 1999).