Reinforcement Learning with Symbiotic Relationships for Multiagent Environments


Shingo Mabu
Graduate School of Science and Engineering, Yamaguchi University, Tokiwadai 2-16-1, Ube, Yamaguchi, 755-8611, Japan

Masanao Obayashi
Graduate School of Science and Engineering, Yamaguchi University, Tokiwadai 2-16-1, Ube, Yamaguchi, 755-8611, Japan

Takashi Kuremoto
Graduate School of Science and Engineering, Yamaguchi University, Tokiwadai 2-16-1, Ube, Yamaguchi, 755-8611, Japan

E-mail: mabu@yamaguchi-u.ac.jp, m.obayas@yamaguchi-u.ac.jp, wu@yamaguchi-u.ac.jp

Abstract

Multiagent systems have been widely studied, and cooperative behaviors between agents, where many agents work together to achieve their objectives, have been realized. In this paper, a new reinforcement learning framework is proposed that considers the concept of "Symbiosis" in order to represent complicated relationships between agents and analyze the emerging behavior. In addition, distributed state-action value tables are used to efficiently solve multiagent problems with a large number of state-action pairs. The simulation results show that the proposed method performs better than conventional reinforcement learning that does not consider symbiosis.

Keywords: reinforcement learning, symbiosis, multiagent system, cooperative behavior

1. Introduction

There are many situations where the interests of some parties coincide or conflict, for example, human relationships, cooperation or competition between companies, and even international relationships. Recently, globalization has been progressing rapidly; thus, the relationships between persons and organizations have become very complicated networks. On the other hand, information systems have become intelligent and work cooperatively with each other, for example, cloud networks, car navigation systems and automation by robots. Research on complex networks began around 1998 [2] and has recently attracted attention as important research for analyzing phenomena in social systems. Therefore, a model that can predict problems caused by complex networks and propose optimal solutions for them will be useful for realizing safe and secure social systems. In addition, if the best relationships between parties can be found, it will contribute to the development of the whole society.

In this paper, a novel reinforcement learning algorithm is proposed that introduces the concept of "Symbiosis" in order to build win-win relationships between parties even if each party is pursuing the maximization of its own profits. Symbiosis can be defined as a relationship where two or more organisms live in close association with each other [3], and several computational models based on symbiosis in the ecosystem have been studied [4-7]. In the proposed reinforcement learning method, multiagent environments are considered where there are several agents (persons and organizations) that have cooperative, competitive or self-satisfied relationships, and such relationships are defined as "symbiotic vectors". The symbiotic vectors can represent six basic symbiotic relationships, i.e., mutualism, harm, predation, altruism, self-improvement and self-deterioration. The symbiotic vectors are used to calculate the rewards given to each agent when updating Q values in reinforcement learning. They represent not only the target direction of self-benefit, but also that of the other agents working in the same environment. As a result, the proposed method with symbiotic vectors can build action rules for controlling the benefit of several agents. Therefore, the proposed method can predict the results under the current symbiotic relationships by running simulations.

This paper is organized as follows. In Section 2, the Q learning algorithm with distributed state-action value tables is introduced for efficiently solving multiagent problems with a large number of state-action pairs. Section 3 explains the proposed learning algorithm using symbiotic vectors. Section 4 describes the simulation environments and results. Finally, Section 5 is devoted to conclusions.

2. Q learning with distributed state-action value tables

In standard reinforcement learning, the number of state-action pairs increases exponentially as the numbers of inputs, objects to be perceived, and possible actions increase, which is called "the curse of dimensionality" [8]. Therefore, a Q learning algorithm with distributed state-action value tables (Q tables) is introduced in this paper.

Fig. 1. Q-table division

2.1. Representation of distributed Q tables

Suppose that the set of inputs (sensors) of the agents is $I$, and the set of possible actions is $A$. Then $I$ is manually divided into several subsets, i.e., $I_1, I_2, \ldots, I_n$ ($I_1, I_2, \ldots, I_n \subseteq I$), depending on the problem. For example, in the self-sufficient garbage collecting problem used in the simulations of this paper, there are mainly three tasks that have to be achieved by the agents; thus, $I$ is divided into three subsets, i.e., $I_1, I_2, I_3$. Therefore, three sub-Q-tables are created based on $I_1$, $I_2$ and $I_3$, respectively (Fig. 1).

2.2. State transition and learning

In this subsection, the procedure of deciding an action is explained based on the example shown in Fig. 1 (three sub-Q-tables are used). The procedure of deciding an action is as follows.

1) When inputs are given from an environment, each sub-Q-table independently determines the current state $s_t^n$ ($n$ is the sub-Q-table number, $n = 1, 2, 3$).
2) Three actions $a_t^n$ are independently selected by each sub-Q-table using the greedy policy [9].
3) The three Q-values of $a_t^n$ are compared. The sub-Q-table selecting the action with the maximum Q-value is defined as $n_t^{winner}$ (winner-Q-table), and its current state is defined as $s_t^{winner}$.
4) The action selected by the winner-Q-table is executed with the probability of $1 - \varepsilon$, or a random action is executed with the probability of $\varepsilon$. This executed action is defined as $a_t^{winner}$.

The update of the Q value is executed by Eq. (1).
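The four selection steps and the update of Eq. (1) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the dictionary-based Q table, the function names `sub_state`, `select_action` and `update`, and the action labels are assumptions made for the example. The input numbers per sub-Q-table follow Table 2, and the default parameter values follow Section 4.2.

```python
import random
from collections import defaultdict

# Action labels are illustrative names for the four actions of Table 1.
ACTIONS = ("go_forward", "turn_right", "turn_left", "no_action")

# Input numbers observed by each sub-Q-table (taken from Table 2).
SUB_TABLE_INPUTS = {
    1: (1, 2, 3, 4, 5, 7, 9),   # task: agent 1's benefit
    2: (1, 2, 3, 4, 5, 8, 9),   # task: agent 2's benefit
    3: (1, 2, 3, 4, 6, 10),     # task: charging energy
}


def sub_state(inputs, n):
    """Step 1: the state of sub-Q-table n is the tuple of the inputs it observes."""
    return tuple(inputs[i] for i in SUB_TABLE_INPUTS[n])


def select_action(q, inputs, epsilon, rng=random):
    """Steps 2-4: return (n_winner, s_winner, a_winner)."""
    candidates = []
    for n in SUB_TABLE_INPUTS:
        s = sub_state(inputs, n)
        # Step 2: greedy action of this sub-Q-table.
        a = max(ACTIONS, key=lambda act: q[(n, s, act)])
        candidates.append((q[(n, s, a)], n, s, a))
    # Step 3: the sub-Q-table with the largest Q value is the winner.
    _, n_win, s_win, a_win = max(candidates, key=lambda c: c[0])
    # Step 4: with probability epsilon, replace the action by a random one.
    if rng.random() < epsilon:
        a_win = rng.choice(ACTIONS)
    return n_win, s_win, a_win


def update(q, n_win, s_win, a_win, reward, next_inputs, alpha=0.1, gamma=0.9):
    """Eq. (1): one-step update of the winner entry, max taken over all sub-Q-tables."""
    best_next = max(
        q[(n, sub_state(next_inputs, n), a)]
        for n in SUB_TABLE_INPUTS
        for a in ACTIONS
    )
    key = (n_win, s_win, a_win)
    q[key] += alpha * (reward + gamma * best_next - q[key])


# Usage: inputs are keyed by the input numbers 1..10 of Table 1.
q_table = defaultdict(float)
observation = {i: 0 for i in range(1, 11)}
winner = select_action(q_table, observation, epsilon=0.05)
update(q_table, *winner, reward=10.0, next_inputs=observation)
```

Keying the table by `(n, state, action)` keeps the three sub-Q-tables in one dictionary while preserving their separate state spaces, which is the point of the division: each sub-table only enumerates combinations of its own inputs.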

$$Q(n_t^{winner}, s_t^{winner}, a_t^{winner}) \leftarrow Q(n_t^{winner}, s_t^{winner}, a_t^{winner}) + \alpha \left[ r + \gamma \max_{n,a} Q(n, s_{t+1}^{n}, a) - Q(n_t^{winner}, s_t^{winner}, a_t^{winner}) \right] \quad (1)$$

where $n_t^{winner}$, $s_t^{winner}$ and $a_t^{winner}$ show the number, state and action of the winner sub-Q-table at time $t$, respectively. $n$ is a sub-Q-table number, $s_{t+1}^{n}$ is the state of sub-Q-table $n$ at time $t+1$, and $a$ is a possible action in sub-Q-table $n$. $r$ is a reward, $\alpha$ (between 0 and 1.0) is a learning rate, and $\gamma$ (between 0 and 1.0) is a discount rate.

3. Reinforcement Learning with Symbiosis

This section introduces a symbiotic vector and how to apply the symbiotic vector to reinforcement learning.

3.1. Symbiotic vectors

Standard reinforcement learning aims to maximize the rewards that the self-agent obtains. In the proposed method, however, not only the rewards for the self-agent but also the rewards for other agents are considered when executing reinforcement learning. In addition, six symbiotic relationships are considered to build the action strategies, that is, Predation, Mutualism, Altruism, Harm, Self-improvement and Self-deterioration.

Fig. 2. Symbiotic vector and six symbiotic relationships for agent 1 (An example of two dimensions)

Fig. 2 shows the symbiotic strategy for "agent 1", where one axis shows the weight ($E_{11}$) on the benefit of agent 1 (self-agent), and the other axis shows the weight ($E_{12}$) on the benefit of agent 2. In other words, $E_{11}$ shows the symbiotic strategy of agent 1 for agent 1, and $E_{12}$ shows the symbiotic strategy of agent 1 for agent 2. Therefore, if agent 1 aims to maximize rewards for both agents, it will take the "Mutualism" strategy, where the symbiotic vector is $v_1 = (E_{11}, E_{12}) = (1.0, 1.0)$ ($-1.0 \le E_{11}, E_{12} \le 1.0$).

Fig. 3. Symbiotic relationships between two agents

Fig. 3 shows a symbiotic relationship between two agents, where one agent takes the mutualism strategy and the other agent takes the predation strategy. In this case, the symbiotic vector of agent 1 is $v_1 = (E_{11}, E_{12}) = (1.0, 1.0)$, and that of agent 2 is $v_2 = (E_{21}, E_{22}) = (-1.0, 1.0)$. Intermediate values between -1.0 and 1.0 can also be used to define a symbiotic relationship. For example, a symbiotic vector $v = (1.0, 0.1)$ shows a weak mutualism that considers the other agent's benefit a little. Therefore, the symbiotic vector flexibly represents any degree of symbiotic relationship, and moreover, it can be extended to the relationships between many agents.
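As a small illustration of how a two-dimensional symbiotic vector maps to a named relationship, the sketch below classifies a vector by the signs of its two weights. Only three cases are stated in the text: mutualism (1.0, 1.0), predation (agent 2 in Fig. 3 weights its own benefit by 1.0 and agent 1's by -1.0), and self-improvement (weight on the own benefit only, as in conventional Q-learning). The sign patterns used for altruism, harm and self-deterioration are an assumption made here by symmetry, since Fig. 2 itself is not reproduced in the text; the function name `classify` is also hypothetical.

```python
def classify(e_self: float, e_other: float) -> str:
    """Name the relationship implied by (weight on own benefit, weight on other's benefit).

    Mutualism, predation and self-improvement follow the text; the remaining
    three sign patterns are assumed by symmetry and may differ from Fig. 2.
    """
    for e in (e_self, e_other):
        if not -1.0 <= e <= 1.0:
            raise ValueError("weights must lie in [-1.0, 1.0]")
    if e_self > 0:
        if e_other > 0:
            return "mutualism"
        return "predation" if e_other < 0 else "self-improvement"
    if e_self < 0:
        if e_other > 0:
            return "altruism"            # assumed pattern
        return "harm" if e_other < 0 else "self-deterioration"  # assumed patterns
    return "unnamed"  # zero weight on the own benefit is not one of the six names


# Fig. 3 example: agent 1 uses v_1 = (E_11, E_12) = (1.0, 1.0);
# agent 2 uses v_2 = (E_21, E_22) = (-1.0, 1.0), so its own weight is E_22.
print(classify(1.0, 1.0))    # agent 1
print(classify(1.0, -1.0))   # agent 2: self weight E_22 = 1.0, other weight E_21 = -1.0
print(classify(1.0, 0.1))    # weak mutualism
```

Note that the argument order is (self, other), so for agent 2 the components of $v_2$ are passed in swapped order.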
3.2. Reinforcement learning with symbiotic vectors

This subsection explains how to update Q values considering a symbiotic vector. Here, suppose there are $p$ agents (agent #1 to #$p$), where the symbiotic vector of agent $k$ ($1 \le k \le p$) is $v_k = (E_{k1}, E_{k2}, \ldots, E_{kp})$. After agent $k$ takes an action and finds the rewards for all the agents ($r_1, r_2, \ldots, r_p$), the modified reward used for updating the Q value of agent $k$ in Eq. (1) is calculated as follows.

$$r'_k = \sum_{l=1}^{p} E_{kl} \, r_l \quad (2)$$

Eq. (2) calculates the sum of the weighted rewards of all the agents #1 to #$p$. For instance, when agent 1 takes the mutualism strategy $v_1 = (E_{11}, E_{12}) = (1.0, 1.0)$ and agent 2 takes the predation strategy $v_2 = (E_{21}, E_{22}) = (-1.0, 1.0)$, Eq. (2) for agents 1 and 2 can be represented by Eqs. (3) and (4), respectively.

$$r'_1 = 1.0 \, r_1 + 1.0 \, r_2 \quad (3)$$

$$r'_2 = -1.0 \, r_1 + 1.0 \, r_2 \quad (4)$$

4. Simulations

4.1. A simulation environment

The self-sufficient garbage collecting problem [10] is used for the performance evaluation of the proposed method. Fig. 4 shows the simulation environment used in this paper, where there are two agents, trashes, one charging station, and two trash collecting places. The aim of this problem is to collect many trashes in the environment and take them to the collecting places assigned to each agent, i.e., agent $k$ has to take trashes to the collecting place for agent $k$ to obtain a reward. In addition, each agent has a limited amount of energy to move; thus, the agents should check the remaining energy and go to the charging station before running out of energy. Table 1 shows the inputs and possible actions that the agents can use. The initial energy is 100 (full charge). When an agent goes forward, three units of energy are used, and when it turns right or left, one unit is used. The energy is recharged gradually while the agent stays at the charging station. The total time for one episode is 100 steps. A reward of 100 is given to agent $k$ when an agent takes one trash to the collecting place for agent $k$, 10 is given when an agent collects a trash, and 0.1 × (charged energy) is given when an agent stays at the charging station.

Fig. 4. A simulation environment

Table 1. Inputs and actions

Inputs
#    Input contents                               Input value
1    forward cell                                 nothing, obstacle, collecting place for agent 1, collecting place for agent 2, trash, charging station
2    backward cell                                the same as "forward cell"
3    right cell                                   the same as "forward cell"
4    left cell                                    the same as "forward cell"
5    direction of nearest trash                   forward, backward, right, left
6    direction of charging station                forward, backward, right, left
7    direction of collecting place for agent 1    forward, backward, right, left
8    direction of collecting place for agent 2    forward, backward, right, left
9    the number of holding trashes                0, 1, 2 (max 2)
10   current energy level                         low (less than 30), high (more than 70), middle (other values)

Actions
1    go forward
2    turn right
3    turn left
4    no action

4.2. Conditions of Q-learning with distributed Q-tables

Table 2 shows the setting of the distributed Q-tables, where three sub-Q-tables are prepared for dealing with the tasks for agent 1's benefit, for agent 2's benefit, and for charging energy, respectively. Each agent has its own Q-table (three sub-Q-tables), i.e., the reinforcement learning of the two agents is carried out independently. The learning parameters are set as learning rate $\alpha = 0.1$, discount rate $\gamma = 0.9$, and $\varepsilon = 0.05$.

Table 2. Sub-Q-table setting

table #   Main task                Input # used in each sub-Q-table
1         for agent 1's benefit    1, 2, 3, 4, 5, 7, 9
2         for agent 2's benefit    1, 2, 3, 4, 5, 8, 9
3         charging energy          1, 2, 3, 4, 6, 10

4.3. Simulation results

In this subsection, to confirm the basic effects of the symbiotic relationship, the proposed method with the mutualism strategy is compared with conventional Q-learning, i.e., the case where both agents take the Self-improvement strategy.
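In terms of Eq. (2), the two settings compared here differ only in the symbiotic vectors used to turn the raw rewards into the reward each agent learns from. The sketch below illustrates this. The function name `modified_reward`, the sample reward values, and the self-improvement vectors (1.0, 0.0) and (0.0, 1.0) are assumptions chosen for illustration; the text only states that conventional Q-learning corresponds to both agents taking the Self-improvement strategy.

```python
from typing import Sequence


def modified_reward(v_k: Sequence[float], rewards: Sequence[float]) -> float:
    """Eq. (2): weighted sum of all agents' rewards under agent k's symbiotic vector."""
    if len(v_k) != len(rewards):
        raise ValueError("symbiotic vector and reward list must have the same length")
    return sum(e * r for e, r in zip(v_k, rewards))


# Hypothetical raw rewards (r_1, r_2) observed after one step.
rewards = (100.0, 10.0)

# Symbiotic vectors v_k = (E_k1, E_k2) for agents 1 and 2.
mutualism = {1: (1.0, 1.0), 2: (1.0, 1.0)}
self_improvement = {1: (1.0, 0.0), 2: (0.0, 1.0)}  # assumed form of conventional Q-learning

for name, vectors in (("mutualism", mutualism), ("self-improvement", self_improvement)):
    for k, v_k in vectors.items():
        print(name, "agent", k, modified_reward(v_k, rewards))
```

Under mutualism both agents learn from the same summed reward, so an action that helps the other agent is reinforced as strongly as one that helps the agent itself. Under self-improvement each agent sees only its own reward, which is the ordinary single-agent Q-learning signal.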

Fig. 5. The number of collected trashes at the collecting places for Agent 1 and Agent 2 (Conventional method)

Fig. 6. The number of collected trashes at the collecting places for Agent 1 and Agent 2 (Proposed method with Mutualism)

Fig. 5 shows the number of trashes taken to the collecting place for agent 1 and that for agent 2, respectively, obtained by the conventional method. Each line is the average over 20 independent simulations. As the episodes go on, the number of trashes increases, and in the 5000th episode, 1.5 trashes are taken to the collecting place for agent 1 and 2.25 trashes to that for agent 2. Fig. 6 shows the results of the proposed method, where 2.1 trashes are taken for agent 1 and 3.9 trashes for agent 2. It should be noted that the number of collected trashes for agent 1 increases without decreasing that for agent 2 (which even increases). Therefore, the proposed method with the mutualism strategy obtains cooperative behavior and shows better performance than the conventional method.

Next, the contribution of agents 1 and 2 to taking trashes to the collecting places is analyzed. Table 3 shows the data on the number of trashes taken to the collecting places in the final episode. As described before, the proposed method takes 2.10 trashes to the collecting place for agent 1 on average, where the contribution of agent 1, i.e., the number of trashes that agent 1 takes to the collecting place for agent 1, is 1.20, and the contribution of agent 2, i.e., the number of trashes that agent 2 takes to the collecting place for agent 1, is 0.90. Therefore, both agents 1 and 2 contribute to the benefit of agent 1. Q learning takes 1.50 trashes to the collecting place for agent 1 on average, where the contribution of agent 1 is 1.50 and that of agent 2 is zero, which means that only agent 1 contributes to agent 1's benefit.

Next, the number of trashes taken to the collecting place for agent 2 is analyzed. The proposed method takes 3.90 trashes on average, where the contribution of agent 1 is 1.95 and that of agent 2 is 1.95 (Table 3). Therefore, both agents also contribute to the benefit of agent 2. Q learning takes 2.25 trashes, where the contribution of agent 1 is 0.25 and that of agent 2 is 2.00, which shows that agent 1 contributes to the benefit of agent 2 only a little, and agent 2 contributes only to its own benefit.

Table 3. Data on the number of trashes taken to the collecting places in the last episode

                   Trashes taken for agent 1                 Trashes taken for agent 2
                   Total   Contribution   Contribution      Total   Contribution   Contribution
                           of agent 1     of agent 2                of agent 1     of agent 2
Proposed method    2.10    1.20           0.90              3.90    1.95           1.95
Q learning         1.50    1.50           0.00              2.25    0.25           2.00

5. Conclusions

In this paper, a novel reinforcement learning algorithm based on symbiotic relationships is proposed, where the symbiotic vector is introduced to represent various kinds of relationships. In the simulations, the effectiveness of the proposed method with the mutualism strategy has been shown. By considering not only the benefit of the self-agent, but also that of the other agents, cooperative behaviors emerged. In the future, other combinations of symbiotic relationships will be considered to analyze the emerging behaviors, and moreover, multilateral relationships will also be considered to build simulation models dealing with real situations of cooperation or conflict between agents.

References

1. Ken Binmore, Game Theory: A Very Short Introduction (Oxford University Press, 2007).
2. D. J. Watts, S. H. Strogatz, Collective dynamics of small-world networks, Nature, 393 (1998) 440-442.
3. L. Margulis, The Symbiotic Planet, Contact, (1999).
4. T. Eguchi, K. Hirasawa, J. Hu, and N. Ota, A Study of Evolutionary Multiagent Models Based on Symbiosis, IEEE Trans. on Systems, Man, and Cybernetics, part B, (36), (2006).
5. J. Y. Goulermas and P. Liatsis, Hybrid symbiotic genetic optimization for robust edge-based stereo correspondence, Pattern Recognition, (34), (2001) 2477-2496.
6. J. Y. Goulermas and P. Liatsis, A collective-based adaptive symbiotic model for surface reconstruction in area-based stereo, IEEE Trans. on Evolutionary Computation, (7) 5, (2003), 482-502.
7. C. P. Pieters, Symbiotic networks, in Proc. of the IEEE Congress on Evolutionary Computation, (2003) 921-927.
8. R. E. Bellman, Dynamic Programming, (Princeton University Press, 1957).
9. R. S. Sutton, A. G. Barto, Reinforcement Learning - An Introduction -, (1998).
10. R. Pfeifer and C. Scheier, Understanding Intelligence, (MIT Press, 1999).