Mission Structure

Take a look at the Mission.xsd in the Schemas folder of the root directory where Malmö is installed. In there you will find Schema definitions for elements that form the overall structure of Missions. The general structure of a Mission XML can be inferred from there and from the tutorial code examples provided as such:

        <Summary>A Summary of the Mission</Summary>
        <Description>A more detailed Description of the Mission</Description>
            <!-- One of the following: 

                <FlatWorldGenerator>, <FileWorldGenerator> and 


            <!-- Some number of the following: 

                 <DrawingDecorator>, <MazeDecorator>, 
                 <ClassroomDecorator> and <SnakeDecorator>

            <!-- Some number of the following: 

                 <ServerQuitFromTimeUp> and 
        <!-- Some number of the following. 
             Note: AgentMissionHandlers are defined as a large xs:group in Mission.xsd
                   These include: Observation Producers, Reward Handlers, Command Handlers, 
                                  and Agent Quit Producers  



Hopefully the above gives a good representation of what Missions look like in Malmö. Wherever possible, common sensible values have been replaced such as for the Weather and StartTime. XML comments are defined using <!-- and --> with the contents within. The comments above talk about what is to be in place of the comments in the given regions of the XML document.

Mission Handlers

This tutorial is focused on creating a new task, the Build Battle Task, and specifically for that we need a new Agent Mission Handler and in particular a Reward Handler.

Now, we will add our first line of code! Scroll to the xs:group named AgentMissionHandlers in the Mission.xsd file. There should be a section of code with Reward Handlers. Add an element at the end of the section with ref set to RewardForStructureCopying and minOccurs set to 0.

As we have talked about in the previous section, minOccurs is an occurrence indicator that specifies the minimum number of times an element needs to appear (in this case in the all section of the group). The ref attribute references an element that is declared elsewhere, which may or may not be in the same schema document. Take a look at the top of the Mission.xsd file. There you will see some xs:includes. We will be writing the schema for the RewardForStructureCopying element in MissionHandler.xsd where all the other RewardHandlers and indeed (Agent)MissionHandlers are defined.

After completing the above, you should have something like what is shown below:

    <xs:element ref="RewardForTouchingBlockType" minOccurs="0" />
    <xs:element ref="RewardForSendingCommand" minOccurs="0" />
    <xs:element ref="RewardForCollectingItem" minOccurs="0" />
    <xs:element ref="RewardForDiscardingItem" minOccurs="0" />
    <xs:element ref="RewardForReachingPosition" minOccurs="0"/>
    <xs:element ref="RewardForMissionEnd" minOccurs="0"/>
    <xs:element ref="RewardForStructureCopying" minOccurs="0"/>

Now, let us write the XML Schema Definition for RewardForStructureCopying. First, we open MissionHandler.xsd. Now, scroll down to the section just after the comment which contains the phrase "REWARD PRODUCERS."

To get an idea of what we will be adding first, take a look at the simpleType definition Behavior:

<xs:simpleType name="Behaviour">
    <xs:restriction base="xs:string">
      <xs:enumeration value="onceOnly" />
      <xs:enumeration value="oncePerBlock" />
      <xs:enumeration value="oncePerTimeSpan" />
      <xs:enumeration value="constant" />

As can be seen above, Behavior is a simpleType that restricts a string to take one of four possible values (an enumeration with four possible values.) Functionally, if one looks at the naming and the Java files associated with this Schema, Behavior specifies what is commonly known in reinforcement learning as reward density.

As the names imply: onceOnly signifies that an agent will receive a reward once only, oncePerBlock signifies that an agent will receive rewards once per block, oncePerTimeSpan specifies that a reward is received only once in some time span, and finally, consant signifies that a reward is constantly given.

Now, the above explanations might seem a little vague, but it should do for what we are going to add. RewardDensityForBuildAndBreak is type very similar to Behavior in that it is also a reward density type. Ideally we should reuse types if at all possible, however; just to illustrate the definitions of types in XSD as well as make a more functional and meaning type for the particular task we are creating, we define this new type. Go ahead and add the following code snippet right after the last Schema definition in the Reward Producers section.

  <xs:simpleType name="RewardDensityForBuildAndBreak">
    <xs:restriction base="xs:string">
      <xs:enumeration value="PER_BLOCK"/>
      <xs:enumeration value="ACCUMULATED"/>
      <xs:enumeration value="MISSION_END"/>

The above simpleType definition also uses an xs:restriction with the type xs:string. It can take one of three values, PER_BLOCK which signifies that a reward is given per block placed or broken, ACCUMULATED signifying some sort of reward accumulation and MISSION_END signifying that a reward is only given at a mission's end. Again, these may seem like vague description, but they will become clear when we add the Minecraft Forge code.

Now, in the Build Battle, we will be comparing a given area where the player (agent) is to construct a structure that is already built. A simple way to define such areas is using a cuboid. Given that Minecraft is comprised of voxels (individual cubes), we can describe a cuboid using just 2 block positions: a minimum and maximum that form a diagonal of the cuboid. Take a look at the definition of Grid and DrawCuboid in the same MissionHandler.xsd file. Both of them are indeed what we would like to some extent; however, we don't really need to have a Grid that is named nor do we need to specify block types and other block properties which the DrawCuboid requires. We thus can define another type, namely UnnamedGridDefinition.

  <xs:complexType name="UnnamedGridDefinition">
      <xs:element name="min" type="Pos" />
      <xs:element name="max" type="Pos" />

Note that the above type is complex and consists of a sequence with two elements, a Pos named min and a Pos named max where Pos is on closer inspection (in the same MissionHandler.xsd file once again), a 3-tuple defining x, y and z coordinates. Importantly, it should also be said that xs:sequence tags by default use a minOccurs and maxOccurs of 1 thus required all UnnamedGridDefinition to have both a max and min.

Last, but certainly not least, we define the actual RewardForStructureCopying Schema. It will take two UnnamedGridDefinitions defining the player structure bounds and built or goal structure bounds that are to be compared. Also, it will take two DrawBlockBasedObjectTypes that allow for providing visual feedback to agents and choosing the block properties to use when blocks are switched for visual feedback. For example, if the player places a block outside the player structure boundary, upon block placement the type of the block could change to redstone (giving the block a red appearance) along with a negative reward. Finally, we have a RewardDensityForBuildAndBreak also. This gives the following code.

  <xs:element name="RewardForStructureCopying">
        <xs:element name="PlayerStructureBounds" type="UnnamedGridDefinition" />
        <xs:element name="GoalStructureBounds" type="UnnamedGridDefinition" />
        <xs:element name="BlockTypeOnCorrectPlacement"
                    type="DrawBlockBasedObjectType" nillable="true" />
        <xs:element name="BlockTypeOnIncorrectPlacement"
                    type="DrawBlockBasedObjectType" nillable="true" />
        <xs:element name="RewardDensity" type="RewardDensityForBuildAndBreak" />
      <xs:attributeGroup ref="RewardProducerAttributes"/>

The inclusion of the attributeGroup RewardProducerAttributes is yet unexplained. Examine the other Reward Producers, you will in fact see this attributeGroup in their definitions as well. The reason this is added is to allow for Reward Producers to share some attributes. In this case, we'd like to give Rewards a dimension. The idea is to be able to associate certain types of rewards with a dimension. For example, we might want to associate rewards for reaching goal locations with say dimension 1 while rewards for defeating enemies with another dimension, giving a total of two dimensions and corresponding to two-dimensional vectors.

Upon adding the above three code snippets to the end of the Reward Producers section, we are now ready to start writing the Forge side code and making changes to the Minecraft mod.

At this point, ensure that you have added the single line of code to your Mission.xsd file that adds a ref to RewardForStructureCopying as well as the above three code snippets in the given order right at the end of the Reward Producers section in the MissionHandlers.xsd file.

To test if everything is working right, try rebuilding the platform and in particular, try running the gradle task JAXB by executing:

The above will perform a task whereby necessary Java definitions of the schemas we just defined will be added. These will in fact be used shortly.