as: Xtensa Automatic Alignment

 
 9.54.3.2 Automatic Instruction Alignment
 ........................................
 
 The Xtensa assembler will automatically align certain instructions, both
 to optimize performance and to satisfy architectural requirements.
 
    As an optimization to improve performance, the assembler attempts to
 align branch targets so they do not cross instruction fetch boundaries.
 (Xtensa processors can be configured with either 32-bit or 64-bit
 instruction fetch widths.)  An instruction immediately following a call
 is treated as a branch target in this context, because it will be the
 target of a return from the call.  This alignment has the potential to
 reduce branch penalties at some expense in code size.  This optimization
 is enabled by default.  You can disable it with the '--no-target-align'
 command-line option (*note Command-line Options: Xtensa Options.).
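
    The fetch-boundary condition described above can be illustrated with
 a small sketch.  This is not the assembler's own code; the function name
 'crosses_fetch_boundary' and its parameters are hypothetical, and the
 fetch width is given in bytes (4 for a 32-bit fetch, 8 for a 64-bit
 fetch).

```python
def crosses_fetch_boundary(addr, size, fetch_bytes=4):
    """Return True if an instruction of `size` bytes starting at byte
    address `addr` straddles an instruction fetch boundary.

    Hypothetical helper for illustration only; `fetch_bytes` is 4 for a
    32-bit fetch width and 8 for a 64-bit fetch width.
    """
    first_unit = addr // fetch_bytes               # fetch unit holding the first byte
    last_unit = (addr + size - 1) // fetch_bytes   # fetch unit holding the last byte
    return first_unit != last_unit


if __name__ == "__main__":
    # A 3-byte instruction at each offset within a 4-byte fetch unit.
    for offset in range(4):
        print(offset, crosses_fetch_boundary(offset, 3))
```

    Under this model, a branch target is well placed when the function
 returns False for the target instruction's address and size.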
 
    The target alignment optimization is done without adding instructions
 that could increase the execution time of the program.  If there are
 density instructions in the code preceding a target, the assembler can
 change the target alignment by widening some of those instructions to
 the equivalent 24-bit instructions.  Extra bytes of padding can be
 inserted immediately following unconditional jump and return
 instructions.  This approach is usually successful in aligning many, but
 not all, branch targets.
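
    As a rough sketch of the widening idea, assume that each 16-bit
 density instruction widened to its 24-bit equivalent moves the following
 target forward by one byte.  The function below is hypothetical and much
 simpler than the assembler's actual relaxation pass: it ignores padding
 after jumps and returns, and only reports how many instructions would
 need widening.

```python
def widen_count_to_align(target_addr, target_size, narrow_available,
                         fetch_bytes=4):
    """Return the smallest number of preceding 16-bit instructions to
    widen (each widening adds one byte) so that the target instruction
    no longer crosses a fetch boundary, or None if that is not possible.

    Illustrative model only; all names here are assumptions.
    """
    for widened in range(narrow_available + 1):
        addr = target_addr + widened  # each widening shifts the target by 1 byte
        first_unit = addr // fetch_bytes
        last_unit = (addr + target_size - 1) // fetch_bytes
        if first_unit == last_unit:
            return widened
    return None


if __name__ == "__main__":
    # A 3-byte target at address 2, preceded by two density instructions.
    print(widen_count_to_align(2, 3, narrow_available=2))
```

    A result of None corresponds to the case mentioned above in which a
 branch target cannot be aligned by widening alone.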
 
    The 'LOOP' family of instructions must be aligned such that the first
 instruction in the loop body does not cross an instruction fetch
 boundary (e.g., with a 32-bit fetch width, a 'LOOP' instruction must be
 at an address congruent to 1 or 2 modulo 4).  The assembler knows about
 this restriction and inserts the minimal number of 2- or 3-byte no-op
 instructions to satisfy it.  When no-op instructions are added, any
 label immediately preceding the original loop will be moved in order to
 refer to the loop instruction, not the newly generated no-op
 instruction.  To preserve binary compatibility across processors with
 different fetch widths, the assembler conservatively assumes a 32-bit
 fetch width when aligning 'LOOP' instructions (except if the first
 instruction in the loop is a 64-bit instruction).
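
    The padding rule for the 32-bit case can be sketched as a small
 search.  This is an illustration, not the assembler's implementation:
 the function name 'loop_padding' and its parameters are hypothetical,
 and the search simply tries combinations of 2- and 3-byte no-ops until
 the 'LOOP' instruction would land at an address congruent to 1 or 2
 modulo 4.

```python
from itertools import combinations_with_replacement


def loop_padding(addr, noop_sizes=(2, 3), allowed=(1, 2),
                 fetch_bytes=4, max_noops=4):
    """Return a shortest list of no-op sizes (in bytes) to place before a
    LOOP instruction at `addr` so that its address modulo `fetch_bytes`
    is one of `allowed`.  Returns None if no padding is found.

    Illustrative model of the rule stated in the text; names are
    assumptions, not part of the assembler.
    """
    for count in range(max_noops + 1):          # fewest no-ops first
        for combo in combinations_with_replacement(sorted(noop_sizes), count):
            if (addr + sum(combo)) % fetch_bytes in allowed:
                return list(combo)
    return None


if __name__ == "__main__":
    for offset in range(4):
        print(offset, loop_padding(offset))
```

    Passing 'noop_sizes=(3,)' models a hypothetical case in which only
 3-byte no-ops are available; a 'LOOP' at an address that is 0 modulo 4
 then needs two of them.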
 
    Previous versions of the assembler automatically aligned 'ENTRY'
 instructions to 4-byte boundaries, but that alignment is now the
 programmer's responsibility.