GAS has acquired layers of code over time. The original GAS only supported the a.out object file format, with three sections. Support for multiple sections has been added in two different ways.
The preferred approach is to use the version of GAS created when the symbol
BFD_ASSEMBLER is defined. The other versions of GAS are documented for
historical purposes, and to help anybody who has to debug code written for
them.
The type segT is used to represent a section in code which must work
with all versions of GAS.
The original GAS only supported the a.out object file format with three
sections: `.text', `.data', and `.bss'. This is the version of
GAS that is compiled if neither BFD_ASSEMBLER nor MANY_SEGMENTS
is defined. This version of GAS is still used for the m68k-aout target, and
perhaps others.
This version of GAS should not be used for any new development.
There is still code that is specific to this version of GAS, notably in
`write.c'. There is no way for this code to loop through all the
sections; it simply looks at global variables like text_frag_root and
data_frag_root.
The type segT is an enum.
The MANY_SEGMENTS version of GAS is only used for COFF. It uses the BFD
library, but it writes out all the data itself using bfd_write. This
version of GAS supports up to 40 normal sections. The section names are stored
in the seg_name array. Other information is stored in the segment_info array.
The type segT is an enum. Code that wants to examine all the sections
can use a segT variable as loop index from SEG_E0 up to but not
including SEG_UNKNOWN.
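The loop described above can be sketched as follows. This is only an illustration: the enum here is a shortened stand-in with three sections instead of 40, and the contents of seg_name and the function visit_all_sections are hypothetical, not copied from the GAS sources.

```c
#include <assert.h>
#include <stdio.h>

/* Shortened stand-in for the MANY_SEGMENTS segT enum.  Only SEG_E0 and
   SEG_UNKNOWN are named in the text; the rest is assumed. */
typedef enum { SEG_E0, SEG_E1, SEG_E2, SEG_UNKNOWN } segT;

/* Hypothetical contents for the seg_name array. */
static const char *const seg_name[] = { ".text", ".data", ".bss" };

/* Visit every normal section: start at SEG_E0, stop before SEG_UNKNOWN. */
static int visit_all_sections(void)
{
    int visited = 0;
    segT seg;

    for (seg = SEG_E0; seg < SEG_UNKNOWN; seg = (segT) (seg + 1)) {
        printf("section %d is %s\n", (int) seg, seg_name[seg]);
        visited++;
    }
    assert(visited == SEG_UNKNOWN - SEG_E0);
    return visited;
}
```

Because SEG_UNKNOWN directly follows the last normal section in the enum, it doubles as the loop bound.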
Most of the code specific to this version of GAS is in the file
`config/obj-coff.c', in the portion of that file that is compiled when
BFD_ASSEMBLER is not defined.
This version of GAS is still used for several COFF targets.
The preferred version of GAS is the BFD_ASSEMBLER version. In this
version of GAS, the output file is a normal BFD, and the BFD routines are used
to generate the output.
BFD_ASSEMBLER will automatically be used for certain targets, including
those that use the ELF, ECOFF, and SOM object file formats, and also all Alpha,
MIPS, PowerPC, and SPARC targets. You can force the use of
BFD_ASSEMBLER for other targets with the configure option
`--enable-bfd-assembler'; however, it has not been tested for many
targets, and can not be assumed to work.
This section describes some fundamental GAS data types.
The definition for struct symbol, also known as symbolS, is
located in `struc-symbol.h'. Symbol structures contain the following
fields:
sy_value
An expressionS that describes the value of the symbol. It might
refer to one or more other symbols; if so, its true value may not be known
until resolve_symbol_value is called in write_object_file.
The expression is often simply a constant. Before resolve_symbol_value
is called, the value is the offset from the frag (see section Frags). Afterward,
the frag address has been added in.
sy_resolved
sy_resolving
sy_used_in_reloc
sy_next
sy_previous
Pointers to other symbolS structures, describing a singly or doubly
linked list. (If SYMBOLS_NEED_BACKPOINTERS is not defined, the
sy_previous field will be omitted; SYMBOLS_NEED_BACKPOINTERS is
always defined if BFD_ASSEMBLER is defined.) These fields should be accessed
with the symbol_next and symbol_previous macros.
sy_frag
sy_used
sy_mri_common
COMMON
pseudo-op when assembling in MRI mode.
bsym
If BFD_ASSEMBLER is defined, this points to the BFD asymbol that
will be used in writing the object file.
sy_name_offset
(Only used if BFD_ASSEMBLER is not defined.) This is the position of
the symbol's name in the string table of the object file. On some formats,
this will start at position 4, with position 0 reserved for unnamed symbols.
This field is not used until write_object_file is called.
sy_symbol
(Only used if BFD_ASSEMBLER is not defined.) This is the
format-specific symbol structure, as it would be written into the object file.
sy_number
(Only used if BFD_ASSEMBLER is not defined.) This is a 24-bit symbol
number, for use in constructing relocation table entries.
sy_obj
A field of type OBJ_SYMFIELD_TYPE. If no macro by
that name is defined in `obj-format.h', this field is not defined.
sy_tc
A field of type TC_SYMFIELD_TYPE. If no macro
by that name is defined in `targ-cpu.h', this field is not defined.
TARGET_SYMBOL_FIELDS
OBJ_SYMFIELD_TYPE
and TC_SYMFIELD_TYPE
.
There are a number of access routines used to extract the fields of a
symbolS structure. When possible, these routines should be used rather
than referring to the fields directly. These routines will work for any GAS
version.
S_SET_VALUE
S_GET_VALUE
S_GET_VALUE will cause resolve_symbol_value to be
called if necessary, so it should only be called when it is
safe to resolve symbols (i.e., after the entire input file has been read and
all symbols have been defined).
S_SET_SEGMENT
S_GET_SEGMENT
S_GET_NAME
S_SET_NAME
S_IS_EXTERNAL
S_IS_EXTERN
S_IS_EXTERNAL
. Don't use it.
S_IS_WEAK
S_IS_COMMON
S_IS_DEFINED
S_IS_DEBUG
S_IS_LOCAL
S_IS_EXTERNAL
. The `-L' assembler option affects the return value
of this function.
S_SET_EXTERNAL
S_CLEAR_EXTERNAL
S_SET_WEAK
S_GET_TYPE
S_GET_DESC
S_GET_OTHER
Get the type, desc, and other fields of the symbol. These
are only defined for object file formats for which they make sense (primarily
a.out).
S_SET_TYPE
S_SET_DESC
S_SET_OTHER
Set the type, desc, and other fields of the symbol. These
are only defined for object file formats for which they make sense (primarily
a.out).
S_GET_SIZE
S_SET_SIZE
Expressions are stored in an expressionS structure. The structure is
defined in `expr.h'.
The macro expression will create an expressionS structure based
on the text found at the global variable input_line_pointer.
A single expressionS structure can represent a single operation.
Complex expressions are formed by creating expression symbols and
combining them in expressionS structures. An expression symbol is
created by calling make_expr_symbol. An expression symbol should
naturally never appear in a symbol table, and the implementation of
S_IS_LOCAL (see section Symbols) reflects that. The function
expr_symbol_where returns non-zero if a symbol is an expression symbol,
and also returns the file and line for the expression which caused it to be
created.
The expressionS structure has two symbol fields, a number field, an
operator field, and a field indicating whether the number is unsigned.
The operator field is of type operatorT, and describes how to interpret
the other fields; see the definition in `expr.h' for the possibilities.
An operatorT value of O_big indicates either a floating point
number, stored in the global variable generic_floating_point_number, or
an integer too large to store in an offsetT type, stored in the global
array generic_bignum. This rather inflexible approach makes it
impossible to use floating point numbers or large expressions in complex
expressions.
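As a rough illustration of how one operation per structure composes into a complex expression, the sketch below models (2 + 3) * 4. The field names, the reduced operatorT, and the evaluation rules are simplified assumptions made for this example; the definitions in `expr.h' are authoritative, and the local symbolS here stands in for what make_expr_symbol would create.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; see `expr.h' for the real definitions. */
typedef enum { O_constant, O_symbol, O_add, O_multiply } operatorT;
typedef struct symbol symbolS;

typedef struct {
    symbolS *X_add_symbol;      /* first symbol field */
    symbolS *X_op_symbol;       /* second symbol field */
    long X_add_number;          /* number field */
    operatorT X_op;             /* how to interpret the other fields */
    unsigned X_unsigned : 1;    /* whether the number is unsigned */
} expressionS;

struct symbol { expressionS sy_value; };

static long eval(const expressionS *e);
static long eval_symbol(const symbolS *s) { return eval(&s->sy_value); }

/* Each expressionS holds exactly one operation; operands that are
   themselves expressions are reached through expression symbols. */
static long eval(const expressionS *e)
{
    switch (e->X_op) {
    case O_constant: return e->X_add_number;
    case O_symbol:   return eval_symbol(e->X_add_symbol) + e->X_add_number;
    case O_add:      return eval_symbol(e->X_add_symbol)
                            + eval_symbol(e->X_op_symbol) + e->X_add_number;
    case O_multiply: return eval_symbol(e->X_add_symbol)
                            * eval_symbol(e->X_op_symbol) + e->X_add_number;
    }
    return 0;
}

/* Build (2 + 3) * 4: the sum is wrapped in an expression symbol so that
   the product can refer to it as an operand. */
static long demo_nested_expression(void)
{
    symbolS two   = { { NULL, NULL, 2, O_constant, 0 } };
    symbolS three = { { NULL, NULL, 3, O_constant, 0 } };
    symbolS four  = { { NULL, NULL, 4, O_constant, 0 } };
    symbolS sum   = { { &two, &three, 0, O_add, 0 } };
    expressionS product = { &sum, &four, 0, O_multiply, 0 };

    return eval(&product);
}
```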
A fixup is basically anything which can not be resolved in the first pass. Sometimes a fixup can be resolved by the end of the assembly; if not, the fixup becomes a relocation entry in the object file.
A fixup is created by a call to fix_new or fix_new_exp. Both
take a frag (see section Frags), a position within the frag, a size, an indication
of whether the fixup is PC relative, and a type. In a BFD_ASSEMBLER
GAS, the type is nominally a bfd_reloc_code_real_type, but several
targets use other type codes to represent fixups that can not be described as
relocations.
The fixS structure has a number of fields, several of which are obsolete
or are only used by a particular target. The important fields are:
fx_frag
fx_where
fx_addsy
fx_subsy
fx_offset
fx_addnumber
md_apply_fix
and tc_gen_reloc
. The machine independent code does
not use it.
fx_next
fx_r_type
BFD_ASSEMBLER
, or
if the target defines NEED_FX_R_TYPE
.
fx_size
fx_pcrel
fx_done
fx_file
fx_line
tc_fix_data
This has a type of TC_FIX_TYPE, and is only defined if the target defines
that macro.
The fragS structure is defined in `as.h'. Each frag represents a
portion of the final object file. As GAS reads the source file, it creates
frags to hold the data that it reads. At the end of the assembly the frags and
fixups are processed to produce the final contents.
fr_address
relax_segment fills in this field.
fr_next
fr_fix
fr_var
fr_fix
characters. May be zero.
fr_offset
The interpretation of this field is controlled by fr_type. Generally,
if fr_var is non-zero, this is a repeat count: the fr_var
characters are output fr_offset times.
line
fr_type
Indicates the interpretation of fr_offset, fr_symbol
and the variable-length tail of the frag, as well as the
treatment it gets in various phases of processing. It does not affect the
initial fr_fix characters; they are always supposed to be output
verbatim (fixups aside). See below for specific values this field can have.
fr_subtype
If md_relax_frag isn't defined, this is
assumed to be an index into TC_GENERIC_RELAX_TABLE for the generic
relaxation code to process (see section Relaxation). If md_relax_frag is
defined, this field is available for any use by the CPU-specific code.
fr_symbol
fr_type
.
fr_opcode
tc_frag_data
TC_FRAG_TYPE
is defined.
fr_file
fr_line
fr_literal
These are the possible relaxation states, provided in the enumeration type
relax_stateT, and the interpretations they represent for the other
fields:
rs_align
rs_align_code
fr_offset is the logarithm (base 2) of the alignment in bytes.
(For example, if alignment on an 8-byte boundary were desired, fr_offset
would have a value of 3.) The variable characters indicate the fill pattern to
be used. The fr_subtype field holds the maximum number of bytes to skip
when doing this alignment. If more bytes are needed, the alignment is not
done. An fr_subtype value of 0 means no maximum, which is the normal
case. Target backends can use rs_align_code to handle certain types of
alignment differently.
rs_broken_word
rs_cfa
fr_symbol is an expression symbol for the subtraction which may be
relaxed. The fr_opcode field holds the frag for the preceding command
byte. The fr_offset field holds the offset within that frag. The
fr_subtype field is used during relaxation to hold the current size of
the frag.
rs_fill
The variable characters are repeated fr_offset times. If
fr_offset is 0, this frag has a length of fr_fix. Most frags
have this type.
rs_leb128
fr_symbol is always an expression
symbol, as constant expressions are emitted directly. The fr_offset
field is used during relaxation to hold the previous size of the number so
that we can determine if the fragment changed size.
rs_machine_dependent
fr_symbol and fr_offset, and fr_subtype indicates the
particular machine-specific addressing mode desired. See section Relaxation.
rs_org
fr_symbol and fr_offset; one
character from the variable-length tail is used as the fill character.
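The rs_align rules above (a base-2 logarithm in fr_offset, and an optional maximum skip in fr_subtype) can be sketched as a small calculation. The function align_padding and its parameters are hypothetical names chosen for this illustration; this is not the code GAS uses.

```c
#include <assert.h>

/* Number of fill bytes an alignment frag would contribute at `address`.
   log2_align plays the role of fr_offset and max_skip the role of
   fr_subtype.  A max_skip of 0 means no maximum.  If more than max_skip
   bytes would be needed, the alignment is not done and 0 is returned. */
static unsigned long align_padding(unsigned long address, int log2_align,
                                   unsigned long max_skip)
{
    unsigned long mask = (1UL << log2_align) - 1;
    unsigned long aligned = (address + mask) & ~mask;
    unsigned long pad = aligned - address;

    if (max_skip != 0 && pad > max_skip)
        return 0;
    return pad;
}
```

For example, with an 8-byte alignment (log2_align of 3), address 5 needs 3 fill bytes, while address 9 with a maximum skip of 4 is left unaligned because 7 bytes would be required.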
A chain of frags is built up for each subsection. The data structure
describing a chain is called a frchainS, and contains the following
fields:
frch_root
frch_last
frch_next
Next in the list of frchainS structures.
frch_seg
frch_subseg
fix_root, fix_tail
(Defined only if BFD_ASSEMBLER is defined). Point to first and last
fixS structures associated with this subsection.
frch_obstack
frch_frag_now
A frchainS corresponds to a subsection; each section has a list of
frchainS records associated with it. In most cases, only one subsection
of each section is used, so the list will only be one element long, but any
processing of frag chains should be prepared to deal with multiple chains per
section.
After the input files have been completely processed, and no more frags are to be generated, the frag chains are joined into one per section for further processing. After this point, it is safe to operate on one chain per section.
The assembler always has a current frag, named frag_now. More space is
allocated for the current frag using the frag_more function; this
returns a pointer to the requested amount of space. Relaxing is done using
variant frags allocated by frag_var or frag_variant
(see section Relaxation).
This is a quick look at what an assembler run looks like.
The read_a_source_file function reads in the file
and parses it. The global variable input_line_pointer points to the
current text; it is guaranteed to be correct up to the end of the line, but not
farther.
colon function, and
isolates the first word. If it looks like a pseudo-op, the word is looked up
in the pseudo-op hash table po_hash and dispatched to a pseudo-op
routine. Otherwise, the target dependent md_assemble routine is called
to parse the instruction.
frag_more to get space to store it in.
fix_new or fix_new_exp.
The write_object_file routine is
called. It assigns addresses to all the frags (relax_segment), resolves
all the fixups (fixup_segment), resolves all the symbol values (using
resolve_symbol_value), and finally writes out the file (in the
BFD_ASSEMBLER case, this is done by simply calling bfd_close).
Each GAS target specifies two main things: the CPU file and the object format
file. Two main switches in the `configure.in' file handle this. The
first switches on CPU type to set the shell variable cpu_type. The
second switches on the entire target to set the shell variable fmt.
The configure script uses the value of cpu_type to select two files in
the `config' directory: `tc-CPU.c' and `tc-CPU.h'.
The configuration process will create a file named `targ-cpu.h' in the
build directory which includes `tc-CPU.h'.
The configure script also uses the value of fmt to select two files:
`obj-fmt.c' and `obj-fmt.h'. The configuration process
will create a file named `obj-format.h' in the build directory which
includes `obj-fmt.h'.
You can also set the emulation in the configure script by setting the em
variable. Normally the default value of `generic' is fine. The
configuration process will create a file named `targ-env.h' in the build
directory which includes `te-em.h'.
Porting GAS to a new CPU requires writing the `tc-CPU' files. Porting GAS to a new object file format requires writing the `obj-fmt' files. There is sometimes some interaction between these two files, but it is normally minimal.
The best approach is, of course, to copy existing files. The documentation below assumes that you are looking at existing files to see usage details.
These interfaces have grown over time, and have never been carefully thought out or designed. Nothing about the interfaces described here is cast in stone. It is possible that they will change from one version of the assembler to the next. Also, new macros are added all the time as they are needed.
The CPU backend files are the heart of the assembler. They are the only parts of the assembler which actually know anything about the instruction set of the processor.
You must define a reasonably small list of macros and functions in the CPU backend files. You may define a large number of additional macros in the CPU backend files, not all of which are documented here. You must, of course, define macros in the `.h' file, which is included by every assembler source file. You may define the functions as macros in the `.h' file, or as functions in the `.c' file.
TC_CPU
TC_M68K
. You might have to use this
if it is necessary to add CPU specific code to the object format file.
TARGET_FORMAT
OBJ_FMT
macro.
TARGET_ARCH
bfd_set_arch_mach
.
TARGET_MACH
bfd_set_arch_mach
. If
it is not defined, GAS will use 0.
TARGET_BYTES_BIG_ENDIAN
md_shortopts
md_longopts
md_longopts_size
md_parse_option
md_show_usage
md_shortopts is a const char * which GAS adds to the machine
independent string passed to getopt. md_longopts is a
struct option [] which GAS adds to the machine independent long options
passed to getopt; you may use OPTION_MD_BASE, defined in
`as.h', as the start of a set of long option indices, if necessary.
md_longopts_size is a size_t holding the size of md_longopts.
GAS will call md_parse_option whenever getopt returns an
unrecognized code, presumably indicating a special code value which appears in
md_longopts. GAS will call md_show_usage when a usage message is
printed; it should print a description of the machine specific options.
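A backend's option definitions might look roughly like the sketch below. The option names, the value given to OPTION_MD_BASE, the md_parse_option signature and its return convention (non-zero when the option was recognized) are all assumptions made for this example; only struct option and no_argument come from the standard getopt.h header.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Stand-in value; the real OPTION_MD_BASE comes from `as.h'. */
#define OPTION_MD_BASE 290
/* Hypothetical machine specific long options. */
#define OPTION_BIG_ENDIAN    (OPTION_MD_BASE + 0)
#define OPTION_LITTLE_ENDIAN (OPTION_MD_BASE + 1)

const char *md_shortopts = "m:";

struct option md_longopts[] = {
    { "big-endian",    no_argument, NULL, OPTION_BIG_ENDIAN },
    { "little-endian", no_argument, NULL, OPTION_LITTLE_ENDIAN },
    { NULL,            no_argument, NULL, 0 }
};

size_t md_longopts_size = sizeof (md_longopts);

static int demo_big_endian = 0;
static const char *demo_model = "generic";

/* Assumed convention: return non-zero if the code was recognized. */
int md_parse_option(int c, const char *arg)
{
    switch (c) {
    case 'm':
        demo_model = arg;
        return 1;
    case OPTION_BIG_ENDIAN:
        demo_big_endian = 1;
        return 1;
    case OPTION_LITTLE_ENDIAN:
        demo_big_endian = 0;
        return 1;
    default:
        return 0;
    }
}
```

Numbering the long options upward from OPTION_MD_BASE keeps their codes clear of the machine independent ones.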
md_begin
md_cleanup
md_assemble
md_assemble will do this by calling frag_more and writing out
some bytes (see section Frags). md_assemble will call fix_new to
create fixups as needed (see section Fixups). Targets which need to do special
purpose relaxation will call frag_var.
md_pseudo_table
This is an array of type pseudo_typeS. It is a mapping from
pseudo-op names to functions. You should use this table to implement
pseudo-ops which are specific to the CPU.
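The mapping from pseudo-op names to functions can be pictured with the sketch below. The layout of pseudo_typeS, the handler s_demo_cons, the table entries and the linear search are assumptions for illustration; GAS itself looks pseudo-ops up in the po_hash hash table.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout: a name, a handler, and a value passed to the handler. */
typedef struct {
    const char *poc_name;
    void (*poc_handler)(int);
    int poc_val;
} pseudo_typeS;

static int demo_last_size;

/* Hypothetical handler: records how many bytes each value would occupy. */
static void s_demo_cons(int nbytes) { demo_last_size = nbytes; }

/* Hypothetical CPU specific pseudo-ops, terminated by a null name. */
const pseudo_typeS md_pseudo_table[] = {
    { "half",  s_demo_cons, 2 },
    { "dword", s_demo_cons, 8 },
    { NULL,    NULL,        0 }
};

/* Linear lookup, used here only to keep the example short. */
static int dispatch_pseudo_op(const char *name)
{
    const pseudo_typeS *p;

    for (p = md_pseudo_table; p->poc_name != NULL; p++) {
        if (strcmp(p->poc_name, name) == 0) {
            p->poc_handler(p->poc_val);
            return 1;
        }
    }
    return 0;
}
```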
tc_conditional_pseudoop
pseudo_typeS argument.
It should return non-zero if the pseudo-op is a conditional which controls
whether code is assembled, such as `.if'. GAS knows about the normal
conditional pseudo-ops, and you should normally not have to define this macro.
comment_chars
This is a const char array of characters which start a
comment.
tc_comment_chars
comment_chars.
line_comment_chars
This is a const char array of characters which start a
comment when they appear at the start of a line.
line_separator_chars
This is a const char array of characters which separate
lines (the semicolon is such a character by default, and need not be listed in
this array).
EXP_CHARS
This is a const char array of characters which may be
used as the exponent character in a floating point number. This is normally
"eE".
FLT_CHARS
This is a const char array of characters which may be
used to indicate a floating point constant. A zero followed by one of these
characters is assumed to be followed by a floating point number; thus they
operate the way that 0x is used to indicate a hexadecimal constant.
Usually this includes `r' and `f'.
LEX_AT
LEX_NAME and LEX_BEGIN_NAME,
both defined in `read.h'. LEX_NAME indicates that the character
may appear in a name. LEX_BEGIN_NAME indicates that the character may
appear at the beginning of a name.
LEX_BR
LEX_PCT
LEX_QM
LEX_DOLLAR
LEX_NAME | LEX_BEGIN_NAME
.
SINGLE_QUOTE_STRINGS
NO_STRING_ESCAPES
ONLY_STANDARD_ESCAPES
md_start_line_hook
LABELS_WITHOUT_COLONS
TC_START_LABEL
NO_PSEUDO_DOT
TC_EQUAL_IN_INSN
TC_EOL_IN_INSN
md_parse_name
reg_section
.
md_undefined_symbol
md_begin
is called.
md_operand
input_line_pointer
will point to the start
of the expression.
tc_unrecognized_line
md_do_align
HANDLE_ALIGN
md_flush_pending_output
TC_PARSE_CONS_EXPRESSION
.word
. You can use this to recognize relocation
directives that may appear in such directives.
BITFIELD_CONS_EXPRESSION
REPEAT_CONS_EXPRESSION
md_cons_align
TC_CONS_FIX_NEW
TC_INIT_FIX_DATA (fixp)
TC_FIX_TYPE
macro.
TC_FIX_DATA_PRINT (stream, fixp)
print_fixup
.
TC_FRAG_INIT (fragp)
TC_FRAG_TYPE
macro.
md_number_to_chars
number_to_chars_bigendian or
number_to_chars_littleendian, whichever is appropriate. On targets like
the MIPS which support options to change the endianness, which function to call
is a runtime decision. On other targets, md_number_to_chars can be a
simple macro.
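The runtime choice between the two byte orders might be sketched as follows. All demo_ names are hypothetical stand-ins written for this example; they show the technique rather than the actual GAS routines.

```c
#include <assert.h>

/* Store the low n bytes of val in buf, most significant byte first. */
static void demo_number_to_chars_bigendian(char *buf, unsigned long val, int n)
{
    while (n-- > 0) {
        buf[n] = (char) (val & 0xff);
        val >>= 8;
    }
}

/* Store the low n bytes of val in buf, least significant byte first. */
static void demo_number_to_chars_littleendian(char *buf, unsigned long val, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        buf[i] = (char) (val & 0xff);
        val >>= 8;
    }
}

/* Hypothetical flag that a command line option could change. */
static int demo_target_big_endian = 1;

/* Runtime dispatch, as a target with selectable endianness would need. */
static void demo_md_number_to_chars(char *buf, unsigned long val, int n)
{
    if (demo_target_big_endian)
        demo_number_to_chars_bigendian(buf, val, n);
    else
        demo_number_to_chars_littleendian(buf, val, n);
}
```

On a target with a fixed byte order, the dispatch function collapses to a direct call, which is why a simple macro is enough there.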
md_reloc_size
BFD_ASSEMBLER
and not MANY_SEGMENTS
). It holds the size of a
relocation entry.
WORKING_DOT_WORD
md_short_jump_size
md_long_jump_size
md_create_short_jump
md_create_long_jump
If WORKING_DOT_WORD is defined, GAS will not do broken word processing
(see section Broken words). Otherwise, you should set md_short_jump_size to
the size of a short jump (a jump that is just long enough to jump around a long
jmp) and md_long_jump_size to the size of a long jump (a jump that can
go anywhere in the function). You should define md_create_short_jump to
create a short jump around a long jump, and define md_create_long_jump
to create a long jump.
md_estimate_size_before_relax
rs_machine_dependent
frag before any relaxing is done. It may also create any necessary
relocations.
md_relax_frag
md_relax_frag should
return the change in size of the frag. See section Relaxation.
TC_GENERIC_RELAX_TABLE
If you do not define md_relax_frag, you may define
TC_GENERIC_RELAX_TABLE as a table of relax_typeS structures. The
machine independent code knows how to use such a table to relax PC relative
references. See `tc-m68k.c' for an example. See section Relaxation.
md_prepare_relax_scan
LINKER_RELAXING_SHRINKS_ONLY
md_begin
), a
`.align' directive will cause extra space to be allocated. The linker can
then discard this space when relaxing the section.
md_convert_frag
md_apply_fix
TC_HANDLES_FX_DONE
md_apply_fix
correctly sets the
fx_done
field in the fixup.
tc_gen_reloc
A BFD_ASSEMBLER GAS will call this to generate a reloc. GAS will pass
the resulting reloc to bfd_install_relocation. This currently works
poorly, as bfd_install_relocation often does the wrong thing, and
instances of tc_gen_reloc have been written to work around the problems,
which in turn makes it difficult to fix bfd_install_relocation.
RELOC_EXPANSION_POSSIBLE
If this macro is defined, tc_gen_reloc may return multiple
relocation entries for a single fixup. In this case, the return value of
tc_gen_reloc is a pointer to a null terminated array.
MAX_RELOC_EXPANSION
Must be defined if RELOC_EXPANSION_POSSIBLE is defined; it
indicates the largest number of relocs which tc_gen_reloc may return for
a single fixup.
tc_fix_adjustable
MD_PCREL_FROM_SECTION
md_pcrel_from
MD_PCREL_FROM_SECTION
. The difference is
that md_pcrel_from
does not take a section argument.
tc_frob_label
md_section_align
tc_frob_section
BFD_ASSEMBLER
GAS will call it for each
section at the end of the assembly.
tc_frob_file_before_adjust
tc_frob_symbol
tc_frob_file
tc_frob_file_after_relocs
LISTING_HEADER
"GAS LISTING"
.
LISTING_WORD_SIZE
LISTING_LHS_WIDTH
LISTING_WORD_SIZE
bytes. The
default value is 1.
LISTING_LHS_WIDTH_SECOND
LISTING_LHS_WIDTH
, but applying to the second and subsequent line
of the data printed for a particular source line. The default value is 1.
LISTING_LHS_CONT_LINES
LISTING_RHS_WIDTH
As with the CPU backend, the object format backend must define a few things, and may define some other things. The interface to the object format backend is generally simpler; most of the support for an object file format consists of defining a number of pseudo-ops.
The object format `.h' file must include `targ-cpu.h'.
This section will only define the BFD_ASSEMBLER version of GAS. It is
impossible to support a new object file format using any other version anyhow,
as the original GAS version only supports a.out, and the MANY_SEGMENTS
GAS version only supports COFF.
OBJ_format
OBJ_ELF
. You might have to use this
if it is necessary to add object file format specific code to the CPU file.
obj_begin
obj_app_file
.file
pseudo-op or a `#' line as used by the C preprocessor.
OBJ_COPY_SYMBOL_ATTRIBUTES
obj_fix_adjustable
obj_sec_sym_ok_for_reloc
EMIT_SECTION_SYMBOLS
obj_adjust_symtab
.file
symbol if none was generated previously.
SEPARATE_STAB_SECTIONS
INIT_STAB_SECTION
OBJ_PROCESS_STAB
obj_frob_section
obj_frob_file_before_adjust
obj_frob_symbol
obj_frob_file
obj_frob_file_after_relocs
Normally you do not have to write an emulation file. You can just use `te-generic.h'.
If you do write your own emulation file, it must include `obj-format.h'.
An emulation file will often define TE_EM; this may then be used
in other files to change the output.
Relaxation is a generic term used when the size of some instruction or data depends upon the value of some symbol or other data.
GAS knows to relax a particular type of PC relative relocation using a table. You can also define arbitrarily complex forms of relaxation yourself.
If you do not define md_relax_frag, and you do define
TC_GENERIC_RELAX_TABLE, GAS will relax rs_machine_dependent frags
based on the frag subtype and the displacement to some specified target
address. The basic idea is that several machines have different addressing
modes for instructions that can specify different ranges of values, with
successive modes able to access wider ranges, including the entirety of the
previous range. Smaller ranges are assumed to be more desirable (perhaps the
instruction requires one word instead of two or three); if this is not the
case, don't describe the smaller-range, inferior mode.
The fr_subtype field of a frag is an index into a CPU-specific
relaxation table. That table entry indicates the range of values that can be
stored, the number of bytes that will have to be added to the frag to
accommodate the addressing mode, and the index of the next entry to examine if
the value to be stored is outside the range accessible by the current
addressing mode. The fr_symbol field of the frag indicates what symbol
is to be accessed; the fr_offset field is added in.
If the TC_PCREL_ADJUST macro is defined, which currently should only happen
for the NS32k family, the TC_PCREL_ADJUST macro is called on the frag to
compute an adjustment to be made to the displacement.
The value fitted by the relaxation code is always assumed to be a displacement
from the current frag. (More specifically, from fr_fix bytes into the
frag.)
The end of the relaxation sequence is indicated by a "next" value of 0. This means that the first entry in the table can't be used.
For some configurations, the linker can do relaxing within a section of an object file. If call instructions of various sizes exist, the linker can determine which should be used in each instance, when a symbol's value is resolved. In order for the linker to avoid wasting space and having to insert no-op instructions, it must be able to expand or shrink the section contents while still preserving intra-section references and meeting alignment requirements.
For the i960 using b.out format, no expansion is done; instead, each `.align' directive causes extra space to be allocated, enough that when the linker is relaxing a section and removing unneeded space, it can discard some or all of this extra padding and cause the following data to be correctly aligned.
For the H8/300, I think the linker expands calls that can't reach, and doesn't worry about alignment issues; the cpu probably never needs any significant alignment beyond the instruction size.
The relaxation table type contains these fields:
long rlx_forward
long rlx_backward
rlx_length
rlx_more
The relaxation is done in relax_segment in `write.c'. The
difference in the length fields between the original mode and the one finally
chosen by the relaxing code is taken as the size by which the current frag will
be increased in size. For example, if the initial relaxing mode has a length
of 2 bytes, and because of the size of the displacement, it gets upgraded to a
mode with a size of 6 bytes, it is assumed that the frag will grow by 4 bytes.
(The initial two bytes should have been part of the fixed portion of the frag,
since it is already known that they will be output.) This growth must be
effected by md_convert_frag; it should increase the fr_fix field
by the appropriate size, and fill in the appropriate bytes of the frag.
(Enough space for the maximum growth should have been allocated in the call to
frag_var as the second argument.)
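The table walk and the growth calculation can be sketched as below. The field types, the sign convention for rlx_backward (a negative limit), and the three hypothetical addressing modes are assumptions for this example; the real table type and the code in relax_segment may differ in detail.

```c
#include <assert.h>
#include <limits.h>

/* Assumed field types for the relaxation table entry. */
typedef struct {
    long rlx_forward;   /* largest forward displacement that fits */
    long rlx_backward;  /* largest backward displacement (negative) */
    int rlx_length;     /* bytes this mode occupies */
    int rlx_more;       /* next entry to try, or 0 to stop */
} relax_typeS;

/* Hypothetical table.  Entry 0 is unused because a "next" value of 0
   marks the end of the sequence. */
static const relax_typeS demo_relax_table[] = {
    { 0,        0,        0, 0 },
    { 127,      -128,     2, 2 },   /* byte displacement */
    { 32767,    -32768,   4, 3 },   /* word displacement */
    { LONG_MAX, LONG_MIN, 6, 0 }    /* long displacement */
};

/* Follow rlx_more links until the displacement fits or the chain ends. */
static int demo_relax_state(const relax_typeS *table, int state, long disp)
{
    for (;;) {
        const relax_typeS *t = &table[state];

        if (disp >= t->rlx_backward && disp <= t->rlx_forward)
            return state;
        if (t->rlx_more == 0)
            return state;
        state = t->rlx_more;
    }
}

/* Growth is the difference between the final and initial lengths. */
static int demo_relax_growth(const relax_typeS *table, int start, long disp)
{
    int final_state = demo_relax_state(table, start, disp);

    return table[final_state].rlx_length - table[start].rlx_length;
}
```

With this table, a displacement of 100000 starting from entry 1 ends at entry 3, so the frag grows from 2 bytes to 6, a growth of 4 bytes as in the example in the text.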
If relocation records are needed, they should be emitted by
md_estimate_size_before_relax. This function should examine the target
symbol of the supplied frag and correct the fr_subtype of the frag if
needed. When this function is called, if the symbol has not yet been defined,
it will not become defined later; however, its value may still change if the
section it is in gets relaxed.
Usually, if the symbol is in the same section as the frag (given by the
sec argument), the narrowest likely relaxation mode is stored in
fr_subtype, and that's that.
If the symbol is undefined, or in a different section (and therefore moveable
to an arbitrarily large distance), the largest available relaxation mode is
specified, fix_new is called to produce the relocation record,
fr_fix is increased to include the relocated field (remember, this
storage was allocated when frag_var was called), and frag_wane is
called to convert the frag to an rs_fill frag with no variant part.
Sometimes changing addressing modes may also require rewriting the instruction.
It can be accessed via fr_opcode or fr_fix.
Sometimes fr_var is increased instead, and frag_wane is not
called. I'm not sure, but I think this is to keep fr_fix referring to
an earlier byte, and fr_type set to rs_machine_dependent so
that md_convert_frag will get called.
If using a simple table is not suitable, you may implement arbitrarily complex relaxation semantics yourself. For example, the MIPS backend uses this to emit different instruction sequences depending upon the size of the symbol being accessed.
When you assemble an instruction that may need relaxation, you should allocate
a frag using frag_var or frag_variant with a type of
rs_machine_dependent. You should store some sort of information in the
fr_subtype field so that you can figure out what to do with the frag
later.
When GAS reaches the end of the input file, it will look through the frags and work out their final sizes.
GAS will first call md_estimate_size_before_relax on each
rs_machine_dependent frag. This function must return an estimated size
for the frag.
GAS will then loop over the frags, calling md_relax_frag on each
rs_machine_dependent frag. This function should return the change in
size of the frag. GAS will keep looping over the frags until none of the frags
changes size.
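The "loop until nothing changes" behaviour can be illustrated with a toy model. Everything here is hypothetical: the demo_frag type, the two branch sizes (2 and 4 bytes), the displacement range and the rule that frags only ever grow are simplifications chosen so the example stays short and always terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Toy frag: either plain data or a branch to the start of another frag. */
typedef struct {
    int is_branch;
    long size;
    size_t target;      /* index of the frag branched to */
} demo_frag;

/* Address of frag i is the sum of the sizes of the frags before it. */
static long demo_frag_address(const demo_frag *frags, size_t i)
{
    long addr = 0;
    size_t k;

    for (k = 0; k < i; k++)
        addr += frags[k].size;
    return addr;
}

/* Stand-in for md_relax_frag: returns the change in size of the frag. */
static long demo_relax_frag(demo_frag *frags, size_t i)
{
    long disp, want, growth;

    if (!frags[i].is_branch)
        return 0;
    disp = demo_frag_address(frags, frags[i].target)
           - (demo_frag_address(frags, i) + frags[i].size);
    want = (disp >= -128 && disp <= 127) ? 2 : 4;
    if (want <= frags[i].size)
        return 0;               /* never shrink in this toy model */
    growth = want - frags[i].size;
    frags[i].size = want;
    return growth;
}

/* Keep making passes until a pass changes no frag; return the pass count. */
static int demo_relax_until_stable(demo_frag *frags, size_t n)
{
    int passes = 0;
    int changed;
    size_t i;

    do {
        changed = 0;
        for (i = 0; i < n; i++)
            if (demo_relax_frag(frags, i) != 0)
                changed = 1;
        passes++;
    } while (changed);
    return passes;
}
```

Growing one branch can push another branch's target out of range, which is why a single pass is not enough and the loop has to run until a pass makes no change.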
Some compilers, including GCC, will sometimes emit switch tables specifying
16-bit .word displacements to branch targets, and branch instructions
that load entries from that table to compute the target address. If this is
done on a 32-bit machine, there is a chance (at least with really large
functions) that the displacement will not fit in 16 bits. The assembler
handles this using a concept called broken words. This idea is well
named, since there is an implied promise that the 16-bit field will in fact
hold the specified displacement.
If broken word processing is enabled, and a situation like this is encountered,
the assembler will insert a jump instruction into the instruction stream, close
enough to be reached with the 16-bit displacement. This jump instruction will
transfer to the real desired target address. Thus, as long as the .word
value really is used as a displacement to compute an address to jump to, the
net effect will be correct (minus a very small efficiency cost). If
.word directives with label differences for values are used for other
purposes, however, things may not work properly. For targets which use broken
words, the `-K' option will warn when a broken word is discovered.
The broken word code is turned off by the WORKING_DOT_WORD macro. It
isn't needed if .word emits a value large enough to contain an address
(or, more correctly, any possible difference between two addresses).
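The decision behind a broken word can be sketched in a few lines. The function names, the signed 16-bit range and the idea of passing the address of the inserted jump as a parameter are assumptions made for this illustration, not the actual GAS implementation.

```c
#include <assert.h>

/* Assumes the .word holds a signed 16-bit displacement. */
static int demo_fits_in_16_bits(long disp)
{
    return disp >= -32768 && disp <= 32767;
}

/* Value to store in the .word: the direct displacement when it fits,
   otherwise the displacement to a nearby inserted jump that transfers
   to the real target. */
static long demo_broken_word_value(long table_base, long target,
                                   long inserted_jump)
{
    long disp = target - table_base;

    if (demo_fits_in_16_bits(disp))
        return disp;
    return inserted_jump - table_base;
}
```

When the stored value is only ever used as a branch displacement, both cases end up at the same target; when it is used as data, the second case silently yields a different number, which is the hazard the text describes.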
This section describes basic internal functions used by GAS.
vfprintf
, and a final newline.
An error indicated by as_bad will result in a non-zero exit status when
the assembler has finished. Calling as_fatal will result in immediate
termination of the assembler process.
valueT value into printable
format, in case it's wider than modes that *printf can handle. If the
type is narrow enough, a decimal number will be produced; otherwise, it will be
in hexadecimal. The value itself is not examined to make this determination.
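The rule that the format depends on the width of the type, and not on the value, can be sketched as follows. The names demo_valueT and demo_sprint_value are hypothetical, and the use of unsigned long as the "narrow enough" threshold is an assumption for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for valueT. */
typedef unsigned long long demo_valueT;

/* Format val as decimal when the type is no wider than unsigned long,
   and as hexadecimal otherwise.  Only sizeof is consulted, so every
   value of the type is printed in the same style on a given host. */
static void demo_sprint_value(char *buf, size_t len, demo_valueT val)
{
    if (sizeof (demo_valueT) <= sizeof (unsigned long))
        snprintf(buf, len, "%lu", (unsigned long) val);
    else
        snprintf(buf, len, "0x%llx", val);
}
```

Which branch is taken depends on the host, so the same value may print as decimal on one build and as hexadecimal on another.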
The test suite is kind of lame for most processors. Often it only checks to
see if a couple of files can be assembled without the assembler reporting any
errors. For more complete testing, write a test which either examines the
assembler listing, or runs objdump and examines its output. For the
latter, the TCL procedure run_dump_test may come in handy. It takes the
base name of a file, and looks for `file.d'. This file should
contain as its initial lines a set of variable settings in `#' comments,
in the form:
#varname: value
The varname may be objdump, nm, or as, in which case
it specifies the options to be passed to the specified programs. Exactly one
of objdump or nm must be specified, as that also specifies which
program to run after the assembler has finished. If varname is
source, it specifies the name of the source file; otherwise,
`file.s' is used. If varname is name, it specifies the
name of the test to be used in the pass or fail messages.
The non-commented parts of the file are interpreted as regular expressions, one
per line. Blank lines in the objdump or nm output are skipped,
as are blank lines in the .d file; the other lines are tested to see if
the regular expression matches the program output. If it does not, the test
fails.
Note that this means the tests must be modified if the objdump output
style is changed.
This document was generated on 7 April 1999 using the texi2html translator version 1.52.