IT박스

How many GCC optimization levels are there?

itboxs 2020. 9. 7. 07:56

I tried gcc -O1, gcc -O2, gcc -O3, and gcc -O4.

If I use a really large number, it does not work.

However, I tried

gcc -O100

and it compiled.

How many optimization levels are there?


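To be pedantic, there are 8 different valid -O options you can give to gcc, though some of them mean the same thing.

The original version of this answer stated that there were 7 options. GCC has since added -Og, bringing the total to 8.

From the man page:

  • -O (same as -O1)
  • -O0 (do no optimization; the default if no optimization level is specified)
  • -O1 (optimize minimally)
  • -O2 (optimize more)
  • -O3 (optimize even more)
  • -Ofast (optimize very aggressively, to the point of breaking standards compliance)
  • -Og (Optimize the debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.)
  • -Os (Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version)

There may also be platform-specific optimizations; as @pauldoo notes, OS X has -Oz.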
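One way to see from inside a program which family of -O option was used is to look at GCC's predefined macros. The following is a minimal sketch; the function name build_kind is made up for this illustration. It relies only on __OPTIMIZE__, __OPTIMIZE_SIZE__ and __FAST_MATH__, which GCC documents. Note that these macros cannot tell -O1, -O2 and -O3 apart.

```c
/* Hypothetical helper: describes how this translation unit was compiled,
   using only GCC's documented predefined macros.  */
const char *build_kind(void)
{
#if defined(__OPTIMIZE_SIZE__)
    /* Defined when optimizing for size, e.g. with -Os.  */
    return "optimizing for size";
#elif defined(__OPTIMIZE__) && defined(__FAST_MATH__)
    /* __FAST_MATH__ is defined by -ffast-math, which -Ofast turns on.  */
    return "optimizing, with fast-math enabled";
#elif defined(__OPTIMIZE__)
    /* Defined in all optimizing compilations.  */
    return "optimizing";
#else
    return "not optimizing";
#endif
}
```

Compiling a small program that prints build_kind() once with -O0 and once with -O100 should show whether the large number was accepted as an optimizing level.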


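Let's interpret the source code of GCC 5.1 to see what happens on -O100, since it is not clear from the man page.

We shall conclude that:

  • anything above -O3 up to INT_MAX is the same as -O3, but that could easily change in the future, so don't rely on it.
  • GCC 5.1 runs undefined behavior if you enter integers larger than INT_MAX.
  • the argument can only contain digits, or it fails gracefully. In particular, this excludes negative integers like -O-1.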

Focus on subprograms

First, remember that GCC is just a front-end for cpp, as, cc1 and collect2. A quick ./XXX --help says that only collect2 and cc1 take -O, so let's focus on them.

And:

gcc -v -O100 main.c |& grep 100

gives:

COLLECT_GCC_OPTIONS='-O100' '-v' '-mtune=generic' '-march=x86-64'
/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.1.0/cc1 [[noise]] hello_world.c -O100 -o /tmp/ccetECB5.

so -O was forwarded to both cc1 and collect2.

O in common.opt

common.opt is a GCC specific CLI option description format described in the internals documentation and translated to C by opth-gen.awk and optc-gen.awk.

It contains the following interesting lines:

O
Common JoinedOrMissing Optimization
-O<number>  Set optimization level to <number>

Os
Common Optimization
Optimize for space rather than speed

Ofast
Common Optimization
Optimize for speed disregarding exact standards compliance

Og
Common Optimization
Optimize for debugging experience rather than speed or size

which specify all the O options. Note how -O<n> is in a separate family from the other Os, Ofast and Og.

When we build, this generates an options.h file that contains:

OPT_O = 139,                               /* -O */
OPT_Ofast = 140,                           /* -Ofast */
OPT_Og = 141,                              /* -Og */
OPT_Os = 142,                              /* -Os */

As a bonus, while we are grepping for \bO\n inside common.opt we notice the lines:

-optimize
Common Alias(O)

which teaches us that --optimize (double dash because it starts with a dash -optimize on the .opt file) is an undocumented alias for -O which can be used as --optimize=3!

Where OPT_O is used

Now we grep:

git grep -E '\bOPT_O\b'

which points us to two files: opts.c and lto-wrapper.c.

Let's first track down opts.c

opts.c:default_options_optimization

All opts.c usages happen inside: default_options_optimization.

We backtrack with grep to see who calls this function, and we see that the only code path is:

  • main.c:main
  • toplev.c:toplev::main
  • opts-global.c:decode_opts
  • opts.c:default_options_optimization

and main.c is the entry point of cc1. Good!

The first part of this function:

  • does integral_argument which calls atoi on the string corresponding to OPT_O to parse the input argument
  • stores the value inside opts->x_optimize where opts is a struct gcc_options.

struct gcc_options

After grepping in vain, we notice that this struct is also generated at options.h:

struct gcc_options {
    int x_optimize;
    [...]
}

where x_optimize comes from the lines:

Variable
int optimize

present in common.opt, and that options.c contains:

struct gcc_options global_options;

so we guess that this is what contains the entire configuration global state, and int x_optimize is the optimization value.

255 is an internal maximum

In opts.c:integral_argument, atoi is applied to the input argument, so INT_MAX is an upper bound. And if you put in anything larger, it seems that GCC runs into C undefined behaviour. Ouch?

integral_argument also thinly wraps atoi and rejects the argument if any character is not a digit. So negative values fail gracefully.
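The digits-only check described above can be sketched as follows. This is not the GCC source: parse_level is a hypothetical name, and unlike the atoi-based code discussed here it uses strtol with a range check, so that oversized input is rejected instead of running into undefined behaviour. Treating the empty string as an error is also an assumption of this sketch.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper modelled on the behaviour described above.
   Returns the parsed value, or -1 if ARG is empty, contains any
   non-digit character, or does not fit in an int.  */
int parse_level(const char *arg)
{
    const char *p;
    long value;

    if (*arg == '\0')
        return -1;

    /* Reject anything that is not a decimal digit, including '-'.  */
    for (p = arg; *p != '\0'; p++)
        if (!isdigit((unsigned char) *p))
            return -1;

    /* strtol reports overflow through errno, which atoi cannot do.  */
    errno = 0;
    value = strtol(arg, NULL, 10);
    if (errno == ERANGE || value > INT_MAX)
        return -1;

    return (int) value;
}
```

With this sketch, "100" parses to 100, while "-1", "3x" and a 20-digit number are all rejected with -1.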

Back to opts.c:default_options_optimization, we see the line:

if ((unsigned int) opts->x_optimize > 255)
  opts->x_optimize = 255;

so that the optimization level is truncated to 255. While reading opth-gen.awk I had come across:

# All of the optimization switches gathered together so they can be saved and restored.
# This will allow attribute((cold)) to turn on space optimization.

and on the generated options.h:

struct GTY(()) cl_optimization
{
  unsigned char x_optimize;

which explains the truncation: the options must also be forwarded to cl_optimization, which uses an unsigned char to save space. So 255 is actually an internal maximum.
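To make the role of the clamp concrete, here is a small sketch that reuses the comparison quoted above. The function names are invented for this illustration. It shows that the unsigned cast also catches negative values, and what would happen if a value above 255 were stored in an unsigned char without clamping.

```c
/* Same comparison as the excerpt above: the cast to unsigned turns
   negative values into very large ones, so they are clamped as well.  */
int clamp_level(int level)
{
    if ((unsigned int) level > 255)
        level = 255;
    return level;
}

/* Stores a clamped level in the narrow type used by cl_optimization.  */
unsigned char store_level(int level)
{
    return (unsigned char) clamp_level(level);
}

/* For contrast only: without the clamp the value wraps modulo 256,
   so 256 would silently become 0.  */
unsigned char store_unclamped(int level)
{
    return (unsigned char) level;
}
```

Here store_level(256) yields 255, whereas store_unclamped(256) yields 0, which would look like -O0.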

opts.c:maybe_default_options

Back to opts.c:default_options_optimization, we come across maybe_default_options which sounds interesting. We enter it, and then maybe_default_option where we reach a big switch:

switch (default_opt->levels)
  {

  [...]

  case OPT_LEVELS_1_PLUS:
    enabled = (level >= 1);
    break;

  [...]

  case OPT_LEVELS_3_PLUS:
    enabled = (level >= 3);
    break;

There are no >= 4 checks, which indicates that 3 is the largest possible.
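The consequence can be checked with a simplified model of that switch. This is a sketch, not the GCC code: the enum and function names are hypothetical and only the three "N and above" classes are modelled. Since no class has a threshold above 3, any level from 3 upwards enables exactly the same classes.

```c
/* Hypothetical, reduced version of the level classes.  */
enum level_class { AT_LEAST_1, AT_LEAST_2, AT_LEAST_3 };

/* Returns 1 if options of class CLS are enabled at LEVEL.  */
int class_enabled(enum level_class cls, int level)
{
    switch (cls) {
    case AT_LEAST_1: return level >= 1;
    case AT_LEAST_2: return level >= 2;
    case AT_LEAST_3: return level >= 3;
    }
    return 0;
}

/* Returns 1 if levels A and B enable exactly the same classes.  */
int same_effect(int a, int b)
{
    int c;

    for (c = AT_LEAST_1; c <= AT_LEAST_3; c++)
        if (class_enabled((enum level_class) c, a)
            != class_enabled((enum level_class) c, b))
            return 0;
    return 1;
}
```

In this model same_effect(3, 100) holds, while same_effect(2, 3) does not.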

Then we search for the definition of OPT_LEVELS_3_PLUS in common-target.h:

enum opt_levels
{
  OPT_LEVELS_NONE, /* No levels (mark end of array).  */
  OPT_LEVELS_ALL, /* All levels (used by targets to disable options
                     enabled in target-independent code).  */
  OPT_LEVELS_0_ONLY, /* -O0 only.  */
  OPT_LEVELS_1_PLUS, /* -O1 and above, including -Os and -Og.  */
  OPT_LEVELS_1_PLUS_SPEED_ONLY, /* -O1 and above, but not -Os or -Og.  */
  OPT_LEVELS_1_PLUS_NOT_DEBUG, /* -O1 and above, but not -Og.  */
  OPT_LEVELS_2_PLUS, /* -O2 and above, including -Os.  */
  OPT_LEVELS_2_PLUS_SPEED_ONLY, /* -O2 and above, but not -Os or -Og.  */
  OPT_LEVELS_3_PLUS, /* -O3 and above.  */
  OPT_LEVELS_3_PLUS_AND_SIZE, /* -O3 and above and -Os.  */
  OPT_LEVELS_SIZE, /* -Os only.  */
  OPT_LEVELS_FAST /* -Ofast only.  */
};

Ha! This is a strong indicator that there are only 3 levels.

opts.c:default_options_table

opt_levels is so interesting, that we grep OPT_LEVELS_3_PLUS, and come across opts.c:default_options_table:

static const struct default_options default_options_table[] = {
    /* -O1 optimizations.  */
    { OPT_LEVELS_1_PLUS, OPT_fdefer_pop, NULL, 1 },
    [...]

    /* -O3 optimizations.  */
    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
    [...]
}

so this is where the -On to specific optimization mapping mentioned in the docs is encoded. Nice!
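A table-driven mapping of this kind can be sketched in a few lines. The struct layout and names below are assumptions made for illustration and are much simpler than GCC's real default_options. The first and last entries come from the excerpt above; -fgcse is taken from the manual's -O2 list rather than from the excerpt.

```c
#include <stddef.h>

/* Hypothetical, simplified table entry: a flag and the lowest
   numeric level at which it is switched on.  */
struct flag_default {
    int min_level;
    const char *flag;
};

static const struct flag_default flag_table[] = {
    { 1, "-fdefer-pop" },
    { 2, "-fgcse" },
    { 3, "-ftree-loop-distribute-patterns" },
};

/* Counts how many table entries are enabled at LEVEL.  */
size_t count_enabled(int level)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof flag_table / sizeof flag_table[0]; i++)
        if (level >= flag_table[i].min_level)
            n++;
    return n;
}
```

Walking the table with level 3 and with level 100 gives the same count, which matches the observation that nothing is keyed on a level above 3.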

Assure that there are no more uses for x_optimize

The main usage of x_optimize was to set other specific optimization options like -fdefer-pop as documented on the man page. Are there any more?

We grep, and find a few more. The number is small, and upon manual inspection we see that every usage only does at most a x_optimize >= 3, so our conclusion holds.

lto-wrapper.c

Now we go for the second occurrence of OPT_O, which was in lto-wrapper.c.

LTO means Link Time Optimization, which as the name suggests is going to need an -O option, and will be linked to collect2 (which is basically a linker).

In fact, the first line of lto-wrapper.c says:

/* Wrapper to call lto.  Used by collect2 and the linker plugin.

In this file, the OPT_O occurrences seem to only normalize the value of O to pass it forward, so we should be fine.


Seven distinct levels:

  • -O0 (default): No optimization.

  • -O or -O1 (same thing): Optimize, but do not spend too much time.

  • -O2: Optimize more aggressively

  • -O3: Optimize most aggressively

  • -Ofast: Equivalent to -O3 -ffast-math. -ffast-math triggers non-standards-compliant floating point optimizations. This allows the compiler to pretend that floating point numbers are infinitely precise, and that algebra on them follows the standard rules of real number algebra. It also tells the compiler to tell the hardware to flush denormals to zero and treat denormals as zero, at least on some processors, including x86 and x86-64. Denormals trigger a slow path on many FPUs, and so treating them as zero (which does not trigger the slow path) can be a big performance win.

  • -Os: Optimize for code size. This can actually improve speed in some cases, due to better I-cache behavior.

  • -Og: Optimize, but do not interfere with debugging. This enables non-embarrassing performance for debug builds and is intended to replace -O0 for debug builds.
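To illustrate why the -ffast-math part of -Ofast is not standards-compliant: floating-point addition is not associative, so a compiler that reorders a sum as if it were real-number algebra can change the result. The sketch below assumes IEEE 754 double arithmetic and a build without -ffast-math; the function names are made up for this example.

```c
/* Adds left to right: (a + b) + c.  */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Adds right to left: a + (b + c).  */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e16, b = -1e16 and c = 1.0, sum_left gives 1.0, but sum_right gives 0.0, because -1e16 + 1.0 rounds back to -1e16. Under fast-math the compiler is allowed to treat the two forms as interchangeable.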

There are also other options that are not enabled by any of these, and must be enabled separately. It is also possible to use an optimization option, but disable specific flags enabled by this optimization.

For more information, see GCC website.


Four (0-3): See the GCC 4.4.2 manual. Anything higher is just -O3, but at some point you will overflow the variable size limit.

Reference URL: https://stackoverflow.com/questions/1778538/how-many-gcc-optimization-levels-are-there
