American fuzzy lop (fuzzer)
American fuzzy lop (AFL), stylized in lowercase as american fuzzy lop, is a free software fuzzer that employs genetic algorithms in order to efficiently increase code coverage of the test cases. So far it has detected dozens of significant software bugs in major free software projects, including X.Org Server,[2] PHP,[3] OpenSSL,[4][5] pngcrush, bash,[6] Firefox,[7] BIND,[8][9] Qt,[10] and SQLite.[11]
Developer(s) | Michał Zalewski |
---|---|
Initial release | 12 November 2013 |
Stable release | 2.57b
/ 30 June 2020[1] |
Repository | |
Written in | C, assembly |
Operating system | Cross-platform |
Type | Fuzzer |
License | Apache License 2.0 |
Website | lcamtuf |
For many years after its release, AFL has been considered a "state of the art" fuzzer.[12] AFL is considered "a de-facto standard for fuzzing",[13] and the release of AFL contributed significantly to the development of fuzzing as a research area.[14] AFL is widely used in academia; academic fuzzers are often forks of AFL, and AFL is commonly used as a baseline to evaluate new techniques.[15][16]
The source code of American fuzzy lop is published on GitHub. Its name is a reference to a breed of rabbit, the American Fuzzy Lop.
Overview
AFL requires the user to provide a sample command that runs the tested application and at least one small example input. The input can be fed to the tested program either via standard input or as an input file specified in the process command line. Fuzzing networked programs is currently not directly supported, although in some cases there are feasible solutions to this problem.[17] For example, in case of an audio player, American fuzzy lop can be instructed to open a short sound file with it. Then, the fuzzer attempts to actually execute the specified command and if that succeeds, it tries to reduce the input file to the smallest one that triggers the same behavior.
After this initial phase, AFL begins the actual process of fuzzing by applying various modifications to the input file. When the tested program crashes or hangs, this usually implies the discovery of a new bug, possibly a security vulnerability. In this case, the modified input file is saved for further user inspection.
In order to maximize the fuzzing performance, American fuzzy lop expects the tested program to be compiled with the aid of a utility program that instruments the code with helper functions which track control flow. This allows the fuzzer to detect when the target's behavior changes in response to the input. In cases when this is not possible, black-box testing is supported as well.
Fuzzing algorithm
Fuzzers attempt to find unexpected behaviors (i.e., bugs) in a target program by repeatedly executing the program on various inputs. As described above, AFL is a gray-box fuzzer, meaning it injects instrumentation to measure code coverage into the target program at compile time and uses the coverage metric to direct the generation of new inputs. AFL's fuzzing algorithm has influenced many subsequent gray-box fuzzers.[19][20]
The inputs to AFL are an instrumented target program (the system under test) and corpus, that is, a collection of inputs to the target. Inputs are also known as test cases. The algorithm maintains a queue of inputs, which is initialized to the input corpus. The overall algorithm works as follows:[21]
- Load the next input from the queue
- Minimize the test case
- Mutate the test case. If any mutant results in additional code coverage, add it to the queue. If the mutant results in a crash or hang, save it to disk for later inspection.
- Go to step 1
Mutation
To generate new inputs, AFL applies various mutations to existing inputs.[22] These mutations are mostly agnostic to the input format of the target program; they generally treat the input as simple blob of binary data.
At first, AFL applies a deterministic sequence of mutations to each input. These are applied at various offsets in the input. They include:[23][24]
- Flipping (i.e., negating or inverting) 1-32 bits
- Incrementing and decrementing 8-, 16-, and 32-bit integers, in both little- and big-endian encodings
- Overwriting parts of the input with "approximately two dozen 'interesting' values", including zero and maximum and minimum signed and unsigned integers of various widths, again in both little- and big-endian encodings.
- Replacing parts of the input with data drawn from a "dictionary" of user-specified or auto-detected tokens (e.g., magic bytes, or keywords in a text-based format)[25][22][26]
After applying all available deterministic mutations, AFL moves on to havoc, a stage where between 2 and 128 mutations are applied in a row. These mutations are any of:[22]
- The deterministic mutations described above
- Overwriting bytes with random values
- Operations over multi-byte "blocks":
- Deleting blocks
- Duplicating blocks
- Setting each byte in a block to a single value
If AFL cycles through the entire queue without generating any input that achieves new code coverage, it begins splicing. Splicing takes two inputs from the queue, truncates them at arbitrary positions, concatenates them together, and applies the havoc stage to the result.
Measuring coverage
AFL pioneered the use of binned hitcounts for measuring code coverage.[27] The author claims that this technique mitigates path explosion.[28][29]
Conceptually, AFL counts the number of times a given execution of the target traverses each edge in the target's control-flow graph; the documentation refers to these edges as tuples and the counts as hitcounts. At the end of the execution, the hitcounts are binned or bucketed into the following eight buckets: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, and 128 and greater. AFL maintains a global set of (tuple, binned count) pairs that have been produced by any execution thus far. An input is considered "interesting" and is added to the queue if it produces a (tuple, binned count) pair that is not yet in the global set.
In practice, the hitcounts are collected and processed using an efficient but lossy scheme. The compile-time instrumentation injects code that is conceptually similar to the following at each branch in the control-flow graph of the target program:[30]
cur_location = <COMPILE_TIME_RANDOM>;
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;
where <COMPILE_TIME_RANDOM>
is a random integer and shared_mem
is a 64 kilobyte region of memory shared between the fuzzer and the target.
This representation is more fine-grained (distinguishes between more executions) than simple block or statement coverage, but still allows for a linear-time "interestingness" test.
Minimization
On the assumption that smaller inputs take less time to execute, AFL attempts to minimize or trim the test cases in the queue.[22][31] Trimming works by removing blocks from the input; if the trimmed input still results in the same coverage (see #Measuring coverage), then the original input is discarded and the trimmed input is saved in the queue.
Features
Performance features
One of the challenges American fuzzy lop had to solve involved an efficient spawning of hundreds of processes per second. Apart from the original engine that spawned every process from scratch, American fuzzy lop offers the default engine that relies heavily on the fork
system call.[33][27] This can further be sped up by leveraging LLVM deferred fork server mode or the similar persistent mode, but this comes at the cost of having to modify the tested program.[34] Also, American fuzzy lop supports fuzzing the same program over the network.
User interface
American fuzzy lop features a colorful command line interface that displays real-time statistics about the fuzzing process. Various settings may be triggered by either command line options or environment variables. Apart from that, programs may read runtime statistics from files in a machine-readable format.
Utility programs
In addition to afl-fuzz
and tools that can be used for binary instrumentation, American fuzzy lop features utility programs meant for monitoring of the fuzzing process. Apart from that, there is afl-cmin
and afl-tmin
, which can be used for test case and test corpus minimization. This can be useful when the test cases generated by afl-fuzz
would be used by other fuzzers.
Forks
AFL has been forked many times in order to examine new fuzzing techniques, or to apply fuzzing to different kinds of programs. A few notable forks include:
- AFL++
- MOPT-AFL[35]
- AFLFast[36]
- AFLSmart[37]
- AFLGo[38]
- SymCC-AFL[39]
- WinAFL, "a fork of AFL for fuzzing Windows binaries"[40]
AFL++
Initial release | 2.52c / 5 June 2019 |
---|---|
Stable release | 4.00c
/ 26 January 2022[41] |
Repository | |
Website | aflplus |
AFL++ (AFLplusplus)[42] is a community-maintained fork of AFL created due to the relative inactivity of Google's upstream AFL development since September 2017. It includes new features and speedups.[43]
Google's OSS-Fuzz initiative, which provides free fuzzing services to open source software, replaced its AFL option with AFL++ in January 2021.[44][45]
References
Notes
- "Releases - google/AFL". Retrieved 19 January 2021 – via GitHub.
- "Advisory-2015-03-17". x.org.
- "NVD - Detail". nist.gov.
- "NVD - Detail". nist.gov.
- "NVD - Detail". nist.gov.
- "CVE - CVE-2014-6278". mitre.org.
- "CVE - CVE-2014-8637". mitre.org.
- "How to fuzz a server with American Fuzzy Lop". Fastly. 21 July 2015.
- "CVE - CVE-2015-5477". mitre.org.
- "[Announce] Qt Project Security Advisory - Multiple Vulnerabilities in Qt Image Format Handling". qt-project.org. 13 April 2015.
- "How SQLite Is Tested # 4.1.1. SQL Fuzz Using The American Fuzzy Lop Fuzzer". sqlite.org.
- Poncelet, Clement; Sagonas, Konstantinos; Tsiftes, Nicolas (2023-01-05). "So Many Fuzzers, So Little Time✱". Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. ASE '22. New York, NY, USA: Association for Computing Machinery. pp. 1–12. doi:10.1145/3551349.3556946. ISBN 978-1-4503-9475-8. S2CID 253456740.
- Fioraldi et al. 2023, p. 2.
- Fioraldi, Andrea; Maier, Dominik Christian; Zhang, Dongjia; Balzarotti, Davide (2022-11-07). "LibAFL". Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. CCS '22. New York, NY, USA: Association for Computing Machinery. pp. 1051–1065. doi:10.1145/3548606.3560602. ISBN 978-1-4503-9450-5. S2CID 253410747.. "The release of AFL marked an important milestone in the area of software security testing, revitalizing fuzzing as a major research topic".
- Hazimeh, Ahmad; Herrera, Adrian; Payer, Mathias (2021-06-15). "Magma: A Ground-Truth Fuzzing Benchmark". Proceedings of the ACM on Measurement and Analysis of Computing Systems. 4 (3): 49:1–49:29. arXiv:2009.01120. doi:10.1145/3428334. S2CID 227230949.
- Metzman et al. 2021.
- Technion. "Fuzzing nginx - Hunting vulnerabilities with afl-fuzz". lolware.net.
- Zalewski, Michał (2015-02-27). "Logo for afl-fuzz". afl-users | Google Groups. Retrieved 2019-07-25.
- Fioraldi et al. 2023.
- Chen, Peng; Chen, Hao (May 2018). "Angora: Efficient Fuzzing by Principled Search". 2018 IEEE Symposium on Security and Privacy (SP). pp. 711–725. doi:10.1109/SP.2018.00046. ISBN 978-1-5386-4353-2. S2CID 3729194.
- "Motivation behind AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved 2023-02-26.
- Fioraldi et al. 2023, p. 6.
- "Binary fuzzing strategies: what works, what doesn't". lcamtuf.blogspot.com. 8 August 2014.
- "AFL User Guide — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved 2023-02-26.
- "Finding bugs in SQLite, the easy way". lcamtuf.blogspot.com. 14 April 2015.
- Manès, Valentin J.M.; Han, HyungSeok; Han, Choongwoo; Cha, Sang Kil; Egele, Manuel; Schwartz, Edward J.; Woo, Maverick (November 2021). "The Art, Science, and Engineering of Fuzzing: A Survey". IEEE Transactions on Software Engineering. 47 (11): 2312–2331. arXiv:1812.00140. doi:10.1109/TSE.2019.2946563. ISSN 1939-3520. S2CID 102351047.
- Fioraldi et al. 2023, p. 5.
- "Technical "whitepaper" for afl-fuzz".
- "More about AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved 2023-02-27. "This approach allows for a very fine-grained and long-term exploration of program state while not having to perform any computationally intensive and fragile global comparisons of complex execution traces, and while avoiding the scourge of path explosion."
- "More about AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved 2023-02-27.
- "More about AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved 2023-02-27.
- "More about AFL — AFL 2.53b documentation". afl-1.readthedocs.io. Retrieved 2023-02-27.
- "Fuzzing random programs without execve()". lcamtuf.blogspot.com. 14 October 2014.
- "New in AFL: persistent mode". lcamtuf's blog. 11 June 2015.
- Lyu, Chenyang; Ji, Shouling; Zhang, Chao; Li, Yuwei; Lee, Wei-Han; Song, Yu; Beyah, Raheem (2019). {MOPT}: Optimized Mutation Scheduling for Fuzzers. pp. 1949–1966. ISBN 978-1-939133-06-9.
- Böhme, Marcel; Pham, Van-Thuan; Roychoudhury, Abhik (May 2019). "Coverage-Based Greybox Fuzzing as Markov Chain". IEEE Transactions on Software Engineering. 45 (5): 489–506. doi:10.1109/TSE.2017.2785841. ISSN 1939-3520.
- Pham, Van-Thuan; Böhme, Marcel; Santosa, Andrew E.; Căciulescu, Alexandru Răzvan; Roychoudhury, Abhik (September 2021). "Smart Greybox Fuzzing". IEEE Transactions on Software Engineering. 47 (9): 1980–1997. doi:10.1109/TSE.2019.2941681. ISSN 1939-3520. S2CID 53721813.
- Böhme, Marcel; Pham, Van-Thuan; Nguyen, Manh-Dung; Roychoudhury, Abhik (2017-10-30). "Directed Greybox Fuzzing". Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. CCS '17. New York, NY, USA: Association for Computing Machinery. pp. 2329–2344. doi:10.1145/3133956.3134020. ISBN 978-1-4503-4946-8. S2CID 29430742.
- Poeplau, Sebastian; Francillon, Aurélien (2020). Symbolic execution with {SymCC}: Don't interpret, compile!. pp. 181–198. ISBN 978-1-939133-17-5.
- WinAFL, Google Project Zero, 2023-02-23, retrieved 2023-02-26
- "Releases - AFLplusplus/AFLplusplus". Retrieved 19 June 2022 – via GitHub.
- Fioraldi, Andrea; Maier, Dominik; Eißfeldt, Heiko; Heuse, Marc (August 2020). AFL++: Combining incremental steps of fuzzing research. 14th USENIX Workshop on Offensive Technologies (WOOT 20).
- "The AFL++ fuzzing framework". AFLplusplus.
- metzman, jonathan. "[afl++] Use AFL++ instead of AFL for fuzzing. by jonathanmetzman · Pull Request #5046 · google/oss-fuzz". GitHub.
- Metzman et al. 2021, p. 1394.
Sources
- Fioraldi, Andrea; Mantovani, Alessandro; Maier, Dominik; Balzarotti, Davide (2023-01-20). "Dissecting American Fuzzy Lop – A FuzzBench Evaluation". ACM Transactions on Software Engineering and Methodology. 32 (2): 1–26. doi:10.1145/3580596. ISSN 1049-331X. S2CID 247088398.
- Metzman, Jonathan; Szekeres, László; Simon, Laurent; Sprabery, Read; Arya, Abhishek (2021-08-18). "FuzzBench: An open fuzzer benchmarking platform and service". Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ESEC/FSE 2021. New York, NY, USA: Association for Computing Machinery. pp. 1393–1403. doi:10.1145/3468264.3473932. ISBN 978-1-4503-8562-6. S2CID 237205274.
Further reading
- The AFL technical "whitepaper", written by the original author Michał Zalewski
- Multi-System and Internet Security Cookbook, Hors-Serie No. 11 "Outils de sécurité", p. 36, "American Fuzzy Lop", Kevin Denis, June 2015. Archived 2016-05-06 at the Wayback Machine
- "Fuzz and strings (lwn.net)"
- "Fuzzing (on) FreeBSD - (Mostly) automated bug discovery with security/afl" - a presentation at FOSDEM
- "Testing with two failure seeking missiles: fuzzing and property based testing" - a presentation at EuroPython 2015.
- "The Fuzzing Project"
- "Fuzzing Code with AFL", Peter Gutmann, ;login, Vol. 41, No. 2, Summer 2016,
External links
- Official AFL documentation
- The blog of Michał Zalewski, author of AFL. The blog contains several posts diving in to the technical details of AFL.