Ben Niu

Ben Niu

Redmond, Washington, United States
699 followers 500+ connections

About

I have extensive experience in computer system (OS kernel, compiler, hypervisor, etc.)…

Experience

  • Microsoft Graphic

    Microsoft

    Redmond, WA

  • -

    Redmond, WA

  • -

    Redmond, WA

  • -

    Redmond, WA

  • -

  • -

    One Microsoft Way, Redmond, WA 98052

  • -

    1440 McCarthy Blvd, Milpitas, CA 95035

  • -

Education

Volunteer Experience

  • Student Program Committee Member

    IEEE Security and Privacy 2016

    - 3 months

    Science and Technology

    I volunteered to serve on the Student Program Committee (SPC) at IEEE Security and Privacy 2016 (Oakland'16), which is a premier conference in the area of information security. As a SPC member, I reviewed technical papers submitted to Oakland'16.

  • Shadow Program Committee Member

    AsiaCCS 2017

    - 2 months

    Science and Technology

    Reviewing submitted papers and reviews.

Publications

  • HyperFuzzer: An Efficient Hybrid Fuzzer for Virtual CPUs

    ACM Conference on Computer and Communications Security (CCS), 2021

    In this cloud computing era, the security of hypervisors is critical to the overall security of the cloud. In particular, the security of CPU virtualization in hypervisors is paramount because it is implemented in the most privileged CPU mode. Blackbox and graybox fuzzing are limited to finding shallow virtual CPU bugs due to its huge search space. Whitebox fuzzing can be used for systematic analysis of CPU virtualization, but existing implementations rely on slow hardware emulators to…

    In this cloud computing era, the security of hypervisors is critical to the overall security of the cloud. In particular, the security of CPU virtualization in hypervisors is paramount because it is implemented in the most privileged CPU mode. Blackbox and graybox fuzzing are limited to finding shallow virtual CPU bugs due to its huge search space. Whitebox fuzzing can be used for systematic analysis of CPU virtualization, but existing implementations rely on slow hardware emulators to enable dynamic symbolic execution.

    In this paper, we present HyperFuzzer, the first efficient hybrid fuzzer for virtual CPUs. Our key observation is that a virtual CPU’s execution is determined by the VM state. Based on this observation, we design a new fuzzing setup that uses complete VM states as fuzzing inputs, and a new fuzzing technique we call Nimble Symbolic Execution to enable dynamic symbolic execution for CPU virtualization running on bare metal. Specifically, it uses the hardware to log the control flow efficiently, and then reconstructs an approximate execution trace from only the control flow and the fuzzing input. The reconstructed execution trace is surprisingly sufficient for precise dynamic symbolic execution of virtual CPUs.

    We have built a prototype of HyperFuzzer based on Intel Processor Trace for Microsoft Hyper-V. Our experimental results show that HyperFuzzer can run thousands of tests per second, which is 3 orders of magnitude faster than using a hardware emulator. When compared with a baseline using full (control+data) execution traces, HyperFuzzer can still generate 96.8% of the test inputs generated by the baseline. HyperFuzzer has found 11 previously unknown virtual CPU bugs in the Hyper-V hypervisor, and all of them were confirmed and fixed.

    Other authors
    See publication
  • Reverse Debugging of Kernel Failures in Deployed Systems

    Proceedings of the 2020 USENIX Annual Technical Conference (ATC)

    Post-mortem diagnosis of kernel failures is crucial for operating system vendors because kernel failures impact the reliability and security of the whole system. However, debugging kernel failures in deployed systems remains a challenge because developers have to speculate the conditions leading to the failure based on limited information such as memory dumps. In this paper, we present Kernel REPT, the first practical reverse debugging solution for kernel failures that is highly efficient…

    Post-mortem diagnosis of kernel failures is crucial for operating system vendors because kernel failures impact the reliability and security of the whole system. However, debugging kernel failures in deployed systems remains a challenge because developers have to speculate the conditions leading to the failure based on limited information such as memory dumps. In this paper, we present Kernel REPT, the first practical reverse debugging solution for kernel failures that is highly efficient, imposes small memory footprint and requires no extra software layer. To meet this goal, Kernel REPT employs efficient hardware tracing to record the kernel's control flow on each processor, recognizes the control flow of each software thread based on the context switch history, and recovers its data flow by emulating machine instructions and hardware events such as interrupts and exceptions. We design, implement, and deploy Kernel REPT on Microsoft Windows. We show that developers can use Kernel REPT to do interactive reverse debugging and find the root cause of real-world kernel failures. Kernel REPT also enables automatic root-cause analysis for certain kernel failures that were hard to debug even manually. Furthermore, Kernel REPT can proactively identify kernel bugs by checking the reconstructed execution history against a set of predetermined invariants.

    Other authors
    See publication
  • REPT: Reverse Debugging of Failures in Deployed Software

    Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018

    Debugging software failures in deployed systems is important because they impact real users and customers. However, debugging such failures is notoriously hard in practice because developers have to rely on limited information such as memory dumps. The execution history is usually unavailable because high-fidelity program tracing is not affordable in deployed systems.

    In this paper, we present REPT, a practical system that enables reverse debugging of software failures in deployed…

    Debugging software failures in deployed systems is important because they impact real users and customers. However, debugging such failures is notoriously hard in practice because developers have to rely on limited information such as memory dumps. The execution history is usually unavailable because high-fidelity program tracing is not affordable in deployed systems.

    In this paper, we present REPT, a practical system that enables reverse debugging of software failures in deployed systems. REPT reconstructs the execution history with high fidelity by combining online lightweight hardware tracing of a program's control flow with offline binary analysis that recovers its data flow. It is seemingly impossible to recover data values thousands of instructions before the failure due to information loss and concurrent execution. REPT tackles these challenges by constructing a partial execution order based on timestamps logged by hardware and iteratively performing forward and backward execution with error correction.

    We design and implement REPT, deploy it on Microsoft Windows, and integrate it into Windows Debugger. We evaluate REPT on 16 real-world bugs and show that it can recover data values accurately (92% on average) and efficiently (less than 20 seconds) for these bugs. We also show that it enables effective reverse debugging for 14 bugs.

    Other authors
    See publication
  • Lazy Diagnosis of In-Production Concurrency Bugs

    SOSP 2017

    Diagnosing concurrency bugs—the process of understanding the root causes of failures—is hard. Developers depend on reproducing concurrency bugs to diagnose them. Traditionally, systems that attempt to reproduce concurrency bugs record fine-grained thread schedules of events (e.g., shared memory accesses) that lead to failures. Recording schedules incurs high runtime performance overhead and scales poorly, making existing techniques unsuitable in production.

    In this paper, we formulate…

    Diagnosing concurrency bugs—the process of understanding the root causes of failures—is hard. Developers depend on reproducing concurrency bugs to diagnose them. Traditionally, systems that attempt to reproduce concurrency bugs record fine-grained thread schedules of events (e.g., shared memory accesses) that lead to failures. Recording schedules incurs high runtime performance overhead and scales poorly, making existing techniques unsuitable in production.

    In this paper, we formulate the coarse interleaving hypothesis, which states that events leading to concurrency bugs are coarsely interleaved. Therefore, a fine-grained and expensive recording is unnecessary for diagnosing concurrency bugs. We test the coarse interleaving hypothesis by studying 54 bugs in 13 systems and find that it holds in all cases. In particular, the time elapsed between events leading to concurrency bugs is on average 5 orders of magnitude greater than what is used today in fine-grained recording.

    Using the coarse interleaving hypothesis, we develop Lazy Diagnosis, a hybrid dynamic-static interprocedural pointer and type analysis to diagnose the root causes of concurrency bugs. Our Lazy Diagnosis prototype, SNORLAX, relies on commodity hardware to track thread interleavings at a coarse granularity. SNORLAX does not require any source code changes and can diagnose complex concurrency bugs in real large-scale systems (MySQL, httpd, memcached, etc.) with full accuracy and an average runtime performance overhead of below 1%. Broadly, we believe that our findings can be used to build more efficient in-production bug detection and record/replay techniques.

    Other authors
    See publication
  • Per-Input Control-Flow Integrity

    ACM CCS 2015 (Acceptance Rate: 19.8%)

    Control-Flow Integrity (CFI) is an effective approach to mitigating control-flow hijacking attacks. Conventional CFI techniques statically extract a control-flow graph (CFG) from a program and instrument the program to enforce that CFG. The statically generated CFG includes all edges for all possible inputs; however, for a concrete input, the CFG may include many unnecessary edges.

    We present Per-Input Control-Flow Integrity (PICFI or πCFI), which is a new CFI technique that can enforce a…

    Control-Flow Integrity (CFI) is an effective approach to mitigating control-flow hijacking attacks. Conventional CFI techniques statically extract a control-flow graph (CFG) from a program and instrument the program to enforce that CFG. The statically generated CFG includes all edges for all possible inputs; however, for a concrete input, the CFG may include many unnecessary edges.

    We present Per-Input Control-Flow Integrity (PICFI or πCFI), which is a new CFI technique that can enforce a CFG computed for each concrete input. πCFI starts executing a program with the empty CFG and lets the program itself lazily add edges to the enforced CFG if such edges are required for the concrete input. The edge addition is performed by πCFI-inserted instrumentation code. To prevent attackers from arbitrarily adding edges, πCFI uses a statically computed all-input CFG to constrain what edges can be added at runtime. To minimize performance overhead, operations for adding edges are designed to be idempotent, so they can be patched to no-ops after their first execution. As our evaluation shows, πCFI provides better security than conventional fine-grained CFI with comparable performance overhead.

    Other authors
    See publication
  • RockJIT: Securing Just-In-Time Compilation Using Modular Control-Flow Integrity

    ACM CCS 2014 (Acceptance Rate: 19%)

    Managed languages such as JavaScript are popular. For performance, modern implementations of managed languages adopt Just- In-Time (JIT) compilation. The danger to a JIT compiler is that an attacker can often control the input program and use it to trigger a vulnerability in the JIT compiler to launch code injection or JIT spraying attacks. In this paper, we propose a general approach called RockJIT to securing JIT compilers through Control-Flow Integrity (CFI). RockJIT builds a fine-grained…

    Managed languages such as JavaScript are popular. For performance, modern implementations of managed languages adopt Just- In-Time (JIT) compilation. The danger to a JIT compiler is that an attacker can often control the input program and use it to trigger a vulnerability in the JIT compiler to launch code injection or JIT spraying attacks. In this paper, we propose a general approach called RockJIT to securing JIT compilers through Control-Flow Integrity (CFI). RockJIT builds a fine-grained control-flow graph from the source code of the JIT compiler and dynamically up- dates the control-flow policy when new code is generated on the fly. Through evaluation on Google’s V8 JavaScript engine, we demonstrate that RockJIT can enforce strong security on a JIT compiler, while incurring only modest performance overhead (14.6% on V8) and requiring a small amount of changes to V8’s code. Key contributions of RockJIT are a general architecture for securing JIT compilers and a method for generating fine-grained control-flow graphs from C++ code.

    Other authors
    See publication
  • Modular Control-Flow Integrity

    ACM PLDI 2014 (Acceptance Rate: 18%)

    Control-Flow Integrity (CFI) is a software-hardening technique. It inlines checks into a program so that its execution always follows a predetermined Control-Flow Graph (CFG). As a result, CFI is effective at preventing control-flow hijacking attacks. However, past fine-grained CFI implementations do not support separate compilation, which hinders its adoption.
    We present Modular Control-Flow Integrity (MCFI), a new CFI technique that supports separate compilation. MCFI allows modules to be…

    Control-Flow Integrity (CFI) is a software-hardening technique. It inlines checks into a program so that its execution always follows a predetermined Control-Flow Graph (CFG). As a result, CFI is effective at preventing control-flow hijacking attacks. However, past fine-grained CFI implementations do not support separate compilation, which hinders its adoption.
    We present Modular Control-Flow Integrity (MCFI), a new CFI technique that supports separate compilation. MCFI allows modules to be independently instrumented and linked statically or dynamically. The combined module enforces a CFG that is a combination of the individual modules’ CFGs. One challenge in support- ing dynamic linking in multithreaded code is how to ensure a safe transition from the old CFG to the new CFG when libraries are dynamically linked. The key technique we use is to have the CFG rep- resented in a runtime data structure and have reads and updates of the data structure wrapped in transactions to ensure thread safety. Our evaluation on SPECCPU2006 benchmarks shows that MCFI supports separate compilation, incurs low overhead of around 5%, and enhances security.

    Other authors
    See publication
  • Monitor Integrity Protection with Space Efficiency and Separate Compilation

    ACM CCS 2013 (Acceptance Rate: 20%)

    Low-level inlined reference monitors weave monitor code into a program for security. To ensure that monitor code cannot be by- passed by branching instructions, some form of control-flow integrity must be guaranteed. Past approaches to protecting monitor code either have high space overhead or do not support separate compilation. We present Monitor Integrity Protection (MIP), a form of coarse-grained control-flow integrity. The key idea of MIP is to arrange instructions in variable-sized chunks…

    Low-level inlined reference monitors weave monitor code into a program for security. To ensure that monitor code cannot be by- passed by branching instructions, some form of control-flow integrity must be guaranteed. Past approaches to protecting monitor code either have high space overhead or do not support separate compilation. We present Monitor Integrity Protection (MIP), a form of coarse-grained control-flow integrity. The key idea of MIP is to arrange instructions in variable-sized chunks and dynamically restrict indirect branches to target only chunk beginnings. We show that this simple idea is effective in protecting monitor code integrity, enjoys low space and execution-time overhead, supports separate compilation, and is largely compatible with an existing compiler toolchain. We also show that MIP enables a separate verifier that completely disassembles a binary and verifies its security.
    MIP is designed to support inlined reference monitors. As a case study, we have implemented MIP-based Software-based Fault Isolation (SFI) on both x86-32 and x86-64. The evaluation shows that MIP-based SFI has competitive performance with other SFI implementations, while enjoying low space overhead.

    See publication
  • Efficient User-Space Information Flow Control

    ACM AsiaCCS 2013 (Acceptance Rate: 16%)

    The model of Decentralized Information Flow Control (DIFC) is effective at improving application security and can support rich confidentiality and integrity policies. We describe the design and implementation of duPro, an efficient user-space information flow control framework. duPro adopts Software-based Fault Isolation (SFI) to isolate protection domains within the same process. It controls the end-to-end information flow at the granularity of SFI domains. Being a user-space framework, duPro…

    The model of Decentralized Information Flow Control (DIFC) is effective at improving application security and can support rich confidentiality and integrity policies. We describe the design and implementation of duPro, an efficient user-space information flow control framework. duPro adopts Software-based Fault Isolation (SFI) to isolate protection domains within the same process. It controls the end-to-end information flow at the granularity of SFI domains. Being a user-space framework, duPro does not require any OS changes. Since SFI is more lightweight than hardware- based isolation (e.g., OS processes), the inter-domain communication and scheduling in duPro are more efficient than process-level DIFC systems. Finally, duPro supports a novel checkpointing- restoration mechanism for efficiently reusing protection domains. Experiments demonstrate applications can be ported to duPro with negligible overhead, enhanced security, and with tight control over information flow.

    Other authors
    See publication
  • Enforcing User-Space Privilege Separation with Declarative Architectures

    ACM STC 2012

    Applying privilege separation in software development is an effective strategy for limiting the damage of an attack on a software system. In this approach, a software system is separated into a set of communicating protection domains of least privilege. In a privilege-separated system, even if one protection domain is hi- jacked by an attacker, the rest of the system may still function.
    uPro is a tool that provides efficient and flexible enforcement of privilege separation. It adopts…

    Applying privilege separation in software development is an effective strategy for limiting the damage of an attack on a software system. In this approach, a software system is separated into a set of communicating protection domains of least privilege. In a privilege-separated system, even if one protection domain is hi- jacked by an attacker, the rest of the system may still function.
    uPro is a tool that provides efficient and flexible enforcement of privilege separation. It adopts software-based fault isolation to implement protection domains in the user-space so that inter-domain communication is efficient. It provides a declarative language to describe an application’s security architecture, facilitating developers to identify different architecture alternatives. The evaluation shows that real applications can be ported to uPro with enhanced security, acceptable performance, and declarative architectures.

    Other authors
    See publication

Patents

  • Hybrid binaries supporting code stream folding

    Filed US 11,042,422

    A hybrid binary executable under both native processes and compatibility (e.g., emulated) processes. When the hybrid binary is loaded by a native process, the process executes a native code stream contained in the binary directly on a processor. When the hybrid binary is loaded by a compatibility process, the process executes an emulation-compatible (EC) code stream directly on a processor. When executing in a compatibility process, the EC code stream can interact with a foreign code stream…

    A hybrid binary executable under both native processes and compatibility (e.g., emulated) processes. When the hybrid binary is loaded by a native process, the process executes a native code stream contained in the binary directly on a processor. When the hybrid binary is loaded by a compatibility process, the process executes an emulation-compatible (EC) code stream directly on a processor. When executing in a compatibility process, the EC code stream can interact with a foreign code stream that executes in an emulator. The foreign code stream can be included in the hybrid binary itself, or can be external to the hybrid binary. The hybrid binary format supports folding of code between the native code stream and the EC code stream. The hybrid binary comprises a set of memory transformations which are applied to image data obtained from the binary when the hybrid binary executes under the compatibility process.

    Other inventors
    See patent
  • STACK TRACES USING SHADOW STACK

    Filed US 16417493

    A program is executed using a call stack and shadow stack. The call stack includes frames having respective return addresses. The frames may also store variables and/or parameters. The shadow stack stores duplicates of the return addresses in the call stack. The call stack and the shadow stack are maintained by, (i) each time a function is called, adding a corresponding stack frame to the call stack and adding a corresponding return address to the shadow stack, and (ii) each time a function is…

    A program is executed using a call stack and shadow stack. The call stack includes frames having respective return addresses. The frames may also store variables and/or parameters. The shadow stack stores duplicates of the return addresses in the call stack. The call stack and the shadow stack are maintained by, (i) each time a function is called, adding a corresponding stack frame to the call stack and adding a corresponding return address to the shadow stack, and (ii) each time a function is exited, removing a corresponding frame from the call stack and removing a corresponding return address from the shadow stack. A backtrace of the program's current call chain is generated by accessing the return addresses in the shadow stack. The outputted backtrace includes the return addresses from the shadow stack and/or information about the traced functions that is derived from the shadow stack's return addresses.

    Other inventors
    See patent
  • Methods for Enforcing Control Flow of a Computer Program

    Issued US 9361102

    One aspect of the invention provides a method of controlling execution of a computer program. The method comprises the following runtime steps: parsing code to identify one or more indirect branches; creating a branch ID data structure that maps an indirect branch location to a branch ID, which is the indirect branch's equivalence class ID; creating a target ID data structure that maps a code address to a target ID, which is an equivalence class ID to which the address belongs; and prior to…

    One aspect of the invention provides a method of controlling execution of a computer program. The method comprises the following runtime steps: parsing code to identify one or more indirect branches; creating a branch ID data structure that maps an indirect branch location to a branch ID, which is the indirect branch's equivalence class ID; creating a target ID data structure that maps a code address to a target ID, which is an equivalence class ID to which the address belongs; and prior to execution of an indirect branch including a return instruction located at an address: obtaining the branch ID associated with the return address from the branch ID data structure; obtaining the target ID associated with an actual return address for the indirect branch from the target ID data structure; and comparing the branch ID and the target ID.

    Other inventors
    See patent

Courses

  • Advanced Algorithms

    441

  • Advanced Programming Languages

    497

  • Database Systems, Algorithms and Applications

    341

  • Hardware and Software in Parallel Computing

    497

  • Pattern Recognition

    426

  • Theory of Operating Systems

    403

Honors & Awards

  • Best Paper Award at OSDI 2018

    Usenix OSDI

  • ACM SIGSAC Doctoral Dissertation Award Runner-Up 2016

    ACM SIGSAC

    https://www.sigsac.org/award/diss-awards.html

  • ACM SIGSAC Doctoral Dissertation Award Runner-Up 2016

    ACM SIGSAC

    https://www.sigsac.org/award/diss-awards.html

  • Real World Crypto 2016 Support

    RWC Organizing Committee

    $500 dollar support for serving on IEEE Security & Privacy 2016 Student Program Committee and attending RWC 2016.

  • CCS 2015 Student Travel Grant

    ACM CCS 2015 Student Travel Grant Chairs

    $1000 awarded to attend the conference.

  • CCS 2014 Student Travel Grant

    ACM CCS 2014 Student Travel Grant Chairs

    $1000 awarded to attend the conference.

  • BlackHat 2014 USA Student Scholarship

    BlackHat 2014 USA

    An Academic Pass ($895) was awarded for free.

  • Rossin Doctoral Fellow

    P.C. Rossin College of Engineering, Lehigh University

  • Graduate of Honor

    Zhejiang University

  • China Academy of Science Scholarship

    China Academy of Science

Languages

  • English

    Full professional proficiency

  • Chinese

    Native or bilingual proficiency

View Ben’s full profile

  • See who you know in common
  • Get introduced
  • Contact Ben directly
Join to view full profile

Other similar profiles

Explore collaborative articles

We’re unlocking community knowledge in a new way. Experts add insights directly into each article, started with the help of AI.

Explore More

Others named Ben Niu in United States

Add new skills with these courses