Ben Niu

Redmond, Washington, United States
699 followers 500+ connections

Microsoft

Lehigh University

About

I have extensive experience in computer system (OS kernel, compiler, hypervisor, etc.)…

Experience

Microsoft

Redmond, WA
-

Redmond, WA
-

Redmond, WA
-

Redmond, WA
-
-

One Microsoft Way, Redmond, WA 98052
-

1440 McCarthy Blvd, Milpitas, CA 95035
-

Education

Lehigh University

2010 - 2015
2010 - 2014
2005 - 2009

Volunteer Experience

Student Program Committee Member

IEEE Security and Privacy 2016

Nov 2015 - Jan 2016 3 months

Science and Technology

I volunteered to serve on the Student Program Committee (SPC) at IEEE Security and Privacy 2016 (Oakland'16), a premier conference in the area of information security. As an SPC member, I reviewed technical papers submitted to Oakland'16.
Shadow Program Committee Member

AsiaCCS 2017

Nov 2016 - Dec 2016 2 months

Science and Technology

Reviewing submitted papers and reviews.

Publications

HyperFuzzer: An Efficient Hybrid Fuzzer for Virtual CPUs

ACM Conference on Computer and Communications Security (CCS), 2021 October 9, 2021
In this cloud computing era, the security of hypervisors is critical to the overall security of the cloud. In particular, the security of CPU virtualization in hypervisors is paramount because it is implemented in the most privileged CPU mode. Blackbox and graybox fuzzing are limited to finding shallow virtual CPU bugs due to its huge search space. Whitebox fuzzing can be used for systematic analysis of CPU virtualization, but existing implementations rely on slow hardware emulators to enable dynamic symbolic execution.

In this paper, we present HyperFuzzer, the first efficient hybrid fuzzer for virtual CPUs. Our key observation is that a virtual CPU’s execution is determined by the VM state. Based on this observation, we design a new fuzzing setup that uses complete VM states as fuzzing inputs, and a new fuzzing technique we call Nimble Symbolic Execution to enable dynamic symbolic execution for CPU virtualization running on bare metal. Specifically, it uses the hardware to log the control flow efficiently, and then reconstructs an approximate execution trace from only the control flow and the fuzzing input. The reconstructed execution trace is surprisingly sufficient for precise dynamic symbolic execution of virtual CPUs.

We have built a prototype of HyperFuzzer based on Intel Processor Trace for Microsoft Hyper-V. Our experimental results show that HyperFuzzer can run thousands of tests per second, which is 3 orders of magnitude faster than using a hardware emulator. When compared with a baseline using full (control+data) execution traces, HyperFuzzer can still generate 96.8% of the test inputs generated by the baseline. HyperFuzzer has found 11 previously unknown virtual CPU bugs in the Hyper-V hypervisor, and all of them were confirmed and fixed.

Reverse Debugging of Kernel Failures in Deployed Systems

Proceedings of the 2020 USENIX Annual Technical Conference (ATC) October 8, 2018
Post-mortem diagnosis of kernel failures is crucial for operating system vendors because kernel failures impact the reliability and security of the whole system. However, debugging kernel failures in deployed systems remains a challenge because developers have to speculate the conditions leading to the failure based on limited information such as memory dumps. In this paper, we present Kernel REPT, the first practical reverse debugging solution for kernel failures that is highly efficient, imposes small memory footprint and requires no extra software layer. To meet this goal, Kernel REPT employs efficient hardware tracing to record the kernel's control flow on each processor, recognizes the control flow of each software thread based on the context switch history, and recovers its data flow by emulating machine instructions and hardware events such as interrupts and exceptions. We design, implement, and deploy Kernel REPT on Microsoft Windows. We show that developers can use Kernel REPT to do interactive reverse debugging and find the root cause of real-world kernel failures. Kernel REPT also enables automatic root-cause analysis for certain kernel failures that were hard to debug even manually. Furthermore, Kernel REPT can proactively identify kernel bugs by checking the reconstructed execution history against a set of predetermined invariants.

REPT: Reverse Debugging of Failures in Deployed Software

Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018 September 17, 2018
Debugging software failures in deployed systems is important because they impact real users and customers. However, debugging such failures is notoriously hard in practice because developers have to rely on limited information such as memory dumps. The execution history is usually unavailable because high-fidelity program tracing is not affordable in deployed systems.

In this paper, we present REPT, a practical system that enables reverse debugging of software failures in deployed systems. REPT reconstructs the execution history with high fidelity by combining online lightweight hardware tracing of a program's control flow with offline binary analysis that recovers its data flow. It is seemingly impossible to recover data values thousands of instructions before the failure due to information loss and concurrent execution. REPT tackles these challenges by constructing a partial execution order based on timestamps logged by hardware and iteratively performing forward and backward execution with error correction.

We design and implement REPT, deploy it on Microsoft Windows, and integrate it into Windows Debugger. We evaluate REPT on 16 real-world bugs and show that it can recover data values accurately (92% on average) and efficiently (less than 20 seconds) for these bugs. We also show that it enables effective reverse debugging for 14 bugs.

Lazy Diagnosis of In-Production Concurrency Bugs

SOSP 2017 October 28, 2017
Diagnosing concurrency bugs—the process of understanding the root causes of failures—is hard. Developers depend on reproducing concurrency bugs to diagnose them. Traditionally, systems that attempt to reproduce concurrency bugs record fine-grained thread schedules of events (e.g., shared memory accesses) that lead to failures. Recording schedules incurs high runtime performance overhead and scales poorly, making existing techniques unsuitable in production.

In this paper, we formulate the coarse interleaving hypothesis, which states that events leading to concurrency bugs are coarsely interleaved. Therefore, a fine-grained and expensive recording is unnecessary for diagnosing concurrency bugs. We test the coarse interleaving hypothesis by studying 54 bugs in 13 systems and find that it holds in all cases. In particular, the time elapsed between events leading to concurrency bugs is on average 5 orders of magnitude greater than what is used today in fine-grained recording.

Using the coarse interleaving hypothesis, we develop Lazy Diagnosis, a hybrid dynamic-static interprocedural pointer and type analysis to diagnose the root causes of concurrency bugs. Our Lazy Diagnosis prototype, SNORLAX, relies on commodity hardware to track thread interleavings at a coarse granularity. SNORLAX does not require any source code changes and can diagnose complex concurrency bugs in real large-scale systems (MySQL, httpd, memcached, etc.) with full accuracy and an average runtime performance overhead of below 1%. Broadly, we believe that our findings can be used to build more efficient in-production bug detection and record/replay techniques.

Per-Input Control-Flow Integrity

ACM CCS 2015 (Acceptance Rate: 19.8%) July 21, 2015
Control-Flow Integrity (CFI) is an effective approach to mitigating control-flow hijacking attacks. Conventional CFI techniques statically extract a control-flow graph (CFG) from a program and instrument the program to enforce that CFG. The statically generated CFG includes all edges for all possible inputs; however, for a concrete input, the CFG may include many unnecessary edges.

We present Per-Input Control-Flow Integrity (PICFI or πCFI), which is a new CFI technique that can enforce a CFG computed for each concrete input. πCFI starts executing a program with the empty CFG and lets the program itself lazily add edges to the enforced CFG if such edges are required for the concrete input. The edge addition is performed by πCFI-inserted instrumentation code. To prevent attackers from arbitrarily adding edges, πCFI uses a statically computed all-input CFG to constrain what edges can be added at runtime. To minimize performance overhead, operations for adding edges are designed to be idempotent, so they can be patched to no-ops after their first execution. As our evaluation shows, πCFI provides better security than conventional fine-grained CFI with comparable performance overhead.
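
The lazy edge addition described in this abstract can be illustrated with a small toy model. This is only a sketch based on the abstract's wording, not the paper's implementation: the class name `PerInputCFG`, its methods, and the string node labels are all hypothetical, and a real system would work on machine code addresses through inserted instrumentation.

```python
class PerInputCFG:
    """Toy model of a per-input CFG that starts empty and grows lazily."""

    def __init__(self, static_edges):
        # Statically computed all-input CFG: the upper bound on what may be added.
        self.static_edges = frozenset(static_edges)
        # Enforced CFG for the current run; empty at program start.
        self.enforced_edges = set()

    def add_edge(self, src, dst):
        """Hypothetical instrumentation hook: activate an edge this input needs."""
        if (src, dst) not in self.static_edges:
            raise PermissionError(f"edge {src}->{dst} is not in the static CFG")
        # Set insertion is idempotent, so repeated calls change nothing.
        self.enforced_edges.add((src, dst))

    def check(self, src, dst):
        """Return True only if the transfer is in the enforced CFG."""
        return (src, dst) in self.enforced_edges


# Illustrative labels only; they do not come from the paper.
cfg = PerInputCFG(static_edges={("main", "parse"), ("main", "render")})
cfg.add_edge("main", "parse")
cfg.add_edge("main", "parse")  # second call has no further effect
```

Because `add_edge` is idempotent, a call site could in principle be replaced with a no-op after its first execution, which is the optimization the abstract mentions.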

RockJIT: Securing Just-In-Time Compilation Using Modular Control-Flow Integrity

ACM CCS 2014 (Acceptance Rate: 19%) August 24, 2014
Managed languages such as JavaScript are popular. For performance, modern implementations of managed languages adopt Just-In-Time (JIT) compilation. The danger to a JIT compiler is that an attacker can often control the input program and use it to trigger a vulnerability in the JIT compiler to launch code injection or JIT spraying attacks. In this paper, we propose a general approach called RockJIT to securing JIT compilers through Control-Flow Integrity (CFI). RockJIT builds a fine-grained control-flow graph from the source code of the JIT compiler and dynamically updates the control-flow policy when new code is generated on the fly. Through evaluation on Google’s V8 JavaScript engine, we demonstrate that RockJIT can enforce strong security on a JIT compiler, while incurring only modest performance overhead (14.6% on V8) and requiring a small amount of changes to V8’s code. Key contributions of RockJIT are a general architecture for securing JIT compilers and a method for generating fine-grained control-flow graphs from C++ code.

Modular Control-Flow Integrity

ACM PLDI 2014 (Acceptance Rate: 18%) June 9, 2014
Control-Flow Integrity (CFI) is a software-hardening technique. It inlines checks into a program so that its execution always follows a predetermined Control-Flow Graph (CFG). As a result, CFI is effective at preventing control-flow hijacking attacks. However, past fine-grained CFI implementations do not support separate compilation, which hinders its adoption.
We present Modular Control-Flow Integrity (MCFI), a new CFI technique that supports separate compilation. MCFI allows modules to be independently instrumented and linked statically or dynamically. The combined module enforces a CFG that is a combination of the individual modules’ CFGs. One challenge in supporting dynamic linking in multithreaded code is how to ensure a safe transition from the old CFG to the new CFG when libraries are dynamically linked. The key technique we use is to have the CFG represented in a runtime data structure and have reads and updates of the data structure wrapped in transactions to ensure thread safety. Our evaluation on SPECCPU2006 benchmarks shows that MCFI supports separate compilation, incurs low overhead of around 5%, and enhances security.

Monitor Integrity Protection with Space Efficiency and Separate Compilation

ACM CCS 2013 (Acceptance Rate: 20%) November 4, 2013

Low-level inlined reference monitors weave monitor code into a program for security. To ensure that monitor code cannot be bypassed by branching instructions, some form of control-flow integrity must be guaranteed. Past approaches to protecting monitor code either have high space overhead or do not support separate compilation. We present Monitor Integrity Protection (MIP), a form of coarse-grained control-flow integrity. The key idea of MIP is to arrange instructions in variable-sized chunks and dynamically restrict indirect branches to target only chunk beginnings. We show that this simple idea is effective in protecting monitor code integrity, enjoys low space and execution-time overhead, supports separate compilation, and is largely compatible with an existing compiler toolchain. We also show that MIP enables a separate verifier that completely disassembles a binary and verifies its security.
MIP is designed to support inlined reference monitors. As a case study, we have implemented MIP-based Software-based Fault Isolation (SFI) on both x86-32 and x86-64. The evaluation shows that MIP-based SFI has competitive performance with other SFI implementations, while enjoying low space overhead.

Efficient User-Space Information Flow Control

ACM AsiaCCS 2013 (Acceptance Rate: 16%) May 8, 2013
The model of Decentralized Information Flow Control (DIFC) is effective at improving application security and can support rich confidentiality and integrity policies. We describe the design and implementation of duPro, an efficient user-space information flow control framework. duPro adopts Software-based Fault Isolation (SFI) to isolate protection domains within the same process. It controls the end-to-end information flow at the granularity of SFI domains. Being a user-space framework, duPro does not require any OS changes. Since SFI is more lightweight than hardware-based isolation (e.g., OS processes), the inter-domain communication and scheduling in duPro are more efficient than process-level DIFC systems. Finally, duPro supports a novel checkpointing-restoration mechanism for efficiently reusing protection domains. Experiments demonstrate applications can be ported to duPro with negligible overhead, enhanced security, and with tight control over information flow.

Enforcing User-Space Privilege Separation with Declarative Architectures

ACM STC 2012 October 15, 2012
Applying privilege separation in software development is an effective strategy for limiting the damage of an attack on a software system. In this approach, a software system is separated into a set of communicating protection domains of least privilege. In a privilege-separated system, even if one protection domain is hijacked by an attacker, the rest of the system may still function.
uPro is a tool that provides efficient and flexible enforcement of privilege separation. It adopts software-based fault isolation to implement protection domains in the user-space so that inter-domain communication is efficient. It provides a declarative language to describe an application’s security architecture, facilitating developers to identify different architecture alternatives. The evaluation shows that real applications can be ported to uPro with enhanced security, acceptable performance, and declarative architectures.


Patents

Hybrid binaries supporting code stream folding

Filed June 22, 2021 US 11,042,422
A hybrid binary executable under both native processes and compatibility (e.g., emulated) processes. When the hybrid binary is loaded by a native process, the process executes a native code stream contained in the binary directly on a processor. When the hybrid binary is loaded by a compatibility process, the process executes an emulation-compatible (EC) code stream directly on a processor. When executing in a compatibility process, the EC code stream can interact with a foreign code stream that executes in an emulator. The foreign code stream can be included in the hybrid binary itself, or can be external to the hybrid binary. The hybrid binary format supports folding of code between the native code stream and the EC code stream. The hybrid binary comprises a set of memory transformations which are applied to image data obtained from the binary when the hybrid binary executes under the compatibility process.

Stack Traces Using Shadow Stack

Filed May 20, 2019 US 16417493
A program is executed using a call stack and shadow stack. The call stack includes frames having respective return addresses. The frames may also store variables and/or parameters. The shadow stack stores duplicates of the return addresses in the call stack. The call stack and the shadow stack are maintained by, (i) each time a function is called, adding a corresponding stack frame to the call stack and adding a corresponding return address to the shadow stack, and (ii) each time a function is exited, removing a corresponding frame from the call stack and removing a corresponding return address from the shadow stack. A backtrace of the program's current call chain is generated by accessing the return addresses in the shadow stack. The outputted backtrace includes the return addresses from the shadow stack and/or information about the traced functions that is derived from the shadow stack's return addresses.
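
The bookkeeping this abstract describes can be sketched in a few lines. This is a simplified software model under stated assumptions, not the patented method or any real hardware interface: the class `ShadowStackMachine`, its method names, the addresses, and the optional `symbols` table are all hypothetical.

```python
class ShadowStackMachine:
    """Toy model that keeps a call stack and a shadow stack in step."""

    def __init__(self):
        self.call_stack = []    # full frames: return address plus locals
        self.shadow_stack = []  # duplicates of the return addresses only

    def call(self, return_address, local_vars=None):
        # On each call, push a frame and duplicate its return address.
        self.call_stack.append(
            {"return_address": return_address, "locals": dict(local_vars or {})}
        )
        self.shadow_stack.append(return_address)

    def ret(self):
        # On each exit, pop the frame and the matching shadow entry.
        frame = self.call_stack.pop()
        self.shadow_stack.pop()
        return frame["return_address"]

    def backtrace(self, symbols=None):
        # Build the call chain from the shadow stack alone, innermost first.
        # `symbols` is a hypothetical address-to-name table.
        symbols = symbols or {}
        return [symbols.get(addr, hex(addr)) for addr in reversed(self.shadow_stack)]


machine = ShadowStackMachine()
machine.call(0x401000, {"argc": 1})
machine.call(0x401234)
machine.call(0x405678)
machine.ret()
trace = machine.backtrace({0x401000: "main+0x10"})
```

The backtrace never reads the frames in `call_stack`, so in this model it does not need to parse frame layouts at all.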

Methods for Enforcing Control Flow of a Computer Program

Issued June 7, 2016 US 9361102
One aspect of the invention provides a method of controlling execution of a computer program. The method comprises the following runtime steps: parsing code to identify one or more indirect branches; creating a branch ID data structure that maps an indirect branch location to a branch ID, which is the indirect branch's equivalence class ID; creating a target ID data structure that maps a code address to a target ID, which is an equivalence class ID to which the address belongs; and prior to execution of an indirect branch including a return instruction located at an address: obtaining the branch ID associated with the return address from the branch ID data structure; obtaining the target ID associated with an actual return address for the indirect branch from the target ID data structure; and comparing the branch ID and the target ID.
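
The ID comparison in this abstract can be sketched as two lookup tables and an equality check. The tables, addresses, and the function name `branch_allowed` below are illustrative assumptions; the abstract does not specify how the data structures are laid out.

```python
# Hypothetical tables: each indirect branch location and each code address
# is assigned the ID of the equivalence class it belongs to.
branch_ids = {0x1000: 1, 0x1040: 2}
target_ids = {0x2000: 1, 0x2100: 1, 0x3000: 2}


def branch_allowed(branch_location, target_address):
    """Allow the transfer only when the branch ID equals the target ID."""
    branch_id = branch_ids.get(branch_location)
    target_id = target_ids.get(target_address)
    # Unknown branches or targets have no ID and are rejected.
    return branch_id is not None and branch_id == target_id
```

In this sketch the branch at `0x1000` may reach `0x2000` or `0x2100`, which share class 1, but not `0x3000`, which belongs to class 2.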


Courses

Advanced Algorithms

441
Advanced Programming Languages

497
Database Systems, Algorithms and Applications

341
Hardware and Software in Parallel Computing

497
Pattern Recognition

426
Theory of Operating Systems

403

Honors & Awards

Best Paper Award at OSDI 2018

USENIX OSDI

Oct 2018
ACM SIGSAC Doctoral Dissertation Award Runner-Up 2016

ACM SIGSAC

Oct 2016

https://www.sigsac.org/award/diss-awards.html
Real World Crypto 2016 Support

RWC Organizing Committee

Jan 2016

$500 in support for serving on the IEEE Security & Privacy 2016 Student Program Committee and attending RWC 2016.
CCS 2015 Student Travel Grant

ACM CCS 2015 Student Travel Grant Chairs

2015

$1000 awarded to attend the conference.
CCS 2014 Student Travel Grant

ACM CCS 2014 Student Travel Grant Chairs

Nov 2014

$1000 awarded to attend the conference.
BlackHat 2014 USA Student Scholarship

BlackHat 2014 USA

Aug 2014

A complimentary Academic Pass (valued at $895) was awarded.
Rossin Doctoral Fellow

P.C. Rossin College of Engineering, Lehigh University

2014
Graduate of Honor

Zhejiang University

2009
Chinese Academy of Sciences Scholarship

Chinese Academy of Sciences

2008

Languages

English

Full professional proficiency
Chinese

Native or bilingual proficiency
