@barrbrain

No description provided.

barrbrain commented Dec 7, 2023

See #3292 (comment) for a perf trace from before this PR.
Here is the same workload after this PR:

# Samples: 166K of event 'cycles'
# Event count (approx.): 92343598454
#
#       Overhead  Command / Shared Object / Symbol
# ..............  ...............................................................................................................................................................................................................
#
   100.00%        rav1e  
       91.71%        rav1e                
           6.89%        [.] put_8tap_neon
            |          
            |--5.80%--rav1e::me::estimate_motion
            |          rav1e::me::sub_pixel_me (inlined)
            |          rav1e::me::subpel_diamond_search (inlined)
            |          rav1e::me::get_subpel_mv_rd (inlined)
            |          put_8tap_neon
            |          
             --0.57%--put_8tap_neon
                       put_8tap_neon

           4.81%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |
            ---rav1e::api::internal::ContextInner<T>::receive_packet
               rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
               <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
               rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
               core::iter::traits::iterator::Iterator::for_each (inlined)
               <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
               <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               |          
                --4.80%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                          core::iter::traits::iterator::Iterator::fold (inlined)
                          |          
                           --4.74%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
                                     |          
                                      --4.73%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                                |          
                                                 --4.68%--rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)
                                                           |          
                                                           |--1.21%--<v_frame::plane::Plane<T> as rav1e::frame::plane::AsRegion<T>>::region (inlined)
                                                           |          |          
                                                           |           --0.90%--rav1e::tiling::plane_region::PlaneRegion<T>::new (inlined)
                                                           |                     rav1e::tiling::plane_region::PlaneRegion<T>::from_slice (inlined)
                                                           |          
                                                            --0.56%--rav1e::asm::aarch64::dist::get_satd (inlined)

           4.77%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct32
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               rav1e::asm::aarch64::transform::forward::daala_fdct32
               |          
                --3.95%--rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          |          
                           --2.68%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16_asym (inlined)
                                     |          
                                      --0.63%--rav1e::asm::aarch64::transform::forward::RotateKernel::half_kernel (inlined)

           4.49%        [.] rav1e_satd8x8_neon
            |          
             --4.25%--rav1e::api::internal::ContextInner<T>::receive_packet
                       rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
                       <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                       rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
                       rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
                       core::iter::traits::iterator::Iterator::for_each (inlined)
                       <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
                       <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                       <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       core::iter::traits::iterator::Iterator::fold (inlined)
                       <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
                       core::iter::adapters::map::map_fold::{{closure}} (inlined)
                       core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
                       <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
                       <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                       <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                       core::iter::traits::iterator::Iterator::fold (inlined)
                       <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
                       core::iter::adapters::map::map_fold::{{closure}} (inlined)
                       rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)
                       rav1e::asm::aarch64::dist::get_satd (inlined)
                       satd8x8_neon (inlined)

           4.33%        [.] rav1e::rdo::compute_distortion
            |          
             --4.28%--rav1e::rdo::rdo_mode_decision
                       |          
                        --4.16%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  |          
                                  |--2.45%--rav1e::rdo::cdef_dist_wxh (inlined)
                                  |          |          
                                  |          |--1.46%--rav1e::asm::aarch64::dist::cdef_dist::cdef_dist_kernel (inlined)
                                  |          |          |          
                                  |          |           --1.12%--rav1e::activity::apply_ssim_boost (inlined)
                                  |          |                     |          
                                  |          |                      --1.00%--rav1e::activity::ssim_boost_rsqrt (inlined)
                                  |          |          
                                  |           --0.54%--rav1e::rdo::compute_distortion::{{closure}} (inlined)
                                  |          
                                   --1.26%--rav1e::rdo::sse_wxh (inlined)
                                             |          
                                              --0.93%--rav1e::rdo::compute_distortion::{{closure}} (inlined)
                                                        |          
                                                         --0.63%--rav1e::rdo::distortion_scale (inlined)

           4.18%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct64
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               rav1e::asm::aarch64::transform::forward::daala_fdct64
               |          
               |--2.26%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_32_asym (inlined)
               |          |          
               |           --0.52%--rav1e::asm::aarch64::transform::forward::RotateKernel::half_kernel (inlined)
               |          
               |--0.74%--rav1e::asm::aarch64::transform::forward::daala_fdct64::butterfly_pair (inlined)
               |          
                --0.56%--rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32_asym (inlined)

           4.00%        [.] rav1e::asm::aarch64::transform::forward::forward_transform_neon
            |          
             --3.95%--rav1e::asm::aarch64::transform::forward::forward_transform_neon
                       |          
                       |--0.82%--<core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                       |          <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)
                       |          
                       |--0.72%--rav1e::asm::aarch64::transform::forward::round_shift_array_neon (inlined)
                       |          
                        --0.60%--rav1e::asm::aarch64::transform::forward::transpose_8x8_neon (inlined)

           3.70%        [.] rav1e::encoder::encode_block_post_cdef
            |          
            |--2.99%--rav1e::encoder::encode_partition_topdown
            |          rav1e::rdo::rdo_partition_decision
            |          |          
            |          |--1.67%--rav1e::rdo::rdo_partition_simple (inlined)
            |          |          rav1e::rdo::rdo_mode_decision
            |          |          |          
            |          |           --1.61%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
            |          |                     <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
            |          |                     core::iter::traits::iterator::Iterator::try_fold (inlined)
            |          |                     <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
            |          |                     rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
            |          |                     rav1e::rdo::luma_chroma_mode_rdo
            |          |                     rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
            |          |                     rav1e::encoder::encode_block_post_cdef
            |          |          
            |           --1.32%--rav1e::rdo::rdo_partition_none (inlined)
            |                     rav1e::rdo::rdo_mode_decision
            |                     |          
            |                      --1.28%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
            |                                <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
            |                                core::iter::traits::iterator::Iterator::try_fold (inlined)
            |                                <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
            |                                rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
            |                                rav1e::rdo::luma_chroma_mode_rdo
            |                                rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
            |                                rav1e::encoder::encode_block_post_cdef
            |          
             --0.54%--rav1e::encoder::encode_partition_bottomup
                       rav1e::encoder::encode_partition_bottomup
                       rav1e::rdo::rdo_mode_decision
                       |          
                        --0.53%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::encoder::encode_block_post_cdef

           3.55%        [.] rav1e_satd16x8_neon
            |          
             --2.94%--rav1e::me::estimate_motion
                       |          
                        --2.63%--rav1e::me::sub_pixel_me (inlined)
                                  rav1e::me::subpel_diamond_search (inlined)
                                  rav1e::me::get_subpel_mv_rd (inlined)
                                  rav1e::me::compute_mv_rd (inlined)
                                  satd16x8_neon (inlined)

           2.90%        [.] rav1e_cdef_dist_kernel_8x8_neon
            |          
             --2.87%--rav1e::rdo::rdo_mode_decision
                       |          
                        --2.80%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  rav1e::rdo::cdef_dist_wxh (inlined)
                                  rav1e::asm::aarch64::dist::cdef_dist::cdef_dist_kernel (inlined)
                                  cdef_dist_kernel_8x8_neon (inlined)

           2.39%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
            |          
             --2.34%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
                       |          
                       |--0.86%--rav1e::context::cdf_context::CDFContextLog::push (inlined)
                       |          rav1e::context::cdf_context::CDFContextLogPartition<_>::push (inlined)
                       |          
                        --0.84%--<rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol (inlined)
                                  |          
                                   --0.60%--<rav1e::ec::WriterBase<rav1e::ec::WriterCounter> as rav1e::ec::StorageBackend>::store (inlined)

           2.32%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
               |--1.57%--rav1e::asm::aarch64::transform::forward::daala_fdct16
               |          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16
               |          |          
               |          |--0.63%--rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8_asym (inlined)
               |          |          
               |           --0.63%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8_asym (inlined)
               |          
                --0.75%--rav1e::asm::aarch64::transform::forward::daala_fdct64
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16

           2.30%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
            |
            ---rav1e::encoder::encode_tx_block
               rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
               |          
               |--0.92%--rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
               |          |          
               |           --0.54%--<core::iter::adapters::rev::Rev<I> as core::iter::traits::iterator::Iterator>::next (inlined)
               |                     <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::double_ended::DoubleEndedIterator>::next_back (inlined)
               |                     <core::iter::adapters::zip::Zip<A,B> as core::iter::traits::double_ended::DoubleEndedIterator>::next_back (inlined)
               |                     <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next_back (inlined)
               |          
                --0.70%--rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeff_signs (inlined)

           2.23%        [.] core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
            |
            ---rav1e::api::internal::ContextInner<T>::receive_packet
               rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
               <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
               rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
               core::iter::traits::iterator::Iterator::for_each (inlined)
               <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
               <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
               |          
                --2.17%--core::iter::traits::iterator::Iterator::for_each::call::{{closure}} (inlined)
                          rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}} (inlined)
                          |          
                           --0.73%--rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)

           1.90%        [.] rav1e::quantize::QuantizationContext::quantize
            |          
             --1.88%--rav1e::encoder::encode_tx_block
                       rav1e::quantize::QuantizationContext::quantize
                       |          
                        --0.61%--core::iter::traits::iterator::Iterator::max (inlined)
                                  core::iter::traits::iterator::Iterator::max_by (inlined)
                                  core::iter::traits::iterator::Iterator::reduce (inlined)
                                  |          
                                   --0.61%--<core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
                                             core::iter::traits::iterator::Iterator::fold (inlined)

           1.72%        [.] rav1e::quantize::rust::dequantize
            |          
             --1.71%--rav1e::encoder::encode_tx_block
                       rav1e::quantize::rust::dequantize
                       |          
                        --1.22%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next (inlined)
                                  <core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                                  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)

           1.46%        [.] rav1e::encoder::encode_tx_block
            |          
             --1.42%--rav1e::encoder::encode_tx_block
                       |          
                        --0.92%--rav1e::encoder::diff (inlined)

           1.31%        [.] rav1e_sad32x32_neon
            |          
            |--0.76%--rav1e::api::internal::ContextInner<T>::send_frame
            |          rav1e::api::internal::ContextInner<T>::compute_frame_invariants (inlined)
            |          rav1e::api::internal::ContextInner<T>::compute_lookahead_motion_vectors (inlined)
            |          rav1e::api::lookahead::compute_motion_vectors
            |          rayon::iter::ParallelIterator::for_each (inlined)
            |          rayon::iter::for_each::for_each (inlined)
            |          <rayon::vec::IntoIter<T> as rayon::iter::ParallelIterator>::drive_unindexed (inlined)
            |          rayon::iter::plumbing::bridge (inlined)
            |          <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer
            |          <rayon::vec::Drain<T> as rayon::iter::IndexedParallelIterator>::with_producer (inlined)
            |          <rayon::iter::plumbing::bridge::Callback<C> as rayon::iter::plumbing::ProducerCallback<I>>::callback (inlined)
            |          rayon::iter::plumbing::bridge_producer_consumer (inlined)
            |          rayon::iter::plumbing::bridge_producer_consumer::helper
            |          rayon::iter::plumbing::Producer::fold_with (inlined)
            |          <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter (inlined)
            |          core::iter::traits::iterator::Iterator::for_each (inlined)
            |          core::iter::traits::iterator::Iterator::fold (inlined)
            |          core::iter::traits::iterator::Iterator::for_each::call::{{closure}} (inlined)
            |          core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut
            |          rav1e::api::lookahead::compute_motion_vectors::{{closure}} (inlined)
            |          rav1e::me::estimate_tile_motion
            |          rav1e::me::refine_subsampled_sb_motion (inlined)
            |          rav1e::me::refine_subsampled_motion_estimate (inlined)
            |          rav1e::me::full_search
            |          rav1e::me::compute_mv_rd (inlined)
            |          rav1e::asm::aarch64::dist::get_sad (inlined)
            |          sad32x32_neon (inlined)
            |          
             --0.55%--rav1e::me::estimate_motion
                       rav1e::me::full_pixel_me (inlined)
                       rav1e::me::full_pixel_me::{{closure}}

           1.14%        [.] rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
            |          
             --0.90%--rav1e::encoder::encode_partition_topdown
                       |          
                        --0.89%--rav1e::rdo::rdo_partition_decision
                                  |          
                                   --0.63%--rav1e::rdo::rdo_partition_simple (inlined)
                                             rav1e::rdo::rdo_mode_decision
                                             |          
                                              --0.57%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                                        <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                                        core::iter::traits::iterator::Iterator::try_fold (inlined)
                                                        <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                                        rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                                        rav1e::rdo::luma_chroma_mode_rdo
                                                        rav1e::rdo::luma_chroma_mode_rdo::{{closure}}

           1.13%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.93%--rav1e::asm::aarch64::transform::forward::daala_fdct32
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8

           1.06%        [.] rav1e::me::get_fullpel_mv_rd
            |
            ---rav1e::me::estimate_motion
               |          
                --1.04%--rav1e::me::full_pixel_me (inlined)
                          |          
                           --1.01%--rav1e::me::full_pixel_me::{{closure}}
                                     |          
                                      --0.55%--rav1e::me::get_best_predictor (inlined)
                                                rav1e::me::get_fullpel_mv_rd

           0.99%        [.] rav1e::cdef::cdef_filter_superblock
            |          
             --0.92%--rav1e::encoder::encode_frame
                       rav1e::encoder::encode_tile_group (inlined)
                       rav1e::encoder::FrameState<T>::apply_tile_state_mut (inlined)
                       rav1e::encoder::encode_tile_group::{{closure}} (inlined)
                       rav1e::cdef::cdef_filter_tile
                       rav1e::cdef::cdef_filter_superblock

           0.94%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.93%--rav1e::asm::aarch64::transform::forward::daala_fdct64
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16

           0.91%        [.] rav1e::lrf::rust::sgrproj_box_ab_r1
           0.86%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
            |          
             --0.85%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
                       |          
                        --0.80%--rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx (inlined)
                                  |          
                                   --0.64%--rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx_from_stats (inlined)

           0.81%        [.] rav1e::deblock::sse_size14
           0.81%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.81%--rav1e::asm::aarch64::transform::forward::daala_fdct32
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8

           0.80%        [.] prep_neon
            |          
             --0.59%--rav1e::rdo::rdo_partition_decision

           0.79%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_mag
            |          
             --0.77%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_mag

           0.74%        [.] rav1e::me::full_pixel_me::{{closure}}
            |
            ---rav1e::me::estimate_motion
               rav1e::me::full_pixel_me (inlined)
               rav1e::me::full_pixel_me::{{closure}}

           0.69%        [.] rav1e_sad16x16_neon
            |
            ---rav1e::me::estimate_motion
               rav1e::me::full_pixel_me (inlined)
               |          
                --0.66%--rav1e::me::full_pixel_me::{{closure}}

           0.60%        [.] rav1e::lrf::rust::sgrproj_box_f_r1
           0.59%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct16
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               rav1e::asm::aarch64::transform::forward::daala_fdct16

           0.56%        [.] rav1e::predict::rust::pred_directional
           0.53%        [.] rav1e::predict::PredictionMode::predict_inter
           0.52%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |
            ---rav1e::rdo::rdo_mode_decision
               |          
                --0.51%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                          <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                          core::iter::traits::iterator::Iterator::try_fold (inlined)
                          <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                          rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                          rav1e::rdo::luma_chroma_mode_rdo
                          rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                          rav1e::rdo::compute_distortion
                          rav1e::rdo::sse_wxh (inlined)
                          rav1e::asm::aarch64::dist::sse::get_weighted_sse (inlined)
                          rav1e::asm::aarch64::dist::sse::get_weighted_sse::{{closure}} (inlined)
                          rav1e::dist::rust::get_weighted_sse
                          core::iter::traits::iterator::Iterator::sum (inlined)
                          <u64 as core::iter::traits::accum::Sum>::sum (inlined)
                          <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                          core::iter::traits::iterator::Iterator::fold (inlined)
                          core::iter::adapters::map::map_fold::{{closure}} (inlined)
                          rav1e::dist::rust::get_weighted_sse::{{closure}} (inlined)
                          core::iter::traits::iterator::Iterator::sum (inlined)
                          <u64 as core::iter::traits::accum::Sum>::sum (inlined)
                          <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold

           0.51%        [.] rav1e_sad64x64_neon
            |
            ---rav1e::me::estimate_motion
               rav1e::me::full_pixel_me (inlined)
               rav1e::me::full_pixel_me::{{closure}}

           0.51%        [.] rav1e_avg_8bpc_neon
           0.50%        [.] rav1e::partition::BlockSize::from_width_and_height_opt
            |
            ---rav1e::api::internal::ContextInner<T>::receive_packet
               rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
               <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
               rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
               core::iter::traits::iterator::Iterator::for_each (inlined)
               <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
               <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)
               rav1e::asm::aarch64::dist::get_satd (inlined)
               rav1e::partition::BlockSize::from_width_and_height_opt

           0.49%        [.] rav1e::predict::PredictionMode::predict_inter_single
           0.48%        [.] rav1e::me::get_subset_predictors
           0.48%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.48%        [.] rav1e::me::estimate_motion
           0.44%        [.] prep_8tap_neon
           0.42%        [.] rav1e::encoder::encode_block_pre_cdef
           0.42%        [.] rav1e::deblock::filter_v_edge
           0.40%        [.] rav1e::deblock::sse_size6
           0.40%        [.] put_neon
           0.38%        [.] rav1e::lrf::rust::sgrproj_box_ab_r2
           0.36%        [.] rav1e::deblock::filter_h_edge
           0.36%        [.] rav1e::deblock::sse_v_edge
           0.31%        [.] rav1e::encoder::write_tx_tree
           0.30%        [.] rav1e::deblock::filter_wide14_12
           0.30%        [.] rav1e::rdo::rdo_mode_decision
           0.29%        [.] rav1e::encoder::motion_compensate
           0.27%        [.] rav1e::context::block_unit::BlockContext::get_txb_ctx
           0.27%        [.] rav1e::partition::BlockSize::largest_chroma_tx_size
           0.26%        [.] rav1e::encoder::encode_block_post_cdef
           0.26%        [.] rav1e::deblock::sse_h_edge
           0.25%        [.] rav1e_weighted_sse_16x16_neon
           0.24%        [.] rav1e_inv_dct_8h_x16_neon
           0.23%        [.] rav1e::lrf::rust::sgrproj_box_f_r2
           0.22%        [.] rav1e::deblock::deblock_size14_inner
           0.22%        [.] rav1e_weighted_sse_32x32_neon
           0.20%        [.] rav1e::lrf::sgrproj_solve
           0.19%        [.] rav1e::partition::BlockSize::from_width_and_height_opt
           0.19%        [.] rav1e::context::block_unit::FrameBlocks::new
           0.19%        [.] rav1e::me::estimate_tile_motion
           0.19%        [.] rav1e_inv_dct32_odd_8h_x16_neon
           0.17%        [.] rav1e::rdo::rdo_tx_size_type
           0.17%        [.] rav1e::context::frame_header::<impl rav1e::context::cdf_context::ContextWriter>::write_ref_frames
           0.17%        [.] rav1e::partition::get_intra_edges
           0.16%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::fill_neighbours_ref_counts
           0.16%        [.] rav1e::context::partition_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_segmentation
           0.16%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_br_ctx
           0.16%        [.] rav1e::me::full_search
           0.15%        [.] rav1e::lrf::sgrproj_stripe_filter
           0.14%        [.] rav1e::deblock::deblock_size6_inner
           0.14%        [.] rav1e::api::lookahead::estimate_importance_block_difference
           0.13%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::find_mvrefs
           0.13%        [.] rav1e::context::partition_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_segment_pred
           0.12%        [.] rav1e::quantize::QuantizationContext::update
           0.11%        [.] rav1e::activity::variance_8x8
           0.11%        [.] rav1e::rdo::rdo_loop_plane_error
           0.10%        [.] rav1e::transform::forward_shared::Txfm2DFlipCfg::fwd
           0.10%        [.] rav1e::context::partition_unit::<impl rav1e::context::block_unit::BlockContext>::reset_skip_context
           0.10%        [.] rav1e::deblock::deblock_size
           0.09%        [.] rav1e::lrf::setup_integral_image
           0.09%        [.] rav1e::context::block_unit::BlockContext::set_coeff_context
           0.09%        [.] inv_txfm_add_vert_dct_8x32_neon
           0.09%        [.] memset@plt
           0.09%        [.] rav1e::rdo::rdo_loop_decision
           0.09%        [.] rav1e::rdo::spatiotemporal_scale
           0.08%        [.] rav1e::context::block_unit::BlockContext::intra_inter_context
           0.08%        [.] rav1e::predict::rust::filter_edge
           0.08%        [.] memcpy@plt
           0.08%        [.] rav1e::context::partition_unit::<impl rav1e::context::block_unit::BlockContext>::skip_context
           0.08%        [.] rav1e::me::MotionEstimationSubsets::all_mvs
           0.07%        [.] rav1e::partition::BlockSize::from_width_and_height_opt
           0.07%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_eob
           0.07%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
           0.07%        [.] inv_txfm_horz_dct_32x8_neon
           0.07%        [.] inv_dct64_step2_neon
           0.06%        [.] inv_dct64_step1_neon
           0.06%        [.] rav1e::predict::PredictionMode::predict_intra
           0.06%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct8
           0.06%        [.] rav1e::deblock::deblock_filter_optimize
           0.06%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::scan_col_mbmi
           0.05%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::add_ref_mv_candidate
           0.05%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::scan_row_mbmi
           0.05%        [.] rav1e::rdo::clip_visible_bsize
           0.05%        [.] rav1e::api::lookahead::estimate_inter_costs
           0.05%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_comp_ref_type_ctx
           0.05%        [.] core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut
           0.05%        [.] rav1e::context::partition_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_partition
           0.05%        [.] rav1e::rdo::luma_chroma_mode_rdo
           0.05%        [.] rav1e::encoder::save_block_motion
           0.04%        [.] rav1e::api::lookahead::estimate_intra_costs
           0.04%        [.] inv_txfm_add_vert_dct_8x64_neon
           0.04%        [.] rav1e::asm::aarch64::mc::mc_avg
           0.04%        [.] rav1e::context::transform_unit::get_tx_set
           0.04%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.04%        [.] v_frame::plane::Plane<T>::downsampled
           0.04%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.04%        [.] rav1e::api::internal::ContextInner<T>::receive_packet
           0.04%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
           0.04%        [.] rav1e_weighted_sse_4x4_neon
           0.04%        [.] __aarch64_ldadd4_rel
           0.04%        [.] rav1e::partition::BlockSize::from_width_and_height
           0.04%        [.] rav1e::ec::rust::update_cdf
           0.03%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
           0.03%        [.] rav1e::ec::rust::update_cdf
           0.03%        [.] rav1e::encoder::write_tx_blocks
           0.03%        [.] rav1e::rdo::rdo_cfl_alpha::{{closure}}::{{closure}}
           0.03%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.03%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct4
           0.03%        [.] rav1e_inv_txfm_dct_8h_x64_neon
           0.03%        [.] rav1e::context::partition_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_segmentation
           0.03%        [.] rav1e::deblock::sse_size8
           0.03%        [.] rav1e::context::partition_unit::<impl rav1e::context::block_unit::BlockContext>::partition_plane_context
           0.03%        [.] rav1e::asm::aarch64::transform::inverse::inverse_transform_add
           0.03%        [.] core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
           0.03%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.03%        [.] rav1e::asm::aarch64::dist::cdef_dist::cdef_dist_kernel
           0.03%        [.] rav1e::tiling::plane_region::PlaneRegionMut<T>::scratch_copy
           0.03%        [.] rav1e::encoder::encode_block_pre_cdef
           0.03%        [.] rav1e::encoder::encode_partition_bottomup
           0.03%        [.] core::cmp::PartialOrd::lt
           0.03%        [.] rav1e::rdo::rdo_partition_decision
           0.03%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.03%        [.] inv_txfm_add_vert_8x16_neon
           0.03%        [.] core::cmp::PartialOrd::ge
           0.03%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_inter_mode
           0.02%        [.] inv_txfm_horz_16x8_neon
           0.02%        [.] rav1e::encoder::encode_block_with_modes
           0.02%        [.] rav1e_sad8x8_neon
           0.02%        [.] __aarch64_cas4_acq
           0.02%        [.] rav1e_weighted_sse_8x8_neon
           0.02%        [.] inv_txfm_horz_dct_64x8_neon
           0.02%        [.] rav1e::encoder::CodedFrameData<T>::compute_spatiotemporal_scores
           0.02%        [.] rav1e::deblock::sse_size4
           0.02%        [.] rav1e_inv_dct_8h_x8_neon
           0.02%        [.] rav1e::cdef::cdef_analyze_superblock
           0.02%        [.] rav1e::encoder::encode_partition_topdown
           0.02%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
           0.02%        [.] <arrayvec::arrayvec::ArrayVec<T,_> as core::iter::traits::collect::FromIterator<T>>::from_iter
           0.02%        [.] core::slice::sort::insertion_sort_shift_left
           0.02%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_compound_mode
           0.02%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_mv
           0.02%        [.] rav1e::partition::has_tr
           0.02%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_comp_mode_ctx
           0.02%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.02%        [.] rav1e::encoder::encode_tx_block
           0.02%        [.] rav1e_cdef_find_dir_8bpc_neon
           0.02%        [.] cdef_filter8_sec_edged_8bpc_neon
           0.02%        [.] rav1e_ipred_paeth_8bpc_neon
           0.01%        [.] rav1e::partition::supersample_chroma_bsize
           0.01%        [.] rav1e::context::frame_header::<impl rav1e::context::cdf_context::ContextWriter>::write_ref_frames
           0.01%        [.] rav1e::partition::BlockSize::largest_chroma_tx_size
           0.01%        [.] rav1e_ipred_smooth_8bpc_neon
           0.01%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_tx_type
           0.01%        [.] <rav1e::ec::WriterBase<rav1e::ec::WriterEncoder> as rav1e::ec::StorageBackend>::store
           0.01%        [.] rav1e::encoder::check_lf_queue
           0.01%        [.] inv_txfm_add_8x8_neon
           0.01%        [.] rav1e::asm::aarch64::cdef::cdef_filter_block
           0.01%        [.] rav1e::ec::rust::update_cdf
           0.01%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.01%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.01%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.01%        [.] cdef_filter8_sec_8bpc_neon
           0.01%        [.] rav1e::deblock::deblock_size8_inner
           0.01%        [.] rav1e_inv_txfm_dct_clear_8h_x64_neon
           0.01%        [.] rav1e_ipred_dc_128_8bpc_neon
           0.01%        [.] rav1e_ipred_cfl_128_8bpc_neon
           0.01%        [.] rav1e::activity::ActivityMask::fill_scales
           0.01%        [.] rav1e::dist::rust::get_weighted_sse
           0.01%        [.] rav1e_ipred_smooth_h_8bpc_neon
           0.01%        [.] rav1e_inv_txfm_add_dct_dct_32x32_8bpc_neon
           0.01%        [.] rav1e_inv_adst_8h_x16_neon
           0.01%        [.] rav1e::context::<impl rav1e::context::cdf_context::ContextWriter>::encode_mv_component
           0.01%        [.] v_frame::plane::Plane<T>::pad
           0.01%        [.] rav1e::activity::ActivityMask::from_plane
           0.01%        [.] rav1e::util::logexp::blog32_q11
           0.01%        [.] <T as alloc::vec::spec_from_elem::SpecFromElem>::from_elem
           0.01%        [.] rav1e::api::internal::ContextInner<T>::send_frame
           0.01%        [.] rav1e::tiling::tile_state::TileStateMut<T>::new
           0.01%        [.] core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
           0.01%        [.] core::slice::sort::merge_sort
           0.01%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst16
           0.01%        [.] rav1e::partition::BlockSize::subsize
           0.01%        [.] alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::Edge>::insert_recursing
           0.01%        [.] rav1e::context::block_unit::BlockContext::checkpoint
           0.01%        [.] rav1e::lrf::rust::sgrproj_box_f_r0
           0.01%        [.] rav1e_ipred_smooth_v_8bpc_neon
           0.01%        [.] rav1e::deblock::deblock_size4_inner
           0.01%        [.] rav1e::predict::rust::upsample_edge
           0.01%        [.] rav1e::context::partition_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_partition
           0.01%        [.] <alloc::boxed::Box<[I]> as core::iter::traits::collect::FromIterator<I>>::from_iter
           0.01%        [.] rav1e::context::partition_unit::<impl rav1e::context::block_unit::BlockContext>::update_partition_context
           0.01%        [.] rav1e::deblock::deblock_adjusted_level
           0.01%        [.] rav1e_inv_txfm_add_dct_dct_64x64_8bpc_neon
           0.01%        [.] <arrayvec::arrayvec::ArrayVec<T,_> as core::clone::Clone>::clone
           0.01%        [.] rav1e::ec::rust::update_cdf
           0.01%        [.] crossbeam_epoch::default::with_handle
           0.00%        [.] rav1e_ipred_dc_8bpc_neon
           0.00%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst_vii_4
           0.00%        [.] idct_dc_w32_neon
           0.00%        [.] rav1e::context::frame_header::<impl rav1e::context::cdf_context::ContextWriter>::write_lrf
           0.00%        [.] rav1e_cdef_padding4_edged_8bpc_neon
           0.00%        [.] core::cmp::PartialOrd::le
           0.00%        [.] alloc::raw_vec::RawVec<T,A>::reserve_for_push
           0.00%        [.] rav1e::ec::rust::update_cdf
           0.00%        [.] idct_dc_w64_neon
           0.00%        [.] rav1e::ec::rust::update_cdf
           0.00%        [.] rav1e::asm::aarch64::predict::dispatch_predict_intra::{{closure}}
           0.00%        [.] rav1e_prep_8tap_regular_8bpc_neon
           0.00%        [.] cdef_filter4_sec_edged_8bpc_neon
           0.00%        [.] rav1e::lrf::RestorationState::lrf_filter_frame
           0.00%        [.] rav1e_inv_adst_8h_x8_neon
           0.00%        [.] core::slice::sort::insertion_sort_shift_left
           0.00%        [.] rav1e_ipred_cfl_8bpc_neon
           0.00%        [.] idct_dc_w16_neon
           0.00%        [.] core::cmp::PartialOrd::gt
           0.00%        [.] cdef_filter4_sec_8bpc_neon
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.00%        [.] <T as alloc::vec::spec_from_elem::SpecFromElem>::from_elem
           0.00%        [.] crossbeam_deque::deque::Stealer<T>::steal
           0.00%        [.] rav1e::do_encode
           0.00%        [.] <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold
           0.00%        [.] std::sys::unix::locks::futex_mutex::Mutex::lock_contended
           0.00%        [.] inv_txfm_add_4x4_neon
           0.00%        [.] core::cmp::PartialOrd::ge
           0.00%        [.] core::slice::sort::merge_sort
           0.00%        [.] rav1e_inv_txfm_dct_clear_scale_8h_x64_neon
           0.00%        [.] arrayvec::arrayvec::ArrayVec<T,_>::push
           0.00%        [.] idct_dc_w8_neon
           0.00%        [.] cdef_filter4_pri_8bpc_neon
           0.00%        [.] rav1e::util::logexp::bexp64
           0.00%        [.] core::slice::sort::insertion_sort_shift_left
           0.00%        [.] cdef_filter8_pri_edged_8bpc_neon
           0.00%        [.] rav1e_ipred_h_8bpc_neon
           0.00%        [.] inv_txfm_add_16x16_neon
           0.00%        [.] alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv
           0.00%        [.] alloc::raw_vec::finish_grow
           0.00%        [.] rayon_core::registry::global_registry
           0.00%        [.] __aarch64_ldadd8_rel
           0.00%        [.] rayon_core::registry::WorkerThread::wait_until_cold
           0.00%        [.] rayon_core::registry::Registry::in_worker_cold
           0.00%        [.] crossbeam_deque::deque::Injector<T>::steal
           0.00%        [.] <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
           0.00%        [.] rav1e_inv_txfm_add_dct_dct_8x8_8bpc_neon
           0.00%        [.] core::slice::sort::insertion_sort_shift_left
           0.00%        [.] rav1e_inv_dct_4h_x4_neon
           0.00%        [.] rav1e::predict::rust::dr_intra_derivative
           0.00%        [.] rav1e::recon_intra::has_top_right
           0.00%        [.] inv_txfm_horz_scale_dct_32x8_neon
           0.00%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_inter_mode
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::count_signed_subexp_with_ref
           0.00%        [.] rav1e::context::cdf_context::CDFContext::new
           0.00%        [.] core::fmt::write
           0.00%        [.] <bitstream_io::write::BitWriter<W,bitstream_io::BigEndian> as rav1e::header::UncompressedHeader>::write_frame_header_obu
           0.00%        [.] rav1e::encoder::encode_frame
           0.00%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_intra_mode_kf
           0.00%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_compound_mode
           0.00%        [.] rav1e::predict::luma_ac
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.00%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_eob
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.00%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_tx_type
           0.00%        [.] pow@plt
           0.00%        [.] cdef_filter8_pri_8bpc_neon
           0.00%        [.] cdef_filter4_pri_edged_8bpc_neon
           0.00%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst8
           0.00%        [.] alloc::collections::btree::map::entry::OccupiedEntry<K,V,A>::remove_kv
           0.00%        [.] rav1e_inv_txfm_add_dct_dct_4x4_8bpc_neon
           0.00%        [.] rav1e_cdef_padding8_edged_8bpc_neon
           0.00%        [.] rav1e::encoder::FrameInvariants<T>::set_ref_frame_sign_bias
           0.00%        [.] rav1e::util::logexp::blog64
           0.00%        [.] rav1e::encoder::FrameInvariants<T>::new_inter_frame
           0.00%        [.] std::env::_var_os
           0.00%        [.] crossbeam_deque::deque::Worker<T>::pop
           0.00%        [.] rayon_core::join::join_context::{{closure}}
           0.00%        [.] crossbeam_epoch::internal::Global::try_advance
           0.00%        [.] rayon_core::sleep::Sleep::sleep
           0.00%        [.] __aarch64_ldadd8_acq_rel
           0.00%        [.] rav1e::encoder::FrameInvariants<T>::new_key_frame
           0.00%        [.] core::cmp::PartialOrd::ge
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.00%        [.] rav1e_inv_txfm_add_dct_adst_16x16_8bpc_neon
           0.00%        [.] rav1e::partition::BlockSize::largest_chroma_tx_size
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.00%        [.] rav1e_satd16x16_neon
           0.00%        [.] rav1e_inv_adst_4h_x4_neon
           0.00%        [.] rav1e_put_8tap_regular_8bpc_neon
           0.00%        [.] rav1e::util::kmeans::scan
           0.00%        [.] <rav1e::tiling::tiler::TileContextIterMut<T> as core::iter::traits::iterator::Iterator>::next
           0.00%        [.] rav1e::quantize::select_dc_qi
           0.00%        [.] core::slice::sort::merge_sort
           0.00%        [.] rav1e::context::<impl rav1e::context::cdf_context::ContextWriter>::encode_mv_component
           0.00%        [.] <std::io::stdio::Stdin as std::io::Read>::read
           0.00%        [.] rav1e_inv_txfm_add_dct_dct_16x16_8bpc_neon
           0.00%        [.] core::num::flt2dec::strategy::grisu::format_exact_opt
           0.00%        [.] rav1e_cdef_padding8_8bpc_neon
           0.00%        [.] <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,alloc::vec::into_iter::IntoIter<T>>>::spec_extend
           0.00%        [.] alloc::collections::btree::map::entry::VacantEntry<K,V,A>::insert
           0.00%        [.] <rav1e::muxer::ivf::IvfMuxer as rav1e::muxer::Muxer>::flush
           0.00%        [.] <bitstream_io::write::BitWriter<W,E> as bitstream_io::write::BitWrite>::write
           0.00%        [.] rav1e::encoder::encode_show_existing_frame
           0.00%        [.] rav1e::lrf::RestorationPlane::restoration_unit_by_stripe
           0.00%        [.] rav1e_satd64x32_neon
           0.00%        [.] rav1e::cdef::cdef_filter_tile
           0.00%        [.] rav1e::context::cdf_context::CDFContext::reset_counts
           0.00%        [.] alloc::fmt::format::format_inner
           0.00%        [.] rav1e::scenechange::SceneChangeDetector<T>::run_comparison
           0.00%        [.] rav1e::encoder::FrameState<T>::new_with_frame
           0.00%        [.] rayon_core::job::StackJob<L,F,R>::run_inline
           0.00%        [.] rav1e::tiling::tile::Tile<T>::subregion::{{closure}}
           0.00%        [.] crossbeam_deque::deque::Worker<T>::pop
           0.00%        [.] alloc::sync::Arc<T,A>::make_mut
           0.00%        [.] rayon_core::join::join_context::{{closure}}
           0.00%        [.] <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
           0.00%        [.] core::ptr::drop_in_place<core::iter::adapters::map::Map<rayon::vec::SliceDrain<(rav1e::tiling::tiler::TileContextMut<u16>,&mut rav1e::context::cdf_context::CDFContext)>,&rav1e::encoder::encode_tile_group<u16>::{{closure}}>>
           0.00%        [.] rav1e_satd64x64_neon
           0.00%        [.] rav1e_cdef_padding4_8bpc_neon
           0.00%        [.] <bitstream_io::write::BitWriter<W,E> as bitstream_io::write::BitWrite>::write_bit
           0.00%        [.] rav1e::recon_intra::has_bottom_left
           0.00%        [.] rayon_core::latch::LockLatch::wait_and_reset
           0.00%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_intra_mode_kf
           0.00%        [.] rav1e_sad16x8_neon
           0.00%        [.] core::ptr::drop_in_place<rav1e::encoder::ReferenceFramesSet<u8>>
           0.00%        [.] rav1e_ipred_v_8bpc_neon
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.00%        [.] rav1e_weighted_sse_8x4_neon
           0.00%        [.] rav1e_ipred_cfl_ac_420_8bpc_neon
           0.00%        [.] rav1e::quantize::select_ac_qi
           0.00%        [.] rav1e::stats::build_frame_summary
           0.00%        [.] rayon::iter::collect::collect_with_consumer
           0.00%        [.] rav1e_sad64x32_neon
           0.00%        [.] rav1e::ec::rust::update_cdf
           0.00%        [.] rav1e_satd32x32_neon
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::write_golomb
           0.00%        [.] malloc@plt
           0.00%        [.] core::num::flt2dec::strategy::grisu::format_exact_opt::possibly_round
           0.00%        [.] <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer
           0.00%        [.] <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
           0.00%        [.] rayon::iter::plumbing::Folder::consume_iter
           0.00%        [.] <fern::log_impl::Stderr as log::Log>::log
           0.00%        [.] __aarch64_cas8_acq
           0.00%        [.] crossbeam_deque::deque::Injector<T>::push
           0.00%        [.] core::option::Option<&T>::cloned
           0.00%        [.] rav1e::cdef::cdef_analyze_superblock_range
           0.00%        [.] inv_txfm_horz_scale_16x8_neon
           0.00%        [.] __aarch64_ldset8_rel
           0.00%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_mv
           0.00%        [.] rav1e_inv_txfm_add_adst_dct_8x8_8bpc_neon
           0.00%        [.] free@plt
           0.00%        [.] alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::Edge>::insert_recursing
           0.00%        [.] <bitstream_io::write::BitWriter<W,bitstream_io::BigEndian> as rav1e::header::ULEB128Writer>::write_uleb128
           0.00%        [.] rav1e_sad16x4_neon
           0.00%        [.] alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
           0.00%        [.] alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
           0.00%        [.] <fern::log_impl::Dispatch as log::Log>::log
           0.00%        [.] alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
           0.00%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
           0.00%        [.] <bitstream_io::write::BitWriter<W,E> as bitstream_io::write::BitWrite>::write
           0.00%        [.] alloc::collections::btree::map::entry::OccupiedEntry<K,V,A>::remove_kv
           0.00%        [.] core::fmt::float::float_to_decimal_common_exact
           0.00%        [.] <bitstream_io::BigEndian as bitstream_io::Endianness>::write_signed
           0.00%        [.] memmove@plt
           0.00%        [.] rav1e::encoder::update_rec_buffer
           0.00%        [.] <bitstream_io::write::BitWriter<W,bitstream_io::BigEndian> as rav1e::header::UncompressedHeader>::write_segment_data
           0.00%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
           0.00%        [.] alloc::collections::btree::map::BTreeMap<K,V,A>::insert
           0.00%        [.] <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
           0.00%        [.] rav1e::encoder::build_raw_tile_group
           0.00%        [.] <rayon_core::latch::LatchRef<L> as rayon_core::latch::Latch>::set
           0.00%        [.] <rav1e::stats::ProgressInfo as core::fmt::Display>::fmt
           0.00%        [.] rav1e_cdef_filter8_8bpc_neon
           0.00%        [.] core::ptr::drop_in_place<rayon::vec::DrainProducer<rav1e::tiling::tiler::TileContextMut<u16>>>
           0.00%        [.] v_frame::plane::Plane<T>::copy_from_raw_u8
           0.00%        [.] <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
           0.00%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
           0.00%        [.] core::fmt::Formatter::write_formatted_parts
           0.00%        [.] __aarch64_cas8_acq_rel
           0.00%        [.] <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer
           0.00%        [.] <alloc::boxed::Box<[T],A> as core::clone::Clone>::clone
           0.00%        [.] core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut
           0.00%        [.] <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
           0.00%        [.] rayon_core::sleep::Sleep::wake_specific_thread
           0.00%        [.] core::ptr::drop_in_place<rayon::vec::Drain<rav1e::tiling::tiler::TileContextMut<u16>>>
           0.00%        [.] __aarch64_swp4_rel
           0.00%        [.] crossbeam_epoch::sync::queue::Queue<T>::try_pop_if
           0.00%        [.] rav1e::api::lookahead::compute_motion_vectors
           0.00%        [.] rav1e::decoder::y4m::<impl rav1e::decoder::Decoder for y4m::Decoder<alloc::boxed::Box<dyn std::io::Read+core::marker::Send>>>::read_frame
           0.00%        [.] __aarch64_ldadd4_relax
           0.00%        [.] alloc::collections::btree::navigate::LeafRange<BorrowType,K,V>::perform_next_checked
           0.00%        [.] alloc::collections::btree::navigate::<impl alloc::collections::btree::node::NodeRef<BorrowType,K,V,alloc::collections::btree::node::marker::LeafOrInternal>>::find_leaf_edges_spanning_range
           0.00%        [.] rav1e::scenechange::SceneChangeDetector<T>::analyze_next_frame
        7.46%        libc.so.6            
           2.33%        [.] 0x0000000000099e48
            |          
             --2.23%--rav1e::asm::aarch64::transform::forward::forward_transform_neon
                       rav1e::asm::aarch64::transform::forward::daala_fdct64
                       0xffff864e9e48

           2.30%        [.] 0x0000000000099e50
            |          
             --2.16%--rav1e::asm::aarch64::transform::forward::forward_transform_neon
                       rav1e::asm::aarch64::transform::forward::daala_fdct64
                       0xffff864e9e50

           0.60%        [.] 0x0000000000099e44
            |          
             --0.53%--rav1e::asm::aarch64::transform::forward::forward_transform_neon
                       rav1e::asm::aarch64::transform::forward::daala_fdct64
                       0xffff864e9e44

@codecov

codecov bot commented Dec 7, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (8202c7d) 88.22% compared to head (4c70193) 88.24%.

Files                        Patch %   Lines
src/asm/shared/dist/sse.rs   99.27%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3300      +/-   ##
==========================================
+ Coverage   88.22%   88.24%   +0.01%     
==========================================
  Files          87       88       +1     
  Lines       28221    28210      -11     
==========================================
- Hits        24898    24893       -5     
+ Misses       3323     3317       -6     
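
The percentages in the table above follow from the Hits and Lines rows. As a rough check, here is a minimal sketch that recomputes them. The helper names `coverage_pct` and `hundredths_truncated` are hypothetical and are not part of rav1e or Codecov. Truncating to two decimals (rather than rounding) is an assumption, chosen only because it reproduces the reported 88.22%, 88.24% and +0.01%; it is not a statement about how Codecov formats its figures.

```rust
/// Coverage as a percentage: covered lines (hits) over total lines.
fn coverage_pct(hits: u32, lines: u32) -> f64 {
    f64::from(hits) / f64::from(lines) * 100.0
}

/// Express a percentage in hundredths of a percent, truncating toward zero.
/// Truncation (instead of rounding) is an assumption that happens to match
/// the figures in the report above.
fn hundredths_truncated(pct: f64) -> i64 {
    (pct * 100.0).trunc() as i64
}

/// Lines not covered: total lines minus hits.
fn misses(hits: u32, lines: u32) -> u32 {
    lines - hits
}
```

With the values from the report, `coverage_pct(24898, 28221)` is about 88.225 for master and `coverage_pct(24893, 28210)` is about 88.242 for #3300. The unrounded difference is about 0.017 percentage points, so the +0.01% shown is consistent with truncation. `misses` gives 3323 and 3317, matching the Misses row.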


@barrbrain force-pushed the aarch64-weighted-sse branch 2 times, most recently from 72304ab to 07631a1, December 8, 2023 09:36
@barrbrain force-pushed the aarch64-weighted-sse branch 3 times, most recently from ec30325 to 33321c9, December 8, 2023 15:45
@barrbrain force-pushed the aarch64-weighted-sse branch from 33321c9 to 4c70193, December 8, 2023 16:22
@barrbrain marked this pull request as ready for review, December 8, 2023 16:30
@barrbrain requested a review from lu-zero, December 8, 2023 16:30
@barrbrain merged commit 84a25cf into xiph:master, Dec 8, 2023
@barrbrain deleted the aarch64-weighted-sse branch, December 8, 2023 22:17