Skip to content

feat: Add SVE kernels for TopKV#1256

Open
morgolock wants to merge 1 commit intomainfrom
pr/topkv_sve_kernels
Open

feat: Add SVE kernels for TopKV#1256
morgolock wants to merge 1 commit intomainfrom
pr/topkv_sve_kernels

Conversation

@morgolock
Copy link
Contributor

Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799

filelist.json Outdated
"src/cpu/kernels/topkv/generic/neon/qasymm8_signed.cpp"
]
}
"files": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the indentation change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in next patch

@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 1deb127 to 7944860 Compare February 6, 2026 16:12
@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 7944860 to 21ba8e9 Compare March 2, 2026 12:56
Resolves MLCE-1719

Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
inline uint32_t count_gt_block<float16_t>(const float16_t *ptr, float16_t thr, uint32_t block_elems)
{
const svbool_t pg = svwhilelt_b16(static_cast<uint64_t>(0), static_cast<uint64_t>(block_elems));
const svfloat16_t v = svld1_f16(pg, ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have two questions to ask:

  • Why do we convert to Fp32 in the Neon(TM) implementation?
  • We should incorporate epsilon in both implementations.

@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 21ba8e9 to 36f1ac1 Compare March 9, 2026 14:28
@morgolock morgolock requested a review from gunes-arm March 9, 2026 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants