Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. #835

Open
copybara-service[bot] wants to merge 1 commit into dev from test_868146247

Conversation

@copybara-service

Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention.

PiperOrigin-RevId: 868146247
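
The commit message lists the changes without detail. As a rough illustration of the BF16 numerics only, here is a minimal online-softmax flash-attention reference in JAX: inputs in bf16, with the softmax statistics and output accumulator kept in fp32. The function name, block size, and shapes are illustrative assumptions, not the PR's actual kernel; the k/v transpose, task distribution, decode parallelism, and register changes are kernel layout and scheduling details that a host-side sketch like this cannot show.

```python
# Minimal sketch (not the PR's kernel): online-softmax attention over k/v
# blocks, bf16 inputs with fp32 accumulation. All names are illustrative.
import jax.numpy as jnp

def flash_attention_reference(q, k, v, block_k=128):
    seq_q, d = q.shape
    scale = 1.0 / (d ** 0.5)
    m = jnp.full((seq_q, 1), -jnp.inf, dtype=jnp.float32)  # running row max
    l = jnp.zeros((seq_q, 1), dtype=jnp.float32)           # running softmax denom
    acc = jnp.zeros((seq_q, d), dtype=jnp.float32)         # fp32 output accumulator
    for start in range(0, k.shape[0], block_k):
        kb = k[start:start + block_k].astype(jnp.float32)  # [block_k, d]
        vb = v[start:start + block_k].astype(jnp.float32)
        s = (q.astype(jnp.float32) @ kb.T) * scale         # scores in fp32
        m_new = jnp.maximum(m, s.max(axis=-1, keepdims=True))
        p = jnp.exp(s - m_new)                             # unnormalized probs
        corr = jnp.exp(m - m_new)                          # rescale old statistics
        l = l * corr + p.sum(axis=-1, keepdims=True)
        acc = acc * corr + p @ vb
        m = m_new
    return (acc / l).astype(q.dtype)                       # cast back to bf16

# Example usage with hypothetical shapes:
q = jnp.ones((256, 128), jnp.bfloat16)
k = jnp.ones((1024, 128), jnp.bfloat16)
v = jnp.ones((1024, 128), jnp.bfloat16)
out = flash_attention_reference(q, k, v)                   # bf16, [256, 128]
```

Keeping the max, denominator, and accumulator in fp32 while the q/k/v operands stay bf16 is the standard way to get the memory-bandwidth win of BF16 without losing softmax accuracy.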