Transformer.from_folder - Can we specify FlashAttention? #238

### Python -VV

```shell
-
```

### Pip Freeze

```shell
pip freeze | grep mistral
mistral_common==1.5.1
mistral_inference==1.5.0
```

### Reproduction Steps

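```python
import logging
import os

from huggingface_hub import snapshot_download  # assumed origin of snapshot_download

# `get_cache_path` is a helper from the reporter's codebase; it is not shown in the issue.


class PixtralModel:  # hypothetical wrapper name; the issue shows only the constructor body
    def __init__(self, model_path):
        self.model_path = model_path
        try:
            from mistral_inference.transformer import Transformer
            from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
        except ImportError:
            logging.critical('Please install `mistral-inference` and `mistral_common`')
            raise

        # Use a local directory if given, otherwise resolve (and download) the Hub snapshot.
        if os.path.exists(model_path):
            cache_path = model_path
        else:
            if get_cache_path(model_path) is None:
                snapshot_download(repo_id=model_path)
            cache_path = get_cache_path(self.model_path, repo_type='models')

        self.tokenizer = MistralTokenizer.from_file(f'{cache_path}/tekken.json')
        # Weights are loaded on the CPU first, then moved to the GPU.
        model = Transformer.from_folder(cache_path, device='cpu')
        model.cuda()
        self.model = model
        self.max_tokens = 2048
```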

### Expected Behavior

1. Inference for Pixtral is very slow. Is there a way to tell `Transformer.from_folder` to use FlashAttention-2?
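
For context on the question above, here is a minimal sketch that reports which attention-related packages are installed in the environment. The function name `attention_backend_report` and the package list (`torch`, `xformers`, `flash_attn`) are my own assumptions for illustration. The check only says whether the packages can be found; it does not confirm which attention kernel `mistral_inference` actually uses at runtime.

```python
import importlib.util
from importlib import metadata


def attention_backend_report(packages=("torch", "xformers", "flash_attn")):
    """Return {import name: version string or None} without importing the packages."""
    report = {}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name.
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in attention_backend_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output next to the `pip freeze` section would make it easier to tell whether the slowdown is related to a missing GPU attention backend.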

### Additional Context

_No response_

### Suggested Solutions

_No response_
