Tools and Utilities

This section covers the utility tools provided with FlagEvalMM to enhance your evaluation workflow.

Batch Model Execution

The run_models.py script allows you to run multiple models in parallel on a multi-GPU system with automatic GPU management. This is particularly useful for evaluating multiple models on the same benchmark tasks.

Features

  • Dynamic GPU allocation based on model requirements

  • Parallel execution of models to maximize resource utilization

  • Automatic logging of model outputs

  • Graceful handling of interrupted runs

  • Support for specifying a common model directory (required)

Usage

The script requires specifying both a configuration file and the model directory:

python tools/run_models.py --config tools/configs/example_batch.py --models-base-dir model_path

You can also specify a custom model configuration directory (defaults to model_configs/open):

python tools/run_models.py --config tools/configs/example_batch.py --models-base-dir model_path --cfg-dir model_configs/open

Configuration Format

The configuration file should be in Python format with the following structure:

# List of models and their backends
model_info = [
    ["Model1-Name", "backend_type"],
    ["Model2-Name", "custom/adapter.py"],
    # ...
]

# List of tasks to evaluate
tasks = [
    "tasks/task1.py",
    "tasks/task2.py",
    # ...
]

# Optional output directory
output_dir = "./results/batch_eval"

Model Configuration Files

Each model also needs a JSON configuration file in the specified cfg-dir directory matching the model name. For example:

{
    "backend": "vllm",
    "extra_args": "--tensor-parallel-size 4 --max-model-len 32768"
}

The script will automatically add the model path using the specified --model-dir. For example, if you run with --model-dir /path/to/models, the actual model path will be set to /path/to/models/Model-Name.

Note

HuggingFace model references (like “Qwen/Qwen2-VL-72B-Instruct”) will be preserved as-is.

GPU Requirements

The script automatically allocates GPUs based on predefined requirements in the GPU_REQUIREMENTS dictionary. Models not listed there will use 1 GPU by default.

Example

Run multiple models on MMMU and MMVET tasks:

# Run with models located in a common directory
python tools/run_models.py --config tools/configs/example_batch.py --model-dir models/vlm

This will evaluate all models specified in the config on all the listed tasks with appropriate GPU allocation, using the model files from the specified model directory.