We adopt an experiment-oriented perspective to investigate two essential characteristics of multiple criteria sorting methods: expressiveness and robustness. We focus on approaches from the UTADIS family, which learn the parameters of a value-driven threshold-based model from the Decision Maker’s assignment examples. Although these properties are crucial for the methods’ reliability and usefulness in real-world scenarios, their verification through explicit numerical tests has so far been neglected. On the one hand, expressiveness captures a model’s flexibility to reproduce different preferences, both simple and complex, meaningfully and accurately. On the other hand, robustness reflects the ability to deliver valid recommendations and ensure proper conclusiveness given the multiplicity of compatible preference model instances. We consider different variants of UTADIS, ranging from those assuming monotonic and preferentially independent criteria to more advanced settings that relax the monotonicity constraints or represent interactions. The experimental results capture the trade-off between the two quality dimensions, indicating that richer models are characterized by greater expressiveness and lower robustness. We also formulate a comprehensive framework indicating when each variant should be used, given the nature of the supplied preferences or the problem characteristics. These findings aid decision analysts in making robust recommendations in different contexts and help refine preference modeling assumptions. The framework’s practical use is illustrated in a case study involving sorting mobile phone models into pre-defined preference-ordered classes.