Beemo is a benchmark of 6.5k expert-edited machine-generated texts spanning domains from creative writing to summarization. Evaluating 33 configurations of machine-generated text (MGT) detectors, we show that expert editing evades detection, while LLM-edited texts remain distinguishable from human writing. These results expose gaps in current detection methods for multi-author scenarios.