- More than three billion years of evolution have encoded biology into the space of natural proteins.
- Language models trained on tokens generated by evolution can act as evolutionary simulators to generate functional proteins.
- ESM3 is a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins.
- ESM3 can follow complex prompts combining its modalities and is highly responsive to biological alignment.
- ESM3 was prompted with a chain of thought to generate fluorescent proteins.
- Among the synthesized generations, a bright fluorescent protein was found at a large distance (58% sequence identity) from known fluorescent proteins.
- Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.
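
The "58% identity" figure above is a sequence-identity measure. As a rough illustration of what such a number means, here is a minimal Python sketch that computes percent identity between two sequences that are already aligned. The function name `percent_identity`, the gap convention, and the toy sequences are assumptions made for illustration; the source does not say which alignment method or identity definition was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: '-' marks a gap, gaps never count as matches, and the
    denominator is the number of alignment columns in which at least one
    sequence has a residue. Other conventions exist (for example, dividing by
    the length of the shorter sequence) and give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Drop columns where both sequences have a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")

    # A column matches only if both residues are equal and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Arbitrary toy strings, not real protein sequences.
toy_a = "ACDEFGHIKL"
toy_b = "ACDEYGHI-L"
print(percent_identity(toy_a, toy_b))  # 8 matches over 10 columns -> 80.0
```

Under this convention, a protein at 58% identity to its closest known relative would match it at 58 of every 100 aligned positions on average.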