Release Notes - 2023.06.29

This release adds Bark, a text-to-speech model, and an NSFW filter to help keep your generated images suitable for professional use.


Bark is a transformer-based text-to-audio model created by Suno. It can generate highly realistic, multilingual speech, as well as other types of audio such as music, background noise, and simple sound effects. The model can also produce nonverbal sounds, including laughing, sighing, and crying. Pretrained model checkpoints are provided to support the research community. These are ready for inference and available for commercial use.

In Takomo, you can use Bark to generate speech from text, choosing from several voice presets.
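For readers curious about what a voice preset looks like outside Takomo, here is a minimal sketch using Suno's open-source `bark` Python package. It is not Takomo's node or API. The helper names `voice_preset` and `synthesize` are made up for illustration, and the language list is an assumption based on Bark's published v2 presets.

```python
from typing import Any, Tuple

# Assumption: language codes used by Bark's published "v2" voice presets.
BARK_LANGUAGES = {
    "en", "de", "es", "fr", "hi", "it", "ja",
    "ko", "pl", "pt", "ru", "tr", "zh",
}


def voice_preset(language: str, speaker: int) -> str:
    """Build a Bark v2 preset name such as 'v2/en_speaker_6' (illustrative helper)."""
    if language not in BARK_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not 0 <= speaker <= 9:
        raise ValueError("speaker index must be between 0 and 9")
    return f"v2/{language}_speaker_{speaker}"


def synthesize(text: str, preset: str) -> Tuple[Any, int]:
    """Generate audio with the open-source Bark package (illustrative helper).

    Returns the audio samples and their sample rate.
    """
    # Imported lazily so voice_preset() works without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models

    preload_models()  # downloads and caches the checkpoints on first use
    audio = generate_audio(text, history_prompt=preset)
    return audio, SAMPLE_RATE
```

Calling `synthesize("Hello from Bark. [laughs]", voice_preset("en", 6))` would return the audio samples and sample rate, provided the package and its checkpoints are installed.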

More information on Bark:

NSFW Filter

When using Takomo for commercial or other professional purposes, it's important to detect adult content and keep it out of your results. We now have a node available to do just that. Simply insert the node after the image output from Stable Diffusion to detect explicit content.
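Conceptually, a filter placed after image generation acts as a gate on each output. The sketch below illustrates that pattern only and is not Takomo's implementation: `score_fn`, the `0.5` threshold, and every name in it are hypothetical stand-ins for whatever classifier the node uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class FilterResult:
    image_id: str
    score: float    # classifier's estimate that the image is explicit, 0.0 to 1.0
    blocked: bool   # True when the score reaches the threshold


def nsfw_gate(
    images: Iterable[Tuple[str, bytes]],
    score_fn: Callable[[bytes], float],
    threshold: float = 0.5,  # hypothetical default; tune for your use case
) -> List[FilterResult]:
    """Score each generated image and flag those at or above the threshold."""
    results = []
    for image_id, data in images:
        score = score_fn(data)
        results.append(FilterResult(image_id, score, score >= threshold))
    return results


def allowed_ids(results: List[FilterResult]) -> List[str]:
    """Return the IDs of images that passed the gate."""
    return [r.image_id for r in results if not r.blocked]
```

Because the gate runs after generation, every image is scored before anything downstream sees it, and only the images that pass are forwarded.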

