Deepgram
Voice AI platform for developers.
Sam Altman
DeepGram — Search engine for speech
Ryan Hoover
As more videos are created, there's a growing opportunity to index this content and make it searchable. @stephensonsco -- Audiosear.ch is doing some very interesting things in the podcasting space. You and @annewootton might be able to work together.
Adam Marx
@rrhoover @stephensonsco @annewootton I agree. The thing that's always fascinated me about video content indexing in particular is the ability to index based on content not searchable in the video's title or tags. Being able to search based on the mood or the actual content of the video (which is inherently more subjective than anything else) is something that I think holds a lot of opportunity in the future.
Noah Shutty
@adammarx13 @rrhoover @stephensonsco @annewootton Absolutely! We need content-aware search to navigate the ever-widening torrent of online media. Podcasts are a great example where the audio contains 1000s of words but there's little metadata. We're big fans of what audiosear.ch is doing and would love to get in touch with @annewootton !
Jonathon Triest
QA/Compliance just got 100000x more efficient!
Noah Shutty
thanks @jtriest ! We aim to make audio 100000x more useful.
Jonathon Triest
@noajshu 1000000000000000000000000000000000000000x
Abe Storey
I love this. Freakin' brilliant and seamless. Can you do product analysis @stephensonsco ?
Scott Stephenson
@abe_storey We might be able to. What kind of product analysis are you looking for?
Abe Storey
@stephensonsco What's your email? I'll dm you more.
Noah Shutty
@abe_storey @stephensonsco Reach out to contact -at- deepgram -dot- com
Noah Shutty
@abe_storey @stephensonsco Or just leave us a note at deepgram.com/contact :)
Malcolm Ocean
Huh, I'm kind of confused. Is it parsing the audio into words and then comparing those with a fuzzy match? Or is it something much subtler, using phonemes rather than words? It seems that the words shown aren't actually that accurate, but the fuzzy match *is*. Given that searching "conquer" matches the spoken word "concrete", which is closed-captioned as "conquering", but "concrete" doesn't seem to find this snippet, I'm guessing it's more like the first thing I guessed... I'm also curious at which stage(s) in the process the AI/ML is used.
Scott Stephenson
@malcolm_ocean Great point! There are circumstances where the match will be imperfect. We are constantly working on improving the algorithm by increasing both precision and recall (check out https://en.wikipedia.org/wiki/Pr... for a very detailed discussion on precision and recall in search). It's useful to think of the results in the Google sense where your first result may not necessarily be the best result. You might have to venture into the next few results (keep clicking 'next' and the engine will 'lower its standards') to really find what you are looking for. The UI for this isn't super simple yet since it's a new idea—how do you present search results in video and audio? We have achieved much better accuracy than traditional search based on transcriptions, but we always are trying to improve!
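Precision is the fraction of returned hits that are real matches; recall is the fraction of real matches that get returned. The minimal Python sketch below uses made-up timestamps to show why looking past the first result (like clicking 'next') helps: reading further down a ranked list can only keep or raise recall, while precision may rise or fall. The function name and data are hypothetical and not part of any DeepGram API.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned hits vs. the truly relevant hits."""
    returned_set, relevant_set = set(returned), set(relevant)
    true_positives = len(returned_set & relevant_set)
    precision = true_positives / len(returned_set) if returned_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data: seconds where the query phrase is really spoken,
# and the ranked hits a search engine returned for it.
relevant_hits = [12, 47, 95, 130]
ranked_results = [47, 200, 12, 310, 95]

# Reading further down the ranked list ("clicking next") can only keep or raise recall,
# while precision may go up or down.
for k in range(1, len(ranked_results) + 1):
    p, r = precision_recall(ranked_results[:k], relevant_hits)
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```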
Scott Stephenson
@malcolm_ocean I noticed I missed a few points you asked about. The AI is used in the indexing stage and in prediction layers that are built on top of the search (prediction is a 'special' feature that customers have to request). The search does work based on a fuzzy model: matches on words and sounds are all weighted and compared by their probability of being correct and by how far they are from your query.
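To make the weighting idea concrete, here is a toy Python sketch. It is an assumption for illustration, not DeepGram's actual algorithm: each index entry is a word hypothesis with a probability, and a result's score multiplies that probability by how close the hypothesis is to the query. Character similarity from the standard library's `difflib.SequenceMatcher` stands in for whatever sound-based distance a real system would use; the index data and the `min_score` threshold are made up. With this toy data, the query "conquer" reaches both "conquering" and "concrete", similar to what Malcolm observed.

```python
from difflib import SequenceMatcher


def similarity(query, candidate):
    """Character-level similarity in [0, 1]; a crude stand-in for a sound-based distance."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def fuzzy_search(query, index, min_score=0.3):
    """Rank index entries by recognition probability x closeness to the query.

    `index` is a list of (start_sec, hypothesized_word, probability) tuples.
    """
    scored = []
    for start_sec, word, prob in index:
        score = prob * similarity(query, word)
        if score >= min_score:
            scored.append((round(score, 3), start_sec, word))
    return sorted(scored, reverse=True)  # best combined score first


# Hypothetical index for a short clip: each entry is one word hypothesis with a confidence.
index = [
    (3.2, "conquering", 0.55),
    (10.8, "concrete", 0.90),
    (25.0, "dam", 0.95),
]

for score, start_sec, word in fuzzy_search("conquer", index):
    print(f"{start_sec:>5.1f}s  {word:<10}  score={score}")
```

Multiplying the two factors means a confidently recognized but only loosely similar word can outrank a closer but less certain one, which is one way a search can "lower its standards" gracefully.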
Malcolm Ocean
@stephensonsco hm, I still don't quite understand. Are you using the neural networks directly on the audio data? Like with the audio as input nodes? Or both that and the transcription? Or...
Scott Stephenson
@malcolm_ocean The NN is used (more or less) directly on the audio waveform to produce a searchable index. When you hit the search button, the index is queried and the most relevant results are returned back to you. The NN is not active during that query stage though, just the indexing. I hope that helps!
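The split Scott describes (expensive model at indexing time, cheap lookup at query time) can be sketched in a few lines of Python. This is a hypothetical illustration, not DeepGram's code: `StubAcousticModel` stands in for the neural network and simply returns precomputed (word, probability) hypotheses, and the index is a plain inverted index from words to (start time, probability) pairs. The call counter shows that queries never touch the model.

```python
from collections import defaultdict


class StubAcousticModel:
    """Hypothetical stand-in for the neural network; counts how often it is run.

    A real model would take raw audio; here each chunk already carries its
    (word, probability) hypotheses so the sketch stays self-contained.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, chunk):
        self.calls += 1
        return chunk["hypotheses"]


def build_index(chunks, model):
    """Indexing stage: run the expensive model once per chunk and store its output."""
    index = defaultdict(list)
    for chunk in chunks:
        for word, prob in model(chunk):
            index[word].append((chunk["start_sec"], prob))
    return index


def query(index, word):
    """Query stage: a plain lookup, most probable hits first; the model is never called."""
    return sorted(index.get(word, []), key=lambda hit: hit[1], reverse=True)


chunks = [
    {"start_sec": 0.0, "hypotheses": [("hoover", 0.8), ("hover", 0.2)]},
    {"start_sec": 5.0, "hypotheses": [("dam", 0.9), ("damn", 0.1)]},
    {"start_sec": 9.5, "hypotheses": [("hoover", 0.6)]},
]
model = StubAcousticModel()
index = build_index(chunks, model)
print(query(index, "hoover"))  # hits at 0.0 s and 9.5 s
print(model.calls)             # still 3: querying did not run the model
```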
Spencer Schoeben
Wow. This is really great execution. Can't wait to see this sort of indexing of AV content become more commonplace.
Chris Strom
I love the concept - eager to use it! I tried to index a YouTube video and got a Failed status. Any ideas why this could happen?
Scott Stephenson
@marketplicity We'll check this out right away. What's the YouTube URL? You can also drop us a line at https://www.deepgram.com/contact with any problems. Thanks for letting us know!
Ouriel Ohayon
It would be killer to use this to build a true search engine for podcasts.
Scott Stephenson
@ourielohayon I totally agree! Which platforms would you want to search on?
Kenny
Someone needs to go redesign the site. The search doesn't seem to work. Just keeps playing the same video. I think it skips to parts of the video where the search keyword can be found? No idea. I searched "apple" and nothing happened.
Scott Stephenson
@topcities Sorry that it's annoying. There is only a single video to search through as a demo and 'apple' isn't mentioned in the video. You can upload videos from YouTube or your personal audio/video stash by creating an account and dropping it into the console. Let us know how that goes for you!
Kenny
@stephensonsco The technology is cool. With better messaging on the site I think you'd increase your signup and engagement rates a lot.
Noah Shutty
@topcities @stephensonsco Thanks! Do you have any tips for improving our messaging?
Kenny
@noajshu @stephensonsco Sure.
1. Don't make users sign up before trying it. Let users play with it a bit, then ask them to register.
2. The video demo has many usability issues. The search box implies that a user can search for a video, not for text within the video. The pink buttons feel like they let users paginate to the next video search result. It might be better to just put a video explaining the value proposition right above the fold.
3. I didn't realize there was more stuff below the video, because I got stuck at the video section and left shortly after, since I didn't really get it right away.
4. The slogan didn't hook me because I didn't get it right away. I was just scanning, so that's probably the reason. Perhaps make it even simpler for the average Joe to get. Something like "Search through speech within videos" might be easier to understand. You can always explain in more detail later, after the user is hooked.
Janardhan
Nice. Searches speech but won't accept speech as input :)
Scott Stephenson
@nj_raju I know right?! This is something we want to add soon. Getting it working on different platforms while not frustrating users is a challenge, though.
Tom Charde
Not working on Firefox 44.0.2.
Imran Ghory
Speech analysis is something I've been looking at recently for a fintech company, but from testing the standard API transcription services (those from IBM, AT&T, etc.) the quality hasn't been great. Is exposing the raw transcript something you're looking at, even if it's a probabilistic transcript? Search is one use case, but we've also got other use cases for which we'd want raw data (predicting conversion, risk, fraud, etc.).
Scott Stephenson
@imranghory We definitely can expose the transcript (there's a DeepGram API call for that), but the error rate in the transcript is highly dependent on the input audio quality (better quality audio has better transcripts, phone calls in noisy cars don't). However, our analysis techniques don't rely solely on the transcript being perfect, which is a feature that really sets us apart—especially for medium to poor quality audio. On prediction: We've built AI prediction layers to do what you are mentioning (predicting outcomes) but we don't rely on the text in the transcript being perfect, we build it on top of our fuzzy key phrase search. Contact us if you need that sort of thing!
Imran Ghory
@stephensonsco Our current dataset is recorded phone calls in mp3, which isn't ideal: most models that are trained on phone data (i.e. dealing with narrowband) fail on our data due to the lossy compression of mp3. We're looking at switching to uncompressed call recording, though. What's the pricing structure for search and the API?
Noah Shutty
@imranghory @stephensonsco We'd be happy to talk about our pricing and set up a demo for your call audio! Shoot us a quick message at deepgram.com/contact and we'll be in touch.
Kartik Parija
Hey, this is really interesting, and given that we at @adorilabs work on audio experiences, I can totally see phenomenal use cases around search, annotation, highlighting, etc. within audio. This is really close to our heart, so we're putting this on our watchlist of potential use/collaboration. Congrats @noajshu @stephensonsco and team.
Noah Shutty
@kartikparija @adorilabs @stephensonsco Thanks Kartik! Checking out Adori now.
Kartik Parija
@noajshu @stephensonsco Oh our website is *ahem* vague. Ignore it please. We are busy building product and will update it soon. But would love to connect directly and tell you more. Since my original comment, there has already been an animated discussion within our small team about how we can use Deepgram!!! 😄
Noah Shutty
@kartikparija @stephensonsco We'd love to get in touch! send us a message at deepgram.com/contact :D
Kartik Parija
@noajshu @stephensonsco Did that. Look forward to connecting directly. And I love @rrhoover's suggestion that you connect with @annewootton. This came up in our internal discussion about Deepgram as well. She and her team are doing some smashing work with podcast and radio transcription.
Angad Singh
@kartikparija @adorilabs Would encourage you to also check out (https://www.producthunt.com/post...). We have a bunch of audio producers using the product already :)
Uri Eliabayev
Wow, it can help so many people out there! Can you please explain the technical approach you used? I can tell from the company's name that you use deep learning in some way, but I'd be happy to hear more about it. Thanks!
Bill Doerrfeld
If I was Youtube, I would buy you.
Scott Stephenson
@doerrfeldbill We were thinking of buying them.
Pedro Ruíz
Very interesting concept, what's the hardest challenge you've faced when dealing with accents? I can see an application in Education when searching online and being able to find results from KhanAcademy or TED right on my search without the need for transcription.
Noah Shutty
@piero_ruiz Definitely agree--navigating recorded lectures is a huge pain point we want to tackle. For DeepGram to handle accents, we had to train it with a large audio corpus created by many different speakers. Sometimes it's still helpful to 'type out' how it sounds--e.g.: 'wah ter mell un' instead of 'watermelon'
Scott Stephenson
@piero_ruiz TED and Khan academy are great applications! If you know some peeps working on those products, definitely send 'em our way ;). We started working on DeepGram while indexing our lives in college (@noajshu made a wearable -- lectures were a big impetus!) so we definitely see the value of being able to discover and navigate content like this.
Tom Pryor
@stephensonsco Tom from the product team at Khan Academy here. Definitely thought the same as @imranghory when I saw this - would be keen to chat about it more
Noah Shutty
@thomaspryor @stephensonsco @imranghory We'd be keen as well! I'll reach out shortly.
Ben Myers
Not sure what's up, but no matter what I search, it only plays back the Hoover Dam video. I'm on Chrome 49, Mac OS X.
Scott Stephenson
@iamhabitat Hey Ben! Thanks for the feedback. Are you searching on our homepage? That demo is only for searching within that single Creative Commons video about the Hoover Dam; we don't yet provide search that is akin to Google for audio/video (but we're certainly working on it). You can search through other files (YouTube videos, your own recorded memos, things like that) by creating an account and uploading them to the DeepGram console. Let us know if you have more feedback (also, we have a Slack channel: https://www.hamsterpad.com/chat/...!)
Manoj Shinde
@stephensonsco Does the search API also work for identifying keywords in videos and starting the video from a particular keyword?
Scott Stephenson
@msolstice If you search for a term, then the API takes you to the best match in the file. So, yep, the API can definitely be used to navigate around video & audio based on keywords/phrases. You could make a fifteen-hour-long video scavenger hunt if you wanted—oh no, ideas for a demo.
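A client-side sketch of "take me to the best match" might look like the Python below. The hit format `(start_sec, score)` is an assumption for illustration, not the actual DeepGram API response; the sketch picks the highest-scoring hit and formats its offset as a timestamp a player could seek to.

```python
def best_match(hits):
    """Return the (start_sec, score) hit with the highest score, or None if there are none."""
    return max(hits, key=lambda hit: hit[1], default=None)


def to_timestamp(seconds):
    """Format a seek offset in seconds as H:MM:SS (fractions of a second are dropped)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical hits for one search inside a fifteen-hour recording.
hits = [(812.4, 0.41), (40210.0, 0.87), (3605.2, 0.66)]

match = best_match(hits)
if match is not None:
    start_sec, score = match
    print(f"seek to {to_timestamp(start_sec)} (score {score})")  # seek to 11:10:10 (score 0.87)
```

Sorting `hits` by score instead of taking the maximum would give the "next result" navigation described earlier in the thread.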
Mo
Hey guys, this is amazing!!! I think this will be very useful for analysing sporting videos, for example a football/soccer match. Normally, football clubs have teams that watch their opponents' matches and manually record stats for different situations: the number of touches for a player (how many times the player's name is mentioned), fouls, red cards, offsides, mistakes and so on. Imagine having your tech do this automatically based on the parameters that the team wants! I'd be interested in helping out if you decide to work on this use case :). Cheers.
Noah Shutty
@burrewoo Whoa--this is a use case we never thought of. Analytics for football competitors. I'd love to talk more about this. Are the fouls, red cards, etc. marked by specific phrases?
wahome
@burrewoo I can see this working as an extra layer/aid to scouting, with proper training, but perhaps not as a replacement. I think there are too many variables and limitations within normal videos that might limit the output's value. I have an active interest in the footie+data space; happy to chat if you're a keen fan.
Mo
@noajshu Hi Noah, awesome :), you can follow me on Twitter @burrewoo. Some of the actions are, but not all of them: for example red cards and major events like scoring a goal, controversial goals (an offside or foul leading to a goal) and so on ...
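Since some events are announced by specific phrases, a first pass at Mo's idea could simply count phrase mentions over a match. The minimal Python sketch below assumes a plain timestamped transcript of `(start_sec, text)` segments, which is a simplification: DeepGram describes searching a fuzzy index rather than an exact transcript. The player names and commentary are hypothetical. As wahome notes, counts like these would be a scouting aid rather than a replacement, since a player's name being said is only a proxy for a touch.

```python
import re
from collections import Counter


def count_mentions(segments, phrases):
    """Count how often each phrase is spoken across (start_sec, text) transcript segments.

    Matching is case-insensitive and on whole words, so "foul" does not match "foully".
    """
    patterns = {p: re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in phrases}
    counts = Counter({p: 0 for p in phrases})
    for _start_sec, text in segments:
        for phrase, pattern in patterns.items():
            counts[phrase] += len(pattern.findall(text))
    return counts


# Hypothetical commentary transcript with made-up player names.
segments = [
    (61.0, "Okafor on the ball, Okafor again!"),
    (75.5, "Foul by Lindqvist, and that's a red card."),
    (300.2, "The offside flag is up against Okafor."),
]
print(count_mentions(segments, ["Okafor", "Lindqvist", "red card", "offside", "foul"]))
```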
Omri Shabi
Nice concept. What's your next milestone?
Scott Stephenson
@omrishabi I bet @noajshu and I will have a couple of things we want to get done! We'd love to make the product more accurate and faster. A big part of bumping up accuracy is training our neural networks to be resilient to different types of audio. Got a jackhammer going off in the background audio? We want to make sure that has no effect on search performance. Also, searching massive datasets is a challenge that we actively pursue. Right now we can search through millions of minutes in a second, but why not billions?
Noah Shutty
@stephensonsco @omrishabi definitely agree on these two goals. Also want to improve the content navigation UI and crawl media data on the web.
Hicham Tahiri
As a voice enthusiast, I really love what you guys are doing! Best of luck!