OpenAI code generation breakthrough
May. 22nd, 2020 08:20 pm
In this video, the Microsoft CTO interviews the OpenAI CEO, starting at the 25:00 mark. Right before that mark he talks about a huge computer system Microsoft built for OpenAI. The overall style of this Microsoft video feels quite weird to my taste, but the fragment with Sam Altman is good:
twitter.com/matvelloso/status/1263193089310461952
At about the 29:00 mark, OpenAI demos its new transformer-based code-generation system, trained on a large subset of GitHub. I'd say it's quite impressive; it does feel like a breakthrough in coding-assistance tools. Some discussion here:
news.ycombinator.com/item?id=23250379
Generally speaking, people have been saying lately that large modern transformer models only pretend to be sequence-to-sequence models, but in reality they learn a great deal of structured linguistic information. See, e.g., this informal essay-style paper and the references therein:
arxiv.org/abs/2005.06420 "The Unstoppable Rise of Computational Linguistics in Deep Learning"
(This is not yet an artificial junior software engineer one can hire, but this OpenAI prototype is a considerable step in that direction. May 20, 2020 will be remembered as an important milestone.)
no subject
Date: 2020-05-23 01:15 am (UTC)
Wow. That's about the first part (MSFT). Now I feel like it's bullshit. The guys have a huge code database with comments (maybe some of them written by the "data engineering team").
And then they "translate" from English to Python using that corpus. Then they find several examples that worked and show them to the public.
It has nothing to do with programming.
But another wow. That's about computational linguistics. When I talked with Dima Gensel, he was adamantly against using any linguistics at all, just stats. Well, OK, that was his PhD, so... It kind of worked. Except that, I guess, it only worked after it was repaired.
Cool, cool.
no subject
Date: 2020-05-23 02:53 am (UTC)
He had nothing to gain from showing something that was a total dead end, and he had a reputation that could be damaged. So, based on his personal track record, I think it's real. That does not mean it is really ready for people to use in production, but it's unlikely to be just a PR fake either; given OpenAI's track record, it will probably become something usable quite soon.
Yes, my judgement here is based on who Sam Altman is, not just on the demo itself.
no subject
Date: 2020-05-23 03:09 am (UTC)
no subject
Date: 2020-05-23 04:08 am (UTC)no subject
Date: 2020-05-23 04:37 am (UTC)