Pull to refresh
56.42

GitHub *

Web-service for hosting and development of IT projects

Show first
Rating limit
Level of difficulty

C# or Java? TypeScript or JavaScript? Machine learning based classification of programming languages

Reading time6 min
Views1.5K
GitHub hosts over 300 programming languages—from commonly used languages such as Python, Java, and Javascript to esoteric languages such as Befunge, only known to very small communities.


Figure 1: Top 10 programming languages hosted by GitHub by repository count 

One of the necessary challenges that GitHub faces is to be able to recognize these different languages. When some code is pushed to a repository, it’s important to recognize the type of code that was added for the purposes of search, security vulnerability alerting, and syntax highlighting—and to show the repository’s content distribution to users.

Linguist is the tool we currently use to detect coding languages at GitHub. Linguist a Ruby-based application that uses various strategies for language detection, leveraging naming conventions and file extensions and also taking into account Vim or Emacs modelines, as well as the content at the top of the file (shebang). Linguist handles language disambiguation via heuristics and, failing that, via a Naive Bayes classifier trained on a small sample of data. 

Although Linguist does a good job making file-level language predictions (84% accuracy), its performance declines considerably when files use unexpected naming conventions and, crucially, when a file extension is not provided. This renders Linguist unsuitable for content such as GitHub Gists or code snippets within README’s, issues, and pull requests.

In order to make language detection more robust and maintainable in the long run, we developed a machine learning classifier named OctoLingua based on an Artificial Neural Network (ANN) architecture which can handle language predictions in tricky scenarios. The current version of the model is able to make predictions for the top 50 languages hosted by GitHub and surpasses Linguist in accuracy and performance.
Read more →
Total votes 6: ↑5 and ↓1+4
Comments0

A drawing bot for realizing everyday scenes and even stories

Reading time6 min
Views1.5K

Drawing bot


If you were asked to draw a picture of several people in ski gear, standing in the snow, chances are you’d start with an outline of three or four people reasonably positioned in the center of the canvas, then sketch in the skis under their feet. Though it was not specified, you might decide to add a backpack to each of the skiers to jibe with expectations of what skiers would be sporting. Finally, you’d carefully fill in the details, perhaps painting their clothes blue, scarves pink, all against a white background, rendering these people more realistic and ensuring that their surroundings match the description. Finally, to make the scene more vivid, you might even sketch in some brown stones protruding through the snow to suggest that these skiers are in the mountains.


Now there’s a bot that can do all that.

Read more →
Total votes 5: ↑4 and ↓1+3
Comments0

Introducing GitHub Package Registry

Reading time2 min
Views2.1K

Today, we’re excited to introduce GitHub Package Registry, a package management service that makes it easy to publish public or private packages next to your source code.


GitHub Package Registry is fully integrated with GitHub, so you can use the same search, browsing, and management tools to find and publish packages as you do for your repositories. You can also use the same user and team permissions to manage code and packages together. GitHub Package Registry provides fast, reliable downloads backed by GitHub’s global CDN. And it supports familiar package management tools: JavaScript (npm), Java (Maven), Ruby (RubyGems), .NET (NuGet), and Docker images, with more to come.


You can try GitHub Package Registry today in limited beta. It will always be free to use for open source—more pricing details will be announced soon.


Sign up for the beta


Announcing GitHub Package Registry

Read more →
Total votes 9: ↑8 and ↓1+7
Comments0

Announcing the Open Sourcing of Windows Calculator

Reading time2 min
Views1K

Today, we’re excited to announce that we are open sourcing Windows Calculator on GitHub under the MIT License. This includes the source code, build system, unit tests, and product roadmap. Our goal is to build an even better user experience in partnership with the community. We are encouraging your fresh perspectives and increased participation to help define the future of Calculator.


Image of Windows Calculator

Read more →
Total votes 8: ↑8 and ↓0+8
Comments0

Ethicality of automatic contributions

Reading time2 min
Views1.8K
Hey Habr! Today I would like to talk about ethics, namely ethics in the professional field. It will be a question of services that perform 'fake' (automated human-like) activity and of those doubts in which they can result both the ordinary ordinary user, and the professional of development sphere.



So, let's start. What I mean by the phrase «fake activity» is not difficult to guess: it is the manipulation and compromising of the data that are responsible for the indicator of your activity, or more simply, of actions on the Internet. With this, of course, every one of you who used social networks at least once came across: Facebook, Instagram, and so on.

I will describe this scheme on the example of Instagram: each person has his own account, and for developers API access is provided. And what did we do? We started to launch bots that can perform all sorts of activity through a person’s account (such as like, subscribe, comment on other people's posts, or even independently manage their (or owners) page, for example @neuralcat ). And soon this opportunity began to be actively used in the business sphere. Attracting a new audience by targeting according to certain criteria and carrying out activity on their page. Everything would be fine, but over time it went beyond all limits. Every day dozens of incomprehensible accounts like your photos, leave spam comments, tag you on advertising posts and so on.

Bot activity has gone beyond all limits of prudence that today Instagram closes its API, and here is one of the reasons: “Most of the services that work with auto-posting, likes and OML-like likes — do it through private api — login / password, but not through the official API.”
Read more →
Total votes 18: ↑17 and ↓1+16
Comments0

Configure the development environment for learning HTML, CSS, PHP in Windows

Reading time8 min
Views3.6K

Configure the development environment for learning HTML, CSS, PHP in Windows.


(article is published on linkedin.com (Alexander Panov) )


web - developer PHPStorm and Vagrant


Beginning programmers always ask the question, where to start learning about creating websites? Search on the Internet gives a lot of various information, but unfortunately only more confuses novice developers. Having more than 5 years of teaching experience, I have seen many times how students, making breaks in computer games and deciding to become web developers, tried to customize development environments. They began to sort out the programs for layout. They grabbed Sublime Text, Notepad ++, Visual Studio Code, NetBeans or PHPStorm. We tried to install on our local computer WAMP (stands for Windows, Apache, MySQL and PHP) platforms, such as Denver, Xampp, Open Server and many others. But since there is no systematic knowledge in the head of a novice developer, there is no experience of customization, it was rare for some of them to set up a convenient development environment, and whoever got it, understood that it became more difficult to play, since many additional programs, smoothness in the game.


This article was born because of explaining for many years to each student how to do everything correctly and conveniently.


Once again I want to say that we will set up a working environment in the Windows system to study the creation of sites in HTML, CSS and their links to PHP. We will call this computer working, so that beginning web developers get used to the idea that their computer is their source of income.


Begin

Читать дальше →
Total votes 19: ↑17 and ↓2+15
Comments0
2

Authors' contribution