Tuesday, August 14, 2018

Brief run-through of CT3 interview questions

A brief run-through of CT3 interview questions, points to remember, and quick tips for the exam.


Q. What is the difference between data and information? 

Ans: Data is the raw facts or figures from which conclusions can be drawn. When data is processed and transformed so that it becomes useful to its users, it is known as ‘information’. For example, the weight of each individual in your classroom is data, whereas the number of people in each weight category is information.

Q. What is a random variable?

Ans: ‘Random’ means it is associated with a probability; ‘variable’ means it takes different values. Put neatly, a random variable is a variable whose value is subject to variation due to chance.
If the random variable can take a countable number of distinct values, it is called a discrete random variable. For example, toss two coins and let the random variable X be the number of heads observed. The possible values of X are 0, 1, and 2, which form a discrete set.

If the random variable can take any value in an interval (uncountably many values), it is called a continuous random variable. Examples include the heights and weights of the students in a class, the annual sales of a firm, and the temperature of a city.
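
The two-coin example can be sketched in a few lines of Python. This is only an illustration and assumes both coins are fair, which the text does not state; the variable names are made up for the sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of tossing two coins
# (assumption: both coins are fair).
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability mass function of X, kept as exact fractions.
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

X takes only the values 0, 1, and 2, and the probabilities attached to them add up to 1.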

Q. What are generating functions? 

Ans: Generating functions provide a neat way of working out various properties of probability distributions without having to use integration repeatedly. They can be used to find the mean, variance, and higher moments of a probability distribution, to find the distribution of a linear combination of independent random variables, and to determine the properties of compound distributions.

Q. What is the difference between probability generating function (PGF) and moment generating function (MGF)?

Ans: The names give the game away: PGFs are used to generate probabilities, MGFs are used to generate moments.
A probability generating function (PGF) can be used to generate a set of probabilities, namely the probabilities associated with the values 0, 1, 2, 3, … taken by a counting variable, that is, a random variable that takes non-negative integer values.
A moment generating function (MGF) can be used to generate moments of the distribution of a random variable (discrete or continuous).
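
As a rough illustration, the sketch below builds the PGF, G(t) = E[t^X], and the MGF, M(t) = E[exp(tX)], for an assumed example: X is the number of heads in two tosses of a fair coin. The function names and the finite-difference step are choices made for this sketch, not part of the CT3 material.

```python
import math
from fractions import Fraction

# Assumed example: X = number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def pgf(t):
    """G(t) = E[t^X]; the coefficient of t^k is P(X = k)."""
    return sum(p * t**k for k, p in pmf.items())

def mgf(t):
    """M(t) = E[exp(tX)]; its r-th derivative at t = 0 is E[X^r]."""
    return sum(float(p) * math.exp(t * k) for k, p in pmf.items())

# G'(1) = sum of k * P(X = k), which is the mean.
mean_from_pgf = sum(k * p for k, p in pmf.items())

# G''(1) = E[X(X - 1)], so Var(X) = G''(1) + G'(1) - G'(1)^2.
g2 = sum(k * (k - 1) * p for k, p in pmf.items())
var_from_pgf = g2 + mean_from_pgf - mean_from_pgf**2

# Approximate M'(0) and M''(0) with central finite differences.
h = 1e-4
m1 = (mgf(h) - mgf(-h)) / (2 * h)             # approximates E[X]
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2   # approximates E[X^2]
var_from_mgf = m2 - m1**2

print(f"PGF: mean = {mean_from_pgf}, variance = {var_from_pgf}")
print(f"MGF: mean = {m1:.4f}, variance = {var_from_mgf:.4f}")
```

Both routes give a mean of 1 and a variance of 1/2 for this distribution. The PGF route is exact here, while the MGF route is only a numerical approximation of the derivatives.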

Q. Explain the concept of the p-value in layman’s terms.

Ans: Suppose a restaurant claims that its delivery times are 30 minutes or less on average, but you think they are longer than that. You conduct a hypothesis test because you believe the null hypothesis (H0), that the mean delivery time is at most 30 minutes, is incorrect.
Your alternative hypothesis (H1) is that the mean time is greater than 30 minutes. You randomly sample 100 delivery times and carry out the test. Suppose your p-value (probability value) turns out to be 0.02, which is less than your significance level of 0.05, so you reject H0. In real terms, if the restaurant’s claim were true and the mean delivery time really were 30 minutes, there would be a probability of only 0.02 of observing delivery times at least as long as those in your sample. The p-value is not the probability that the restaurant’s claim is true.
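
A minimal sketch of how a one-sided p-value could be computed for a test like this, using a large-sample z-test. The sample mean and standard deviation below are hypothetical values chosen for illustration; they are not taken from the example above.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary values, chosen only for illustration.
n = 100              # number of sampled deliveries
sample_mean = 31.0   # minutes
sample_sd = 5.0      # minutes
mu0 = 30.0           # mean delivery time claimed under H0
alpha = 0.05         # significance level

# Test statistic: how many standard errors the sample mean lies above mu0.
z = (sample_mean - mu0) / (sample_sd / sqrt(n))

# One-sided p-value: P(Z >= z) when H0 is true.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

With these made-up numbers the test statistic is 2.0 and the p-value is about 0.023, so H0 would be rejected at the 5% level.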

Q. What do you mean by a 95% confidence interval?

Ans: A confidence interval tells you how confident you can be that the results from a poll or survey reflect what you would expect to find if it were possible to survey the entire population. Confidence intervals are intrinsically connected to the confidence level, which is expressed as a percentage (for example, a 95% confidence level). It means that if you repeated an experiment or survey over and over again, about 95% of the intervals you constructed would contain the true population value.
For example, suppose we measure the heights of 40 randomly chosen men and get a mean height of 175 cm and a standard deviation of 20 cm. The 95% confidence interval is 175 ± 1.96 × 20/√40, that is, (168.8, 181.2). This means that 95% of the intervals from experiments like the one we just did will include the true mean, but 5% won’t.
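
The interval for the height example can be checked with a short calculation. This sketch assumes the usual normal approximation, mean ± z × sd/√n, with a critical value of about 1.96; using the t-distribution instead would give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist

n = 40           # sample size from the example
mean = 175.0     # sample mean height in cm
sd = 20.0        # sample standard deviation in cm
level = 0.95     # confidence level

# Two-sided critical value: about 1.96 for a 95% level.
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

# Margin of error is the critical value times the standard error.
margin = z * sd / sqrt(n)
lower, upper = mean - margin, mean + margin
print(f"{level:.0%} CI: ({lower:.1f}, {upper:.1f})")
```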

Q. What is the difference between t-test and ANOVA?

Ans: When the population means of only two groups are to be compared, the t-test is used, but when means of more than two groups are to be compared, ANOVA is used.
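
The link between the two tests can be seen numerically: with exactly two groups, the one-way ANOVA F statistic equals the square of the equal-variance (pooled) t statistic. The sketch below uses hypothetical measurements and hand-written helper functions (`pooled_t` and `anova_f` are names invented for this illustration).

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def anova_f(*groups):
    """One-way ANOVA F statistic for any number of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical measurements for three groups.
g1 = [5.1, 4.9, 5.6, 5.8, 5.0]
g2 = [6.2, 6.0, 5.7, 6.5, 6.1]
g3 = [7.0, 6.8, 7.4, 6.6, 7.1]

t_stat = pooled_t(g1, g2)
f_two = anova_f(g1, g2)
f_three = anova_f(g1, g2, g3)
print(f"t = {t_stat:.3f}, t^2 = {t_stat**2:.3f}, F (two groups) = {f_two:.3f}")
print(f"F (three groups) = {f_three:.3f}")
```

For three or more groups there is no single t statistic to compare against, which is why ANOVA is used there instead of running several pairwise t-tests.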

Q. What is the use of R-squared (R²) in regression?

Ans: R-squared is a goodness-of-fit measure for linear regression models. It indicates the percentage of the variation in the dependent variable that the independent variables explain collectively.
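
As a small worked example, R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The data below are hypothetical, and the fit is a simple one-variable least-squares line written out by hand.

```python
from statistics import mean

# Hypothetical data: x is the independent variable, y the dependent one.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit y = intercept + slope * x by ordinary least squares.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
predicted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot
print(f"R-squared = {r_squared:.4f}")
```

A value close to 1 means the fitted line explains almost all of the variation in y; a value close to 0 means it explains very little.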

Reference: CT3 Probability And Mathematical Statistics Stepup Analytics